[patch] Loop-aware SLP

public inbox for gcc-patches@gcc.gnu.org

* [patch] Loop-aware SLP
@ 2007-08-14 13:01 Ira Rosen
  2007-08-14 13:05 ` [patch] Loop-aware SLP 1/5 Ira Rosen
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 13:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Dorit Nuzman

This patch adds support for straight-line code vectorization inside loops.
This is a merge from the autovect branch.
So, for example, if you have an unrolled loop like this:

for (i=0; i<N; i++){
 a[4*i] = b[4*i] + x0;
 a[4*i+1] = b[4*i+1] + x1;
 a[4*i+2] = b[4*i+2] + x2;
 a[4*i+3] = b[4*i+3] + x3;
}

it will get vectorized as follows:

for (i=0; i<N; i++){
 va[4*i:4*i+3] = vb[4*i:4*i+3] + {x0,x1,x2,x3};
}

Note that, as opposed to loop vectorization, the loop count is not divided
by the vectorization factor.
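
As a rough illustration of the equivalence above, here is a minimal sketch in
plain C. The inner lane loop stands in for a single 4-wide SIMD add; the
function names add_scalar and add_packed, and the use of int elements, are
assumptions made for this example and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

enum { GROUP = 4 };  /* four isomorphic statements per iteration */

/* Scalar form: the unrolled loop body as written in the source program.  */
void add_scalar(int *a, const int *b, const int x[GROUP], size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        a[4 * i]     = b[4 * i]     + x[0];
        a[4 * i + 1] = b[4 * i + 1] + x[1];
        a[4 * i + 2] = b[4 * i + 2] + x[2];
        a[4 * i + 3] = b[4 * i + 3] + x[3];
    }
}

/* Packed form: one group-wide operation per iteration.  The loop still
   runs n times; only the body is packed.  */
void add_packed(int *a, const int *b, const int x[GROUP], size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t lane = 0; lane < GROUP; lane++)   /* one SIMD add */
            a[4 * i + lane] = b[4 * i + lane] + x[lane];
}
```

Both functions iterate n times, which mirrors the note that the loop count is
not divided by the vectorization factor.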

This implementation also handles cases in which there are not enough scalar
stmts to fill up a vector, and unrolls the loop as necessary during the
vectorization analysis. So the following loop:

for (i=0; i<N; i++){
 a[2*i] = b[2*i] + x0;
 a[2*i+1] = b[2*i+1] + x1;
}

is vectorized into the following:

for (i=0; i<N/2; i++){
 va[4*i:4*i+3] = vb[4*i:4*i+3] + {x0,x1,x0,x1};
}
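
The unrolled case can be sketched the same way. This is only an illustration
under stated assumptions: N is assumed to be even (the handling of leftover
iterations is not described in this note), elements are ints, and the names
pair_scalar and pair_unrolled are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form: two isomorphic statements per iteration.  */
void pair_scalar(int *a, const int *b, int x0, int x1, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[2 * i]     = b[2 * i]     + x0;
        a[2 * i + 1] = b[2 * i + 1] + x1;
    }
}

/* Unrolled-by-2 form: two scalar iterations fill one 4-element vector,
   so the invariant operand becomes {x0, x1, x0, x1} and the loop runs
   n / 2 times.  */
void pair_unrolled(int *a, const int *b, int x0, int x1, size_t n)
{
    const int vx[4] = {x0, x1, x0, x1};

    assert(n % 2 == 0);   /* assumption of this sketch */
    for (size_t i = 0; i < n / 2; i++)
        for (size_t lane = 0; lane < 4; lane++)   /* one SIMD add */
            a[4 * i + lane] = b[4 * i + lane] + vx[lane];
}
```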

SLP opportunities are discovered based on strided access support analysis;
we start off from groups of adjacent memory-references (which is something
we already have), and analyze their def-use chain in search for sequences
of isomorphic operations that can be packed into vector operations. The
main differences between this implementation and the approach described in
the original SLP paper (http://portal.acm.org/citation.cfm?id=349320) are
the following:

- We focus on loops, rather than any basic block in the program. We do that
because this way we can reuse all the existing loop-based vectorization
infrastructure. Also we expect that most of the worthwhile opportunities
are in loops, and in this case you lose if you work without the loop
context.

- The original SLP algorithm starts off from "seeds" of pairs of adjacent
loads/stores, and continues growing and merging them in the
Combination stage of the algorithm (to get VS (Vector Size) sized groups).
Our seeds are groups of adjacent (interleaved) stores identified beforehand
by the interleaving analysis, so we don't need a combination stage at all.
We don't yet look at loads, and are not sure we'd need to.

- Our approach combines both SLP-based and loop-based vectorization. So,
for example, if we have a loop like this:

for (i=0; i<N; i++){
 a[4*i] = b[4*i] + x0;
 a[4*i+1] = b[4*i+1] + x1;
 a[4*i+2] = b[4*i+2] + x2;
 a[4*i+3] = b[4*i+3] + x3;

 c[i] = 0;
}

we vectorize it as follows:

for (i=0; i<N/4; i++){
 va[16*i:16*i+3] = vb[16*i:16*i+3] + {x0,x1,x2,x3};
 va[16*i+4:16*i+7] = vb[16*i+4:16*i+7] + {x0,x1,x2,x3};
 va[16*i+8:16*i+11] = vb[16*i+8:16*i+11] + {x0,x1,x2,x3};
 va[16*i+12:16*i+15] = vb[16*i+12:16*i+15] + {x0,x1,x2,x3};

 vc[4*i:4*i+3] = {0,0,0,0};
}

(note that stmts 1,2,3,4 got vectorized using SLP, and stmt 5 got
vectorized using regular loop-vectorization).
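
The factor of 4 in the example above follows from combining the two
requirements: part 2 of the patch sets the loop's vectorization factor to
the least common multiple of the loop-based factor and the SLP unrolling
factor. A minimal sketch of that arithmetic, assuming 4-element vectors (so
stmt 5 needs a factor of 4, while the group of 4 stores needs no unrolling);
toy_gcd and toy_hybrid_vf are hypothetical names, not GCC functions.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm.  */
unsigned toy_gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Vectorization factor of a hybrid loop: the smallest factor that
   satisfies both the loop-based stmts and the SLP instances.  */
unsigned toy_hybrid_vf(unsigned loop_vf, unsigned slp_unrolling_factor)
{
    assert(loop_vf > 0 && slp_unrolling_factor > 0);
    return loop_vf / toy_gcd(loop_vf, slp_unrolling_factor)
           * slp_unrolling_factor;
}
```

With a loop-based factor of 4 and an SLP unrolling factor of 1, the result is
4: the loop runs N/4 times and the SLP group is replicated four times per
iteration, as in the vectorized loop shown above.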

We plan to extend this initial "SLP-like vectorization in loops" in the
following directions:
- support reduction
- support data permutations
- support interleaved accesses with gaps
- support loops with multiple types

More information can be found in our summit paper (
http://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=get&target=GCC2007-Proceedings.pdf).

This is joint work with Dorit.

The patch is divided into 5 parts and is relative to this cleanup patch (
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00806.html). I am going to
submit these parts as replies to this note.
Each part can be compiled (but with warnings about unused arguments and
variables) and passes vectorization tests. The whole patch passes bootstrap
with vectorization enabled and vectorization testcases on x86_64-linux. I
am going to run full regtesting and bootstrap on PowerPC.

O.K. for mainline once the testing is completed?

Thanks,
Ira

:ADDPATCH SSA (vectorizer):


* Re: [patch] Loop-aware SLP 1/5
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
@ 2007-08-14 13:05 ` Ira Rosen
  2007-08-14 13:07 ` [patch] Loop-aware SLP 2/5 Ira Rosen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 13:05 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2214 bytes --]

This is the first part of the loop-aware SLP patch. This part adds SLP data
structures and their initialization.
It also adds an SLP argument to the vectorizable_... functions (e.g.,
vectorizable_store).

During SLP analysis a computation tree that represents an SLP instance (a
sequence of groups of SLPable stmts) in the loop is built. Each node
contains a group of scalar stmts that can be packed together into vector
stmt(s), the vector stmt(s) (after the transformation phase), and tree
information. We keep this data in struct _slp_tree.

SLP instance info (the root of its computation tree, the size of groups of
scalar stmts packed together, required unrolling factor) is represented by
struct _slp_instance. For each loop, we keep a VEC of its SLP instances.
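
To make the shape of these structures concrete outside of GCC, here is a
simplified standalone sketch. It is not the patch's definition: the toy_
types and functions are hypothetical, the scalar and vector stmt lists are
omitted, and the free routine returns a node count only so that the
traversal can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* One node of the computation tree: a group of stmts packed together.
   LEFT is the first operand, RIGHT the second (binary operations).  */
typedef struct toy_slp_node {
    struct toy_slp_node *left;
    struct toy_slp_node *right;
    unsigned group_size;
} toy_slp_node;

/* Per-instance info: tree root, group size, required unrolling factor.  */
typedef struct {
    toy_slp_node *root;
    unsigned group_size;
    unsigned unrolling_factor;
} toy_slp_instance;

toy_slp_node *toy_new_node(unsigned group_size,
                           toy_slp_node *left, toy_slp_node *right)
{
    toy_slp_node *node = malloc(sizeof *node);

    assert(node != NULL);
    node->left = left;
    node->right = right;
    node->group_size = group_size;
    return node;
}

/* Recursively free the tree rooted at NODE; return the number of nodes
   released.  */
unsigned toy_free_tree(toy_slp_node *node)
{
    unsigned freed;

    if (!node)
        return 0;
    freed = toy_free_tree(node->left) + toy_free_tree(node->right);
    free(node);
    return freed + 1;
}
```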

Thanks,
Ira

ChangeLog:

      * tree-vectorizer.h (enum vect_def_type): Start enumeration from 1.
      (struct _slp_tree, struct _slp_instance): Define new data structures
      along with macros for their access.
      (struct _loop_vec_info): Define new fields: strided_stores,
      slp_instances, and slp_unrolling_factor along with macros for their
      access.
      (enum slp_vect_type): New.
      (struct _stmt_vec_info): Define new field, slp_type, and macros for
      its access.
      (vect_free_slp_tree): Declare.
      (vectorizable_load): Add an argument of type slp_tree.
      (vectorizable_store, vectorizable_operation, vectorizable_conversion,
      vectorizable_assignment): Likewise.
      * tree-vectorizer.c (new_stmt_vec_info): Initialize the new field.
      (new_loop_vec_info): Likewise.
      (destroy_loop_vec_info): Free memory allocated for SLP structures.
      * tree-vect-analyze.c (vect_analyze_operations): Add argument for
      calls to vectorizable_ () functions.
      (vect_free_slp_tree): New.
      * tree-vect-transform.c (vectorizable_load): Add a new argument
      for SLP.
      (vectorizable_store, vectorizable_operation, vectorizable_conversion,
      vectorizable_assignment): Likewise.
      (vect_transform_stmt): Add a new argument for SLP. Check that there
      is no SLP transformation required for unsupported cases. Add SLP
      argument for supported cases.

(See attached file: slp-part1.txt)

[-- Attachment #2: slp-part1.txt --]
[-- Type: text/plain, Size: 16555 bytes --]

--- tree-vectorizer.h.patch1	2007-08-14 13:43:52.000000000 +0300
+++ tree-vectorizer.h	2007-08-14 13:49:57.000000000 +0300
@@ -59,7 +59,7 @@ enum dr_alignment_support {
 
 /* Define type of def-use cross-iteration cycle.  */
 enum vect_def_type {
-  vect_constant_def,
+  vect_constant_def = 1,
   vect_invariant_def,
   vect_loop_def,
   vect_induction_def,
@@ -81,6 +81,61 @@ enum verbosity_levels {
   MAX_VERBOSITY_LEVEL
 };
 
+/************************************************************************
+  SLP
+ ************************************************************************/
+
+/* A computation tree of an SLP instance. Each node corresponds to a group of
+   stmts to be packed in a SIMD stmt.  */
+typedef struct _slp_tree {
+  /* Only binary and unary operations are supported. LEFT child corresponds to
+     the first operand and RIGHT child to the second if the operation is
+     binary.  */
+  struct _slp_tree *left;
+  struct _slp_tree *right;
+  /* A group of scalar stmts to be vectorized together.  */
+  VEC (tree, heap) *stmts;
+  /* Vectorized stmt/s.  */
+  VEC (tree, heap) *vec_stmts;
+  /* Number of vectorized stmts.  */
+  unsigned int vec_stmts_size;
+  /* Vectorization costs associated with SLP node.  */
+  struct
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+  } cost;
+} *slp_tree;
+
+
+/* SLP instance is a sequence of stmts in a loop that can be packed into
+   SIMD stmts.  */
+typedef struct _slp_instance {
+  /* The root of SLP tree.  */
+  slp_tree root;
+
+  /* Size of groups of scalar stmts that will be replaced by SIMD stmt/s.  */
+  unsigned int group_size;
+
+  /* The unrolling factor required to vectorized this SLP instance.  */
+  unsigned int unrolling_factor;
+
+} *slp_instance;
+
+DEF_VEC_P(slp_instance);
+DEF_VEC_ALLOC_P(slp_instance, heap);
+
+/* Access Functions.  */
+#define SLP_INSTANCE_TREE(S)                     (S)->root
+#define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
+#define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
+
+#define SLP_TREE_LEFT(S)                         (S)->left
+#define SLP_TREE_RIGHT(S)                        (S)->right
+#define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
+#define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
+#define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
+
 /*-----------------------------------------------------------------*/
 /* Info on vectorized loops.                                       */
 /*-----------------------------------------------------------------*/
@@ -139,6 +194,16 @@ typedef struct _loop_vec_info {
 
   /* The loop location in the source.  */
   LOC loop_line_number;
+
+  /* All interleaving chains of stores in the loop, represented by the first
+     stmt in the chain.  */
+  VEC(tree, heap) *strided_stores;
+
+  /* All SLP instances in the loop.  */
+  VEC(slp_instance, heap) *slp_instances;
+
+  /* The unrolling factor needed to SLP the loop.  */
+  unsigned slp_unrolling_factor;
 } *loop_vec_info;
 
 /* Access Functions.  */
@@ -157,6 +222,9 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_UNALIGNED_DR(L)    (L)->unaligned_dr
 #define LOOP_VINFO_MAY_MISALIGN_STMTS(L) (L)->may_misalign_stmts
 #define LOOP_VINFO_LOC(L)             (L)->loop_line_number
+#define LOOP_VINFO_STRIDED_STORES(L)  (L)->strided_stores
+#define LOOP_VINFO_SLP_INSTANCES(L)   (L)->slp_instances
+#define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
 
 #define NITERS_KNOWN_P(n)                     \
 (host_integerp ((n),0)                        \
@@ -198,6 +266,17 @@ enum vect_relevant {
   vect_used_in_loop  
 };
 
+/* The type of vectorization that can be applied to the stmt: regular loop-base
+   vectorization, pure SLP (the stmt is a part of SLP instances and does not
+   have uses outside SLP instances) or hybrid SLP and loop-based (the stmt is
+   a part of SLP instance and also must be loop-based vectorized, since it has
+   uses outside SLP sequences).  */
+enum slp_vect_type {
+  loop_vect = 0,
+  pure_slp,
+  hybrid
+};
+
 typedef struct data_reference *dr_p;
 DEF_VEC_P(dr_p);
 DEF_VEC_ALLOC_P(dr_p,heap);
@@ -282,6 +361,9 @@ typedef struct _stmt_vec_info {
     int outside_of_loop;     /* Statements generated outside loop.  */
     int inside_of_loop;      /* Statements generated inside loop.  */
   } cost;
+
+  /* If this stmt is a part of pure or hybrid SLP.  */
+  enum slp_vect_type slp_type;
 } *stmt_vec_info;
 
 /* Access Functions.  */
@@ -318,6 +400,10 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
 #define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
 
+#define STMT_VINFO_HYBRID_SLP(S)           ((S)->slp_type == hybrid)
+#define STMT_VINFO_PURE_SLP(S)             ((S)->slp_type == pure_slp)
+#define STMT_VINFO_SLP_TYPE(S)              (S)->slp_type
+
 /* These are some defines for the initial implementation of the vectorizer's
    cost model.  These will later be target specific hooks.  */
 
@@ -482,6 +568,7 @@ extern stmt_vec_info new_stmt_vec_info (
 /** In tree-vect-analyze.c  **/
 /* Driver for analysis stage.  */
 extern loop_vec_info vect_analyze_loop (struct loop *);
+extern void vect_free_slp_tree (slp_tree);
 
 
 /** In tree-vect-patterns.c  **/
@@ -494,14 +581,16 @@ void vect_pattern_recog (loop_vec_info);
 
 
 /** In tree-vect-transform.c  **/
-extern bool vectorizable_load (tree, block_stmt_iterator *, tree *);
-extern bool vectorizable_store (tree, block_stmt_iterator *, tree *);
-extern bool vectorizable_operation (tree, block_stmt_iterator *, tree *);
+extern bool vectorizable_load (tree, block_stmt_iterator *, tree *, slp_tree);
+extern bool vectorizable_store (tree, block_stmt_iterator *, tree *, slp_tree);
+extern bool vectorizable_operation (tree, block_stmt_iterator *, tree *, 
+				    slp_tree);
 extern bool vectorizable_type_promotion (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_type_demotion (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_conversion (tree, block_stmt_iterator *, 
-				     tree *);
-extern bool vectorizable_assignment (tree, block_stmt_iterator *, tree *);
+				     tree *, slp_tree);
+extern bool vectorizable_assignment (tree, block_stmt_iterator *, tree *, 
+				     slp_tree);
 extern tree vectorizable_function (tree, tree, tree);
 extern bool vectorizable_call (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
--- tree-vectorizer.c.patch1	2007-08-14 13:43:52.000000000 +0300
+++ tree-vectorizer.c	2007-08-14 13:43:52.000000000 +0300
@@ -1338,6 +1338,7 @@ new_stmt_vec_info (tree stmt, loop_vec_i
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
   STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
   STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_SLP_TYPE (res) = 0;
   DR_GROUP_FIRST_DR (res) = NULL_TREE;
   DR_GROUP_NEXT_DR (res) = NULL_TREE;
   DR_GROUP_SIZE (res) = 0;
@@ -1402,6 +1403,9 @@ new_loop_vec_info (struct loop *loop)
   LOOP_VINFO_UNALIGNED_DR (res) = NULL;
   LOOP_VINFO_MAY_MISALIGN_STMTS (res)
     = VEC_alloc (tree, heap, PARAM_VALUE (PARAM_VECT_MAX_VERSION_CHECKS));
+  LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (tree, heap, 10);
+  LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
+  LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
 
   return res;
 }
@@ -1420,6 +1424,8 @@ destroy_loop_vec_info (loop_vec_info loo
   int nbbs;
   block_stmt_iterator si;
   int j;
+  VEC (slp_instance, heap) *slp_instances;
+  slp_instance instance;
 
   if (!loop_vinfo)
     return;
@@ -1481,6 +1487,10 @@ destroy_loop_vec_info (loop_vec_info loo
   free_data_refs (LOOP_VINFO_DATAREFS (loop_vinfo));
   free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
   VEC_free (tree, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
+  slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  for (j = 0; VEC_iterate (slp_instance, slp_instances, j, instance); j++)
+    vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
+  VEC_free (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo));
 
   free (loop_vinfo);
   loop->aux = NULL;
--- tree-vect-analyze.c.patch1	2007-08-14 13:43:51.000000000 +0300
+++ tree-vect-analyze.c	2007-08-14 13:54:47.000000000 +0300
@@ -485,12 +485,12 @@ vect_analyze_operations (loop_vec_info l
 
 	  ok = (vectorizable_type_promotion (stmt, NULL, NULL)
 		|| vectorizable_type_demotion (stmt, NULL, NULL)
-		|| vectorizable_conversion (stmt, NULL, NULL)
-		|| vectorizable_operation (stmt, NULL, NULL)
-		|| vectorizable_assignment (stmt, NULL, NULL)
-		|| vectorizable_load (stmt, NULL, NULL)
+		|| vectorizable_conversion (stmt, NULL, NULL, NULL)
+		|| vectorizable_operation (stmt, NULL, NULL, NULL)
+		|| vectorizable_assignment (stmt, NULL, NULL, NULL)
+		|| vectorizable_load (stmt, NULL, NULL, NULL)
 		|| vectorizable_call (stmt, NULL, NULL)
-		|| vectorizable_store (stmt, NULL, NULL)
+		|| vectorizable_store (stmt, NULL, NULL, NULL)
 		|| vectorizable_condition (stmt, NULL, NULL)
 		|| vectorizable_reduction (stmt, NULL, NULL));
 
@@ -2150,6 +2150,29 @@ vect_analyze_data_ref_accesses (loop_vec
 }
 
 
+/* Recursively free the memory allocated for the SLP tree rooted at NODE.  */
+
+void
+vect_free_slp_tree (slp_tree node)
+{
+  if (!node)
+    return;
+
+  if (SLP_TREE_LEFT (node))
+    vect_free_slp_tree (SLP_TREE_LEFT (node));
+   
+  if (SLP_TREE_RIGHT (node))
+    vect_free_slp_tree (SLP_TREE_RIGHT (node));
+   
+  VEC_free (tree, heap, SLP_TREE_SCALAR_STMTS (node));
+  
+  if (SLP_TREE_VEC_STMTS (node))
+    VEC_free (tree, heap, SLP_TREE_VEC_STMTS (node));
+
+  free (node);
+}
+
+
 /* Function vect_analyze_data_refs.
 
   Find all the data references in the loop.
--- tree-vect-transform.c.patch1	2007-08-14 13:43:51.000000000 +0300
+++ tree-vect-transform.c.patch2	2007-08-14 14:03:27.000000000 +0300
@@ -2664,7 +2664,7 @@ vect_gen_widened_results_half (enum tree
 
 bool
 vectorizable_conversion (tree stmt, block_stmt_iterator *bsi,
-			 tree *vec_stmt)
+			 tree *vec_stmt, slp_tree slp_node)
 {
   tree vec_dest;
   tree scalar_dest;
@@ -2916,7 +2916,8 @@ vectorizable_conversion (tree stmt, bloc
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
-vectorizable_assignment (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
+vectorizable_assignment (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt, 
+			 slp_tree slp_node)
 {
   tree vec_dest;
   tree scalar_dest;
@@ -3059,7 +3060,8 @@ vectorizable_induction (tree phi, block_
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
-vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
+vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt, 
+			slp_tree slp_node)
 {
   tree vec_dest;
   tree scalar_dest;
@@ -3833,7 +3835,8 @@ vect_permute_store_chain (VEC(tree,heap)
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
-vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
+vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt,
+		    slp_tree slp_node)
 {
   tree scalar_dest;
   tree data_ref;
@@ -4464,7 +4467,8 @@ vect_transform_strided_load (tree stmt, 
    Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
 
 bool
-vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
+vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt, 
+		   slp_tree slp_node)
 {
   tree scalar_dest;
   tree vec_dest = NULL;
@@ -5013,7 +5017,8 @@ vectorizable_condition (tree stmt, block
    Create a vectorized stmt to replace STMT, and insert it at BSI.  */
 
 static bool
-vect_transform_stmt (tree stmt, block_stmt_iterator *bsi, bool *strided_store)
+vect_transform_stmt (tree stmt, block_stmt_iterator *bsi, bool *strided_store, 
+		     slp_tree slp_node)
 {
   bool is_store = false;
   tree vec_stmt = NULL_TREE;
@@ -5024,42 +5029,45 @@ vect_transform_stmt (tree stmt, block_st
   switch (STMT_VINFO_TYPE (stmt_info))
     {
     case type_demotion_vec_info_type:
+      gcc_assert (!slp_node);
       done = vectorizable_type_demotion (stmt, bsi, &vec_stmt);
       gcc_assert (done);
       break;
 
     case type_promotion_vec_info_type:
+      gcc_assert (!slp_node);
       done = vectorizable_type_promotion (stmt, bsi, &vec_stmt);
       gcc_assert (done);
       break;
 
     case type_conversion_vec_info_type:
-      done = vectorizable_conversion (stmt, bsi, &vec_stmt);
+      done = vectorizable_conversion (stmt, bsi, &vec_stmt, slp_node);
       gcc_assert (done);
       break;
 
     case induc_vec_info_type:
+      gcc_assert (!slp_node);
       done = vectorizable_induction (stmt, bsi, &vec_stmt);
       gcc_assert (done);
       break;
 
     case op_vec_info_type:
-      done = vectorizable_operation (stmt, bsi, &vec_stmt);
+      done = vectorizable_operation (stmt, bsi, &vec_stmt, slp_node);
       gcc_assert (done);
       break;
 
     case assignment_vec_info_type:
-      done = vectorizable_assignment (stmt, bsi, &vec_stmt);
+      done = vectorizable_assignment (stmt, bsi, &vec_stmt, slp_node);
       gcc_assert (done);
       break;
 
     case load_vec_info_type:
-      done = vectorizable_load (stmt, bsi, &vec_stmt);
+      done = vectorizable_load (stmt, bsi, &vec_stmt, slp_node);
       gcc_assert (done);
       break;
 
     case store_vec_info_type:
-      done = vectorizable_store (stmt, bsi, &vec_stmt);
+      done = vectorizable_store (stmt, bsi, &vec_stmt, slp_node);
       gcc_assert (done);
       if (STMT_VINFO_STRIDED_ACCESS (stmt_info))
 	{
@@ -5076,15 +5084,18 @@ vect_transform_stmt (tree stmt, block_st
       break;
 
     case condition_vec_info_type:
+      gcc_assert (!slp_node);
       done = vectorizable_condition (stmt, bsi, &vec_stmt);
       gcc_assert (done);
       break;
 
     case call_vec_info_type:
+      gcc_assert (!slp_node);
       done = vectorizable_call (stmt, bsi, &vec_stmt);
       break;
 
     case reduc_vec_info_type:
+      gcc_assert (!slp_node);
       done = vectorizable_reduction (stmt, bsi, &vec_stmt);
       gcc_assert (done);
       break;
@@ -5931,7 +5942,7 @@ vect_transform_loop (loop_vec_info loop_
 	    {
 	      if (vect_print_dump_info (REPORT_DETAILS))
 		fprintf (vect_dump, "transform phi.");
-	      vect_transform_stmt (phi, NULL, NULL);
+	      vect_transform_stmt (phi, NULL, NULL, NULL);
 	    }
 	}
 
@@ -5965,7 +5976,7 @@ vect_transform_loop (loop_vec_info loop_
 	    fprintf (vect_dump, "transform statement.");
 
 	  strided_store = false;
-	  is_store = vect_transform_stmt (stmt, &si, &strided_store);
+	  is_store = vect_transform_stmt (stmt, &si, &strided_store, NULL);
           if (is_store)
             {
 	      stmt_ann_t ann;


* Re: [patch] Loop-aware SLP 2/5
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
  2007-08-14 13:05 ` [patch] Loop-aware SLP 1/5 Ira Rosen
@ 2007-08-14 13:07 ` Ira Rosen
  2007-08-14 13:12 ` [patch] Loop-aware SLP 3/5 Ira Rosen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 13:07 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]

This is the second part of the loop-aware SLP patch.
We add SLP analysis and transformation functions here.

The analysis for SLP builds computation trees in a recursive bottom-up,
use-def manner rooted at interleaved store groups. Each node in a tree
contains the same number of isomorphic statements, which can execute in
parallel. At each step, the uses of statements in the current node are
examined for their corresponding defs, and we either extend the tree by
additional child nodes, conclude that the current node is a leaf, or
terminate the tree. When the def statements are all loop invariant
(including constants) or are all loads from interleaved memory addresses
(i.e., form an interleaved load group), the node is considered a leaf.
Additional child nodes are created when the def statements are all
loop-variant, isomorphic and independent non-load statements. In all other
cases, including reaching defs which belong to SSA phi-nodes, we terminate
the tree and delete it.
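
The three outcomes described above can be sketched as a small classifier.
This is a simplification under stated assumptions, not the code in the
patch: the toy_ names are hypothetical, "isomorphic" is reduced to "same
opcode", and the independence check is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEF_INVARIANT,         /* constant or loop-invariant def */
    DEF_INTERLEAVED_LOAD,  /* load that belongs to an interleaved group */
    DEF_COMPUTE,           /* loop-variant non-load stmt */
    DEF_PHI                /* def that belongs to an SSA phi-node */
} toy_def_kind;

typedef struct {
    toy_def_kind kind;
    int opcode;            /* only meaningful for DEF_COMPUTE */
} toy_def;

typedef enum { NODE_LEAF, NODE_EXTEND, NODE_FAIL } toy_decision;

/* Decide what to do with the defs reached from one operand position of
   the stmts in the current node.  */
toy_decision toy_classify(const toy_def *defs, size_t n)
{
    if (n == 0)
        return NODE_FAIL;

    /* All defs must be of the same kind.  */
    for (size_t i = 1; i < n; i++)
        if (defs[i].kind != defs[0].kind)
            return NODE_FAIL;

    switch (defs[0].kind) {
    case DEF_INVARIANT:
    case DEF_INTERLEAVED_LOAD:
        return NODE_LEAF;
    case DEF_COMPUTE:
        /* Isomorphic here means: every def uses the same operation.  */
        for (size_t i = 1; i < n; i++)
            if (defs[i].opcode != defs[0].opcode)
                return NODE_FAIL;
        return NODE_EXTEND;
    default:
        return NODE_FAIL;  /* e.g. phi-nodes: terminate and delete */
    }
}
```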

Upon reaching the first statement of a computation tree to be SLP’ed
(typically the first load), we generate the vectorized version of the whole
tree, from leaves to root, getting vector def stmts from already scheduled
tree nodes. (Note that this part of the patch does not contain the actual
call to vect_schedule_slp(), which is made from vect_transform_loop(); this
will be added in the next part of the patch).
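
The leaves-to-root order amounts to a post-order walk of the tree. A minimal
sketch of that ordering, with hypothetical toy_ names and an integer id
standing in for the vector stmts a real node would produce:

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_node {
    int id;
    struct toy_node *left;
    struct toy_node *right;
    int scheduled;            /* nonzero once the node has been emitted */
} toy_node;

/* Emit NODE after its children, so that every parent can pick up the
   vector defs of its already scheduled operands.  ORDER receives the ids
   in emission order starting at index POS; the next free index is
   returned.  */
size_t toy_schedule(toy_node *node, int *order, size_t pos)
{
    if (!node || node->scheduled)
        return pos;

    pos = toy_schedule(node->left, order, pos);
    pos = toy_schedule(node->right, order, pos);

    order[pos++] = node->id;
    node->scheduled = 1;
    return pos;
}
```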

Thanks,
Ira

ChangeLog:

      * tree-vect-analyze.c (vect_analyze_operations): For not pure SLP
      stmts with strided access check that the group size is a power of 2.
      Update the vectorization factor according to SLP.
      (vect_get_and_check_slp_defs, vect_build_slp_tree,
      vect_print_slp_tree, vect_mark_slp_stmts, vect_analyze_slp_instance,
      vect_analyze_slp, vect_make_slp_decision,
      vect_detect_hybrid_slp_stmts, vect_detect_hybrid_slp): New.
      (vect_analyze_loop): Call vect_analyze_slp, vect_make_slp_decision
      and vect_detect_hybrid_slp.
      * tree-vect-transform.c (vect_get_constant_vectors,
      vect_get_slp_vect_defs, vect_get_slp_defs, vect_remove_stores,
      vect_schedule_slp_instance, vect_schedule_slp): New.
      * Makefile.in (tree-vect-analyze.o): Depend on recog.h.

(See attached file: slp-part2.txt)

[-- Attachment #2: slp-part2.txt --]
[-- Type: text/plain, Size: 31872 bytes --]

--- tree-vect-analyze.c.patch2	2007-08-14 14:03:22.000000000 +0300
+++ tree-vect-analyze.c	2007-08-14 14:31:36.000000000 +0300
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "toplev.h"
+#include "recog.h"
 
 
 /* Function vect_determine_vectorization_factor
@@ -369,6 +370,7 @@ vect_analyze_operations (loop_vec_info l
   int min_profitable_iters;
   int min_scalar_loop_bound;
   unsigned int th;
+  bool vect_pure_slp = true;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_analyze_operations ===");
@@ -509,6 +511,28 @@ vect_analyze_operations (loop_vec_info l
 		}
 	      return false;
 	    }	
+
+	  if (!STMT_VINFO_PURE_SLP (stmt_info))
+	    {
+	      /* STMT needs regular vectorization - pure SLP is impossible.  */
+	      vect_pure_slp = false;
+
+	      /* Groups of strided accesses which size is not a power of 2 are 
+		 not vectorizable yet. If SLP also fails, the loop cannot be
+		 vectorized.  */
+	      if (STMT_VINFO_STRIDED_ACCESS (stmt_info)
+		  && exact_log2 (DR_GROUP_SIZE (vinfo_for_stmt (
+			          DR_GROUP_FIRST_DR (stmt_info)))) == -1)
+		{
+		  if (vect_print_dump_info (REPORT_DETAILS))
+		    {
+		      fprintf (vect_dump, "not vectorized: the size of group "
+			       "of strided accesses is not a power of 2");
+		      print_generic_expr (vect_dump, stmt, TDF_SLIM);
+		    }
+		  return false;
+		}
+	    }
 	} /* stmts in bb */
     } /* bbs */
 
@@ -528,6 +552,23 @@ vect_analyze_operations (loop_vec_info l
       return false;
     }
 
+  /* If all the stmts in the loop can be SLPed, we perform pure SLP, and
+     vectorization factor of the loop is the unrolling factor required by the
+     SLP instances.  */
+  if (vect_pure_slp)
+    {
+      vectorization_factor = LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo);
+      LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;    
+      if (vect_print_dump_info (REPORT_DETAILS)) 
+	fprintf (vect_dump, "PURE SLP: VF = %d", vectorization_factor);
+    }
+  else
+    {
+      vectorization_factor = least_common_multiple (vectorization_factor,
+                                 LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo));
+      LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
+    }
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump,
@@ -2173,6 +2214,605 @@ vect_free_slp_tree (slp_tree node)
 }
 
 
+/* Get the defs for the RHS, check that they are of a legal type and that
+   they match the defs of the first stmt of the SLP group.  */
+
+static bool
+vect_get_and_check_slp_defs (loop_vec_info loop_vinfo, slp_tree slp_node,
+			     tree rhs, VEC (tree, heap) **def_stmts0,
+			     VEC (tree, heap) **def_stmts1,
+			     enum vect_def_type *first_stmt_dt0,
+			     enum vect_def_type *first_stmt_dt1,
+			     tree *first_stmt_def0_type, 
+			     tree *first_stmt_def1_type,
+			     tree *first_stmt_const_oprnd)
+{
+  tree oprnd;
+  enum operation_type op_type = TREE_OPERAND_LENGTH (rhs);
+  unsigned int i, number_of_oprnds = op_type;
+  tree def, def_stmt;
+  enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  stmt_vec_info stmt_info = 
+    vinfo_for_stmt (VEC_index (tree, SLP_TREE_SCALAR_STMTS (slp_node), 0));
+
+  /* Store.  */
+  if (!op_type)
+    number_of_oprnds = 1;
+  else
+    gcc_assert (op_type == unary_op || op_type == binary_op);
+
+  for (i = 0; i < number_of_oprnds; i++)
+    {
+      if (op_type)
+	oprnd = TREE_OPERAND (rhs, i);
+      else
+	oprnd = rhs;
+
+      if (!vect_is_simple_use (oprnd, loop_vinfo, &def_stmt, &def, &dt[i])
+	  || (!def_stmt && dt[i] != vect_constant_def))
+	{
+	  if (vect_print_dump_info (REPORT_DETAILS)) 
+	    {
+	      fprintf (vect_dump, "Build SLP failed: can't find def for ");
+	      print_generic_expr (vect_dump, oprnd, TDF_SLIM);
+	    }
+
+	  return false;
+	}
+
+      if (!*first_stmt_dt0)
+	{
+	  /* op0 of the first stmt of the group - store its info.  */
+	  *first_stmt_dt0 = dt[i];
+	  if (def)
+	    *first_stmt_def0_type = TREE_TYPE (def);
+	  else
+	    *first_stmt_const_oprnd = oprnd;
+	}
+      
+      else
+	{
+	  if (!*first_stmt_dt1 && i == 1)
+	    {
+	      /* op1 of the first stmt of the group - store its info.  */
+	      *first_stmt_dt1 = dt[i];
+	      if (def)
+		*first_stmt_def1_type = TREE_TYPE (def);
+	      else
+		/* We assume that the stmt constains only one constant 
+		   operand.  */
+		*first_stmt_const_oprnd = oprnd;
+	    }
+	  else
+	    {
+	      /* Not first stmt of the group, check that the def-stmt/s match 
+		 the def-stmt/s of the first stmt.  */
+	      if ((i == 0 
+		   && (*first_stmt_dt0 != dt[i]
+		       || (*first_stmt_def0_type && def
+			   && *first_stmt_def0_type != TREE_TYPE (def))))
+		  || (i == 1 
+		      && (*first_stmt_dt1 != dt[i]
+			  || (*first_stmt_def1_type && def
+			      && *first_stmt_def1_type != TREE_TYPE (def))))		  
+		  || (!def 
+		      && TREE_TYPE (*first_stmt_const_oprnd) 
+		      != TREE_TYPE (oprnd)))
+		{ 
+		  if (vect_print_dump_info (REPORT_DETAILS)) 
+		    fprintf (vect_dump, "Build SLP failed: different types ");
+		  
+		  return false;
+		}
+	    }
+	}
+
+      /* Check the types of the definitions.  */
+      switch (dt[i])
+	{
+	case vect_constant_def:
+	case vect_invariant_def:
+	  break;
+	  
+	case vect_loop_def:
+	  if (i == 0)
+	    VEC_safe_push (tree, heap, *def_stmts0, def_stmt);
+	  else
+	    VEC_safe_push (tree, heap, *def_stmts1, def_stmt);
+	  break;
+
+	default:
+	  /* FORNOW: Not supported.  */
+	  if (vect_print_dump_info (REPORT_DETAILS)) 
+	    {
+	      fprintf (vect_dump, "Build SLP failed: illegal type of def ");
+	      print_generic_expr (vect_dump, def, TDF_SLIM);
+	    }
+
+	  return false;
+	}
+    }
+
+  return true;
+}
+
+
+/* Recursively build an SLP tree starting from NODE.
+   Fail (and return FALSE) if def-stmts are not isomorphic, require data 
+   permutation or are of unsupported types of operation. Otherwise, return 
+   TRUE.
+   SLP_IMPOSSIBLE is TRUE if it is impossible to SLP in the loop, for example
+   in the case of multiple types for now.  */
+
+static bool
+vect_build_slp_tree (loop_vec_info loop_vinfo, slp_tree *node, 
+		     unsigned int group_size, bool *slp_impossible)
+{
+  VEC (tree, heap) *def_stmts0 = VEC_alloc (tree, heap, group_size);
+  VEC (tree, heap) *def_stmts1 =  VEC_alloc (tree, heap, group_size);
+  unsigned int i;
+  VEC (tree, heap) *stmts = SLP_TREE_SCALAR_STMTS (*node);
+  tree stmt = VEC_index (tree, stmts, 0);
+  enum vect_def_type first_stmt_dt0 = 0, first_stmt_dt1 = 0;
+  enum tree_code first_stmt_code = 0;
+  tree first_stmt_def1_type = NULL_TREE, first_stmt_def0_type = NULL_TREE;
+  tree lhs, rhs, prev_stmt = NULL_TREE;
+  bool stop_recursion = false, need_same_oprnds = false;
+  tree vectype, scalar_type, first_op1 = NULL_TREE;
+  unsigned int vectorization_factor = 0, ncopies;
+  optab optab;
+  int icode;
+  enum machine_mode optab_op2_mode;
+  enum machine_mode vec_mode;
+  tree first_stmt_const_oprnd = NULL_TREE;
+ 
+  /* For every stmt in NODE find its def stmt/s.  */
+  for (i = 0; VEC_iterate (tree, stmts, i, stmt); i++)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS)) 
+	{
+	  fprintf (vect_dump, "Build SLP for ");
+	  print_generic_expr (vect_dump, stmt, TDF_SLIM);
+	}
+
+      if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
+	{
+	  if (vect_print_dump_info (REPORT_DETAILS)) 
+	    {
+	      fprintf (vect_dump, "Build SLP failed: not MODIFY_STMT ");
+	      print_generic_expr (vect_dump, stmt, TDF_SLIM);
+	    }
+	  
+	  return false;
+	}
+
+      scalar_type = TREE_TYPE (GIMPLE_STMT_OPERAND (stmt, 0));
+      vectype = get_vectype_for_scalar_type (scalar_type);
+      gcc_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+      vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      ncopies = vectorization_factor / TYPE_VECTOR_SUBPARTS (vectype);
+      if (ncopies > 1)
+	{
+	  if (vect_print_dump_info (REPORT_DETAILS)) 
+	    fprintf (vect_dump, "SLP failed - multiple types ");
+	  
+	  *slp_impossible = true;
+	  return false;
+	}
+
+      lhs = GIMPLE_STMT_OPERAND (stmt, 0);
+      rhs = GIMPLE_STMT_OPERAND (stmt, 1);
+
+      /* Check the operation.  */
+      if (i == 0)
+	{
+	  first_stmt_code = TREE_CODE (rhs);
+
+	  /* Shift arguments should be equal in all the packed stmts for a 
+	     vector shift with scalar shift operand.  */
+	  if (TREE_CODE (rhs) == LSHIFT_EXPR || TREE_CODE (rhs) == RSHIFT_EXPR)
+	    {
+	      vec_mode = TYPE_MODE (vectype);
+	      optab = optab_for_tree_code (TREE_CODE (rhs), vectype);
+	      if (!optab)
+		{
+		  if (vect_print_dump_info (REPORT_DETAILS))
+		    fprintf (vect_dump, "Build SLP failed: no optab.");
+		  return false;
+		}
+	      icode = (int) optab->handlers[(int) vec_mode].insn_code;
+	      optab_op2_mode = insn_data[icode].operand[2].mode;
+	      if (!VECTOR_MODE_P (optab_op2_mode))
+		{
+		  need_same_oprnds = true;
+		  first_op1 = TREE_OPERAND (rhs, 1);
+		}
+	    }
+	}
+      else
+	{
+	  if (first_stmt_code != TREE_CODE (rhs))
+	    {
+	      if (vect_print_dump_info (REPORT_DETAILS)) 
+		{
+		  fprintf (vect_dump, 
+			   "Build SLP failed: different operation in stmt ");
+		  print_generic_expr (vect_dump, stmt, TDF_SLIM);
+		}
+	      
+	      return false;
+	    }
+	  
+	  if (need_same_oprnds 
+	      && !operand_equal_p (first_op1, TREE_OPERAND (rhs, 1), 0))
+	    {
+	      if (vect_print_dump_info (REPORT_DETAILS)) 
+		{
+		  fprintf (vect_dump, 
+			   "Build SLP failed: different shift arguments in ");
+		  print_generic_expr (vect_dump, stmt, TDF_SLIM);
+		}
+	      
+	      return false;
+	    }
+	}
+
+      /* Strided store or load.  */
+      if (DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)))
+	{
+	  if (REFERENCE_CLASS_P (lhs))
+	    {
+	      /* Store.  */
+	      if (!vect_get_and_check_slp_defs (loop_vinfo, *node, rhs, 
+						&def_stmts0, &def_stmts1, 
+						&first_stmt_dt0, 
+						&first_stmt_dt1, 
+						&first_stmt_def0_type, 
+						&first_stmt_def1_type,
+						&first_stmt_const_oprnd))
+		return false;
+	    }
+	    else
+	      {
+		/* Load.  */
+		if (i == 0)
+		  {
+		    /* First stmt of the SLP group should be the first load of 
+		       the interleaving loop if data permutation is not 
+		       allowed.  */
+		    if  (DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) != stmt) 
+		      {
+			/* FORNOW: data permutations are not supported.  */
+			if (vect_print_dump_info (REPORT_DETAILS)) 
+			  {
+			    fprintf (vect_dump, "Build SLP failed: strided ");
+			    fprintf (vect_dump, " loads need permutation ");
+			    print_generic_expr (vect_dump, stmt, TDF_SLIM);
+			  }
+
+			return false;
+		      }
+		  }
+		else
+		  {
+		    if (DR_GROUP_NEXT_DR (vinfo_for_stmt (prev_stmt)) != stmt)
+		      {
+			/* FORNOW: data permutations are not supported.  */
+			if (vect_print_dump_info (REPORT_DETAILS)) 
+			  {
+			    fprintf (vect_dump, "Build SLP failed: strided ");
+			    fprintf (vect_dump, " loads need permutation ");
+			    print_generic_expr (vect_dump, stmt, TDF_SLIM);
+			  }
+			return false;
+		      }
+		  }
+
+		prev_stmt = stmt;
+
+		/* We stop the tree when we reach a group of loads.  */
+		stop_recursion = true;
+		continue;
+	      }
+	} /* Strided access.  */
+      else
+	{
+	  if (REFERENCE_CLASS_P (rhs))
+	    {
+	      /* Not strided load. */
+	      if (vect_print_dump_info (REPORT_DETAILS)) 
+		{
+		  fprintf (vect_dump, "Build SLP failed: not strided load ");
+		  print_generic_expr (vect_dump, stmt, TDF_SLIM);
+		}
+
+	      /* FORNOW: Not strided loads are not supported.  */
+	      return false;
+	    }
+
+	  /* Not memory operation.  */
+	  if (!BINARY_CLASS_P (rhs) && !UNARY_CLASS_P (rhs))
+	    {
+	      if (vect_print_dump_info (REPORT_DETAILS)) 
+		{
+		  fprintf (vect_dump, "Build SLP failed: operation");
+		  fprintf (vect_dump, " unsupported ");
+		  print_generic_expr (vect_dump, stmt, TDF_SLIM);
+		}
+
+	      return false;
+	    }
+
+	  /* Find the def-stmts.  */ 
+	  if (!vect_get_and_check_slp_defs (loop_vinfo, *node, rhs, &def_stmts0, 
+					    &def_stmts1, &first_stmt_dt0, 
+					    &first_stmt_dt1, 
+					    &first_stmt_def0_type, 
+					    &first_stmt_def1_type,
+					    &first_stmt_const_oprnd))
+	    return false;
+	}
+    }
+
+  /* Strided loads were reached - stop the recursion.  */
+  if (stop_recursion)
+    return true;
+
+  /* Create SLP_TREE nodes for the definition node/s.  */ 
+  if (first_stmt_dt0 == vect_loop_def)
+    {
+      slp_tree left_node = XNEW (struct _slp_tree);
+      SLP_TREE_SCALAR_STMTS (left_node) = def_stmts0;
+      SLP_TREE_VEC_STMTS (left_node) = NULL;
+      SLP_TREE_LEFT (left_node) = NULL;
+      SLP_TREE_RIGHT (left_node) = NULL;
+      if (!vect_build_slp_tree (loop_vinfo, &left_node, group_size, 
+				slp_impossible))
+	return false;
+      
+      SLP_TREE_LEFT (*node) = left_node;
+    }
+
+  if (first_stmt_dt1 == vect_loop_def)
+    {
+      slp_tree right_node = XNEW (struct _slp_tree);
+      SLP_TREE_SCALAR_STMTS (right_node) = def_stmts1;
+      SLP_TREE_VEC_STMTS (right_node) = NULL;
+      SLP_TREE_LEFT (right_node) = NULL;
+      SLP_TREE_RIGHT (right_node) = NULL;
+      if (!vect_build_slp_tree (loop_vinfo, &right_node, group_size,
+				slp_impossible))
+	return false;
+      
+      SLP_TREE_RIGHT (*node) = right_node;
+    }
+
+  return true;
+}
+
+
+static void
+vect_print_slp_tree (slp_tree node)
+{
+  int i;
+  tree stmt;
+
+  if (!node)
+    return;
+
+  fprintf (vect_dump, "node ");
+  for (i = 0; VEC_iterate (tree, SLP_TREE_SCALAR_STMTS (node), i, stmt); i++)
+    {
+      fprintf (vect_dump, "\n\tstmt %d ", i);
+      print_generic_expr (vect_dump, stmt, TDF_SLIM);  
+    }
+  fprintf (vect_dump, "\n");
+
+  vect_print_slp_tree (SLP_TREE_LEFT (node));
+  vect_print_slp_tree (SLP_TREE_RIGHT (node));
+}
+
+
+/* Mark the tree rooted at NODE with MARK (PURE_SLP or HYBRID). 
+   When marking stmts as hybrid, only the stmt in the group whose index is J
+   is marked. For a pure SLP mark, J is -1.  */
+
+static void
+vect_mark_slp_stmts (slp_tree node, enum slp_vect_type mark, int j)
+{
+  int i;
+  tree stmt;
+
+  if (!node)
+    return;
+
+  for (i = 0; VEC_iterate (tree, SLP_TREE_SCALAR_STMTS (node), i, stmt); i++)
+    if (j < 0 || i == j)
+      STMT_VINFO_SLP_TYPE (vinfo_for_stmt (stmt)) = mark;
+
+  vect_mark_slp_stmts (SLP_TREE_LEFT (node), mark, j);
+  vect_mark_slp_stmts (SLP_TREE_RIGHT (node), mark, j);
+}
+
+
+/* Analyze an SLP instance starting from a group of strided stores. Call
+   vect_build_slp_tree to build a tree of packed stmts if possible. 
+   Return FALSE if it's impossible to SLP any stmt in the loop.  */
+
+static bool
+vect_analyze_slp_instance (loop_vec_info loop_vinfo, tree stmt)
+{
+  slp_instance new_instance;
+  slp_tree node = XNEW (struct _slp_tree);
+  unsigned int group_size = DR_GROUP_SIZE (vinfo_for_stmt (stmt));
+  unsigned int unrolling_factor = 1, nunits;
+  tree vectype, scalar_type, next;
+  unsigned int vectorization_factor = 0, ncopies;
+  bool slp_impossible = false; 
+
+  /* FORNOW: multiple types are not supported.  */
+  scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))));
+  vectype = get_vectype_for_scalar_type (scalar_type);
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  ncopies = vectorization_factor / nunits;
+  if (ncopies > 1)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS)) 
+	  fprintf (vect_dump, "SLP failed - multiple types ");
+
+      return false;
+    }
+
+  /* Create a node (a root of the SLP tree) for the packed strided stores.  */ 
+  SLP_TREE_SCALAR_STMTS (node) = VEC_alloc (tree, heap, group_size);
+  next = stmt;
+  /* Collect the stores and store them in SLP_TREE_SCALAR_STMTS.  */
+  while (next)
+    {
+      VEC_safe_push (tree, heap, SLP_TREE_SCALAR_STMTS (node), next);
+      next = DR_GROUP_NEXT_DR (vinfo_for_stmt (next));
+    }
+
+  SLP_TREE_VEC_STMTS (node) = NULL;
+  SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
+  SLP_TREE_LEFT (node) = NULL;
+  SLP_TREE_RIGHT (node) = NULL;
+
+  /* Calculate the unrolling factor.  */
+  unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
+	
+  /* Build the tree for the SLP instance.  */
+  if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible))
+    {
+      /* Create a new SLP instance.  */  
+      new_instance = XNEW (struct _slp_instance);
+      SLP_INSTANCE_TREE (new_instance) = node;
+      SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
+      SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
+      VEC_safe_push (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo), 
+		     new_instance);
+      if (vect_print_dump_info (REPORT_DETAILS))
+	vect_print_slp_tree (node);
+
+      return true;
+    }
+
+  /* Failed to SLP.  */
+  /* Free the allocated memory.  */
+  vect_free_slp_tree (node);
+
+  if (slp_impossible)
+    return false;
+
+  /* SLP failed for this instance, but it is still possible to SLP other stmts 
+     in the loop.  */
+  return true;
+}
+
+
+/* Check if there are stmts in the loop that can be vectorized using SLP. Build
+   SLP trees of packed scalar stmts if SLP is possible.  */
+
+static bool
+vect_analyze_slp (loop_vec_info loop_vinfo)
+{
+  unsigned int i;
+  VEC (tree, heap) *strided_stores = LOOP_VINFO_STRIDED_STORES (loop_vinfo);
+  tree store;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_analyze_slp ===");
+
+  for (i = 0; VEC_iterate (tree, strided_stores, i, store); i++)
+    if (!vect_analyze_slp_instance (loop_vinfo, store))
+      {
+	/* SLP failed. No instance can be SLPed in the loop.  */
+	if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))	
+	  fprintf (vect_dump, "SLP failed.");
+
+	return false;
+      }
+
+  return true;
+}
+
+
+/* For each possible SLP instance decide whether to SLP it and calculate overall
+   unrolling factor needed to SLP the loop.  */
+
+static void
+vect_make_slp_decision (loop_vec_info loop_vinfo)
+{
+  unsigned int i, unrolling_factor = 1;
+  VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  slp_instance instance;
+  int decided_to_slp = 0;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_make_slp_decision ===");
+
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    {
+      /* FORNOW: SLP if you can.  */
+      if (unrolling_factor < SLP_INSTANCE_UNROLLING_FACTOR (instance))
+	unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (instance);
+
+      vect_mark_slp_stmts (SLP_INSTANCE_TREE (instance), pure_slp, -1);
+      decided_to_slp++;
+    }
+
+  LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo) = unrolling_factor;
+
+  if (decided_to_slp && vect_print_dump_info (REPORT_DETAILS)) 
+    fprintf (vect_dump, "Decided to SLP %d instances. Unrolling factor %d", 
+	     decided_to_slp, unrolling_factor);
+}
+
+
+/* Mark as HYBRID the pure-SLP stmts under NODE that have uses outside SLP.  */
+
+static void
+vect_detect_hybrid_slp_stmts (slp_tree node)
+{
+  int i;
+  tree stmt;
+  imm_use_iterator imm_iter;
+  tree use_stmt;
+
+  if (!node)
+    return;
+
+  for (i = 0; VEC_iterate (tree, SLP_TREE_SCALAR_STMTS (node), i, stmt); i++)
+    if (STMT_VINFO_PURE_SLP (vinfo_for_stmt (stmt))
+	&& TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 0)) == SSA_NAME)
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, GIMPLE_STMT_OPERAND (stmt, 0))
+	if (vinfo_for_stmt (use_stmt)
+	    && !STMT_VINFO_SLP_TYPE (vinfo_for_stmt (use_stmt)))
+	  vect_mark_slp_stmts (node, hybrid, i);
+
+  vect_detect_hybrid_slp_stmts (SLP_TREE_LEFT (node));
+  vect_detect_hybrid_slp_stmts (SLP_TREE_RIGHT (node));
+}
+
+
+/* Find stmts that must be both vectorized and SLPed.  */
+
+static void
+vect_detect_hybrid_slp (loop_vec_info loop_vinfo)
+{
+  unsigned int i;
+  VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  slp_instance instance;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_detect_hybrid_slp ===");
+
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    vect_detect_hybrid_slp_stmts (SLP_INSTANCE_TREE (instance));
+}
+
+
 /* Function vect_analyze_data_refs.
 
   Find all the data references in the loop.
@@ -2889,6 +3529,17 @@ vect_analyze_loop (struct loop *loop)
       return NULL;
     }
 
+  /* Check the SLP opportunities in the loop, analyze and build SLP trees.  */
+  ok = vect_analyze_slp (loop_vinfo);
+  if (ok)
+    {
+      /* Decide which possible SLP instances to SLP.  */
+      vect_make_slp_decision (loop_vinfo);
+
+      /* Find stmts that need to be both vectorized and SLPed.  */
+      vect_detect_hybrid_slp (loop_vinfo);
+    }
+
   /* This pass will decide on using loop versioning and/or loop peeling in
      order to enhance the alignment of data references in the loop.  */
 
--- tree-vect-transform.c.patch2	2007-08-14 14:03:27.000000000 +0300
+++ tree-vect-transform.c	2007-08-14 14:29:56.000000000 +0300
@@ -1101,6 +1101,181 @@ vect_init_vector (tree stmt, tree vector
 }
 
 
+/* For constant and loop invariant defs of SLP_NODE this function returns 
+   (vector) defs (VEC_OPRNDS) that will be used in the vectorized stmts.  
+   OP_NUM determines if we gather defs for operand 0 or operand 1 of the scalar
+   stmts.  */
+
+static bool
+vect_get_constant_vectors (slp_tree slp_node, VEC(tree,heap) **vec_oprnds,
+			   unsigned int op_num)
+{
+  VEC (tree, heap) *stmts = SLP_TREE_SCALAR_STMTS (slp_node);
+  tree stmt = VEC_index (tree, stmts, 0);
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
+  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  tree vec_cst;
+  tree t = NULL_TREE;
+  int j, number_of_places_left_in_vector;
+  tree vector_type;
+  tree op, vop, operation;
+  int group_size = VEC_length (tree, stmts);
+  unsigned int vec_num, i;
+  int number_of_copies = 1;
+  bool is_store = false;
+  unsigned int number_of_vectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+  VEC (tree, heap) *voprnds = VEC_alloc (tree, heap, number_of_vectors);
+
+  if (STMT_VINFO_DATA_REF (stmt_vinfo))
+    is_store = true;
+
+  /* NUMBER_OF_COPIES is the number of times we need to use the same values in
+     created vectors. It is greater than 1 if unrolling is performed. 
+
+     For example, we have two scalar operands, s1 and s2 (e.g., group of
+     strided accesses of size two), while NUNITS is four (i.e., four scalars
+     of this type can be packed in a vector). The output vector will contain
+     two copies of each scalar operand: {s1, s2, s1, s2}. (NUMBER_OF_COPIES
+     will be 2).
+
+     If GROUP_SIZE > NUNITS, the scalars will be split into several vectors 
+     containing the operands.
+
+     For example, NUNITS is four as before, and the group size is 8
+     (s1, s2, ..., s8). We will create two vectors {s1, s2, s3, s4} and
+     {s5, s6, s7, s8}.  */
+    
+  number_of_copies = least_common_multiple (nunits, group_size) / group_size;
+
+  number_of_places_left_in_vector = nunits;
+  for (j = 0; j < number_of_copies; j++)
+    {
+      for (i = group_size - 1; VEC_iterate (tree, stmts, i, stmt); i--)
+        {
+          operation = GIMPLE_STMT_OPERAND (stmt, 1);
+	  if (is_store)
+	    op = operation;
+	  else
+	    op = TREE_OPERAND (operation, op_num);
+
+          /* Create 'vect_ = {op0,op1,...,opn}'.  */
+          t = tree_cons (NULL_TREE, op, t);
+
+          number_of_places_left_in_vector--;
+
+          if (number_of_places_left_in_vector == 0)
+            {
+              number_of_places_left_in_vector = nunits;
+
+	      vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+              vec_cst = build_constructor_from_list (vector_type, t);
+              VEC_quick_push (tree, voprnds,
+                              vect_init_vector (stmt, vec_cst, vector_type));
+              t = NULL_TREE;
+            }
+        }
+    }
+
+  /* Since the vectors are created in the reverse order, we should invert 
+     them.  */
+  vec_num = VEC_length (tree, voprnds);
+  for (j = vec_num - 1; j >= 0; j--)
+    {
+      vop = VEC_index (tree, voprnds, j);
+      VEC_quick_push (tree, *vec_oprnds, vop);
+    }
+
+  VEC_free (tree, heap, voprnds);
+
+  /* In case that VF is greater than the unrolling factor needed for the SLP
+     group of stmts, NUMBER_OF_VECTORS to be created is greater than 
+     NUMBER_OF_SCALARS/NUNITS or NUNITS/NUMBER_OF_SCALARS, and hence we have 
+     to replicate the vectors.  */
+  while (number_of_vectors > VEC_length (tree, *vec_oprnds))
+    {
+      for (i = 0; VEC_iterate (tree, *vec_oprnds, i, vop) && i < vec_num; i++)
+        VEC_quick_push (tree, *vec_oprnds, vop);
+    }
+
+  return true;
+}
+
+
+/* Get vectorized definitions from SLP_NODE that contains corresponding
+   vectorized def-stmts.  */
+ 
+static void
+vect_get_slp_vect_defs (slp_tree slp_node, VEC (tree,heap) **vec_oprnds)
+{
+  tree vec_oprnd;
+  tree vec_def_stmt;
+  unsigned int i;
+
+  gcc_assert (SLP_TREE_VEC_STMTS (slp_node));
+
+  for (i = 0; 
+       VEC_iterate (tree, SLP_TREE_VEC_STMTS (slp_node), i, vec_def_stmt); 
+       i++)
+    {
+      gcc_assert (vec_def_stmt);
+      vec_oprnd = GIMPLE_STMT_OPERAND (vec_def_stmt, 0);
+      VEC_quick_push (tree, *vec_oprnds, vec_oprnd);
+    }
+}
+
+
+/* Get vectorized definitions for SLP_NODE. 
+   If the scalar definitions are loop invariants or constants, collect them and 
+   call vect_get_constant_vectors() to create vector stmts.
+   Otherwise, the def-stmts must be already vectorized and the vectorized stmts
+   must be stored in the LEFT/RIGHT node of SLP_NODE, and we call
+   vect_get_slp_vect_defs() to retrieve them.  */
+ 
+static bool
+vect_get_slp_defs (slp_tree slp_node, VEC (tree,heap) **vec_oprnds0,
+                   VEC (tree,heap) **vec_oprnds1)
+{
+  tree operation, first_stmt;
+
+  /* Allocate memory for vectorized defs.  */
+  *vec_oprnds0 = VEC_alloc (tree, heap, 
+			    SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node));
+
+  /* SLP_NODE corresponds either to a group of stores or to a group of 
+     unary/binary operations. We don't call this function for loads.  */
+  if (SLP_TREE_LEFT (slp_node)) 
+    /* The defs are already vectorized.  */ 
+    vect_get_slp_vect_defs (SLP_TREE_LEFT (slp_node), vec_oprnds0);
+  else
+    /* Build vectors from scalar defs.  */
+    if (!vect_get_constant_vectors (slp_node, vec_oprnds0, 0))
+      return false;
+
+  first_stmt = VEC_index (tree, SLP_TREE_SCALAR_STMTS (slp_node), 0);
+  if (STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt)))
+    /* Since we don't call this function with loads, this is a group of 
+       stores.  */
+    return true;
+
+  operation = GIMPLE_STMT_OPERAND (first_stmt, 1);
+  if (TREE_OPERAND_LENGTH (operation) == unary_op)
+    return true;
+
+  *vec_oprnds1 = VEC_alloc (tree, heap, 
+			    SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node));
+
+  if (SLP_TREE_RIGHT (slp_node))
+    /* The defs are already vectorized.  */ 
+    vect_get_slp_vect_defs (SLP_TREE_RIGHT (slp_node), vec_oprnds1);
+  else
+    /* Build vectors from scalar defs.  */
+    return vect_get_constant_vectors (slp_node, vec_oprnds1, 1);
+
+  return true;
+}
+
+
 /* Function get_initial_def_for_induction
 
    Input:
@@ -5797,6 +5972,115 @@ vect_create_cond_for_align_checks (loop_
 }
 
 
+/* Remove a group of stores (for SLP or interleaving), free their 
+   stmt_vec_info.  */
+
+static void
+vect_remove_stores (tree first_stmt)
+{
+  stmt_ann_t ann;
+  tree next = first_stmt;
+  tree tmp;
+  stmt_vec_info next_stmt_info;
+  block_stmt_iterator next_si;
+
+  while (next)
+    {
+      /* Free the attached stmt_vec_info and remove the stmt.  */
+      next_si = bsi_for_stmt (next);
+      bsi_remove (&next_si, true);
+      next_stmt_info = vinfo_for_stmt (next);
+      ann = stmt_ann (next);
+      tmp = DR_GROUP_NEXT_DR (next_stmt_info);
+      free (next_stmt_info);
+      set_stmt_info (ann, NULL);
+      next = tmp;
+    }
+}
+
+
+/* Vectorize SLP instance tree in postorder.  */
+
+static bool
+vect_schedule_slp_instance (slp_tree node, unsigned int vec_stmts_size)
+{
+  tree stmt;
+  bool strided_store, is_store;
+  block_stmt_iterator si;
+  stmt_vec_info stmt_info;
+
+  if (!node)
+    return false;
+
+  vect_schedule_slp_instance (SLP_TREE_LEFT (node), vec_stmts_size);
+  vect_schedule_slp_instance (SLP_TREE_RIGHT (node), vec_stmts_size);
+  
+  stmt = VEC_index(tree, SLP_TREE_SCALAR_STMTS (node), 0);
+  stmt_info = vinfo_for_stmt (stmt);
+  SLP_TREE_VEC_STMTS (node) = VEC_alloc (tree, heap, vec_stmts_size);
+  SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vec_stmts_size;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "------>vectorizing SLP node starting from: ");
+      print_generic_expr (vect_dump, stmt, TDF_SLIM);
+    }	
+
+  si = bsi_for_stmt (stmt);
+  is_store = vect_transform_stmt (stmt, &si, &strided_store, node);
+  if (is_store)
+    {
+      if (DR_GROUP_FIRST_DR (stmt_info))
+	/* If IS_STORE is TRUE, the vectorization of the
+	   interleaving chain was completed - free all the stores in
+	   the chain.  */
+	vect_remove_stores (DR_GROUP_FIRST_DR (stmt_info));
+      else
+	/* FORNOW: SLP originates only from strided stores.  */
+	gcc_unreachable ();
+
+      return true;
+    }
+
+  /* FORNOW: SLP originates only from strided stores.  */
+  return false;
+}
+
+
+static bool
+vect_schedule_slp (loop_vec_info loop_vinfo, unsigned int nunits)
+{
+  VEC (slp_instance, heap) *slp_instances = 
+    LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  slp_instance instance;
+  unsigned int vec_stmts_size;
+  unsigned int group_size, i;
+  unsigned int vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  bool is_store = false;
+
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    {
+      group_size = SLP_INSTANCE_GROUP_SIZE (instance);
+      /* For each SLP instance calculate number of vector stmts to be created 
+	 for the scalar stmts in each node of the SLP tree. Number of vector 
+	 elements in one vector iteration is the number of scalar elements in 
+	 one scalar iteration (GROUP_SIZE) multiplied by VF divided by vector 
+	 size.  */		      
+      vec_stmts_size = vectorization_factor * group_size / nunits;
+			  
+      /* Schedule the tree of INSTANCE.  */
+      is_store = vect_schedule_slp_instance (SLP_INSTANCE_TREE (instance), 
+					     vec_stmts_size);
+		     
+      if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS)
+	  || vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+	fprintf (vect_dump, "vectorizing stmts using SLP.");
+    }
+
+  return is_store;
+}
+
+
 /* Function vect_transform_loop.
 
    The analysis phase has determined that the loop is vectorizable.
Index: Makefile.in
===================================================================
--- Makefile.in (revision 127314)
+++ Makefile.in (working copy)
@@ -2223,7 +2223,7 @@ tree-data-ref.o: tree-data-ref.c $(CONFI
    $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
    $(TREE_DATA_REF_H) $(SCEV_H) tree-pass.h tree-chrec.h langhooks.h
 tree-vect-analyze.o: tree-vect-analyze.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
-   $(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) \
+   $(TM_H) $(GGC_H) $(OPTABS_H) $(TREE_H) $(RECOG_H) $(BASIC_BLOCK_H) \
    $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
    tree-vectorizer.h $(TREE_DATA_REF_H) $(SCEV_H) $(EXPR_H) tree-chrec.h
 tree-vect-patterns.o: tree-vect-patterns.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \

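As an editorial illustration of the NUMBER_OF_COPIES comment in
vect_get_constant_vectors above, the following standalone C sketch reproduces
the arithmetic lcm (NUNITS, GROUP_SIZE) / GROUP_SIZE and the resulting lane
layout. It is not part of the patch: gcd_u, slp_unrolling_factor and
slp_pack_lanes are hypothetical names, scalar operands are modelled as plain
ints, and lanes are filled front to back rather than in reverse order as the
patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor (Euclid). Helper for this sketch only.  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Same arithmetic as the patch: lcm (nunits, group_size) / group_size.
   This is how many times the group must be repeated to fill whole vectors.  */
static unsigned
slp_unrolling_factor (unsigned nunits, unsigned group_size)
{
  assert (nunits > 0 && group_size > 0);
  unsigned lcm = nunits / gcd_u (nunits, group_size) * group_size;
  return lcm / group_size;
}

/* Write the lane contents into OUT: the GROUP_SIZE operands in OPS are
   repeated slp_unrolling_factor times. OUT must have room for
   lcm (nunits, group_size) ints. Returns the number of full vectors.  */
static unsigned
slp_pack_lanes (const int *ops, unsigned group_size, unsigned nunits,
		int *out)
{
  unsigned copies = slp_unrolling_factor (nunits, group_size);
  unsigned n = 0;

  for (unsigned c = 0; c < copies; c++)
    for (unsigned i = 0; i < group_size; i++)
      out[n++] = ops[i];

  return n / nunits;
}
```

With a group of two operands and NUNITS of four, the factor is 2 and the
single vector holds {s1, s2, s1, s2}. With a group of eight, the factor is 1
and two vectors result, matching the two cases described in the comment.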

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] Loop-aware SLP 3/5
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
  2007-08-14 13:05 ` [patch] Loop-aware SLP 1/5 Ira Rosen
  2007-08-14 13:07 ` [patch] Loop-aware SLP 2/5 Ira Rosen
@ 2007-08-14 13:12 ` Ira Rosen
  2007-08-14 13:16 ` [patch] Loop-aware SLP 4/5 Ira Rosen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 13:12 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1129 bytes --]

This is the third part of the loop-aware SLP patch.

This part completes the SLP support inside loops by collecting SLP seeds
(groups of strided stores) and scheduling SLP instances. The SLP computation
tree is scheduled in postorder. To vectorize a node, we get its vector
definitions from the already SLPed left and right nodes.
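
The postorder rule above can be sketched in standalone C as follows. This is
an illustration only, not code from the patch: struct toy_node and
toy_schedule_postorder are hypothetical stand-ins for slp_tree and
vect_schedule_slp_instance, and "vectorizing" a node is reduced to recording
its id.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SLP tree node: LEFT and RIGHT hold the nodes that
   define operand 0 and operand 1, or NULL if there is no such node.  */
struct toy_node
{
  int id;
  struct toy_node *left;
  struct toy_node *right;
};

/* Visit both operand subtrees before NODE itself, so that the defs a node
   needs already exist when the node is handled. Each visited id is appended
   to ORDER, which already holds COUNT entries; the new count is returned.  */
static size_t
toy_schedule_postorder (const struct toy_node *node, int *order, size_t count)
{
  if (node == NULL)
    return count;

  count = toy_schedule_postorder (node->left, order, count);
  count = toy_schedule_postorder (node->right, order, count);

  order[count] = node->id;
  return count + 1;
}
```

For a store whose value is an addition of two loads, the loads are visited
first, then the addition, and the store (the root) last.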

Thanks,
Ira

ChangeLog:

      * tree-vect-analyze.c (vect_analyze_group_access): Collect groups of
      strided stores for further use in SLP analysis.
      * tree-vect-transform.c (vectorizable_reduction): Don't handle SLP
      for now.
      (vectorizable_call): Likewise.
      (vectorizable_conversion): Handle SLP (call vect_get_slp_defs to
      get SLPed defs).
      (vectorizable_assignment): Likewise.
      (vectorizable_induction): Don't handle SLP for now.
      (vectorizable_operation): Handle SLP (call vect_get_slp_defs to
      get SLPed defs).
      (vectorizable_store, vectorizable_load): Likewise.
      (vectorizable_condition): Don't handle SLP for now.
      (vect_transform_loop): Skip stmts without stmt_vinfo.
      Schedule SLP instances.

(See attached file: slp-part3.txt)

[-- Attachment #2: slp-part3.txt --]
[-- Type: text/plain, Size: 26586 bytes --]

--- tree-vect-analyze.c.patch3	2007-08-14 14:39:55.000000000 +0300
+++ tree-vect-analyze.c	2007-08-14 15:19:08.000000000 +0300
@@ -1955,7 +1955,8 @@ vect_analyze_data_refs_alignment (loop_v
 
 /* Analyze groups of strided accesses: check that DR belongs to a group of
    strided accesses of legal size, step, etc. Detect gaps, single element
-   interleaving, and other special cases. Set strided access info.  */
+   interleaving, and other special cases. Set strided access info.
+   Collect groups of strided stores for further use in SLP analysis.  */
 
 static bool
 vect_analyze_group_access (struct data_reference *dr)
@@ -1966,6 +1967,9 @@ vect_analyze_group_access (struct data_r
   tree stmt = DR_STMT (dr);
   HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
   HOST_WIDE_INT stride;
+  bool slp_impossible = false;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
   /* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
      interleaving group (including gaps).  */
@@ -2062,12 +2066,18 @@ vect_analyze_group_access (struct data_r
              size. Otherwise, we have gaps.  */
           diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
                   - TREE_INT_CST_LOW (prev_init)) / type_size;
-          if (!DR_IS_READ (data_ref) && diff != 1)
-            {
-              if (vect_print_dump_info (REPORT_DETAILS))
-                fprintf (vect_dump, "interleaved store with gaps");
-              return false;
-            }
+	  if (diff != 1)
+	    {
+	      /* SLP of accesses with gaps is not supported.  */
+	      slp_impossible = true;
+	      if (!DR_IS_READ (data_ref))
+		{
+		  if (vect_print_dump_info (REPORT_DETAILS))
+		    fprintf (vect_dump, "interleaved store with gaps");
+		  return false;
+		}
+	    }
+
           /* Store the gap from the previous member of the group. If there is no
              gap in the access, DR_GROUP_GAP is always 1.  */
           DR_GROUP_GAP (vinfo_for_stmt (next)) = diff;
@@ -2116,15 +2126,28 @@ vect_analyze_group_access (struct data_r
           return false;
         }
 
-      /* FORNOW: we handle only interleaving that is a power of 2.  */
+      /* FORNOW: we handle only interleaving that is a power of 2.  
+         We don't fail here if it may be still possible to vectorize the
+         group using SLP. If not, the size of the group will be checked in
+         vect_analyze_operations, and the vectorization will fail.  */
       if (exact_log2 (stride) == -1)
-        {
-          if (vect_print_dump_info (REPORT_DETAILS))
-            fprintf (vect_dump, "interleaving is not a power of 2");
-          return false;
-        }
+	{
+	  if (vect_print_dump_info (REPORT_DETAILS))
+	    fprintf (vect_dump, "interleaving is not a power of 2");
+
+	  if (slp_impossible)
+	    return false;
+	}
       DR_GROUP_SIZE (vinfo_for_stmt (stmt)) = stride;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "Detected interleaving of size %d", (int)stride);
+
+      /* SLP: create an SLP data structure for every interleaving group of 
+	 stores for further analysis in vect_analyze_slp.  */
+      if (!DR_IS_READ (dr) && !slp_impossible)
+	VEC_safe_push (tree, heap, LOOP_VINFO_STRIDED_STORES (loop_vinfo), stmt);
     }
+
   return true;
 }
 
--- tree-vect-transform.c.patch3	2007-08-14 14:40:03.000000000 +0300
+++ tree-vect-transform.c	2007-08-14 15:19:08.000000000 +0300
@@ -2246,6 +2246,10 @@ vectorizable_reduction (tree stmt, block
 
   gcc_assert (ncopies >= 1);
 
+  /* FORNOW: SLP not supported.  */
+  if (STMT_VINFO_SLP_TYPE (stmt_info))
+    return false;
+
   /* 1. Is vectorizable reduction?  */
 
   /* Not supportable if the reduction variable is used in the loop.  */
@@ -2550,6 +2554,10 @@ vectorizable_call (tree stmt, block_stmt
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_loop_def)
     return false;
 
+  /* FORNOW: SLP not supported.  */
+  if (STMT_VINFO_SLP_TYPE (stmt_info))
+    return false;
+
   /* FORNOW: not yet supported.  */
   if (STMT_VINFO_LIVE_P (stmt_info))
     {
@@ -2853,7 +2861,7 @@ vectorizable_conversion (tree stmt, bloc
   tree new_temp;
   tree def, def_stmt;
   enum vect_def_type dt0;
-  tree new_stmt;
+  tree new_stmt = NULL_TREE;
   stmt_vec_info prev_stmt_info;
   int nunits_in;
   int nunits_out;
@@ -2863,6 +2871,9 @@ vectorizable_conversion (tree stmt, bloc
   tree rhs_type, lhs_type;
   tree builtin_decl;
   enum { NARROW, NONE, WIDEN } modifier;
+  int i;
+  VEC(tree,heap) *vec_oprnds0 = NULL;
+  tree vop0;
 
   /* Is STMT a vectorizable conversion?   */
 
@@ -2925,6 +2936,10 @@ vectorizable_conversion (tree stmt, bloc
   else
     ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
 
+  /* FORNOW */
+  if (slp_node) 
+    ncopies = 1;
+
   /* Sanity check: make sure that at least one copy of the vectorized stmt
      needs to be generated.  */
   gcc_assert (ncopies >= 1);
@@ -2954,7 +2969,12 @@ vectorizable_conversion (tree stmt, bloc
     }
 
   if (modifier != NONE)
-    STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
+    {
+      STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
+      /* FORNOW: SLP not supported.  */
+      if (STMT_VINFO_SLP_TYPE (stmt_info))
+	return false;      
+    }
 
   if (!vec_stmt)		/* transformation not required.  */
     {
@@ -2969,6 +2989,9 @@ vectorizable_conversion (tree stmt, bloc
   /* Handle def.  */
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
+  if (modifier == NONE && !slp_node)
+    vec_oprnds0 = VEC_alloc (tree, heap, 1);
+
   prev_stmt_info = NULL;
   switch (modifier)
     {
@@ -2979,24 +3002,46 @@ vectorizable_conversion (tree stmt, bloc
 	  ssa_op_iter iter;
 
 	  if (j == 0)
-	    vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+	    if (slp_node)
+	      {
+		/* Get vectorized arguments for SLP_NODE.  */
+		if (!vect_get_slp_defs (slp_node, &vec_oprnds0, NULL))
+		  return false;
+	      }
+	    else
+	      {	    
+		vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+		VEC_quick_push (tree, vec_oprnds0, vec_oprnd0);
+	      }
 	  else
-	    vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+	    {
+	      /* FORNOW: unreachable with SLP.  */
+	      gcc_assert (!slp_node);
+	      
+	      vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+	      VEC_replace (tree, vec_oprnds0, 0, vec_oprnd0);
+	    }
 
 	  builtin_decl =
 	    targetm.vectorize.builtin_conversion (code, vectype_in);
-	  new_stmt = build_call_expr (builtin_decl, 1, vec_oprnd0);
+	  for (i = 0; VEC_iterate (tree, vec_oprnds0, i, vop0); i++)
+	    { 
+	      new_stmt = build_call_expr (builtin_decl, 1, vop0);
 
-	  /* Arguments are ready. create the new vector stmt.  */
-	  new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
-	  new_temp = make_ssa_name (vec_dest, new_stmt);
-	  GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
-	  vect_finish_stmt_generation (stmt, new_stmt, bsi);
-	  FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, SSA_OP_ALL_VIRTUALS)
-	    {
-	      if (TREE_CODE (sym) == SSA_NAME)
-		sym = SSA_NAME_VAR (sym);
-	      mark_sym_for_renaming (sym);
+	      /* Arguments are ready. create the new vector stmt.  */
+	      new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
+	      new_temp = make_ssa_name (vec_dest, new_stmt);
+	      GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
+	      vect_finish_stmt_generation (stmt, new_stmt, bsi);
+	      FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, 
+					 SSA_OP_ALL_VIRTUALS)
+		{
+		  if (TREE_CODE (sym) == SSA_NAME)
+		    sym = SSA_NAME_VAR (sym);
+		  mark_sym_for_renaming (sym);
+		}
+	      if (slp_node)
+		VEC_quick_push (tree, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
 	    }
 
 	  if (j == 0)
@@ -3106,6 +3151,9 @@ vectorizable_assignment (tree stmt, bloc
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+  int i;
+  VEC(tree,heap) *vec_oprnds = NULL;
+  tree vop;
 
   gcc_assert (ncopies >= 1);
   if (ncopies > 1)
@@ -3158,15 +3206,34 @@ vectorizable_assignment (tree stmt, bloc
   vec_dest = vect_create_destination_var (scalar_dest, vectype);
 
   /* Handle use.  */
-  op = GIMPLE_STMT_OPERAND (stmt, 1);
-  vec_oprnd = vect_get_vec_def_for_operand (op, stmt, NULL);
+  if (slp_node)
+    {
+      /* Get vectorized arguments for SLP_NODE.  */
+      if (!vect_get_slp_defs (slp_node, &vec_oprnds, NULL))
+	return false;
+    }
+  else
+    {
+      op = GIMPLE_STMT_OPERAND (stmt, 1);
+      vec_oprnd = vect_get_vec_def_for_operand (op, stmt, NULL);
+      vec_oprnds = VEC_alloc (tree, heap, 1);
+      VEC_quick_push (tree, vec_oprnds, vec_oprnd);
+    }
 
   /* Arguments are ready. create the new vector stmt.  */
-  *vec_stmt = build_gimple_modify_stmt (vec_dest, vec_oprnd);
-  new_temp = make_ssa_name (vec_dest, *vec_stmt);
-  GIMPLE_STMT_OPERAND (*vec_stmt, 0) = new_temp;
-  vect_finish_stmt_generation (stmt, *vec_stmt, bsi);
+  for (i = 0; VEC_iterate (tree, vec_oprnds, i, vop); i++)
+    {
+      *vec_stmt = build_gimple_modify_stmt (vec_dest, vop);
+      new_temp = make_ssa_name (vec_dest, *vec_stmt);
+      GIMPLE_STMT_OPERAND (*vec_stmt, 0) = new_temp;
+      vect_finish_stmt_generation (stmt, *vec_stmt, bsi);
+      STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt;
+
+      if (slp_node)
+	VEC_quick_push (tree, SLP_TREE_VEC_STMTS (slp_node), *vec_stmt);
+   }
   
+  VEC_free (tree, heap, vec_oprnds);       
   return true;
 }
 
@@ -3194,6 +3261,10 @@ vectorizable_induction (tree phi, block_
   if (!STMT_VINFO_RELEVANT_P (stmt_info))
     return false;
 
+  /* FORNOW: SLP not supported.  */
+  if (STMT_VINFO_SLP_TYPE (stmt_info))
+    return false;
+
   gcc_assert (STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def);
 
   if (STMT_VINFO_LIVE_P (stmt_info))
@@ -3255,14 +3326,19 @@ vectorizable_operation (tree stmt, block
   enum machine_mode optab_op2_mode;
   tree def, def_stmt;
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
-  tree new_stmt;
+  tree new_stmt = NULL_TREE;
   stmt_vec_info prev_stmt_info;
   int nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
   int nunits_out;
   tree vectype_out;
   int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
-  int j;
+  int j, i;
+  VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL;
+  tree vop0, vop1;
 
+  /* FORNOW */
+  if (slp_node)
+    ncopies = 1;
   gcc_assert (ncopies >= 1);
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info))
@@ -3397,6 +3473,13 @@ vectorizable_operation (tree stmt, block
   /* Handle def.  */
   vec_dest = vect_create_destination_var (scalar_dest, vectype);
 
+  if (!slp_node)
+    {
+      vec_oprnds0 = VEC_alloc (tree, heap, 1);
+      if (op_type == binary_op)
+	vec_oprnds1 = VEC_alloc (tree, heap, 1);
+    }
+
   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to generate
      more than one vector stmt - i.e - we need to "unroll" the
@@ -3456,45 +3539,72 @@ vectorizable_operation (tree stmt, block
       /* Handle uses.  */
       if (j == 0)
 	{
-	  vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
-	  if (op_type == binary_op)
-	    {
-	      if (code == LSHIFT_EXPR || code == RSHIFT_EXPR)
-	        {
-	          /* Vector shl and shr insn patterns can be defined with
-	             scalar operand 2 (shift operand).  In this case, use
-	             constant or loop invariant op1 directly, without
-	             extending it to vector mode first.  */
-	          optab_op2_mode = insn_data[icode].operand[2].mode;
-	          if (!VECTOR_MODE_P (optab_op2_mode))
-	            {
-	              if (vect_print_dump_info (REPORT_DETAILS))
-	                fprintf (vect_dump, "operand 1 using scalar mode.");
-	              vec_oprnd1 = op1;
-	            }
-	        }
-	      if (!vec_oprnd1)
-	        vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+	  if (slp_node)
+            {
+	      /* Get vectorized arguments for SLP_NODE.  */
+              if (!vect_get_slp_defs (slp_node, &vec_oprnds0, &vec_oprnds1))
+                return false;
+            }
+          else
+            {
+	      vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+	      VEC_quick_push (tree, vec_oprnds0, vec_oprnd0);
+	      if (op_type == binary_op)
+		{
+		  if (code == LSHIFT_EXPR || code == RSHIFT_EXPR)
+		    {
+		      /* Vector shl and shr insn patterns can be defined with
+			 scalar operand 2 (shift operand).  In this case, use
+			 constant or loop invariant op1 directly, without
+			 extending it to vector mode first.  */
+		      optab_op2_mode = insn_data[icode].operand[2].mode;
+		      if (!VECTOR_MODE_P (optab_op2_mode))
+			{
+			  if (vect_print_dump_info (REPORT_DETAILS))
+			    fprintf (vect_dump, "operand 1 using scalar mode.");
+			  vec_oprnd1 = op1;
+			}
+		    }
+		  if (!vec_oprnd1)
+		    vec_oprnd1 = vect_get_vec_def_for_operand (op1, stmt, NULL);
+
+		  VEC_quick_push (tree, vec_oprnds1, vec_oprnd1);
+                }
 	    }
 	}
       else
 	{
+          /* FORNOW unreachable with SLP.  */
+	  gcc_assert (!slp_node);
+
 	  vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt[0], vec_oprnd0);
+	  VEC_replace (tree, vec_oprnds0, 0, vec_oprnd0);
 	  if (op_type == binary_op)
-	    vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd1);
+	    {
+	      vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd1);
+	      VEC_replace (tree, vec_oprnds1, 0, vec_oprnd1);
+	    }
 	}
 
       /* Arguments are ready. Create the new vector stmt.  */
+      for (i = 0; VEC_iterate (tree, vec_oprnds0, i, vop0); i++)
+        {
+          if (op_type == binary_op)
+            {
+              vop1 = VEC_index (tree, vec_oprnds1, i);
+              new_stmt = build_gimple_modify_stmt (vec_dest,
+					 build2 (code, vectype, vop0, vop1));
+            }
+	  else
+	    new_stmt = build_gimple_modify_stmt (vec_dest,
+				    build1 (code, vectype, vop0));
 
-      if (op_type == binary_op)
-        new_stmt = build_gimple_modify_stmt (vec_dest,
-                    build2 (code, vectype, vec_oprnd0, vec_oprnd1));
-      else
-        new_stmt = build_gimple_modify_stmt (vec_dest,
-                    build1 (code, vectype, vec_oprnd0));
-      new_temp = make_ssa_name (vec_dest, new_stmt);
-      GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
-      vect_finish_stmt_generation (stmt, new_stmt, bsi);
+	  new_temp = make_ssa_name (vec_dest, new_stmt);
+	  GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
+	  vect_finish_stmt_generation (stmt, new_stmt, bsi);
+          if (slp_node)
+            VEC_quick_push (tree, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+        }
 
       if (j == 0)
 	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
@@ -3503,6 +3613,10 @@ vectorizable_operation (tree stmt, block
       prev_stmt_info = vinfo_for_stmt (new_stmt);
     }
 
+  VEC_free (tree, heap, vec_oprnds0);
+  if (vec_oprnds1)
+    VEC_free (tree, heap, vec_oprnds1);
+
   return true;
 }
 
@@ -3863,6 +3977,7 @@ vect_strided_store_supported (tree vecty
 	fprintf (vect_dump, "interleave op not supported by target.");
       return false;
     }
+
   return true;
 }
 
@@ -4035,7 +4150,14 @@ vectorizable_store (tree stmt, block_stm
   bool strided_store = false;
   unsigned int group_size, i;
   VEC(tree,heap) *dr_chain = NULL, *oprnds = NULL, *result_chain = NULL;
-  
+  VEC(tree,heap) *vec_oprnds = NULL;
+  bool slp = (slp_node != NULL);
+  stmt_vec_info first_stmt_vinfo;
+  unsigned int vec_num;
+
+  if (slp) /* FORNOW.  */
+    ncopies = 1;
+
   gcc_assert (ncopies >= 1);
 
   if (!STMT_VINFO_RELEVANT_P (stmt_info))
@@ -4082,7 +4204,8 @@ vectorizable_store (tree stmt, block_stm
   if (STMT_VINFO_STRIDED_ACCESS (stmt_info))
     {
       strided_store = true;
-      if (!vect_strided_store_supported (vectype))
+      if (!vect_strided_store_supported (vectype)
+	  && !STMT_VINFO_PURE_SLP (stmt_info) && !slp)
 	return false;      
     }
 
@@ -4106,17 +4229,28 @@ vectorizable_store (tree stmt, block_stm
       /* We vectorize all the stmts of the interleaving group when we
 	 reach the last stmt in the group.  */
       if (DR_GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt)) 
-	  < DR_GROUP_SIZE (vinfo_for_stmt (first_stmt)))
+	  < DR_GROUP_SIZE (vinfo_for_stmt (first_stmt))
+	  && !slp)
 	{
 	  *vec_stmt = NULL_TREE;
 	  return true;
 	}
+
+      if (slp)
+	strided_store = false;
+
+      /* VEC_NUM is the number of vect stmts to be created for this group.  */
+      if (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) < group_size)
+	vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+      else
+	vec_num = group_size;
     }
   else 
     {
       first_stmt = stmt;
       first_dr = dr;
-      group_size = 1;
+      group_size = vec_num = 1;
+      first_stmt_vinfo = stmt_info;
     }
   
   if (vect_print_dump_info (REPORT_DETAILS))
@@ -4176,26 +4310,40 @@ vectorizable_store (tree stmt, block_stm
 
       if (j == 0)
 	{
-	  /* For interleaved stores we collect vectorized defs for all the 
-	     stores in the group in DR_CHAIN and OPRNDS. DR_CHAIN is then used
-	     as an input to vect_permute_store_chain(), and OPRNDS as an input
-	     to vect_get_vec_def_for_stmt_copy() for the next copy.
-	     If the store is not strided, GROUP_SIZE is 1, and DR_CHAIN and
-	     OPRNDS are of size 1.  */
-	  next_stmt = first_stmt;	  
-	  for (i = 0; i < group_size; i++)
-	    {
-	      /* Since gaps are not supported for interleaved stores, GROUP_SIZE
-		 is the exact number of stmts in the chain. Therefore, NEXT_STMT
-		 can't be NULL_TREE.  In case that there is no interleaving, 
-		 GROUP_SIZE is 1, and only one iteration of the loop will be 
-		 executed.  */
-	      gcc_assert (next_stmt);
-	      op = GIMPLE_STMT_OPERAND (next_stmt, 1);
-	      vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt, NULL);
-	      VEC_quick_push(tree, dr_chain, vec_oprnd); 
-	      VEC_quick_push(tree, oprnds, vec_oprnd); 
-	      next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
+          if (slp)
+            {
+	      /* Get vectorized arguments for SLP_NODE.  */
+              if (!vect_get_slp_defs (slp_node, &vec_oprnds, NULL))
+                return false;
+
+              vec_oprnd = VEC_index (tree, vec_oprnds, 0);
+            }
+          else
+            {
+	      /* For interleaved stores we collect vectorized defs for all the 
+		 stores in the group in DR_CHAIN and OPRNDS. DR_CHAIN is then 
+		 used as an input to vect_permute_store_chain(), and OPRNDS as 
+		 an input to vect_get_vec_def_for_stmt_copy() for the next copy.
+
+		 If the store is not strided, GROUP_SIZE is 1, and DR_CHAIN and
+		 OPRNDS are of size 1.  */
+	      next_stmt = first_stmt;	  
+	      for (i = 0; i < group_size; i++)
+		{
+		  /* Since gaps are not supported for interleaved stores, 
+		     GROUP_SIZE is the exact number of stmts in the chain. 
+		     Therefore, NEXT_STMT can't be NULL_TREE.  In case that 
+		     there is no interleaving, GROUP_SIZE is 1, and only one 
+		     iteration of the loop will be executed.  */
+		  gcc_assert (next_stmt);
+		  op = GIMPLE_STMT_OPERAND (next_stmt, 1);
+
+		  vec_oprnd = vect_get_vec_def_for_operand (op, next_stmt, 
+							    NULL);
+		  VEC_quick_push(tree, dr_chain, vec_oprnd); 
+		  VEC_quick_push(tree, oprnds, vec_oprnd); 
+		  next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
+		}
 	    }
 	  dataref_ptr = vect_create_data_ref_ptr (first_stmt, bsi, NULL_TREE, 
 						  &dummy, &ptr_incr, false,
@@ -4203,6 +4351,9 @@ vectorizable_store (tree stmt, block_stm
 	}
       else 
 	{
+	  /* FORNOW SLP doesn't work for multiple types.  */
+	  gcc_assert (!slp);
+
 	  /* For interleaved stores we created vectorized defs for all the 
 	     defs stored in OPRNDS in the previous iteration (previous copy). 
 	     DR_CHAIN is then used as an input to vect_permute_store_chain(), 
@@ -4230,12 +4381,18 @@ vectorizable_store (tree stmt, block_stm
 	}
 
       next_stmt = first_stmt;
-      for (i = 0; i < group_size; i++)
+      for (i = 0; i < vec_num; i++)
 	{
-	  /* For strided stores vectorized defs are interleaved in 
-	     vect_permute_store_chain().  */
-	  if (strided_store)
-	    vec_oprnd = VEC_index(tree, result_chain, i);
+	  if (i > 0)
+	    /* Bump the vector pointer.  */
+	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
+
+	  if (slp)
+	    vec_oprnd = VEC_index (tree, vec_oprnds, i);
+	  else if (strided_store)
+	    /* For strided stores vectorized defs are interleaved in 
+	       vect_permute_store_chain().  */
+	    vec_oprnd = VEC_index (tree, result_chain, i);
 
 	  data_ref = build_fold_indirect_ref (dataref_ptr);
 	  /* Arguments are ready. Create the new vector stmt.  */
@@ -4250,10 +4407,9 @@ vectorizable_store (tree stmt, block_stm
 
 	  prev_stmt_info = vinfo_for_stmt (new_stmt);
 	  next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
+
 	  if (!next_stmt)
 	    break;
-	  /* Bump the vector pointer.  */
-	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
 	}
     }
 
@@ -4672,6 +4828,12 @@ vectorizable_load (tree stmt, block_stmt
   VEC(tree,heap) *dr_chain = NULL;
   bool strided_load = false;
   tree first_stmt;
+  int vec_num;
+  bool slp = (slp_node != NULL);
+
+  /* FORNOW.  */
+  if (slp) 
+    ncopies = 1;
 
   gcc_assert (ncopies >= 1);
 
@@ -4723,7 +4885,8 @@ vectorizable_load (tree stmt, block_stmt
       strided_load = true;
 
       /* Check if interleaving is supported.  */
-      if (!vect_strided_load_supported (vectype))
+      if (!vect_strided_load_supported (vectype)
+	  && !STMT_VINFO_PURE_SLP (stmt_info) && !slp)
 	return false;
     }
 
@@ -4751,12 +4914,21 @@ vectorizable_load (tree stmt, block_stmt
       first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
       group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
       dr_chain = VEC_alloc (tree, heap, group_size);
+
+      /* VEC_NUM is the number of vect stmts to be created for this group.  */
+      if (slp)
+	{
+	  strided_load = false;
+	  vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+	}
+      else
+	vec_num = group_size;
     }
   else
     {
       first_stmt = stmt;
       first_dr = dr;
-      group_size = 1;
+      group_size = vec_num = 1;
     }
 
   alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
@@ -4877,8 +5049,11 @@ vectorizable_load (tree stmt, block_stmt
       else
         dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
 
-      for (i = 0; i < group_size; i++)
+      for (i = 0; i < vec_num; i++)
 	{
+	  if (i > 0)
+	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
+
 	  /* 2. Create the vector-load in the loop.  */
 	  switch (alignment_support_cheme)
 	    {
@@ -4926,16 +5101,21 @@ vectorizable_load (tree stmt, block_stmt
 	      new_temp = make_ssa_name (vec_dest, new_stmt);
 	      GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
 	      vect_finish_stmt_generation (stmt, new_stmt, bsi);
-	      if (i == group_size - 1 && j == ncopies - 1)
+	      if (i == vec_num - 1 && j == ncopies - 1)
 		add_phi_arg (phi_stmt, lsq, loop_latch_edge (loop));
 	      msq = lsq;
 	    }
+
 	  if (strided_load)
 	    VEC_quick_push (tree, dr_chain, new_temp);
-	  if (i < group_size - 1)
-	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);	  
+
+          if (slp)
+	    VEC_quick_push (tree, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
 	}
 
+      if (slp)
+        continue;
+
       if (strided_load)
 	{
 	  if (!vect_transform_strided_load (stmt, dr_chain, group_size, bsi))
@@ -5096,6 +5276,10 @@ vectorizable_condition (tree stmt, block
   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_loop_def)
     return false;
 
+  /* FORNOW: SLP not supported.  */
+  if (STMT_VINFO_SLP_TYPE (stmt_info))
+    return false;
+
   /* FORNOW: not yet supported.  */
   if (STMT_VINFO_LIVE_P (stmt_info))
     {
@@ -6098,6 +6282,8 @@ vect_transform_loop (loop_vec_info loop_
   tree ratio = NULL;
   int vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   bool strided_store;
+  bool slp_scheduled = false;
+  unsigned int nunits;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vec_transform_loop ===");
@@ -6240,21 +6426,57 @@ vect_transform_loop (loop_vec_info loop_
 	      fprintf (vect_dump, "------>vectorizing statement: ");
 	      print_generic_expr (vect_dump, stmt, TDF_SLIM);
 	    }	
+
 	  stmt_info = vinfo_for_stmt (stmt);
-	  gcc_assert (stmt_info);
-	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
-	      && !STMT_VINFO_LIVE_P (stmt_info))
+	  if (!stmt_info 
+	      ||(!STMT_VINFO_RELEVANT_P (stmt_info)
+		 && !STMT_VINFO_LIVE_P (stmt_info)))
 	    {
 	      bsi_next (&si);
 	      continue;
 	    }
 
 	  gcc_assert (STMT_VINFO_VECTYPE (stmt_info));
-	  if ((TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info))
-		 != (unsigned HOST_WIDE_INT) vectorization_factor)
+	  nunits = 
+	    (unsigned int) TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+	  if (!STMT_VINFO_SLP_TYPE (stmt_info)
+	      && nunits != (unsigned int) vectorization_factor
 	      && vect_print_dump_info (REPORT_DETAILS))
+	    /* For SLP VF is set according to unrolling factor, and not to 
+	       vector size, hence for SLP this print is not valid.  */
 	    fprintf (vect_dump, "multiple-types.");
 
+	  /* SLP. Schedule all the SLP instances when the first SLP stmt is 
+	     reached.  */
+	  if (STMT_VINFO_SLP_TYPE (stmt_info))
+	    {
+	      if (!slp_scheduled)
+		{
+		  slp_scheduled = true;
+
+		  if (vect_print_dump_info (REPORT_DETAILS))
+		    fprintf (vect_dump, "=== scheduling SLP instances ===");
+
+		  is_store = vect_schedule_slp (loop_vinfo, nunits);
+		  
+		  /* IS_STORE is true if STMT is a store. Stores cannot be of 
+		     hybrid SLP type. They are removed in   
+		     vect_schedule_slp_instance and their vinfo is destroyed. */
+		  if (is_store)
+		    { 
+		      bsi_next (&si);
+		      continue;
+		    }
+		}
+	      
+	      /* Hybrid SLP stmts must be vectorized in addition to SLP.  */
+	      if (STMT_VINFO_PURE_SLP (stmt_info))
+		{
+		  bsi_next (&si);
+		  continue;
+		}
+	    }
+
 	  /* -------- vectorize statement ------------ */
 	  if (vect_print_dump_info (REPORT_DETAILS))
 	    fprintf (vect_dump, "transform statement.");

* Re: [patch] Loop-aware SLP 4/5
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
                   ` (2 preceding siblings ...)
  2007-08-14 13:12 ` [patch] Loop-aware SLP 3/5 Ira Rosen
@ 2007-08-14 13:16 ` Ira Rosen
  2007-08-14 13:18 ` [patch] Loop-aware SLP 5/5 Ira Rosen
  2007-08-14 13:21 ` [patch] Loop-aware SLP Paolo Bonzini
  5 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 13:16 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1582 bytes --]

This part adjusts the vectorization cost model to account for SLP.

SLP costs are calculated recursively for each SLP computation tree during
its analysis, and are added to the overall costs if we decide to apply SLP.
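
As a rough illustration of the recursive accumulation described above, here
is a minimal sketch. It is not the code from the patch: struct toy_slp_node
and toy_accumulate_slp_costs are hypothetical names that model only the two
cost fields and the left/right children of an SLP node. The patch itself
adds each node's costs while the tree is being built in vect_build_slp_tree;
the sketch walks an already-built tree instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SLP tree node: only the cost fields and
   the two children are modelled here.  */
struct toy_slp_node
{
  struct toy_slp_node *left;
  struct toy_slp_node *right;
  int inside_of_loop;   /* Cost of stmts generated inside the loop.  */
  int outside_of_loop;  /* Cost of stmts generated outside the loop.  */
};

/* Add the costs of NODE and of all its descendants to *INSIDE_COST and
   *OUTSIDE_COST.  A NULL node contributes nothing.  */
static void
toy_accumulate_slp_costs (const struct toy_slp_node *node,
                          int *inside_cost, int *outside_cost)
{
  assert (inside_cost != NULL && outside_cost != NULL);
  if (node == NULL)
    return;

  /* Costs of this node first, then recurse into both operands.  */
  *inside_cost += node->inside_of_loop;
  *outside_cost += node->outside_of_loop;

  toy_accumulate_slp_costs (node->left, inside_cost, outside_cost);
  toy_accumulate_slp_costs (node->right, inside_cost, outside_cost);
}
```

The caller is expected to start both totals at zero, in the same way that
vect_analyze_slp_instance initializes inside_cost and outside_cost before
building the tree.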

Thanks,
Ira

ChangeLog:

      * tree-vectorizer.h (_slp_tree): Add new fields for costs and their
      access functions.
      (_slp_instance): Likewise.
      (vect_model_simple_cost, vect_model_store_cost,
      vect_model_load_cost): Declare (make extern).
      * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New.
      (vect_analyze_operations): Call
      vect_update_slp_costs_according_to_vf.
      (vect_get_and_check_slp_defs): Calculate costs. Add arguments.
      (vect_build_slp_tree): Likewise.
      (vect_analyze_slp_instance): Initialize cost fields. Update
      arguments of vect_build_slp_tree.
      * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
      SLP costs into account.
      (vect_model_simple_cost): Make extern, add SLP parameter and handle
      SLP.
      (vect_model_store_cost, vect_model_load_cost): Likewise.
      (vectorizable_call): Add argument to vect_model_simple_cost.
      (vectorizable_assignment): Call vect_model_simple_cost only for not
      pure SLP stmts.
      (vectorizable_operation): Likewise.
      (vectorizable_type_demotion): Add argument to
      vect_model_simple_cost.
      (vectorizable_type_promotion): Likewise.
      (vectorizable_store): Call vect_model_simple_cost only for not pure
      SLP stmts.
      (vectorizable_load): Likewise.

(See attached file: slp-part4.txt)
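
The arithmetic the attached patch relies on can be sketched in isolation.
The helpers below are hypothetical (the toy_ prefix marks names that do not
appear in the patch). They reproduce three formulas from the attachment:
unrolling_factor = lcm (nunits, group_size) / group_size,
ncopies_for_cost = unrolling_factor * group_size / nunits, and the rescaling
of the inside-of-loop cost by vf / unrolling_factor once VF is known. The
rescaling assumes, as the patch comment does, that costs are linear in
ncopies.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm).  */
static unsigned int
toy_gcd (unsigned int a, unsigned int b)
{
  while (b != 0)
    {
      unsigned int t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Number of loop iterations that must be combined so that the scalar
   stmts of a group of GROUP_SIZE fill whole vectors of NUNITS elements.
   Equal to lcm (NUNITS, GROUP_SIZE) / GROUP_SIZE, which simplifies to
   NUNITS / gcd (NUNITS, GROUP_SIZE).  */
static unsigned int
toy_unrolling_factor (unsigned int nunits, unsigned int group_size)
{
  assert (nunits > 0 && group_size > 0);
  return nunits / toy_gcd (nunits, group_size);
}

/* Number of vector stmts assumed when the costs are first estimated.  */
static unsigned int
toy_ncopies_for_cost (unsigned int nunits, unsigned int group_size)
{
  return toy_unrolling_factor (nunits, group_size) * group_size / nunits;
}

/* Rescale INSIDE_COST once the final vectorization factor VF is known,
   assuming the cost is linear in the number of copies.  */
static int
toy_rescale_inside_cost (int inside_cost, unsigned int unrolling_factor,
                         unsigned int vf)
{
  assert (unrolling_factor > 0);
  if (unrolling_factor != vf)
    inside_cost *= (int) (vf / unrolling_factor);
  return inside_cost;
}
```

For example, with 4-element vectors a group of 2 stores needs an unrolling
factor of 2 and one vector stmt, while a group of 8 needs no unrolling and
two vector stmts.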

[-- Attachment #2: slp-part4.txt --]
[-- Type: text/plain, Size: 19284 bytes --]

--- tree-vectorizer.h.patch2	2007-08-14 15:43:29.000000000 +0300
+++ tree-vectorizer.h	2007-08-14 15:44:42.000000000 +0300
@@ -120,6 +120,12 @@ typedef struct _slp_instance {
   /* The unrolling factor required to vectorized this SLP instance.  */
   unsigned int unrolling_factor;
 
+  /* Vectorization costs associated with SLP instance.  */
+  struct  
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+  } cost;
 } *slp_instance;
 
 DEF_VEC_P(slp_instance);
@@ -129,12 +135,16 @@ DEF_VEC_ALLOC_P(slp_instance, heap);
 #define SLP_INSTANCE_TREE(S)                     (S)->root
 #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
+#define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S)     (S)->cost.outside_of_loop
+#define SLP_INSTANCE_INSIDE_OF_LOOP_COST(S)      (S)->cost.inside_of_loop
 
 #define SLP_TREE_LEFT(S)                         (S)->left
 #define SLP_TREE_RIGHT(S)                        (S)->right
 #define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
 #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
+#define SLP_TREE_OUTSIDE_OF_LOOP_COST(S)         (S)->cost.outside_of_loop
+#define SLP_TREE_INSIDE_OF_LOOP_COST(S)          (S)->cost.inside_of_loop
 
 /*-----------------------------------------------------------------*/
 /* Info on vectorized loops.                                       */
@@ -598,6 +608,11 @@ extern bool vectorizable_live_operation 
 extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_induction (tree, block_stmt_iterator *, tree *);
 extern int  vect_estimate_min_profitable_iters (loop_vec_info);
+extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *, 
+				    slp_tree);
+extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type, 
+				   slp_tree);
+extern void vect_model_load_cost (stmt_vec_info, int, slp_tree);
 /* Driver for transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 
--- tree-vect-analyze.c.patch4	2007-08-14 15:42:42.000000000 +0300
+++ tree-vect-analyze.c	2007-08-14 15:43:26.000000000 +0300
@@ -350,6 +350,33 @@ vect_can_advance_ivs_p (loop_vec_info lo
 }
 
 
+/* SLP costs are calculated according to SLP instance unrolling factor (i.e., 
+   the number of created vector stmts depends on the unrolling factor). However,
+   the actual number of vector stmts for every SLP node depends on VF which is
+   set later in vect_analyze_operations(). Hence, SLP costs should be updated.
+   In this function we assume that the inside costs calculated in 
+   vect_model_xxx_cost are linear in ncopies.  */
+
+static void
+vect_update_slp_costs_according_to_vf (loop_vec_info loop_vinfo)
+{
+  unsigned int i, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  slp_instance instance;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
+
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    {
+      /* We assume that costs are linear in ncopies.  */
+      if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf)
+	SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf 
+	  / SLP_INSTANCE_UNROLLING_FACTOR (instance);	  
+    }
+}
+
+
 /* Function vect_analyze_operations.
 
    Scan the loop stmts and make sure they are all vectorizable.  */
@@ -569,6 +596,10 @@ vect_analyze_operations (loop_vec_info l
       LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
     }
 
+  /* After VF is set, SLP costs should be updated since the number of created
+     vector stmts depends on VF.  */
+  vect_update_slp_costs_according_to_vf (loop_vinfo);
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump,
@@ -2248,7 +2279,8 @@ vect_get_and_check_slp_defs (loop_vec_in
 			     enum vect_def_type *first_stmt_dt1,
 			     tree *first_stmt_def0_type, 
 			     tree *first_stmt_def1_type,
-			     tree *first_stmt_const_oprnd)
+			     tree *first_stmt_const_oprnd,
+			     int ncopies_for_cost)
 {
   tree oprnd;
   enum operation_type op_type = TREE_OPERAND_LENGTH (rhs);
@@ -2291,6 +2323,14 @@ vect_get_and_check_slp_defs (loop_vec_in
 	    *first_stmt_def0_type = TREE_TYPE (def);
 	  else
 	    *first_stmt_const_oprnd = oprnd;
+
+	  /* Analyze costs (for the first stmt of the group only).  */
+	  if (op_type)
+	    /* Not a memory operation (we don't call this function for loads).  */
+	    vect_model_simple_cost (stmt_info, ncopies_for_cost, dt, slp_node);
+	  else
+	    /* Store.  */
+	    vect_model_store_cost (stmt_info, ncopies_for_cost, dt[0], slp_node);
 	}
       
       else
@@ -2369,7 +2409,9 @@ vect_get_and_check_slp_defs (loop_vec_in
 
 static bool
 vect_build_slp_tree (loop_vec_info loop_vinfo, slp_tree *node, 
-		     unsigned int group_size, bool *slp_impossible)
+		     unsigned int group_size, bool *slp_impossible,
+		     int *inside_cost, int *outside_cost,
+		     int ncopies_for_cost)
 {
   VEC (tree, heap) *def_stmts0 = VEC_alloc (tree, heap, group_size);
   VEC (tree, heap) *def_stmts1 =  VEC_alloc (tree, heap, group_size);
@@ -2492,7 +2534,8 @@ vect_build_slp_tree (loop_vec_info loop_
 						&first_stmt_dt1, 
 						&first_stmt_def0_type, 
 						&first_stmt_def1_type,
-						&first_stmt_const_oprnd))
+						&first_stmt_const_oprnd,
+						ncopies_for_cost))
 		return false;
 	    }
 	    else
@@ -2515,6 +2558,10 @@ vect_build_slp_tree (loop_vec_info loop_
 
 			return false;
 		      }
+
+		    /* Analyze costs (for the first stmt in the group).  */
+		    vect_model_load_cost (vinfo_for_stmt (stmt), 
+					  ncopies_for_cost, *node);
 		  }
 		else
 		  {
@@ -2572,11 +2619,16 @@ vect_build_slp_tree (loop_vec_info loop_
 					    &first_stmt_dt1, 
 					    &first_stmt_def0_type, 
 					    &first_stmt_def1_type,
-					    &first_stmt_const_oprnd))
+					    &first_stmt_const_oprnd,
+					    ncopies_for_cost))
 	    return false;
 	}
     }
 
+  /* Add the costs of the node to the overall instance costs.  */
+  *inside_cost += SLP_TREE_INSIDE_OF_LOOP_COST (*node); 
+  *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
+
   /* Strided loads were reached - stop the recursion.  */
   if (stop_recursion)
     return true;
@@ -2589,8 +2641,11 @@ vect_build_slp_tree (loop_vec_info loop_
       SLP_TREE_VEC_STMTS (left_node) = NULL;
       SLP_TREE_LEFT (left_node) = NULL;
       SLP_TREE_RIGHT (left_node) = NULL;
+      SLP_TREE_OUTSIDE_OF_LOOP_COST (left_node) = 0;
+      SLP_TREE_INSIDE_OF_LOOP_COST (left_node) = 0;
       if (!vect_build_slp_tree (loop_vinfo, &left_node, group_size, 
-				slp_impossible))
+				slp_impossible, inside_cost, outside_cost,
+				ncopies_for_cost))
 	return false;
       
       SLP_TREE_LEFT (*node) = left_node;
@@ -2603,8 +2658,11 @@ vect_build_slp_tree (loop_vec_info loop_
       SLP_TREE_VEC_STMTS (right_node) = NULL;
       SLP_TREE_LEFT (right_node) = NULL;
       SLP_TREE_RIGHT (right_node) = NULL;
+      SLP_TREE_OUTSIDE_OF_LOOP_COST (right_node) = 0;
+      SLP_TREE_INSIDE_OF_LOOP_COST (right_node) = 0;
       if (!vect_build_slp_tree (loop_vinfo, &right_node, group_size,
-				slp_impossible))
+				slp_impossible, inside_cost, outside_cost,
+				ncopies_for_cost))
 	return false;
       
       SLP_TREE_RIGHT (*node) = right_node;
@@ -2672,6 +2730,7 @@ vect_analyze_slp_instance (loop_vec_info
   tree vectype, scalar_type, next;
   unsigned int vectorization_factor = 0, ncopies;
   bool slp_impossible = false; 
+  int inside_cost = 0, outside_cost = 0, ncopies_for_cost;
 
   /* FORNOW: multiple types are not supported.  */
   scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))));
@@ -2701,18 +2760,28 @@ vect_analyze_slp_instance (loop_vec_info
   SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
   SLP_TREE_LEFT (node) = NULL;
   SLP_TREE_RIGHT (node) = NULL;
+  SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
+  SLP_TREE_INSIDE_OF_LOOP_COST (node) = 0;
 
   /* Calculate the unrolling factor.  */
   unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
 	
+  /* Calculate the number of vector stmts to create based on the unrolling
+     factor (number of vectors is 1 if NUNITS >= GROUP_SIZE, and is
+     GROUP_SIZE / NUNITS otherwise).  */
+  ncopies_for_cost = unrolling_factor * group_size / nunits;
+
   /* Build the tree for the SLP instance.  */
-  if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible))
+  if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible,
+			   &inside_cost, &outside_cost, ncopies_for_cost))
     {
       /* Create a new SLP instance.  */  
       new_instance = XNEW (struct _slp_instance);
       SLP_INSTANCE_TREE (new_instance) = node;
       SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
       SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
+      SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
+      SLP_INSTANCE_INSIDE_OF_LOOP_COST (new_instance) = inside_cost;
       VEC_safe_push (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo), 
 		     new_instance);
       if (vect_print_dump_info (REPORT_DETAILS))
--- tree-vect-transform.c.patch4	2007-08-14 15:42:48.000000000 +0300
+++ tree-vect-transform.c	2007-08-14 15:45:55.000000000 +0300
@@ -99,6 +99,8 @@ vect_estimate_min_profitable_iters (loop
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
   int byte_misalign;
+  VEC (slp_instance, heap) *slp_instances;
+  slp_instance instance;
 
   /* Cost model disabled.  */
   if (!flag_vect_cost_model)
@@ -250,6 +252,14 @@ vect_estimate_min_profitable_iters (loop
 		  targetm.vectorize.builtin_vectorization_cost (runtime_test));
     }
 
+  /* Add SLP costs.  */
+  slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    {
+      vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
+      vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
+    }
+
   /* Calculate number of iterations required to make the vector version 
      profitable, relative to the loop bodies only. The following condition
      must hold true: ((SIC*VF)-VIC)*niters > VOC*VF, where
@@ -416,25 +426,37 @@ vect_model_induction_cost (stmt_vec_info
    single op.  Right now, this does not account for multiple insns that could
    be generated for the single vector op.  We will handle that shortly.  */
 
-static void
+void
 vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies, 
-                        enum vect_def_type *dt)
+			enum vect_def_type *dt, slp_tree slp_node)
 {
   int i;
+  int *inside_cost_field, *outside_cost_field;
 
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
+
+  *inside_cost_field = ncopies * TARG_VEC_STMT_COST;
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
   for (i = 0; i < 2; i++)
     {
       if (dt[i] == vect_constant_def || dt[i] == vect_invariant_def)
-	STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) += TARG_SCALAR_TO_VEC_COST; 
+	*outside_cost_field += TARG_SCALAR_TO_VEC_COST; 
     }
   
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
 }
 
 
@@ -461,18 +483,31 @@ vect_cost_strided_group_size (stmt_vec_i
    Models cost for stores.  In the case of strided accesses, one access
    has the overhead of the strided access attributed to it.  */
 
-static void
+void
 vect_model_store_cost (stmt_vec_info stmt_info, int ncopies, 
-                       enum vect_def_type dt)
+		       enum vect_def_type dt, slp_tree slp_node)
 {
   int cost = 0;
   int group_size;
+  int *inside_cost_field, *outside_cost_field;
+
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
 
   if (dt == vect_constant_def || dt == vect_invariant_def)
-    STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = TARG_SCALAR_TO_VEC_COST;
+    *outside_cost_field = TARG_SCALAR_TO_VEC_COST;
 
   /* Strided access?  */
-  if (STMT_VINFO_STRIDED_ACCESS (stmt_info)) 
+  if (DR_GROUP_FIRST_DR (stmt_info)) 
     group_size = vect_cost_strided_group_size (stmt_info);
   /* Not a strided access.  */
   else
@@ -495,12 +530,11 @@ vect_model_store_cost (stmt_vec_info stm
   /* Costs of the stores.  */
   cost += ncopies * TARG_VEC_STORE_COST;
 
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+  *inside_cost_field = cost;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
 }
 
 
@@ -511,8 +545,8 @@ vect_model_store_cost (stmt_vec_info stm
    accesses are supported for loads, we also account for the costs of the 
    access scheme chosen.  */
 
-static void
-vect_model_load_cost (stmt_vec_info stmt_info, int ncopies)
+void
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, slp_tree slp_node)
 		 
 {
   int inner_cost = 0;
@@ -520,10 +554,23 @@ vect_model_load_cost (stmt_vec_info stmt
   int alignment_support_cheme;
   tree first_stmt;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
+  int *inside_cost_field, *outside_cost_field;
+
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
 
   /* Strided accesses?  */
   first_stmt = DR_GROUP_FIRST_DR (stmt_info);
-  if (first_stmt)
+  if (first_stmt && !slp_node)
     {
       group_size = vect_cost_strided_group_size (stmt_info);
       first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
@@ -589,14 +636,14 @@ vect_model_load_cost (stmt_vec_info stmt
            access in the group. Inside the loop, there is a load op
            and a realignment op.  */
 
-        if ((!STMT_VINFO_STRIDED_ACCESS (stmt_info)) || group_size > 1)
+        if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1 || slp_node)
           {
             outer_cost = 2*TARG_VEC_STMT_COST;
             if (targetm.vectorize.builtin_mask_for_load)
               outer_cost += TARG_VEC_STMT_COST;
           }
         
-        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+        *outside_cost_field = outer_cost;
 
         inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
 
@@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
       gcc_unreachable ();
     }
 
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+  *inside_cost_field = inner_cost;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
+
 }
 
 
@@ -2658,7 +2705,7 @@ vectorizable_call (tree stmt, block_stmt
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_call ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3194,7 +3241,7 @@ vectorizable_assignment (tree stmt, bloc
       STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_assignment ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3461,7 +3508,7 @@ vectorizable_operation (tree stmt, block
       STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_operation ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3719,7 +3766,7 @@ vectorizable_type_demotion (tree stmt, b
       STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_demotion ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3882,7 +3929,7 @@ vectorizable_type_promotion (tree stmt, 
       STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_promotion ===");
-      vect_model_simple_cost (stmt_info, 2*ncopies, dt);
+      vect_model_simple_cost (stmt_info, 2*ncopies, dt, NULL);
       return true;
     }
 
@@ -4212,7 +4259,8 @@ vectorizable_store (tree stmt, block_stm
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-      vect_model_store_cost (stmt_info, ncopies, dt);
+      if (!STMT_VINFO_PURE_SLP (stmt_info))
+	vect_model_store_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -4893,7 +4941,8 @@ vectorizable_load (tree stmt, block_stmt
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
-      vect_model_load_cost (stmt_info, ncopies);
+      if (!STMT_VINFO_PURE_SLP (stmt_info))
+	vect_model_load_cost (stmt_info, ncopies, NULL);
       return true;
     }
 

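For readers following the cost bookkeeping in this part: each SLP tree node
carries its own inside-of-loop and outside-of-loop cost, and
vect_build_slp_tree adds them to per-instance totals as it recurses into the
left and right children. The sketch below shows only that accumulation
pattern. The names sketch_node and sketch_accumulate_costs are hypothetical
and are not part of the patch, and the real function also does the analysis
and can stop the recursion early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SLP tree node: only the two cost fields
   and the child links are modelled here.  */
struct sketch_node
{
  int inside_cost;   /* Cost of the node's stmts inside the loop.  */
  int outside_cost;  /* Cost of the node's stmts outside the loop.  */
  struct sketch_node *left;
  struct sketch_node *right;
};

/* Add the costs of NODE and of all its descendants to *INSIDE_COST and
   *OUTSIDE_COST.  The caller initializes both totals (to 0 for a fresh
   instance), as vect_analyze_slp_instance does in the patch.  */
void
sketch_accumulate_costs (const struct sketch_node *node,
                         int *inside_cost, int *outside_cost)
{
  assert (inside_cost != NULL && outside_cost != NULL);

  if (node == NULL)
    return;

  /* Add the costs of this node to the overall instance costs.  */
  *inside_cost += node->inside_cost;
  *outside_cost += node->outside_cost;

  /* Then descend into the operands.  */
  sketch_accumulate_costs (node->left, inside_cost, outside_cost);
  sketch_accumulate_costs (node->right, inside_cost, outside_cost);
}
```

The totals obtained this way correspond to what the patch stores in
SLP_INSTANCE_INSIDE_OF_LOOP_COST and SLP_INSTANCE_OUTSIDE_OF_LOOP_COST, which
vect_estimate_min_profitable_iters later adds to vec_inside_cost and
vec_outside_cost.
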
^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [patch] Loop-aware SLP 5/5
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
                   ` (3 preceding siblings ...)
  2007-08-14 13:16 ` [patch] Loop-aware SLP 4/5 Ira Rosen
@ 2007-08-14 13:18 ` Ira Rosen
  2007-08-14 13:21 ` [patch] Loop-aware SLP Paolo Bonzini
  5 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 13:18 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]

This is the last part of the loop-aware SLP patch. It contains 39 testcases
and changes to lib/target-supports.exp and the relevant vect.exp files.

Thanks,
Ira

ChangeLog:

      * gcc.dg/vect/vect.exp: Compile tests starting with slp-.
      Remove "vect" part from test names for -ffast-math, -ffast-math-errno,
      -fwrapv, -ftrapv tests. Add -fno-tree-scev-cprop for slp- tests.
      Compile tests with -fno-tree-pre.
      * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Run SLP tests.
      * lib/target-supports.exp (check_effective_target_vect_strided): New.
      * gcc.dg/vect/slp-1.c, gcc.dg/vect/slp-2.c, gcc.dg/vect/slp-3.c,
      gcc.dg/vect/slp-4.c, gcc.dg/vect/slp-5.c, gcc.dg/vect/slp-6.c,
      gcc.dg/vect/slp-7.c, gcc.dg/vect/slp-8.c, gcc.dg/vect/slp-9.c,
      gcc.dg/vect/slp-10.c, gcc.dg/vect/slp-11.c, gcc.dg/vect/slp-12.c,
      gcc.dg/vect/slp-13.c, gcc.dg/vect/slp-14.c, gcc.dg/vect/slp-15.c,
      gcc.dg/vect/slp-16.c, gcc.dg/vect/slp-17.c, gcc.dg/vect/slp-18.c,
      gcc.dg/vect/slp-19.c, gcc.dg/vect/slp-20.c, gcc.dg/vect/slp-21.c,
      gcc.dg/vect/slp-22.c, gcc.dg/vect/slp-23.c, gcc.dg/vect/slp-24.c,
      gcc.dg/vect/slp-25.c, gcc.dg/vect/slp-26.c, gcc.dg/vect/slp-28.c,
      gcc.dg/vect/fast-math-slp-27.c, gcc.dg/vect/no-tree-pre-slp-29.c,
      gcc.dg/vect/no-scevccp-slp-30.c, gcc.dg/vect/no-scevccp-slp-31.c,
      gcc.dg/vect/no-math-errno-slp-32.c, gcc.dg/vect/slp-33.c,
      gcc.dg/vect/slp-34.c, gcc.dg/vect/slp-35.c, gcc.dg/vect/slp-36.c,
      gcc.dg/vect/slp-37.c, gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c,
      gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: New testcases.

(See attached file: slp-part5.txt)
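
Several of the attached tests state the expected unrolling in a comment (for
example "Unrolling by 2" in slp-16.c and "SLP with unrolling by 4" in
slp-33.c). Those numbers follow from the formula used in
vect_analyze_slp_instance: unrolling_factor = lcm (nunits, group_size) /
group_size, and ncopies_for_cost = unrolling_factor * group_size / nunits.
A minimal sketch of that arithmetic follows. The function names are
hypothetical, and nunits = 4 in the usage note assumes four 32-bit elements
per vector, which depends on the target.

```c
#include <assert.h>

/* Greatest common divisor, by Euclid's algorithm.  */
unsigned int
sketch_gcd (unsigned int a, unsigned int b)
{
  while (b != 0)
    {
      unsigned int t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* lcm (NUNITS, GROUP_SIZE) / GROUP_SIZE, which simplifies to
   NUNITS / gcd (NUNITS, GROUP_SIZE).  */
unsigned int
sketch_unrolling_factor (unsigned int nunits, unsigned int group_size)
{
  assert (nunits > 0 && group_size > 0);
  return nunits / sketch_gcd (nunits, group_size);
}

/* Number of vector stmts needed to cover the unrolled group:
   UNROLLING_FACTOR * GROUP_SIZE / NUNITS.  */
unsigned int
sketch_ncopies_for_cost (unsigned int nunits, unsigned int group_size)
{
  return sketch_unrolling_factor (nunits, group_size) * group_size / nunits;
}
```

With nunits = 4, groups of size 2 and 6 (as in slp-16.c) give an unrolling
factor of 2, and groups of size 3 and 7 (as in slp-33.c) give 4, matching the
comments in those tests. Groups of size 8 or 16 need no unrolling and are
covered by 2 and 4 vector stmts respectively.
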

[-- Attachment #2: slp-part5.txt --]
[-- Type: text/plain, Size: 114288 bytes --]

Index: gcc.dg/vect/no-scevccp-slp-31.c
===================================================================
--- gcc.dg/vect/no-scevccp-slp-31.c	(revision 0)
+++ gcc.dg/vect/no-scevccp-slp-31.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 ()
+{
+  int i, j;
+  unsigned short out[N*8], a[N][N];
+   
+ for (i = 0; i < N; i++)
+   {
+    for (j = 0; j < N; j++)
+      {
+        a[i][j] = 8;
+      }
+    out[i*4] = 8;
+    out[i*4 + 1] = 18;
+    out[i*4 + 2] = 28;
+    out[i*4 + 3] = 38;
+   }
+
+  /* check results:  */
+ for (i = 0; i < N; i++)
+   {
+    for (j = 0; j < N; j++) 
+      {
+        if (a[i][j] != 8)
+           abort ();
+      }
+    if (out[i*4] != 8
+        || out[i*4 + 1] != 18
+        || out[i*4 + 2] != 28
+        || out[i*4 + 3] != 38)
+      abort();
+   }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-24.c
===================================================================
--- gcc.dg/vect/slp-24.c	(revision 0)
+++ gcc.dg/vect/slp-24.c	(revision 0)
@@ -0,0 +1,82 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 16
+#define DIFF 242
+
+typedef struct {
+   unsigned char a;
+   unsigned char b;
+   unsigned char c;
+   unsigned char d;
+} s;
+
+void
+main1 (unsigned char x, unsigned char max_result, unsigned char min_result, s *arr)
+{
+  int i;
+  unsigned char ub[N*2] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  unsigned char udiff = 2;
+  unsigned char umax = x;
+  unsigned char umin = x;
+  unsigned char ua1[N*2];
+  s *pIn = arr;
+  s out[N];
+
+  for (i = 0; i < N; i++) {
+    udiff += (unsigned char)(ub[i] - uc[i]);
+
+    ua1[2*i+1] = ub[2*i+1];
+    ua1[2*i] = ub[2*i];
+
+    out[i].d = pIn->d - 1;
+    out[i].b = pIn->b - 4;
+    out[i].c = pIn->c - 8;
+    out[i].a = pIn->a - 3;
+
+    pIn++;
+  }
+
+  for (i = 0; i < N; i++) {
+    if (ua1[2*i] != ub[2*i]
+        || ua1[2*i+1] != ub[2*i+1]
+        || out[i].a != arr[i].a - 3
+        || out[i].b != arr[i].b - 4
+        || out[i].c != arr[i].c - 8
+        || out[i].d != arr[i].d - 1)
+      abort();
+  }
+
+  /* check results:  */
+  if (udiff != DIFF)
+    abort ();
+}
+
+int main (void)
+{
+  int i; 
+  s arr[N];
+
+  for (i = 0; i < N; i++)
+    {
+      arr[i].a = i + 9;
+      arr[i].b = i * 2 + 10;
+      arr[i].c = 17;
+      arr[i].d = i+34;
+      if (arr[i].a == 178)
+         abort();
+    }
+  check_vect ();
+  
+  main1 (100, 100, 1, arr);
+  main1 (0, 15, 0, arr);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/slp-6.c
===================================================================
--- gcc.dg/vect/slp-6.c	(revision 0)
+++ gcc.dg/vect/slp-6.c	(revision 0)
@@ -0,0 +1,122 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int out2[N*8];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8] + 5;
+      out[i*8 + 1] = in[i*8 + 1] + 6;
+      out[i*8 + 2] = in[i*8 + 2] + 7;
+      out[i*8 + 3] = in[i*8 + 3] + 8;
+      out[i*8 + 4] = in[i*8 + 4] + 9;
+      out[i*8 + 5] = in[i*8 + 5] + 10;
+      out[i*8 + 6] = in[i*8 + 6] + 11;
+      out[i*8 + 7] = in[i*8 + 7] + 12;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8] + 5
+         || out[i*8 + 1] != in[i*8 + 1] + 6
+         || out[i*8 + 2] != in[i*8 + 2] + 7
+         || out[i*8 + 3] != in[i*8 + 3] + 8
+         || out[i*8 + 4] != in[i*8 + 4] + 9
+         || out[i*8 + 5] != in[i*8 + 5] + 10
+         || out[i*8 + 6] != in[i*8 + 6] + 11
+         || out[i*8 + 7] != in[i*8 + 7] + 12)
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = in[i*4] + 2;
+      out[i*4 + 1] = in[i*4 + 1] + 2;
+      out[i*4 + 2] = in[i*4 + 2] + 1;
+      out[i*4 + 3] = in[i*4 + 3] + 3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4] + 2
+         || out[i*4 + 1] != in[i*4 + 1] + 2
+         || out[i*4 + 2] != in[i*4 + 2] + 1
+         || out[i*4 + 3] != in[i*4 + 3] + 3)
+        abort ();
+    }
+
+  for (i = 0; i < N/2; i++)
+    {
+      out2[i*16] = in2[i*16] * 2;
+      out2[i*16 + 1] = in2[i*16 + 1] * 3;
+      out2[i*16 + 2] = in2[i*16 + 2] * 4;
+      out2[i*16 + 3] = in2[i*16 + 3] * 3;
+      out2[i*16 + 4] = in2[i*16 + 4] * 2;
+      out2[i*16 + 5] = in2[i*16 + 5] * 3;
+      out2[i*16 + 6] = in2[i*16 + 6] * 2;
+      out2[i*16 + 7] = in2[i*16 + 7] * 4;
+      out2[i*16 + 8] = in2[i*16 + 8] * 2;
+      out2[i*16 + 9] = in2[i*16 + 9] * 5;
+      out2[i*16 + 10] = in2[i*16 + 10] * 2;
+      out2[i*16 + 11] = in2[i*16 + 11] * 3;
+      out2[i*16 + 12] = in2[i*16 + 12] * 4;
+      out2[i*16 + 13] = in2[i*16 + 13] * 4;
+      out2[i*16 + 14] = in2[i*16 + 14] * 3;
+      out2[i*16 + 15] = in2[i*16 + 15] * 2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out2[i*16] !=  in2[i*16] * 2
+         || out2[i*16 + 1] != in2[i*16 + 1] * 3
+         || out2[i*16 + 2] != in2[i*16 + 2] * 4
+         || out2[i*16 + 3] != in2[i*16 + 3] * 3
+         || out2[i*16 + 4] != in2[i*16 + 4] * 2
+         || out2[i*16 + 5] != in2[i*16 + 5] * 3
+         || out2[i*16 + 6] != in2[i*16 + 6] * 2
+         || out2[i*16 + 7] != in2[i*16 + 7] * 4
+         || out2[i*16 + 8] != in2[i*16 + 8] * 2
+         || out2[i*16 + 9] != in2[i*16 + 9] * 5
+         || out2[i*16 + 10] != in2[i*16 + 10] * 2
+         || out2[i*16 + 11] != in2[i*16 + 11] * 3
+         || out2[i*16 + 12] != in2[i*16 + 12] * 4
+         || out2[i*16 + 13] != in2[i*16 + 13] * 4
+         || out2[i*16 + 14] != in2[i*16 + 14] * 3
+         || out2[i*16 + 15] != in2[i*16 + 15] * 2)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target vect_int_mult} } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target  { ! { vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target vect_int_mult  } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target  { ! { vect_int_mult } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-33.c
===================================================================
--- gcc.dg/vect/slp-33.c	(revision 0)
+++ gcc.dg/vect/slp-33.c	(revision 0)
@@ -0,0 +1,112 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out2[N*8];
+
+  /* SLP with unrolling by 4.  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*7] + 5;
+      a1 = in[i*7 + 1] + 6;
+      a2 = in[i*7 + 2] + 7;
+      a3 = in[i*7 + 3] + 8;
+      a4 = in[i*7 + 4] + 9;
+      a5 = in[i*7 + 5] + 10;
+      a6 = in[i*7 + 6] + 11;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+
+      out[i*7] = b0 - 2;
+      out[i*7 + 1] = b1 - 3; 
+      out[i*7 + 2] = b2 - 2;
+      out[i*7 + 3] = b3 - 1;
+      out[i*7 + 4] = b4 - 8;
+      out[i*7 + 5] = b5 - 7;
+      out[i*7 + 6] = b6 - 3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*7] !=  (in[i*7] + 5) * 3 - 2
+         || out[i*7 + 1] != (in[i*7 + 1] + 6) * 2 - 3
+         || out[i*7 + 2] != (in[i*7 + 2] + 7) * 12 - 2
+         || out[i*7 + 3] != (in[i*7 + 3] + 8) * 5 - 1
+         || out[i*7 + 4] != (in[i*7 + 4] + 9) * 8 - 8
+         || out[i*7 + 5] != (in[i*7 + 5] + 10) * 4 - 7
+         || out[i*7 + 6] != (in[i*7 + 6] + 11) * 3 - 3)
+	abort ();
+    }
+
+  /* SLP with unrolling by 4.  */
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*3] = (in[i*3] + 2) * 3;
+      out[i*3 + 1] = (in[i*3 + 1] + 2) * 7;
+      out[i*3 + 2] = (in[i*3 + 2] + 7) * 3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*3] !=  (in[i*3] + 2) * 3
+         || out[i*3 + 1] != (in[i*3 + 1] + 2) * 7
+         || out[i*3 + 2] != (in[i*3 + 2] + 7) * 3)
+        abort ();
+    }
+
+  /* SLP with unrolling by 4.  */
+  for (i = 0; i < N*2; i++)
+    {
+      out2[i*3] = (float) (in[i*3] * 2 + 5) ;
+      out2[i*3 + 1] = (float) (in[i*3 + 1] * 3 + 7);
+      out2[i*3 + 2] = (float) (in[i*3 + 2] * 5 + 4);
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out2[i*3] !=  (float) (in[i*3] * 2 + 5)
+         || out2[i*3 + 1] != (float) (in[i*3 + 1] * 3 + 7)
+         || out2[i*3 + 2] != (float) (in[i*3 + 2] * 5 + 4))
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target {vect_intfloat_cvt && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target {{! { vect_intfloat_cvt}} && {!{vect_int_mult}}} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_intfloat_cvt && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target {{! { vect_intfloat_cvt}} && {!{vect_int_mult}}} } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-16.c
===================================================================
--- gcc.dg/vect/slp-16.c	(revision 0)
+++ gcc.dg/vect/slp-16.c	(revision 0)
@@ -0,0 +1,70 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int out2[N*16];
+
+  /* SLP group of size that is not a multiple of vector size. 
+     Unrolling by 2.  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*2] + 5;
+      a1 = in[i*2 + 1] + 6; 
+    
+      b0 = a0 * 3;
+      b1 = a1 * 2; 
+    
+      out[i*2] = b0 - 2;
+      out[i*2 + 1] = b1 - 3; 
+    
+      out2[i*6] = in2[i*6] * 2;
+      out2[i*6 + 1] = in2[i*6 + 1] * 3;
+      out2[i*6 + 2] = in2[i*6 + 2] * 4;
+      out2[i*6 + 3] = in2[i*6 + 3] * 2;
+      out2[i*6 + 4] = in2[i*6 + 4] * 4;
+      out2[i*6 + 5] = in2[i*6 + 5] * 3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*2] !=  (in[i*2] + 5) * 3 - 2
+         || out[i*2 + 1] != (in[i*2 + 1] + 6) * 2 - 3
+         || out2[i*6] != in2[i*6] * 2
+         || out2[i*6 + 1] != in2[i*6 + 1] * 3
+         || out2[i*6 + 2] != in2[i*6 + 2] * 4
+         || out2[i*6 + 3] != in2[i*6 + 3] * 2
+         || out2[i*6 + 4] != in2[i*6 + 4] * 4
+         || out2[i*6 + 5] != in2[i*6 + 5] * 3)
+        abort ();
+    }
+
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-25.c
===================================================================
--- gcc.dg/vect/slp-25.c	(revision 0)
+++ gcc.dg/vect/slp-25.c	(revision 0)
@@ -0,0 +1,59 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 128
+
+/* Unaligned stores.  */
+
+int main1 (int n)
+{
+  int i;
+  int ia[N+1];
+  short sa[N+1];
+
+  for (i = 1; i <= N/2; i++)
+    {
+      ia[2*i] = 25;
+      ia[2*i + 1] = 5;
+    }
+
+  /* check results:  */
+  for (i = 1; i <= N/2; i++)
+    {
+      if (ia[2*i] != 25
+          || ia[2*i + 1] != 5)
+        abort ();
+    }
+
+  for (i = 1; i <= n/2; i++)
+    {
+      sa[2*i] = 25;
+      sa[2*i + 1] = 5;
+    }
+
+  /* check results:  */
+  for (i = 1; i <= n/2; i++)
+    {
+      if (sa[2*i] != 25
+          || sa[2*i + 1] != 5)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{ 
+
+  check_vect ();
+  
+  return main1 (N);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/slp-7.c
===================================================================
--- gcc.dg/vect/slp-7.c	(revision 0)
+++ gcc.dg/vect/slp-7.c	(revision 0)
@@ -0,0 +1,127 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], ia[N*2];
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short sa[N], out2[N*16];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8] + 5;
+      out[i*8 + 1] = in[i*8 + 1] + 6;
+      out[i*8 + 2] = in[i*8 + 2] + 7;
+      out[i*8 + 3] = in[i*8 + 3] + 8;
+      out[i*8 + 4] = in[i*8 + 4] + 9;
+      out[i*8 + 5] = in[i*8 + 5] + 10;
+      out[i*8 + 6] = in[i*8 + 6] + 11;
+      out[i*8 + 7] = in[i*8 + 7] + 12;
+
+      ia[i] = in[i]; 
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8] + 5
+         || out[i*8 + 1] != in[i*8 + 1] + 6
+         || out[i*8 + 2] != in[i*8 + 2] + 7
+         || out[i*8 + 3] != in[i*8 + 3] + 8
+         || out[i*8 + 4] != in[i*8 + 4] + 9
+         || out[i*8 + 5] != in[i*8 + 5] + 10
+         || out[i*8 + 6] != in[i*8 + 6] + 11
+         || out[i*8 + 7] != in[i*8 + 7] + 12
+         || ia[i] != in[i])
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = in[i*4] + 1; 
+      out[i*4 + 1] = in[i*4 + 1] + 2;
+      out[i*4 + 2] = in[i*4 + 2] + 3;
+      out[i*4 + 3] = in[i*4 + 3] + 4;
+
+      ia[i] = in[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4] + 1
+         || out[i*4 + 1] != in[i*4 + 1] + 2
+         || out[i*4 + 2] != in[i*4 + 2] + 3 
+         || out[i*4 + 3] != in[i*4 + 3] + 4
+         || ia[i] != in[i])
+        abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      out2[i*16] = in2[i*16] * 2;
+      out2[i*16 + 1] = in2[i*16 + 1] * 3;
+      out2[i*16 + 2] = in2[i*16 + 2] * 4;
+      out2[i*16 + 3] = in2[i*16 + 3] * 3;
+      out2[i*16 + 4] = in2[i*16 + 4] * 2;
+      out2[i*16 + 5] = in2[i*16 + 5] * 3;
+      out2[i*16 + 6] = in2[i*16 + 6] * 2;
+      out2[i*16 + 7] = in2[i*16 + 7] * 4;
+      out2[i*16 + 8] = in2[i*16 + 8] * 2;
+      out2[i*16 + 9] = in2[i*16 + 9] * 5;
+      out2[i*16 + 10] = in2[i*16 + 10] * 2;
+      out2[i*16 + 11] = in2[i*16 + 11] * 3;
+      out2[i*16 + 12] = in2[i*16 + 12] * 4;
+      out2[i*16 + 13] = in2[i*16 + 13] * 4;
+      out2[i*16 + 14] = in2[i*16 + 14] * 3;
+      out2[i*16 + 15] = in2[i*16 + 15] * 2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out2[i*16] !=  in2[i*16] * 2
+         || out2[i*16 + 1] != in2[i*16 + 1] * 3
+         || out2[i*16 + 2] != in2[i*16 + 2] * 4
+         || out2[i*16 + 3] != in2[i*16 + 3] * 3
+         || out2[i*16 + 4] != in2[i*16 + 4] * 2
+         || out2[i*16 + 5] != in2[i*16 + 5] * 3
+         || out2[i*16 + 6] != in2[i*16 + 6] * 2
+         || out2[i*16 + 7] != in2[i*16 + 7] * 4
+         || out2[i*16 + 8] != in2[i*16 + 8] * 2
+         || out2[i*16 + 9] != in2[i*16 + 9] * 5
+         || out2[i*16 + 10] != in2[i*16 + 10] * 2
+         || out2[i*16 + 11] != in2[i*16 + 11] * 3
+         || out2[i*16 + 12] != in2[i*16 + 12] * 4
+         || out2[i*16 + 13] != in2[i*16 + 13] * 4
+         || out2[i*16 + 14] != in2[i*16 + 14] * 3
+         || out2[i*16 + 15] != in2[i*16 + 15] * 2)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target {  vect_strided &&  vect_int_mult  } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target  { ! {  vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+ 
Index: gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
===================================================================
--- gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c	(revision 0)
+++ gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c	(revision 0)
@@ -0,0 +1,124 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "../../tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out2[N*8], fa[N*4];
+  unsigned int ia[N], ib[N*2];
+
+  for (i = 0; i < N; i++)
+    {
+      
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] + 6;
+      a2 = in[i*8 + 2] + 7;
+      a3 = in[i*8 + 3] + 8;
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+
+      ia[i] = b6;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7
+         || ia[i] != (in[i*8 + 6] + 11) * 3)
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = (in[i*4] + 2) * 3;
+      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
+      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
+      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
+
+      ib[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  (in[i*4] + 2) * 3
+         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
+         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
+         || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4
+         || ib[i] != 7)
+        abort ();
+    }
+
+  for (i = 0; i < N*4; i++)
+    {
+      out2[i*2] = (float) (in[i*2] * 2 + 6) ;
+      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
+    
+      fa[i] = (float) in[i*2+1];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*4; i++)
+    {
+      if (out2[i*2] !=  (float) (in[i*2] * 2 + 6)
+         || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7)
+         || fa[i] != (float) in[i*2+1])
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided}} && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target  { ! vect_int_mult } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c
===================================================================
--- gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c	(revision 0)
+++ gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c	(revision 0)
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct s{
+  short a;	/* aligned */
+  char b[N-1];  /* unaligned (offset 2B) */
+};
+ 
+int main1 ()
+{  
+  int i;
+  struct s tmp;
+
+  /* unaligned */
+  for (i = 0; i < N/4; i++)
+    {
+      tmp.b[2*i] = 5;
+      tmp.b[2*i+1] = 15;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/4; i++)
+    {
+      if (tmp.b[2*i] != 5
+          || tmp.b[2*i+1] != 15)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp
===================================================================
--- gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp	(revision 127394)
+++ gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp	(working copy)
@@ -64,6 +64,8 @@ dg-runtest [lsort [glob -nocomplain $src
 	"" $DEFAULT_VECTCFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-vect-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-slp-*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
 
 #### Tests with special options
 global SAVED_DEFAULT_VECTCFLAGS
Index: gcc.dg/vect/slp-34.c
===================================================================
--- gcc.dg/vect/slp-34.c	(revision 0)
+++ gcc.dg/vect/slp-34.c	(revision 0)
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short out2[N*8];
+
+  /* SLP with unrolling by 8.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*3] = in[i*3] + 5;
+      out[i*3 + 1] = in[i*3 + 1] + 6;
+      out[i*3 + 2] = in[i*3 + 2] + 16;
+
+      out2[i*5] = in2[i*5] + 2;
+      out2[i*5 + 1] = in2[i*5 + 1] + 2;
+      out2[i*5 + 2] = in2[i*5 + 2] + 1;
+      out2[i*5 + 3] = in2[i*5 + 3] + 3;
+      out2[i*5 + 4] = in2[i*5 + 4] + 13;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*3] != in[i*3] + 5
+          || out[i*3 + 1] != in[i*3 + 1] + 6
+          || out[i*3 + 2] != in[i*3 + 2] + 16
+          || out2[i*5] != in2[i*5] + 2
+          || out2[i*5 + 1] != in2[i*5 + 1] + 2
+          || out2[i*5 + 2] != in2[i*5 + 2] + 1
+          || out2[i*5 + 3] != in2[i*5 + 3] + 3
+          || out2[i*5 + 4] != in2[i*5 + 4] + 13)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-17.c
===================================================================
--- gcc.dg/vect/slp-17.c	(revision 0)
+++ gcc.dg/vect/slp-17.c	(revision 0)
@@ -0,0 +1,56 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short out2[N*8];
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*2] = in[i*2] + 5;
+      out[i*2 + 1] = in[i*2 + 1] + 6;
+
+      out2[i*4] = in2[i*4] + 2;
+      out2[i*4 + 1] = in2[i*4 + 1] + 2;
+      out2[i*4 + 2] = in2[i*4 + 2] + 1;
+      out2[i*4 + 3] = in2[i*4 + 3] + 3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*2] != in[i*2] + 5
+         || out[i*2 + 1] != in[i*2 + 1] + 6
+         || out2[i*4] !=  in2[i*4] + 2
+         || out2[i*4 + 1] != in2[i*4 + 1] + 2
+         || out2[i*4 + 2] != in2[i*4 + 2] + 1
+         || out2[i*4 + 3] != in2[i*4 + 3] + 3)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-26.c
===================================================================
--- gcc.dg/vect/slp-26.c	(revision 0)
+++ gcc.dg/vect/slp-26.c	(revision 0)
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short out[N*8], a[N], b[N] = {3,6,9,12,15,18,21,24};
+
+  /* Partial SLP is not supported.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*4] = in[i*4];
+      out[i*4 + 1] = in[i*4 + 1];
+      out[i*4 + 2] = in[i*4 + 2];
+      out[i*4 + 3] = in[i*4 + 3];
+
+      a[i] = b[i] / 3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] !=  in[i*4]
+         || out[i*4 + 1] != in[i*4 + 1]
+         || out[i*4 + 2] != in[i*4 + 2]
+         || out[i*4 + 3] != in[i*4 + 3]
+         || a[i] != b[i] / 3)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-8.c
===================================================================
--- gcc.dg/vect/slp-8.c	(revision 0)
+++ gcc.dg/vect/slp-8.c	(revision 0)
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int main1 ()
+{
+  int i;
+  int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  float fa[N];
+
+  /* int -> float */
+  for (i = 0; i < N/4; i++)
+    {
+      fa[4*i] = (float) ib[4*i];	
+      fa[4*i + 1] = (float) ib[4*i + 1];	
+      fa[4*i + 2] = (float) ib[4*i + 2];	
+      fa[4*i + 3] = (float) ib[4*i + 3];	
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/4; i++)
+    {
+      if (fa[4*i] != (float) ib[4*i]      
+          || fa[4*i + 1] != (float) ib[4*i + 1]
+          || fa[4*i + 2] != (float) ib[4*i + 2]
+          || fa[4*i + 3] != (float) ib[4*i + 3])
+        abort (); 
+    }   
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target powerpc*-*-* i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target powerpc*-*-* i?86-*-* x86_64-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/no-math-errno-slp-32.c
===================================================================
--- gcc.dg/vect/no-math-errno-slp-32.c	(revision 0)
+++ gcc.dg/vect/no-math-errno-slp-32.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+double x[256];
+
+void foo(void)
+{
+  int i;
+  for (i=0; i<128; ++i)
+   {
+    x[2*i] = __builtin_pow (x[2*i], 0.5);
+    x[2*i+1] = __builtin_pow (x[2*i+1], 0.5);
+   }
+}
+
+/* { dg-final { scan-tree-dump "pattern recognized" "vect" { xfail spu*-*-* } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/slp-10.c
===================================================================
--- gcc.dg/vect/slp-10.c	(revision 0)
+++ gcc.dg/vect/slp-10.c	(revision 0)
@@ -0,0 +1,114 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out2[N*8];
+
+  for (i = 0; i < N; i++)
+    {
+      
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] + 6;
+      a2 = in[i*8 + 2] + 7;
+      a3 = in[i*8 + 3] + 8;
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = (in[i*4] + 2) * 3;
+      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
+      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
+      out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  (in[i*4] + 2) * 3
+         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
+         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
+         || out[i*4 + 3] != (in[i*4 + 3] + 7) * 7)
+        abort ();
+    }
+
+  for (i = 0; i < N*4; i++)
+    {
+      out2[i*2] = (float) (in[i*2] * 2 + 5) ;
+      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*4; i++)
+    {
+      if (out2[i*2] !=  (float) (in[i*2] * 2 + 5)
+         || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target {vect_intfloat_cvt && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target {{! { vect_intfloat_cvt}} && { ! { vect_int_mult } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target {vect_intfloat_cvt && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  {target {{! { vect_intfloat_cvt}} && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target {{! { vect_intfloat_cvt}} && { ! { vect_int_mult } } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-35.c
===================================================================
--- gcc.dg/vect/slp-35.c	(revision 0)
+++ gcc.dg/vect/slp-35.c	(revision 0)
@@ -0,0 +1,73 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+typedef struct {
+   int a;
+   int b;
+   int c;
+   int d;
+   int e;
+} s;
+
+int
+main1 (s *arr)
+{
+  int i;
+  s *ptr = arr;
+  s res[N];
+
+  /* SLP with unrolling by 4.  */
+  for (i = 0; i < N; i++)
+    {
+      res[i].c = ptr->c + ptr->c;
+      res[i].a = ptr->a + ptr->a;
+      res[i].d = ptr->d + ptr->d;
+      res[i].b = ptr->b + ptr->b;
+      res[i].e = ptr->e + ptr->e; 
+      ptr++; 
+    } 
+   
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    { 
+      if (res[i].c != arr[i].c + arr[i].c
+          || res[i].a != arr[i].a + arr[i].a
+          || res[i].d != arr[i].d + arr[i].d
+          || res[i].b != arr[i].b + arr[i].b
+          || res[i].e != arr[i].e + arr[i].e)
+         abort();
+    }
+  return 0;
+}
+
+int main (void)
+{
+  int i;
+  s arr[N];
+  
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    { 
+      arr[i].a = i;
+      arr[i].b = i * 2;
+      arr[i].c = 17;
+      arr[i].d = i+34;
+      arr[i].e = i * 3 + 5;
+      if (arr[i].a == 178)
+         abort(); 
+    } 
+
+  main1 (arr);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-18.c
===================================================================
--- gcc.dg/vect/slp-18.c	(revision 0)
+++ gcc.dg/vect/slp-18.c	(revision 0)
@@ -0,0 +1,97 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out2[N*8];
+
+  for (i = 0; i < N; i++)
+    {
+      
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] + 6;
+      a2 = in[i*8 + 2] + 7;
+      a3 = in[i*8 + 3] + 8;
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+
+
+      out2[i*8] = (float) b0;
+      out2[i*8 + 1] = (float) b1;
+      out2[i*8 + 2] = (float) b2; 
+      out2[i*8 + 3] = (float) b3; 
+      out2[i*8 + 4] = (float) b4; 
+      out2[i*8 + 5] = (float) b5; 
+      out2[i*8 + 6] = (float) b6;
+      out2[i*8 + 7] = (float) b7;
+  }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
+	abort ();
+
+      if (out2[i*8] != (float) ((in[i*8] + 5) * 3)
+         || out2[i*8 + 1] != (float) ((in[i*8 + 1] + 6) * 2)
+         || out2[i*8 + 2] != (float) ((in[i*8 + 2] + 7) * 12)
+         || out2[i*8 + 3] != (float) ((in[i*8 + 3] + 8) * 5)
+         || out2[i*8 + 4] != (float) ((in[i*8 + 4] + 9) * 8)
+         || out2[i*8 + 5] != (float) ((in[i*8 + 5] + 10) * 4)
+         || out2[i*8 + 6] != (float) ((in[i*8 + 6] + 11) * 3)
+         || out2[i*8 + 7] != (float) ((in[i*8 + 7] + 12) * 2))
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-1.c
===================================================================
--- gcc.dg/vect/slp-1.c	(revision 0)
+++ gcc.dg/vect/slp-1.c	(revision 0)
@@ -0,0 +1,124 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*4] = 8;
+      out[i*4 + 1] = 18;
+      out[i*4 + 2] = 28;
+      out[i*4 + 3] = 38;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] != 8 
+         || out[i*4 + 1] != 18
+         || out[i*4 + 2] != 28
+         || out[i*4 + 3] != 38)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = 8;
+      out[i*8 + 1] = 7;
+      out[i*8 + 2] = 81;
+      out[i*8 + 3] = 28;
+      out[i*8 + 4] = 18;
+      out[i*8 + 5] = 85;
+      out[i*8 + 6] = 5;
+      out[i*8 + 7] = 4;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] != 8
+         || out[i*8 + 1] != 7
+         || out[i*8 + 2] != 81
+         || out[i*8 + 3] != 28
+         || out[i*8 + 4] != 18
+         || out[i*8 + 5] != 85
+         || out[i*8 + 6] != 5
+         || out[i*8 + 7] != 4)
+        abort ();
+    }
+
+  /* SLP with unrolling by 8.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*5] = 8;
+      out[i*5 + 1] = 7;
+      out[i*5 + 2] = 81;
+      out[i*5 + 3] = 28;
+      out[i*5 + 4] = 18;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*5] != 8
+         || out[i*5 + 1] != 7
+         || out[i*5 + 2] != 81
+         || out[i*5 + 3] != 28
+         || out[i*5 + 4] != 18)
+        abort ();
+    }
+
+  /* SLP with unrolling by 8.  */
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*9] = 8;
+      out[i*9 + 1] = 7;
+      out[i*9 + 2] = 81;
+      out[i*9 + 3] = 28;
+      out[i*9 + 4] = 18;
+      out[i*9 + 5] = 85;
+      out[i*9 + 6] = 5;
+      out[i*9 + 7] = 4;
+      out[i*9 + 8] = 14;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*9] != 8
+         || out[i*9 + 1] != 7
+         || out[i*9 + 2] != 81
+         || out[i*9 + 3] != 28
+         || out[i*9 + 4] != 18
+         || out[i*9 + 5] != 85
+         || out[i*9 + 6] != 5
+         || out[i*9 + 7] != 4
+         || out[i*9 + 8] != 14)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-9.c
===================================================================
--- gcc.dg/vect/slp-9.c	(revision 0)
+++ gcc.dg/vect/slp-9.c	(revision 0)
@@ -0,0 +1,47 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 64
+
+short X[N] __attribute__ ((__aligned__(16)));
+short Y[N] __attribute__ ((__aligned__(16)));
+int result[N];
+
+/* short->int widening-mult */
+void
+foo1(int len) {
+  int i;
+
+  for (i=0; i<len/2; i++) {
+    result[2*i] = X[2*i] * Y[2*i];
+    result[2*i+1] = X[2*i+1] * Y[2*i+1];
+  }
+}
+
+int main (void)
+{
+  int i;
+
+  check_vect ();
+
+  for (i=0; i<N; i++) {
+    X[i] = i;
+    Y[i] = 64-i;
+  }
+
+  foo1 (N);
+
+  for (i=0; i<N; i++) {
+    if (result[i] != X[i] * Y[i])
+      abort ();
+  }
+  
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {  target { vect_strided && vect_widen_mult_hi_to_si } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc.dg/vect/fast-math-slp-27.c
===================================================================
--- gcc.dg/vect/fast-math-slp-27.c	(revision 0)
+++ gcc.dg/vect/fast-math-slp-27.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+
+float x[512];
+
+void foo(void)
+{
+  int i;
+  for (i=0; i<256; ++i)
+   {
+    x[2*i] = x[2*i] * x[2*i];
+    x[2*i+1] = x[2*i+1] * x[2*i+1];
+   }
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_strided } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/slp-11.c
===================================================================
--- gcc.dg/vect/slp-11.c	(revision 0)
+++ gcc.dg/vect/slp-11.c	(revision 0)
@@ -0,0 +1,113 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out2[N*8];
+
+  /* Different operations - not SLPable.  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] * 6;
+      a2 = in[i*8 + 2] + 7;
+      a3 = in[i*8 + 3] + 8;
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
+	abort ();
+    }
+
+  /* Requires permutation - not SLPable.  */
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = (in[i*4] + 2) * 3;
+      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
+      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
+      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  (in[i*4] + 2) * 3
+         || out[i*4 + 1] != (in[i*4 + 2] + 2) * 7
+         || out[i*4 + 2] != (in[i*4 + 1] + 7) * 3
+         || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4)
+        abort ();
+    }
+
+  /* Different operations - not SLPable.  */
+  for (i = 0; i < N*4; i++)
+    {
+      out2[i*2] = ((float) in[i*2] * 2 + 6) ;
+      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*4; i++)
+    {
+      if (out2[i*2] !=  ((float) in[i*2] * 2 + 6)
+         || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { vect_strided &&  vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided } } } } }  */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-36.c
===================================================================
--- gcc.dg/vect/slp-36.c	(revision 0)
+++ gcc.dg/vect/slp-36.c	(revision 0)
@@ -0,0 +1,75 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+
+#define N 32
+
+/* All the loops are vectorizable on platforms with vector shift argument.  */
+
+void
+test_1 (void)
+{
+  static unsigned int bm[N];
+  static unsigned int cm[N];
+  int j;
+
+  /* Vectorizable on platforms with scalar shift argument.  */
+  for (j = 0; j < N/2; j++)
+   {
+    bm[2*j] <<= 8;
+    bm[2*j+1] <<= 8;
+   }
+
+  /* Not vectorizable on platforms with scalar shift argument.  */
+  for (j = 0; j < N/2; j++)
+   {
+    cm[2*j] <<= 8;
+    cm[2*j+1] <<= 7;
+   }
+}
+
+void
+test_2 (int a, int b)
+{
+  static unsigned int bm[N];
+  static unsigned int cm[N];
+  int j;
+
+  /* Vectorizable on platforms with scalar shift argument.  */
+  for (j = 0; j < N/2; j++)
+   {
+    bm[2*j] <<= a;
+    bm[2*j+1] <<= a;
+   }
+
+  /* Not vectorizable on platforms with scalar shift argument.  */
+  for (j = 0; j < N/2; j++)
+   {
+    cm[2*j] <<= a;
+    cm[2*j+1] <<= b;
+   }
+}
+
+void
+test_3 (void)
+{
+ static unsigned int bm[N];
+ int am[N];
+ int j;
+
+ /* Not vectorizable on platforms with scalar shift argument.  */
+ for (j = 0; j < N/2; j++)
+   {
+    bm[2*j] <<= am[j];
+    bm[2*j+1] <<= am[j];
+   }
+
+ /* Not vectorizable on platforms with scalar shift argument.  */
+ for (j = 0; j < N/2; j++)
+   {
+    bm[2*j] <<= am[2*j];
+    bm[2*j+1] <<= am[2*j+1];
+   }
+
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc.dg/vect/slp-19.c
===================================================================
--- gcc.dg/vect/slp-19.c	(revision 0)
+++ gcc.dg/vect/slp-19.c	(revision 0)
@@ -0,0 +1,155 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 16 
+
+int
+main1 ()
+{
+  unsigned int i;
+  unsigned int out[N*8];
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N*2], a0, a1, a2, a3;
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8];
+      out[i*8 + 1] = in[i*8 + 1];
+      out[i*8 + 2] = in[i*8 + 2];
+      out[i*8 + 3] = in[i*8 + 3];
+      out[i*8 + 4] = in[i*8 + 4];
+      out[i*8 + 5] = in[i*8 + 5];
+      out[i*8 + 6] = in[i*8 + 6];
+      out[i*8 + 7] = in[i*8 + 7];
+    
+      ia[i] = in[i*8 + 2];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8]
+         || out[i*8 + 1] != in[i*8 + 1]
+         || out[i*8 + 2] != in[i*8 + 2]
+         || out[i*8 + 3] != in[i*8 + 3]
+         || out[i*8 + 4] != in[i*8 + 4]
+         || out[i*8 + 5] != in[i*8 + 5]
+         || out[i*8 + 6] != in[i*8 + 6]
+         || out[i*8 + 7] != in[i*8 + 7]
+         || ia[i] != in[i*8 + 2])
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      a0 = in[i*4] + 1;
+      a1 = in[i*4 + 1] + 2;
+      a2 = in[i*4 + 2] + 3;
+      a3 = in[i*4 + 3] + 4;
+
+      out[i*4] = a0;
+      out[i*4 + 1] = a1;
+      out[i*4 + 2] = a2;
+      out[i*4 + 3] = a3;
+
+      ia[i] = a2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4] + 1
+         || out[i*4 + 1] != in[i*4 + 1] + 2
+         || out[i*4 + 2] != in[i*4 + 2] + 3
+         || out[i*4 + 3] != in[i*4 + 3] + 4
+         || ia[i] != in[i*4 + 2] + 3)
+        abort ();
+    }
+
+  /* The last stmt requires interleaving with a group size that is not a
+     power of 2 - not vectorizable.  */
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*12] = in[i*12];
+      out[i*12 + 1] = in[i*12 + 1];
+      out[i*12 + 2] = in[i*12 + 2];
+      out[i*12 + 3] = in[i*12 + 3];
+      out[i*12 + 4] = in[i*12 + 4];
+      out[i*12 + 5] = in[i*12 + 5];
+      out[i*12 + 6] = in[i*12 + 6];
+      out[i*12 + 7] = in[i*12 + 7];
+      out[i*12 + 8] = in[i*12 + 8];
+      out[i*12 + 9] = in[i*12 + 9];
+      out[i*12 + 10] = in[i*12 + 10];
+      out[i*12 + 11] = in[i*12 + 11];
+
+      ia[i] = in[i*12 + 7];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*12] !=  in[i*12]
+         || out[i*12 + 1] != in[i*12 + 1]
+         || out[i*12 + 2] != in[i*12 + 2]
+         || out[i*12 + 3] != in[i*12 + 3]
+         || out[i*12 + 4] != in[i*12 + 4]
+         || out[i*12 + 5] != in[i*12 + 5]
+         || out[i*12 + 6] != in[i*12 + 6]
+         || out[i*12 + 7] != in[i*12 + 7]
+         || out[i*12 + 8] != in[i*12 + 8]
+         || out[i*12 + 9] != in[i*12 + 9]
+         || out[i*12 + 10] != in[i*12 + 10]
+         || out[i*12 + 11] != in[i*12 + 11]
+         || ia[i] != in[i*12 + 7])
+        abort ();
+    }
+
+  /* Hybrid SLP with unrolling by 2.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*6] = in[i*6];
+      out[i*6 + 1] = in[i*6 + 1];
+      out[i*6 + 2] = in[i*6 + 2];
+      out[i*6 + 3] = in[i*6 + 3];
+      out[i*6 + 4] = in[i*6 + 4];
+      out[i*6 + 5] = in[i*6 + 5];
+    
+      ia[i] = i;
+    } 
+    
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*6] !=  in[i*6]
+         || out[i*6 + 1] != in[i*6 + 1]
+         || out[i*6 + 2] != in[i*6 + 2]
+         || out[i*6 + 3] != in[i*6 + 3]
+         || out[i*6 + 4] != in[i*6 + 4]
+         || out[i*6 + 5] != in[i*6 + 5]
+         || ia[i] != i)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided  } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided  } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-20.c
===================================================================
--- gcc.dg/vect/slp-20.c	(revision 0)
+++ gcc.dg/vect/slp-20.c	(revision 0)
@@ -0,0 +1,116 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 (unsigned short a0, unsigned short a1, unsigned short a2, 
+       unsigned short a3, unsigned short a4, unsigned short a5,
+       unsigned short a6, unsigned short a7, unsigned short a8)
+{
+  int i;
+  unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, b5, b6, b7, b8;
+
+  for (i = 0; i < N; i++)
+    {
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 + 5;
+      b4 = a4 + 4;
+      b5 = a5 + 3;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*4] = b0;
+      out2[i*4 + 1] = b1;
+      out2[i*4 + 2] = b4;
+      out2[i*4 + 3] = b5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] != b0 
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+	abort ();
+
+      if (out2[i*4] != b0  
+         || out2[i*4 + 1] != b1
+         || out2[i*4 + 2] != b4
+         || out2[i*4 + 3] != b5)
+        abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 + 5;
+      b4 = a4 + 4;
+      b5 = a5 + 3;
+      b6 = a6 + 2;
+      b7 = a7 + 1;
+      b8 = a8 + 9;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*8] = b0;
+      out2[i*8 + 1] = b1;
+      out2[i*8 + 2] = b4;
+      out2[i*8 + 3] = b5;
+      out2[i*8 + 4] = b6;
+      out2[i*8 + 5] = b2;
+      out2[i*8 + 6] = b7;
+      out2[i*8 + 7] = b8;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] != b0 
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+        abort ();
+
+      if (out2[i*8] != b0
+         || out2[i*8 + 1] != b1
+         || out2[i*8 + 2] != b4
+         || out2[i*8 + 3] != b5
+         || out2[i*8 + 4] != b6
+         || out2[i*8 + 5] != b2
+         || out2[i*8 + 6] != b7
+         || out2[i*8 + 7] != b8)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (8,7,6,5,4,3,2,1,0);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-28.c
===================================================================
--- gcc.dg/vect/slp-28.c	(revision 0)
+++ gcc.dg/vect/slp-28.c	(revision 0)
@@ -0,0 +1,86 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 32 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short in[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+  unsigned short in2[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+  unsigned short in3[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
+  unsigned short check[N] = {0,1,2,3,5,6,7,8,10,11,12,13,15,16,17,18,20,21,22,23,25,26,27,28,30,31,32,33,35,36,37,38};
+  unsigned short check3[N] = {0,1,2,3,4,5,6,7,8,9,10,11,5,6,7,8,9,10,11,12,13,14,15,16,10,11,12,13,14,15,16,17};
+
+  for (i = 0; i < N/4; i++)
+    {
+      in[i*4] = in[i*4] + 5;
+      in[i*4 + 1] = in[i*4 + 1] + 5;
+      in[i*4 + 2] = in[i*4 + 2] + 5;
+      in[i*4 + 3] = in[i*4 + 3] + 5;
+
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (in[i] != i+5)
+        abort ();
+    }
+
+  /* Not vectorizable because of data dependencies.  */
+  for (i = 1; i < N/4; i++)
+    {
+      in2[i*4] = in2[(i-1)*4] + 5;
+      in2[i*4 + 1] = in2[(i-1)*4 + 1] + 5;
+      in2[i*4 + 2] = in2[(i-1)*4 + 2] + 5;
+      in2[i*4 + 3] = in2[(i-1)*4 + 3] + 5;
+
+    }
+
+  /* check results:  */
+  for (i = 4; i < N; i++)
+    {
+      if (in2[i] != check[i])
+        abort ();
+    }
+  
+  /* Not vectorizable because of data dependencies: distance 3 is greater than 
+     the actual VF with SLP (2), but the analysis fails to detect that for now.  */
+  for (i = 3; i < N/4; i++)
+    {
+      in3[i*4] = in3[(i-3)*4] + 5;
+      in3[i*4 + 1] = in3[(i-3)*4 + 1] + 5;
+      in3[i*4 + 2] = in3[(i-3)*4 + 2] + 5;
+      in3[i*4 + 3] = in3[(i-3)*4 + 3] + 5;
+
+    }
+
+  /* check results:  */
+  for (i = 12; i < N; i++)
+    {
+      if (in3[i] != check3[i])
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
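[Editorial note on slp-28.c] The comment on the third loop says that a dependence distance of 3 exceeds the effective VF of 2, so vectorizing would be safe even though the analysis rejects it. The sketch below is only an illustration of that claim, not code from the patch: run_scalar, run_blocked and same_result are hypothetical names, and "vectorization" is modelled simply as doing all loads of vf iterations before any of their stores.

```c
#include <string.h>

#define N 32

/* Scalar reference: in[i*4 + k] = in[(i - dist)*4 + k] + 5.  */
static void
run_scalar (unsigned short *in, int dist)
{
  for (int i = dist; i < N/4; i++)
    for (int k = 0; k < 4; k++)
      in[i*4 + k] = in[(i - dist)*4 + k] + 5;
}

/* Model of executing vf iterations at once: every load of a block is
   performed before any store of that block.  Assumes vf <= N/4.  */
static void
run_blocked (unsigned short *in, int dist, int vf)
{
  unsigned short tmp[N];
  int i = dist;

  for (; i + vf <= N/4; i += vf)
    {
      for (int j = 0; j < vf*4; j++)
        tmp[j] = in[(i - dist)*4 + j] + 5;
      for (int j = 0; j < vf*4; j++)
        in[i*4 + j] = tmp[j];
    }

  /* Scalar epilogue for the leftover iterations.  */
  for (; i < N/4; i++)
    for (int k = 0; k < 4; k++)
      in[i*4 + k] = in[(i - dist)*4 + k] + 5;
}

/* Return 1 if the blocked order gives the same result as the scalar
   order for the given dependence distance and blocking factor.  */
int
same_result (int dist, int vf)
{
  unsigned short a[N], b[N];

  for (int i = 0; i < N; i++)
    a[i] = b[i] = i;

  run_scalar (a, dist);
  run_blocked (b, dist, vf);
  return memcmp (a, b, sizeof a) == 0;
}
```

Under this model, same_result (3, 2) holds (distance 3, VF 2, as in the third loop), while same_result (1, 2) does not (distance 1, as in the second loop).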
Index: gcc.dg/vect/slp-2.c
===================================================================
--- gcc.dg/vect/slp-2.c	(revision 0)
+++ gcc.dg/vect/slp-2.c	(revision 0)
@@ -0,0 +1,146 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 (unsigned short a0, unsigned short a1, unsigned short a2, 
+       unsigned short a3, unsigned short a4, unsigned short a5,
+       unsigned short a6, unsigned short a7, unsigned short a8,
+       unsigned short a9, unsigned short a10, unsigned short a11,
+       unsigned short a12, unsigned short a13, unsigned short a14,
+       unsigned short a15)
+{
+  int i;
+  unsigned short out[N*16];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*4] = a8;
+      out[i*4 + 1] = a1;
+      out[i*4 + 2] = a2;
+      out[i*4 + 3] = a3;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] != a8 
+         || out[i*4 + 1] != a1
+         || out[i*4 + 2] != a2
+         || out[i*4 + 3] != a3)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*16] = a8;
+      out[i*16 + 1] = a7;
+      out[i*16 + 2] = a1;
+      out[i*16 + 3] = a2;
+      out[i*16 + 4] = a8;
+      out[i*16 + 5] = a5;
+      out[i*16 + 6] = a5;
+      out[i*16 + 7] = a4;
+      out[i*16 + 8] = a12;
+      out[i*16 + 9] = a13;
+      out[i*16 + 10] = a14;
+      out[i*16 + 11] = a15;
+      out[i*16 + 12] = a6;
+      out[i*16 + 13] = a9;
+      out[i*16 + 14] = a0;
+      out[i*16 + 15] = a7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*16] != a8
+          || out[i*16 + 1] != a7
+          || out[i*16 + 2] != a1
+          || out[i*16 + 3] != a2
+          || out[i*16 + 4] != a8
+          || out[i*16 + 5] != a5
+          || out[i*16 + 6] != a5
+          || out[i*16 + 7] != a4
+          || out[i*16 + 8] != a12
+          || out[i*16 + 9] != a13
+          || out[i*16 + 10] != a14
+          || out[i*16 + 11] != a15
+          || out[i*16 + 12] != a6
+          || out[i*16 + 13] != a9
+          || out[i*16 + 14] != a0
+          || out[i*16 + 15] != a7)
+        abort ();
+    }
+
+  /* SLP with unrolling by 8.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*3] = a8;
+      out[i*3 + 1] = a1;
+      out[i*3 + 2] = a2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*3] != a8
+         || out[i*3 + 1] != a1
+         || out[i*3 + 2] != a2)
+        abort ();
+    }
+
+  /* SLP with unrolling by 8.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*11] = a8;
+      out[i*11 + 1] = a7;
+      out[i*11 + 2] = a1;
+      out[i*11 + 3] = a2;
+      out[i*11 + 4] = a8;
+      out[i*11 + 5] = a5;
+      out[i*11 + 6] = a5;
+      out[i*11 + 7] = a4;
+      out[i*11 + 8] = a12;
+      out[i*11 + 9] = a13;
+      out[i*11 + 10] = a14;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*11] != a8
+          || out[i*11 + 1] != a7
+          || out[i*11 + 2] != a1
+          || out[i*11 + 3] != a2
+          || out[i*11 + 4] != a8
+          || out[i*11 + 5] != a5
+          || out[i*11 + 6] != a5
+          || out[i*11 + 7] != a4
+          || out[i*11 + 8] != a12
+          || out[i*11 + 9] != a13
+          || out[i*11 + 10] != a14)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
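[Editorial note on slp-2.c] The "unrolling by 8" comments above, and the "unrolling by 2" comment in slp-19.c, are consistent with unrolling until the number of scalar stmts in a group is a multiple of the number of vector elements. The sketch below is an assumption derived from those comments and from the example in the cover letter, not the patch's actual computation; slp_unroll_factor and gcd_u are hypothetical names, and the element counts assume 128-bit vectors (8 unsigned shorts or 4 unsigned ints).

```c
/* Greatest common divisor, Euclid's algorithm.  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Smallest unroll factor U such that U * group_size is a multiple of
   nunits, i.e. lcm (group_size, nunits) / group_size.  */
unsigned
slp_unroll_factor (unsigned group_size, unsigned nunits)
{
  return nunits / gcd_u (group_size, nunits);
}
```

For example, a group of 3 or 11 unsigned short stores with 8 elements per vector gives 8, a group of 6 unsigned int stores with 4 elements per vector gives 2, and a group of 4 or 16 unsigned short stores needs no unrolling (factor 1).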
Index: gcc.dg/vect/slp-37.c
===================================================================
--- gcc.dg/vect/slp-37.c	(revision 0)
+++ gcc.dg/vect/slp-37.c	(revision 0)
@@ -0,0 +1,67 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdlib.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+typedef struct {
+   int a;
+   int b;
+   void *c;
+} s1;
+
+void
+foo1 (s1 *arr)
+{
+  int i;
+  s1 *ptr = arr;
+
+  /* Different constant types - not SLPable.  The group size is not a power of 2,
+     so interleaving is not supported either.  */
+  for (i = 0; i < N; i++)
+    {
+      ptr->a = 6;
+      ptr->b = 7;
+      ptr->c = NULL;
+      ptr++; 
+    } 
+   
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    { 
+       if (arr[i].a != 6 
+           || arr[i].b != 7
+           || arr[i].c != NULL)
+         abort();
+    }
+}
+
+int main (void)
+{
+  int i;
+  s1 arr1[N];
+  
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    { 
+      arr1[i].a = i;
+      arr1[i].b = i * 2;
+      arr1[i].c = (void *)arr1;
+
+      if (arr1[i].a == 178)
+         abort(); 
+    } 
+
+
+  foo1 (arr1);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-21.c
===================================================================
--- gcc.dg/vect/slp-21.c	(revision 0)
+++ gcc.dg/vect/slp-21.c	(revision 0)
@@ -0,0 +1,208 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 ()
+{
+  unsigned short i;
+  unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, a0, a1, a2, a3, b5;
+  unsigned short in[N*8];
+
+  for (i = 0; i < N*8; i++)
+    {
+      in[i] = i;
+    }
+
+  /* Different operations in both cases - vectorization with interleaving.  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*4];
+      a1 = in[i*4 + 1];
+      a2 = in[i*4 + 2];
+      a3 = in[i*4 + 3];
+
+      b0 = a0 * 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 * 5;
+      
+      b4 = a2 + 4;
+      b5 = a3 + 3;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*4] = b0;
+      out2[i*4 + 1] = b1;
+      out2[i*4 + 2] = b4;
+      out2[i*4 + 3] = b5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*4];
+      a1 = in[i*4 + 1];
+      a2 = in[i*4 + 2];
+      a3 = in[i*4 + 3];
+
+      b0 = a0 * 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 * 5;
+
+      b4 = a2 + 4;
+      b5 = a3 + 3;
+
+      if (out[i*4] != b0 
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+	abort ();
+
+      if (out2[i*4] != b0  
+         || out2[i*4 + 1] != b1
+         || out2[i*4 + 2] != b4
+         || out2[i*4 + 3] != b5)
+        abort ();
+    }
+
+  /* Different operations in the first case - vectorization with interleaving.  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*4];
+      a1 = in[i*4 + 1];
+      a2 = in[i*4 + 2];
+      a3 = in[i*4 + 3];
+
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 * 5;
+
+      b4 = a2 + 4;
+      b5 = a3 + 3;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*4] = b0;
+      out2[i*4 + 1] = b1;
+      out2[i*4 + 2] = b4;
+      out2[i*4 + 3] = b5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*4];
+      a1 = in[i*4 + 1];
+      a2 = in[i*4 + 2];
+      a3 = in[i*4 + 3];
+
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 * 5;
+
+      b4 = a2 + 4;
+      b5 = a3 + 3;
+
+      if (out[i*4] != b0
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+        abort ();
+
+      if (out2[i*4] != b0
+         || out2[i*4 + 1] != b1
+         || out2[i*4 + 2] != b4
+         || out2[i*4 + 3] != b5)
+        abort ();
+    }
+
+
+  /* Different operations in the second case - vectorization with interleaving.  */
+  for (i = 0; i < N; i++) 
+    { 
+      a0 = in[i*4];
+      a1 = in[i*4 + 1];
+      a2 = in[i*4 + 2];
+      a3 = in[i*4 + 3];
+
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 + 5;
+
+      b4 = a2 * 4;
+      b5 = a3 + 3;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*4] = b0;
+      out2[i*4 + 1] = b1;
+      out2[i*4 + 2] = b4;
+      out2[i*4 + 3] = b5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*4];
+      a1 = in[i*4 + 1];
+      a2 = in[i*4 + 2];
+      a3 = in[i*4 + 3];
+
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 + 5;
+
+      b4 = a2 * 4;
+      b5 = a3 + 3;
+
+      if (out[i*4] != b0
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+        abort ();
+
+      if (out2[i*4] != b0
+         || out2[i*4 + 1] != b1
+         || out2[i*4 + 2] != b4
+         || out2[i*4 + 3] != b5)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided }  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-3.c
===================================================================
--- gcc.dg/vect/slp-3.c	(revision 0)
+++ gcc.dg/vect/slp-3.c	(revision 0)
@@ -0,0 +1,147 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8];
+      out[i*8 + 1] = in[i*8 + 1];
+      out[i*8 + 2] = in[i*8 + 2];
+      out[i*8 + 3] = in[i*8 + 3];
+      out[i*8 + 4] = in[i*8 + 4];
+      out[i*8 + 5] = in[i*8 + 5];
+      out[i*8 + 6] = in[i*8 + 6];
+      out[i*8 + 7] = in[i*8 + 7];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8]
+         || out[i*8 + 1] != in[i*8 + 1]
+         || out[i*8 + 2] != in[i*8 + 2]
+         || out[i*8 + 3] != in[i*8 + 3]
+         || out[i*8 + 4] != in[i*8 + 4]
+         || out[i*8 + 5] != in[i*8 + 5]
+         || out[i*8 + 6] != in[i*8 + 6]
+         || out[i*8 + 7] != in[i*8 + 7])
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = in[i*4];
+      out[i*4 + 1] = in[i*4 + 1];
+      out[i*4 + 2] = in[i*4 + 2];
+      out[i*4 + 3] = in[i*4 + 3];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4]
+         || out[i*4 + 1] != in[i*4 + 1]
+         || out[i*4 + 2] != in[i*4 + 2]
+         || out[i*4 + 3] != in[i*4 + 3])
+        abort ();
+    }
+
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*16] = in[i*16];
+      out[i*16 + 1] = in[i*16 + 1];
+      out[i*16 + 2] = in[i*16 + 2];
+      out[i*16 + 3] = in[i*16 + 3];
+      out[i*16 + 4] = in[i*16 + 4];
+      out[i*16 + 5] = in[i*16 + 5];
+      out[i*16 + 6] = in[i*16 + 6];
+      out[i*16 + 7] = in[i*16 + 7];
+      out[i*16 + 8] = in[i*16 + 8];
+      out[i*16 + 9] = in[i*16 + 9];
+      out[i*16 + 10] = in[i*16 + 10];
+      out[i*16 + 11] = in[i*16 + 11];
+      out[i*16 + 12] = in[i*16 + 12];
+      out[i*16 + 13] = in[i*16 + 13];
+      out[i*16 + 14] = in[i*16 + 14];
+      out[i*16 + 15] = in[i*16 + 15];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*16] !=  in[i*16]
+         || out[i*16 + 1] != in[i*16 + 1]
+         || out[i*16 + 2] != in[i*16 + 2]
+         || out[i*16 + 3] != in[i*16 + 3]
+         || out[i*16 + 4] != in[i*16 + 4]
+         || out[i*16 + 5] != in[i*16 + 5]
+         || out[i*16 + 6] != in[i*16 + 6]
+         || out[i*16 + 7] != in[i*16 + 7]
+         || out[i*16 + 8] != in[i*16 + 8]
+         || out[i*16 + 9] != in[i*16 + 9]
+         || out[i*16 + 10] != in[i*16 + 10]
+         || out[i*16 + 11] != in[i*16 + 11]
+         || out[i*16 + 12] != in[i*16 + 12]
+         || out[i*16 + 13] != in[i*16 + 13]
+         || out[i*16 + 14] != in[i*16 + 14]
+         || out[i*16 + 15] != in[i*16 + 15])
+        abort ();
+    }
+
+  /* SLP with unrolling by 8.  */
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*9] = in[i*9];
+      out[i*9 + 1] = in[i*9 + 1];
+      out[i*9 + 2] = in[i*9 + 2];
+      out[i*9 + 3] = in[i*9 + 3];
+      out[i*9 + 4] = in[i*9 + 4];
+      out[i*9 + 5] = in[i*9 + 5];
+      out[i*9 + 6] = in[i*9 + 6];
+      out[i*9 + 7] = in[i*9 + 7];
+      out[i*9 + 8] = in[i*9 + 8];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*9] !=  in[i*9]
+         || out[i*9 + 1] != in[i*9 + 1]
+         || out[i*9 + 2] != in[i*9 + 2]
+         || out[i*9 + 3] != in[i*9 + 3]
+         || out[i*9 + 4] != in[i*9 + 4]
+         || out[i*9 + 5] != in[i*9 + 5]
+         || out[i*9 + 6] != in[i*9 + 6]
+         || out[i*9 + 7] != in[i*9 + 7]
+         || out[i*9 + 8] != in[i*9 + 8])
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-13.c
===================================================================
--- gcc.dg/vect/slp-13.c	(revision 0)
+++ gcc.dg/vect/slp-13.c	(revision 0)
@@ -0,0 +1,134 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int out2[N*8];
+
+  /* Induction is not SLPable yet.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8] + i;
+      out[i*8 + 1] = in[i*8 + 1] + i;
+      out[i*8 + 2] = in[i*8 + 2] + i;
+      out[i*8 + 3] = in[i*8 + 3] + i;
+      out[i*8 + 4] = in[i*8 + 4] + i;
+      out[i*8 + 5] = in[i*8 + 5] + i;
+      out[i*8 + 6] = in[i*8 + 6] + i;
+      out[i*8 + 7] = in[i*8 + 7] + i;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8] + i
+         || out[i*8 + 1] != in[i*8 + 1] + i
+         || out[i*8 + 2] != in[i*8 + 2] + i
+         || out[i*8 + 3] != in[i*8 + 3] + i
+         || out[i*8 + 4] != in[i*8 + 4] + i
+         || out[i*8 + 5] != in[i*8 + 5] + i
+         || out[i*8 + 6] != in[i*8 + 6] + i
+         || out[i*8 + 7] != in[i*8 + 7] + i)
+	abort ();
+    }
+
+  /* Induction is not SLPable yet and strided group size must be a power of 2 
+     to get vectorized.  */
+  for (i = 0; i < N/2; i++)
+    {
+      out2[i*12] = in2[i*12] + i;
+      out2[i*12 + 1] = in2[i*12 + 1] + i;
+      out2[i*12 + 2] = in2[i*12 + 2] + i;
+      out2[i*12 + 3] = in2[i*12 + 3] + i;
+      out2[i*12 + 4] = in2[i*12 + 4] + i;
+      out2[i*12 + 5] = in2[i*12 + 5] + i;
+      out2[i*12 + 6] = in2[i*12 + 6] + i;
+      out2[i*12 + 7] = in2[i*12 + 7] + i;
+      out2[i*12 + 8] = in2[i*12 + 8] + i;
+      out2[i*12 + 9] = in2[i*12 + 9] + i;
+      out2[i*12 + 10] = in2[i*12 + 10] + i;
+      out2[i*12 + 11] = in2[i*12 + 11] + i;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+        if (out2[i*12] != in2[i*12] + i
+            || out2[i*12 + 1] != in2[i*12 + 1] + i
+            || out2[i*12 + 2] != in2[i*12 + 2] + i
+            || out2[i*12 + 3] != in2[i*12 + 3] + i
+            || out2[i*12 + 4] != in2[i*12 + 4] + i
+            || out2[i*12 + 5] != in2[i*12 + 5] + i
+            || out2[i*12 + 6] != in2[i*12 + 6] + i
+            || out2[i*12 + 7] != in2[i*12 + 7] + i
+            || out2[i*12 + 8] != in2[i*12 + 8] + i
+            || out2[i*12 + 9] != in2[i*12 + 9] + i
+            || out2[i*12 + 10] != in2[i*12 + 10] + i
+            || out2[i*12 + 11] != in2[i*12 + 11] + i)
+          abort ();
+    }
+
+  /* Not power of 2 but SLPable.  */
+  for (i = 0; i < N/2; i++)
+    {
+      out2[i*12] = in2[i*12] + 1;
+      out2[i*12 + 1] = in2[i*12 + 1] + 2;
+      out2[i*12 + 2] = in2[i*12 + 2] + 3;
+      out2[i*12 + 3] = in2[i*12 + 3] + 4;
+      out2[i*12 + 4] = in2[i*12 + 4] + 5;
+      out2[i*12 + 5] = in2[i*12 + 5] + 6;
+      out2[i*12 + 6] = in2[i*12 + 6] + 7;
+      out2[i*12 + 7] = in2[i*12 + 7] + 8;
+      out2[i*12 + 8] = in2[i*12 + 8] + 9;
+      out2[i*12 + 9] = in2[i*12 + 9] + 10;
+      out2[i*12 + 10] = in2[i*12 + 10] + 11;
+      out2[i*12 + 11] = in2[i*12 + 11] + 12;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+        if (out2[i*12] != in2[i*12] + 1
+            || out2[i*12 + 1] != in2[i*12 + 1] + 2
+            || out2[i*12 + 2] != in2[i*12 + 2] + 3
+            || out2[i*12 + 3] != in2[i*12 + 3] + 4
+            || out2[i*12 + 4] != in2[i*12 + 4] + 5
+            || out2[i*12 + 5] != in2[i*12 + 5] + 6
+            || out2[i*12 + 6] != in2[i*12 + 6] + 7
+            || out2[i*12 + 7] != in2[i*12 + 7] + 8
+            || out2[i*12 + 8] != in2[i*12 + 8] + 9
+            || out2[i*12 + 9] != in2[i*12 + 9] + 10
+            || out2[i*12 + 10] != in2[i*12 + 10] + 11
+            || out2[i*12 + 11] != in2[i*12 + 11] + 12)
+          abort ();
+    }
+
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail *-*-* }  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/vect.exp
===================================================================
--- gcc.dg/vect/vect.exp	(revision 127394)
+++ gcc.dg/vect/vect.exp	(working copy)
@@ -108,6 +108,8 @@ dg-runtest [lsort [glob -nocomplain $src
 	"" $DEFAULT_VECTCFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vect-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/slp-*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
 
 #### Tests with special options
 global SAVED_DEFAULT_VECTCFLAGS
@@ -116,25 +118,25 @@ set SAVED_DEFAULT_VECTCFLAGS $DEFAULT_VE
 # -ffast-math tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-ffast-math"
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-vect*.\[cS\]]]  \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
 # -fno-math-errno tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-math-errno"
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-math-errno-vect*.\[cS\]]]  \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-math-errno-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
 # -fwrapv tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fwrapv"
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/wrapv-vect*.\[cS\]]]  \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/wrapv-*.\[cS\]]]  \
         "" $DEFAULT_VECTCFLAGS
 
 # -ftrapv tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-ftrapv"
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/trapv-vect*.\[cS\]]]  \
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/trapv-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
 # -fdump-tree-dceloop-details tests
@@ -179,12 +181,24 @@ lappend DEFAULT_VECTCFLAGS "-fno-tree-sc
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-scev-cprop-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
+# -fno-tree-scev-cprop
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-slp-*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
 # -fno-tree-dominator-opts
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-tree-dominator-opts"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-dom-*.\[cS\]]]  \
 	"" $DEFAULT_VECTCFLAGS
 
+# -fno-tree-pre
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-fno-tree-pre"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-pre-*.\[cS\]]]  \
+        "" $DEFAULT_VECTCFLAGS
+
 # With -Os
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-Os"
Index: gcc.dg/vect/slp-22.c
===================================================================
--- gcc.dg/vect/slp-22.c	(revision 0)
+++ gcc.dg/vect/slp-22.c	(revision 0)
@@ -0,0 +1,135 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 (unsigned short a0, unsigned short a1, unsigned short a2, 
+       unsigned short a3, unsigned short a4, unsigned short a5,
+       unsigned short a6, unsigned short a7, unsigned short a8)
+{
+  int i;
+  unsigned short out[N*8], out2[N*8], out3[N*8], b0, b1, b2, b3, b4, b5, b6, b7, b8;
+
+  for (i = 0; i < N; i++)
+    {
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 + 5;
+      b4 = a4 + 4;
+      b5 = a5 + 3;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*4] = b0;
+      out2[i*4 + 1] = b1;
+      out2[i*4 + 2] = b4;
+      out2[i*4 + 3] = b5;
+
+      out3[i*4] = b2;
+      out3[i*4 + 1] = b1;
+      out3[i*4 + 2] = b4;
+      out3[i*4 + 3] = b5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] != b0 
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+	abort ();
+
+    
+      if (out2[i*4] != b0
+         || out2[i*4 + 1] != b1
+         || out2[i*4 + 2] != b4
+         || out2[i*4 + 3] != b5)
+        abort ();
+   
+      if (out3[i*4] != b2  
+         || out3[i*4 + 1] != b1
+         || out3[i*4 + 2] != b4
+         || out3[i*4 + 3] != b5)
+        abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      b0 = a0 + 8;
+      b1 = a1 + 7;
+      b2 = a2 + 6;
+      b3 = a3 + 5;
+      b4 = a4 + 4;
+      b5 = a5 + 3;
+      b6 = a6 + 2;
+      b7 = a7 + 1;
+      b8 = a8 + 9;
+
+      out[i*4] = b0;
+      out[i*4 + 1] = b1;
+      out[i*4 + 2] = b2;
+      out[i*4 + 3] = b3;
+
+      out2[i*8] = b0;
+      out2[i*8 + 1] = b1;
+      out2[i*8 + 2] = b4;
+      out2[i*8 + 3] = b5;
+      out2[i*8 + 4] = b6;
+      out2[i*8 + 5] = b2;
+      out2[i*8 + 6] = b7;
+      out2[i*8 + 7] = b8;
+
+      out3[2*i + 1] = a0;
+      out3[2*i] = b8; 
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] != b0 
+         || out[i*4 + 1] != b1
+         || out[i*4 + 2] != b2
+         || out[i*4 + 3] != b3)
+        abort ();
+
+      if (out2[i*8] != b0
+         || out2[i*8 + 1] != b1
+         || out2[i*8 + 2] != b4
+         || out2[i*8 + 3] != b5
+         || out2[i*8 + 4] != b6
+         || out2[i*8 + 5] != b2
+         || out2[i*8 + 6] != b7
+         || out2[i*8 + 7] != b8)
+        abort ();
+
+      if (out3[2*i] != b8
+          || out3[2*i+1] != a0)
+        abort();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (8,7,6,5,4,3,2,1,0);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/no-tree-pre-slp-29.c
===================================================================
--- gcc.dg/vect/no-tree-pre-slp-29.c	(revision 0)
+++ gcc.dg/vect/no-tree-pre-slp-29.c	(revision 0)
@@ -0,0 +1,79 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+unsigned short in2[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+
+int
+main1 (unsigned short *in)
+{
+  int i;
+  unsigned short out[N*8];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*4] = in[i*4];
+      out[i*4 + 1] = in[i*4 + 1];
+      out[i*4 + 2] = in[i*4 + 2];
+      out[i*4 + 3] = in[i*4 + 3];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] !=  in[i*4]
+         || out[i*4 + 1] != in[i*4 + 1]
+         || out[i*4 + 2] != in[i*4 + 2]
+         || out[i*4 + 3] != in[i*4 + 3])
+        abort ();
+    }
+
+  return 0;
+}
+
+int
+main2 (unsigned short * __restrict__ in, unsigned short * __restrict__ out)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*4] = in[i*4];
+      out[i*4 + 1] = in[i*4 + 1];
+      out[i*4 + 2] = in[i*4 + 2];
+      out[i*4 + 3] = in[i*4 + 3];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*4] !=  in[i*4]
+         || out[i*4 + 1] != in[i*4 + 1]
+         || out[i*4 + 2] != in[i*4 + 2]
+         || out[i*4 + 3] != in[i*4 + 3])
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  unsigned short out[N*8];
+
+  check_vect ();
+
+  main1 (&in2[5]);
+  main2 (&in2[3], &out[3]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-4.c
===================================================================
--- gcc.dg/vect/slp-4.c	(revision 0)
+++ gcc.dg/vect/slp-4.c	(revision 0)
@@ -0,0 +1,128 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 16 
+
+int
+main1 ()
+{
+  int i;
+  unsigned short out[N*8];
+  unsigned short in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N*2];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8];
+      out[i*8 + 1] = in[i*8 + 1];
+      out[i*8 + 2] = in[i*8 + 2];
+      out[i*8 + 3] = in[i*8 + 3];
+      out[i*8 + 4] = in[i*8 + 4];
+      out[i*8 + 5] = in[i*8 + 5];
+      out[i*8 + 6] = in[i*8 + 6];
+      out[i*8 + 7] = in[i*8 + 7];
+    
+      ia[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8]
+         || out[i*8 + 1] != in[i*8 + 1]
+         || out[i*8 + 2] != in[i*8 + 2]
+         || out[i*8 + 3] != in[i*8 + 3]
+         || out[i*8 + 4] != in[i*8 + 4]
+         || out[i*8 + 5] != in[i*8 + 5]
+         || out[i*8 + 6] != in[i*8 + 6]
+         || out[i*8 + 7] != in[i*8 + 7]
+         || ia[i] != 7)
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = in[i*4];
+      out[i*4 + 1] = in[i*4 + 1];
+      out[i*4 + 2] = in[i*4 + 2];
+      out[i*4 + 3] = in[i*4 + 3];
+
+      ia[i] = 12;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4]
+         || out[i*4 + 1] != in[i*4 + 1]
+         || out[i*4 + 2] != in[i*4 + 2]
+         || out[i*4 + 3] != in[i*4 + 3]
+         || ia[i] != 12)
+        abort ();
+    }
+
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*16] = in[i*16];
+      out[i*16 + 1] = in[i*16 + 1];
+      out[i*16 + 2] = in[i*16 + 2];
+      out[i*16 + 3] = in[i*16 + 3];
+      out[i*16 + 4] = in[i*16 + 4];
+      out[i*16 + 5] = in[i*16 + 5];
+      out[i*16 + 6] = in[i*16 + 6];
+      out[i*16 + 7] = in[i*16 + 7];
+      out[i*16 + 8] = in[i*16 + 8];
+      out[i*16 + 9] = in[i*16 + 9];
+      out[i*16 + 10] = in[i*16 + 10];
+      out[i*16 + 11] = in[i*16 + 11];
+      out[i*16 + 12] = in[i*16 + 12];
+      out[i*16 + 13] = in[i*16 + 13];
+      out[i*16 + 14] = in[i*16 + 14];
+      out[i*16 + 15] = in[i*16 + 15];
+
+      ia[i] = 21;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*16] !=  in[i*16]
+         || out[i*16 + 1] != in[i*16 + 1]
+         || out[i*16 + 2] != in[i*16 + 2]
+         || out[i*16 + 3] != in[i*16 + 3]
+         || out[i*16 + 4] != in[i*16 + 4]
+         || out[i*16 + 5] != in[i*16 + 5]
+         || out[i*16 + 6] != in[i*16 + 6]
+         || out[i*16 + 7] != in[i*16 + 7]
+         || out[i*16 + 8] != in[i*16 + 8]
+         || out[i*16 + 9] != in[i*16 + 9]
+         || out[i*16 + 10] != in[i*16 + 10]
+         || out[i*16 + 11] != in[i*16 + 11]
+         || out[i*16 + 12] != in[i*16 + 12]
+         || out[i*16 + 13] != in[i*16 + 13]
+         || out[i*16 + 14] != in[i*16 + 14]
+         || out[i*16 + 15] != in[i*16 + 15]
+         || ia[i] != 21)
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-12a.c
===================================================================
--- gcc.dg/vect/slp-12a.c	(revision 0)
+++ gcc.dg/vect/slp-12a.c	(revision 0)
@@ -0,0 +1,105 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N], ib[N*2];
+
+  for (i = 0; i < N; i++)
+    {
+      
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] + 6;
+      a2 = in[i*8 + 2] + 7;
+      a3 = in[i*8 + 3] + 8;
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+
+      ia[i] = b6;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7
+         || ia[i] != (in[i*8 + 6] + 11) * 3)
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = (in[i*4] + 2) * 3;
+      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
+      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
+      out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
+
+      ib[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  (in[i*4] + 2) * 3
+         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
+         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
+         || out[i*4 + 3] != (in[i*4 + 3] + 7) * 7 
+         || ib[i] != 7)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided}} && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target  { ! vect_int_mult } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-14.c
===================================================================
--- gcc.dg/vect/slp-14.c	(revision 0)
+++ gcc.dg/vect/slp-14.c	(revision 0)
@@ -0,0 +1,118 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 (int n)
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short out2[N*16];
+
+  /* Multiple types are not SLPable yet.  */
+  for (i = 0; i < n; i++)
+    {
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] + 6; 
+      a2 = in[i*8 + 2] + 7; 
+      a3 = in[i*8 + 3] + 8; 
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+    
+      b0 = a0 * 3;
+      b1 = a1 * 2; 
+      b2 = a2 * 12;
+      b3 = a3 * 5; 
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+    
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+    
+      out2[i*16] = in2[i*16] + 2;
+      out2[i*16 + 1] = in2[i*16 + 1] + 3;
+      out2[i*16 + 2] = in2[i*16 + 2] + 4;
+      out2[i*16 + 3] = in2[i*16 + 3] + 3;
+      out2[i*16 + 4] = in2[i*16 + 4] + 2;
+      out2[i*16 + 5] = in2[i*16 + 5] + 3;
+      out2[i*16 + 6] = in2[i*16 + 6] + 2;
+      out2[i*16 + 7] = in2[i*16 + 7] + 4;
+      out2[i*16 + 8] = in2[i*16 + 8] + 2;
+      out2[i*16 + 9] = in2[i*16 + 9] + 5;
+      out2[i*16 + 10] = in2[i*16 + 10] + 2;
+      out2[i*16 + 11] = in2[i*16 + 11] + 3;
+      out2[i*16 + 12] = in2[i*16 + 12] + 4;
+      out2[i*16 + 13] = in2[i*16 + 13] + 4;
+      out2[i*16 + 14] = in2[i*16 + 14] + 3;
+      out2[i*16 + 15] = in2[i*16 + 15] + 2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
+        abort ();
+
+      if (out2[i*16] !=  in2[i*16] + 2
+         || out2[i*16 + 1] != in2[i*16 + 1] + 3
+         || out2[i*16 + 2] != in2[i*16 + 2] + 4
+         || out2[i*16 + 3] != in2[i*16 + 3] + 3
+         || out2[i*16 + 4] != in2[i*16 + 4] + 2
+         || out2[i*16 + 5] != in2[i*16 + 5] + 3
+         || out2[i*16 + 6] != in2[i*16 + 6] + 2
+         || out2[i*16 + 7] != in2[i*16 + 7] + 4
+         || out2[i*16 + 8] != in2[i*16 + 8] + 2
+         || out2[i*16 + 9] != in2[i*16 + 9] + 5
+         || out2[i*16 + 10] != in2[i*16 + 10] + 2
+         || out2[i*16 + 11] != in2[i*16 + 11] + 3
+         || out2[i*16 + 12] != in2[i*16 + 12] + 4
+         || out2[i*16 + 13] != in2[i*16 + 13] + 4
+         || out2[i*16 + 14] != in2[i*16 + 14] + 3
+         || out2[i*16 + 15] != in2[i*16 + 15] + 2)
+        abort ();
+    
+    }
+
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (N);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_strided && vect_int_mult  } } } }  */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { ! { vect_strided  && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail *-*-* }  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/no-scevccp-slp-30.c
===================================================================
--- gcc.dg/vect/no-scevccp-slp-30.c	(revision 0)
+++ gcc.dg/vect/no-scevccp-slp-30.c	(revision 0)
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+int
+main1 ()
+{
+  int i, j;
+  unsigned short out[N*8], a[N];
+   
+ for (j = 0; j < N; j++)
+   {
+    for (i = 0; i < N; i++)
+      {
+        out[i*4] = 8;
+        out[i*4 + 1] = 18;
+        out[i*4 + 2] = 28;
+        out[i*4 + 3] = 38;
+      }
+    a[j] = 8;
+   }
+
+  /* check results:  */
+   for (j = 0; j < N; j++)
+   {
+    for (i = 0; i < N; i++)
+      {
+        if (out[i*4] != 8
+            || out[i*4 + 1] != 18
+            || out[i*4 + 2] != 28
+            || out[i*4 + 3] != 38)
+          abort();
+      }
+   
+    if (a[j] != 8)
+       abort ();
+   }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-23.c
===================================================================
--- gcc.dg/vect/slp-23.c	(revision 0)
+++ gcc.dg/vect/slp-23.c	(revision 0)
@@ -0,0 +1,113 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 128 
+
+typedef struct {
+   int a;
+   int b;
+   int c;
+   int d;
+   int e;
+   int f;
+   int g;
+   int h;
+} s;
+
+int
+main1 (s *arr)
+{
+  int i;
+  s *ptr = arr;
+  s res[N];
+
+  for (i = 0; i < N; i++)
+    {
+      res[i].c = ptr->c + ptr->c;
+      res[i].a = ptr->a + ptr->a;
+      res[i].d = ptr->d + ptr->d;
+      res[i].b = ptr->b + ptr->b;
+      res[i].f = ptr->f + ptr->f;
+      res[i].e = ptr->e + ptr->e; 
+      res[i].h = ptr->h + ptr->h;   
+      res[i].g = ptr->g + ptr->g;
+      ptr++; 
+    } 
+   
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    { 
+      if (res[i].c != arr[i].c + arr[i].c
+          || res[i].a != arr[i].a + arr[i].a
+          || res[i].d != arr[i].d + arr[i].d
+          || res[i].b != arr[i].b + arr[i].b
+          || res[i].f != arr[i].f + arr[i].f
+          || res[i].e != arr[i].e + arr[i].e
+          || res[i].h != arr[i].h + arr[i].h
+          || res[i].g != arr[i].g + arr[i].g)
+         abort();
+    }
+
+  ptr = arr;
+  for (i = 0; i < N; i++)
+    {
+      res[i].c = ptr->c + ptr->c;
+      res[i].a = ptr->a + ptr->a;
+      res[i].d = ptr->d + ptr->d;
+      res[i].b = ptr->b + ptr->b;
+      res[i].f = ptr->f + ptr->f;
+      res[i].e = ptr->e + ptr->e;
+      res[i].h = ptr->e + ptr->e;
+      res[i].g = ptr->g + ptr->g;
+      ptr++;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (res[i].c != arr[i].c + arr[i].c
+          || res[i].a != arr[i].a + arr[i].a
+          || res[i].d != arr[i].d + arr[i].d
+          || res[i].b != arr[i].b + arr[i].b
+          || res[i].f != arr[i].f + arr[i].f
+          || res[i].e != arr[i].e + arr[i].e
+          || res[i].h != arr[i].e + arr[i].e
+          || res[i].g != arr[i].g + arr[i].g)
+         abort();
+    }
+  return 0;
+}
+
+int main (void)
+{
+  int i;
+  s arr[N];
+  
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    { 
+      arr[i].a = i;
+      arr[i].b = i * 2;
+      arr[i].c = 17;
+      arr[i].d = i+34;
+      arr[i].e = i * 3 + 5;
+      arr[i].f = i * 5;
+      arr[i].g = i - 3;
+      arr[i].h = 56;
+      if (arr[i].a == 178)
+         abort(); 
+    } 
+
+  main1 (arr);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-5.c
===================================================================
--- gcc.dg/vect/slp-5.c	(revision 0)
+++ gcc.dg/vect/slp-5.c	(revision 0)
@@ -0,0 +1,128 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 16 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8];
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned short ia[N];
+  unsigned int ib[N*2];
+
+  /* Not SLPable for now: multiple types with SLP of the smaller type.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8];
+      out[i*8 + 1] = in[i*8 + 1];
+      out[i*8 + 2] = in[i*8 + 2];
+      out[i*8 + 3] = in[i*8 + 3];
+      out[i*8 + 4] = in[i*8 + 4];
+      out[i*8 + 5] = in[i*8 + 5];
+      out[i*8 + 6] = in[i*8 + 6];
+      out[i*8 + 7] = in[i*8 + 7];
+    
+      ia[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8]
+         || out[i*8 + 1] != in[i*8 + 1]
+         || out[i*8 + 2] != in[i*8 + 2]
+         || out[i*8 + 3] != in[i*8 + 3]
+         || out[i*8 + 4] != in[i*8 + 4]
+         || out[i*8 + 5] != in[i*8 + 5]
+         || out[i*8 + 6] != in[i*8 + 6]
+         || out[i*8 + 7] != in[i*8 + 7]
+         || ia[i] != 7)
+	abort ();
+    }
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = in[i*4];
+      out[i*4 + 1] = in[i*4 + 1];
+      out[i*4 + 2] = in[i*4 + 2];
+      out[i*4 + 3] = in[i*4 + 3];
+
+      ib[i] = 12;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4]
+         || out[i*4 + 1] != in[i*4 + 1]
+         || out[i*4 + 2] != in[i*4 + 2]
+         || out[i*4 + 3] != in[i*4 + 3]
+         || ib[i] != 12)
+        abort ();
+    }
+
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*16] = in[i*16];
+      out[i*16 + 1] = in[i*16 + 1];
+      out[i*16 + 2] = in[i*16 + 2];
+      out[i*16 + 3] = in[i*16 + 3];
+      out[i*16 + 4] = in[i*16 + 4];
+      out[i*16 + 5] = in[i*16 + 5];
+      out[i*16 + 6] = in[i*16 + 6];
+      out[i*16 + 7] = in[i*16 + 7];
+      out[i*16 + 8] = in[i*16 + 8];
+      out[i*16 + 9] = in[i*16 + 9];
+      out[i*16 + 10] = in[i*16 + 10];
+      out[i*16 + 11] = in[i*16 + 11];
+      out[i*16 + 12] = in[i*16 + 12];
+      out[i*16 + 13] = in[i*16 + 13];
+      out[i*16 + 14] = in[i*16 + 14];
+      out[i*16 + 15] = in[i*16 + 15];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*16] !=  in[i*16]
+         || out[i*16 + 1] != in[i*16 + 1]
+         || out[i*16 + 2] != in[i*16 + 2]
+         || out[i*16 + 3] != in[i*16 + 3]
+         || out[i*16 + 4] != in[i*16 + 4]
+         || out[i*16 + 5] != in[i*16 + 5]
+         || out[i*16 + 6] != in[i*16 + 6]
+         || out[i*16 + 7] != in[i*16 + 7]
+         || out[i*16 + 8] != in[i*16 + 8]
+         || out[i*16 + 9] != in[i*16 + 9]
+         || out[i*16 + 10] != in[i*16 + 10]
+         || out[i*16 + 11] != in[i*16 + 11]
+         || out[i*16 + 12] != in[i*16 + 12]
+         || out[i*16 + 13] != in[i*16 + 13]
+         || out[i*16 + 14] != in[i*16 + 14]
+         || out[i*16 + 15] != in[i*16 + 15])
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target  { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-12b.c
===================================================================
--- gcc.dg/vect/slp-12b.c	(revision 0)
+++ gcc.dg/vect/slp-12b.c	(revision 0)
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_intfloat_cvt } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 64 
+
+int
+main1 ()
+{
+  int i;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out2[N*8], fa[N*4];
+
+  for (i = 0; i < N; i++)
+    {
+      out2[i*2] = (float) (in[i*2] * 2 + 6) ;
+      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
+    
+      fa[i] = (float) in[i*2+1];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out2[i*2] !=  (float) (in[i*2] * 2 + 6)
+         || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7)
+         || fa[i] != (float) in[i*2+1])
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: gcc.dg/vect/slp-15.c
===================================================================
--- gcc.dg/vect/slp-15.c	(revision 0)
+++ gcc.dg/vect/slp-15.c	(revision 0)
@@ -0,0 +1,117 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include "tree-vect.h"
+
+#define N 8 
+
+int
+main1 (int n)
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int in2[N*16] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int out2[N*16];
+
+  for (i = 0; i < n; i++)
+    {
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] + 6; 
+      a2 = in[i*8 + 2] + 7; 
+      a3 = in[i*8 + 3] + 8; 
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+    
+      b0 = a0 * 3;
+      b1 = a1 * 2; 
+      b2 = a2 * 12;
+      b3 = a3 * 5; 
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+    
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3; 
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+    
+      out2[i*16] = in2[i*16] * 2;
+      out2[i*16 + 1] = in2[i*16 + 1] * 3;
+      out2[i*16 + 2] = in2[i*16 + 2] * 4;
+      out2[i*16 + 3] = in2[i*16 + 3] * 3;
+      out2[i*16 + 4] = in2[i*16 + 4] * 2;
+      out2[i*16 + 5] = in2[i*16 + 5] * 3;
+      out2[i*16 + 6] = in2[i*16 + 6] * 2;
+      out2[i*16 + 7] = in2[i*16 + 7] * 4;
+      out2[i*16 + 8] = in2[i*16 + 8] * 2;
+      out2[i*16 + 9] = in2[i*16 + 9] * 5;
+      out2[i*16 + 10] = in2[i*16 + 10] * 2;
+      out2[i*16 + 11] = in2[i*16 + 11] * 3;
+      out2[i*16 + 12] = in2[i*16 + 12] * 4;
+      out2[i*16 + 13] = in2[i*16 + 13] * 4;
+      out2[i*16 + 14] = in2[i*16 + 14] * 3;
+      out2[i*16 + 15] = in2[i*16 + 15] * 2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] + 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
+        abort ();
+
+      if (out2[i*16] !=  in2[i*16] * 2
+         || out2[i*16 + 1] != in2[i*16 + 1] * 3
+         || out2[i*16 + 2] != in2[i*16 + 2] * 4
+         || out2[i*16 + 3] != in2[i*16 + 3] * 3
+         || out2[i*16 + 4] != in2[i*16 + 4] * 2
+         || out2[i*16 + 5] != in2[i*16 + 5] * 3
+         || out2[i*16 + 6] != in2[i*16 + 6] * 2
+         || out2[i*16 + 7] != in2[i*16 + 7] * 4
+         || out2[i*16 + 8] != in2[i*16 + 8] * 2
+         || out2[i*16 + 9] != in2[i*16 + 9] * 5
+         || out2[i*16 + 10] != in2[i*16 + 10] * 2
+         || out2[i*16 + 11] != in2[i*16 + 11] * 3
+         || out2[i*16 + 12] != in2[i*16 + 12] * 4
+         || out2[i*16 + 13] != in2[i*16 + 13] * 4
+         || out2[i*16 + 14] != in2[i*16 + 14] * 3
+         || out2[i*16 + 15] != in2[i*16 + 15] * 2)
+        abort ();
+    
+    }
+
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (N);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! { vect_int_mult } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+  
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 127394)
+++ ChangeLog	(working copy)
@@ -1,3 +1,28 @@
+2007-08-14  Ira Rosen  <irar@il.ibm.com>
+
+	* gcc.dg/vect/vect.exp: Compile tests starting with slp-.
+        Remove "vect" part from test names for -ffast-math, -fno-math-errno,
+        -fwrapv, -ftrapv tests. Add -fno-tree-scev-cprop for slp- tests.
+        Compile tests with -fno-tree-pre.
+        * lib/target-supports.exp (check_effective_target_vect_strided): New.
+        * gcc.dg/vect/slp-1.c, gcc.dg/vect/slp-2.c, gcc.dg/vect/slp-3.c,
+        gcc.dg/vect/slp-4.c, gcc.dg/vect/slp-5.c, gcc.dg/vect/slp-6.c,
+        gcc.dg/vect/slp-7.c, gcc.dg/vect/slp-8.c, gcc.dg/vect/slp-9.c,
+        gcc.dg/vect/slp-10.c, gcc.dg/vect/slp-11.c, gcc.dg/vect/slp-12.c,
+        gcc.dg/vect/slp-13.c, gcc.dg/vect/slp-14.c, gcc.dg/vect/slp-15.c,
+        gcc.dg/vect/slp-16.c, gcc.dg/vect/slp-17.c, gcc.dg/vect/slp-18.c,
+        gcc.dg/vect/slp-19.c, gcc.dg/vect/slp-20.c, gcc.dg/vect/slp-21.c,
+        gcc.dg/vect/slp-22.c, gcc.dg/vect/slp-23.c, gcc.dg/vect/slp-24.c,
+        gcc.dg/vect/slp-25.c, gcc.dg/vect/slp-26.c, gcc.dg/vect/slp-28.c,
+        gcc.dg/vect/fast-math-slp-27.c, gcc.dg/vect/no-tree-pre-slp-29.c,
+        gcc.dg/vect/no-scevccp-slp-30.c, gcc.dg/vect/no-scevccp-slp-31.c,
+        gcc.dg/vect/no-math-errno-slp-32.c, gcc.dg/vect/slp-33.c,
+	gcc.dg/vect/slp-34.c, gcc.dg/vect/slp-35.c, gcc.dg/vect/slp-36.c,
+	gcc.dg/vect/slp-37.c: New testcases.
+        * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Run SLP tests.
+        * gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: New test.
+        * gcc.dg/vect/costmodel/ppc/costmodel-slp-33.c: New test.
+
 2007-08-13  Richard Sandiford  <richard@codesourcery.com>
 
 	* gcc.dg/pr21255-4.c: Skip for PIC.
Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp	(revision 127394)
+++ lib/target-supports.exp	(working copy)
@@ -2282,6 +2282,24 @@ proc check_effective_target_vect_interle
     return $et_vect_interleave_saved
 }
 
+# Return 1 if the target supports vector interleaving and extract even/odd, 0 otherwise.
+proc check_effective_target_vect_strided { } {
+    global et_vect_strided_saved
+
+    if [info exists et_vect_strided_saved] {
+        verbose "check_effective_target_vect_strided: using cached result" 2
+    } else {
+        set et_vect_strided_saved 0
+        if { [check_effective_target_vect_interleave]
+             && [check_effective_target_vect_extract_even_odd] } {
+           set et_vect_strided_saved 1
+        }
+    }
+
+    verbose "check_effective_target_vect_strided: returning $et_vect_strided_saved" 2
+    return $et_vect_strided_saved
+}
+
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {


* Re: [patch] Loop-aware SLP
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
                   ` (4 preceding siblings ...)
  2007-08-14 13:18 ` [patch] Loop-aware SLP 5/5 Ira Rosen
@ 2007-08-14 13:21 ` Paolo Bonzini
  2007-08-14 14:47   ` Daniel Berlin
  2007-08-14 15:23   ` Ira Rosen
  5 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2007-08-14 13:21 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches, Dorit Nuzman


> We plan to extend this initial "SLP-like vectorization in loops" in the
> following directions:
> - support reduction
> - support data permutations
> - support interleaved accesses with gaps
> - support loops with multiple types

Maybe it is already supported, but do you plan adding struct 
vectorization, as in

struct col {
   char r, g, b, a;
};

struct col *a, *b;
for (...)
   {
     a[i].r = b[i].r;
     a[i].g = b[i].g;
     a[i].b = b[i].b;
     a[i].a = b[i].a;
   }

or even with the last assignment's RHS being 255?

Thanks,

Paolo


* Re: [patch] Loop-aware SLP
  2007-08-14 13:21 ` [patch] Loop-aware SLP Paolo Bonzini
@ 2007-08-14 14:47   ` Daniel Berlin
  2007-08-14 16:03     ` Ira Rosen
  2007-08-14 15:23   ` Ira Rosen
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Berlin @ 2007-08-14 14:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Ira Rosen, gcc-patches, Dorit Nuzman

On 8/14/07, Paolo Bonzini <bonzini@gnu.org> wrote:
>
> > We plan to extend this initial "SLP-like vectorization in loops" in the
> > following directions:
> > - support reduction
> > - support data permutations
> > - support interleaved accesses with gaps
> > - support loops with multiple types
>
> Maybe it is already supported, but do you plan adding struct
> vectorization, as in
>
> struct col {
>    char r, g, b, a;
> }
>
> struct col *a, *b;
> for (...)
>    {
>      a[i].r = b[i].r;
>      a[i].g = b[i].g;
>      a[i].b = b[i].b;
>      a[i].a = b[i].a;
>    }
>
> or even with the last assignment's RHS being 255?

Unless they are specifically checking for ARRAY_REF, it should just work.

Though now that I look:

vectorizable_store and vectorizable_load in tree-vect-transform.c
claim to handle non-scalars, and specifically mention structures.
However, they both test for:

  if (TREE_CODE (scalar_dest) != ARRAY_REF
      && TREE_CODE (scalar_dest) != INDIRECT_REF
      && !DR_GROUP_FIRST_DR (stmt_info))
    return false;

This should also have a && TREE_CODE (scalar_dest) != COMPONENT_REF,
unless the DR_GROUP_FIRST_DR somehow takes care of this (I don't see
how).

You may also want to support BIT_FIELD_REF (in which case just use
(REFERENCE_CLASS_P && code != VIEW_CONVERT_EXPR)).

The second test there should probably be INDIRECT_REF_P, which
includes the aligned/misaligned versions. :)


* Re: [patch] Loop-aware SLP
  2007-08-14 13:21 ` [patch] Loop-aware SLP Paolo Bonzini
  2007-08-14 14:47   ` Daniel Berlin
@ 2007-08-14 15:23   ` Ira Rosen
  2007-08-14 16:13     ` Paolo Bonzini
  1 sibling, 1 reply; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 15:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Dorit Nuzman, gcc-patches, Paolo Bonzini



Paolo Bonzini <paolo.bonzini@gmail.com> wrote on 14/08/2007 16:21:23:

> Maybe it is already supported, but do you plan adding struct
> vectorization, as in
>
> struct col {
>    char r, g, b, a;
> }
>
> struct col *a, *b;
> for (...)
>    {
>      a[i].r = b[i].r;
>      a[i].g = b[i].g;
>      a[i].b = b[i].b;
>      a[i].a = b[i].a;
>    }

It is supported.

>
> or even with the last assignment's RHS being 255?

This one is not, since in this case RHSs are not isomorphic.

Thanks,
Ira

>
> Thanks,
>
> Paolo


* Re: [patch] Loop-aware SLP
  2007-08-14 14:47   ` Daniel Berlin
@ 2007-08-14 16:03     ` Ira Rosen
  0 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 16:03 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Paolo Bonzini, Dorit Nuzman, gcc-patches

gcc-patches-owner@gcc.gnu.org wrote on 14/08/2007 17:47:30:

> On 8/14/07, Paolo Bonzini <bonzini@gnu.org> wrote:
>
> Unless they are specifically checking for ARRAY_REF, it should just work.

Yes, it works.

>
> Though now that i look
>
> vectorizable_store and vectorizable_load in tree-vect-transform.c
> claim to handle non-scalar, and specifically mentions structures.
> However, they both test for:
>
>   if (TREE_CODE (scalar_dest) != ARRAY_REF
>       && TREE_CODE (scalar_dest) != INDIRECT_REF
>       && !DR_GROUP_FIRST_DR (stmt_info))
>     return false;
>
>
> This should also have a && TREE_CODE (scalar_dest) != COMPONENT_REF,
> unless the DR_GROUP_FIRST_DR somehow takes care of this (i don't see
> how).

Data-refs analysis used to handle only some kinds of simple COMPONENT_REFs,
so we decided to support vectorization of COMPONENT_REFs only if they are
part of a group of strided accesses (and we analyze such accesses in the
vectorizer). Data-refs analysis handles only ARRAY_REFs, INDIRECT_REFs and
COMPONENT_REFs, so you can deduce that (!ARRAY_REF && !INDIRECT_REF &&
strided accesses) means COMPONENT_REF (which is also part of a strided
group) ;-). I definitely have to rewrite this condition.
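
The deduction above can be sketched in plain C. This is only a model of the
reasoning, not GCC code: the enum ref_kind, its three values and both
function names are hypothetical stand-ins for the tree codes and for
DR_GROUP_FIRST_DR, and the model assumes that data-refs analysis produces
only these three kinds of reference.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three reference kinds that data-refs
   analysis is said to produce; these are not GCC's tree codes. */
enum ref_kind { REF_ARRAY, REF_INDIRECT, REF_COMPONENT };

/* Shape of the existing test: reject only when the reference is neither
   an array ref nor an indirect ref and is not in a strided group. */
static bool passes_original_check(enum ref_kind kind, bool in_strided_group)
{
    if (kind != REF_ARRAY && kind != REF_INDIRECT && !in_strided_group)
        return false;
    return true;
}

/* The same condition with the component-ref case spelled out. */
static bool passes_explicit_check(enum ref_kind kind, bool in_strided_group)
{
    return kind == REF_ARRAY
           || kind == REF_INDIRECT
           || (kind == REF_COMPONENT && in_strided_group);
}
```

Under that assumption the two predicates agree on every input, which is why
the shorter form implicitly accepts component refs only inside a strided
group.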

Now, after Zdenek's rewrite of data-refs analysis, we should probably
consider vectorizing all kinds of COMPONENT_REFs and not only groups of
strided accesses. However, COMPONENT_REFs are usually non-consecutive
accesses, and currently non-consecutive accesses can only be vectorized if
they are strided. Hence, I don't think this will increase vectorization
potential too much.

>
> You may also want to support BIT_FIELD_REF (in which case just use
> (REFERENCE_CLASS_P && code != VIEW_CONVERT_EXPR)).

O.K.

>
> The second test there should probably be INDIRECT_REF_P, which
> includes the aligned/misaligned versions. :)

Yes, right. We should do this. Diego has commented about this about a year
ago...

Thanks,
Ira


* Re: [patch] Loop-aware SLP
  2007-08-14 15:23   ` Ira Rosen
@ 2007-08-14 16:13     ` Paolo Bonzini
  2007-08-14 18:03       ` Dorit Nuzman
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2007-08-14 16:13 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches, Paolo Bonzini

>> struct col *a, *b;
>> for (...)
>>    {
>>      a[i].r = b[i].r;
>>      a[i].g = b[i].g;
>>      a[i].b = b[i].b;
>>      a[i].a = b[i].a;
>>    }
> 
> It is supported.
> 
>> or even with the last assignment's RHS being 255?
> 
> This one is not, since in this case RHSs are not isomorphic.

I see.  However, it can be transformed to "a[i] = b[i] | {0,0,0,255}" 
which can also be generic-vectorized.  Do you think it would be hard to 
add this kind of transformation?

Similar code would be generating "a[i] = b[i] & {255,255,255,0}".
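
A minimal sketch of the identity behind the OR form, in plain C rather than
vectorizer internals. It assumes struct col is exactly four bytes with no
padding and uses unsigned char fields to keep the arithmetic simple; the
helper names set_alpha_scalar and set_alpha_masked are hypothetical. The
mask is built from a struct so that the result does not depend on byte
order.

```c
#include <stdint.h>
#include <string.h>

struct col { unsigned char r, g, b, a; };

/* Assumption of this sketch: one pixel fits exactly in 32 bits. */
_Static_assert(sizeof(struct col) == sizeof(uint32_t),
               "struct col must be 4 bytes");

/* Scalar form: three copies plus a constant store, so the right-hand
   sides are not isomorphic. */
static struct col set_alpha_scalar(struct col src)
{
    struct col dst;
    dst.r = src.r;
    dst.g = src.g;
    dst.b = src.b;
    dst.a = 255;
    return dst;
}

/* Masked form: one whole-pixel copy OR-ed with {0,0,0,255}. */
static struct col set_alpha_masked(struct col src)
{
    const struct col mask = { 0, 0, 0, 255 };
    uint32_t s, m;
    struct col dst;

    memcpy(&s, &src, sizeof s);
    memcpy(&m, &mask, sizeof m);
    s |= m;                      /* x | 0 == x, x | 255 == 255 per byte */
    memcpy(&dst, &s, sizeof dst);
    return dst;
}
```

Both helpers should return the same pixel for any input, since OR with 0
leaves a byte unchanged and OR with 255 forces it to 255.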

Of course, the definitive one would be to turn

       a[i].r = b[i].r + (c[i].r - b[i].r) * c[i].a;
       a[i].g = b[i].g + (c[i].g - b[i].g) * c[i].a;
       a[i].b = b[i].b + (c[i].b - b[i].b) * c[i].a;
       a[i].a = 0;

into

       a[i] = (b[i] + (c[i] - b[i]) * splat(c[i].a)) & {255,255,255,0}

but I'm asking too much maybe. :-)
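
To make the proposed rewrite concrete, here is a small C model of both
forms. It is a sketch under stated assumptions: channels are 8-bit unsigned
values stored in the order r, g, b, a, results wrap modulo 256 in both
forms, and the names blend_scalar and blend_lanewise are hypothetical. The
inner loop stands in for a single 4-lane vector operation.

```c
#include <stdint.h>

enum { LANES = 4 };  /* r, g, b, a */

/* Scalar form from the message; each store truncates to 8 bits. */
static void blend_scalar(uint8_t a[LANES], const uint8_t b[LANES],
                         const uint8_t c[LANES])
{
    a[0] = (uint8_t)(b[0] + (c[0] - b[0]) * c[3]);
    a[1] = (uint8_t)(b[1] + (c[1] - b[1]) * c[3]);
    a[2] = (uint8_t)(b[2] + (c[2] - b[2]) * c[3]);
    a[3] = 0;
}

/* Isomorphic form: the same expression in every lane, with the alpha
   lane cleared afterwards by the mask {255,255,255,0}. */
static void blend_lanewise(uint8_t a[LANES], const uint8_t b[LANES],
                           const uint8_t c[LANES])
{
    static const uint8_t mask[LANES] = { 255, 255, 255, 0 };
    const uint8_t alpha = c[3];          /* models splat(c[i].a) */

    for (int k = 0; k < LANES; k++)
        a[k] = (uint8_t)(b[k] + (c[k] - b[k]) * alpha) & mask[k];
}
```

The value computed in the alpha lane before masking is irrelevant, which is
what allows all four lanes to run the same operation.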

Paolo


* Re: [patch] Loop-aware SLP
  2007-08-14 16:13     ` Paolo Bonzini
@ 2007-08-14 18:03       ` Dorit Nuzman
  0 siblings, 0 replies; 14+ messages in thread
From: Dorit Nuzman @ 2007-08-14 18:03 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches, Ira Rosen, Paolo Bonzini

Paolo Bonzini <paolo.bonzini@gmail.com> wrote on 14/08/2007 19:07:00:

> >> struct col *a, *b;
> >> for (...)
> >>    {
> >>      a[i].r = b[i].r;
> >>      a[i].g = b[i].g;
> >>      a[i].b = b[i].b;
> >>      a[i].a = b[i].a;
> >>    }
> >
> > It is supported.
> >
> >> or even with the last assignment's RHS being 255?
> >
> > This one is not, since in this case RHSs are not isomorphic.
>
> I see.  However, it can be transformed to "a[i] = b[i] | {0,0,0,255}"
> which can also be generic-vectorized.  Do you think it would be hard to
> add this kind of transformation?
>
> Similar code would be generating "a[i] = b[i] & {255,255,255,0}".
>
> Of course, the definitive one would be to turn
>
>        a[i].r = b[i].r + (c[i].r - b[i].r) * c[i].a;
>        a[i].g = b[i].g + (c[i].g - b[i].g) * c[i].a;
>        a[i].b = b[i].b + (c[i].b - b[i].b) * c[i].a;
>        a[i].a = 0;
>
> into
>
>        a[i] = (b[i] + (c[i] - b[i]) * splat(c[i].a)) & {255,255,255,0}
>
> but I'm asking too much maybe. :-)
>

This should be doable. Similar kinds of tricks were also brought up at the
summit after Ira's talk -  like "padding" any gaps with the corresponding
identity operation. Another example that comes to mind in this context is
if you have, say:
   c[i] = a[i] + b[i]
   c[i+1] = a[i+1] - b[i+1]
   c[i+2] = a[i+2] + b[i+2]
   c[i+3] = a[i+3] - b[i+3]
This is also a case of non-isomorphic operations (which some SIMD
platforms with a subadd operation can actually support directly). It could
be vectorized by negating every other element of array b (the elements
that are subtracted) and then using a vector add. So, yes, the
transformation you mention is definitely a relevant direction for
extending this work.
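
A minimal C sketch of that add/sub trick, for illustration only. The
function names are hypothetical, n is assumed to be a multiple of 4, and
the inputs are assumed small enough that negating an element of b cannot
overflow. The inner loop models one 4-lane vector add.

```c
#include <stddef.h>

/* Scalar form: alternating add and subtract, so the four statements
   are not isomorphic. */
static void addsub_scalar(int *c, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] - b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] - b[i + 3];
    }
}

/* Isomorphic form: negate the subtracted elements of b, then every
   lane performs the same add. */
static void addsub_isomorphic(int *c, const int *a, const int *b, size_t n)
{
    static const int sign[4] = { 1, -1, 1, -1 };

    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            c[i + k] = a[i + k] + sign[k] * b[i + k];
}
```

On a target with a combined add/sub instruction the sign adjustment would
be unnecessary; the sketch only shows why the rewritten form is uniform
across lanes.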

dorit

> Paolo


* Re: [patch] Loop-aware SLP
       [not found] <OFE23AAA54.58CDD7B4-ONC2257337.0041DED3-C2257337.004775ED@LocalDomain>
@ 2007-08-16 13:30 ` Dorit Nuzman
  0 siblings, 0 replies; 14+ messages in thread
From: Dorit Nuzman @ 2007-08-16 13:30 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:00:32:

> This patch adds support for straight-line code vectorization inside
> loops. This is a merge from the autovect branch.

Hi Ira,

Just wanted to let you know I'm going over your patches (it's just taking
time...)

>
>
> We plan to extend this initial "SLP-like vectorization in loops" in the
> following directions:
> - support reduction
> - support data permutations
> - support interleaved accesses with gaps
> - support loops with multiple types
>

(for the record - a more comprehensive list of SLP related todo can be
found here: http://gcc.gnu.org/wiki/VectorizationTasks)

thanks,
dorit

> More information can be found in our summit paper
> (http://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=get&target=GCC2007-Proceedings.pdf).
>
> This is a joint work with Dorit.

> The patch is divided into 5 parts and is relative to this cleanup patch (
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00806.html). I am going
> to submit these parts as a reply to this note.
> Each part can be compiled (but with warnings about unused arguments
> and variables) and passes vectorization tests. The whole patch
> passes bootstrap with vectorization enabled and vectorization
> testcases on x86_64-linux. I am going to run full regtesting and
> bootstrap on PowerPC.
>
> O.K. for mainline once the testing is completed?
>
> Thanks,
> Ira
>
> :ADDPATCH SSA (vectorizer):
>
>


* Re: [patch] Loop-aware SLP
       [not found] <46C1D3A2.30106@lu.unisi.ch>
@ 2007-08-14 17:02 ` Ira Rosen
  0 siblings, 0 replies; 14+ messages in thread
From: Ira Rosen @ 2007-08-14 17:02 UTC (permalink / raw)
  To: bonzini; +Cc: Paolo Bonzini, Daniel Berlin, Dorit Nuzman, gcc-patches



Paolo Bonzini <paolo.bonzini@lu.unisi.ch> wrote on 14/08/2007 19:09:06:

>
> > However, COMPONENT_REFs are usually non-consecutive
> > accesses, and currently non-consecutive accesses can only be
> > vectorized if they are strided. Hence, I don't think this will
> > increase vectorization potential too much.
>
> Well, not so much "usually".  I would think that they are consecutive
> exactly in the kind of code that can benefit from COMPONENT_REF SLP.
> Though again, strided accesses are probably the majority.

Our SLP analysis is based on groups of strided accesses. So, by
"non-consecutive accesses can only be vectorized if they are strided", I
meant both SLP vectorization and strided-access vectorization.

Thanks,
Ira

>
> Thanks for all your work!
>
> Paolo


end of thread, other threads:[~2007-08-16 13:30 UTC | newest]

Thread overview: 14+ messages
2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
2007-08-14 13:05 ` [patch] Loop-aware SLP 1/5 Ira Rosen
2007-08-14 13:07 ` [patch] Loop-aware SLP 2/5 Ira Rosen
2007-08-14 13:12 ` [patch] Loop-aware SLP 3/5 Ira Rosen
2007-08-14 13:16 ` [patch] Loop-aware SLP 4/5 Ira Rosen
2007-08-14 13:18 ` [patch] Loop-aware SLP 5/5 Ira Rosen
2007-08-14 13:21 ` [patch] Loop-aware SLP Paolo Bonzini
2007-08-14 14:47   ` Daniel Berlin
2007-08-14 16:03     ` Ira Rosen
2007-08-14 15:23   ` Ira Rosen
2007-08-14 16:13     ` Paolo Bonzini
2007-08-14 18:03       ` Dorit Nuzman
     [not found] <46C1D3A2.30106@lu.unisi.ch>
2007-08-14 17:02 ` Ira Rosen
     [not found] <OFE23AAA54.58CDD7B4-ONC2257337.0041DED3-C2257337.004775ED@LocalDomain>
2007-08-16 13:30 ` Dorit Nuzman
