public inbox for gcc-patches@gcc.gnu.org
* [0/9] Direct support for loads and stores of interleaved vectors
@ 2011-04-12 13:21 Richard Sandiford
From: Richard Sandiford @ 2011-04-12 13:21 UTC
  To: gcc-patches

The vectoriser can handle interleaved loads such as:

    for (int i = 0; i < N; i++)
      res[i] = a[2 * i] + a[2 * i + 1];

The vectorised code loads two consecutive vectors from A, then permutes
the elements to separate the even-indexed values from the odd-indexed
ones.  It can handle interleaved stores in a similar way.
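
As a rough illustration of the load-then-permute scheme, here is a
plain-C model.  It is only a sketch: the vectorisation factor VF, the
function names and the temporary arrays standing in for vector
registers are all assumptions made for the example, not what the
vectoriser actually emits.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vectorisation factor: 4 ints per "vector" */

/* Scalar reference: the loop from the example.  */
static void
sum_pairs_scalar (int *res, const int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    res[i] = a[2 * i] + a[2 * i + 1];
}

/* Model of the vectorised form: two consecutive vector loads,
   even/odd permutes, then one vector add.  */
static void
sum_pairs_permute (int *res, const int *a, size_t n)
{
  assert (n % VF == 0);  /* this sketch has no epilogue loop */
  for (size_t i = 0; i < n; i += VF)
    {
      int v0[VF], v1[VF], even[VF], odd[VF];

      /* Two consecutive vector loads from A.  */
      for (size_t k = 0; k < VF; k++)
        {
          v0[k] = a[2 * i + k];
          v1[k] = a[2 * i + VF + k];
        }

      /* Permute: pick the even and odd elements of v0:v1.  */
      for (size_t k = 0; k < VF / 2; k++)
        {
          even[k] = v0[2 * k];
          odd[k] = v0[2 * k + 1];
          even[VF / 2 + k] = v1[2 * k];
          odd[VF / 2 + k] = v1[2 * k + 1];
        }

      /* Vector add and store.  */
      for (size_t k = 0; k < VF; k++)
        res[i + k] = even[k] + odd[k];
    }
}
```

Both functions should produce the same RES for any N that is a
multiple of VF.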

This patch series adds support for load and store instructions that have
the interleaving "built in", such as NEON's vldN and vstN.  The series
is based on the outline here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

except that I'm now using "internal" functions rather than built-ins.
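
For readers unfamiliar with these instructions, the following plain-C
model sketches the memory layout that a two-way interleaving load and
store (in the style of vld2 and vst2) work with.  The helper names and
the LANES constant are hypothetical and exist only for this example;
real code would use the target's own instructions.

```c
#include <stddef.h>

#define LANES 4  /* assumed number of elements per vector */

/* Model of a two-way interleaving store:
   mem[2k] = v0[k], mem[2k + 1] = v1[k].  */
static void
store_interleaved2 (int *mem, const int *v0, const int *v1)
{
  for (size_t k = 0; k < LANES; k++)
    {
      mem[2 * k] = v0[k];
      mem[2 * k + 1] = v1[k];
    }
}

/* Model of a two-way de-interleaving load: the inverse of the
   store above.  */
static void
load_interleaved2 (int *v0, int *v1, const int *mem)
{
  for (size_t k = 0; k < LANES; k++)
    {
      v0[k] = mem[2 * k];
      v1[k] = mem[2 * k + 1];
    }
}
```

With a single instruction doing this work, the separate permute step
of the existing scheme is no longer needed.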

I'll update my internal function patch:

    http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00609.html

after Richard's recent changes and retest, but the patches in this
series are unaffected.

Richard


* [1/9] Generalise vect_create_data_ref_ptr
@ 2011-04-12 13:25 Richard Sandiford
From: Richard Sandiford @ 2011-04-12 13:25 UTC
  To: gcc-patches; +Cc: patches

This first patch generalises vect_create_data_ref_ptr so that it can
handle array types as well as vector types.  The two cases are so
similar that it's mostly a renaming exercise.
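
The reason the two cases are so similar can be seen in a small
standalone C model.  In the patch the pointer step becomes
TYPE_SIZE_UNIT (aggr_type), so the same rule covers both kinds of
type.  The typedefs and function names below are stand-ins invented
for this example and are not GCC internals.

```c
#include <stddef.h>

/* Stand-in for a vector type such as V4SI.  */
typedef struct { int lane[4]; } vec4i;

/* Stand-in for an array-of-vectors type, as an interleaved access
   of two vectors might use.  */
typedef vec4i vec4i_x2[2];

/* Byte distance covered by advancing a pointer-to-vector by one.  */
static size_t
vector_ptr_step (vec4i *vp)
{
  return (size_t) ((char *) (vp + 1) - (char *) vp);
}

/* Byte distance covered by advancing a pointer-to-array by one.
   The rule is the same: the step is the size of the pointed-to
   aggregate.  */
static size_t
array_ptr_step (vec4i_x2 *ap)
{
  return (size_t) ((char *) (ap + 1) - (char *) ap);
}
```

In both cases the step is the size of the pointed-to type, which is
why the code needs little more than a change of names.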

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* tree-vectorizer.h (vect_create_data_ref_ptr): Add an extra
	type parameter.
	* tree-vect-data-refs.c (vect_create_data_ref_ptr): Add an aggr_type
	parameter.  Generalise code to handle arrays as well as vectors.
	(vect_setup_realignment): Update accordingly.
	* tree-vect-stmts.c (vectorizable_store): Likewise.
	(vectorizable_load): Likewise.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2011-04-12 11:53:54.000000000 +0100
+++ gcc/tree-vectorizer.h	2011-04-12 11:55:07.000000000 +0100
@@ -823,9 +823,9 @@ extern bool vect_verify_datarefs_alignme
 extern bool vect_analyze_data_ref_accesses (loop_vec_info, bb_vec_info);
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
 extern bool vect_analyze_data_refs (loop_vec_info, bb_vec_info, int *);
-extern tree vect_create_data_ref_ptr (gimple, struct loop *, tree, tree *,
-                                      gimple_stmt_iterator *, gimple *,
-                                      bool, bool *);
+extern tree vect_create_data_ref_ptr (gimple, tree, struct loop *, tree,
+				      tree *, gimple_stmt_iterator *,
+				      gimple *, bool, bool *);
 extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
 extern tree vect_create_destination_var (tree, tree);
 extern bool vect_strided_store_supported (tree);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/tree-vect-data-refs.c	2011-04-12 11:55:07.000000000 +0100
@@ -2911,32 +2911,33 @@ vect_create_addr_base_for_vector_ref (gi
 
 /* Function vect_create_data_ref_ptr.
 
-   Create a new pointer to vector type (vp), that points to the first location
-   accessed in the loop by STMT, along with the def-use update chain to
-   appropriately advance the pointer through the loop iterations. Also set
-   aliasing information for the pointer.  This vector pointer is used by the
-   callers to this function to create a memory reference expression for vector
-   load/store access.
+   Create a new pointer-to-TYPE variable (ap), that points to the first
+   location accessed in the loop by STMT, along with the def-use update
+   chain to appropriately advance the pointer through the loop iterations.
+   Also set aliasing information for the pointer.  This pointer is used by
+   the callers to this function to create a memory reference expression for
+   vector load/store access.
 
    Input:
    1. STMT: a stmt that references memory. Expected to be of the form
          GIMPLE_ASSIGN <name, data-ref> or
 	 GIMPLE_ASSIGN <data-ref, name>.
-   2. AT_LOOP: the loop where the vector memref is to be created.
-   3. OFFSET (optional): an offset to be added to the initial address accessed
+   2. AGGR_TYPE: the type of the reference, which should be either a vector
+        or an array.
+   3. AT_LOOP: the loop where the vector memref is to be created.
+   4. OFFSET (optional): an offset to be added to the initial address accessed
         by the data-ref in STMT.
-   4. BSI: location where the new stmts are to be placed if there is no loop
-   5. ONLY_INIT: indicate if vp is to be updated in the loop, or remain
+   5. BSI: location where the new stmts are to be placed if there is no loop
+   6. ONLY_INIT: indicate if ap is to be updated in the loop, or remain
         pointing to the initial address.
-   6. TYPE: if not NULL indicates the required type of the data-ref.
 
    Output:
    1. Declare a new ptr to vector_type, and have it point to the base of the
       data reference (initial addressed accessed by the data reference).
       For example, for vector of type V8HI, the following code is generated:
 
-      v8hi *vp;
-      vp = (v8hi *)initial_address;
+      v8hi *ap;
+      ap = (v8hi *)initial_address;
 
       if OFFSET is not supplied:
          initial_address = &a[init];
@@ -2956,9 +2957,10 @@ vect_create_addr_base_for_vector_ref (gi
    4. Return the pointer.  */
 
 tree
-vect_create_data_ref_ptr (gimple stmt, struct loop *at_loop, tree offset,
-			  tree *initial_address, gimple_stmt_iterator *gsi,
-			  gimple *ptr_incr, bool only_init, bool *inv_p)
+vect_create_data_ref_ptr (gimple stmt, tree aggr_type, struct loop *at_loop,
+			  tree offset, tree *initial_address,
+			  gimple_stmt_iterator *gsi, gimple *ptr_incr,
+			  bool only_init, bool *inv_p)
 {
   tree base_name;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
@@ -2966,17 +2968,16 @@ vect_create_data_ref_ptr (gimple stmt, s
   struct loop *loop = NULL;
   bool nested_in_vect_loop = false;
   struct loop *containing_loop = NULL;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  tree vect_ptr_type;
-  tree vect_ptr;
+  tree aggr_ptr_type;
+  tree aggr_ptr;
   tree new_temp;
   gimple vec_stmt;
   gimple_seq new_stmt_list = NULL;
   edge pe = NULL;
   basic_block new_bb;
-  tree vect_ptr_init;
+  tree aggr_ptr_init;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
-  tree vptr;
+  tree aptr;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   bool negative;
@@ -2986,6 +2987,9 @@ vect_create_data_ref_ptr (gimple stmt, s
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   tree base;
 
+  gcc_assert (TREE_CODE (aggr_type) == ARRAY_TYPE
+	      || TREE_CODE (aggr_type) == VECTOR_TYPE);
+
   if (loop_vinfo)
     {
       loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -3020,8 +3024,9 @@ vect_create_data_ref_ptr (gimple stmt, s
   if (vect_print_dump_info (REPORT_DETAILS))
     {
       tree data_ref_base = base_name;
-      fprintf (vect_dump, "create vector-pointer variable to type: ");
-      print_generic_expr (vect_dump, vectype, TDF_SLIM);
+      fprintf (vect_dump, "create %s-pointer variable to type: ",
+	       tree_code_name[(int) TREE_CODE (aggr_type)]);
+      print_generic_expr (vect_dump, aggr_type, TDF_SLIM);
       if (TREE_CODE (data_ref_base) == VAR_DECL
           || TREE_CODE (data_ref_base) == ARRAY_REF)
         fprintf (vect_dump, "  vectorizing an array ref: ");
@@ -3032,27 +3037,28 @@ vect_create_data_ref_ptr (gimple stmt, s
       print_generic_expr (vect_dump, base_name, TDF_SLIM);
     }
 
-  /* (1) Create the new vector-pointer variable.  */
-  vect_ptr_type = build_pointer_type (vectype);
+  /* (1) Create the new aggregate-pointer variable.  */
+  aggr_ptr_type = build_pointer_type (aggr_type);
   base = get_base_address (DR_REF (dr));
   if (base
       && TREE_CODE (base) == MEM_REF)
-    vect_ptr_type
-      = build_qualified_type (vect_ptr_type,
+    aggr_ptr_type
+      = build_qualified_type (aggr_ptr_type,
 			      TYPE_QUALS (TREE_TYPE (TREE_OPERAND (base, 0))));
-  vect_ptr = vect_get_new_vect_var (vect_ptr_type, vect_pointer_var,
+  aggr_ptr = vect_get_new_vect_var (aggr_ptr_type, vect_pointer_var,
                                     get_name (base_name));
 
-  /* Vector types inherit the alias set of their component type by default so
-     we need to use a ref-all pointer if the data reference does not conflict
-     with the created vector data reference because it is not addressable.  */
-  if (!alias_sets_conflict_p (get_deref_alias_set (vect_ptr),
+  /* Vector and array types inherit the alias set of their component
+     type by default so we need to use a ref-all pointer if the data
+     reference does not conflict with the created aggregated data
+     reference because it is not addressable.  */
+  if (!alias_sets_conflict_p (get_deref_alias_set (aggr_ptr),
 			      get_alias_set (DR_REF (dr))))
     {
-      vect_ptr_type
-	= build_pointer_type_for_mode (vectype,
-				       TYPE_MODE (vect_ptr_type), true);
-      vect_ptr = vect_get_new_vect_var (vect_ptr_type, vect_pointer_var,
+      aggr_ptr_type
+	= build_pointer_type_for_mode (aggr_type,
+				       TYPE_MODE (aggr_ptr_type), true);
+      aggr_ptr = vect_get_new_vect_var (aggr_ptr_type, vect_pointer_var,
 					get_name (base_name));
     }
 
@@ -3063,14 +3069,14 @@ vect_create_data_ref_ptr (gimple stmt, s
       do
 	{
 	  tree lhs = gimple_assign_lhs (orig_stmt);
-	  if (!alias_sets_conflict_p (get_deref_alias_set (vect_ptr),
+	  if (!alias_sets_conflict_p (get_deref_alias_set (aggr_ptr),
 				      get_alias_set (lhs)))
 	    {
-	      vect_ptr_type
-		= build_pointer_type_for_mode (vectype,
-					       TYPE_MODE (vect_ptr_type), true);
-	      vect_ptr
-		= vect_get_new_vect_var (vect_ptr_type, vect_pointer_var,
+	      aggr_ptr_type
+		= build_pointer_type_for_mode (aggr_type,
+					       TYPE_MODE (aggr_ptr_type), true);
+	      aggr_ptr
+		= vect_get_new_vect_var (aggr_ptr_type, vect_pointer_var,
 					 get_name (base_name));
 	      break;
 	    }
@@ -3080,7 +3086,7 @@ vect_create_data_ref_ptr (gimple stmt, s
       while (orig_stmt);
     }
 
-  add_referenced_var (vect_ptr);
+  add_referenced_var (aggr_ptr);
 
   /* Note: If the dataref is in an inner-loop nested in LOOP, and we are
      vectorizing LOOP (i.e., outer-loop vectorization), we need to create two
@@ -3113,8 +3119,8 @@ vect_create_data_ref_ptr (gimple stmt, s
 		vp2 = vp1 + step
 		if () goto LOOP   */
 
-  /* (2) Calculate the initial address the vector-pointer, and set
-         the vector-pointer to point to it before the loop.  */
+  /* (2) Calculate the initial address of the aggregate-pointer, and set
+     the aggregate-pointer to point to it before the loop.  */
 
   /* Create: (&(base[init_val+offset]) in the loop preheader.  */
 
@@ -3133,17 +3139,17 @@ vect_create_data_ref_ptr (gimple stmt, s
 
   *initial_address = new_temp;
 
-  /* Create: p = (vectype *) initial_base  */
+  /* Create: p = (aggr_type *) initial_base  */
   if (TREE_CODE (new_temp) != SSA_NAME
-      || !useless_type_conversion_p (vect_ptr_type, TREE_TYPE (new_temp)))
+      || !useless_type_conversion_p (aggr_ptr_type, TREE_TYPE (new_temp)))
     {
-      vec_stmt = gimple_build_assign (vect_ptr,
-				      fold_convert (vect_ptr_type, new_temp));
-      vect_ptr_init = make_ssa_name (vect_ptr, vec_stmt);
+      vec_stmt = gimple_build_assign (aggr_ptr,
+				      fold_convert (aggr_ptr_type, new_temp));
+      aggr_ptr_init = make_ssa_name (aggr_ptr, vec_stmt);
       /* Copy the points-to information if it exists. */
       if (DR_PTR_INFO (dr))
-	duplicate_ssa_name_ptr_info (vect_ptr_init, DR_PTR_INFO (dr));
-      gimple_assign_set_lhs (vec_stmt, vect_ptr_init);
+	duplicate_ssa_name_ptr_info (aggr_ptr_init, DR_PTR_INFO (dr));
+      gimple_assign_set_lhs (vec_stmt, aggr_ptr_init);
       if (pe)
 	{
 	  new_bb = gsi_insert_on_edge_immediate (pe, vec_stmt);
@@ -3153,19 +3159,19 @@ vect_create_data_ref_ptr (gimple stmt, s
 	gsi_insert_before (gsi, vec_stmt, GSI_SAME_STMT);
     }
   else
-    vect_ptr_init = new_temp;
+    aggr_ptr_init = new_temp;
 
-  /* (3) Handle the updating of the vector-pointer inside the loop.
+  /* (3) Handle the updating of the aggregate-pointer inside the loop.
      This is needed when ONLY_INIT is false, and also when AT_LOOP is the
      inner-loop nested in LOOP (during outer-loop vectorization).  */
 
   /* No update in loop is required.  */
   if (only_init && (!loop_vinfo || at_loop == loop))
-    vptr = vect_ptr_init;
+    aptr = aggr_ptr_init;
   else
     {
-      /* The step of the vector pointer is the Vector Size.  */
-      tree step = TYPE_SIZE_UNIT (vectype);
+      /* The step of the aggregate pointer is the type size.  */
+      tree step = TYPE_SIZE_UNIT (aggr_type);
       /* One exception to the above is when the scalar step of the load in
 	 LOOP is zero. In this case the step here is also zero.  */
       if (*inv_p)
@@ -3175,9 +3181,9 @@ vect_create_data_ref_ptr (gimple stmt, s
 
       standard_iv_increment_position (loop, &incr_gsi, &insert_after);
 
-      create_iv (vect_ptr_init,
-		 fold_convert (vect_ptr_type, step),
-		 vect_ptr, loop, &incr_gsi, insert_after,
+      create_iv (aggr_ptr_init,
+		 fold_convert (aggr_ptr_type, step),
+		 aggr_ptr, loop, &incr_gsi, insert_after,
 		 &indx_before_incr, &indx_after_incr);
       incr = gsi_stmt (incr_gsi);
       set_vinfo_for_stmt (incr, new_stmt_vec_info (incr, loop_vinfo, NULL));
@@ -3191,14 +3197,14 @@ vect_create_data_ref_ptr (gimple stmt, s
       if (ptr_incr)
 	*ptr_incr = incr;
 
-      vptr = indx_before_incr;
+      aptr = indx_before_incr;
     }
 
   if (!nested_in_vect_loop || only_init)
-    return vptr;
+    return aptr;
 
 
-  /* (4) Handle the updating of the vector-pointer inside the inner-loop
+  /* (4) Handle the updating of the aggregate-pointer inside the inner-loop
      nested in LOOP, if exists.  */
 
   gcc_assert (nested_in_vect_loop);
@@ -3206,7 +3212,7 @@ vect_create_data_ref_ptr (gimple stmt, s
     {
       standard_iv_increment_position (containing_loop, &incr_gsi,
 				      &insert_after);
-      create_iv (vptr, fold_convert (vect_ptr_type, DR_STEP (dr)), vect_ptr,
+      create_iv (aptr, fold_convert (aggr_ptr_type, DR_STEP (dr)), aggr_ptr,
 		 containing_loop, &incr_gsi, insert_after, &indx_before_incr,
 		 &indx_after_incr);
       incr = gsi_stmt (incr_gsi);
@@ -3674,8 +3680,9 @@ vect_setup_realignment (gimple stmt, gim
 
       gcc_assert (!compute_in_loop);
       vec_dest = vect_create_destination_var (scalar_dest, vectype);
-      ptr = vect_create_data_ref_ptr (stmt, loop_for_initial_load, NULL_TREE,
-				      &init_addr, NULL, &inc, true, &inv_p);
+      ptr = vect_create_data_ref_ptr (stmt, vectype, loop_for_initial_load,
+				      NULL_TREE, &init_addr, NULL, &inc,
+				      true, &inv_p);
       new_stmt = gimple_build_assign_with_ops
 		   (BIT_AND_EXPR, NULL_TREE, ptr,
 		    build_int_cst (TREE_TYPE (ptr),
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2011-04-12 11:55:07.000000000 +0100
+++ gcc/tree-vect-stmts.c	2011-04-12 11:55:07.000000000 +0100
@@ -3581,9 +3581,9 @@ vectorizable_store (gimple stmt, gimple_
 	  /* We should have catched mismatched types earlier.  */
 	  gcc_assert (useless_type_conversion_p (vectype,
 						 TREE_TYPE (vec_oprnd)));
-	  dataref_ptr = vect_create_data_ref_ptr (first_stmt, NULL, NULL_TREE,
-						  &dummy, gsi, &ptr_incr, false,
-						  &inv_p);
+	  dataref_ptr = vect_create_data_ref_ptr (first_stmt, vectype, NULL,
+						  NULL_TREE, &dummy, gsi,
+						  &ptr_incr, false, &inv_p);
 	  gcc_assert (bb_vinfo || !inv_p);
 	}
       else
@@ -4109,9 +4109,9 @@ vectorizable_load (gimple stmt, gimple_s
     {
       /* 1. Create the vector pointer update chain.  */
       if (j == 0)
-        dataref_ptr = vect_create_data_ref_ptr (first_stmt, at_loop, offset,
-						&dummy, gsi, &ptr_incr, false,
-						&inv_p);
+        dataref_ptr = vect_create_data_ref_ptr (first_stmt, vectype, at_loop,
+						offset, &dummy, gsi,
+						&ptr_incr, false, &inv_p);
       else
         dataref_ptr =
 		bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);


* [2/9] Reindent parts of vectorizable_load and vectorizable_store
@ 2011-04-12 13:28 Richard Sandiford
From: Richard Sandiford @ 2011-04-12 13:28 UTC
  To: gcc-patches; +Cc: patches

This patch just reindents parts of vectorizable_load and
vectorizable_store so that the main diff is easier to read.  It also
stores the element type, TREE_TYPE (vectype), in a local variable,
which seemed better than breaking the long lines.

I've included both the real diff and a -b version.

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* tree-vect-stmts.c (vectorizable_store): Store the element type
	in a local variable.  Indent generation of per-vector memory accesses.
	(vectorizable_load): Likewise.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2011-04-12 11:55:08.000000000 +0100
+++ gcc/tree-vect-stmts.c	2011-04-12 11:55:08.000000000 +0100
@@ -3308,6 +3308,7 @@ vectorizable_store (gimple stmt, gimple_
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree elem_type;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
   enum machine_mode vec_mode;
@@ -3383,7 +3384,8 @@ vectorizable_store (gimple stmt, gimple_
 
   /* The scalar rhs type needs to be trivially convertible to the vector
      component type.  This should always be the case.  */
-  if (!useless_type_conversion_p (TREE_TYPE (vectype), TREE_TYPE (op)))
+  elem_type = TREE_TYPE (vectype);
+  if (!useless_type_conversion_p (elem_type, TREE_TYPE (op)))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "???  operands of different types");
@@ -3608,72 +3610,75 @@ vectorizable_store (gimple stmt, gimple_
 		bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
 	}
 
-      if (strided_store)
+      if (1)
 	{
-	  result_chain = VEC_alloc (tree, heap, group_size);
-	  /* Permute.  */
-	  if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
-					 &result_chain))
-	    return false;
-	}
-
-      next_stmt = first_stmt;
-      for (i = 0; i < vec_num; i++)
-	{
-	  struct ptr_info_def *pi;
-
-	  if (i > 0)
-	    /* Bump the vector pointer.  */
-	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
-					   NULL_TREE);
-
-	  if (slp)
-	    vec_oprnd = VEC_index (tree, vec_oprnds, i);
-	  else if (strided_store)
-	    /* For strided stores vectorized defs are interleaved in
-	       vect_permute_store_chain().  */
-	    vec_oprnd = VEC_index (tree, result_chain, i);
-
-	  data_ref = build2 (MEM_REF, TREE_TYPE (vec_oprnd), dataref_ptr,
-			     build_int_cst (reference_alias_ptr_type
-					    (DR_REF (first_dr)), 0));
-	  pi = get_ptr_info (dataref_ptr);
-	  pi->align = TYPE_ALIGN_UNIT (vectype);
-          if (aligned_access_p (first_dr))
-	    pi->misalign = 0;
-          else if (DR_MISALIGNMENT (first_dr) == -1)
+	  if (strided_store)
 	    {
-	      TREE_TYPE (data_ref)
-		= build_aligned_type (TREE_TYPE (data_ref),
-				      TYPE_ALIGN (TREE_TYPE (vectype)));
-	      pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
-	      pi->misalign = 0;
+	      result_chain = VEC_alloc (tree, heap, group_size);
+	      /* Permute.  */
+	      if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
+					     &result_chain))
+		return false;
 	    }
-	  else
+
+	  next_stmt = first_stmt;
+	  for (i = 0; i < vec_num; i++)
 	    {
-	      TREE_TYPE (data_ref)
-		= build_aligned_type (TREE_TYPE (data_ref),
-				      TYPE_ALIGN (TREE_TYPE (vectype)));
-	      pi->misalign = DR_MISALIGNMENT (first_dr);
-	    }
+	      struct ptr_info_def *pi;
 
-	  /* Arguments are ready.  Create the new vector stmt.  */
-	  new_stmt = gimple_build_assign (data_ref, vec_oprnd);
-	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	  mark_symbols_for_renaming (new_stmt);
+	      if (i > 0)
+		/* Bump the vector pointer.  */
+		dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
+					       stmt, NULL_TREE);
+
+	      if (slp)
+		vec_oprnd = VEC_index (tree, vec_oprnds, i);
+	      else if (strided_store)
+		/* For strided stores vectorized defs are interleaved in
+		   vect_permute_store_chain().  */
+		vec_oprnd = VEC_index (tree, result_chain, i);
+
+	      data_ref = build2 (MEM_REF, TREE_TYPE (vec_oprnd), dataref_ptr,
+				 build_int_cst (reference_alias_ptr_type
+						(DR_REF (first_dr)), 0));
+	      pi = get_ptr_info (dataref_ptr);
+	      pi->align = TYPE_ALIGN_UNIT (vectype);
+	      if (aligned_access_p (first_dr))
+		pi->misalign = 0;
+	      else if (DR_MISALIGNMENT (first_dr) == -1)
+		{
+		  TREE_TYPE (data_ref)
+		    = build_aligned_type (TREE_TYPE (data_ref),
+					  TYPE_ALIGN (elem_type));
+		  pi->align = TYPE_ALIGN_UNIT (elem_type);
+		  pi->misalign = 0;
+		}
+	      else
+		{
+		  TREE_TYPE (data_ref)
+		    = build_aligned_type (TREE_TYPE (data_ref),
+					  TYPE_ALIGN (elem_type));
+		  pi->misalign = DR_MISALIGNMENT (first_dr);
+		}
 
-          if (slp)
-            continue;
+	      /* Arguments are ready.  Create the new vector stmt.  */
+	      new_stmt = gimple_build_assign (data_ref, vec_oprnd);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      mark_symbols_for_renaming (new_stmt);
 
-          if (j == 0)
-            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
-	  else
-	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	      if (slp)
+		continue;
+
+	      if (j == 0)
+		STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
+	      else
+		STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
 
-	  prev_stmt_info = vinfo_for_stmt (new_stmt);
-	  next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
-	  if (!next_stmt)
-	    break;
+	      prev_stmt_info = vinfo_for_stmt (new_stmt);
+	      next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
+	      if (!next_stmt)
+		break;
+	    }
 	}
     }
 
@@ -3784,6 +3789,7 @@ vectorizable_load (gimple stmt, gimple_s
   bool nested_in_vect_loop = false;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree elem_type;
   tree new_temp;
   enum machine_mode mode;
   gimple new_stmt = NULL;
@@ -3888,7 +3894,8 @@ vectorizable_load (gimple stmt, gimple_s
 
   /* The vector component type needs to be trivially convertible to the
      scalar lhs.  This should always be the case.  */
-  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), TREE_TYPE (vectype)))
+  elem_type = TREE_TYPE (vectype);
+  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), elem_type))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "???  operands of different types");
@@ -4117,193 +4124,205 @@ vectorizable_load (gimple stmt, gimple_s
       if (strided_load || slp_perm)
 	dr_chain = VEC_alloc (tree, heap, vec_num);
 
-      for (i = 0; i < vec_num; i++)
+      if (1)
 	{
-	  if (i > 0)
-	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
-					   NULL_TREE);
-
-	  /* 2. Create the vector-load in the loop.  */
-	  switch (alignment_support_scheme)
+	  for (i = 0; i < vec_num; i++)
 	    {
-	    case dr_aligned:
-	    case dr_unaligned_supported:
-	      {
-		struct ptr_info_def *pi;
-		data_ref
-		  = build2 (MEM_REF, vectype, dataref_ptr,
-			    build_int_cst (reference_alias_ptr_type
-					   (DR_REF (first_dr)), 0));
-		pi = get_ptr_info (dataref_ptr);
-		pi->align = TYPE_ALIGN_UNIT (vectype);
-		if (alignment_support_scheme == dr_aligned)
-		  {
-		    gcc_assert (aligned_access_p (first_dr));
-		    pi->misalign = 0;
-		  }
-		else if (DR_MISALIGNMENT (first_dr) == -1)
+	      if (i > 0)
+		dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
+					       stmt, NULL_TREE);
+
+	      /* 2. Create the vector-load in the loop.  */
+	      switch (alignment_support_scheme)
+		{
+		case dr_aligned:
+		case dr_unaligned_supported:
 		  {
-		    TREE_TYPE (data_ref)
-		      = build_aligned_type (TREE_TYPE (data_ref),
-					    TYPE_ALIGN (TREE_TYPE (vectype)));
-		    pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
-		    pi->misalign = 0;
+		    struct ptr_info_def *pi;
+		    data_ref
+		      = build2 (MEM_REF, vectype, dataref_ptr,
+				build_int_cst (reference_alias_ptr_type
+					       (DR_REF (first_dr)), 0));
+		    pi = get_ptr_info (dataref_ptr);
+		    pi->align = TYPE_ALIGN_UNIT (vectype);
+		    if (alignment_support_scheme == dr_aligned)
+		      {
+			gcc_assert (aligned_access_p (first_dr));
+			pi->misalign = 0;
+		      }
+		    else if (DR_MISALIGNMENT (first_dr) == -1)
+		      {
+			TREE_TYPE (data_ref)
+			  = build_aligned_type (TREE_TYPE (data_ref),
+						TYPE_ALIGN (elem_type));
+			pi->align = TYPE_ALIGN_UNIT (elem_type);
+			pi->misalign = 0;
+		      }
+		    else
+		      {
+			TREE_TYPE (data_ref)
+			  = build_aligned_type (TREE_TYPE (data_ref),
+						TYPE_ALIGN (elem_type));
+			pi->misalign = DR_MISALIGNMENT (first_dr);
+		      }
+		    break;
 		  }
-		else
+		case dr_explicit_realign:
 		  {
-		    TREE_TYPE (data_ref)
-		      = build_aligned_type (TREE_TYPE (data_ref),
-					    TYPE_ALIGN (TREE_TYPE (vectype)));
-		    pi->misalign = DR_MISALIGNMENT (first_dr);
+		    tree ptr, bump;
+		    tree vs_minus_1;
+
+		    vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
+
+		    if (compute_in_loop)
+		      msq = vect_setup_realignment (first_stmt, gsi,
+						    &realignment_token,
+						    dr_explicit_realign,
+						    dataref_ptr, NULL);
+
+		    new_stmt = gimple_build_assign_with_ops
+				 (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
+				  build_int_cst
+				  (TREE_TYPE (dataref_ptr),
+				   -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
+		    ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
+		    gimple_assign_set_lhs (new_stmt, ptr);
+		    vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		    data_ref
+		      = build2 (MEM_REF, vectype, ptr,
+				build_int_cst (reference_alias_ptr_type
+						 (DR_REF (first_dr)), 0));
+		    vec_dest = vect_create_destination_var (scalar_dest,
+							    vectype);
+		    new_stmt = gimple_build_assign (vec_dest, data_ref);
+		    new_temp = make_ssa_name (vec_dest, new_stmt);
+		    gimple_assign_set_lhs (new_stmt, new_temp);
+		    gimple_set_vdef (new_stmt, gimple_vdef (stmt));
+		    gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+		    vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		    msq = new_temp;
+
+		    bump = size_binop (MULT_EXPR, vs_minus_1,
+				       TYPE_SIZE_UNIT (scalar_type));
+		    ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
+		    new_stmt = gimple_build_assign_with_ops
+				 (BIT_AND_EXPR, NULL_TREE, ptr,
+				  build_int_cst
+				  (TREE_TYPE (ptr),
+				   -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
+		    ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
+		    gimple_assign_set_lhs (new_stmt, ptr);
+		    vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		    data_ref
+		      = build2 (MEM_REF, vectype, ptr,
+				build_int_cst (reference_alias_ptr_type
+						 (DR_REF (first_dr)), 0));
+		    break;
 		  }
-		break;
-	      }
-	    case dr_explicit_realign:
-	      {
-		tree ptr, bump;
-		tree vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
-
-		if (compute_in_loop)
-		  msq = vect_setup_realignment (first_stmt, gsi,
-						&realignment_token,
-						dr_explicit_realign,
-						dataref_ptr, NULL);
-
-		new_stmt = gimple_build_assign_with_ops
-			     (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
-			      build_int_cst
-			        (TREE_TYPE (dataref_ptr),
-				 -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
-		ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
-		gimple_assign_set_lhs (new_stmt, ptr);
-		vect_finish_stmt_generation (stmt, new_stmt, gsi);
-		data_ref
-		  = build2 (MEM_REF, vectype, ptr,
-			    build_int_cst (reference_alias_ptr_type
-					     (DR_REF (first_dr)), 0));
-		vec_dest = vect_create_destination_var (scalar_dest, vectype);
-		new_stmt = gimple_build_assign (vec_dest, data_ref);
-		new_temp = make_ssa_name (vec_dest, new_stmt);
-		gimple_assign_set_lhs (new_stmt, new_temp);
-		gimple_set_vdef (new_stmt, gimple_vdef (stmt));
-		gimple_set_vuse (new_stmt, gimple_vuse (stmt));
-		vect_finish_stmt_generation (stmt, new_stmt, gsi);
-		msq = new_temp;
-
-		bump = size_binop (MULT_EXPR, vs_minus_1,
-				   TYPE_SIZE_UNIT (scalar_type));
-		ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
-		new_stmt = gimple_build_assign_with_ops
-			     (BIT_AND_EXPR, NULL_TREE, ptr,
-			      build_int_cst
-			        (TREE_TYPE (ptr),
-				 -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
-		ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
-		gimple_assign_set_lhs (new_stmt, ptr);
-		vect_finish_stmt_generation (stmt, new_stmt, gsi);
-		data_ref
-		  = build2 (MEM_REF, vectype, ptr,
-			    build_int_cst (reference_alias_ptr_type
-					     (DR_REF (first_dr)), 0));
-	        break;
-	      }
-	    case dr_explicit_realign_optimized:
-	      new_stmt = gimple_build_assign_with_ops
-			   (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
-			    build_int_cst
-			      (TREE_TYPE (dataref_ptr),
-			       -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
-	      new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
-	      gimple_assign_set_lhs (new_stmt, new_temp);
-	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	      data_ref
-		= build2 (MEM_REF, vectype, new_temp,
-			  build_int_cst (reference_alias_ptr_type
-					   (DR_REF (first_dr)), 0));
-	      break;
-	    default:
-	      gcc_unreachable ();
-	    }
-	  vec_dest = vect_create_destination_var (scalar_dest, vectype);
-	  new_stmt = gimple_build_assign (vec_dest, data_ref);
-	  new_temp = make_ssa_name (vec_dest, new_stmt);
-	  gimple_assign_set_lhs (new_stmt, new_temp);
-	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	  mark_symbols_for_renaming (new_stmt);
-
-	  /* 3. Handle explicit realignment if necessary/supported.  Create in
-		loop: vec_dest = realign_load (msq, lsq, realignment_token)  */
-	  if (alignment_support_scheme == dr_explicit_realign_optimized
-	      || alignment_support_scheme == dr_explicit_realign)
-	    {
-	      lsq = gimple_assign_lhs (new_stmt);
-	      if (!realignment_token)
-		realignment_token = dataref_ptr;
+		case dr_explicit_realign_optimized:
+		  new_stmt = gimple_build_assign_with_ops
+			       (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
+				build_int_cst
+				  (TREE_TYPE (dataref_ptr),
+				   -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
+		  new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr),
+					    new_stmt);
+		  gimple_assign_set_lhs (new_stmt, new_temp);
+		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		  data_ref
+		    = build2 (MEM_REF, vectype, new_temp,
+			      build_int_cst (reference_alias_ptr_type
+					       (DR_REF (first_dr)), 0));
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
 	      vec_dest = vect_create_destination_var (scalar_dest, vectype);
-	      new_stmt
-		= gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR, vec_dest,
-						 msq, lsq, realignment_token);
+	      new_stmt = gimple_build_assign (vec_dest, data_ref);
 	      new_temp = make_ssa_name (vec_dest, new_stmt);
 	      gimple_assign_set_lhs (new_stmt, new_temp);
 	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      mark_symbols_for_renaming (new_stmt);
 
-	      if (alignment_support_scheme == dr_explicit_realign_optimized)
+	      /* 3. Handle explicit realignment if necessary/supported.
+		 Create in loop:
+		   vec_dest = realign_load (msq, lsq, realignment_token)  */
+	      if (alignment_support_scheme == dr_explicit_realign_optimized
+		  || alignment_support_scheme == dr_explicit_realign)
 		{
-		  gcc_assert (phi);
-		  if (i == vec_num - 1 && j == ncopies - 1)
-		    add_phi_arg (phi, lsq, loop_latch_edge (containing_loop),
-				 UNKNOWN_LOCATION);
-		  msq = lsq;
+		  lsq = gimple_assign_lhs (new_stmt);
+		  if (!realignment_token)
+		    realignment_token = dataref_ptr;
+		  vec_dest = vect_create_destination_var (scalar_dest, vectype);
+		  new_stmt
+		    = gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR,
+						     vec_dest, msq, lsq,
+						     realignment_token);
+		  new_temp = make_ssa_name (vec_dest, new_stmt);
+		  gimple_assign_set_lhs (new_stmt, new_temp);
+		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+		  if (alignment_support_scheme == dr_explicit_realign_optimized)
+		    {
+		      gcc_assert (phi);
+		      if (i == vec_num - 1 && j == ncopies - 1)
+			add_phi_arg (phi, lsq,
+				     loop_latch_edge (containing_loop),
+				     UNKNOWN_LOCATION);
+		      msq = lsq;
+		    }
 		}
-	    }
 
-	  /* 4. Handle invariant-load.  */
-	  if (inv_p && !bb_vinfo)
-	    {
-	      gcc_assert (!strided_load);
-	      gcc_assert (nested_in_vect_loop_p (loop, stmt));
-	      if (j == 0)
+	      /* 4. Handle invariant-load.  */
+	      if (inv_p && !bb_vinfo)
 		{
-		  int k;
-		  tree t = NULL_TREE;
-		  tree vec_inv, bitpos, bitsize = TYPE_SIZE (scalar_type);
-
-		  /* CHECKME: bitpos depends on endianess?  */
-		  bitpos = bitsize_zero_node;
-		  vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
-				    bitsize, bitpos);
-		  vec_dest =
-			vect_create_destination_var (scalar_dest, NULL_TREE);
-		  new_stmt = gimple_build_assign (vec_dest, vec_inv);
-                  new_temp = make_ssa_name (vec_dest, new_stmt);
-		  gimple_assign_set_lhs (new_stmt, new_temp);
-		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		  gcc_assert (!strided_load);
+		  gcc_assert (nested_in_vect_loop_p (loop, stmt));
+		  if (j == 0)
+		    {
+		      int k;
+		      tree t = NULL_TREE;
+		      tree vec_inv, bitpos, bitsize = TYPE_SIZE (scalar_type);
+
+		      /* CHECKME: bitpos depends on endianess?  */
+		      bitpos = bitsize_zero_node;
+		      vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
+					bitsize, bitpos);
+		      vec_dest = vect_create_destination_var (scalar_dest,
+							      NULL_TREE);
+		      new_stmt = gimple_build_assign (vec_dest, vec_inv);
+		      new_temp = make_ssa_name (vec_dest, new_stmt);
+		      gimple_assign_set_lhs (new_stmt, new_temp);
+		      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+		      for (k = nunits - 1; k >= 0; --k)
+			t = tree_cons (NULL_TREE, new_temp, t);
+		      /* FIXME: use build_constructor directly.  */
+		      vec_inv = build_constructor_from_list (vectype, t);
+		      new_temp = vect_init_vector (stmt, vec_inv,
+						   vectype, gsi);
+		      new_stmt = SSA_NAME_DEF_STMT (new_temp);
+		    }
+		  else
+		    gcc_unreachable (); /* FORNOW. */
+		}
 
-		  for (k = nunits - 1; k >= 0; --k)
-		    t = tree_cons (NULL_TREE, new_temp, t);
-		  /* FIXME: use build_constructor directly.  */
-		  vec_inv = build_constructor_from_list (vectype, t);
-		  new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
+	      if (negative)
+		{
+		  new_temp = reverse_vec_elements (new_temp, stmt, gsi);
 		  new_stmt = SSA_NAME_DEF_STMT (new_temp);
 		}
-	      else
-		gcc_unreachable (); /* FORNOW. */
-	    }
 
-	  if (negative)
-	    {
-	      new_temp = reverse_vec_elements (new_temp, stmt, gsi);
-	      new_stmt = SSA_NAME_DEF_STMT (new_temp);
+	      /* Collect vector loads and later create their permutation in
+		 vect_transform_strided_load ().  */
+	      if (strided_load || slp_perm)
+		VEC_quick_push (tree, dr_chain, new_temp);
+
+	      /* Store vector loads in the corresponding SLP_NODE.  */
+	      if (slp && !slp_perm)
+		VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node),
+				new_stmt);
 	    }
-
-	  /* Collect vector loads and later create their permutation in
-	     vect_transform_strided_load ().  */
-          if (strided_load || slp_perm)
-            VEC_quick_push (tree, dr_chain, new_temp);
-
-         /* Store vector loads in the corresponding SLP_NODE.  */
-	  if (slp && !slp_perm)
-	    VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
 	}
 
       if (slp && !slp_perm)
@@ -4322,7 +4341,8 @@ vectorizable_load (gimple stmt, gimple_s
         {
           if (strided_load)
   	    {
-	      if (!vect_transform_strided_load (stmt, dr_chain, group_size, gsi))
+	      if (!vect_transform_strided_load (stmt, dr_chain,
+						group_size, gsi))
 	        return false;
 
 	      *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2011-04-12 14:27:00.000000000 +0100
+++ gcc/tree-vect-stmts.c	2011-04-12 14:27:02.000000000 +0100
@@ -3308,6 +3308,7 @@ vectorizable_store (gimple stmt, gimple_
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree elem_type;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
   enum machine_mode vec_mode;
@@ -3383,7 +3384,8 @@ vectorizable_store (gimple stmt, gimple_
 
   /* The scalar rhs type needs to be trivially convertible to the vector
      component type.  This should always be the case.  */
-  if (!useless_type_conversion_p (TREE_TYPE (vectype), TREE_TYPE (op)))
+  elem_type = TREE_TYPE (vectype);
+  if (!useless_type_conversion_p (elem_type, TREE_TYPE (op)))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "???  operands of different types");
@@ -3608,6 +3610,8 @@ vectorizable_store (gimple stmt, gimple_
 		bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
 	}
 
+      if (1)
+	{
       if (strided_store)
 	{
 	  result_chain = VEC_alloc (tree, heap, group_size);
@@ -3624,8 +3628,8 @@ vectorizable_store (gimple stmt, gimple_
 
 	  if (i > 0)
 	    /* Bump the vector pointer.  */
-	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
-					   NULL_TREE);
+		dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
+					       stmt, NULL_TREE);
 
 	  if (slp)
 	    vec_oprnd = VEC_index (tree, vec_oprnds, i);
@@ -3645,15 +3649,15 @@ vectorizable_store (gimple stmt, gimple_
 	    {
 	      TREE_TYPE (data_ref)
 		= build_aligned_type (TREE_TYPE (data_ref),
-				      TYPE_ALIGN (TREE_TYPE (vectype)));
-	      pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
+					  TYPE_ALIGN (elem_type));
+		  pi->align = TYPE_ALIGN_UNIT (elem_type);
 	      pi->misalign = 0;
 	    }
 	  else
 	    {
 	      TREE_TYPE (data_ref)
 		= build_aligned_type (TREE_TYPE (data_ref),
-				      TYPE_ALIGN (TREE_TYPE (vectype)));
+					  TYPE_ALIGN (elem_type));
 	      pi->misalign = DR_MISALIGNMENT (first_dr);
 	    }
 
@@ -3676,6 +3680,7 @@ vectorizable_store (gimple stmt, gimple_
 	    break;
 	}
     }
+    }
 
   VEC_free (tree, heap, dr_chain);
   VEC_free (tree, heap, oprnds);
@@ -3784,6 +3789,7 @@ vectorizable_load (gimple stmt, gimple_s
   bool nested_in_vect_loop = false;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree elem_type;
   tree new_temp;
   enum machine_mode mode;
   gimple new_stmt = NULL;
@@ -3888,7 +3894,8 @@ vectorizable_load (gimple stmt, gimple_s
 
   /* The vector component type needs to be trivially convertible to the
      scalar lhs.  This should always be the case.  */
-  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), TREE_TYPE (vectype)))
+  elem_type = TREE_TYPE (vectype);
+  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), elem_type))
     {
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "???  operands of different types");
@@ -4117,11 +4124,13 @@ vectorizable_load (gimple stmt, gimple_s
       if (strided_load || slp_perm)
 	dr_chain = VEC_alloc (tree, heap, vec_num);
 
+      if (1)
+	{
       for (i = 0; i < vec_num; i++)
 	{
 	  if (i > 0)
-	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
-					   NULL_TREE);
+		dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
+					       stmt, NULL_TREE);
 
 	  /* 2. Create the vector-load in the loop.  */
 	  switch (alignment_support_scheme)
@@ -4145,15 +4154,15 @@ vectorizable_load (gimple stmt, gimple_s
 		  {
 		    TREE_TYPE (data_ref)
 		      = build_aligned_type (TREE_TYPE (data_ref),
-					    TYPE_ALIGN (TREE_TYPE (vectype)));
-		    pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
+						TYPE_ALIGN (elem_type));
+			pi->align = TYPE_ALIGN_UNIT (elem_type);
 		    pi->misalign = 0;
 		  }
 		else
 		  {
 		    TREE_TYPE (data_ref)
 		      = build_aligned_type (TREE_TYPE (data_ref),
-					    TYPE_ALIGN (TREE_TYPE (vectype)));
+						TYPE_ALIGN (elem_type));
 		    pi->misalign = DR_MISALIGNMENT (first_dr);
 		  }
 		break;
@@ -4161,7 +4170,9 @@ vectorizable_load (gimple stmt, gimple_s
 	    case dr_explicit_realign:
 	      {
 		tree ptr, bump;
-		tree vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
+		    tree vs_minus_1;
+
+		    vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
 
 		if (compute_in_loop)
 		  msq = vect_setup_realignment (first_stmt, gsi,
@@ -4181,7 +4192,8 @@ vectorizable_load (gimple stmt, gimple_s
 		  = build2 (MEM_REF, vectype, ptr,
 			    build_int_cst (reference_alias_ptr_type
 					     (DR_REF (first_dr)), 0));
-		vec_dest = vect_create_destination_var (scalar_dest, vectype);
+		    vec_dest = vect_create_destination_var (scalar_dest,
+							    vectype);
 		new_stmt = gimple_build_assign (vec_dest, data_ref);
 		new_temp = make_ssa_name (vec_dest, new_stmt);
 		gimple_assign_set_lhs (new_stmt, new_temp);
@@ -4213,7 +4225,8 @@ vectorizable_load (gimple stmt, gimple_s
 			    build_int_cst
 			      (TREE_TYPE (dataref_ptr),
 			       -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
-	      new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
+		  new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr),
+					    new_stmt);
 	      gimple_assign_set_lhs (new_stmt, new_temp);
 	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
 	      data_ref
@@ -4231,8 +4244,9 @@ vectorizable_load (gimple stmt, gimple_s
 	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
 	  mark_symbols_for_renaming (new_stmt);
 
-	  /* 3. Handle explicit realignment if necessary/supported.  Create in
-		loop: vec_dest = realign_load (msq, lsq, realignment_token)  */
+	      /* 3. Handle explicit realignment if necessary/supported.
+		 Create in loop:
+		   vec_dest = realign_load (msq, lsq, realignment_token)  */
 	  if (alignment_support_scheme == dr_explicit_realign_optimized
 	      || alignment_support_scheme == dr_explicit_realign)
 	    {
@@ -4241,8 +4255,9 @@ vectorizable_load (gimple stmt, gimple_s
 		realignment_token = dataref_ptr;
 	      vec_dest = vect_create_destination_var (scalar_dest, vectype);
 	      new_stmt
-		= gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR, vec_dest,
-						 msq, lsq, realignment_token);
+		    = gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR,
+						     vec_dest, msq, lsq,
+						     realignment_token);
 	      new_temp = make_ssa_name (vec_dest, new_stmt);
 	      gimple_assign_set_lhs (new_stmt, new_temp);
 	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
@@ -4251,7 +4266,8 @@ vectorizable_load (gimple stmt, gimple_s
 		{
 		  gcc_assert (phi);
 		  if (i == vec_num - 1 && j == ncopies - 1)
-		    add_phi_arg (phi, lsq, loop_latch_edge (containing_loop),
+			add_phi_arg (phi, lsq,
+				     loop_latch_edge (containing_loop),
 				 UNKNOWN_LOCATION);
 		  msq = lsq;
 		}
@@ -4272,8 +4288,8 @@ vectorizable_load (gimple stmt, gimple_s
 		  bitpos = bitsize_zero_node;
 		  vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
 				    bitsize, bitpos);
-		  vec_dest =
-			vect_create_destination_var (scalar_dest, NULL_TREE);
+		      vec_dest = vect_create_destination_var (scalar_dest,
+							      NULL_TREE);
 		  new_stmt = gimple_build_assign (vec_dest, vec_inv);
                   new_temp = make_ssa_name (vec_dest, new_stmt);
 		  gimple_assign_set_lhs (new_stmt, new_temp);
@@ -4283,7 +4299,8 @@ vectorizable_load (gimple stmt, gimple_s
 		    t = tree_cons (NULL_TREE, new_temp, t);
 		  /* FIXME: use build_constructor directly.  */
 		  vec_inv = build_constructor_from_list (vectype, t);
-		  new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
+		      new_temp = vect_init_vector (stmt, vec_inv,
+						   vectype, gsi);
 		  new_stmt = SSA_NAME_DEF_STMT (new_temp);
 		}
 	      else
@@ -4303,7 +4320,9 @@ vectorizable_load (gimple stmt, gimple_s
 
          /* Store vector loads in the corresponding SLP_NODE.  */
 	  if (slp && !slp_perm)
-	    VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
+		VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node),
+				new_stmt);
+	    }
 	}
 
       if (slp && !slp_perm)
@@ -4322,7 +4341,8 @@ vectorizable_load (gimple stmt, gimple_s
         {
           if (strided_load)
   	    {
-	      if (!vect_transform_strided_load (stmt, dr_chain, group_size, gsi))
+	      if (!vect_transform_strided_load (stmt, dr_chain,
+						group_size, gsi))
 	        return false;
 
 	      *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
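
As an aside on the dr_explicit_realign case in the diff above: the two
aligned loads (msq and lsq) and the final realign_load can be modelled in
plain C.  This is only a sketch under assumptions: a 16-byte vector, byte
arrays standing in for vector registers, and the hypothetical helpers
floor_align and realign_load_bytes, neither of which is a GCC function.
The caller is assumed to guarantee that both aligned blocks are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };	/* Assumed vector size and alignment.  */

/* Round ADDR down to a multiple of VEC_BYTES.  This mirrors the
   BIT_AND_EXPR with -TYPE_ALIGN_UNIT (vectype) in the patch.  */
uintptr_t
floor_align (uintptr_t addr)
{
  return addr & -(uintptr_t) VEC_BYTES;
}

/* Emulate a misaligned VEC_BYTES-byte load from P with two aligned
   loads.  ELEM_BYTES is the scalar element size; P must be aligned
   to it.  */
void
realign_load_bytes (const unsigned char *p, size_t elem_bytes,
		    unsigned char out[VEC_BYTES])
{
  uintptr_t addr = (uintptr_t) p;
  size_t misalign = addr & (VEC_BYTES - 1);
  unsigned char pair[2 * VEC_BYTES];

  assert (elem_bytes > 0 && elem_bytes <= VEC_BYTES
	  && misalign % elem_bytes == 0);

  /* msq: the aligned block containing the first requested byte.  */
  const unsigned char *msq = p - misalign;
  /* lsq: the aligned block containing the last requested element.
     It is the same block as msq when P is already aligned.  */
  const unsigned char *lsq
    = msq + (floor_align (addr + VEC_BYTES - elem_bytes)
	     - floor_align (addr));

  memcpy (pair, msq, VEC_BYTES);
  memcpy (pair + VEC_BYTES, lsq, VEC_BYTES);
  /* The realignment token is modelled by the byte offset MISALIGN.  */
  memcpy (out, pair + misalign, VEC_BYTES);
}
```

The address of lsq is formed by bumping the pointer by (nunits - 1)
elements before masking, which is why an already-aligned pointer yields
the same block twice rather than reading one vector too far.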


* Re: [1/9] Generalise vect_create_data_ref_ptr
  2011-04-12 13:25 ` [1/9] Generalise vect_create_data_ref_ptr Richard Sandiford
@ 2011-04-12 13:30   ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-12 13:30 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 3:24 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This first patch generalises vect_create_data_ref_ptr & bump_data_ref_ptr
> so that they can handle array as well as vector types.  The two cases are
> so similar that it's mostly a renaming exercise.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/
>        * tree-vectorizer.h (vect_create_data_ref_ptr): Add an extra
>        type parameter.
>        * tree-vect-data-refs.c (vect_create_data_ref_ptr): Add an aggr_type
>        parameter.  Generalise code to handle arrays as well as vectors.
>        (vect_setup_realignment): Update accordingly.
>        * tree-vect-stmts.c (vectorizable_store): Likewise.
>        (vectorizable_load): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2011-04-12 11:53:54.000000000 +0100
> +++ gcc/tree-vectorizer.h       2011-04-12 11:55:07.000000000 +0100
> @@ -823,9 +823,9 @@ extern bool vect_verify_datarefs_alignme
>  extern bool vect_analyze_data_ref_accesses (loop_vec_info, bb_vec_info);
>  extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
>  extern bool vect_analyze_data_refs (loop_vec_info, bb_vec_info, int *);
> -extern tree vect_create_data_ref_ptr (gimple, struct loop *, tree, tree *,
> -                                      gimple_stmt_iterator *, gimple *,
> -                                      bool, bool *);
> +extern tree vect_create_data_ref_ptr (gimple, tree, struct loop *, tree,
> +                                     tree *, gimple_stmt_iterator *,
> +                                     gimple *, bool, bool *);
>  extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
>  extern tree vect_create_destination_var (tree, tree);
>  extern bool vect_strided_store_supported (tree);
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2011-04-12 11:53:54.000000000 +0100
> +++ gcc/tree-vect-data-refs.c   2011-04-12 11:55:07.000000000 +0100
> @@ -2911,32 +2911,33 @@ vect_create_addr_base_for_vector_ref (gi
>
>  /* Function vect_create_data_ref_ptr.
>
> -   Create a new pointer to vector type (vp), that points to the first location
> -   accessed in the loop by STMT, along with the def-use update chain to
> -   appropriately advance the pointer through the loop iterations. Also set
> -   aliasing information for the pointer.  This vector pointer is used by the
> -   callers to this function to create a memory reference expression for vector
> -   load/store access.
> +   Create a new pointer-to-TYPE variable (ap), that points to the first
> +   location accessed in the loop by STMT, along with the def-use update
> +   chain to appropriately advance the pointer through the loop iterations.
> +   Also set aliasing information for the pointer.  This pointer is used by
> +   the callers to this function to create a memory reference expression for
> +   vector load/store access.
>
>    Input:
>    1. STMT: a stmt that references memory. Expected to be of the form
>          GIMPLE_ASSIGN <name, data-ref> or
>         GIMPLE_ASSIGN <data-ref, name>.
> -   2. AT_LOOP: the loop where the vector memref is to be created.
> -   3. OFFSET (optional): an offset to be added to the initial address accessed
> +   2. AGGR_TYPE: the type of the reference, which should be either a vector
> +        or an array.
> +   3. AT_LOOP: the loop where the vector memref is to be created.
> +   4. OFFSET (optional): an offset to be added to the initial address accessed
>         by the data-ref in STMT.
> -   4. BSI: location where the new stmts are to be placed if there is no loop
> -   5. ONLY_INIT: indicate if vp is to be updated in the loop, or remain
> +   5. BSI: location where the new stmts are to be placed if there is no loop
> +   6. ONLY_INIT: indicate if ap is to be updated in the loop, or remain
>         pointing to the initial address.
> -   6. TYPE: if not NULL indicates the required type of the data-ref.
>
>    Output:
>    1. Declare a new ptr to vector_type, and have it point to the base of the
>       data reference (initial addressed accessed by the data reference).
>       For example, for vector of type V8HI, the following code is generated:
>
> -      v8hi *vp;
> -      vp = (v8hi *)initial_address;
> +      v8hi *ap;
> +      ap = (v8hi *)initial_address;
>
>       if OFFSET is not supplied:
>          initial_address = &a[init];
> @@ -2956,9 +2957,10 @@ vect_create_addr_base_for_vector_ref (gi
>    4. Return the pointer.  */
>
>  tree
> -vect_create_data_ref_ptr (gimple stmt, struct loop *at_loop, tree offset,
> -                         tree *initial_address, gimple_stmt_iterator *gsi,
> -                         gimple *ptr_incr, bool only_init, bool *inv_p)
> +vect_create_data_ref_ptr (gimple stmt, tree aggr_type, struct loop *at_loop,
> +                         tree offset, tree *initial_address,
> +                         gimple_stmt_iterator *gsi, gimple *ptr_incr,
> +                         bool only_init, bool *inv_p)
>  {
>   tree base_name;
>   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> @@ -2966,17 +2968,16 @@ vect_create_data_ref_ptr (gimple stmt, s
>   struct loop *loop = NULL;
>   bool nested_in_vect_loop = false;
>   struct loop *containing_loop = NULL;
> -  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> -  tree vect_ptr_type;
> -  tree vect_ptr;
> +  tree aggr_ptr_type;
> +  tree aggr_ptr;
>   tree new_temp;
>   gimple vec_stmt;
>   gimple_seq new_stmt_list = NULL;
>   edge pe = NULL;
>   basic_block new_bb;
> -  tree vect_ptr_init;
> +  tree aggr_ptr_init;
>   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> -  tree vptr;
> +  tree aptr;
>   gimple_stmt_iterator incr_gsi;
>   bool insert_after;
>   bool negative;
> @@ -2986,6 +2987,9 @@ vect_create_data_ref_ptr (gimple stmt, s
>   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>   tree base;
>
> +  gcc_assert (TREE_CODE (aggr_type) == ARRAY_TYPE
> +             || TREE_CODE (aggr_type) == VECTOR_TYPE);
> +
>   if (loop_vinfo)
>     {
>       loop = LOOP_VINFO_LOOP (loop_vinfo);
> @@ -3020,8 +3024,9 @@ vect_create_data_ref_ptr (gimple stmt, s
>   if (vect_print_dump_info (REPORT_DETAILS))
>     {
>       tree data_ref_base = base_name;
> -      fprintf (vect_dump, "create vector-pointer variable to type: ");
> -      print_generic_expr (vect_dump, vectype, TDF_SLIM);
> +      fprintf (vect_dump, "create %s-pointer variable to type: ",
> +              tree_code_name[(int) TREE_CODE (aggr_type)]);
> +      print_generic_expr (vect_dump, aggr_type, TDF_SLIM);
>       if (TREE_CODE (data_ref_base) == VAR_DECL
>           || TREE_CODE (data_ref_base) == ARRAY_REF)
>         fprintf (vect_dump, "  vectorizing an array ref: ");
> @@ -3032,27 +3037,28 @@ vect_create_data_ref_ptr (gimple stmt, s
>       print_generic_expr (vect_dump, base_name, TDF_SLIM);
>     }
>
> -  /* (1) Create the new vector-pointer variable.  */
> -  vect_ptr_type = build_pointer_type (vectype);
> +  /* (1) Create the new aggregate-pointer variable.  */
> +  aggr_ptr_type = build_pointer_type (aggr_type);
>   base = get_base_address (DR_REF (dr));
>   if (base
>       && TREE_CODE (base) == MEM_REF)
> -    vect_ptr_type
> -      = build_qualified_type (vect_ptr_type,
> +    aggr_ptr_type
> +      = build_qualified_type (aggr_ptr_type,
>                              TYPE_QUALS (TREE_TYPE (TREE_OPERAND (base, 0))));
> -  vect_ptr = vect_get_new_vect_var (vect_ptr_type, vect_pointer_var,
> +  aggr_ptr = vect_get_new_vect_var (aggr_ptr_type, vect_pointer_var,
>                                     get_name (base_name));
>
> -  /* Vector types inherit the alias set of their component type by default so
> -     we need to use a ref-all pointer if the data reference does not conflict
> -     with the created vector data reference because it is not addressable.  */
> -  if (!alias_sets_conflict_p (get_deref_alias_set (vect_ptr),
> +  /* Vector and array types inherit the alias set of their component
> +     type by default so we need to use a ref-all pointer if the data
> +     reference does not conflict with the created aggregated data
> +     reference because it is not addressable.  */
> +  if (!alias_sets_conflict_p (get_deref_alias_set (aggr_ptr),
>                              get_alias_set (DR_REF (dr))))
>     {
> -      vect_ptr_type
> -       = build_pointer_type_for_mode (vectype,
> -                                      TYPE_MODE (vect_ptr_type), true);
> -      vect_ptr = vect_get_new_vect_var (vect_ptr_type, vect_pointer_var,
> +      aggr_ptr_type
> +       = build_pointer_type_for_mode (aggr_type,
> +                                      TYPE_MODE (aggr_ptr_type), true);
> +      aggr_ptr = vect_get_new_vect_var (aggr_ptr_type, vect_pointer_var,
>                                        get_name (base_name));
>     }
>
> @@ -3063,14 +3069,14 @@ vect_create_data_ref_ptr (gimple stmt, s
>       do
>        {
>          tree lhs = gimple_assign_lhs (orig_stmt);
> -         if (!alias_sets_conflict_p (get_deref_alias_set (vect_ptr),
> +         if (!alias_sets_conflict_p (get_deref_alias_set (aggr_ptr),
>                                      get_alias_set (lhs)))
>            {
> -             vect_ptr_type
> -               = build_pointer_type_for_mode (vectype,
> -                                              TYPE_MODE (vect_ptr_type), true);
> -             vect_ptr
> -               = vect_get_new_vect_var (vect_ptr_type, vect_pointer_var,
> +             aggr_ptr_type
> +               = build_pointer_type_for_mode (aggr_type,
> +                                              TYPE_MODE (aggr_ptr_type), true);
> +             aggr_ptr
> +               = vect_get_new_vect_var (aggr_ptr_type, vect_pointer_var,
>                                         get_name (base_name));
>              break;
>            }
> @@ -3080,7 +3086,7 @@ vect_create_data_ref_ptr (gimple stmt, s
>       while (orig_stmt);
>     }
>
> -  add_referenced_var (vect_ptr);
> +  add_referenced_var (aggr_ptr);
>
>   /* Note: If the dataref is in an inner-loop nested in LOOP, and we are
>      vectorizing LOOP (i.e., outer-loop vectorization), we need to create two
> @@ -3113,8 +3119,8 @@ vect_create_data_ref_ptr (gimple stmt, s
>                vp2 = vp1 + step
>                if () goto LOOP   */
>
> -  /* (2) Calculate the initial address the vector-pointer, and set
> -         the vector-pointer to point to it before the loop.  */
> +  /* (2) Calculate the initial address of the aggregate-pointer, and set
> +     the aggregate-pointer to point to it before the loop.  */
>
>   /* Create: (&(base[init_val+offset]) in the loop preheader.  */
>
> @@ -3133,17 +3139,17 @@ vect_create_data_ref_ptr (gimple stmt, s
>
>   *initial_address = new_temp;
>
> -  /* Create: p = (vectype *) initial_base  */
> +  /* Create: p = (aggr_type *) initial_base  */
>   if (TREE_CODE (new_temp) != SSA_NAME
> -      || !useless_type_conversion_p (vect_ptr_type, TREE_TYPE (new_temp)))
> +      || !useless_type_conversion_p (aggr_ptr_type, TREE_TYPE (new_temp)))
>     {
> -      vec_stmt = gimple_build_assign (vect_ptr,
> -                                     fold_convert (vect_ptr_type, new_temp));
> -      vect_ptr_init = make_ssa_name (vect_ptr, vec_stmt);
> +      vec_stmt = gimple_build_assign (aggr_ptr,
> +                                     fold_convert (aggr_ptr_type, new_temp));
> +      aggr_ptr_init = make_ssa_name (aggr_ptr, vec_stmt);
>       /* Copy the points-to information if it exists. */
>       if (DR_PTR_INFO (dr))
> -       duplicate_ssa_name_ptr_info (vect_ptr_init, DR_PTR_INFO (dr));
> -      gimple_assign_set_lhs (vec_stmt, vect_ptr_init);
> +       duplicate_ssa_name_ptr_info (aggr_ptr_init, DR_PTR_INFO (dr));
> +      gimple_assign_set_lhs (vec_stmt, aggr_ptr_init);
>       if (pe)
>        {
>          new_bb = gsi_insert_on_edge_immediate (pe, vec_stmt);
> @@ -3153,19 +3159,19 @@ vect_create_data_ref_ptr (gimple stmt, s
>        gsi_insert_before (gsi, vec_stmt, GSI_SAME_STMT);
>     }
>   else
> -    vect_ptr_init = new_temp;
> +    aggr_ptr_init = new_temp;
>
> -  /* (3) Handle the updating of the vector-pointer inside the loop.
> +  /* (3) Handle the updating of the aggregate-pointer inside the loop.
>      This is needed when ONLY_INIT is false, and also when AT_LOOP is the
>      inner-loop nested in LOOP (during outer-loop vectorization).  */
>
>   /* No update in loop is required.  */
>   if (only_init && (!loop_vinfo || at_loop == loop))
> -    vptr = vect_ptr_init;
> +    aptr = aggr_ptr_init;
>   else
>     {
> -      /* The step of the vector pointer is the Vector Size.  */
> -      tree step = TYPE_SIZE_UNIT (vectype);
> +      /* The step of the aggregate pointer is the type size.  */
> +      tree step = TYPE_SIZE_UNIT (aggr_type);
>       /* One exception to the above is when the scalar step of the load in
>         LOOP is zero. In this case the step here is also zero.  */
>       if (*inv_p)
> @@ -3175,9 +3181,9 @@ vect_create_data_ref_ptr (gimple stmt, s
>
>       standard_iv_increment_position (loop, &incr_gsi, &insert_after);
>
> -      create_iv (vect_ptr_init,
> -                fold_convert (vect_ptr_type, step),
> -                vect_ptr, loop, &incr_gsi, insert_after,
> +      create_iv (aggr_ptr_init,
> +                fold_convert (aggr_ptr_type, step),
> +                aggr_ptr, loop, &incr_gsi, insert_after,
>                 &indx_before_incr, &indx_after_incr);
>       incr = gsi_stmt (incr_gsi);
>       set_vinfo_for_stmt (incr, new_stmt_vec_info (incr, loop_vinfo, NULL));
> @@ -3191,14 +3197,14 @@ vect_create_data_ref_ptr (gimple stmt, s
>       if (ptr_incr)
>        *ptr_incr = incr;
>
> -      vptr = indx_before_incr;
> +      aptr = indx_before_incr;
>     }
>
>   if (!nested_in_vect_loop || only_init)
> -    return vptr;
> +    return aptr;
>
>
> -  /* (4) Handle the updating of the vector-pointer inside the inner-loop
> +  /* (4) Handle the updating of the aggregate-pointer inside the inner-loop
>      nested in LOOP, if exists.  */
>
>   gcc_assert (nested_in_vect_loop);
> @@ -3206,7 +3212,7 @@ vect_create_data_ref_ptr (gimple stmt, s
>     {
>       standard_iv_increment_position (containing_loop, &incr_gsi,
>                                      &insert_after);
> -      create_iv (vptr, fold_convert (vect_ptr_type, DR_STEP (dr)), vect_ptr,
> +      create_iv (aptr, fold_convert (aggr_ptr_type, DR_STEP (dr)), aggr_ptr,
>                 containing_loop, &incr_gsi, insert_after, &indx_before_incr,
>                 &indx_after_incr);
>       incr = gsi_stmt (incr_gsi);
> @@ -3674,8 +3680,9 @@ vect_setup_realignment (gimple stmt, gim
>
>       gcc_assert (!compute_in_loop);
>       vec_dest = vect_create_destination_var (scalar_dest, vectype);
> -      ptr = vect_create_data_ref_ptr (stmt, loop_for_initial_load, NULL_TREE,
> -                                     &init_addr, NULL, &inc, true, &inv_p);
> +      ptr = vect_create_data_ref_ptr (stmt, vectype, loop_for_initial_load,
> +                                     NULL_TREE, &init_addr, NULL, &inc,
> +                                     true, &inv_p);
>       new_stmt = gimple_build_assign_with_ops
>                   (BIT_AND_EXPR, NULL_TREE, ptr,
>                    build_int_cst (TREE_TYPE (ptr),
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2011-04-12 11:55:07.000000000 +0100
> +++ gcc/tree-vect-stmts.c       2011-04-12 11:55:07.000000000 +0100
> @@ -3581,9 +3581,9 @@ vectorizable_store (gimple stmt, gimple_
>          /* We should have catched mismatched types earlier.  */
>          gcc_assert (useless_type_conversion_p (vectype,
>                                                 TREE_TYPE (vec_oprnd)));
> -         dataref_ptr = vect_create_data_ref_ptr (first_stmt, NULL, NULL_TREE,
> -                                                 &dummy, gsi, &ptr_incr, false,
> -                                                 &inv_p);
> +         dataref_ptr = vect_create_data_ref_ptr (first_stmt, vectype, NULL,
> +                                                 NULL_TREE, &dummy, gsi,
> +                                                 &ptr_incr, false, &inv_p);
>          gcc_assert (bb_vinfo || !inv_p);
>        }
>       else
> @@ -4109,9 +4109,9 @@ vectorizable_load (gimple stmt, gimple_s
>     {
>       /* 1. Create the vector pointer update chain.  */
>       if (j == 0)
> -        dataref_ptr = vect_create_data_ref_ptr (first_stmt, at_loop, offset,
> -                                               &dummy, gsi, &ptr_incr, false,
> -                                               &inv_p);
> +        dataref_ptr = vect_create_data_ref_ptr (first_stmt, vectype, at_loop,
> +                                               offset, &dummy, gsi,
> +                                               &ptr_incr, false, &inv_p);
>       else
>         dataref_ptr =
>                bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
>


* Re: [2/9] Reindent parts of vectorizable_load and vectorizable_store
  2011-04-12 13:28 ` [2/9] Reindent parts of vectorizable_load and vectorizable_store Richard Sandiford
@ 2011-04-12 13:33   ` Richard Guenther
  2011-04-12 14:39     ` Richard Sandiford
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Guenther @ 2011-04-12 13:33 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 3:28 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch just reindents part of vectorizable_load and vectorizable_store
> so that the main diff is easier to read.  It also CSEs the element type,
> which seemed better than breaking the long lines.
>
> I've included both the real diff and a -b version.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

CSEing the element type is OK, but please don't separately install patches
that introduce if (1)s.  I suppose this patch is to make the follow-ups smaller?

Richard.
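
For readers skimming the diff below, the two changes have roughly the
following shape.  This is only a sketch under stated assumptions: struct
vec_type, elem_align_of and total_align are made-up names for illustration
and are not part of GCC or of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vector type; not GCC's tree type.  */
struct vec_type
{
  size_t elem_align;  /* alignment of one element, in bytes */
  size_t nunits;      /* number of elements in the vector */
};

/* Hypothetical accessor, playing the role that TREE_TYPE (vectype)
   plays in the real code.  */
static size_t
elem_align_of (const struct vec_type *vectype)
{
  return vectype->elem_align;
}

/* Sum the element alignment used for each of VEC_NUM accesses.  */
size_t
total_align (const struct vec_type *vectype, size_t vec_num)
{
  /* Looked up once and reused, like the new elem_type local.  */
  size_t elem_align = elem_align_of (vectype);
  size_t total = 0;
  size_t i;

  /* Placeholder block: the body is already indented one level, so a
     later change can swap in a real condition without touching it.  */
  if (1)
    {
      for (i = 0; i < vec_num; i++)
        total += elem_align;
    }

  return total;
}
```

Presumably a follow-up replaces the if (1) with a real condition and an
else arm, which would explain indenting the body ahead of time.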

> Richard
>
>
> gcc/
>        * tree-vect-stmts.c (vectorizable_store): Store the element type
>        in a local variable.  Indent generation of per-vector memory accesses.
>        (vectorizable_load): Likewise.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2011-04-12 11:55:08.000000000 +0100
> +++ gcc/tree-vect-stmts.c       2011-04-12 11:55:08.000000000 +0100
> @@ -3308,6 +3308,7 @@ vectorizable_store (gimple stmt, gimple_
>   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
>   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  tree elem_type;
>   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>   struct loop *loop = NULL;
>   enum machine_mode vec_mode;
> @@ -3383,7 +3384,8 @@ vectorizable_store (gimple stmt, gimple_
>
>   /* The scalar rhs type needs to be trivially convertible to the vector
>      component type.  This should always be the case.  */
> -  if (!useless_type_conversion_p (TREE_TYPE (vectype), TREE_TYPE (op)))
> +  elem_type = TREE_TYPE (vectype);
> +  if (!useless_type_conversion_p (elem_type, TREE_TYPE (op)))
>     {
>       if (vect_print_dump_info (REPORT_DETAILS))
>         fprintf (vect_dump, "???  operands of different types");
> @@ -3608,72 +3610,75 @@ vectorizable_store (gimple stmt, gimple_
>                bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
>        }
>
> -      if (strided_store)
> +      if (1)
>        {
> -         result_chain = VEC_alloc (tree, heap, group_size);
> -         /* Permute.  */
> -         if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
> -                                        &result_chain))
> -           return false;
> -       }
> -
> -      next_stmt = first_stmt;
> -      for (i = 0; i < vec_num; i++)
> -       {
> -         struct ptr_info_def *pi;
> -
> -         if (i > 0)
> -           /* Bump the vector pointer.  */
> -           dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
> -                                          NULL_TREE);
> -
> -         if (slp)
> -           vec_oprnd = VEC_index (tree, vec_oprnds, i);
> -         else if (strided_store)
> -           /* For strided stores vectorized defs are interleaved in
> -              vect_permute_store_chain().  */
> -           vec_oprnd = VEC_index (tree, result_chain, i);
> -
> -         data_ref = build2 (MEM_REF, TREE_TYPE (vec_oprnd), dataref_ptr,
> -                            build_int_cst (reference_alias_ptr_type
> -                                           (DR_REF (first_dr)), 0));
> -         pi = get_ptr_info (dataref_ptr);
> -         pi->align = TYPE_ALIGN_UNIT (vectype);
> -          if (aligned_access_p (first_dr))
> -           pi->misalign = 0;
> -          else if (DR_MISALIGNMENT (first_dr) == -1)
> +         if (strided_store)
>            {
> -             TREE_TYPE (data_ref)
> -               = build_aligned_type (TREE_TYPE (data_ref),
> -                                     TYPE_ALIGN (TREE_TYPE (vectype)));
> -             pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
> -             pi->misalign = 0;
> +             result_chain = VEC_alloc (tree, heap, group_size);
> +             /* Permute.  */
> +             if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
> +                                            &result_chain))
> +               return false;
>            }
> -         else
> +
> +         next_stmt = first_stmt;
> +         for (i = 0; i < vec_num; i++)
>            {
> -             TREE_TYPE (data_ref)
> -               = build_aligned_type (TREE_TYPE (data_ref),
> -                                     TYPE_ALIGN (TREE_TYPE (vectype)));
> -             pi->misalign = DR_MISALIGNMENT (first_dr);
> -           }
> +             struct ptr_info_def *pi;
>
> -         /* Arguments are ready.  Create the new vector stmt.  */
> -         new_stmt = gimple_build_assign (data_ref, vec_oprnd);
> -         vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -         mark_symbols_for_renaming (new_stmt);
> +             if (i > 0)
> +               /* Bump the vector pointer.  */
> +               dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
> +                                              stmt, NULL_TREE);
> +
> +             if (slp)
> +               vec_oprnd = VEC_index (tree, vec_oprnds, i);
> +             else if (strided_store)
> +               /* For strided stores vectorized defs are interleaved in
> +                  vect_permute_store_chain().  */
> +               vec_oprnd = VEC_index (tree, result_chain, i);
> +
> +             data_ref = build2 (MEM_REF, TREE_TYPE (vec_oprnd), dataref_ptr,
> +                                build_int_cst (reference_alias_ptr_type
> +                                               (DR_REF (first_dr)), 0));
> +             pi = get_ptr_info (dataref_ptr);
> +             pi->align = TYPE_ALIGN_UNIT (vectype);
> +             if (aligned_access_p (first_dr))
> +               pi->misalign = 0;
> +             else if (DR_MISALIGNMENT (first_dr) == -1)
> +               {
> +                 TREE_TYPE (data_ref)
> +                   = build_aligned_type (TREE_TYPE (data_ref),
> +                                         TYPE_ALIGN (elem_type));
> +                 pi->align = TYPE_ALIGN_UNIT (elem_type);
> +                 pi->misalign = 0;
> +               }
> +             else
> +               {
> +                 TREE_TYPE (data_ref)
> +                   = build_aligned_type (TREE_TYPE (data_ref),
> +                                         TYPE_ALIGN (elem_type));
> +                 pi->misalign = DR_MISALIGNMENT (first_dr);
> +               }
>
> -          if (slp)
> -            continue;
> +             /* Arguments are ready.  Create the new vector stmt.  */
> +             new_stmt = gimple_build_assign (data_ref, vec_oprnd);
> +             vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +             mark_symbols_for_renaming (new_stmt);
>
> -          if (j == 0)
> -            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
> -         else
> -           STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +             if (slp)
> +               continue;
> +
> +             if (j == 0)
> +               STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
> +             else
> +               STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>
> -         prev_stmt_info = vinfo_for_stmt (new_stmt);
> -         next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
> -         if (!next_stmt)
> -           break;
> +             prev_stmt_info = vinfo_for_stmt (new_stmt);
> +             next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
> +             if (!next_stmt)
> +               break;
> +           }
>        }
>     }
>
> @@ -3784,6 +3789,7 @@ vectorizable_load (gimple stmt, gimple_s
>   bool nested_in_vect_loop = false;
>   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
>   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  tree elem_type;
>   tree new_temp;
>   enum machine_mode mode;
>   gimple new_stmt = NULL;
> @@ -3888,7 +3894,8 @@ vectorizable_load (gimple stmt, gimple_s
>
>   /* The vector component type needs to be trivially convertible to the
>      scalar lhs.  This should always be the case.  */
> -  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), TREE_TYPE (vectype)))
> +  elem_type = TREE_TYPE (vectype);
> +  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), elem_type))
>     {
>       if (vect_print_dump_info (REPORT_DETAILS))
>         fprintf (vect_dump, "???  operands of different types");
> @@ -4117,193 +4124,205 @@ vectorizable_load (gimple stmt, gimple_s
>       if (strided_load || slp_perm)
>        dr_chain = VEC_alloc (tree, heap, vec_num);
>
> -      for (i = 0; i < vec_num; i++)
> +      if (1)
>        {
> -         if (i > 0)
> -           dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
> -                                          NULL_TREE);
> -
> -         /* 2. Create the vector-load in the loop.  */
> -         switch (alignment_support_scheme)
> +         for (i = 0; i < vec_num; i++)
>            {
> -           case dr_aligned:
> -           case dr_unaligned_supported:
> -             {
> -               struct ptr_info_def *pi;
> -               data_ref
> -                 = build2 (MEM_REF, vectype, dataref_ptr,
> -                           build_int_cst (reference_alias_ptr_type
> -                                          (DR_REF (first_dr)), 0));
> -               pi = get_ptr_info (dataref_ptr);
> -               pi->align = TYPE_ALIGN_UNIT (vectype);
> -               if (alignment_support_scheme == dr_aligned)
> -                 {
> -                   gcc_assert (aligned_access_p (first_dr));
> -                   pi->misalign = 0;
> -                 }
> -               else if (DR_MISALIGNMENT (first_dr) == -1)
> +             if (i > 0)
> +               dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
> +                                              stmt, NULL_TREE);
> +
> +             /* 2. Create the vector-load in the loop.  */
> +             switch (alignment_support_scheme)
> +               {
> +               case dr_aligned:
> +               case dr_unaligned_supported:
>                  {
> -                   TREE_TYPE (data_ref)
> -                     = build_aligned_type (TREE_TYPE (data_ref),
> -                                           TYPE_ALIGN (TREE_TYPE (vectype)));
> -                   pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
> -                   pi->misalign = 0;
> +                   struct ptr_info_def *pi;
> +                   data_ref
> +                     = build2 (MEM_REF, vectype, dataref_ptr,
> +                               build_int_cst (reference_alias_ptr_type
> +                                              (DR_REF (first_dr)), 0));
> +                   pi = get_ptr_info (dataref_ptr);
> +                   pi->align = TYPE_ALIGN_UNIT (vectype);
> +                   if (alignment_support_scheme == dr_aligned)
> +                     {
> +                       gcc_assert (aligned_access_p (first_dr));
> +                       pi->misalign = 0;
> +                     }
> +                   else if (DR_MISALIGNMENT (first_dr) == -1)
> +                     {
> +                       TREE_TYPE (data_ref)
> +                         = build_aligned_type (TREE_TYPE (data_ref),
> +                                               TYPE_ALIGN (elem_type));
> +                       pi->align = TYPE_ALIGN_UNIT (elem_type);
> +                       pi->misalign = 0;
> +                     }
> +                   else
> +                     {
> +                       TREE_TYPE (data_ref)
> +                         = build_aligned_type (TREE_TYPE (data_ref),
> +                                               TYPE_ALIGN (elem_type));
> +                       pi->misalign = DR_MISALIGNMENT (first_dr);
> +                     }
> +                   break;
>                  }
> -               else
> +               case dr_explicit_realign:
>                  {
> -                   TREE_TYPE (data_ref)
> -                     = build_aligned_type (TREE_TYPE (data_ref),
> -                                           TYPE_ALIGN (TREE_TYPE (vectype)));
> -                   pi->misalign = DR_MISALIGNMENT (first_dr);
> +                   tree ptr, bump;
> +                   tree vs_minus_1;
> +
> +                   vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
> +
> +                   if (compute_in_loop)
> +                     msq = vect_setup_realignment (first_stmt, gsi,
> +                                                   &realignment_token,
> +                                                   dr_explicit_realign,
> +                                                   dataref_ptr, NULL);
> +
> +                   new_stmt = gimple_build_assign_with_ops
> +                                (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
> +                                 build_int_cst
> +                                 (TREE_TYPE (dataref_ptr),
> +                                  -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> +                   ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
> +                   gimple_assign_set_lhs (new_stmt, ptr);
> +                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +                   data_ref
> +                     = build2 (MEM_REF, vectype, ptr,
> +                               build_int_cst (reference_alias_ptr_type
> +                                                (DR_REF (first_dr)), 0));
> +                   vec_dest = vect_create_destination_var (scalar_dest,
> +                                                           vectype);
> +                   new_stmt = gimple_build_assign (vec_dest, data_ref);
> +                   new_temp = make_ssa_name (vec_dest, new_stmt);
> +                   gimple_assign_set_lhs (new_stmt, new_temp);
> +                   gimple_set_vdef (new_stmt, gimple_vdef (stmt));
> +                   gimple_set_vuse (new_stmt, gimple_vuse (stmt));
> +                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +                   msq = new_temp;
> +
> +                   bump = size_binop (MULT_EXPR, vs_minus_1,
> +                                      TYPE_SIZE_UNIT (scalar_type));
> +                   ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
> +                   new_stmt = gimple_build_assign_with_ops
> +                                (BIT_AND_EXPR, NULL_TREE, ptr,
> +                                 build_int_cst
> +                                 (TREE_TYPE (ptr),
> +                                  -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> +                   ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
> +                   gimple_assign_set_lhs (new_stmt, ptr);
> +                   vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +                   data_ref
> +                     = build2 (MEM_REF, vectype, ptr,
> +                               build_int_cst (reference_alias_ptr_type
> +                                                (DR_REF (first_dr)), 0));
> +                   break;
>                  }
> -               break;
> -             }
> -           case dr_explicit_realign:
> -             {
> -               tree ptr, bump;
> -               tree vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
> -
> -               if (compute_in_loop)
> -                 msq = vect_setup_realignment (first_stmt, gsi,
> -                                               &realignment_token,
> -                                               dr_explicit_realign,
> -                                               dataref_ptr, NULL);
> -
> -               new_stmt = gimple_build_assign_with_ops
> -                            (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
> -                             build_int_cst
> -                               (TREE_TYPE (dataref_ptr),
> -                                -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> -               ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
> -               gimple_assign_set_lhs (new_stmt, ptr);
> -               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -               data_ref
> -                 = build2 (MEM_REF, vectype, ptr,
> -                           build_int_cst (reference_alias_ptr_type
> -                                            (DR_REF (first_dr)), 0));
> -               vec_dest = vect_create_destination_var (scalar_dest, vectype);
> -               new_stmt = gimple_build_assign (vec_dest, data_ref);
> -               new_temp = make_ssa_name (vec_dest, new_stmt);
> -               gimple_assign_set_lhs (new_stmt, new_temp);
> -               gimple_set_vdef (new_stmt, gimple_vdef (stmt));
> -               gimple_set_vuse (new_stmt, gimple_vuse (stmt));
> -               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -               msq = new_temp;
> -
> -               bump = size_binop (MULT_EXPR, vs_minus_1,
> -                                  TYPE_SIZE_UNIT (scalar_type));
> -               ptr = bump_vector_ptr (dataref_ptr, NULL, gsi, stmt, bump);
> -               new_stmt = gimple_build_assign_with_ops
> -                            (BIT_AND_EXPR, NULL_TREE, ptr,
> -                             build_int_cst
> -                               (TREE_TYPE (ptr),
> -                                -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> -               ptr = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
> -               gimple_assign_set_lhs (new_stmt, ptr);
> -               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -               data_ref
> -                 = build2 (MEM_REF, vectype, ptr,
> -                           build_int_cst (reference_alias_ptr_type
> -                                            (DR_REF (first_dr)), 0));
> -               break;
> -             }
> -           case dr_explicit_realign_optimized:
> -             new_stmt = gimple_build_assign_with_ops
> -                          (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
> -                           build_int_cst
> -                             (TREE_TYPE (dataref_ptr),
> -                              -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> -             new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
> -             gimple_assign_set_lhs (new_stmt, new_temp);
> -             vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -             data_ref
> -               = build2 (MEM_REF, vectype, new_temp,
> -                         build_int_cst (reference_alias_ptr_type
> -                                          (DR_REF (first_dr)), 0));
> -             break;
> -           default:
> -             gcc_unreachable ();
> -           }
> -         vec_dest = vect_create_destination_var (scalar_dest, vectype);
> -         new_stmt = gimple_build_assign (vec_dest, data_ref);
> -         new_temp = make_ssa_name (vec_dest, new_stmt);
> -         gimple_assign_set_lhs (new_stmt, new_temp);
> -         vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -         mark_symbols_for_renaming (new_stmt);
> -
> -         /* 3. Handle explicit realignment if necessary/supported.  Create in
> -               loop: vec_dest = realign_load (msq, lsq, realignment_token)  */
> -         if (alignment_support_scheme == dr_explicit_realign_optimized
> -             || alignment_support_scheme == dr_explicit_realign)
> -           {
> -             lsq = gimple_assign_lhs (new_stmt);
> -             if (!realignment_token)
> -               realignment_token = dataref_ptr;
> +               case dr_explicit_realign_optimized:
> +                 new_stmt = gimple_build_assign_with_ops
> +                              (BIT_AND_EXPR, NULL_TREE, dataref_ptr,
> +                               build_int_cst
> +                                 (TREE_TYPE (dataref_ptr),
> +                                  -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> +                 new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr),
> +                                           new_stmt);
> +                 gimple_assign_set_lhs (new_stmt, new_temp);
> +                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +                 data_ref
> +                   = build2 (MEM_REF, vectype, new_temp,
> +                             build_int_cst (reference_alias_ptr_type
> +                                              (DR_REF (first_dr)), 0));
> +                 break;
> +               default:
> +                 gcc_unreachable ();
> +               }
>              vec_dest = vect_create_destination_var (scalar_dest, vectype);
> -             new_stmt
> -               = gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR, vec_dest,
> -                                                msq, lsq, realignment_token);
> +             new_stmt = gimple_build_assign (vec_dest, data_ref);
>              new_temp = make_ssa_name (vec_dest, new_stmt);
>              gimple_assign_set_lhs (new_stmt, new_temp);
>              vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +             mark_symbols_for_renaming (new_stmt);
>
> -             if (alignment_support_scheme == dr_explicit_realign_optimized)
> +             /* 3. Handle explicit realignment if necessary/supported.
> +                Create in loop:
> +                  vec_dest = realign_load (msq, lsq, realignment_token)  */
> +             if (alignment_support_scheme == dr_explicit_realign_optimized
> +                 || alignment_support_scheme == dr_explicit_realign)
>                {
> -                 gcc_assert (phi);
> -                 if (i == vec_num - 1 && j == ncopies - 1)
> -                   add_phi_arg (phi, lsq, loop_latch_edge (containing_loop),
> -                                UNKNOWN_LOCATION);
> -                 msq = lsq;
> +                 lsq = gimple_assign_lhs (new_stmt);
> +                 if (!realignment_token)
> +                   realignment_token = dataref_ptr;
> +                 vec_dest = vect_create_destination_var (scalar_dest, vectype);
> +                 new_stmt
> +                   = gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR,
> +                                                    vec_dest, msq, lsq,
> +                                                    realignment_token);
> +                 new_temp = make_ssa_name (vec_dest, new_stmt);
> +                 gimple_assign_set_lhs (new_stmt, new_temp);
> +                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +
> +                 if (alignment_support_scheme == dr_explicit_realign_optimized)
> +                   {
> +                     gcc_assert (phi);
> +                     if (i == vec_num - 1 && j == ncopies - 1)
> +                       add_phi_arg (phi, lsq,
> +                                    loop_latch_edge (containing_loop),
> +                                    UNKNOWN_LOCATION);
> +                     msq = lsq;
> +                   }
>                }
> -           }
>
> -         /* 4. Handle invariant-load.  */
> -         if (inv_p && !bb_vinfo)
> -           {
> -             gcc_assert (!strided_load);
> -             gcc_assert (nested_in_vect_loop_p (loop, stmt));
> -             if (j == 0)
> +             /* 4. Handle invariant-load.  */
> +             if (inv_p && !bb_vinfo)
>                {
> -                 int k;
> -                 tree t = NULL_TREE;
> -                 tree vec_inv, bitpos, bitsize = TYPE_SIZE (scalar_type);
> -
> -                 /* CHECKME: bitpos depends on endianess?  */
> -                 bitpos = bitsize_zero_node;
> -                 vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
> -                                   bitsize, bitpos);
> -                 vec_dest =
> -                       vect_create_destination_var (scalar_dest, NULL_TREE);
> -                 new_stmt = gimple_build_assign (vec_dest, vec_inv);
> -                  new_temp = make_ssa_name (vec_dest, new_stmt);
> -                 gimple_assign_set_lhs (new_stmt, new_temp);
> -                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +                 gcc_assert (!strided_load);
> +                 gcc_assert (nested_in_vect_loop_p (loop, stmt));
> +                 if (j == 0)
> +                   {
> +                     int k;
> +                     tree t = NULL_TREE;
> +                     tree vec_inv, bitpos, bitsize = TYPE_SIZE (scalar_type);
> +
> +                     /* CHECKME: bitpos depends on endianess?  */
> +                     bitpos = bitsize_zero_node;
> +                     vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
> +                                       bitsize, bitpos);
> +                     vec_dest = vect_create_destination_var (scalar_dest,
> +                                                             NULL_TREE);
> +                     new_stmt = gimple_build_assign (vec_dest, vec_inv);
> +                     new_temp = make_ssa_name (vec_dest, new_stmt);
> +                     gimple_assign_set_lhs (new_stmt, new_temp);
> +                     vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +
> +                     for (k = nunits - 1; k >= 0; --k)
> +                       t = tree_cons (NULL_TREE, new_temp, t);
> +                     /* FIXME: use build_constructor directly.  */
> +                     vec_inv = build_constructor_from_list (vectype, t);
> +                     new_temp = vect_init_vector (stmt, vec_inv,
> +                                                  vectype, gsi);
> +                     new_stmt = SSA_NAME_DEF_STMT (new_temp);
> +                   }
> +                 else
> +                   gcc_unreachable (); /* FORNOW. */
> +               }
>
> -                 for (k = nunits - 1; k >= 0; --k)
> -                   t = tree_cons (NULL_TREE, new_temp, t);
> -                 /* FIXME: use build_constructor directly.  */
> -                 vec_inv = build_constructor_from_list (vectype, t);
> -                 new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
> +             if (negative)
> +               {
> +                 new_temp = reverse_vec_elements (new_temp, stmt, gsi);
>                  new_stmt = SSA_NAME_DEF_STMT (new_temp);
>                }
> -             else
> -               gcc_unreachable (); /* FORNOW. */
> -           }
>
> -         if (negative)
> -           {
> -             new_temp = reverse_vec_elements (new_temp, stmt, gsi);
> -             new_stmt = SSA_NAME_DEF_STMT (new_temp);
> +             /* Collect vector loads and later create their permutation in
> +                vect_transform_strided_load ().  */
> +             if (strided_load || slp_perm)
> +               VEC_quick_push (tree, dr_chain, new_temp);
> +
> +             /* Store vector loads in the corresponding SLP_NODE.  */
> +             if (slp && !slp_perm)
> +               VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node),
> +                               new_stmt);
>            }
> -
> -         /* Collect vector loads and later create their permutation in
> -            vect_transform_strided_load ().  */
> -          if (strided_load || slp_perm)
> -            VEC_quick_push (tree, dr_chain, new_temp);
> -
> -         /* Store vector loads in the corresponding SLP_NODE.  */
> -         if (slp && !slp_perm)
> -           VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
>        }
>
>       if (slp && !slp_perm)
> @@ -4322,7 +4341,8 @@ vectorizable_load (gimple stmt, gimple_s
>         {
>           if (strided_load)
>            {
> -             if (!vect_transform_strided_load (stmt, dr_chain, group_size, gsi))
> +             if (!vect_transform_strided_load (stmt, dr_chain,
> +                                               group_size, gsi))
>                return false;
>
>              *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2011-04-12 14:27:00.000000000 +0100
> +++ gcc/tree-vect-stmts.c       2011-04-12 14:27:02.000000000 +0100
> @@ -3308,6 +3308,7 @@ vectorizable_store (gimple stmt, gimple_
>   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
>   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  tree elem_type;
>   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>   struct loop *loop = NULL;
>   enum machine_mode vec_mode;
> @@ -3383,7 +3384,8 @@ vectorizable_store (gimple stmt, gimple_
>
>   /* The scalar rhs type needs to be trivially convertible to the vector
>      component type.  This should always be the case.  */
> -  if (!useless_type_conversion_p (TREE_TYPE (vectype), TREE_TYPE (op)))
> +  elem_type = TREE_TYPE (vectype);
> +  if (!useless_type_conversion_p (elem_type, TREE_TYPE (op)))
>     {
>       if (vect_print_dump_info (REPORT_DETAILS))
>         fprintf (vect_dump, "???  operands of different types");
> @@ -3608,6 +3610,8 @@ vectorizable_store (gimple stmt, gimple_
>                bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
>        }
>
> +      if (1)
> +       {
>       if (strided_store)
>        {
>          result_chain = VEC_alloc (tree, heap, group_size);
> @@ -3624,8 +3628,8 @@ vectorizable_store (gimple stmt, gimple_
>
>          if (i > 0)
>            /* Bump the vector pointer.  */
> -           dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
> -                                          NULL_TREE);
> +               dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
> +                                              stmt, NULL_TREE);
>
>          if (slp)
>            vec_oprnd = VEC_index (tree, vec_oprnds, i);
> @@ -3645,15 +3649,15 @@ vectorizable_store (gimple stmt, gimple_
>            {
>              TREE_TYPE (data_ref)
>                = build_aligned_type (TREE_TYPE (data_ref),
> -                                     TYPE_ALIGN (TREE_TYPE (vectype)));
> -             pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
> +                                         TYPE_ALIGN (elem_type));
> +                 pi->align = TYPE_ALIGN_UNIT (elem_type);
>              pi->misalign = 0;
>            }
>          else
>            {
>              TREE_TYPE (data_ref)
>                = build_aligned_type (TREE_TYPE (data_ref),
> -                                     TYPE_ALIGN (TREE_TYPE (vectype)));
> +                                         TYPE_ALIGN (elem_type));
>              pi->misalign = DR_MISALIGNMENT (first_dr);
>            }
>
> @@ -3676,6 +3680,7 @@ vectorizable_store (gimple stmt, gimple_
>            break;
>        }
>     }
> +    }
>
>   VEC_free (tree, heap, dr_chain);
>   VEC_free (tree, heap, oprnds);
> @@ -3784,6 +3789,7 @@ vectorizable_load (gimple stmt, gimple_s
>   bool nested_in_vect_loop = false;
>   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
>   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  tree elem_type;
>   tree new_temp;
>   enum machine_mode mode;
>   gimple new_stmt = NULL;
> @@ -3888,7 +3894,8 @@ vectorizable_load (gimple stmt, gimple_s
>
>   /* The vector component type needs to be trivially convertible to the
>      scalar lhs.  This should always be the case.  */
> -  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), TREE_TYPE (vectype)))
> +  elem_type = TREE_TYPE (vectype);
> +  if (!useless_type_conversion_p (TREE_TYPE (scalar_dest), elem_type))
>     {
>       if (vect_print_dump_info (REPORT_DETAILS))
>         fprintf (vect_dump, "???  operands of different types");
> @@ -4117,11 +4124,13 @@ vectorizable_load (gimple stmt, gimple_s
>       if (strided_load || slp_perm)
>        dr_chain = VEC_alloc (tree, heap, vec_num);
>
> +      if (1)
> +       {
>       for (i = 0; i < vec_num; i++)
>        {
>          if (i > 0)
> -           dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
> -                                          NULL_TREE);
> +               dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
> +                                              stmt, NULL_TREE);
>
>          /* 2. Create the vector-load in the loop.  */
>          switch (alignment_support_scheme)
> @@ -4145,15 +4154,15 @@ vectorizable_load (gimple stmt, gimple_s
>                  {
>                    TREE_TYPE (data_ref)
>                      = build_aligned_type (TREE_TYPE (data_ref),
> -                                           TYPE_ALIGN (TREE_TYPE (vectype)));
> -                   pi->align = TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
> +                                               TYPE_ALIGN (elem_type));
> +                       pi->align = TYPE_ALIGN_UNIT (elem_type);
>                    pi->misalign = 0;
>                  }
>                else
>                  {
>                    TREE_TYPE (data_ref)
>                      = build_aligned_type (TREE_TYPE (data_ref),
> -                                           TYPE_ALIGN (TREE_TYPE (vectype)));
> +                                               TYPE_ALIGN (elem_type));
>                    pi->misalign = DR_MISALIGNMENT (first_dr);
>                  }
>                break;
> @@ -4161,7 +4170,9 @@ vectorizable_load (gimple stmt, gimple_s
>            case dr_explicit_realign:
>              {
>                tree ptr, bump;
> -               tree vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
> +                   tree vs_minus_1;
> +
> +                   vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
>
>                if (compute_in_loop)
>                  msq = vect_setup_realignment (first_stmt, gsi,
> @@ -4181,7 +4192,8 @@ vectorizable_load (gimple stmt, gimple_s
>                  = build2 (MEM_REF, vectype, ptr,
>                            build_int_cst (reference_alias_ptr_type
>                                             (DR_REF (first_dr)), 0));
> -               vec_dest = vect_create_destination_var (scalar_dest, vectype);
> +                   vec_dest = vect_create_destination_var (scalar_dest,
> +                                                           vectype);
>                new_stmt = gimple_build_assign (vec_dest, data_ref);
>                new_temp = make_ssa_name (vec_dest, new_stmt);
>                gimple_assign_set_lhs (new_stmt, new_temp);
> @@ -4213,7 +4225,8 @@ vectorizable_load (gimple stmt, gimple_s
>                            build_int_cst
>                              (TREE_TYPE (dataref_ptr),
>                               -(HOST_WIDE_INT)TYPE_ALIGN_UNIT (vectype)));
> -             new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr), new_stmt);
> +                 new_temp = make_ssa_name (SSA_NAME_VAR (dataref_ptr),
> +                                           new_stmt);
>              gimple_assign_set_lhs (new_stmt, new_temp);
>              vect_finish_stmt_generation (stmt, new_stmt, gsi);
>              data_ref
> @@ -4231,8 +4244,9 @@ vectorizable_load (gimple stmt, gimple_s
>          vect_finish_stmt_generation (stmt, new_stmt, gsi);
>          mark_symbols_for_renaming (new_stmt);
>
> -         /* 3. Handle explicit realignment if necessary/supported.  Create in
> -               loop: vec_dest = realign_load (msq, lsq, realignment_token)  */
> +             /* 3. Handle explicit realignment if necessary/supported.
> +                Create in loop:
> +                  vec_dest = realign_load (msq, lsq, realignment_token)  */
>          if (alignment_support_scheme == dr_explicit_realign_optimized
>              || alignment_support_scheme == dr_explicit_realign)
>            {
> @@ -4241,8 +4255,9 @@ vectorizable_load (gimple stmt, gimple_s
>                realignment_token = dataref_ptr;
>              vec_dest = vect_create_destination_var (scalar_dest, vectype);
>              new_stmt
> -               = gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR, vec_dest,
> -                                                msq, lsq, realignment_token);
> +                   = gimple_build_assign_with_ops3 (REALIGN_LOAD_EXPR,
> +                                                    vec_dest, msq, lsq,
> +                                                    realignment_token);
>              new_temp = make_ssa_name (vec_dest, new_stmt);
>              gimple_assign_set_lhs (new_stmt, new_temp);
>              vect_finish_stmt_generation (stmt, new_stmt, gsi);
> @@ -4251,7 +4266,8 @@ vectorizable_load (gimple stmt, gimple_s
>                {
>                  gcc_assert (phi);
>                  if (i == vec_num - 1 && j == ncopies - 1)
> -                   add_phi_arg (phi, lsq, loop_latch_edge (containing_loop),
> +                       add_phi_arg (phi, lsq,
> +                                    loop_latch_edge (containing_loop),
>                                 UNKNOWN_LOCATION);
>                  msq = lsq;
>                }
> @@ -4272,8 +4288,8 @@ vectorizable_load (gimple stmt, gimple_s
>                  bitpos = bitsize_zero_node;
>                  vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
>                                    bitsize, bitpos);
> -                 vec_dest =
> -                       vect_create_destination_var (scalar_dest, NULL_TREE);
> +                     vec_dest = vect_create_destination_var (scalar_dest,
> +                                                             NULL_TREE);
>                  new_stmt = gimple_build_assign (vec_dest, vec_inv);
>                   new_temp = make_ssa_name (vec_dest, new_stmt);
>                  gimple_assign_set_lhs (new_stmt, new_temp);
> @@ -4283,7 +4299,8 @@ vectorizable_load (gimple stmt, gimple_s
>                    t = tree_cons (NULL_TREE, new_temp, t);
>                  /* FIXME: use build_constructor directly.  */
>                  vec_inv = build_constructor_from_list (vectype, t);
> -                 new_temp = vect_init_vector (stmt, vec_inv, vectype, gsi);
> +                     new_temp = vect_init_vector (stmt, vec_inv,
> +                                                  vectype, gsi);
>                  new_stmt = SSA_NAME_DEF_STMT (new_temp);
>                }
>              else
> @@ -4303,7 +4320,9 @@ vectorizable_load (gimple stmt, gimple_s
>
>          /* Store vector loads in the corresponding SLP_NODE.  */
>          if (slp && !slp_perm)
> -           VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node), new_stmt);
> +               VEC_quick_push (gimple, SLP_TREE_VEC_STMTS (slp_node),
> +                               new_stmt);
> +           }
>        }
>
>       if (slp && !slp_perm)
> @@ -4322,7 +4341,8 @@ vectorizable_load (gimple stmt, gimple_s
>         {
>           if (strided_load)
>            {
> -             if (!vect_transform_strided_load (stmt, dr_chain, group_size, gsi))
> +             if (!vect_transform_strided_load (stmt, dr_chain,
> +                                               group_size, gsi))
>                return false;
>
>              *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
>


* [3/9] STMT_VINFO_RELATED_STMT handling in vectorizable_store
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
  2011-04-12 13:25 ` [1/9] Generalise vect_create_data_ref_ptr Richard Sandiford
  2011-04-12 13:28 ` [2/9] Reindent parts of vectorizable_load and vectorizable_store Richard Sandiford
@ 2011-04-12 13:40 ` Richard Sandiford
  2011-04-17 10:25   ` Ira Rosen
  2011-04-12 13:44 ` [4/9] Move power-of-two checks for interleaving Richard Sandiford
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 13:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

vectorizable_store contains the code:

  for (j = 0; j < ncopies; j++)
    {
      for (i = 0; i < vec_num; i++)
	{
	  ...
	      if (j == 0)
	        STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
	      else
	        STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
	      prev_stmt_info = vinfo_for_stmt (new_stmt);
	}
    }

That is, STMT_VINFO_VEC_STMT (stmt_info) and *vec_stmt contain the last
statement emitted for the _last_ vector of the first copy.  However,
for later copies, the last statement for _every_ vector is chained using
STMT_VINFO_RELATED_STMT.  This seems a bit inconsistent, and isn't
what I expected from the comments.  It also seems different from
other vectorisation functions, where each copy has exactly one
STMT_VINFO_RELATED_STMT.  I wasn't sure whether the difference here
was deliberate or not.

The reason I'm changing it is that it makes the control flow for
the new code more obvious.
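
The difference between the two chaining schemes can be modelled with a
minimal standalone sketch.  It uses no GCC data structures: toy_stmt,
its "related" field, the fixed-size pool and both chain_* functions are
hypothetical stand-ins, with "related" playing the role of
STMT_VINFO_RELATED_STMT.  chain_per_vector mirrors the loop quoted
above; chain_per_copy models the intended behaviour after the patch,
assuming one recorded statement per copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vectorised statement; only the
   "related statement" link is modelled.  */
struct toy_stmt
{
  int copy;                   /* j: the copy this statement belongs to */
  int vec;                    /* i: the vector within that copy */
  struct toy_stmt *related;   /* models STMT_VINFO_RELATED_STMT */
};

/* Small static pool so that the sketch needs no heap management.  */
static struct toy_stmt pool[64];
static size_t pool_used;

static struct toy_stmt *
new_toy_stmt (int copy, int vec)
{
  assert (pool_used < sizeof pool / sizeof pool[0]);
  struct toy_stmt *s = &pool[pool_used++];
  s->copy = copy;
  s->vec = vec;
  s->related = NULL;
  return s;
}

/* Scheme before the patch: the bookkeeping runs once per vector.  */
static struct toy_stmt *
chain_per_vector (int ncopies, int vec_num)
{
  assert (ncopies > 0 && vec_num > 0);
  struct toy_stmt *first = NULL, *prev = NULL;
  for (int j = 0; j < ncopies; j++)
    for (int i = 0; i < vec_num; i++)
      {
        struct toy_stmt *s = new_toy_stmt (j, i);
        if (j == 0)
          first = s;          /* overwritten until the last vector */
        else
          prev->related = s;  /* every vector of later copies is chained */
        prev = s;
      }
  return first;
}

/* Scheme after the patch: the bookkeeping runs once per copy.  */
static struct toy_stmt *
chain_per_copy (int ncopies, int vec_num)
{
  assert (ncopies > 0 && vec_num > 0);
  struct toy_stmt *first = NULL, *prev = NULL;
  for (int j = 0; j < ncopies; j++)
    {
      struct toy_stmt *s = NULL;
      for (int i = 0; i < vec_num; i++)
        s = new_toy_stmt (j, i);   /* keep only the last vector's stmt */
      if (j == 0)
        first = s;
      else
        prev->related = s;
      prev = s;
    }
  return first;
}

/* Number of statements reachable from HEAD through "related".  */
static int
chain_length (const struct toy_stmt *head)
{
  int n = 0;
  for (; head; head = head->related)
    n++;
  return n;
}
```

With ncopies = 3 and vec_num = 2, the first scheme gives a chain of
1 + (3 - 1) * 2 = 5 statements, whereas the second gives a chain of 3,
one per copy.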

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* tree-vect-stmts.c (vectorizable_store): Only chain one related
	statement per copy.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2011-04-12 11:55:08.000000000 +0100
+++ gcc/tree-vect-stmts.c	2011-04-12 11:55:09.000000000 +0100
@@ -3612,6 +3612,7 @@ vectorizable_store (gimple stmt, gimple_
 
       if (1)
 	{
+	  new_stmt = NULL;
 	  if (strided_store)
 	    {
 	      result_chain = VEC_alloc (tree, heap, group_size);
@@ -3669,17 +3670,19 @@ vectorizable_store (gimple stmt, gimple_
 	      if (slp)
 		continue;
 
-	      if (j == 0)
-		STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
-	      else
-		STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
-
-	      prev_stmt_info = vinfo_for_stmt (new_stmt);
 	      next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
 	      if (!next_stmt)
 		break;
 	    }
 	}
+      if (!slp)
+	{
+	  if (j == 0)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
     }
 
   VEC_free (tree, heap, dr_chain);


* [4/9] Move power-of-two checks for interleaving
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (2 preceding siblings ...)
  2011-04-12 13:40 ` [3/9] STMT_VINFO_RELATED_STMT handling in vectorizable_store Richard Sandiford
@ 2011-04-12 13:44 ` Richard Sandiford
  2011-04-12 13:57   ` Richard Guenther
  2011-04-12 13:59 ` [5/9] Main target-independent support for direct interleaving Richard Sandiford
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 13:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

NEON has vld3 and vst3 instructions, which support an interleaving of
three vectors.  This patch therefore removes the blanket power-of-two
requirement for interleaving and enforces it on a per-operation
basis instead.

The patch also replaces:

  /* Check that the operation is supported.  */
  if (!vect_strided_store_supported (vectype))
    return false;

with:

  gcc_assert (vect_strided_store_supported (vectype, length));

because it was vectorizable_store's responsibility to check this upfront.
Likewise for loads.
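
As a rough illustration of the new division of labour, here is a
minimal standalone sketch.  None of it is GCC code: toy_exact_log2
imitates the contract of exact_log2 (log2 of a power of two, otherwise
-1), and toy_strided_load_supported and toy_permute_load_chain are
hypothetical models in which the target's optab queries are reduced to
a single boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Imitates exact_log2: return log2 (X) if X is a power of two,
   otherwise -1 (including for X == 0).  */
static int
toy_exact_log2 (unsigned long x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int log = 0;
  while (x >>= 1)
    log++;
  return log;
}

/* Model of the per-operation query: the permute-based scheme rejects
   group sizes that are not a power of two, then defers to the target.
   TARGET_HAS_PERMUTES is an assumption standing in for the optab
   checks.  */
static bool
toy_strided_load_supported (bool target_has_permutes, unsigned long count)
{
  if (toy_exact_log2 (count) == -1)
    return false;
  return target_has_permutes;
}

/* Model of the transform step: support was checked during analysis,
   so a failure here is a bug rather than a reason to give up.
   Returns the number of permutation rounds, matching the
   "i < exact_log2 (length)" loop bound in vect_permute_load_chain.  */
static int
toy_permute_load_chain (bool target_has_permutes, unsigned long length)
{
  assert (toy_strided_load_supported (target_has_permutes, length));
  return toy_exact_log2 (length);
}
```

In this model a group of three is rejected by
toy_strided_load_supported, but nothing earlier in the analysis rules
it out, which is what leaves room for a different strategy (such as
vld3) to accept it.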

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* tree-vectorizer.h (vect_strided_store_supported): Add an
	unsigned HOST_WIDE_INT argument.
	(vect_strided_load_supported): Likewise.
	(vect_permute_store_chain): Return void.
	(vect_transform_strided_load): Likewise.
	(vect_permute_load_chain): Delete.
	* tree-vect-data-refs.c (vect_strided_store_supported): Take a
	count argument.  Check that the count is a power of two.
	(vect_strided_load_supported): Likewise.
	(vect_permute_store_chain): Return void.  Update after above changes.
	Assert that the access is supported.
	(vect_permute_load_chain): Likewise.  Make static.
	(vect_transform_strided_load): Return void.
	* tree-vect-stmts.c (vectorizable_store): Update calls after
	above interface changes.
	(vectorizable_load): Likewise.
	(vect_analyze_stmt): Don't check for strided powers of two here.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2011-04-12 11:55:07.000000000 +0100
+++ gcc/tree-vectorizer.h	2011-04-12 11:55:09.000000000 +0100
@@ -828,16 +828,14 @@ extern tree vect_create_data_ref_ptr (gi
 				      gimple *, bool, bool *);
 extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
 extern tree vect_create_destination_var (tree, tree);
-extern bool vect_strided_store_supported (tree);
-extern bool vect_strided_load_supported (tree);
-extern bool vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
+extern bool vect_strided_store_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_strided_load_supported (tree, unsigned HOST_WIDE_INT);
+extern void vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
                                     gimple_stmt_iterator *, VEC(tree,heap) **);
 extern tree vect_setup_realignment (gimple, gimple_stmt_iterator *, tree *,
                                     enum dr_alignment_support, tree,
                                     struct loop **);
-extern bool vect_permute_load_chain (VEC(tree,heap) *,unsigned int, gimple,
-                                    gimple_stmt_iterator *, VEC(tree,heap) **);
-extern bool vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
+extern void vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
                                          gimple_stmt_iterator *);
 extern int vect_get_place_in_interleaving_chain (gimple, gimple);
 extern tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2011-04-12 11:55:07.000000000 +0100
+++ gcc/tree-vect-data-refs.c	2011-04-12 11:55:09.000000000 +0100
@@ -2196,19 +2196,6 @@ vect_analyze_group_access (struct data_r
           return false;
         }
 
-      /* FORNOW: we handle only interleaving that is a power of 2.
-         We don't fail here if it may be still possible to vectorize the
-         group using SLP.  If not, the size of the group will be checked in
-         vect_analyze_operations, and the vectorization will fail.  */
-      if (exact_log2 (stride) == -1)
-	{
-	  if (vect_print_dump_info (REPORT_DETAILS))
-	    fprintf (vect_dump, "interleaving is not a power of 2");
-
-	  if (slp_impossible)
-	    return false;
-	}
-
       if (stride == 0)
         stride = count;
 
@@ -3349,13 +3336,22 @@ vect_create_destination_var (tree scalar
    and FALSE otherwise.  */
 
 bool
-vect_strided_store_supported (tree vectype)
+vect_strided_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
 {
   optab interleave_high_optab, interleave_low_optab;
   enum machine_mode mode;
 
   mode = TYPE_MODE (vectype);
 
+  /* vect_permute_store_chain requires the group size to be a power of two.  */
+  if (exact_log2 (count) == -1)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+	fprintf (vect_dump, "the size of the group of strided accesses"
+		 " is not a power of 2");
+      return false;
+    }
+
   /* Check that the operation is supported.  */
   interleave_high_optab = optab_for_tree_code (VEC_INTERLEAVE_HIGH_EXPR,
 					       vectype, optab_default);
@@ -3441,7 +3437,7 @@ vect_strided_store_supported (tree vecty
    I3:  4 12 20 28  5 13 21 30
    I4:  6 14 22 30  7 15 23 31.  */
 
-bool
+void
 vect_permute_store_chain (VEC(tree,heap) *dr_chain,
 			  unsigned int length,
 			  gimple stmt,
@@ -3455,9 +3451,7 @@ vect_permute_store_chain (VEC(tree,heap)
   unsigned int j;
   enum tree_code high_code, low_code;
 
-  /* Check that the operation is supported.  */
-  if (!vect_strided_store_supported (vectype))
-    return false;
+  gcc_assert (vect_strided_store_supported (vectype, length));
 
   *result_chain = VEC_copy (tree, heap, dr_chain);
 
@@ -3510,7 +3504,6 @@ vect_permute_store_chain (VEC(tree,heap)
 	}
       dr_chain = VEC_copy (tree, heap, *result_chain);
     }
-  return true;
 }
 
 /* Function vect_setup_realignment
@@ -3787,13 +3780,22 @@ vect_setup_realignment (gimple stmt, gim
    and FALSE otherwise.  */
 
 bool
-vect_strided_load_supported (tree vectype)
+vect_strided_load_supported (tree vectype, unsigned HOST_WIDE_INT count)
 {
   optab perm_even_optab, perm_odd_optab;
   enum machine_mode mode;
 
   mode = TYPE_MODE (vectype);
 
+  /* vect_permute_load_chain requires the group size to be a power of two.  */
+  if (exact_log2 (count) == -1)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+	fprintf (vect_dump, "the size of the group of strided accesses"
+		 " is not a power of 2");
+      return false;
+    }
+
   perm_even_optab = optab_for_tree_code (VEC_EXTRACT_EVEN_EXPR, vectype,
 					 optab_default);
   if (!perm_even_optab)
@@ -3905,7 +3907,7 @@ vect_strided_load_supported (tree vectyp
    3rd vec (E2):  2 6 10 14 18 22 26 30
    4th vec (E4):  3 7 11 15 19 23 27 31.  */
 
-bool
+static void
 vect_permute_load_chain (VEC(tree,heap) *dr_chain,
 			 unsigned int length,
 			 gimple stmt,
@@ -3918,9 +3920,7 @@ vect_permute_load_chain (VEC(tree,heap) 
   int i;
   unsigned int j;
 
-  /* Check that the operation is supported.  */
-  if (!vect_strided_load_supported (vectype))
-    return false;
+  gcc_assert (vect_strided_load_supported (vectype, length));
 
   *result_chain = VEC_copy (tree, heap, dr_chain);
   for (i = 0; i < exact_log2 (length); i++)
@@ -3963,7 +3963,6 @@ vect_permute_load_chain (VEC(tree,heap) 
 	}
       dr_chain = VEC_copy (tree, heap, *result_chain);
     }
-  return true;
 }
 
 
@@ -3974,7 +3973,7 @@ vect_permute_load_chain (VEC(tree,heap) 
    the scalar statements.
 */
 
-bool
+void
 vect_transform_strided_load (gimple stmt, VEC(tree,heap) *dr_chain, int size,
 			     gimple_stmt_iterator *gsi)
 {
@@ -3990,8 +3989,7 @@ vect_transform_strided_load (gimple stmt
      vectors, that are ready for vector computation.  */
   result_chain = VEC_alloc (tree, heap, size);
   /* Permute.  */
-  if (!vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain))
-    return false;
+  vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
 
   /* Put a permuted data-ref in the VECTORIZED_STMT field.
      Since we scan the chain starting from it's first node, their order
@@ -4055,7 +4053,6 @@ vect_transform_strided_load (gimple stmt
     }
 
   VEC_free (tree, heap, result_chain);
-  return true;
 }
 
 /* Function vect_force_dr_alignment_p.
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2011-04-12 11:55:09.000000000 +0100
+++ gcc/tree-vect-stmts.c	2011-04-12 11:55:09.000000000 +0100
@@ -3412,9 +3412,12 @@ vectorizable_store (gimple stmt, gimple_
     {
       strided_store = true;
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
-      if (!vect_strided_store_supported (vectype)
-	  && !PURE_SLP_STMT (stmt_info) && !slp)
-	return false;
+      if (!slp && !PURE_SLP_STMT (stmt_info))
+	{
+	  group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
+	  if (!vect_strided_store_supported (vectype, group_size))
+	    return false;
+	}
 
       if (first_stmt == stmt)
 	{
@@ -3617,9 +3620,8 @@ vectorizable_store (gimple stmt, gimple_
 	    {
 	      result_chain = VEC_alloc (tree, heap, group_size);
 	      /* Permute.  */
-	      if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
-					     &result_chain))
-		return false;
+	      vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
+					&result_chain);
 	    }
 
 	  next_stmt = first_stmt;
@@ -3912,10 +3914,13 @@ vectorizable_load (gimple stmt, gimple_s
       /* FORNOW */
       gcc_assert (! nested_in_vect_loop);
 
-      /* Check if interleaving is supported.  */
-      if (!vect_strided_load_supported (vectype)
-	  && !PURE_SLP_STMT (stmt_info) && !slp)
-	return false;
+      first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+      if (!slp && !PURE_SLP_STMT (stmt_info))
+	{
+	  group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
+	  if (!vect_strided_load_supported (vectype, group_size))
+	    return false;
+	}
     }
 
   if (negative)
@@ -4344,10 +4349,7 @@ vectorizable_load (gimple stmt, gimple_s
         {
           if (strided_load)
   	    {
-	      if (!vect_transform_strided_load (stmt, dr_chain,
-						group_size, gsi))
-	        return false;
-
+	      vect_transform_strided_load (stmt, dr_chain, group_size, gsi);
 	      *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
 	    }
           else
@@ -4766,27 +4768,6 @@ vect_analyze_stmt (gimple stmt, bool *ne
        return false;
     }
 
-  if (!PURE_SLP_STMT (stmt_info))
-    {
-      /* Groups of strided accesses whose size is not a power of 2 are not
-         vectorizable yet using loop-vectorization.  Therefore, if this stmt
-	 feeds non-SLP-able stmts (i.e., this stmt has to be both SLPed and
-	 loop-based vectorized), the loop cannot be vectorized.  */
-      if (STMT_VINFO_STRIDED_ACCESS (stmt_info)
-          && exact_log2 (DR_GROUP_SIZE (vinfo_for_stmt (
-                                        DR_GROUP_FIRST_DR (stmt_info)))) == -1)
-        {
-          if (vect_print_dump_info (REPORT_DETAILS))
-            {
-              fprintf (vect_dump, "not vectorized: the size of group "
-                                  "of strided accesses is not a power of 2");
-              print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
-            }
-
-          return false;
-        }
-    }
-
   return true;
 }
 


* Re: [4/9] Move power-of-two checks for interleaving
  2011-04-12 13:44 ` [4/9] Move power-of-two checks for interleaving Richard Sandiford
@ 2011-04-12 13:57   ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-12 13:57 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 3:44 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> NEON has vld3 and vst3 instructions, which support an interleaving of
> three vectors.  This patch therefore removes the blanket power-of-two
> requirement for interleaving and enforces it on a per-operation
> basis instead.
>
> The patch also replaces:
>
>  /* Check that the operation is supported.  */
>  if (!vect_strided_store_supported (vectype))
>    return false;
>
> with:
>
>  gcc_assert (vect_strided_store_supported (vectype, length));
>
> because it was vectorizable_store's responsibility to check this upfront.
> Likewise for loads.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/
>        * tree-vectorizer.h (vect_strided_store_supported): Add a
>        HOST_WIDE_INT argument.
>        (vect_strided_load_supported): Likewise.
>        (vect_permute_store_chain): Return void.
>        (vect_transform_strided_load): Likewise.
>        (vect_permute_load_chain): Delete.
>        * tree-vect-data-refs.c (vect_strided_store_supported): Take a
>        count argument.  Check that the count is a power of two.
>        (vect_strided_load_supported): Likewise.
>        (vect_permute_store_chain): Return void.  Update after above changes.
>        Assert that the access is supported.
>        (vect_permute_load_chain): Likewise.
>        (vect_transform_strided_load): Return void.
>        * tree-vect-stmts.c (vectorizable_store): Update calls after
>        above interface changes.
>        (vectorizable_load): Likewise.
>        (vect_analyze_stmt): Don't check for strided powers of two here.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2011-04-12 11:55:07.000000000 +0100
> +++ gcc/tree-vectorizer.h       2011-04-12 11:55:09.000000000 +0100
> @@ -828,16 +828,14 @@ extern tree vect_create_data_ref_ptr (gi
>                                      gimple *, bool, bool *);
>  extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
>  extern tree vect_create_destination_var (tree, tree);
> -extern bool vect_strided_store_supported (tree);
> -extern bool vect_strided_load_supported (tree);
> -extern bool vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
> +extern bool vect_strided_store_supported (tree, unsigned HOST_WIDE_INT);
> +extern bool vect_strided_load_supported (tree, unsigned HOST_WIDE_INT);
> +extern void vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
>                                     gimple_stmt_iterator *, VEC(tree,heap) **);
>  extern tree vect_setup_realignment (gimple, gimple_stmt_iterator *, tree *,
>                                     enum dr_alignment_support, tree,
>                                     struct loop **);
> -extern bool vect_permute_load_chain (VEC(tree,heap) *,unsigned int, gimple,
> -                                    gimple_stmt_iterator *, VEC(tree,heap) **);
> -extern bool vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
> +extern void vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
>                                          gimple_stmt_iterator *);
>  extern int vect_get_place_in_interleaving_chain (gimple, gimple);
>  extern tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2011-04-12 11:55:07.000000000 +0100
> +++ gcc/tree-vect-data-refs.c   2011-04-12 11:55:09.000000000 +0100
> @@ -2196,19 +2196,6 @@ vect_analyze_group_access (struct data_r
>           return false;
>         }
>
> -      /* FORNOW: we handle only interleaving that is a power of 2.
> -         We don't fail here if it may be still possible to vectorize the
> -         group using SLP.  If not, the size of the group will be checked in
> -         vect_analyze_operations, and the vectorization will fail.  */
> -      if (exact_log2 (stride) == -1)
> -       {
> -         if (vect_print_dump_info (REPORT_DETAILS))
> -           fprintf (vect_dump, "interleaving is not a power of 2");
> -
> -         if (slp_impossible)
> -           return false;
> -       }
> -
>       if (stride == 0)
>         stride = count;
>
> @@ -3349,13 +3336,22 @@ vect_create_destination_var (tree scalar
>    and FALSE otherwise.  */
>
>  bool
> -vect_strided_store_supported (tree vectype)
> +vect_strided_store_supported (tree vectype, unsigned HOST_WIDE_INT count)
>  {
>   optab interleave_high_optab, interleave_low_optab;
>   enum machine_mode mode;
>
>   mode = TYPE_MODE (vectype);
>
> +  /* vect_permute_store_chain requires the group size to be a power of two.  */
> +  if (exact_log2 (count) == -1)
> +    {
> +      if (vect_print_dump_info (REPORT_DETAILS))
> +       fprintf (vect_dump, "the size of the group of strided accesses"
> +                " is not a power of 2");
> +      return false;
> +    }
> +
>   /* Check that the operation is supported.  */
>   interleave_high_optab = optab_for_tree_code (VEC_INTERLEAVE_HIGH_EXPR,
>                                               vectype, optab_default);
> @@ -3441,7 +3437,7 @@ vect_strided_store_supported (tree vecty
>    I3:  4 12 20 28  5 13 21 30
>    I4:  6 14 22 30  7 15 23 31.  */
>
> -bool
> +void
>  vect_permute_store_chain (VEC(tree,heap) *dr_chain,
>                          unsigned int length,
>                          gimple stmt,
> @@ -3455,9 +3451,7 @@ vect_permute_store_chain (VEC(tree,heap)
>   unsigned int j;
>   enum tree_code high_code, low_code;
>
> -  /* Check that the operation is supported.  */
> -  if (!vect_strided_store_supported (vectype))
> -    return false;
> +  gcc_assert (vect_strided_store_supported (vectype, length));
>
>   *result_chain = VEC_copy (tree, heap, dr_chain);
>
> @@ -3510,7 +3504,6 @@ vect_permute_store_chain (VEC(tree,heap)
>        }
>       dr_chain = VEC_copy (tree, heap, *result_chain);
>     }
> -  return true;
>  }
>
>  /* Function vect_setup_realignment
> @@ -3787,13 +3780,22 @@ vect_setup_realignment (gimple stmt, gim
>    and FALSE otherwise.  */
>
>  bool
> -vect_strided_load_supported (tree vectype)
> +vect_strided_load_supported (tree vectype, unsigned HOST_WIDE_INT count)
>  {
>   optab perm_even_optab, perm_odd_optab;
>   enum machine_mode mode;
>
>   mode = TYPE_MODE (vectype);
>
> +  /* vect_permute_load_chain requires the group size to be a power of two.  */
> +  if (exact_log2 (count) == -1)
> +    {
> +      if (vect_print_dump_info (REPORT_DETAILS))
> +       fprintf (vect_dump, "the size of the group of strided accesses"
> +                " is not a power of 2");
> +      return false;
> +    }
> +
>   perm_even_optab = optab_for_tree_code (VEC_EXTRACT_EVEN_EXPR, vectype,
>                                         optab_default);
>   if (!perm_even_optab)
> @@ -3905,7 +3907,7 @@ vect_strided_load_supported (tree vectyp
>    3rd vec (E2):  2 6 10 14 18 22 26 30
>    4th vec (E4):  3 7 11 15 19 23 27 31.  */
>
> -bool
> +static void
>  vect_permute_load_chain (VEC(tree,heap) *dr_chain,
>                         unsigned int length,
>                         gimple stmt,
> @@ -3918,9 +3920,7 @@ vect_permute_load_chain (VEC(tree,heap)
>   int i;
>   unsigned int j;
>
> -  /* Check that the operation is supported.  */
> -  if (!vect_strided_load_supported (vectype))
> -    return false;
> +  gcc_assert (vect_strided_load_supported (vectype, length));
>
>   *result_chain = VEC_copy (tree, heap, dr_chain);
>   for (i = 0; i < exact_log2 (length); i++)
> @@ -3963,7 +3963,6 @@ vect_permute_load_chain (VEC(tree,heap)
>        }
>       dr_chain = VEC_copy (tree, heap, *result_chain);
>     }
> -  return true;
>  }
>
>
> @@ -3974,7 +3973,7 @@ vect_permute_load_chain (VEC(tree,heap)
>    the scalar statements.
>  */
>
> -bool
> +void
>  vect_transform_strided_load (gimple stmt, VEC(tree,heap) *dr_chain, int size,
>                             gimple_stmt_iterator *gsi)
>  {
> @@ -3990,8 +3989,7 @@ vect_transform_strided_load (gimple stmt
>      vectors, that are ready for vector computation.  */
>   result_chain = VEC_alloc (tree, heap, size);
>   /* Permute.  */
> -  if (!vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain))
> -    return false;
> +  vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
>
>   /* Put a permuted data-ref in the VECTORIZED_STMT field.
>      Since we scan the chain starting from it's first node, their order
> @@ -4055,7 +4053,6 @@ vect_transform_strided_load (gimple stmt
>     }
>
>   VEC_free (tree, heap, result_chain);
> -  return true;
>  }
>
>  /* Function vect_force_dr_alignment_p.
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2011-04-12 11:55:09.000000000 +0100
> +++ gcc/tree-vect-stmts.c       2011-04-12 11:55:09.000000000 +0100
> @@ -3412,9 +3412,12 @@ vectorizable_store (gimple stmt, gimple_
>     {
>       strided_store = true;
>       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
> -      if (!vect_strided_store_supported (vectype)
> -         && !PURE_SLP_STMT (stmt_info) && !slp)
> -       return false;
> +      if (!slp && !PURE_SLP_STMT (stmt_info))
> +       {
> +         group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
> +         if (!vect_strided_store_supported (vectype, group_size))
> +           return false;
> +       }
>
>       if (first_stmt == stmt)
>        {
> @@ -3617,9 +3620,8 @@ vectorizable_store (gimple stmt, gimple_
>            {
>              result_chain = VEC_alloc (tree, heap, group_size);
>              /* Permute.  */
> -             if (!vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
> -                                            &result_chain))
> -               return false;
> +             vect_permute_store_chain (dr_chain, group_size, stmt, gsi,
> +                                       &result_chain);
>            }
>
>          next_stmt = first_stmt;
> @@ -3912,10 +3914,13 @@ vectorizable_load (gimple stmt, gimple_s
>       /* FORNOW */
>       gcc_assert (! nested_in_vect_loop);
>
> -      /* Check if interleaving is supported.  */
> -      if (!vect_strided_load_supported (vectype)
> -         && !PURE_SLP_STMT (stmt_info) && !slp)
> -       return false;
> +      first_stmt = DR_GROUP_FIRST_DR (stmt_info);
> +      if (!slp && !PURE_SLP_STMT (stmt_info))
> +       {
> +         group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
> +         if (!vect_strided_load_supported (vectype, group_size))
> +           return false;
> +       }
>     }
>
>   if (negative)
> @@ -4344,10 +4349,7 @@ vectorizable_load (gimple stmt, gimple_s
>         {
>           if (strided_load)
>            {
> -             if (!vect_transform_strided_load (stmt, dr_chain,
> -                                               group_size, gsi))
> -               return false;
> -
> +             vect_transform_strided_load (stmt, dr_chain, group_size, gsi);
>              *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
>            }
>           else
> @@ -4766,27 +4768,6 @@ vect_analyze_stmt (gimple stmt, bool *ne
>        return false;
>     }
>
> -  if (!PURE_SLP_STMT (stmt_info))
> -    {
> -      /* Groups of strided accesses whose size is not a power of 2 are not
> -         vectorizable yet using loop-vectorization.  Therefore, if this stmt
> -        feeds non-SLP-able stmts (i.e., this stmt has to be both SLPed and
> -        loop-based vectorized), the loop cannot be vectorized.  */
> -      if (STMT_VINFO_STRIDED_ACCESS (stmt_info)
> -          && exact_log2 (DR_GROUP_SIZE (vinfo_for_stmt (
> -                                        DR_GROUP_FIRST_DR (stmt_info)))) == -1)
> -        {
> -          if (vect_print_dump_info (REPORT_DETAILS))
> -            {
> -              fprintf (vect_dump, "not vectorized: the size of group "
> -                                  "of strided accesses is not a power of 2");
> -              print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
> -            }
> -
> -          return false;
> -        }
> -    }
> -
>   return true;
>  }
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [5/9] Main target-independent support for direct interleaving
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (3 preceding siblings ...)
  2011-04-12 13:44 ` [4/9] Move power-of-two checks for interleaving Richard Sandiford
@ 2011-04-12 13:59 ` Richard Sandiford
  2011-04-17 14:26   ` Ira Rosen
  2011-04-18 11:54   ` Richard Guenther
  2011-04-12 14:01 ` [6/9] NEON vec_load_lanes and vec_store_lanes patterns Richard Sandiford
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 13:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

This patch adds vec_load_lanes and vec_store_lanes optabs for instructions
like NEON's vldN and vstN.  The optabs are defined this way because the
vectors must be allocated to a block of consecutive registers.
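
As a rough illustration of what these instructions compute, here is a
minimal C sketch of the element mapping that the md.texi text in this
patch describes.  The names load_lanes, store_lanes, NVECS and NUNITS
are made up for the example and are not GCC functions or macros; the
sizes assume two vectors of four 16-bit elements, as in the
vec_load_lanestiv4hi case.

```c
/* Illustrative sizes only: NVECS plays the role of "c" in the md.texi
   pseudo-code and NUNITS the number of elements per vector (V4HI).  */
#define NVECS 2
#define NUNITS 4

/* De-interleave MEM into NVECS vectors:
   regs[i][j] = mem[j * NVECS + i].  */
static void
load_lanes (short regs[NVECS][NUNITS], const short *mem)
{
  for (int j = 0; j < NUNITS; j++)
    for (int i = 0; i < NVECS; i++)
      regs[i][j] = mem[j * NVECS + i];
}

/* Interleave NVECS vectors back into MEM:
   mem[j * NVECS + i] = regs[i][j].  */
static void
store_lanes (short *mem, short regs[NVECS][NUNITS])
{
  for (int j = 0; j < NUNITS; j++)
    for (int i = 0; i < NVECS; i++)
      mem[j * NVECS + i] = regs[i][j];
}
```

With these sizes, loading from an array holding 0..7 puts the
even-indexed elements in the first vector and the odd-indexed elements
in the second, and storing the two vectors again restores the original
order.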

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* doc/md.texi (vec_load_lanes, vec_store_lanes): Document.
	* optabs.h (COI_vec_load_lanes, COI_vec_store_lanes): New
	convert_optab_index values.
	(vec_load_lanes_optab, vec_store_lanes_optab): New convert optabs.
	* genopinit.c (optabs): Initialize the new optabs.
	* internal-fn.def (LOAD_LANES, STORE_LANES): New internal functions.
	* internal-fn.c (get_multi_vector_move, expand_LOAD_LANES)
	(expand_STORE_LANES): New functions.
	* tree.h (build_simple_array_type): Declare.
	* tree.c (build_simple_array_type): New function.
	* tree-vectorizer.h (vect_model_store_cost): Add a bool argument.
	(vect_model_load_cost): Likewise.
	(vect_store_lanes_supported, vect_load_lanes_supported)
	(vect_record_strided_load_vectors): Declare.
	* tree-vect-data-refs.c (vect_lanes_optab_supported_p)
	(vect_store_lanes_supported, vect_load_lanes_supported): New functions.
	(vect_transform_strided_load): Split out statement recording into...
	(vect_record_strided_load_vectors): ...this new function.
	* tree-vect-stmts.c (create_vector_array, read_vector_array)
	(write_vector_array, create_array_ref): New functions.
	(vect_model_store_cost): Add store_lanes_p argument.
	(vect_model_load_cost): Add load_lanes_p argument.
	(vectorizable_store): Try to use store-lanes functions for
	interleaved stores.
	(vectorizable_load): Likewise load-lanes and loads.
	* tree-vect-slp.c (vect_get_and_check_slp_defs)
	(vect_build_slp_tree): Update calls to vect_model_store_cost and
	vect_model_load_cost.

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2011-04-12 12:16:46.000000000 +0100
+++ gcc/doc/md.texi	2011-04-12 14:48:28.000000000 +0100
@@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
 consecutive memory locations, operand 1 is the first register, and
 operand 2 is a constant: the number of consecutive registers.
 
+@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
+@item @samp{vec_load_lanes@var{m}@var{n}}
+Perform an interleaved load of several vectors from memory operand 1
+into register operand 0.  Both operands have mode @var{m}.  The register
+operand is viewed as holding consecutive vectors of mode @var{n},
+while the memory operand is a flat array that contains the same number
+of elements.  The operation is equivalent to:
+
+@smallexample
+int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
+for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
+  for (i = 0; i < c; i++)
+    operand0[i][j] = operand1[j * c + i];
+@end smallexample
+
+For example, @samp{vec_load_lanestiv4hi} loads eight 16-bit values
+from memory into a register of mode @samp{TI}@.  The register
+contains two consecutive vectors of mode @samp{V4HI}@.
+
+This pattern can only be used if:
+@smallexample
+TARGET_ARRAY_MODE_SUPPORTED_P (@var{n}, @var{c})
+@end smallexample
+is true.  GCC assumes that, if a target supports this kind of
+instruction for some mode @var{n}, it also supports unaligned
+loads for vectors of mode @var{n}.
+
+@cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern
+@item @samp{vec_store_lanes@var{m}@var{n}}
+Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory
+and register operands reversed.  That is, the instruction is
+equivalent to:
+
+@smallexample
+int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
+for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
+  for (i = 0; i < c; i++)
+    operand0[j * c + i] = operand1[i][j];
+@end smallexample
+
+for a memory operand 0 and register operand 1.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2011-04-12 12:16:46.000000000 +0100
+++ gcc/optabs.h	2011-04-12 14:48:28.000000000 +0100
@@ -578,6 +578,9 @@ enum convert_optab_index
   COI_satfract,
   COI_satfractuns,
 
+  COI_vec_load_lanes,
+  COI_vec_store_lanes,
+
   COI_MAX
 };
 
@@ -598,6 +601,8 @@ #define fract_optab (&convert_optab_tabl
 #define fractuns_optab (&convert_optab_table[COI_fractuns])
 #define satfract_optab (&convert_optab_table[COI_satfract])
 #define satfractuns_optab (&convert_optab_table[COI_satfractuns])
+#define vec_load_lanes_optab (&convert_optab_table[COI_vec_load_lanes])
+#define vec_store_lanes_optab (&convert_optab_table[COI_vec_store_lanes])
 
 /* Contains the optab used for each rtx code.  */
 extern optab code_to_optab[NUM_RTX_CODE + 1];
Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/genopinit.c	2011-04-12 14:48:28.000000000 +0100
@@ -74,6 +74,8 @@ static const char * const optabs[] =
   "set_convert_optab_handler (fractuns_optab, $B, $A, CODE_FOR_$(fractuns$Q$a$I$b2$))",
   "set_convert_optab_handler (satfract_optab, $B, $A, CODE_FOR_$(satfract$a$Q$b2$))",
   "set_convert_optab_handler (satfractuns_optab, $B, $A, CODE_FOR_$(satfractuns$I$a$Q$b2$))",
+  "set_convert_optab_handler (vec_load_lanes_optab, $A, $B, CODE_FOR_$(vec_load_lanes$a$b$))",
+  "set_convert_optab_handler (vec_store_lanes_optab, $A, $B, CODE_FOR_$(vec_store_lanes$a$b$))",
   "set_optab_handler (add_optab, $A, CODE_FOR_$(add$P$a3$))",
   "set_optab_handler (addv_optab, $A, CODE_FOR_$(add$F$a3$)),\n\
     set_optab_handler (add_optab, $A, CODE_FOR_$(add$F$a3$))",
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2011-04-12 14:10:42.000000000 +0100
+++ gcc/internal-fn.def	2011-04-12 14:48:28.000000000 +0100
@@ -32,3 +32,6 @@ along with GCC; see the file COPYING3.  
 
    where NAME is the name of the function and FLAGS is a set of
    ECF_* flags.  */
+
+DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF)
+DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF)
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2011-04-12 14:10:42.000000000 +0100
+++ gcc/internal-fn.c	2011-04-12 14:48:28.000000000 +0100
@@ -41,6 +41,69 @@ #define DEF_INTERNAL_FN(CODE, FLAGS) FLA
   0
 };
 
+/* ARRAY_TYPE is an array of vector modes.  Return the associated insn
+   for load-lanes-style optab OPTAB.  The insn must exist.  */
+
+static enum insn_code
+get_multi_vector_move (tree array_type, convert_optab optab)
+{
+  enum insn_code icode;
+  enum machine_mode imode;
+  enum machine_mode vmode;
+
+  gcc_assert (TREE_CODE (array_type) == ARRAY_TYPE);
+  imode = TYPE_MODE (array_type);
+  vmode = TYPE_MODE (TREE_TYPE (array_type));
+
+  icode = convert_optab_handler (optab, imode, vmode);
+  gcc_assert (icode != CODE_FOR_nothing);
+  return icode;
+}
+
+/* Expand: LHS = LOAD_LANES (ARGS[0]).  */
+
+static void
+expand_LOAD_LANES (tree lhs, tree *args)
+{
+  struct expand_operand ops[2];
+  tree type;
+  rtx target, mem;
+
+  type = TREE_TYPE (lhs);
+
+  target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  mem = expand_normal (args[0]);
+
+  gcc_assert (MEM_P (mem));
+  PUT_MODE (mem, TYPE_MODE (type));
+
+  create_output_operand (&ops[0], target, TYPE_MODE (type));
+  create_fixed_operand (&ops[1], mem);
+  expand_insn (get_multi_vector_move (type, vec_load_lanes_optab), 2, ops);
+}
+
+/* Expand: LHS = STORE_LANES (ARGS[0]).  */
+
+static void
+expand_STORE_LANES (tree lhs, tree *args)
+{
+  struct expand_operand ops[2];
+  tree type;
+  rtx target, rhs;
+
+  type = TREE_TYPE (args[0]);
+
+  target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rhs = expand_normal (args[0]);
+
+  gcc_assert (MEM_P (target));
+  PUT_MODE (target, TYPE_MODE (type));
+
+  create_fixed_operand (&ops[0], target);
+  create_input_operand (&ops[1], rhs, TYPE_MODE (type));
+  expand_insn (get_multi_vector_move (type, vec_store_lanes_optab), 2, ops);
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2011-04-12 12:16:46.000000000 +0100
+++ gcc/tree.h	2011-04-12 14:48:28.000000000 +0100
@@ -4198,6 +4198,7 @@ extern tree build_type_no_quals (tree);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree);
 extern tree build_nonshared_array_type (tree, tree);
+extern tree build_simple_array_type (tree, unsigned HOST_WIDE_INT);
 extern tree build_function_type (tree, tree);
 extern tree build_function_type_list (tree, ...);
 extern tree build_function_type_skip_args (tree, bitmap);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/tree.c	2011-04-12 14:48:28.000000000 +0100
@@ -7385,6 +7385,15 @@ build_nonshared_array_type (tree elt_typ
   return build_array_type_1 (elt_type, index_type, false);
 }
 
+/* Return a representation of ELT_TYPE[NELTS], using indices of type
+   sizetype.  */
+
+tree
+build_simple_array_type (tree elt_type, unsigned HOST_WIDE_INT nelts)
+{
+  return build_array_type (elt_type, build_index_type (size_int (nelts - 1)));
+}
+
 /* Recursively examines the array elements of TYPE, until a non-array
    element type is found.  */
 
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2011-04-12 14:48:27.000000000 +0100
+++ gcc/tree-vectorizer.h	2011-04-12 14:48:28.000000000 +0100
@@ -788,9 +788,9 @@ extern void free_stmt_vec_info (gimple s
 extern tree vectorizable_function (gimple, tree, tree);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
                                     slp_tree);
-extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type,
-                                   slp_tree);
-extern void vect_model_load_cost (stmt_vec_info, int, slp_tree);
+extern void vect_model_store_cost (stmt_vec_info, int, bool,
+				   enum vect_def_type, slp_tree);
+extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree);
 extern void vect_finish_stmt_generation (gimple, gimple,
                                          gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
@@ -829,7 +829,9 @@ extern tree vect_create_data_ref_ptr (gi
 extern tree bump_vector_ptr (tree, gimple, gimple_stmt_iterator *, gimple, tree);
 extern tree vect_create_destination_var (tree, tree);
 extern bool vect_strided_store_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_store_lanes_supported (tree, unsigned HOST_WIDE_INT);
 extern bool vect_strided_load_supported (tree, unsigned HOST_WIDE_INT);
+extern bool vect_load_lanes_supported (tree, unsigned HOST_WIDE_INT);
 extern void vect_permute_store_chain (VEC(tree,heap) *,unsigned int, gimple,
                                     gimple_stmt_iterator *, VEC(tree,heap) **);
 extern tree vect_setup_realignment (gimple, gimple_stmt_iterator *, tree *,
@@ -837,6 +839,7 @@ extern tree vect_setup_realignment (gimp
                                     struct loop **);
 extern void vect_transform_strided_load (gimple, VEC(tree,heap) *, int,
                                          gimple_stmt_iterator *);
+extern void vect_record_strided_load_vectors (gimple, VEC(tree,heap) *);
 extern int vect_get_place_in_interleaving_chain (gimple, gimple);
 extern tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
 extern tree vect_create_addr_base_for_vector_ref (gimple, gimple_seq *,
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2011-04-12 14:48:27.000000000 +0100
+++ gcc/tree-vect-data-refs.c	2011-04-12 14:49:18.000000000 +0100
@@ -43,6 +43,45 @@ Software Foundation; either version 3, o
 #include "expr.h"
 #include "optabs.h"
 
+/* Return true if load- or store-lanes optab OPTAB is implemented for
+   COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
+
+static bool
+vect_lanes_optab_supported_p (const char *name, convert_optab optab,
+			      tree vectype, unsigned HOST_WIDE_INT count)
+{
+  enum machine_mode mode, array_mode;
+  bool limit_p;
+
+  mode = TYPE_MODE (vectype);
+  limit_p = !targetm.array_mode_supported_p (mode, count);
+  array_mode = mode_for_size (count * GET_MODE_BITSIZE (mode),
+			      MODE_INT, limit_p);
+
+  if (array_mode == BLKmode)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+	fprintf (vect_dump, "no array mode for %s[" HOST_WIDE_INT_PRINT_DEC "]",
+		 GET_MODE_NAME (mode), count);
+      return false;
+    }
+
+  if (convert_optab_handler (optab, array_mode, mode) == CODE_FOR_nothing)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+	fprintf (vect_dump, "cannot use %s<%s><%s>",
+		 name, GET_MODE_NAME (array_mode), GET_MODE_NAME (mode));
+      return false;
+    }
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "can use %s<%s><%s>",
+	     name, GET_MODE_NAME (array_mode), GET_MODE_NAME (mode));
+
+  return true;
+}
+
+
 /* Return the smallest scalar part of STMT.
    This is used to determine the vectype of the stmt.  We generally set the
    vectype according to the type of the result (lhs).  For stmts whose
@@ -3376,6 +3415,18 @@ vect_strided_store_supported (tree vecty
 }
 
 
+/* Return TRUE if vec_store_lanes is available for COUNT vectors of
+   type VECTYPE.  */
+
+bool
+vect_store_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count)
+{
+  return vect_lanes_optab_supported_p ("vec_store_lanes",
+				       vec_store_lanes_optab,
+				       vectype, count);
+}
+
+
 /* Function vect_permute_store_chain.
 
    Given a chain of interleaved stores in DR_CHAIN of LENGTH that must be
@@ -3830,6 +3881,16 @@ vect_strided_load_supported (tree vectyp
   return true;
 }
 
+/* Return TRUE if vec_load_lanes is available for COUNT vectors of
+   type VECTYPE.  */
+
+bool
+vect_load_lanes_supported (tree vectype, unsigned HOST_WIDE_INT count)
+{
+  return vect_lanes_optab_supported_p ("vec_load_lanes",
+				       vec_load_lanes_optab,
+				       vectype, count);
+}
 
 /* Function vect_permute_load_chain.
 
@@ -3977,19 +4038,28 @@ vect_permute_load_chain (VEC(tree,heap) 
 vect_transform_strided_load (gimple stmt, VEC(tree,heap) *dr_chain, int size,
 			     gimple_stmt_iterator *gsi)
 {
-  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  gimple first_stmt = DR_GROUP_FIRST_DR (stmt_info);
-  gimple next_stmt, new_stmt;
   VEC(tree,heap) *result_chain = NULL;
-  unsigned int i, gap_count;
-  tree tmp_data_ref;
 
   /* DR_CHAIN contains input data-refs that are a part of the interleaving.
      RESULT_CHAIN is the output of vect_permute_load_chain, it contains permuted
      vectors, that are ready for vector computation.  */
   result_chain = VEC_alloc (tree, heap, size);
-  /* Permute.  */
   vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
+  vect_record_strided_load_vectors (stmt, result_chain);
+  VEC_free (tree, heap, result_chain);
+}
+
+/* RESULT_CHAIN contains the output of a group of strided loads that were
+   generated as part of the vectorization of STMT.  Assign the statement
+   for each vector to the associated scalar statement.  */
+
+void
+vect_record_strided_load_vectors (gimple stmt, VEC(tree,heap) *result_chain)
+{
+  gimple first_stmt = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt));
+  gimple next_stmt, new_stmt;
+  unsigned int i, gap_count;
+  tree tmp_data_ref;
 
   /* Put a permuted data-ref in the VECTORIZED_STMT field.
      Since we scan the chain starting from it's first node, their order
@@ -4051,8 +4121,6 @@ vect_transform_strided_load (gimple stmt
 	    break;
         }
     }
-
-  VEC_free (tree, heap, result_chain);
 }
 
 /* Function vect_force_dr_alignment_p.
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2011-04-12 14:48:27.000000000 +0100
+++ gcc/tree-vect-stmts.c	2011-04-12 14:52:10.000000000 +0100
@@ -42,6 +42,81 @@ Software Foundation; either version 3, o
 #include "langhooks.h"
 
 
+/* Return a variable of type ELEM_TYPE[NELEMS].  */
+
+static tree
+create_vector_array (tree elem_type, unsigned HOST_WIDE_INT nelems)
+{
+  return create_tmp_var (build_simple_array_type (elem_type, nelems),
+			 "vect_array");
+}
+
+/* ARRAY is an array of vectors created by create_vector_array.
+   Return an SSA_NAME for the vector in index N.  The reference
+   is part of the vectorization of STMT and the vector is associated
+   with scalar destination SCALAR_DEST.  */
+
+static tree
+read_vector_array (gimple stmt, gimple_stmt_iterator *gsi, tree scalar_dest,
+		   tree array, unsigned HOST_WIDE_INT n)
+{
+  tree vect_type, vect, vect_name, array_ref;
+  gimple new_stmt;
+
+  gcc_assert (TREE_CODE (TREE_TYPE (array)) == ARRAY_TYPE);
+  vect_type = TREE_TYPE (TREE_TYPE (array));
+  vect = vect_create_destination_var (scalar_dest, vect_type);
+  array_ref = build4 (ARRAY_REF, vect_type, array,
+		      build_int_cst (size_type_node, n),
+		      NULL_TREE, NULL_TREE);
+
+  new_stmt = gimple_build_assign (vect, array_ref);
+  vect_name = make_ssa_name (vect, new_stmt);
+  gimple_assign_set_lhs (new_stmt, vect_name);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+  mark_symbols_for_renaming (new_stmt);
+
+  return vect_name;
+}
+
+/* ARRAY is an array of vectors created by create_vector_array.
+   Emit code to store SSA_NAME VECT in index N of the array.
+   The store is part of the vectorization of STMT.  */
+
+static void
+write_vector_array (gimple stmt, gimple_stmt_iterator *gsi, tree vect,
+		    tree array, unsigned HOST_WIDE_INT n)
+{
+  tree array_ref;
+  gimple new_stmt;
+
+  array_ref = build4 (ARRAY_REF, TREE_TYPE (vect), array,
+		      build_int_cst (size_type_node, n),
+		      NULL_TREE, NULL_TREE);
+
+  new_stmt = gimple_build_assign (array_ref, vect);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+  mark_symbols_for_renaming (new_stmt);
+}
+
+/* PTR is a pointer to array type TYPE.  Return a representation of *PTR.
+   The memory reference replaces those in FIRST_DR (and its group).  */
+
+static tree
+create_array_ref (tree type, tree ptr, struct data_reference *first_dr)
+{
+  struct ptr_info_def *pi;
+  tree mem_ref, alias_ptr_type;
+
+  alias_ptr_type = reference_alias_ptr_type (DR_REF (first_dr));
+  mem_ref = build2 (MEM_REF, type, ptr, build_int_cst (alias_ptr_type, 0));
+  /* Arrays have the same alignment as their type.  */
+  pi = get_ptr_info (ptr);
+  pi->align = TYPE_ALIGN_UNIT (type);
+  pi->misalign = 0;
+  return mem_ref;
+}
+
 /* Utility functions used by vect_mark_stmts_to_be_vectorized.  */
 
 /* Function vect_mark_relevant.
@@ -648,7 +723,8 @@ vect_cost_strided_group_size (stmt_vec_i
 
 void
 vect_model_store_cost (stmt_vec_info stmt_info, int ncopies,
-		       enum vect_def_type dt, slp_tree slp_node)
+		       bool store_lanes_p, enum vect_def_type dt,
+		       slp_tree slp_node)
 {
   int group_size;
   unsigned int inside_cost = 0, outside_cost = 0;
@@ -685,9 +761,11 @@ vect_model_store_cost (stmt_vec_info stm
       first_dr = STMT_VINFO_DATA_REF (stmt_info);
     }
 
-  /* Is this an access in a group of stores, which provide strided access?
-     If so, add in the cost of the permutes.  */
-  if (group_size > 1)
+  /* We assume that the cost of a single store-lanes instruction is
+     equivalent to the cost of GROUP_SIZE separate stores.  If a strided
+     access is instead being provided by a permute-and-store operation,
+     include the cost of the permutes.  */
+  if (!store_lanes_p && group_size > 1)
     {
       /* Uses a high and low interleave operation for each needed permute.  */
       inside_cost = ncopies * exact_log2(group_size) * group_size
@@ -763,8 +841,8 @@ vect_get_store_cost (struct data_referen
    access scheme chosen.  */
 
 void
-vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, slp_tree slp_node)
-
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, bool load_lanes_p,
+		      slp_tree slp_node)
 {
   int group_size;
   gimple first_stmt;
@@ -789,9 +867,11 @@ vect_model_load_cost (stmt_vec_info stmt
       first_dr = dr;
     }
 
-  /* Is this an access in a group of loads providing strided access?
-     If so, add in the cost of the permutes.  */
-  if (group_size > 1)
+  /* We assume that the cost of a single load-lanes instruction is
+     equivalent to the cost of GROUP_SIZE separate loads.  If a strided
+     access is instead being provided by a load-and-permute operation,
+     include the cost of the permutes.  */
+  if (!load_lanes_p && group_size > 1)
     {
       /* Uses an even and odd extract operations for each needed permute.  */
       inside_cost = ncopies * exact_log2(group_size) * group_size
@@ -3324,6 +3404,7 @@ vectorizable_store (gimple stmt, gimple_
   int j;
   gimple next_stmt, first_stmt = NULL;
   bool strided_store = false;
+  bool store_lanes_p = false;
   unsigned int group_size, i;
   VEC(tree,heap) *dr_chain = NULL, *oprnds = NULL, *result_chain = NULL;
   bool inv_p;
@@ -3331,6 +3412,7 @@ vectorizable_store (gimple stmt, gimple_
   bool slp = (slp_node != NULL);
   unsigned int vec_num;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  tree aggr_type;
 
   if (loop_vinfo)
     loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -3415,7 +3497,9 @@ vectorizable_store (gimple stmt, gimple_
       if (!slp && !PURE_SLP_STMT (stmt_info))
 	{
 	  group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
-	  if (!vect_strided_store_supported (vectype, group_size))
+	  if (vect_store_lanes_supported (vectype, group_size))
+	    store_lanes_p = true;
+	  else if (!vect_strided_store_supported (vectype, group_size))
 	    return false;
 	}
 
@@ -3443,7 +3527,7 @@ vectorizable_store (gimple stmt, gimple_
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-      vect_model_store_cost (stmt_info, ncopies, dt, NULL);
+      vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt, NULL);
       return true;
     }
 
@@ -3498,6 +3582,16 @@ vectorizable_store (gimple stmt, gimple_
 
   alignment_support_scheme = vect_supportable_dr_alignment (first_dr, false);
   gcc_assert (alignment_support_scheme);
+  /* Targets with store-lane instructions must not require explicit
+     realignment.  */
+  gcc_assert (!store_lanes_p
+	      || alignment_support_scheme == dr_aligned
+	      || alignment_support_scheme == dr_unaligned_supported);
+
+  if (store_lanes_p)
+    aggr_type = build_simple_array_type (elem_type, vec_num * nunits);
+  else
+    aggr_type = vectype;
 
   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to generate
@@ -3586,7 +3680,7 @@ vectorizable_store (gimple stmt, gimple_
 	  /* We should have catched mismatched types earlier.  */
 	  gcc_assert (useless_type_conversion_p (vectype,
 						 TREE_TYPE (vec_oprnd)));
-	  dataref_ptr = vect_create_data_ref_ptr (first_stmt, vectype, NULL,
+	  dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, NULL,
 						  NULL_TREE, &dummy, gsi,
 						  &ptr_incr, false, &inv_p);
 	  gcc_assert (bb_vinfo || !inv_p);
@@ -3609,11 +3703,31 @@ vectorizable_store (gimple stmt, gimple_
 	      VEC_replace(tree, dr_chain, i, vec_oprnd);
 	      VEC_replace(tree, oprnds, i, vec_oprnd);
 	    }
-	  dataref_ptr =
-		bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
+	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+					 TYPE_SIZE_UNIT (aggr_type));
 	}
 
-      if (1)
+      if (store_lanes_p)
+	{
+	  tree vec_array;
+
+	  /* Combine all the vectors into an array.  */
+	  vec_array = create_vector_array (vectype, vec_num);
+	  for (i = 0; i < vec_num; i++)
+	    {
+	      vec_oprnd = VEC_index (tree, dr_chain, i);
+	      write_vector_array (stmt, gsi, vec_oprnd, vec_array, i);
+	    }
+
+	  /* Emit:
+	       MEM_REF[...all elements...] = STORE_LANES (VEC_ARRAY).  */
+	  data_ref = create_array_ref (aggr_type, dataref_ptr, first_dr);
+	  new_stmt = gimple_build_call_internal (IFN_STORE_LANES, 1, vec_array);
+	  gimple_call_set_lhs (new_stmt, data_ref);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  mark_symbols_for_renaming (new_stmt);
+	}
+      else
 	{
 	  new_stmt = NULL;
 	  if (strided_store)
@@ -3811,6 +3925,7 @@ vectorizable_load (gimple stmt, gimple_s
   gimple phi = NULL;
   VEC(tree,heap) *dr_chain = NULL;
   bool strided_load = false;
+  bool load_lanes_p = false;
   gimple first_stmt;
   tree scalar_type;
   bool inv_p;
@@ -3823,6 +3938,7 @@ vectorizable_load (gimple stmt, gimple_s
   enum tree_code code;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   int vf;
+  tree aggr_type;
 
   if (loop_vinfo)
     {
@@ -3918,7 +4034,9 @@ vectorizable_load (gimple stmt, gimple_s
       if (!slp && !PURE_SLP_STMT (stmt_info))
 	{
 	  group_size = DR_GROUP_SIZE (vinfo_for_stmt (first_stmt));
-	  if (!vect_strided_load_supported (vectype, group_size))
+	  if (vect_load_lanes_supported (vectype, group_size))
+	    load_lanes_p = true;
+	  else if (!vect_strided_load_supported (vectype, group_size))
 	    return false;
 	}
     }
@@ -3945,7 +4063,7 @@ vectorizable_load (gimple stmt, gimple_s
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
-      vect_model_load_cost (stmt_info, ncopies, NULL);
+      vect_model_load_cost (stmt_info, ncopies, load_lanes_p, NULL);
       return true;
     }
 
@@ -3986,6 +4104,11 @@ vectorizable_load (gimple stmt, gimple_s
 
   alignment_support_scheme = vect_supportable_dr_alignment (first_dr, false);
   gcc_assert (alignment_support_scheme);
+  /* Targets with load-lane instructions must not require explicit
+     realignment.  */
+  gcc_assert (!load_lanes_p
+	      || alignment_support_scheme == dr_aligned
+	      || alignment_support_scheme == dr_unaligned_supported);
 
   /* In case the vectorization factor (VF) is bigger than the number
      of elements that we can fit in a vectype (nunits), we have to generate
@@ -4117,22 +4240,52 @@ vectorizable_load (gimple stmt, gimple_s
   if (negative)
     offset = size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1);
 
+  if (load_lanes_p)
+    aggr_type = build_simple_array_type (elem_type, vec_num * nunits);
+  else
+    aggr_type = vectype;
+
   prev_stmt_info = NULL;
   for (j = 0; j < ncopies; j++)
     {
-      /* 1. Create the vector pointer update chain.  */
+      /* 1. Create the vector or array pointer update chain.  */
       if (j == 0)
-        dataref_ptr = vect_create_data_ref_ptr (first_stmt, vectype, at_loop,
+        dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, at_loop,
 						offset, &dummy, gsi,
 						&ptr_incr, false, &inv_p);
       else
-        dataref_ptr =
-		bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt, NULL_TREE);
+        dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+				       TYPE_SIZE_UNIT (aggr_type));
 
       if (strided_load || slp_perm)
 	dr_chain = VEC_alloc (tree, heap, vec_num);
 
-      if (1)
+      if (load_lanes_p)
+	{
+	  tree vec_array;
+
+	  vec_array = create_vector_array (vectype, vec_num);
+
+	  /* Emit:
+	       VEC_ARRAY = LOAD_LANES (MEM_REF[...all elements...]).  */
+	  data_ref = create_array_ref (aggr_type, dataref_ptr, first_dr);
+	  new_stmt = gimple_build_call_internal (IFN_LOAD_LANES, 1, data_ref);
+	  gimple_call_set_lhs (new_stmt, vec_array);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  mark_symbols_for_renaming (new_stmt);
+
+	  /* Extract each vector into an SSA_NAME.  */
+	  for (i = 0; i < vec_num; i++)
+	    {
+	      new_temp = read_vector_array (stmt, gsi, scalar_dest,
+					    vec_array, i);
+	      VEC_quick_push (tree, dr_chain, new_temp);
+	    }
+
+	  /* Record the mapping between SSA_NAMEs and statements.  */
+	  vect_record_strided_load_vectors (stmt, dr_chain);
+	}
+      else
 	{
 	  for (i = 0; i < vec_num; i++)
 	    {
@@ -4349,7 +4502,8 @@ vectorizable_load (gimple stmt, gimple_s
         {
           if (strided_load)
   	    {
-	      vect_transform_strided_load (stmt, dr_chain, group_size, gsi);
+	      if (!load_lanes_p)
+		vect_transform_strided_load (stmt, dr_chain, group_size, gsi);
 	      *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
 	    }
           else
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/tree-vect-slp.c	2011-04-12 14:48:28.000000000 +0100
@@ -215,7 +215,8 @@ vect_get_and_check_slp_defs (loop_vec_in
 	    vect_model_simple_cost (stmt_info, ncopies_for_cost, dt, slp_node);
 	  else
 	    /* Store.  */
-	    vect_model_store_cost (stmt_info, ncopies_for_cost, dt[0], slp_node);
+	    vect_model_store_cost (stmt_info, ncopies_for_cost, false,
+				   dt[0], slp_node);
 	}
 
       else
@@ -579,7 +580,7 @@ vect_build_slp_tree (loop_vec_info loop_
 
                   /* Analyze costs (for the first stmt in the group).  */
                   vect_model_load_cost (vinfo_for_stmt (stmt),
-                                        ncopies_for_cost, *node);
+                                        ncopies_for_cost, false, *node);
                 }
 
               /* Store the place of this load in the interleaving chain.  In


* [6/9] NEON vec_load_lanes and vec_store_lanes patterns
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (4 preceding siblings ...)
  2011-04-12 13:59 ` [5/9] Main target-independent support for direct interleaving Richard Sandiford
@ 2011-04-12 14:01 ` Richard Sandiford
  2011-04-15 13:20   ` Richard Earnshaw
  2011-04-12 14:14 ` [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide Richard Sandiford
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 14:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

This patch adds vec_load_lanes and vec_store_lanes patterns for NEON.
They feed directly into the corresponding intrinsic patterns.
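
As a rough illustration only (not part of the patch), the effect of a
two-vector load-lanes operation such as NEON's vld2 can be modelled in
scalar C.  NLANES, model_load_lanes2 and sum_pairs are hypothetical names
chosen for this sketch; the real expanders work on RTL, not on C arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements per modelled vector; an assumption for this sketch.  */
#define NLANES 4

/* Scalar model of a 2-way load-lanes operation: read 2 * NLANES
   consecutive elements and split them into an "even" vector and an
   "odd" vector, which is what vld2 does in a single instruction.  */
static void
model_load_lanes2 (const int *mem, int even[NLANES], int odd[NLANES])
{
  for (size_t lane = 0; lane < NLANES; lane++)
    {
      even[lane] = mem[2 * lane];
      odd[lane] = mem[2 * lane + 1];
    }
}

/* res[i] = a[2 * i] + a[2 * i + 1], computed NLANES results at a time.
   N must be a multiple of NLANES in this sketch.  */
static void
sum_pairs (int *res, const int *a, size_t n)
{
  assert (n % NLANES == 0);
  for (size_t i = 0; i < n; i += NLANES)
    {
      int even[NLANES], odd[NLANES];

      /* One modelled load replaces two vector loads plus permutes.  */
      model_load_lanes2 (a + 2 * i, even, odd);
      for (size_t lane = 0; lane < NLANES; lane++)
	res[i + lane] = even[lane] + odd[lane];
    }
}
```

With direct support, the de-interleaving happens in the load itself, so
no separate permute step is modelled between the load and the add.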

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/
	* config/arm/neon.md (vec_load_lanes<mode><mode>): New expanders.
	(vec_store_lanes<mode><mode>): Likewise.

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	2011-04-12 11:54:04.000000000 +0100
+++ gcc/config/arm/neon.md	2011-04-12 11:55:11.000000000 +0100
@@ -4258,6 +4258,12 @@ (define_expand "neon_vreinterpretv2di<mo
   DONE;
 })
 
+(define_expand "vec_load_lanes<mode><mode>"
+  [(set (match_operand:VDQX 0 "s_register_operand")
+        (unspec:VDQX [(match_operand:VDQX 1 "neon_struct_operand")]
+                     UNSPEC_VLD1))]
+  "TARGET_NEON")
+
 (define_insn "neon_vld1<mode>"
   [(set (match_operand:VDQX 0 "s_register_operand" "=w")
         (unspec:VDQX [(match_operand:VDQX 1 "neon_struct_operand" "Um")]
@@ -4355,6 +4361,12 @@ (define_insn "neon_vld1_dup<mode>"
                     (const_string "neon_vld1_1_2_regs")))]
 )
 
+(define_expand "vec_store_lanes<mode><mode>"
+  [(set (match_operand:VDQX 0 "neon_struct_operand")
+	(unspec:VDQX [(match_operand:VDQX 1 "s_register_operand")]
+		     UNSPEC_VST1))]
+  "TARGET_NEON")
+
 (define_insn "neon_vst1<mode>"
   [(set (match_operand:VDQX 0 "neon_struct_operand" "=Um")
 	(unspec:VDQX [(match_operand:VDQX 1 "s_register_operand" "w")]
@@ -4411,6 +4423,13 @@ (define_insn "neon_vst1_lane<mode>"
   [(set_attr "neon_type" "neon_vst1_vst2_lane")]
 )
 
+(define_expand "vec_load_lanesti<mode>"
+  [(set (match_operand:TI 0 "s_register_operand")
+        (unspec:TI [(match_operand:TI 1 "neon_struct_operand")
+                    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		   UNSPEC_VLD2))]
+  "TARGET_NEON")
+
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:TI 0 "s_register_operand" "=w")
         (unspec:TI [(match_operand:TI 1 "neon_struct_operand" "Um")
@@ -4429,6 +4448,13 @@ (define_insn "neon_vld2<mode>"
                     (const_string "neon_vld2_2_regs_vld1_vld2_all_lanes")))]
 )
 
+(define_expand "vec_load_lanesoi<mode>"
+  [(set (match_operand:OI 0 "s_register_operand")
+        (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
+                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		   UNSPEC_VLD2))]
+  "TARGET_NEON")
+
 (define_insn "neon_vld2<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
@@ -4511,6 +4537,13 @@ (define_insn "neon_vld2_dup<mode>"
                     (const_string "neon_vld1_1_2_regs")))]
 )
 
+(define_expand "vec_store_lanesti<mode>"
+  [(set (match_operand:TI 0 "neon_struct_operand")
+	(unspec:TI [(match_operand:TI 1 "s_register_operand")
+                    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                   UNSPEC_VST2))]
+  "TARGET_NEON")
+
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:TI 0 "neon_struct_operand" "=Um")
         (unspec:TI [(match_operand:TI 1 "s_register_operand" "w")
@@ -4529,6 +4562,13 @@ (define_insn "neon_vst2<mode>"
                     (const_string "neon_vst1_1_2_regs_vst2_2_regs")))]
 )
 
+(define_expand "vec_store_lanesoi<mode>"
+  [(set (match_operand:OI 0 "neon_struct_operand")
+	(unspec:OI [(match_operand:OI 1 "s_register_operand")
+                    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                   UNSPEC_VST2))]
+  "TARGET_NEON")
+
 (define_insn "neon_vst2<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
 	(unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
@@ -4594,6 +4634,13 @@ (define_insn "neon_vst2_lane<mode>"
   [(set_attr "neon_type" "neon_vst1_vst2_lane")]
 )
 
+(define_expand "vec_load_lanesei<mode>"
+  [(set (match_operand:EI 0 "s_register_operand")
+        (unspec:EI [(match_operand:EI 1 "neon_struct_operand")
+                    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		   UNSPEC_VLD3))]
+  "TARGET_NEON")
+
 (define_insn "neon_vld3<mode>"
   [(set (match_operand:EI 0 "s_register_operand" "=w")
         (unspec:EI [(match_operand:EI 1 "neon_struct_operand" "Um")
@@ -4612,6 +4659,16 @@ (define_insn "neon_vld3<mode>"
                     (const_string "neon_vld3_vld4")))]
 )
 
+(define_expand "vec_load_lanesci<mode>"
+  [(match_operand:CI 0 "s_register_operand")
+   (match_operand:CI 1 "neon_struct_operand")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_NEON"
+{
+  emit_insn (gen_neon_vld3<mode> (operands[0], operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vld3<mode>"
   [(match_operand:CI 0 "s_register_operand")
    (match_operand:CI 1 "neon_struct_operand")
@@ -4751,6 +4808,13 @@ (define_insn "neon_vld3_dup<mode>"
                     (const_string "neon_vld3_vld4_all_lanes")
                     (const_string "neon_vld1_1_2_regs")))])
 
+(define_expand "vec_store_lanesei<mode>"
+  [(set (match_operand:EI 0 "neon_struct_operand")
+	(unspec:EI [(match_operand:EI 1 "s_register_operand")
+                    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                   UNSPEC_VST3))]
+  "TARGET_NEON")
+
 (define_insn "neon_vst3<mode>"
   [(set (match_operand:EI 0 "neon_struct_operand" "=Um")
         (unspec:EI [(match_operand:EI 1 "s_register_operand" "w")
@@ -4768,6 +4832,16 @@ (define_insn "neon_vst3<mode>"
                     (const_string "neon_vst1_1_2_regs_vst2_2_regs")
                     (const_string "neon_vst2_4_regs_vst3_vst4")))])
 
+(define_expand "vec_store_lanesci<mode>"
+  [(match_operand:CI 0 "neon_struct_operand")
+   (match_operand:CI 1 "s_register_operand")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_NEON"
+{
+  emit_insn (gen_neon_vst3<mode> (operands[0], operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vst3<mode>"
   [(match_operand:CI 0 "neon_struct_operand")
    (match_operand:CI 1 "s_register_operand")
@@ -4879,6 +4953,13 @@ (define_insn "neon_vst3_lane<mode>"
 }
 [(set_attr "neon_type" "neon_vst3_vst4_lane")])
 
+(define_expand "vec_load_lanesoi<mode>"
+  [(set (match_operand:OI 0 "s_register_operand")
+        (unspec:OI [(match_operand:OI 1 "neon_struct_operand")
+                    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		   UNSPEC_VLD4))]
+  "TARGET_NEON")
+
 (define_insn "neon_vld4<mode>"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
         (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
@@ -4897,6 +4978,16 @@ (define_insn "neon_vld4<mode>"
                     (const_string "neon_vld3_vld4")))]
 )
 
+(define_expand "vec_load_lanesxi<mode>"
+  [(match_operand:XI 0 "s_register_operand")
+   (match_operand:XI 1 "neon_struct_operand")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_NEON"
+{
+  emit_insn (gen_neon_vld4<mode> (operands[0], operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vld4<mode>"
   [(match_operand:XI 0 "s_register_operand")
    (match_operand:XI 1 "neon_struct_operand")
@@ -5043,6 +5134,13 @@ (define_insn "neon_vld4_dup<mode>"
                     (const_string "neon_vld1_1_2_regs")))]
 )
 
+(define_expand "vec_store_lanesoi<mode>"
+  [(set (match_operand:OI 0 "neon_struct_operand")
+	(unspec:OI [(match_operand:OI 1 "s_register_operand")
+                    (unspec:VDX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+                   UNSPEC_VST4))]
+  "TARGET_NEON")
+
 (define_insn "neon_vst4<mode>"
   [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
         (unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
@@ -5061,6 +5159,16 @@ (define_insn "neon_vst4<mode>"
                     (const_string "neon_vst2_4_regs_vst3_vst4")))]
 )
 
+(define_expand "vec_store_lanesxi<mode>"
+  [(match_operand:XI 0 "neon_struct_operand")
+   (match_operand:XI 1 "s_register_operand")
+   (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_NEON"
+{
+  emit_insn (gen_neon_vst4<mode> (operands[0], operands[1]));
+  DONE;
+})
+
 (define_expand "neon_vst4<mode>"
   [(match_operand:XI 0 "neon_struct_operand")
    (match_operand:XI 1 "s_register_operand")


* [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (5 preceding siblings ...)
  2011-04-12 14:01 ` [6/9] NEON vec_load_lanes and vec_store_lanes patterns Richard Sandiford
@ 2011-04-12 14:14 ` Richard Sandiford
  2011-04-15 12:43   ` Richard Guenther
  2011-04-12 14:19 ` [8/9] Testsuite: split tests for strided accesses Richard Sandiford
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 14:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

We have separate vect_extract_even_odd and vect_extract_even_odd_wide
target selectors, and separate vect_strided and vect_strided_wide
selectors.  The comments suggest that "wide" means elements of 32+ bits,
but we often use the non-wide forms for 32-bit tests.  We also have
tests that combine 16-bit and 32-bit strided accesses without checking
for both widths.

I'm about to split vect_strided into vect_stridedN (for each stride
factor N).  One option was to preserve the wide distinction and have
vect_stridedN_wide as well.  However, given the current usage,
and given that the two selectors are the same, I think it makes sense
to combine them until we know what distinction we need to make.

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/testsuite/
	* lib/target-supports.exp
	(check_effective_target_vect_extract_even_odd_wide): Delete.
	(check_effective_target_vect_strided_wide): Likewise.
	* gcc.dg/vect/O3-pr39675-2.c: Use the non-wide versions instead.
	* gcc.dg/vect/fast-math-pr35982.c: Likewise.
	* gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
	* gcc.dg/vect/pr37539.c: Likewise.
	* gcc.dg/vect/slp-11.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-19.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/vect-1.c: Likewise.
	* gcc.dg/vect/vect-98.c: Likewise.
	* gcc.dg/vect/vect-107.c: Likewise.
	* gcc.dg/vect/vect-strided-float.c: Likewise.

Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/lib/target-supports.exp	2011-04-12 11:55:11.000000000 +0100
@@ -3121,29 +3121,6 @@ proc check_effective_target_vect_extract
     return $et_vect_extract_even_odd_saved
 }
 
-# Return 1 if the target supports vector even/odd elements extraction of
-# vectors with SImode elements or larger, 0 otherwise.
-
-proc check_effective_target_vect_extract_even_odd_wide { } {
-    global et_vect_extract_even_odd_wide_saved
-    
-    if [info exists et_vect_extract_even_odd_wide_saved] {
-        verbose "check_effective_target_vect_extract_even_odd_wide: using cached result" 2
-    } else {
-        set et_vect_extract_even_odd_wide_saved 0 
-        if { [istarget powerpc*-*-*] 
-             || [istarget i?86-*-*]
-             || [istarget x86_64-*-*]
-             || [istarget ia64-*-*]
-             || [istarget spu-*-*] } {
-           set et_vect_extract_even_odd_wide_saved 1
-        }
-    }
-
-    verbose "check_effective_target_vect_extract_even_wide_odd: returning $et_vect_extract_even_odd_wide_saved" 2
-    return $et_vect_extract_even_odd_wide_saved
-}
-
 # Return 1 if the target supports vector interleaving, 0 otherwise.
 
 proc check_effective_target_vect_interleave { } {
@@ -3184,25 +3161,6 @@ proc check_effective_target_vect_strided
     return $et_vect_strided_saved
 }
 
-# Return 1 if the target supports vector interleaving and extract even/odd
-# for wide element types, 0 otherwise.
-proc check_effective_target_vect_strided_wide { } {
-    global et_vect_strided_wide_saved
-
-    if [info exists et_vect_strided_wide_saved] {
-        verbose "check_effective_target_vect_strided_wide: using cached result" 2
-    } else {
-        set et_vect_strided_wide_saved 0
-        if { [check_effective_target_vect_interleave]
-             && [check_effective_target_vect_extract_even_odd_wide] } {
-           set et_vect_strided_wide_saved 1
-        }
-    }
-
-    verbose "check_effective_target_vect_strided_wide: returning $et_vect_strided_wide_saved" 2
-    return $et_vect_strided_wide_saved
-}
-
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {
Index: gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c	2011-04-12 11:55:11.000000000 +0100
@@ -26,7 +26,7 @@ foo ()
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided_wide } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided_wide } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c	2011-04-12 11:55:11.000000000 +0100
@@ -20,7 +20,7 @@ float method2_int16 (struct mem *mem)
   return avg;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd_wide  } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd_wide  } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd  } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd  } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c	2011-04-12 11:55:11.000000000 +0100
@@ -56,5 +56,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave  && vect_extract_even_odd_wide } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave  && vect_extract_even_odd } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr37539.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr37539.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/pr37539.c	2011-04-12 11:55:11.000000000 +0100
@@ -40,7 +40,7 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_strided_wide } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_strided } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
Index: gcc/testsuite/gcc.dg/vect/slp-11.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-11.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-11.c	2011-04-12 11:55:11.000000000 +0100
@@ -105,9 +105,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { { vect_uintfloat_cvt && vect_strided_wide } &&  vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { { { ! vect_uintfloat_cvt } && vect_strided_wide } &&  vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided_wide } } } } }  */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { { vect_uintfloat_cvt && vect_strided } &&  vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { { { ! vect_uintfloat_cvt } && vect_strided } &&  vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided } } } } }  */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12a.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12a.c	2011-04-12 11:55:11.000000000 +0100
@@ -94,11 +94,11 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided_wide && vect_int_mult} } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided_wide}} && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided && vect_int_mult} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided}} && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided_wide && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided_wide}} && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target  { ! vect_int_mult } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-12b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12b.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12b.c	2011-04-12 11:55:11.000000000 +0100
@@ -43,9 +43,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided_wide && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided_wide}}} } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided_wide && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided_wide}}} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-19.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-19.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-19.c	2011-04-12 11:55:11.000000000 +0100
@@ -146,9 +146,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided_wide  } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided_wide } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided_wide  } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided_wide } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided  } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided  } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-23.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-23.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-23.c	2011-04-12 11:55:11.000000000 +0100
@@ -106,8 +106,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided_wide } && {! { vect_no_align} } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided_wide || vect_no_align} } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided } && {! { vect_no_align} } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided || vect_no_align} } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-1.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-1.c	2011-04-12 11:55:11.000000000 +0100
@@ -85,6 +85,6 @@ foo (int n)
   fbar (a);
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_extract_even_odd_wide } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_extract_even_odd_wide } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_extract_even_odd } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-98.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-98.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-98.c	2011-04-12 11:55:11.000000000 +0100
@@ -38,6 +38,6 @@ int main (void)
 }
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd_wide } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd_wide } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-107.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-107.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-107.c	2011-04-12 11:55:11.000000000 +0100
@@ -40,6 +40,6 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd_wide } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd_wide } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-float.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-float.c	2011-04-12 11:53:54.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-float.c	2011-04-12 11:55:11.000000000 +0100
@@ -39,7 +39,7 @@ int main (void)
 }
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd_wide } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd_wide } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   


* [8/9] Testsuite: split tests for strided accesses
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (6 preceding siblings ...)
  2011-04-12 14:14 ` [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide Richard Sandiford
@ 2011-04-12 14:19 ` Richard Sandiford
  2011-04-15 12:44   ` Richard Guenther
  2011-04-12 14:29 ` [9/9] Testsuite: Replace vect_strided with vect_stridedN Richard Sandiford
  2011-04-12 14:34 ` [10/9] Add tests for stride-3 accesses Richard Sandiford
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 14:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

The next patch introduces separate vect_stridedN target selectors
for each tested stride factor N.  At the moment, some tests contain
several independent loops that have different stride factors.
It's easier to make the next change if we put these loops into
separate tests.
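
A minimal sketch of the idea, assuming loops modelled on the second and
third loops of slp-11.c: each stride factor gets its own function here,
and after the split each such loop would sit in its own test file so
that a selector for that one stride factor can guard it.  The function
names are hypothetical and the sketch uses assert rather than the
testsuite's own harness.

```c
#include <assert.h>

#define N 8

/* Stride-4 loop: four accesses per iteration, with a permutation
   between the second and third elements.  */
static void
stride4_loop (unsigned int *out, const unsigned int *in)
{
  for (int i = 0; i < N * 2; i++)
    {
      out[i*4] = (in[i*4] + 2) * 3;
      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
    }
}

/* Stride-2 loop: two accesses per iteration, with different
   operations on each element.  */
static void
stride2_loop (float *out2, const unsigned int *in)
{
  for (int i = 0; i < N * 4; i++)
    {
      out2[i*2] = ((float) in[i*2] * 2 + 6);
      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
    }
}

/* Check both loops against the scalar expressions, in the same way
   that the original test checks its results.  */
static void
check_loops (void)
{
  unsigned int in[N*8], out[N*8];
  float out2[N*8];

  for (int i = 0; i < N * 8; i++)
    in[i] = i;

  stride4_loop (out, in);
  for (int i = 0; i < N * 2; i++)
    assert (out[i*4] == (in[i*4] + 2) * 3
	    && out[i*4 + 1] == (in[i*4 + 2] + 2) * 7
	    && out[i*4 + 2] == (in[i*4 + 1] + 7) * 3
	    && out[i*4 + 3] == (in[i*4 + 3] + 3) * 4);

  stride2_loop (out2, in);
  for (int i = 0; i < N * 4; i++)
    assert (out2[i*2] == ((float) in[i*2] * 2 + 6)
	    && out2[i*2 + 1] == (float) (in[i*2 + 1] * 3 + 7));
}
```

Keeping one stride factor per file means that a target which supports,
say, stride 2 but not stride 4 can still run and pass the stride-2 test.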

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/testsuite/
	* gcc.dg/vect/slp-11.c: Split into...
	* gcc.dg/vect/slp-11a.c, gcc.dg/vect/slp-11b.c,
	gcc.dg/vect/slp-11c.c: ...these tests.
	* gcc.dg/vect/slp-12a.c: Split 4-stride loop into...
	* gcc.dg/vect/slp-12c.c: ...this new test.
	* gcc.dg/vect/slp-19.c: Split into...
	* gcc.dg/vect/slp-19a.c, gcc.dg/vect/slp-19b.c,
	gcc.dg/vect/slp-19c.c: ...these new tests.

Index: gcc/testsuite/gcc.dg/vect/slp-11.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-11.c	2011-04-12 15:18:24.000000000 +0100
+++ /dev/null	2011-03-23 08:42:11.268792848 +0000
@@ -1,113 +0,0 @@
-/* { dg-require-effective-target vect_int } */
-
-#include <stdarg.h>
-#include "tree-vect.h"
-
-#define N 8 
-
-int
-main1 ()
-{
-  int i;
-  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
-  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
-  float out2[N*8];
-
-  /* Different operations - not SLPable.  */
-  for (i = 0; i < N; i++)
-    {
-      a0 = in[i*8] + 5;
-      a1 = in[i*8 + 1] * 6;
-      a2 = in[i*8 + 2] + 7;
-      a3 = in[i*8 + 3] + 8;
-      a4 = in[i*8 + 4] + 9;
-      a5 = in[i*8 + 5] + 10;
-      a6 = in[i*8 + 6] + 11;
-      a7 = in[i*8 + 7] + 12;
-
-      b0 = a0 * 3;
-      b1 = a1 * 2;
-      b2 = a2 * 12;
-      b3 = a3 * 5;
-      b4 = a4 * 8;
-      b5 = a5 * 4;
-      b6 = a6 * 3;
-      b7 = a7 * 2;
-
-      out[i*8] = b0 - 2;
-      out[i*8 + 1] = b1 - 3; 
-      out[i*8 + 2] = b2 - 2;
-      out[i*8 + 3] = b3 - 1;
-      out[i*8 + 4] = b4 - 8;
-      out[i*8 + 5] = b5 - 7;
-      out[i*8 + 6] = b6 - 3;
-      out[i*8 + 7] = b7 - 7;
-    }
-
-  /* check results:  */
-  for (i = 0; i < N; i++)
-    {
-      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
-         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
-         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
-         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
-         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
-         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
-         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
-         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
-	abort ();
-    }
-
-  /* Requires permutation - not SLPable.  */
-  for (i = 0; i < N*2; i++)
-    {
-      out[i*4] = (in[i*4] + 2) * 3;
-      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
-      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
-      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
-    }
-
-  /* check results:  */
-  for (i = 0; i < N*2; i++)
-    {
-      if (out[i*4] !=  (in[i*4] + 2) * 3
-         || out[i*4 + 1] != (in[i*4 + 2] + 2) * 7
-         || out[i*4 + 2] != (in[i*4 + 1] + 7) * 3
-         || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4)
-        abort ();
-    }
-
-  /* Different operations - not SLPable.  */
-  for (i = 0; i < N*4; i++)
-    {
-      out2[i*2] = ((float) in[i*2] * 2 + 6) ;
-      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
-    }
-
-  /* check results:  */
-  for (i = 0; i < N*4; i++)
-    {
-      if (out2[i*2] !=  ((float) in[i*2] * 2 + 6)
-         || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
-        abort ();
-    }
-
-
-  return 0;
-}
-
-int main (void)
-{
-  check_vect ();
-
-  main1 ();
-
-  return 0;
-}
-
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { { vect_uintfloat_cvt && vect_strided } &&  vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { { { ! vect_uintfloat_cvt } && vect_strided } &&  vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided } } } } }  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
-/* { dg-final { cleanup-tree-dump "vect" } } */
-  
Index: gcc/testsuite/gcc.dg/vect/slp-11a.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-11a.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 8
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+
+  /* Different operations - not SLPable.  */
+  for (i = 0; i < N; i++)
+    {
+      a0 = in[i*8] + 5;
+      a1 = in[i*8 + 1] * 6;
+      a2 = in[i*8 + 2] + 7;
+      a3 = in[i*8 + 3] + 8;
+      a4 = in[i*8 + 4] + 9;
+      a5 = in[i*8 + 5] + 10;
+      a6 = in[i*8 + 6] + 11;
+      a7 = in[i*8 + 7] + 12;
+
+      b0 = a0 * 3;
+      b1 = a1 * 2;
+      b2 = a2 * 12;
+      b3 = a3 * 5;
+      b4 = a4 * 8;
+      b5 = a5 * 4;
+      b6 = a6 * 3;
+      b7 = a7 * 2;
+
+      out[i*8] = b0 - 2;
+      out[i*8 + 1] = b1 - 3;
+      out[i*8 + 2] = b2 - 2;
+      out[i*8 + 3] = b3 - 1;
+      out[i*8 + 4] = b4 - 8;
+      out[i*8 + 5] = b5 - 7;
+      out[i*8 + 6] = b6 - 3;
+      out[i*8 + 7] = b7 - 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
+         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
+         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
+         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
+         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
+         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
+         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
+         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
+	abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-11b.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-11b.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 8
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+
+  /* Requires permutation - not SLPable.  */
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = (in[i*4] + 2) * 3;
+      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
+      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
+      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  (in[i*4] + 2) * 3
+         || out[i*4 + 1] != (in[i*4 + 2] + 2) * 7
+         || out[i*4 + 2] != (in[i*4 + 1] + 7) * 3
+         || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-11c.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-11c.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,46 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 8
+
+int
+main1 ()
+{
+  int i;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  float out[N*8];
+
+  /* Different operations - not SLPable.  */
+  for (i = 0; i < N*4; i++)
+    {
+      out[i*2] = ((float) in[i*2] * 2 + 6) ;
+      out[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*4; i++)
+    {
+      if (out[i*2] !=  ((float) in[i*2] * 2 + 6)
+         || out[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
+        abort ();
+    }
+
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12a.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12a.c	2011-04-12 15:18:25.000000000 +0100
@@ -11,7 +11,7 @@ main1 ()
   int i;
   unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
   unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
-  unsigned int ia[N], ib[N*2];
+  unsigned int ia[N];
 
   for (i = 0; i < N; i++)
     {
@@ -61,27 +61,6 @@ main1 ()
 	abort ();
     }
 
-  for (i = 0; i < N*2; i++)
-    {
-      out[i*4] = (in[i*4] + 2) * 3;
-      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
-      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
-      out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
-
-      ib[i] = 7;
-    }
-
-  /* check results:  */
-  for (i = 0; i < N*2; i++)
-    {
-      if (out[i*4] !=  (in[i*4] + 2) * 3
-         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
-         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
-         || out[i*4 + 3] != (in[i*4 + 3] + 7) * 7 
-         || ib[i] != 7)
-        abort ();
-    }
-
   return 0;
 }
 
@@ -94,11 +73,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided && vect_int_mult} } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided}} && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target  { ! vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
-  
Index: gcc/testsuite/gcc.dg/vect/slp-12c.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-12c.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,53 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 8
+
+int
+main1 ()
+{
+  int i;
+  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N*2];
+
+  for (i = 0; i < N*2; i++)
+    {
+      out[i*4] = (in[i*4] + 2) * 3;
+      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
+      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
+      out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
+
+      ia[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  (in[i*4] + 2) * 3
+         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
+         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
+         || out[i*4 + 3] != (in[i*4 + 3] + 7) * 7
+         || ia[i] != 7)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target { ! vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_int_mult } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-19.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-19.c	2011-04-12 15:18:24.000000000 +0100
+++ /dev/null	2011-03-23 08:42:11.268792848 +0000
@@ -1,154 +0,0 @@
-/* { dg-require-effective-target vect_int } */
-
-#include <stdarg.h>
-#include "tree-vect.h"
-
-#define N 16 
-
-int
-main1 ()
-{
-  unsigned int i;
-  unsigned int out[N*8];
-  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
-  unsigned int ia[N*2], a0, a1, a2, a3;
-
-  for (i = 0; i < N; i++)
-    {
-      out[i*8] = in[i*8];
-      out[i*8 + 1] = in[i*8 + 1];
-      out[i*8 + 2] = in[i*8 + 2];
-      out[i*8 + 3] = in[i*8 + 3];
-      out[i*8 + 4] = in[i*8 + 4];
-      out[i*8 + 5] = in[i*8 + 5];
-      out[i*8 + 6] = in[i*8 + 6];
-      out[i*8 + 7] = in[i*8 + 7];
-    
-      ia[i] = in[i*8 + 2];
-    }
-
-  /* check results:  */
-  for (i = 0; i < N; i++)
-    {
-      if (out[i*8] !=  in[i*8]
-         || out[i*8 + 1] != in[i*8 + 1]
-         || out[i*8 + 2] != in[i*8 + 2]
-         || out[i*8 + 3] != in[i*8 + 3]
-         || out[i*8 + 4] != in[i*8 + 4]
-         || out[i*8 + 5] != in[i*8 + 5]
-         || out[i*8 + 6] != in[i*8 + 6]
-         || out[i*8 + 7] != in[i*8 + 7]
-         || ia[i] != in[i*8 + 2])
-	abort ();
-    }
-
-  for (i = 0; i < N*2; i++)
-    {
-      a0 = in[i*4] + 1;
-      a1 = in[i*4 + 1] + 2;
-      a2 = in[i*4 + 2] + 3;
-      a3 = in[i*4 + 3] + 4;
-
-      out[i*4] = a0;
-      out[i*4 + 1] = a1;
-      out[i*4 + 2] = a2;
-      out[i*4 + 3] = a3;
-
-      ia[i] = a2;
-    }
-
-  /* check results:  */
-  for (i = 0; i < N*2; i++)
-    {
-      if (out[i*4] !=  in[i*4] + 1
-         || out[i*4 + 1] != in[i*4 + 1] + 2
-         || out[i*4 + 2] != in[i*4 + 2] + 3
-         || out[i*4 + 3] != in[i*4 + 3] + 4
-         || ia[i] != in[i*4 + 2] + 3)
-        abort ();
-    }
-
-  /* The last stmt requires interleaving of not power of 2 size - not 
-     vectorizable.  */
-  for (i = 0; i < N/2; i++)
-    {
-      out[i*12] = in[i*12];
-      out[i*12 + 1] = in[i*12 + 1];
-      out[i*12 + 2] = in[i*12 + 2];
-      out[i*12 + 3] = in[i*12 + 3];
-      out[i*12 + 4] = in[i*12 + 4];
-      out[i*12 + 5] = in[i*12 + 5];
-      out[i*12 + 6] = in[i*12 + 6];
-      out[i*12 + 7] = in[i*12 + 7];
-      out[i*12 + 8] = in[i*12 + 8];
-      out[i*12 + 9] = in[i*12 + 9];
-      out[i*12 + 10] = in[i*12 + 10];
-      out[i*12 + 11] = in[i*12 + 11];
-
-      ia[i] = in[i*12 + 7];
-    }
-
-  /* check results:  */
-  for (i = 0; i < N/2; i++)
-    {
-      if (out[i*12] !=  in[i*12]
-         || out[i*12 + 1] != in[i*12 + 1]
-         || out[i*12 + 2] != in[i*12 + 2]
-         || out[i*12 + 3] != in[i*12 + 3]
-         || out[i*12 + 4] != in[i*12 + 4]
-         || out[i*12 + 5] != in[i*12 + 5]
-         || out[i*12 + 6] != in[i*12 + 6]
-         || out[i*12 + 7] != in[i*12 + 7]
-         || out[i*12 + 8] != in[i*12 + 8]
-         || out[i*12 + 9] != in[i*12 + 9]
-         || out[i*12 + 10] != in[i*12 + 10]
-         || out[i*12 + 11] != in[i*12 + 11]
-         || ia[i] != in[i*12 + 7])
-        abort ();
-    }
-
-  /* Hybrid SLP with unrolling by 2.  */
-  for (i = 0; i < N; i++)
-    {
-      out[i*6] = in[i*6];
-      out[i*6 + 1] = in[i*6 + 1];
-      out[i*6 + 2] = in[i*6 + 2];
-      out[i*6 + 3] = in[i*6 + 3];
-      out[i*6 + 4] = in[i*6 + 4];
-      out[i*6 + 5] = in[i*6 + 5];
-    
-      ia[i] = i;
-    } 
-    
-  /* check results:  */
-  for (i = 0; i < N/2; i++)
-    {
-      if (out[i*6] !=  in[i*6]
-         || out[i*6 + 1] != in[i*6 + 1]
-         || out[i*6 + 2] != in[i*6 + 2]
-         || out[i*6 + 3] != in[i*6 + 3]
-         || out[i*6 + 4] != in[i*6 + 4]
-         || out[i*6 + 5] != in[i*6 + 5]
-         || ia[i] != i)
-        abort ();
-    }
-
-
-  return 0;
-}
-
-int main (void)
-{
-  check_vect ();
-
-  main1 ();
-
-  return 0;
-}
-
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided  } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided  } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided } } } } } */
-/* { dg-final { cleanup-tree-dump "vect" } } */
-  
Index: gcc/testsuite/gcc.dg/vect/slp-19a.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-19a.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,61 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+
+int
+main1 ()
+{
+  unsigned int i;
+  unsigned int out[N*8];
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N*2];
+
+  for (i = 0; i < N; i++)
+    {
+      out[i*8] = in[i*8];
+      out[i*8 + 1] = in[i*8 + 1];
+      out[i*8 + 2] = in[i*8 + 2];
+      out[i*8 + 3] = in[i*8 + 3];
+      out[i*8 + 4] = in[i*8 + 4];
+      out[i*8 + 5] = in[i*8 + 5];
+      out[i*8 + 6] = in[i*8 + 6];
+      out[i*8 + 7] = in[i*8 + 7];
+
+      ia[i] = in[i*8 + 2];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (out[i*8] !=  in[i*8]
+         || out[i*8 + 1] != in[i*8 + 1]
+         || out[i*8 + 2] != in[i*8 + 2]
+         || out[i*8 + 3] != in[i*8 + 3]
+         || out[i*8 + 4] != in[i*8 + 4]
+         || out[i*8 + 5] != in[i*8 + 5]
+         || out[i*8 + 6] != in[i*8 + 6]
+         || out[i*8 + 7] != in[i*8 + 7]
+         || ia[i] != in[i*8 + 2])
+	abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-19b.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-19b.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,58 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+
+int
+main1 ()
+{
+  unsigned int i;
+  unsigned int out[N*8];
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N*2], a0, a1, a2, a3;
+
+  for (i = 0; i < N*2; i++)
+    {
+      a0 = in[i*4] + 1;
+      a1 = in[i*4 + 1] + 2;
+      a2 = in[i*4 + 2] + 3;
+      a3 = in[i*4 + 3] + 4;
+
+      out[i*4] = a0;
+      out[i*4 + 1] = a1;
+      out[i*4 + 2] = a2;
+      out[i*4 + 3] = a3;
+
+      ia[i] = a2;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N*2; i++)
+    {
+      if (out[i*4] !=  in[i*4] + 1
+         || out[i*4 + 1] != in[i*4 + 1] + 2
+         || out[i*4 + 2] != in[i*4 + 2] + 3
+         || out[i*4 + 3] != in[i*4 + 3] + 4
+         || ia[i] != in[i*4 + 2] + 3)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-19c.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-19c.c	2011-04-12 15:18:25.000000000 +0100
@@ -0,0 +1,95 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+
+int
+main1 ()
+{
+  unsigned int i;
+  unsigned int out[N*8];
+  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
+  unsigned int ia[N*2], a0, a1, a2, a3;
+
+  /* The last stmt requires interleaving of not power of 2 size - not
+     vectorizable.  */
+  for (i = 0; i < N/2; i++)
+    {
+      out[i*12] = in[i*12];
+      out[i*12 + 1] = in[i*12 + 1];
+      out[i*12 + 2] = in[i*12 + 2];
+      out[i*12 + 3] = in[i*12 + 3];
+      out[i*12 + 4] = in[i*12 + 4];
+      out[i*12 + 5] = in[i*12 + 5];
+      out[i*12 + 6] = in[i*12 + 6];
+      out[i*12 + 7] = in[i*12 + 7];
+      out[i*12 + 8] = in[i*12 + 8];
+      out[i*12 + 9] = in[i*12 + 9];
+      out[i*12 + 10] = in[i*12 + 10];
+      out[i*12 + 11] = in[i*12 + 11];
+
+      ia[i] = in[i*12 + 7];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*12] !=  in[i*12]
+         || out[i*12 + 1] != in[i*12 + 1]
+         || out[i*12 + 2] != in[i*12 + 2]
+         || out[i*12 + 3] != in[i*12 + 3]
+         || out[i*12 + 4] != in[i*12 + 4]
+         || out[i*12 + 5] != in[i*12 + 5]
+         || out[i*12 + 6] != in[i*12 + 6]
+         || out[i*12 + 7] != in[i*12 + 7]
+         || out[i*12 + 8] != in[i*12 + 8]
+         || out[i*12 + 9] != in[i*12 + 9]
+         || out[i*12 + 10] != in[i*12 + 10]
+         || out[i*12 + 11] != in[i*12 + 11]
+         || ia[i] != in[i*12 + 7])
+        abort ();
+    }
+
+  /* Hybrid SLP with unrolling by 2.  */
+  for (i = 0; i < N; i++)
+    {
+      out[i*6] = in[i*6];
+      out[i*6 + 1] = in[i*6 + 1];
+      out[i*6 + 2] = in[i*6 + 2];
+      out[i*6 + 3] = in[i*6 + 3];
+      out[i*6 + 4] = in[i*6 + 4];
+      out[i*6 + 5] = in[i*6 + 5];
+
+      ia[i] = i;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N/2; i++)
+    {
+      if (out[i*6] !=  in[i*6]
+         || out[i*6 + 1] != in[i*6 + 1]
+         || out[i*6 + 2] != in[i*6 + 2]
+         || out[i*6 + 3] != in[i*6 + 3]
+         || out[i*6 + 4] != in[i*6 + 4]
+         || out[i*6 + 5] != in[i*6 + 5]
+         || ia[i] != i)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */


* [9/9] Testsuite: Replace vect_strided with vect_stridedN
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (7 preceding siblings ...)
  2011-04-12 14:19 ` [8/9] Testsuite: split tests for strided accesses Richard Sandiford
@ 2011-04-12 14:29 ` Richard Sandiford
  2011-04-15 12:44   ` Richard Guenther
  2011-04-12 14:34 ` [10/9] Add tests for stride-3 accesses Richard Sandiford
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 14:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

This patch replaces the general vect_strided target selector with
a group of vect_stridedN selectors, one for each tested stride factor N.
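
As an illustration of what "stride factor N" refers to, here is a minimal
sketch.  It is not part of the patch: the function names are hypothetical,
and the loop bodies are modelled on the cover-letter example and on the
loop in slp-19b.c.

```c
/* Stride 2: each iteration reads a group of two adjacent elements,
   so a vectorised version has to de-interleave two input vectors.  */
void
sum_pairs (unsigned int *res, const unsigned int *a, int n)
{
  int i;

  for (i = 0; i < n; i++)
    res[i] = a[2 * i] + a[2 * i + 1];
}

/* Stride 4: each iteration reads and writes a group of four adjacent
   elements, applying a different constant to each member of the group.  */
void
add_offsets_stride4 (unsigned int *out, const unsigned int *in, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      out[4 * i] = in[4 * i] + 1;
      out[4 * i + 1] = in[4 * i + 1] + 2;
      out[4 * i + 2] = in[4 * i + 2] + 3;
      out[4 * i + 3] = in[4 * i + 3] + 4;
    }
}
```

Under the new selectors, a test built around the first loop would
presumably be guarded by vect_strided2 and one built around the second
by vect_strided4.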

Also, some tests used vect_interleave && vect_extract_even_odd for
strided accesses.  That combination used to be equivalent to vect_strided,
but it is not equivalent to vect_stridedN on ARM after this series, so
I've replaced it with vect_stridedN.
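
The selector logic in the target-supports.exp hunk below can be modelled
in C as follows.  This is only a sketch for reasoning about the
conditions: struct target_caps, its fields and vect_strided_n are
hypothetical names that exist neither in GCC nor in DejaGnu.

```c
#include <stdbool.h>

/* Hypothetical description of a target.  */
struct target_caps
{
  bool is_arm;            /* models [istarget arm*-*-*] */
  bool interleave;        /* models vect_interleave */
  bool extract_even_odd;  /* models vect_extract_even_odd */
};

/* Model of vect_stridedN for stride factor N, mirroring the two tests
   in the Tcl procedure.  The patch only instantiates N as 2, 3, 4
   or 8.  */
bool
vect_strided_n (const struct target_caps *t, unsigned int n)
{
  /* (n & -n) == n holds when N is a power of two.  */
  if ((n & -n) == n && t->interleave && t->extract_even_odd)
    return true;

  /* ARM-specific case: strides 2 to 4 are accepted regardless of the
     two permute selectors.  */
  if (t->is_arm && n >= 2 && n <= 4)
    return true;

  return false;
}
```

In this model a target matching arm*-*-* reports stride 3 as supported
even when both permute selectors are false, which is the case that
vect_interleave && vect_extract_even_odd cannot express.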

Some tests used vect_interleave for loops that could be vectorised
even without extract-even/odd support.  vect_interleave used to be
a looser condition than vect_strided, but it is not looser than
vect_stridedN after this series, so I've used
vect_interleave || vect_stridedN where necessary.
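
A sketch of the kind of loop meant here, assuming that a strided store
with contiguous loads needs only the interleave permutation and no
extract-even/odd.  The function name and the directive are illustrative
and are not copied from any test in the series.

```c
/* Stride-2 store fed by two contiguous loads: the vectoriser has to
   interleave the A and B vectors, but never has to split a loaded
   vector into its even and odd elements.  */
void
interleave_store (unsigned int *out, const unsigned int *a,
		  const unsigned int *b, int n)
{
  int i;

  for (i = 0; i < n; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
```

The combined condition accepts both targets that permute after ordinary
loads and stores and targets that provide the stride-2 access directly.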

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_strided):
	Replace with...
	(check_effective_target_vect_strided2)
	(check_effective_target_vect_strided3)
	(check_effective_target_vect_strided4)
	(check_effective_target_vect_strided8): ...these new functions.

	* gcc.dg/vect/O3-pr39675-2.c: Update accordingly.
	* gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Likewise.
	* gcc.dg/vect/fast-math-slp-27.c: Likewise.
	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Likewise.
	* gcc.dg/vect/pr37539.c: Likewise.
	* gcc.dg/vect/slp-11a.c: Likewise.
	* gcc.dg/vect/slp-11b.c: Likewise.
	* gcc.dg/vect/slp-11c.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-12b.c: Likewise.
	* gcc.dg/vect/slp-18.c: Likewise.
	* gcc.dg/vect/slp-19a.c: Likewise.
	* gcc.dg/vect/slp-19b.c: Likewise.
	* gcc.dg/vect/slp-21.c: Likewise.
	* gcc.dg/vect/slp-23.c: Likewise.
	* gcc.dg/vect/vect-cselim-1.c: Likewise.

	* gcc.dg/vect/fast-math-vect-complex-3.c: Use vect_stridedN
	instead of vect_interleave && vect_extract_even_odd.
	* gcc.dg/vect/no-scevccp-outer-10a.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-10b.c: Likewise.
	* gcc.dg/vect/no-scevccp-outer-20.c: Likewise.
	* gcc.dg/vect/vect-1.c: Likewise.
	* gcc.dg/vect/vect-10.c: Likewise.
	* gcc.dg/vect/vect-98.c: Likewise.
	* gcc.dg/vect/vect-107.c: Likewise.
	* gcc.dg/vect/vect-strided-a-mult.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u16-i2.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u16-i4.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u16-mult.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u32-mult.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Likewise.
	* gcc.dg/vect/vect-strided-a-u8-i8-gap7.c: Likewise.
	* gcc.dg/vect/vect-strided-float.c: Likewise.
	* gcc.dg/vect/vect-strided-mult-char-ls.c: Likewise.
	* gcc.dg/vect/vect-strided-mult.c: Likewise.
	* gcc.dg/vect/vect-strided-same-dr.c: Likewise.
	* gcc.dg/vect/vect-strided-u16-i2.c: Likewise.
	* gcc.dg/vect/vect-strided-u16-i4.c: Likewise.
	* gcc.dg/vect/vect-strided-u32-i4.c: Likewise.
	* gcc.dg/vect/vect-strided-u32-i8.c: Likewise.
	* gcc.dg/vect/vect-strided-u32-mult.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i2.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i8-gap2.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i8-gap4.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i8-gap7.c: Likewise.
	* gcc.dg/vect/vect-strided-u8-i8.c: Likewise.
	* gcc.dg/vect/vect-vfa-03.c: Likewise.

	* gcc.dg/vect/no-scevccp-outer-18.c: Add vect_stridedN to the
	target condition.
	* gcc.dg/vect/pr30843.c: Likewise.
	* gcc.dg/vect/pr33866.c: Likewise.
	* gcc.dg/vect/slp-reduc-6.c: Likewise.
	* gcc.dg/vect/vect-strided-store-a-u8-i2.c: Likewise.
	* gcc.dg/vect/vect-strided-store-u16-i4.c: Likewise.
	* gcc.dg/vect/vect-strided-store-u32-i2.c: Likewise.

Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/lib/target-supports.exp	2011-04-12 15:22:06.000000000 +0100
@@ -3143,22 +3143,30 @@ proc check_effective_target_vect_interle
     return $et_vect_interleave_saved
 }
 
-# Return 1 if the target supports vector interleaving and extract even/odd, 0 otherwise.
-proc check_effective_target_vect_strided { } {
-    global et_vect_strided_saved
+foreach N {2 3 4 8} {
+    eval [string map [list N $N] {
+	# Return 1 if the target supports N-vector interleaving
+	proc check_effective_target_vect_stridedN { } {
+	    global et_vect_stridedN_saved
+
+	    if [info exists et_vect_stridedN_saved] {
+		verbose "check_effective_target_vect_stridedN: using cached result" 2
+	    } else {
+		set et_vect_stridedN_saved 0
+		if { (N & -N) == N
+		     && [check_effective_target_vect_interleave]
+		     && [check_effective_target_vect_extract_even_odd] } {
+		    set et_vect_stridedN_saved 1
+		}
+		if { [istarget arm*-*-*] && N >= 2 && N <= 4 } {
+		    set et_vect_stridedN_saved 1
+		}
+	    }
 
-    if [info exists et_vect_strided_saved] {
-        verbose "check_effective_target_vect_strided: using cached result" 2
-    } else {
-        set et_vect_strided_saved 0
-        if { [check_effective_target_vect_interleave]
-             && [check_effective_target_vect_extract_even_odd] } {
-           set et_vect_strided_saved 1
-        }
-    }
-
-    verbose "check_effective_target_vect_strided: returning $et_vect_strided_saved" 2
-    return $et_vect_strided_saved
+	    verbose "check_effective_target_vect_stridedN: returning $et_vect_stridedN_saved" 2
+	    return $et_vect_stridedN_saved
+	}
+    }]
 }
 
 # Return 1 if the target supports section-anchors
Index: gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c	2011-04-12 15:22:06.000000000 +0100
@@ -26,7 +26,7 @@ foo ()
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c	2011-04-12 15:22:06.000000000 +0100
@@ -113,7 +113,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_strided && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/fast-math-slp-27.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/fast-math-slp-27.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/fast-math-slp-27.c	2011-04-12 15:22:06.000000000 +0100
@@ -13,5 +13,5 @@ void foo(void)
    }
 }
 
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c	2011-04-12 15:22:06.000000000 +0100
@@ -65,5 +65,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || {! vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || { ! vect_strided2 } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr37539.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr37539.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/pr37539.c	2011-04-12 15:22:06.000000000 +0100
@@ -40,7 +40,7 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_strided } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_strided4 && vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
Index: gcc/testsuite/gcc.dg/vect/slp-11a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-11a.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-11a.c	2011-04-12 15:22:06.000000000 +0100
@@ -69,7 +69,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-11b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-11b.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-11b.c	2011-04-12 15:22:06.000000000 +0100
@@ -43,7 +43,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided4 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided4 && vect_int_mult } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-11c.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-11c.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-11c.c	2011-04-12 15:22:06.000000000 +0100
@@ -40,7 +40,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12a.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12a.c	2011-04-12 15:22:06.000000000 +0100
@@ -73,8 +73,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-12b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12b.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-12b.c	2011-04-12 15:22:06.000000000 +0100
@@ -43,9 +43,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_strided2 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target { { ! { vect_strided2 && vect_int_mult } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { vect_strided2 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { { ! { vect_strided2 && vect_int_mult } } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-18.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-18.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-18.c	2011-04-12 15:22:06.000000000 +0100
@@ -91,7 +91,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-19a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-19a.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-19a.c	2011-04-12 15:22:06.000000000 +0100
@@ -54,8 +54,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided8 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided8 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided8 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-19b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-19b.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-19b.c	2011-04-12 15:22:06.000000000 +0100
@@ -51,8 +51,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided4 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided4 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-21.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-21.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-21.c	2011-04-12 15:22:06.000000000 +0100
@@ -199,9 +199,9 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided || vect_extract_even_odd } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided || vect_extract_even_odd } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided }  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 }  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-23.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-23.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-23.c	2011-04-12 15:22:06.000000000 +0100
@@ -106,8 +106,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided } && {! { vect_no_align} } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided || vect_no_align} } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided8 && { ! { vect_no_align } } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided8 || vect_no_align } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-cselim-1.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cselim-1.c	2011-04-12 15:22:06.000000000 +0100
@@ -82,5 +82,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || {! vect_strided } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || { ! vect_strided2 } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c	2011-04-12 15:22:06.000000000 +0100
@@ -56,5 +56,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave  && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c	2011-04-12 15:22:06.000000000 +0100
@@ -54,5 +54,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c	2011-04-12 15:22:06.000000000 +0100
@@ -53,5 +53,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c	2011-04-12 15:22:06.000000000 +0100
@@ -50,5 +50,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-1.c	2011-04-12 15:18:24.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-1.c	2011-04-12 15:22:06.000000000 +0100
@@ -85,6 +85,6 @@ foo (int n)
   fbar (a);
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_extract_even_odd } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-10.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-10.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-10.c	2011-04-12 15:22:06.000000000 +0100
@@ -22,5 +22,5 @@ int foo ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-98.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-98.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-98.c	2011-04-12 15:22:06.000000000 +0100
@@ -38,6 +38,6 @@ int main (void)
 }
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  vect_strided4 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-107.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-107.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-107.c	2011-04-12 15:22:06.000000000 +0100
@@ -40,6 +40,6 @@ int main (void)
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c	2011-04-12 15:22:06.000000000 +0100
@@ -71,6 +71,6 @@ int main (void)
   return 0;
 }   
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c	2011-04-12 15:22:06.000000000 +0100
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c	2011-04-12 15:22:06.000000000 +0100
@@ -68,6 +68,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c	2011-04-12 15:22:06.000000000 +0100
@@ -62,6 +62,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c	2011-04-12 15:22:06.000000000 +0100
@@ -61,6 +61,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c	2011-04-12 15:22:06.000000000 +0100
@@ -69,6 +69,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c	2011-04-12 15:22:06.000000000 +0100
@@ -76,6 +76,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c	2011-04-12 15:22:06.000000000 +0100
@@ -81,6 +81,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-float.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-float.c	2011-04-12 15:18:25.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-float.c	2011-04-12 15:22:06.000000000 +0100
@@ -39,7 +39,7 @@ int main (void)
 }
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c	2011-04-12 15:22:06.000000000 +0100
@@ -71,6 +71,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-mult.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-mult.c	2011-04-12 15:22:06.000000000 +0100
@@ -71,6 +71,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c	2011-04-12 15:22:06.000000000 +0100
@@ -72,5 +72,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c	2011-04-12 15:22:06.000000000 +0100
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c	2011-04-12 15:22:06.000000000 +0100
@@ -68,6 +68,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c	2011-04-12 15:22:06.000000000 +0100
@@ -63,6 +63,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c	2011-04-12 15:22:06.000000000 +0100
@@ -77,6 +77,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c	2011-04-12 15:22:06.000000000 +0100
@@ -60,6 +60,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c	2011-04-12 15:22:06.000000000 +0100
@@ -71,6 +71,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c	2011-04-12 15:22:06.000000000 +0100
@@ -54,6 +54,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
    
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c	2011-04-12 15:22:06.000000000 +0100
@@ -78,6 +78,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c	2011-04-12 15:22:06.000000000 +0100
@@ -98,6 +98,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c	2011-04-12 15:22:06.000000000 +0100
@@ -83,6 +83,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c	2011-04-12 15:22:06.000000000 +0100
@@ -85,6 +85,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
   
Index: gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-vfa-03.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-vfa-03.c	2011-04-12 15:22:06.000000000 +0100
@@ -53,6 +53,6 @@ main (void)
 } 
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  vect_strided2 } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c	2011-04-12 15:22:06.000000000 +0100
@@ -47,5 +47,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_interleave } } } */
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr30843.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr30843.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/pr30843.c	2011-04-12 15:22:06.000000000 +0100
@@ -20,6 +20,6 @@ void dacP98FillRGBMap (unsigned char *pB
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided4 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/pr33866.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr33866.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/pr33866.c	2011-04-12 15:22:06.000000000 +0100
@@ -27,6 +27,6 @@ void test_select_fill_hyper_simple (long
 }
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-6.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-6.c	2011-04-12 15:22:06.000000000 +0100
@@ -42,7 +42,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! vect_unpack } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! { vect_unpack || vect_strided2 } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "different interleaving chains in one node" 1 "vect" { target { ! vect_no_int_add } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c	2011-04-12 15:22:06.000000000 +0100
@@ -55,6 +55,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c	2011-04-12 15:22:06.000000000 +0100
@@ -65,8 +65,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target { vect_interleave && vect_pack_trunc  } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { { ! { vect_interleave } } && { vect_pack_trunc } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target { { vect_interleave || vect_strided4 } && vect_pack_trunc } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { { ! { vect_interleave || vect_strided4 } } && { vect_pack_trunc } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c	2011-04-12 12:16:46.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c	2011-04-12 15:22:06.000000000 +0100
@@ -39,7 +39,7 @@ int main (void)
 }
 
 /* Needs interleaving support.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
-/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave || vect_strided2 } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 


* [10/9] Add tests for stride-3 accesses
  2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
                   ` (8 preceding siblings ...)
  2011-04-12 14:29 ` [9/9] Testsuite: Replace vect_strided with vect_stridedN Richard Sandiford
@ 2011-04-12 14:34 ` Richard Sandiford
  2011-04-15 12:45   ` Richard Guenther
  9 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 14:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: patches

This patch adds a test for stride-3 accesses.  I didn't add any
particularly complicated cases because I think the testsuite already
covers the interaction between strided loads and stores and other
operations pretty well.  Let me know if there's something I should
add, though.
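
As a rough illustration only (this is not part of the patch), the kind
of stride-3 load and store the test exercises can be sketched in plain C
as below.  The names `triple`, `sum_fields` and `rotate_fields` are made
up for this sketch; the test itself uses its own struct and writes the
loops inline in main1.

```c
#include <stddef.h>

/* Hypothetical three-field record; consecutive elements place the
   a, b and c fields at a stride of three 16-bit values.  */
typedef struct {
  unsigned short a;
  unsigned short b;
  unsigned short c;
} triple;

/* Stride-3 load: each iteration reads the three interleaved fields of
   one element and writes a single contiguous result.  */
void
sum_fields (unsigned short *res, const triple *in, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    res[i] = (unsigned short) (in[i].a + in[i].b + in[i].c);
}

/* Stride-3 load and store: each iteration reads three interleaved
   fields and writes them back in a rotated order.  RES and IN are
   assumed not to overlap.  */
void
rotate_fields (triple *res, const triple *in, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      res[i].a = in[i].c;
      res[i].b = in[i].a;
      res[i].c = in[i].b;
    }
}
```

Whether either loop is actually vectorised depends on the target; on a
target without stride-3 support both simply run as scalar code.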

Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Richard


gcc/testsuite/
	* gcc.dg/vect/vect-strided-u16-i3.c: New test.

Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
===================================================================
--- /dev/null	2011-03-23 08:42:11.268792848 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c	2011-04-12 11:55:17.000000000 +0100
@@ -0,0 +1,112 @@
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 128
+
+typedef struct {
+   unsigned short a;
+   unsigned short b;
+   unsigned short c;
+} s;
+
+#define A(I) (I)
+#define B(I) ((I) * 2)
+#define C(I) ((unsigned short) ~((I) ^ 0x18))
+
+void __attribute__ ((noinline))
+check1 (s *res)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (res[i].a != C (i)
+	|| res[i].b != A (i)
+	|| res[i].c != B (i))
+      abort ();
+}
+
+void __attribute__ ((noinline))
+check2 (unsigned short *res)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (res[i] != (unsigned short) (A (i) + B (i) + C (i)))
+      abort ();
+}
+
+void __attribute__ ((noinline))
+check3 (s *res)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (res[i].a != i
+	|| res[i].b != i
+	|| res[i].c != i)
+      abort ();
+}
+
+void __attribute__ ((noinline))
+check4 (unsigned short *res)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (res[i] != (unsigned short) (A (i) + B (i)))
+      abort ();
+}
+
+void __attribute__ ((noinline))
+main1 (s *arr)
+{
+  int i;
+  s *ptr = arr;
+  s res1[N];
+  unsigned short res2[N];
+
+  for (i = 0; i < N; i++)
+    {
+      res1[i].a = arr[i].c;
+      res1[i].b = arr[i].a;
+      res1[i].c = arr[i].b;
+    }
+  check1 (res1);
+
+  for (i = 0; i < N; i++)
+    res2[i] = arr[i].a + arr[i].b + arr[i].c;
+  check2 (res2);
+
+  for (i = 0; i < N; i++)
+    {
+      res1[i].a = i;
+      res1[i].b = i;
+      res1[i].c = i;
+    }
+  check3 (res1);
+
+  for (i = 0; i < N; i++)
+    res2[i] = arr[i].a + arr[i].b;
+  check4 (res2);
+}
+
+int main (void)
+{
+  int i;
+  s arr[N];
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+    {
+      arr[i].a = A (i);
+      arr[i].b = B (i);
+      arr[i].c = C (i);
+    }
+  main1 (arr);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target vect_strided3 } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */


* Re: [2/9] Reindent parts of vectorizable_load and vectorizable_store
  2011-04-12 13:33   ` Richard Guenther
@ 2011-04-12 14:39     ` Richard Sandiford
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Sandiford @ 2011-04-12 14:39 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

Richard Guenther <richard.guenther@gmail.com> writes:
> On Tue, Apr 12, 2011 at 3:28 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch just reindents part of vectorizable_load and vectorizable_store
>> so that the main diff is easier to read.  It also CSEs the element type,
>> which seemed better than breaking the long lines.
>>
>> I've included both the real diff and a -b version.
>>
>> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>
> CSEing element type is ok, but please don't install patches (separately)
> that introduce if (1)s.  I suppose this patch is to make followups smaller?

Yeah, patch 5 was pretty unreadable otherwise.

Richard


* Re: [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide
  2011-04-12 14:14 ` [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide Richard Sandiford
@ 2011-04-15 12:43   ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-15 12:43 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 4:14 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> We have separate vect_extract_even_odd and vect_extract_even_odd_wide
> target selectors, and separate vect_strided and vect_strided_wide
> selectors.  The comment suggests that "wide" is for 32+ bits,
> but we often use the non-wide forms for 32-bit tests.  We also have
> tests that combine 16-bit and 32-bit strided accesses without checking
> for both widths.
>
> I'm about to split vect_strided into vect_stridedN (for each stride
> factor N).  One option was to preserve the wide distinction and have
> vect_stridedN_wide as well.  However, given the current usage,
> and given that the two selectors are the same, I think it makes sense
> to combine them until we know what distinction we need to make.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/testsuite/
>        * lib/target-supports.exp
>        (check_effective_target_vect_extract_even_odd_wide): Delete.
>        (check_effective_target_vect_strided_wide): Likewise.
>        * gcc.dg/vect/O3-pr39675-2.c: Use the non-wide versions instead.
>        * gcc.dg/vect/fast-math-pr35982.c: Likewise.
>        * gcc.dg/vect/fast-math-vect-complex-3.c: Likewise.
>        * gcc.dg/vect/pr37539.c: Likewise.
>        * gcc.dg/vect/slp-11.c: Likewise.
>        * gcc.dg/vect/slp-12a.c: Likewise.
>        * gcc.dg/vect/slp-12b.c: Likewise.
>        * gcc.dg/vect/slp-19.c: Likewise.
>        * gcc.dg/vect/slp-23.c: Likewise.
>        * gcc.dg/vect/vect-1.c: Likewise.
>        * gcc.dg/vect/vect-98.c: Likewise.
>        * gcc.dg/vect/vect-107.c: Likewise.
>        * gcc.dg/vect/vect-strided-float.c: Likewise.
>
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/lib/target-supports.exp       2011-04-12 11:55:11.000000000 +0100
> @@ -3121,29 +3121,6 @@ proc check_effective_target_vect_extract
>     return $et_vect_extract_even_odd_saved
>  }
>
> -# Return 1 if the target supports vector even/odd elements extraction of
> -# vectors with SImode elements or larger, 0 otherwise.
> -
> -proc check_effective_target_vect_extract_even_odd_wide { } {
> -    global et_vect_extract_even_odd_wide_saved
> -
> -    if [info exists et_vect_extract_even_odd_wide_saved] {
> -        verbose "check_effective_target_vect_extract_even_odd_wide: using cached result" 2
> -    } else {
> -        set et_vect_extract_even_odd_wide_saved 0
> -        if { [istarget powerpc*-*-*]
> -             || [istarget i?86-*-*]
> -             || [istarget x86_64-*-*]
> -             || [istarget ia64-*-*]
> -             || [istarget spu-*-*] } {
> -           set et_vect_extract_even_odd_wide_saved 1
> -        }
> -    }
> -
> -    verbose "check_effective_target_vect_extract_even_wide_odd: returning $et_vect_extract_even_odd_wide_saved" 2
> -    return $et_vect_extract_even_odd_wide_saved
> -}
> -
>  # Return 1 if the target supports vector interleaving, 0 otherwise.
>
>  proc check_effective_target_vect_interleave { } {
> @@ -3184,25 +3161,6 @@ proc check_effective_target_vect_strided
>     return $et_vect_strided_saved
>  }
>
> -# Return 1 if the target supports vector interleaving and extract even/odd
> -# for wide element types, 0 otherwise.
> -proc check_effective_target_vect_strided_wide { } {
> -    global et_vect_strided_wide_saved
> -
> -    if [info exists et_vect_strided_wide_saved] {
> -        verbose "check_effective_target_vect_strided_wide: using cached result" 2
> -    } else {
> -        set et_vect_strided_wide_saved 0
> -        if { [check_effective_target_vect_interleave]
> -             && [check_effective_target_vect_extract_even_odd_wide] } {
> -           set et_vect_strided_wide_saved 1
> -        }
> -    }
> -
> -    verbose "check_effective_target_vect_strided_wide: returning $et_vect_strided_wide_saved" 2
> -    return $et_vect_strided_wide_saved
> -}
> -
>  # Return 1 if the target supports section-anchors
>
>  proc check_effective_target_section_anchors { } {
> Index: gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c    2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c    2011-04-12 11:55:11.000000000 +0100
> @@ -26,7 +26,7 @@ foo ()
>     }
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided_wide } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided_wide } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c       2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/fast-math-pr35982.c       2011-04-12 11:55:11.000000000 +0100
> @@ -20,7 +20,7 @@ float method2_int16 (struct mem *mem)
>   return avg;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd_wide  } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd_wide  } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd  } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd  } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c        2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c        2011-04-12 11:55:11.000000000 +0100
> @@ -56,5 +56,5 @@ main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave  && vect_extract_even_odd_wide } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave  && vect_extract_even_odd } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/pr37539.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/pr37539.c 2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr37539.c 2011-04-12 11:55:11.000000000 +0100
> @@ -40,7 +40,7 @@ int main ()
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_strided_wide } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_strided } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>
> Index: gcc/testsuite/gcc.dg/vect/slp-11.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-11.c  2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-11.c  2011-04-12 11:55:11.000000000 +0100
> @@ -105,9 +105,9 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { { vect_uintfloat_cvt && vect_strided_wide } &&  vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { { { ! vect_uintfloat_cvt } && vect_strided_wide } &&  vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided_wide } } } } }  */
> +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { { vect_uintfloat_cvt && vect_strided } &&  vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { { { ! vect_uintfloat_cvt } && vect_strided } &&  vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided } } } } }  */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-12a.c 2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-12a.c 2011-04-12 11:55:11.000000000 +0100
> @@ -94,11 +94,11 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided_wide && vect_int_mult} } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided_wide}} && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided && vect_int_mult} } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided}} && vect_int_mult } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided_wide && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided_wide}} && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target  { ! vect_int_mult } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-12b.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-12b.c 2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-12b.c 2011-04-12 11:55:11.000000000 +0100
> @@ -43,9 +43,9 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided_wide && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided_wide}}} } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided_wide && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided_wide}}} } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-19.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-19.c  2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-19.c  2011-04-12 11:55:11.000000000 +0100
> @@ -146,9 +146,9 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided_wide  } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided_wide } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided_wide  } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided_wide } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided  } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided  } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-23.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-23.c  2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-23.c  2011-04-12 11:55:11.000000000 +0100
> @@ -106,8 +106,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided_wide } && {! { vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided_wide || vect_no_align} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided } && {! { vect_no_align} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided || vect_no_align} } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-1.c  2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-1.c  2011-04-12 11:55:11.000000000 +0100
> @@ -85,6 +85,6 @@ foo (int n)
>   fbar (a);
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_extract_even_odd_wide } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_extract_even_odd_wide } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_extract_even_odd } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-98.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-98.c 2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-98.c 2011-04-12 11:55:11.000000000 +0100
> @@ -38,6 +38,6 @@ int main (void)
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd_wide } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd_wide } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-107.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-107.c        2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-107.c        2011-04-12 11:55:11.000000000 +0100
> @@ -40,6 +40,6 @@ int main (void)
>   return main1 ();
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd_wide } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd_wide } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-float.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-float.c      2011-04-12 11:53:54.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-float.c      2011-04-12 11:55:11.000000000 +0100
> @@ -39,7 +39,7 @@ int main (void)
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd_wide } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd_wide } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>


* Re: [9/9] Testsuite: Replace vect_strided with vect_stridedN
  2011-04-12 14:29 ` [9/9] Testsuite: Replace vect_strided with vect_stridedN Richard Sandiford
@ 2011-04-15 12:44   ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-15 12:44 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 4:28 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch replaces the general vect_strided target selector with
> a group of vect_stridedN selectors, one for each tested stride factor N.
>
> Also, some tests used vect_interleave && vect_extract_even_odd for
> strided accesses.  The two conditions used to be equivalent, but aren't
> after this series for ARM, so I've replaced them with vect_stridedN instead.
>
> Some tests used vect_interleave for loops that could be vectorised
> even without extract-even/odd support.  vect_interleave used to be
> a looser condition than vect_stridedN, but again isn't after this
> series, so I've used vect_interleave || vect_stridedN where necessary.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/testsuite/
>        * lib/target-supports.exp (check_effective_target_vect_strided):
>        Replace with...
>        (check_effective_target_vect_strided2)
>        (check_effective_target_vect_strided3)
>        (check_effective_target_vect_strided4)
>        (check_effective_target_vect_strided8): ...these new functions.
>
>        * gcc.dg/vect/O3-pr39675-2.c: Update accordingly.
>        * gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c: Likewise.
>        * gcc.dg/vect/fast-math-slp-27.c: Likewise.
>        * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Likewise.
>        * gcc.dg/vect/pr37539.c: Likewise.
>        * gcc.dg/vect/slp-11a.c: Likewise.
>        * gcc.dg/vect/slp-11b.c: Likewise.
>        * gcc.dg/vect/slp-11c.c: Likewise.
>        * gcc.dg/vect/slp-12a.c: Likewise.
>        * gcc.dg/vect/slp-12b.c: Likewise.
>        * gcc.dg/vect/slp-18.c: Likewise.
>        * gcc.dg/vect/slp-19a.c: Likewise.
>        * gcc.dg/vect/slp-19b.c: Likewise.
>        * gcc.dg/vect/slp-21.c: Likewise.
>        * gcc.dg/vect/slp-23.c: Likewise.
>        * gcc.dg/vect/vect-cselim-1.c: Likewise.
>
>        * gcc.dg/vect/fast-math-vect-complex-3.c: Use vect_stridedN
>        instead of vect_interleave && vect_extract_even_odd.
>        * gcc.dg/vect/no-scevccp-outer-10a.c: Likewise.
>        * gcc.dg/vect/no-scevccp-outer-10b.c: Likewise.
>        * gcc.dg/vect/no-scevccp-outer-20.c: Likewise.
>        * gcc.dg/vect/vect-1.c: Likewise.
>        * gcc.dg/vect/vect-10.c: Likewise.
>        * gcc.dg/vect/vect-98.c: Likewise.
>        * gcc.dg/vect/vect-107.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-mult.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u16-i2.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u16-i4.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u16-mult.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u32-mult.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u8-i2-gap.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Likewise.
>        * gcc.dg/vect/vect-strided-a-u8-i8-gap7.c: Likewise.
>        * gcc.dg/vect/vect-strided-float.c: Likewise.
>        * gcc.dg/vect/vect-strided-mult-char-ls.c: Likewise.
>        * gcc.dg/vect/vect-strided-mult.c: Likewise.
>        * gcc.dg/vect/vect-strided-same-dr.c: Likewise.
>        * gcc.dg/vect/vect-strided-u16-i2.c: Likewise.
>        * gcc.dg/vect/vect-strided-u16-i4.c: Likewise.
>        * gcc.dg/vect/vect-strided-u32-i4.c: Likewise.
>        * gcc.dg/vect/vect-strided-u32-i8.c: Likewise.
>        * gcc.dg/vect/vect-strided-u32-mult.c: Likewise.
>        * gcc.dg/vect/vect-strided-u8-i2-gap.c: Likewise.
>        * gcc.dg/vect/vect-strided-u8-i2.c: Likewise.
>        * gcc.dg/vect/vect-strided-u8-i8-gap2.c: Likewise.
>        * gcc.dg/vect/vect-strided-u8-i8-gap4.c: Likewise.
>        * gcc.dg/vect/vect-strided-u8-i8-gap7.c: Likewise.
>        * gcc.dg/vect/vect-strided-u8-i8.c: Likewise.
>        * gcc.dg/vect/vect-vfa-03.c: Likewise.
>
>        * gcc.dg/vect/no-scevccp-outer-18.c: Add vect_stridedN to the
>        target condition.
>        * gcc.dg/vect/pr30843.c: Likewise.
>        * gcc.dg/vect/pr33866.c: Likewise.
>        * gcc.dg/vect/slp-reduc-6.c: Likewise.
>        * gcc.dg/vect/vect-strided-store-a-u8-i2.c: Likewise.
>        * gcc.dg/vect/vect-strided-store-u16-i4.c: Likewise.
>        * gcc.dg/vect/vect-strided-store-u32-i2.c: Likewise.
>
> Index: gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc/testsuite/lib/target-supports.exp       2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/lib/target-supports.exp       2011-04-12 15:22:06.000000000 +0100
> @@ -3143,22 +3143,30 @@ proc check_effective_target_vect_interle
>     return $et_vect_interleave_saved
>  }
>
> -# Return 1 if the target supports vector interleaving and extract even/odd, 0 otherwise.
> -proc check_effective_target_vect_strided { } {
> -    global et_vect_strided_saved
> +foreach N {2 3 4 8} {
> +    eval [string map [list N $N] {
> +       # Return 1 if the target supports N-vector interleaving
> +       proc check_effective_target_vect_stridedN { } {
> +           global et_vect_stridedN_saved
> +
> +           if [info exists et_vect_stridedN_saved] {
> +               verbose "check_effective_target_vect_stridedN: using cached result" 2
> +           } else {
> +               set et_vect_stridedN_saved 0
> +               if { (N & -N) == N
> +                    && [check_effective_target_vect_interleave]
> +                    && [check_effective_target_vect_extract_even_odd] } {
> +                   set et_vect_stridedN_saved 1
> +               }
> +               if { [istarget arm*-*-*] && N >= 2 && N <= 4 } {
> +                   set et_vect_stridedN_saved 1
> +               }
> +           }
>
> -    if [info exists et_vect_strided_saved] {
> -        verbose "check_effective_target_vect_strided: using cached result" 2
> -    } else {
> -        set et_vect_strided_saved 0
> -        if { [check_effective_target_vect_interleave]
> -             && [check_effective_target_vect_extract_even_odd] } {
> -           set et_vect_strided_saved 1
> -        }
> -    }
> -
> -    verbose "check_effective_target_vect_strided: returning $et_vect_strided_saved" 2
> -    return $et_vect_strided_saved
> +           verbose "check_effective_target_vect_stridedN: returning $et_vect_stridedN_saved" 2
> +           return $et_vect_stridedN_saved
> +       }
> +    }]
>  }
>
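
The foreach/eval/string map construct above stamps out one proc per
stride factor by textual substitution.  A minimal Python sketch of the
same idea, assuming plain string replacement is a fair stand-in for
Tcl's string map (TEMPLATE and expand are hypothetical names used only
for illustration):

```python
# Template name taken from the quoted patch; "N" is the placeholder.
TEMPLATE = "check_effective_target_vect_stridedN"


def expand(template: str, n: int) -> str:
    """Replace every literal "N" in the template with the stride factor."""
    return template.replace("N", str(n))


# One selector name per stride factor listed in the patch's foreach loop.
selectors = [expand(TEMPLATE, n) for n in (2, 3, 4, 8)]
```

Because the substitution is purely textual, any other capital N inside
the template body would be replaced as well, so the body has to avoid
that letter except where the stride factor is intended.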
>  # Return 1 if the target supports section-anchors
> Index: gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c    2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c    2011-04-12 15:22:06.000000000 +0100
> @@ -26,7 +26,7 @@ foo ()
>     }
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c  2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c  2011-04-12 15:22:06.000000000 +0100
> @@ -113,7 +113,7 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  {target { vect_strided8 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" {target { vect_strided8 && vect_int_mult } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/fast-math-slp-27.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/fast-math-slp-27.c        2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/fast-math-slp-27.c        2011-04-12 15:22:06.000000000 +0100
> @@ -13,5 +13,5 @@ void foo(void)
>    }
>  }
>
> -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c     2011-04-12 15:22:06.000000000 +0100
> @@ -65,5 +65,5 @@ main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || {! vect_strided } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || { ! vect_strided2 } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/pr37539.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/pr37539.c 2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr37539.c 2011-04-12 15:22:06.000000000 +0100
> @@ -40,7 +40,7 @@ int main ()
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_strided4 && vect_strided2 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>
> Index: gcc/testsuite/gcc.dg/vect/slp-11a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-11a.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-11a.c 2011-04-12 15:22:06.000000000 +0100
> @@ -69,7 +69,7 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-11b.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-11b.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-11b.c 2011-04-12 15:22:06.000000000 +0100
> @@ -43,7 +43,7 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided4 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided4 && vect_int_mult } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-11c.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-11c.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-11c.c 2011-04-12 15:22:06.000000000 +0100
> @@ -40,7 +40,7 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-12a.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-12a.c 2011-04-12 15:22:06.000000000 +0100
> @@ -73,8 +73,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-12b.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-12b.c 2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-12b.c 2011-04-12 15:22:06.000000000 +0100
> @@ -43,9 +43,9 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  {target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  {target { { ! { vect_int_mult }} || { ! {vect_strided}}} } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_strided2 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target { { ! { vect_strided2 && vect_int_mult } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { vect_strided2 && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { { ! { vect_strided2 && vect_int_mult } } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-18.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-18.c  2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-18.c  2011-04-12 15:22:06.000000000 +0100
> @@ -91,7 +91,7 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-19a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-19a.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-19a.c 2011-04-12 15:22:06.000000000 +0100
> @@ -54,8 +54,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided8 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided8 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided8 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-19b.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-19b.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-19b.c 2011-04-12 15:22:06.000000000 +0100
> @@ -51,8 +51,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided4 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided4 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided4 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-21.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-21.c  2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-21.c  2011-04-12 15:22:06.000000000 +0100
> @@ -199,9 +199,9 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided || vect_extract_even_odd } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided || vect_extract_even_odd } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided }  } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_strided4 }  } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-23.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-23.c  2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-23.c  2011-04-12 15:22:06.000000000 +0100
> @@ -106,8 +106,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided } && {! { vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided || vect_no_align} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_strided8 && { ! { vect_no_align } } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { vect_strided8 || vect_no_align } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-cselim-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-cselim-1.c   2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-cselim-1.c   2011-04-12 15:22:06.000000000 +0100
> @@ -82,5 +82,5 @@ main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || {! vect_strided } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { vect_no_align || { ! vect_strided2 } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c        2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/fast-math-vect-complex-3.c        2011-04-12 15:22:06.000000000 +0100
> @@ -56,5 +56,5 @@ main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave  && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c    2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10a.c    2011-04-12 15:22:06.000000000 +0100
> @@ -54,5 +54,5 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c    2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-10b.c    2011-04-12 15:22:06.000000000 +0100
> @@ -53,5 +53,5 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-20.c     2011-04-12 15:22:06.000000000 +0100
> @@ -50,5 +50,5 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-1.c  2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-1.c  2011-04-12 15:22:06.000000000 +0100
> @@ -85,6 +85,6 @@ foo (int n)
>   fbar (a);
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_extract_even_odd } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 6 loops" 1 "vect" { target vect_strided2 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 5 loops" 1 "vect" { xfail vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-10.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-10.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-10.c 2011-04-12 15:22:06.000000000 +0100
> @@ -22,5 +22,5 @@ int foo ()
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { ! vect_strided2 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-98.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-98.c 2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-98.c 2011-04-12 15:22:06.000000000 +0100
> @@ -38,6 +38,6 @@ int main (void)
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided4 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  vect_strided4 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-107.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-107.c        2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-107.c        2011-04-12 15:22:06.000000000 +0100
> @@ -40,6 +40,6 @@ int main (void)
>   return main1 ();
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_extract_even_odd } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_extract_even_odd } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-mult.c     2011-04-12 15:22:06.000000000 +0100
> @@ -71,6 +71,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c   2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i2.c   2011-04-12 15:22:06.000000000 +0100
> @@ -55,6 +55,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c   2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c   2011-04-12 15:22:06.000000000 +0100
> @@ -68,6 +68,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-mult.c 2011-04-12 15:22:06.000000000 +0100
> @@ -62,6 +62,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u32-mult.c 2011-04-12 15:22:06.000000000 +0100
> @@ -61,6 +61,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c        2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i2-gap.c        2011-04-12 15:22:06.000000000 +0100
> @@ -69,6 +69,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c       2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap2.c       2011-04-12 15:22:06.000000000 +0100
> @@ -76,6 +76,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c       2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-a-u8-i8-gap7.c       2011-04-12 15:22:06.000000000 +0100
> @@ -81,6 +81,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-float.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-float.c      2011-04-12 15:18:25.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-float.c      2011-04-12 15:22:06.000000000 +0100
> @@ -39,7 +39,7 @@ int main (void)
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c       2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-mult-char-ls.c       2011-04-12 15:22:06.000000000 +0100
> @@ -71,6 +71,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-mult.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-mult.c       2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-mult.c       2011-04-12 15:22:06.000000000 +0100
> @@ -71,6 +71,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c    2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-same-dr.c    2011-04-12 15:22:06.000000000 +0100
> @@ -72,5 +72,5 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i2.c     2011-04-12 15:22:06.000000000 +0100
> @@ -55,6 +55,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i4.c     2011-04-12 15:22:06.000000000 +0100
> @@ -68,6 +68,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u32-i4.c     2011-04-12 15:22:06.000000000 +0100
> @@ -63,6 +63,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided4 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u32-i8.c     2011-04-12 15:22:06.000000000 +0100
> @@ -77,6 +77,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c   2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u32-mult.c   2011-04-12 15:22:06.000000000 +0100
> @@ -60,6 +60,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c  2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2-gap.c  2011-04-12 15:22:06.000000000 +0100
> @@ -71,6 +71,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c      2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i2.c      2011-04-12 15:22:06.000000000 +0100
> @@ -54,6 +54,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap2.c 2011-04-12 15:22:06.000000000 +0100
> @@ -78,6 +78,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 2011-04-12 15:22:06.000000000 +0100
> @@ -98,6 +98,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap7.c 2011-04-12 15:22:06.000000000 +0100
> @@ -83,6 +83,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c      2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8.c      2011-04-12 15:22:06.000000000 +0100
> @@ -85,6 +85,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-vfa-03.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-vfa-03.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-vfa-03.c     2011-04-12 15:22:06.000000000 +0100
> @@ -53,6 +53,6 @@ main (void)
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  { vect_interleave && vect_extract_even_odd } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided2 } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail  vect_strided2 } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-18.c     2011-04-12 15:22:06.000000000 +0100
> @@ -47,5 +47,5 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_interleave } } } */
> +/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/pr30843.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/pr30843.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr30843.c 2011-04-12 15:22:06.000000000 +0100
> @@ -20,6 +20,6 @@ void dacP98FillRGBMap (unsigned char *pB
>     }
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided4 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/pr33866.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/pr33866.c 2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr33866.c 2011-04-12 15:22:06.000000000 +0100
> @@ -27,6 +27,6 @@ void test_select_fill_hyper_simple (long
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-reduc-6.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-reduc-6.c     2011-04-12 15:22:06.000000000 +0100
> @@ -42,7 +42,7 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! vect_unpack } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! { vect_unpack || vect_strided2 } } } } } } */
>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "different interleaving chains in one node" 1 "vect" { target { ! vect_no_int_add } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c      2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-store-a-u8-i2.c      2011-04-12 15:22:06.000000000 +0100
> @@ -55,6 +55,6 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_interleave || vect_strided2 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c       2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-store-u16-i4.c       2011-04-12 15:22:06.000000000 +0100
> @@ -65,8 +65,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target { vect_interleave && vect_pack_trunc  } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { { ! { vect_interleave } } && { vect_pack_trunc } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect"  { target { { vect_interleave || vect_strided4 } && vect_pack_trunc } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { { ! { vect_interleave || vect_strided4 } } && { vect_pack_trunc } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c       2011-04-12 12:16:46.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-store-u32-i2.c       2011-04-12 15:22:06.000000000 +0100
> @@ -39,7 +39,7 @@ int main (void)
>  }
>
>  /* Needs interleaving support.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave || vect_strided2 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { xfail { vect_interleave || vect_strided2 } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>
>

* Re: [8/9] Testsuite: split tests for strided accesses
  2011-04-12 14:19 ` [8/9] Testsuite: split tests for strided accesses Richard Sandiford
@ 2011-04-15 12:44   ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-15 12:44 UTC
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 4:19 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The next patch introduces separate vect_stridedN target selectors
> for each tested stride factor N.  At the moment, some tests contain
> several independent loops that have different stride factors.
> It's easier to make the next change if we put these loops into
> separate tests.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.
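
For readers following the thread, here is a minimal sketch of what "stride
factor" means in the description quoted above.  It is not taken from the
patch: the function names sum_pairs and sum_quads are hypothetical and exist
only for illustration.  Each loop reads groups of N adjacent elements per
iteration, with N = 2 in the first and N = 4 in the second.  A target might
support one group size and not the other, which is presumably why a test
holding both loops is awkward to gate on a single vect_stridedN selector.

```c
#include <stddef.h>

/* Stride factor 2: each iteration reads in[2*i] and in[2*i + 1].
   Hypothetical example, not part of the patch.  */
void
sum_pairs (const unsigned int *in, unsigned int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[2 * i] + in[2 * i + 1];
}

/* Stride factor 4: each iteration reads four adjacent elements.
   Hypothetical example, not part of the patch.  */
void
sum_quads (const unsigned int *in, unsigned int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[4 * i] + in[4 * i + 1] + in[4 * i + 2] + in[4 * i + 3];
}
```

Keeping the two loops in separate files mirrors the split described in the
quoted ChangeLog: each file can then carry a dg-final directive that names
only the stride factor it actually exercises.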

> Richard
>
>
> gcc/testsuite/
>        * gcc.dg/vect/slp-11.c: Split into...
>        * gcc.dg/vect/slp-11a.c, gcc.dg/vect/slp-11b.c,
>        gcc.dg/vect/slp-11c.c: ...these tests.
>        * gcc.dg/vect/slp-12a.c: Split 4-stride loop into...
>        * gcc.dg/vect/slp-12c.c: ...this new test.
>        * gcc.dg/vect/slp-19.c: Split into...
>        * gcc.dg/vect/slp-19a.c, gcc.dg/vect/slp-19b.c,
>        gcc.dg/vect/slp-19c.c: ...these new tests.
>
> Index: gcc/testsuite/gcc.dg/vect/slp-11.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-11.c  2011-04-12 15:18:24.000000000 +0100
> +++ /dev/null   2011-03-23 08:42:11.268792848 +0000
> @@ -1,113 +0,0 @@
> -/* { dg-require-effective-target vect_int } */
> -
> -#include <stdarg.h>
> -#include "tree-vect.h"
> -
> -#define N 8
> -
> -int
> -main1 ()
> -{
> -  int i;
> -  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
> -  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> -  float out2[N*8];
> -
> -  /* Different operations - not SLPable.  */
> -  for (i = 0; i < N; i++)
> -    {
> -      a0 = in[i*8] + 5;
> -      a1 = in[i*8 + 1] * 6;
> -      a2 = in[i*8 + 2] + 7;
> -      a3 = in[i*8 + 3] + 8;
> -      a4 = in[i*8 + 4] + 9;
> -      a5 = in[i*8 + 5] + 10;
> -      a6 = in[i*8 + 6] + 11;
> -      a7 = in[i*8 + 7] + 12;
> -
> -      b0 = a0 * 3;
> -      b1 = a1 * 2;
> -      b2 = a2 * 12;
> -      b3 = a3 * 5;
> -      b4 = a4 * 8;
> -      b5 = a5 * 4;
> -      b6 = a6 * 3;
> -      b7 = a7 * 2;
> -
> -      out[i*8] = b0 - 2;
> -      out[i*8 + 1] = b1 - 3;
> -      out[i*8 + 2] = b2 - 2;
> -      out[i*8 + 3] = b3 - 1;
> -      out[i*8 + 4] = b4 - 8;
> -      out[i*8 + 5] = b5 - 7;
> -      out[i*8 + 6] = b6 - 3;
> -      out[i*8 + 7] = b7 - 7;
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N; i++)
> -    {
> -      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> -         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
> -         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
> -         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
> -         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
> -         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
> -         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
> -         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
> -       abort ();
> -    }
> -
> -  /* Requires permutation - not SLPable.  */
> -  for (i = 0; i < N*2; i++)
> -    {
> -      out[i*4] = (in[i*4] + 2) * 3;
> -      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
> -      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
> -      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N*2; i++)
> -    {
> -      if (out[i*4] !=  (in[i*4] + 2) * 3
> -         || out[i*4 + 1] != (in[i*4 + 2] + 2) * 7
> -         || out[i*4 + 2] != (in[i*4 + 1] + 7) * 3
> -         || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4)
> -        abort ();
> -    }
> -
> -  /* Different operations - not SLPable.  */
> -  for (i = 0; i < N*4; i++)
> -    {
> -      out2[i*2] = ((float) in[i*2] * 2 + 6) ;
> -      out2[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N*4; i++)
> -    {
> -      if (out2[i*2] !=  ((float) in[i*2] * 2 + 6)
> -         || out2[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
> -        abort ();
> -    }
> -
> -
> -  return 0;
> -}
> -
> -int main (void)
> -{
> -  check_vect ();
> -
> -  main1 ();
> -
> -  return 0;
> -}
> -
> -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { { vect_uintfloat_cvt && vect_strided } &&  vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  { target { { { ! vect_uintfloat_cvt } && vect_strided } &&  vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult && vect_strided } } } } }  */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
> -/* { dg-final { cleanup-tree-dump "vect" } } */
> -
> Index: gcc/testsuite/gcc.dg/vect/slp-11a.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-11a.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,75 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 8
> +
> +int
> +main1 ()
> +{
> +  int i;
> +  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +
> +  /* Different operations - not SLPable.  */
> +  for (i = 0; i < N; i++)
> +    {
> +      a0 = in[i*8] + 5;
> +      a1 = in[i*8 + 1] * 6;
> +      a2 = in[i*8 + 2] + 7;
> +      a3 = in[i*8 + 3] + 8;
> +      a4 = in[i*8 + 4] + 9;
> +      a5 = in[i*8 + 5] + 10;
> +      a6 = in[i*8 + 6] + 11;
> +      a7 = in[i*8 + 7] + 12;
> +
> +      b0 = a0 * 3;
> +      b1 = a1 * 2;
> +      b2 = a2 * 12;
> +      b3 = a3 * 5;
> +      b4 = a4 * 8;
> +      b5 = a5 * 4;
> +      b6 = a6 * 3;
> +      b7 = a7 * 2;
> +
> +      out[i*8] = b0 - 2;
> +      out[i*8 + 1] = b1 - 3;
> +      out[i*8 + 2] = b2 - 2;
> +      out[i*8 + 3] = b3 - 1;
> +      out[i*8 + 4] = b4 - 8;
> +      out[i*8 + 5] = b5 - 7;
> +      out[i*8 + 6] = b6 - 3;
> +      out[i*8 + 7] = b7 - 7;
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N; i++)
> +    {
> +      if (out[i*8] !=  (in[i*8] + 5) * 3 - 2
> +         || out[i*8 + 1] != (in[i*8 + 1] * 6) * 2 - 3
> +         || out[i*8 + 2] != (in[i*8 + 2] + 7) * 12 - 2
> +         || out[i*8 + 3] != (in[i*8 + 3] + 8) * 5 - 1
> +         || out[i*8 + 4] != (in[i*8 + 4] + 9) * 8 - 8
> +         || out[i*8 + 5] != (in[i*8 + 5] + 10) * 4 - 7
> +         || out[i*8 + 6] != (in[i*8 + 6] + 11) * 3 - 3
> +         || out[i*8 + 7] != (in[i*8 + 7] + 12) * 2 - 7)
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-11b.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-11b.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,49 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 8
> +
> +int
> +main1 ()
> +{
> +  int i;
> +  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +
> +  /* Requires permutation - not SLPable.  */
> +  for (i = 0; i < N*2; i++)
> +    {
> +      out[i*4] = (in[i*4] + 2) * 3;
> +      out[i*4 + 1] = (in[i*4 + 2] + 2) * 7;
> +      out[i*4 + 2] = (in[i*4 + 1] + 7) * 3;
> +      out[i*4 + 3] = (in[i*4 + 3] + 3) * 4;
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N*2; i++)
> +    {
> +      if (out[i*4] !=  (in[i*4] + 2) * 3
> +         || out[i*4 + 1] != (in[i*4 + 2] + 2) * 7
> +         || out[i*4 + 2] != (in[i*4 + 1] + 7) * 3
> +         || out[i*4 + 3] != (in[i*4 + 3] + 3) * 4)
> +        abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-11c.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-11c.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,46 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 8
> +
> +int
> +main1 ()
> +{
> +  int i;
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +  float out[N*8];
> +
> +  /* Different operations - not SLPable.  */
> +  for (i = 0; i < N*4; i++)
> +    {
> +      out[i*2] = ((float) in[i*2] * 2 + 6) ;
> +      out[i*2 + 1] = (float) (in[i*2 + 1] * 3 + 7);
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N*4; i++)
> +    {
> +      if (out[i*2] !=  ((float) in[i*2] * 2 + 6)
> +         || out[i*2 + 1] != (float) (in[i*2 + 1] * 3 + 7))
> +        abort ();
> +    }
> +
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided } && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-12a.c 2011-04-12 15:18:24.000000000 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-12a.c 2011-04-12 15:18:25.000000000 +0100
> @@ -11,7 +11,7 @@ main1 ()
>   int i;
>   unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
>   unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> -  unsigned int ia[N], ib[N*2];
> +  unsigned int ia[N];
>
>   for (i = 0; i < N; i++)
>     {
> @@ -61,27 +61,6 @@ main1 ()
>        abort ();
>     }
>
> -  for (i = 0; i < N*2; i++)
> -    {
> -      out[i*4] = (in[i*4] + 2) * 3;
> -      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
> -      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
> -      out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
> -
> -      ib[i] = 7;
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N*2; i++)
> -    {
> -      if (out[i*4] !=  (in[i*4] + 2) * 3
> -         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
> -         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
> -         || out[i*4 + 3] != (in[i*4 + 3] + 7) * 7
> -         || ib[i] != 7)
> -        abort ();
> -    }
> -
>   return 0;
>  }
>
> @@ -94,11 +73,8 @@ int main (void)
>   return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect"  {target { vect_strided && vect_int_mult} } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target { {! {vect_strided}} && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target { vect_strided && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" {target { {! {vect_strided}} && vect_int_mult } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target  { ! vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided && vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided && vect_int_mult } } } } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
> -
> Index: gcc/testsuite/gcc.dg/vect/slp-12c.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-12c.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,53 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 8
> +
> +int
> +main1 ()
> +{
> +  int i;
> +  unsigned int out[N*8], a0, a1, a2, a3, a4, a5, a6, a7, b1, b0, b2, b3, b4, b5, b6, b7;
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +  unsigned int ia[N*2];
> +
> +  for (i = 0; i < N*2; i++)
> +    {
> +      out[i*4] = (in[i*4] + 2) * 3;
> +      out[i*4 + 1] = (in[i*4 + 1] + 2) * 7;
> +      out[i*4 + 2] = (in[i*4 + 2] + 7) * 3;
> +      out[i*4 + 3] = (in[i*4 + 3] + 7) * 7;
> +
> +      ia[i] = 7;
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N*2; i++)
> +    {
> +      if (out[i*4] !=  (in[i*4] + 2) * 3
> +         || out[i*4 + 1] != (in[i*4 + 1] + 2) * 7
> +         || out[i*4 + 2] != (in[i*4 + 2] + 7) * 3
> +         || out[i*4 + 3] != (in[i*4 + 3] + 7) * 7
> +         || ia[i] != 7)
> +        abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target { vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target { ! vect_int_mult } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_int_mult } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_int_mult } } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-19.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/slp-19.c  2011-04-12 15:18:24.000000000 +0100
> +++ /dev/null   2011-03-23 08:42:11.268792848 +0000
> @@ -1,154 +0,0 @@
> -/* { dg-require-effective-target vect_int } */
> -
> -#include <stdarg.h>
> -#include "tree-vect.h"
> -
> -#define N 16
> -
> -int
> -main1 ()
> -{
> -  unsigned int i;
> -  unsigned int out[N*8];
> -  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> -  unsigned int ia[N*2], a0, a1, a2, a3;
> -
> -  for (i = 0; i < N; i++)
> -    {
> -      out[i*8] = in[i*8];
> -      out[i*8 + 1] = in[i*8 + 1];
> -      out[i*8 + 2] = in[i*8 + 2];
> -      out[i*8 + 3] = in[i*8 + 3];
> -      out[i*8 + 4] = in[i*8 + 4];
> -      out[i*8 + 5] = in[i*8 + 5];
> -      out[i*8 + 6] = in[i*8 + 6];
> -      out[i*8 + 7] = in[i*8 + 7];
> -
> -      ia[i] = in[i*8 + 2];
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N; i++)
> -    {
> -      if (out[i*8] !=  in[i*8]
> -         || out[i*8 + 1] != in[i*8 + 1]
> -         || out[i*8 + 2] != in[i*8 + 2]
> -         || out[i*8 + 3] != in[i*8 + 3]
> -         || out[i*8 + 4] != in[i*8 + 4]
> -         || out[i*8 + 5] != in[i*8 + 5]
> -         || out[i*8 + 6] != in[i*8 + 6]
> -         || out[i*8 + 7] != in[i*8 + 7]
> -         || ia[i] != in[i*8 + 2])
> -       abort ();
> -    }
> -
> -  for (i = 0; i < N*2; i++)
> -    {
> -      a0 = in[i*4] + 1;
> -      a1 = in[i*4 + 1] + 2;
> -      a2 = in[i*4 + 2] + 3;
> -      a3 = in[i*4 + 3] + 4;
> -
> -      out[i*4] = a0;
> -      out[i*4 + 1] = a1;
> -      out[i*4 + 2] = a2;
> -      out[i*4 + 3] = a3;
> -
> -      ia[i] = a2;
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N*2; i++)
> -    {
> -      if (out[i*4] !=  in[i*4] + 1
> -         || out[i*4 + 1] != in[i*4 + 1] + 2
> -         || out[i*4 + 2] != in[i*4 + 2] + 3
> -         || out[i*4 + 3] != in[i*4 + 3] + 4
> -         || ia[i] != in[i*4 + 2] + 3)
> -        abort ();
> -    }
> -
> -  /* The last stmt requires interleaving of not power of 2 size - not
> -     vectorizable.  */
> -  for (i = 0; i < N/2; i++)
> -    {
> -      out[i*12] = in[i*12];
> -      out[i*12 + 1] = in[i*12 + 1];
> -      out[i*12 + 2] = in[i*12 + 2];
> -      out[i*12 + 3] = in[i*12 + 3];
> -      out[i*12 + 4] = in[i*12 + 4];
> -      out[i*12 + 5] = in[i*12 + 5];
> -      out[i*12 + 6] = in[i*12 + 6];
> -      out[i*12 + 7] = in[i*12 + 7];
> -      out[i*12 + 8] = in[i*12 + 8];
> -      out[i*12 + 9] = in[i*12 + 9];
> -      out[i*12 + 10] = in[i*12 + 10];
> -      out[i*12 + 11] = in[i*12 + 11];
> -
> -      ia[i] = in[i*12 + 7];
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N/2; i++)
> -    {
> -      if (out[i*12] !=  in[i*12]
> -         || out[i*12 + 1] != in[i*12 + 1]
> -         || out[i*12 + 2] != in[i*12 + 2]
> -         || out[i*12 + 3] != in[i*12 + 3]
> -         || out[i*12 + 4] != in[i*12 + 4]
> -         || out[i*12 + 5] != in[i*12 + 5]
> -         || out[i*12 + 6] != in[i*12 + 6]
> -         || out[i*12 + 7] != in[i*12 + 7]
> -         || out[i*12 + 8] != in[i*12 + 8]
> -         || out[i*12 + 9] != in[i*12 + 9]
> -         || out[i*12 + 10] != in[i*12 + 10]
> -         || out[i*12 + 11] != in[i*12 + 11]
> -         || ia[i] != in[i*12 + 7])
> -        abort ();
> -    }
> -
> -  /* Hybrid SLP with unrolling by 2.  */
> -  for (i = 0; i < N; i++)
> -    {
> -      out[i*6] = in[i*6];
> -      out[i*6 + 1] = in[i*6 + 1];
> -      out[i*6 + 2] = in[i*6 + 2];
> -      out[i*6 + 3] = in[i*6 + 3];
> -      out[i*6 + 4] = in[i*6 + 4];
> -      out[i*6 + 5] = in[i*6 + 5];
> -
> -      ia[i] = i;
> -    }
> -
> -  /* check results:  */
> -  for (i = 0; i < N/2; i++)
> -    {
> -      if (out[i*6] !=  in[i*6]
> -         || out[i*6 + 1] != in[i*6 + 1]
> -         || out[i*6 + 2] != in[i*6 + 2]
> -         || out[i*6 + 3] != in[i*6 + 3]
> -         || out[i*6 + 4] != in[i*6 + 4]
> -         || out[i*6 + 5] != in[i*6 + 5]
> -         || ia[i] != i)
> -        abort ();
> -    }
> -
> -
> -  return 0;
> -}
> -
> -int main (void)
> -{
> -  check_vect ();
> -
> -  main1 ();
> -
> -  return 0;
> -}
> -
> -/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target  vect_strided  } } } */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target  { ! { vect_strided } } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect"  { target  vect_strided  } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target { ! { vect_strided } } } } } */
> -/* { dg-final { cleanup-tree-dump "vect" } } */
> -
> Index: gcc/testsuite/gcc.dg/vect/slp-19a.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-19a.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,61 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 16
> +
> +int
> +main1 ()
> +{
> +  unsigned int i;
> +  unsigned int out[N*8];
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +  unsigned int ia[N*2];
> +
> +  for (i = 0; i < N; i++)
> +    {
> +      out[i*8] = in[i*8];
> +      out[i*8 + 1] = in[i*8 + 1];
> +      out[i*8 + 2] = in[i*8 + 2];
> +      out[i*8 + 3] = in[i*8 + 3];
> +      out[i*8 + 4] = in[i*8 + 4];
> +      out[i*8 + 5] = in[i*8 + 5];
> +      out[i*8 + 6] = in[i*8 + 6];
> +      out[i*8 + 7] = in[i*8 + 7];
> +
> +      ia[i] = in[i*8 + 2];
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N; i++)
> +    {
> +      if (out[i*8] !=  in[i*8]
> +         || out[i*8 + 1] != in[i*8 + 1]
> +         || out[i*8 + 2] != in[i*8 + 2]
> +         || out[i*8 + 3] != in[i*8 + 3]
> +         || out[i*8 + 4] != in[i*8 + 4]
> +         || out[i*8 + 5] != in[i*8 + 5]
> +         || out[i*8 + 6] != in[i*8 + 6]
> +         || out[i*8 + 7] != in[i*8 + 7]
> +         || ia[i] != in[i*8 + 2])
> +       abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-19b.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-19b.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 16
> +
> +int
> +main1 ()
> +{
> +  unsigned int i;
> +  unsigned int out[N*8];
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +  unsigned int ia[N*2], a0, a1, a2, a3;
> +
> +  for (i = 0; i < N*2; i++)
> +    {
> +      a0 = in[i*4] + 1;
> +      a1 = in[i*4 + 1] + 2;
> +      a2 = in[i*4 + 2] + 3;
> +      a3 = in[i*4 + 3] + 4;
> +
> +      out[i*4] = a0;
> +      out[i*4 + 1] = a1;
> +      out[i*4 + 2] = a2;
> +      out[i*4 + 3] = a3;
> +
> +      ia[i] = a2;
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N*2; i++)
> +    {
> +      if (out[i*4] !=  in[i*4] + 1
> +         || out[i*4 + 1] != in[i*4 + 1] + 2
> +         || out[i*4 + 2] != in[i*4 + 2] + 3
> +         || out[i*4 + 3] != in[i*4 + 3] + 4
> +         || ia[i] != in[i*4 + 2] + 3)
> +        abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! vect_strided } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_strided } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! vect_strided } } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/slp-19c.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/slp-19c.c 2011-04-12 15:18:25.000000000 +0100
> @@ -0,0 +1,95 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 16
> +
> +int
> +main1 ()
> +{
> +  unsigned int i;
> +  unsigned int out[N*8];
> +  unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
> +  unsigned int ia[N*2], a0, a1, a2, a3;
> +
> +  /* The last stmt requires interleaving of not power of 2 size - not
> +     vectorizable.  */
> +  for (i = 0; i < N/2; i++)
> +    {
> +      out[i*12] = in[i*12];
> +      out[i*12 + 1] = in[i*12 + 1];
> +      out[i*12 + 2] = in[i*12 + 2];
> +      out[i*12 + 3] = in[i*12 + 3];
> +      out[i*12 + 4] = in[i*12 + 4];
> +      out[i*12 + 5] = in[i*12 + 5];
> +      out[i*12 + 6] = in[i*12 + 6];
> +      out[i*12 + 7] = in[i*12 + 7];
> +      out[i*12 + 8] = in[i*12 + 8];
> +      out[i*12 + 9] = in[i*12 + 9];
> +      out[i*12 + 10] = in[i*12 + 10];
> +      out[i*12 + 11] = in[i*12 + 11];
> +
> +      ia[i] = in[i*12 + 7];
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N/2; i++)
> +    {
> +      if (out[i*12] !=  in[i*12]
> +         || out[i*12 + 1] != in[i*12 + 1]
> +         || out[i*12 + 2] != in[i*12 + 2]
> +         || out[i*12 + 3] != in[i*12 + 3]
> +         || out[i*12 + 4] != in[i*12 + 4]
> +         || out[i*12 + 5] != in[i*12 + 5]
> +         || out[i*12 + 6] != in[i*12 + 6]
> +         || out[i*12 + 7] != in[i*12 + 7]
> +         || out[i*12 + 8] != in[i*12 + 8]
> +         || out[i*12 + 9] != in[i*12 + 9]
> +         || out[i*12 + 10] != in[i*12 + 10]
> +         || out[i*12 + 11] != in[i*12 + 11]
> +         || ia[i] != in[i*12 + 7])
> +        abort ();
> +    }
> +
> +  /* Hybrid SLP with unrolling by 2.  */
> +  for (i = 0; i < N; i++)
> +    {
> +      out[i*6] = in[i*6];
> +      out[i*6 + 1] = in[i*6 + 1];
> +      out[i*6 + 2] = in[i*6 + 2];
> +      out[i*6 + 3] = in[i*6 + 3];
> +      out[i*6 + 4] = in[i*6 + 4];
> +      out[i*6 + 5] = in[i*6 + 5];
> +
> +      ia[i] = i;
> +    }
> +
> +  /* check results:  */
> +  for (i = 0; i < N/2; i++)
> +    {
> +      if (out[i*6] !=  in[i*6]
> +         || out[i*6 + 1] != in[i*6 + 1]
> +         || out[i*6 + 2] != in[i*6 + 2]
> +         || out[i*6 + 3] != in[i*6 + 3]
> +         || out[i*6 + 4] != in[i*6 + 4]
> +         || out[i*6 + 5] != in[i*6 + 5]
> +         || ia[i] != i)
> +        abort ();
> +    }
> +
> +  return 0;
> +}
> +
> +int main (void)
> +{
> +  check_vect ();
> +
> +  main1 ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [10/9] Add tests for stride-3 accesses
  2011-04-12 14:34 ` [10/9] Add tests for stride-3 accesses Richard Sandiford
@ 2011-04-15 12:45   ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-15 12:45 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 4:34 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a test for stride-3 accesses.  I didn't add any
> particularly complicated cases because I think the testsuite already
> covers the interaction between the strided loads & stores and other
> operations pretty well.  Let me know if there's something I should
> add though.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/testsuite/
>        * gcc.dg/vect/vect-strided-u16-i3.c: New test.
>
> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c
> ===================================================================
> --- /dev/null   2011-03-23 08:42:11.268792848 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u16-i3.c     2011-04-12 11:55:17.000000000 +0100
> @@ -0,0 +1,112 @@
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 128
> +
> +typedef struct {
> +   unsigned short a;
> +   unsigned short b;
> +   unsigned short c;
> +} s;
> +
> +#define A(I) (I)
> +#define B(I) ((I) * 2)
> +#define C(I) ((unsigned short) ~((I) ^ 0x18))
> +
> +void __attribute__ ((noinline))
> +check1 (s *res)
> +{
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +    if (res[i].a != C (i)
> +       || res[i].b != A (i)
> +       || res[i].c != B (i))
> +      abort ();
> +}
> +
> +void __attribute__ ((noinline))
> +check2 (unsigned short *res)
> +{
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +    if (res[i] != (unsigned short) (A (i) + B (i) + C (i)))
> +      abort ();
> +}
> +
> +void __attribute__ ((noinline))
> +check3 (s *res)
> +{
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +    if (res[i].a != i
> +       || res[i].b != i
> +       || res[i].c != i)
> +      abort ();
> +}
> +
> +void __attribute__ ((noinline))
> +check4 (unsigned short *res)
> +{
> +  int i;
> +
> +  for (i = 0; i < N; i++)
> +    if (res[i] != (unsigned short) (A (i) + B (i)))
> +      abort ();
> +}
> +
> +void __attribute__ ((noinline))
> +main1 (s *arr)
> +{
> +  int i;
> +  s *ptr = arr;
> +  s res1[N];
> +  unsigned short res2[N];
> +
> +  for (i = 0; i < N; i++)
> +    {
> +      res1[i].a = arr[i].c;
> +      res1[i].b = arr[i].a;
> +      res1[i].c = arr[i].b;
> +    }
> +  check1 (res1);
> +
> +  for (i = 0; i < N; i++)
> +    res2[i] = arr[i].a + arr[i].b + arr[i].c;
> +  check2 (res2);
> +
> +  for (i = 0; i < N; i++)
> +    {
> +      res1[i].a = i;
> +      res1[i].b = i;
> +      res1[i].c = i;
> +    }
> +  check3 (res1);
> +
> +  for (i = 0; i < N; i++)
> +    res2[i] = arr[i].a + arr[i].b;
> +  check4 (res2);
> +}
> +
> +int main (void)
> +{
> +  int i;
> +  s arr[N];
> +
> +  check_vect ();
> +
> +  for (i = 0; i < N; i++)
> +    {
> +      arr[i].a = A (i);
> +      arr[i].b = B (i);
> +      arr[i].c = C (i);
> +    }
> +  main1 (arr);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target vect_strided3 } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [6/9] NEON vec_load_lanes and vec_store_lanes patterns
  2011-04-12 14:01 ` [6/9] NEON vec_load_lanes and vec_store_lanes patterns Richard Sandiford
@ 2011-04-15 13:20   ` Richard Earnshaw
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Earnshaw @ 2011-04-15 13:20 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, patches


On Tue, 2011-04-12 at 15:01 +0100, Richard Sandiford wrote:
> This patch adds vec_load_lanes and vec_store_lanes patterns for NEON.
> They feed directly into the corresponding intrinsic patterns.
> 
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
> 
> Richard
> 
> 
> gcc/
> 	* config/arm/neon.md (vec_load_lanes<mode><mode>): New expanders,
> 	(vec_store_lanes<mode><mode>): Likewise.
> 

OK.

R.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [3/9] STMT_VINFO_RELATED_STMT handling in vectorizable_store
  2011-04-12 13:40 ` [3/9] STMT_VINFO_RELATED_STMT handling in vectorizable_store Richard Sandiford
@ 2011-04-17 10:25   ` Ira Rosen
  0 siblings, 0 replies; 27+ messages in thread
From: Ira Rosen @ 2011-04-17 10:25 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, patches



gcc-patches-owner@gcc.gnu.org wrote on 12/04/2011 04:38:54 PM:

> vectorizable_store contains the code:
>
>   for (j = 0; j < ncopies; j++)
>     {
>       for (i = 0; i < vec_num; i++)
>    {
>      ...
>          if (j == 0)
>            STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
>          else
>            STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
>          prev_stmt_info = vinfo_for_stmt (new_stmt);
>          }
>     }
>
> That is, STMT_VINFO_VEC_STMT (stmt_info) and *vec_stmt contain the last
> statement emitted for the _last_ vector of the first copy.  However,
> for later copies, the last statement for _every_ vector is chained using
> STMT_VINFO_RELATED_STMT.  This seems a bit inconsistent, and isn't
> what I expected from the comments.  It also seems different from
> other vectorisation functions, where each copy has exactly one
> STMT_VINFO_RELATED_STMT.  I wasn't sure whether the difference here
> was deliberate or not.

I think it doesn't really matter because STMT_VINFO_RELATED_STMT is used
for retrieving copies of vector operands, and stores don't define any.

>
> The reason I'm changing it is that it makes the control flow for
> the new code more obvious.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

OK.

Thanks,
Ira

>
> Richard
>
>
> gcc/
>    * tree-vect-stmts.c (vectorizable_store): Only chain one related
>    statement per copy.
>
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c   2011-04-12 11:55:08.000000000 +0100
> +++ gcc/tree-vect-stmts.c   2011-04-12 11:55:09.000000000 +0100
> @@ -3612,6 +3612,7 @@ vectorizable_store (gimple stmt, gimple_
>
>        if (1)
>     {
> +     new_stmt = NULL;
>       if (strided_store)
>         {
>           result_chain = VEC_alloc (tree, heap, group_size);
> @@ -3669,17 +3670,19 @@ vectorizable_store (gimple stmt, gimple_
>           if (slp)
>        continue;
>
> -         if (j == 0)
> -      STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
> -         else
> -      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> -
> -         prev_stmt_info = vinfo_for_stmt (new_stmt);
>           next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
>           if (!next_stmt)
>        break;
>         }
>     }
> +      if (!slp)
> +   {
> +     if (j == 0)
> +       STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt =  new_stmt;
> +     else
> +       STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +     prev_stmt_info = vinfo_for_stmt (new_stmt);
> +   }
>      }
>
>    VEC_free (tree, heap, dr_chain);

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [5/9] Main target-independent support for direct interleaving
  2011-04-12 13:59 ` [5/9] Main target-independent support for direct interleaving Richard Sandiford
@ 2011-04-17 14:26   ` Ira Rosen
  2011-04-18 11:54   ` Richard Guenther
  1 sibling, 0 replies; 27+ messages in thread
From: Ira Rosen @ 2011-04-17 14:26 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, patches

gcc-patches-owner@gcc.gnu.org wrote on 12/04/2011 04:59:16 PM:

>
> This patch adds vec_load_lanes and vec_store_lanes optabs for instructions
> like NEON's vldN and vstN.  The optabs are defined this way because the
> vectors must be allocated to a block of consecutive registers.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?

The vectorizer part is fine with me except for:


> @@ -685,9 +761,11 @@ vect_model_store_cost (stmt_vec_info stm
>        first_dr = STMT_VINFO_DATA_REF (stmt_info);
>      }
>
> -  /* Is this an access in a group of stores, which provide strided access?
> -     If so, add in the cost of the permutes.  */
> -  if (group_size > 1)
> +  /* We assume that the cost of a single store-lanes instruction is
> +     equivalent to the cost of GROUP_SIZE separate stores.  If a strided
> +     access is instead being provided by a load-and-permute operation,

I think it should be 'permute-and-store' and not 'load-and-permute'.

> +     include the cost of the permutes.  */
> +  if (!store_lanes_p && group_size > 1)
>      {
>        /* Uses a high and low interleave operation for each needed permute.  */
>        inside_cost = ncopies * exact_log2(group_size) * group_size


Thanks,
Ira

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [5/9] Main target-independent support for direct interleaving
  2011-04-12 13:59 ` [5/9] Main target-independent support for direct interleaving Richard Sandiford
  2011-04-17 14:26   ` Ira Rosen
@ 2011-04-18 11:54   ` Richard Guenther
  2011-04-18 11:57     ` Richard Sandiford
  1 sibling, 1 reply; 27+ messages in thread
From: Richard Guenther @ 2011-04-18 11:54 UTC (permalink / raw)
  To: gcc-patches, patches, richard.sandiford

On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds vec_load_lanes and vec_store_lanes optabs for instructions
> like NEON's vldN and vstN.  The optabs are defined this way because the
> vectors must be allocated to a block of consecutive registers.
>
> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>
> Richard
>
>
> gcc/
>        * doc/md.texi (vec_load_lanes, vec_store_lanes): Document.
>        * optabs.h (COI_vec_load_lanes, COI_vec_store_lanes): New
>        convert_optab_index values.
>        (vec_load_lanes_optab, vec_store_lanes_optab): New convert optabs.
>        * genopinit.c (optabs): Initialize the new optabs.
>        * internal-fn.def (LOAD_LANES, STORE_LANES): New internal functions.
>        * internal-fn.c (get_multi_vector_move, expand_LOAD_LANES)
>        (expand_STORE_LANES): New functions.
>        * tree.h (build_simple_array_type): Declare.
>        * tree.c (build_simple_array_type): New function.
>        * tree-vectorizer.h (vect_model_store_cost): Add a bool argument.
>        (vect_model_load_cost): Likewise.
>        (vect_store_lanes_supported, vect_load_lanes_supported)
>        (vect_record_strided_load_vectors): Declare.
>        * tree-vect-data-refs.c (vect_lanes_optab_supported_p)
>        (vect_store_lanes_supported, vect_load_lanes_supported): New functions.
>        (vect_transform_strided_load): Split out statement recording into...
>        (vect_record_strided_load_vectors): ...this new function.
>        * tree-vect-stmts.c (create_vector_array, read_vector_array)
>        (write_vector_array, create_array_ref): New functions.
>        (vect_model_store_cost): Add store_lanes_p argument.
>        (vect_model_load_cost): Add load_lanes_p argument.
>        (vectorizable_store): Try to use store-lanes functions for
>        interleaved stores.
>        (vectorizable_load): Likewise load-lanes and loads.
>        * tree-vect-slp.c (vect_get_and_check_slp_defs)
>        (vect_build_slp_tree):
>
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>  consecutive memory locations, operand 1 is the first register, and
>  operand 2 is a constant: the number of consecutive registers.
>
> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_load_lanes@var{m}@var{n}}
> +Perform an interleaved load of several vectors from memory operand 1
> +into register operand 0.  Both operands have mode @var{m}.  The register
> +operand is viewed as holding consecutive vectors of mode @var{n},
> +while the memory operand is a flat array that contains the same number
> +of elements.  The operation is equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> +  for (i = 0; i < c; i++)
> +    operand0[i][j] = operand1[j * c + i];
> +@end smallexample
> +
> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
> +from memory into a register of mode @samp{TI}@.  The register
> +contains two consecutive vectors of mode @samp{V4HI}@.

So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
such an operation would have adjacent blocks of siv2qi memory.  But
maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
* GET_MODE_NUNITS (@var{n})?  In which case the mode m is
redundant?  You could specify that we load NUNITS adjacent vectors into
an integer mode of appropriate size.

> +This pattern can only be used if:
> +@smallexample
> +TARGET_ARRAY_MODE_SUPPORTED_P (@var{n}, @var{c})
> +@end smallexample
> +is true.  GCC assumes that, if a target supports this kind of
> +instruction for some mode @var{n}, it also supports unaligned
> +loads for vectors of mode @var{n}.
> +
> +@cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_store_lanes@var{m}@var{n}}
> +Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory
> +and register operands reversed.  That is, the instruction is
> +equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
> +  for (i = 0; i < c; i++)
> +    operand0[j * c + i] = operand1[i][j];
> +@end smallexample
> +
> +for a memory operand 0 and register operand 1.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2011-04-12 12:16:46.000000000 +0100
> +++ gcc/optabs.h        2011-04-12 14:48:28.000000000 +0100
> @@ -578,6 +578,9 @@ enum convert_optab_index
>   COI_satfract,
>   COI_satfractuns,
>
> +  COI_vec_load_lanes,
> +  COI_vec_store_lanes,
> +

Um, they are not really conversion optabs.  Any reason they can't
use the direct_optab table and path?  What are the two modes
usually?  I don't see how you specify the kind of permutation that
is performed on the load - so, why not go the targetm.expand_builtin
path instead (well, targetm.expand_internal_fn, of course - or rather
targetm.expand_gimple_call which we need anyway for expanding
directly from gimple calls at some point).

>   COI_MAX
>  };
>
> @@ -598,6 +601,8 @@ #define fract_optab (&convert_optab_tabl
>  #define fractuns_optab (&convert_optab_table[COI_fractuns])
>  #define satfract_optab (&convert_optab_table[COI_satfract])
>  #define satfractuns_optab (&convert_optab_table[COI_satfractuns])
> +#define vec_load_lanes_optab (&convert_optab_table[COI_vec_load_lanes])
> +#define vec_store_lanes_optab (&convert_optab_table[COI_vec_store_lanes])
>
>  /* Contains the optab used for each rtx code.  */
>  extern optab code_to_optab[NUM_RTX_CODE + 1];
> Index: gcc/genopinit.c
> ===================================================================
> --- gcc/genopinit.c     2011-04-12 12:16:46.000000000 +0100
> +++ gcc/genopinit.c     2011-04-12 14:48:28.000000000 +0100
> @@ -74,6 +74,8 @@ static const char * const optabs[] =
>   "set_convert_optab_handler (fractuns_optab, $B, $A, CODE_FOR_$(fractuns$Q$a$I$b2$))",
>   "set_convert_optab_handler (satfract_optab, $B, $A, CODE_FOR_$(satfract$a$Q$b2$))",
>   "set_convert_optab_handler (satfractuns_optab, $B, $A, CODE_FOR_$(satfractuns$I$a$Q$b2$))",
> +  "set_convert_optab_handler (vec_load_lanes_optab, $A, $B, CODE_FOR_$(vec_load_lanes$a$b$))",
> +  "set_convert_optab_handler (vec_store_lanes_optab, $A, $B, CODE_FOR_$(vec_store_lanes$a$b$))",
>   "set_optab_handler (add_optab, $A, CODE_FOR_$(add$P$a3$))",
>   "set_optab_handler (addv_optab, $A, CODE_FOR_$(add$F$a3$)),\n\
>     set_optab_handler (add_optab, $A, CODE_FOR_$(add$F$a3$))",
> Index: gcc/internal-fn.def
> ===================================================================
> --- gcc/internal-fn.def 2011-04-12 14:10:42.000000000 +0100
> +++ gcc/internal-fn.def 2011-04-12 14:48:28.000000000 +0100
> @@ -32,3 +32,6 @@ along with GCC; see the file COPYING3.
>
>    where NAME is the name of the function and FLAGS is a set of
>    ECF_* flags.  */
> +
> +DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF)
> +DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF)
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2011-04-12 14:10:42.000000000 +0100
> +++ gcc/internal-fn.c   2011-04-12 14:48:28.000000000 +0100
> @@ -41,6 +41,69 @@ #define DEF_INTERNAL_FN(CODE, FLAGS) FLA
>   0
>  };
>
> +/* ARRAY_TYPE is an array of vector modes.  Return the associated insn
> +   for load-lanes-style optab OPTAB.  The insn must exist.  */
> +
> +static enum insn_code
> +get_multi_vector_move (tree array_type, convert_optab optab)
> +{
> +  enum insn_code icode;
> +  enum machine_mode imode;
> +  enum machine_mode vmode;
> +
> +  gcc_assert (TREE_CODE (array_type) == ARRAY_TYPE);
> +  imode = TYPE_MODE (array_type);
> +  vmode = TYPE_MODE (TREE_TYPE (array_type));
> +
> +  icode = convert_optab_handler (optab, imode, vmode);
> +  gcc_assert (icode != CODE_FOR_nothing);
> +  return icode;
> +}
> +
> +/* Expand: LHS = LOAD_LANES (ARGS[0]).  */
> +
> +static void
> +expand_LOAD_LANES (tree lhs, tree *args)
> +{
> +  struct expand_operand ops[2];
> +  tree type;
> +  rtx target, mem;
> +
> +  type = TREE_TYPE (lhs);
> +
> +  target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  mem = expand_normal (args[0]);
> +
> +  gcc_assert (MEM_P (mem));
> +  PUT_MODE (mem, TYPE_MODE (type));
> +
> +  create_output_operand (&ops[0], target, TYPE_MODE (type));
> +  create_fixed_operand (&ops[1], mem);
> +  expand_insn (get_multi_vector_move (type, vec_load_lanes_optab), 2, ops);
> +}
> +
> +/* Expand: LHS = STORE_LANES (ARGS[0]).  */
> +
> +static void
> +expand_STORE_LANES (tree lhs, tree *args)
> +{
> +  struct expand_operand ops[2];
> +  tree type;
> +  rtx target, rhs;
> +
> +  type = TREE_TYPE (args[0]);
> +
> +  target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> +  rhs = expand_normal (args[0]);
> +
> +  gcc_assert (MEM_P (target));
> +  PUT_MODE (target, TYPE_MODE (type));
> +
> +  create_fixed_operand (&ops[0], target);
> +  create_input_operand (&ops[1], rhs, TYPE_MODE (type));
> +  expand_insn (get_multi_vector_move (type, vec_store_lanes_optab), 2, ops);
> +}
> +
>  /* Routines to expand each internal function, indexed by function number.
>    Each routine has the prototype:
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2011-04-12 12:16:46.000000000 +0100
> +++ gcc/tree.h  2011-04-12 14:48:28.000000000 +0100
> @@ -4198,6 +4198,7 @@ extern tree build_type_no_quals (tree);
>  extern tree build_index_type (tree);
>  extern tree build_array_type (tree, tree);
>  extern tree build_nonshared_array_type (tree, tree);
> +extern tree build_simple_array_type (tree, unsigned HOST_WIDE_INT);
>  extern tree build_function_type (tree, tree);
>  extern tree build_function_type_list (tree, ...);
>  extern tree build_function_type_skip_args (tree, bitmap);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2011-04-12 12:16:46.000000000 +0100
> +++ gcc/tree.c  2011-04-12 14:48:28.000000000 +0100
> @@ -7385,6 +7385,15 @@ build_nonshared_array_type (tree elt_typ
>   return build_array_type_1 (elt_type, index_type, false);
>  }
>
> +/* Return a representation of ELT_TYPE[NELTS], using indices of type
> +   sizetype.  */
> +
> +tree
> +build_simple_array_type (tree elt_type, unsigned HOST_WIDE_INT nelts)

build_array_type_nelts

The rest looks ok to me.

Richard.


* Re: [5/9] Main target-independent support for direct interleaving
  2011-04-18 11:54   ` Richard Guenther
@ 2011-04-18 11:57     ` Richard Sandiford
  2011-04-18 12:54       ` Richard Guenther
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-18 11:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

Richard Guenther <richard.guenther@gmail.com> writes:
> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds vec_load_lanes and vec_store_lanes optabs for instructions
>> like NEON's vldN and vstN.  The optabs are defined this way because the
>> vectors must be allocated to a block of consecutive registers.
>>
>> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>>        * doc/md.texi (vec_load_lanes, vec_store_lanes): Document.
>>        * optabs.h (COI_vec_load_lanes, COI_vec_store_lanes): New
>>        convert_optab_index values.
>>        (vec_load_lanes_optab, vec_store_lanes_optab): New convert optabs.
>>        * genopinit.c (optabs): Initialize the new optabs.
>>        * internal-fn.def (LOAD_LANES, STORE_LANES): New internal functions.
>>        * internal-fn.c (get_multi_vector_move, expand_LOAD_LANES)
>>        (expand_STORE_LANES): New functions.
>>        * tree.h (build_simple_array_type): Declare.
>>        * tree.c (build_simple_array_type): New function.
>>        * tree-vectorizer.h (vect_model_store_cost): Add a bool argument.
>>        (vect_model_load_cost): Likewise.
>>        (vect_store_lanes_supported, vect_load_lanes_supported)
>>        (vect_record_strided_load_vectors): Declare.
>>        * tree-vect-data-refs.c (vect_lanes_optab_supported_p)
>>        (vect_store_lanes_supported, vect_load_lanes_supported): New functions.
>>        (vect_transform_strided_load): Split out statement recording into...
>>        (vect_record_strided_load_vectors): ...this new function.
>>        * tree-vect-stmts.c (create_vector_array, read_vector_array)
>>        (write_vector_array, create_array_ref): New functions.
>>        (vect_model_store_cost): Add store_lanes_p argument.
>>        (vect_model_load_cost): Add load_lanes_p argument.
>>        (vectorizable_store): Try to use store-lanes functions for
>>        interleaved stores.
>>        (vectorizable_load): Likewise load-lanes and loads.
>>        * tree-vect-slp.c (vect_get_and_check_slp_defs)
>>        (vect_build_slp_tree):
>>
>> Index: gcc/doc/md.texi
>> ===================================================================
>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>  consecutive memory locations, operand 1 is the first register, and
>>  operand 2 is a constant: the number of consecutive registers.
>>
>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>> +Perform an interleaved load of several vectors from memory operand 1
>> +into register operand 0.  Both operands have mode @var{m}.  The register
>> +operand is viewed as holding consecutive vectors of mode @var{n},
>> +while the memory operand is a flat array that contains the same number
>> +of elements.  The operation is equivalent to:
>> +
>> +@smallexample
>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>> +  for (i = 0; i < c; i++)
>> +    operand0[i][j] = operand1[j * c + i];
>> +@end smallexample
>> +
>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>> +from memory into a register of mode @samp{TI}@.  The register
>> +contains two consecutive vectors of mode @samp{V4HI}@.
>
> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
> such operation would have adjacent blocks of siv2qi memory.  But
> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
> redundant?  You could specify that we load NUNITS adjacent vectors into
> an integer mode of appropriate size.

Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
8 consecutive V2QI registers.  The first element from register vector I
would come from operand1[I] and the second element would come from
operand1[I + 8].  That's meant to be a valid combination.
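
To make that mapping concrete, here is a minimal standalone sketch of the
documented loop.  It is only a model, not code from the patch: the name
model_load_lanes is made up for illustration, elements are assumed to be
single bytes (as in the QImode case), and the register operand is modelled
as a flat array in which vector I occupies elements I * NUNITS onwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the loop in the md.texi text; not GCC code.
   REGS models operand 0: C vectors of NUNITS byte elements each, stored
   back to back, so REGS[i * NUNITS + j] is element J of vector I.
   MEM models operand 1: a flat array of C * NUNITS byte elements.  */
void
model_load_lanes (unsigned char *regs, const unsigned char *mem,
                  size_t c, size_t nunits)
{
  assert (c > 0 && nunits > 0);
  for (size_t j = 0; j < nunits; j++)
    for (size_t i = 0; i < c; i++)
      regs[i * nunits + j] = mem[j * c + i];
}
```

With c = 8 and nunits = 2 (the TI/V2QI combination), element 0 of vector I
is read from mem[I] and element 1 from mem[I + 8], as described above.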

We specifically want to allow:

  GET_MODE_SIZE (@var{m})
    != GET_MODE_SIZE (@var{n}) * GET_MODE_NUNITS (@var{n})

The vec_load_lanestiv4hi example in the docs is one case of this:

  GET_MODE_SIZE (@var{m}) = 16
  GET_MODE_SIZE (@var{n}) = 8
  GET_MODE_NUNITS (@var{n}) = 4

That example maps directly to ARM's vld2.16, since V4HI has 16-bit
elements.  We also want cases where @var{m} is three times the size of
@var{n} (vld3.WW) and cases where @var{m} is four times the size of
@var{n} (vld4.WW).
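
The TI/V4HI case can be sketched the same way.  This is an illustrative
model only: the function names and the M_SIZE/N_SIZE/N_NUNITS constants are
made up here, and the sizes are simply the ones quoted above.  It shows the
two-vector load feeding the loop from the cover letter,
res[i] = a[2 * i] + a[2 * i + 1].

```c
#include <assert.h>
#include <stdint.h>

/* Sizes from the TI/V4HI example: GET_MODE_SIZE (m) = 16,
   GET_MODE_SIZE (n) = 8, GET_MODE_NUNITS (n) = 4.  */
enum { M_SIZE = 16, N_SIZE = 8, N_NUNITS = 4, C = M_SIZE / N_SIZE };

/* Hypothetical model of vec_load_lanestiv4hi; not GCC code.
   REGS[i][j] is element J of vector I, MEM is the flat array of
   C * N_NUNITS 16-bit elements.  */
void
model_load_lanes_tiv4hi (uint16_t regs[C][N_NUNITS],
                         const uint16_t mem[C * N_NUNITS])
{
  for (int j = 0; j < N_NUNITS; j++)
    for (int i = 0; i < C; i++)
      regs[i][j] = mem[j * C + i];
}

/* Compute res[j] = a[2 * j] + a[2 * j + 1] for four results using one
   modelled load-lanes operation followed by a lane-wise add.  */
void
pairwise_sum4 (uint16_t res[N_NUNITS], const uint16_t a[C * N_NUNITS])
{
  uint16_t regs[C][N_NUNITS];

  model_load_lanes_tiv4hi (regs, a);
  for (int j = 0; j < N_NUNITS; j++)
    res[j] = (uint16_t) (regs[0][j] + regs[1][j]);
}
```

Vector 0 ends up holding the even-indexed elements and vector 1 the
odd-indexed ones, so no separate permute is needed before the add.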

>> +/* Return a representation of ELT_TYPE[NELTS], using indices of type
>> +   sizetype.  */
>> +
>> +tree
>> +build_simple_array_type (tree elt_type, unsigned HOST_WIDE_INT nelts)
>
> build_array_type_nelts

OK.

Richard


* Re: [5/9] Main target-independent support for direct interleaving
  2011-04-18 11:57     ` Richard Sandiford
@ 2011-04-18 12:54       ` Richard Guenther
  2011-04-18 12:58         ` Richard Sandiford
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Guenther @ 2011-04-18 12:54 UTC (permalink / raw)
  To: Richard Guenther, gcc-patches, patches, richard.sandiford

On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds vec_load_lanes and vec_store_lanes optabs for instructions
>>> like NEON's vldN and vstN.  The optabs are defined this way because the
>>> vectors must be allocated to a block of consecutive registers.
>>>
>>> Tested on x86_64-linux-gnu and arm-linux-gnueabi.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>>        * doc/md.texi (vec_load_lanes, vec_store_lanes): Document.
>>>        * optabs.h (COI_vec_load_lanes, COI_vec_store_lanes): New
>>>        convert_optab_index values.
>>>        (vec_load_lanes_optab, vec_store_lanes_optab): New convert optabs.
>>>        * genopinit.c (optabs): Initialize the new optabs.
>>>        * internal-fn.def (LOAD_LANES, STORE_LANES): New internal functions.
>>>        * internal-fn.c (get_multi_vector_move, expand_LOAD_LANES)
>>>        (expand_STORE_LANES): New functions.
>>>        * tree.h (build_simple_array_type): Declare.
>>>        * tree.c (build_simple_array_type): New function.
>>>        * tree-vectorizer.h (vect_model_store_cost): Add a bool argument.
>>>        (vect_model_load_cost): Likewise.
>>>        (vect_store_lanes_supported, vect_load_lanes_supported)
>>>        (vect_record_strided_load_vectors): Declare.
>>>        * tree-vect-data-refs.c (vect_lanes_optab_supported_p)
>>>        (vect_store_lanes_supported, vect_load_lanes_supported): New functions.
>>>        (vect_transform_strided_load): Split out statement recording into...
>>>        (vect_record_strided_load_vectors): ...this new function.
>>>        * tree-vect-stmts.c (create_vector_array, read_vector_array)
>>>        (write_vector_array, create_array_ref): New functions.
>>>        (vect_model_store_cost): Add store_lanes_p argument.
>>>        (vect_model_load_cost): Add load_lanes_p argument.
>>>        (vectorizable_store): Try to use store-lanes functions for
>>>        interleaved stores.
>>>        (vectorizable_load): Likewise load-lanes and loads.
>>>        * tree-vect-slp.c (vect_get_and_check_slp_defs)
>>>        (vect_build_slp_tree):
>>>
>>> Index: gcc/doc/md.texi
>>> ===================================================================
>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>  consecutive memory locations, operand 1 is the first register, and
>>>  operand 2 is a constant: the number of consecutive registers.
>>>
>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>> +Perform an interleaved load of several vectors from memory operand 1
>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>> +while the memory operand is a flat array that contains the same number
>>> +of elements.  The operation is equivalent to:
>>> +
>>> +@smallexample
>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>> +  for (i = 0; i < c; i++)
>>> +    operand0[i][j] = operand1[j * c + i];
>>> +@end smallexample
>>> +
>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>> +from memory into a register of mode @samp{TI}@.  The register
>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>
>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>> such operation would have adjacent blocks of siv2qi memory.  But
>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>> redundant?  You could specify that we load NUNITS adjacent vectors into
>> an integer mode of appropriate size.
>
> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
> 8 consecutive V2QI registers.  The first element from register vector I
> would come from operand1[I] and the second element would come from
> operand1[I + 8].  That's meant to be a valid combination.

Ok, but the C loop from the example doesn't seem to match.  Or I couldn't
wrap my head around it despite looking for 5 minutes and already having
coffee ;)  I would have expected the vectors being in memory as

  v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...

not

  v0[0], v1[0], v2[0], ...

as I would have thought the former is more useful (simple unrolling for
stride 2).  We'd need a separate set of optabs for such an interleaving
scheme?  In which case we might want to come up with a more
specific name than load_lanes?

> We specifically want to allow:
>
>  GET_MODE_SIZE (@var{m})
>    != GET_MODE_SIZE (@var{n}) * GET_MODE_NUNITS (@var{n})
>
> The vec_load_lanestiv4hi example in the docs is one case of this:
>
>  GET_MODE_SIZE (@var{m}) = 16
>  GET_MODE_SIZE (@var{n}) = 8
>  GET_MODE_NUNITS (@var{n}) = 4
>
> That example maps directly to ARM's vld2.16.  We also want cases
> where @var{m} is three times the size of @var{n} (vld3.WW) and
> cases where @var{m} is four times the size of @var{n} (vld4.WW)
>
>>> +/* Return a representation of ELT_TYPE[NELTS], using indices of type
>>> +   sizetype.  */
>>> +
>>> +tree
>>> +build_simple_array_type (tree elt_type, unsigned HOST_WIDE_INT nelts)
>>
>> build_array_type_nelts
>
> OK.
>
> Richard
>


* Re: [5/9] Main target-independent support for direct interleaving
  2011-04-18 12:54       ` Richard Guenther
@ 2011-04-18 12:58         ` Richard Sandiford
  2011-04-18 13:22           ` Richard Guenther
  0 siblings, 1 reply; 27+ messages in thread
From: Richard Sandiford @ 2011-04-18 12:58 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, patches

Richard Guenther <richard.guenther@gmail.com> writes:
> On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Guenther <richard.guenther@gmail.com> writes:
>>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Index: gcc/doc/md.texi
>>>> ===================================================================
>>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>>  consecutive memory locations, operand 1 is the first register, and
>>>>  operand 2 is a constant: the number of consecutive registers.
>>>>
>>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>>> +Perform an interleaved load of several vectors from memory operand 1
>>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>>> +while the memory operand is a flat array that contains the same number
>>>> +of elements.  The operation is equivalent to:
>>>> +
>>>> +@smallexample
>>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>>> +  for (i = 0; i < c; i++)
>>>> +    operand0[i][j] = operand1[j * c + i];
>>>> +@end smallexample
>>>> +
>>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>>> +from memory into a register of mode @samp{TI}@.  The register
>>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>>
>>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>>> such operation would have adjacent blocks of siv2qi memory.  But
>>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>>> redundant?  You could specify that we load NUNITS adjacent vectors into
>>> an integer mode of appropriate size.
>>
>> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
>> 8 consecutive V2QI registers.  The first element from register vector I
>> would come from operand1[I] and the second element would come from
>> operand1[I + 8].  That's meant to be a valid combination.
>
> Ok, but the C loop from the example doesn't seem to match.  Or I couldn't
> wrap my head around it despite looking for 5 minutes and already having
> coffee ;)  I would have expected the vectors being in memory as
>
>   v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...
>
> not
>
>   v0[0], v1[0], v2[0], ...
>
> as I would have thought the former is more useful (simple unrolling for
> stride 2).

The second one's right.  All lane 0 elements, followed by all lane 1
elements, etc.  I think that's what the C loop says.
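
One way to check the layout is to invert the loop's index expression.  The
sketch below is a hypothetical helper (the struct and function names are
invented for illustration): since the loop reads operand1[j * c + i] with
i < c, flat memory index k belongs to vector k % c, lane k / c.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: identifies one element of the
   register operand.  */
struct lane_ref
{
  unsigned int vec;   /* which vector, the N in "vN" */
  unsigned int lane;  /* which element of that vector */
};

/* Return the vector and lane that flat memory index K is loaded into
   when the register operand holds C vectors.  This inverts
   k = lane * c + vec from the documented loop.  */
struct lane_ref
mem_index_to_lane (unsigned int k, unsigned int c)
{
  struct lane_ref ref;

  assert (c > 0);
  ref.vec = k % c;
  ref.lane = k / c;
  return ref;
}
```

For c = 8, indices 0, 1, 2 map to v0[0], v1[0], v2[0], and index 8 is the
first lane 1 element, v0[1].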

> We'd need a separate set of optabs for such an interleaving
> scheme?  In which case we might want to come up with a more
> specific name than load_lane?

Yeah, if someone has a single instruction that does your first example,
then it would need a new optab.  The individual vector pairs could be
represented using the current optab though, if each pair needs a
separate instruction.  E.g. with your v2qi example, vec_load_lanessiv2qi
would load:

   v0[0], v1[0], v0[1], v1[1]

and you could repeat for the others.  So load_lanes (as defined here)
could be treated as a primitive, and your first example could be something
like "repeat_load_lanes".
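
As a rough sketch of that idea (a model only; "repeat_load_lanes" is just
the name floated above, no such optab exists in this series, and the helper
names are invented), the first layout falls out of applying the SI/V2QI
primitive to successive 4-byte chunks:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of vec_load_lanessiv2qi; not GCC code.
   Loads two V2QI vectors from four bytes: c = 4 / 2 = 2, nunits = 2.  */
void
model_load_lanes_siv2qi (unsigned char regs[2][2],
                         const unsigned char mem[4])
{
  for (int j = 0; j < 2; j++)
    for (int i = 0; i < 2; i++)
      regs[i][j] = mem[j * 2 + i];
}

/* Hypothetical "repeat_load_lanes": apply the primitive above to NPAIRS
   consecutive 4-byte chunks, filling vectors 2 * p and 2 * p + 1 from
   chunk P.  REGS must have room for 2 * NPAIRS vectors.  */
void
model_repeat_load_lanes (unsigned char regs[][2], const unsigned char *mem,
                         size_t npairs)
{
  for (size_t p = 0; p < npairs; p++)
    model_load_lanes_siv2qi (&regs[2 * p], mem + 4 * p);
}
```

With npairs = 4 this consumes 16 bytes in the order v0[0], v1[0], v0[1],
v1[1], v2[0], v3[0], v2[1], v3[1], and so on.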

If you don't like the name "load_lanes" though, I'm happy to use
something else.

Richard


* Re: [5/9] Main target-independent support for direct interleaving
  2011-04-18 12:58         ` Richard Sandiford
@ 2011-04-18 13:22           ` Richard Guenther
  0 siblings, 0 replies; 27+ messages in thread
From: Richard Guenther @ 2011-04-18 13:22 UTC (permalink / raw)
  To: Richard Guenther, gcc-patches, patches, richard.sandiford

On Mon, Apr 18, 2011 at 2:19 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> On Mon, Apr 18, 2011 at 1:24 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Guenther <richard.guenther@gmail.com> writes:
>>>> On Tue, Apr 12, 2011 at 3:59 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> Index: gcc/doc/md.texi
>>>>> ===================================================================
>>>>> --- gcc/doc/md.texi     2011-04-12 12:16:46.000000000 +0100
>>>>> +++ gcc/doc/md.texi     2011-04-12 14:48:28.000000000 +0100
>>>>> @@ -3846,6 +3846,48 @@ into consecutive memory locations.  Oper
>>>>>  consecutive memory locations, operand 1 is the first register, and
>>>>>  operand 2 is a constant: the number of consecutive registers.
>>>>>
>>>>> +@cindex @code{vec_load_lanes@var{m}@var{n}} instruction pattern
>>>>> +@item @samp{vec_load_lanes@var{m}@var{n}}
>>>>> +Perform an interleaved load of several vectors from memory operand 1
>>>>> +into register operand 0.  Both operands have mode @var{m}.  The register
>>>>> +operand is viewed as holding consecutive vectors of mode @var{n},
>>>>> +while the memory operand is a flat array that contains the same number
>>>>> +of elements.  The operation is equivalent to:
>>>>> +
>>>>> +@smallexample
>>>>> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
>>>>> +for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>>>>> +  for (i = 0; i < c; i++)
>>>>> +    operand0[i][j] = operand1[j * c + i];
>>>>> +@end smallexample
>>>>> +
>>>>> +For example, @samp{vec_load_lanestiv4hi} loads 8 16-bit values
>>>>> +from memory into a register of mode @samp{TI}@.  The register
>>>>> +contains two consecutive vectors of mode @samp{V4HI}@.
>>>>
>>>> So vec_load_lanestiv2qi would load ... ?  c == 8 here.  Intuitively
>>>> such operation would have adjacent blocks of siv2qi memory.  But
>>>> maybe you want to constrain the mode size to GET_MODE_SIZE (@var{n})
>>>> * GET_MODE_NUNITS (@var{n})?  In which case the mode m is
>>>> redundant?  You could specify that we load NUNITS adjacent vectors into
>>>> an integer mode of appropriate size.
>>>
>>> Like you say, vec_load_lanestiv2qi would load 16 QImode elements into
>>> 8 consecutive V2QI registers.  The first element from register vector I
>>> would come from operand1[I] and the second element would come from
>>> operand1[I + 8].  That's meant to be a valid combination.
>>
>> Ok, but the C loop from the example doesn't seem to match.  Or I couldn't
>> wrap my head around it despite looking for 5 minutes and already having
>> coffee ;)  I would have expected the vectors being in memory as
>>
>>   v0[0], v1[0], v0[1], v1[1], v2[0], v3[0], v2[1], v3[1], ...
>>
>> not
>>
>>   v0[0], v1[0], v2[0], ...
>>
>> as I would have thought the former is more useful (simple unrolling for
>> stride 2).
>
> The second one's right.  All lane 0 elements, followed by all lane 1
> elements, etc.  I think that's what the C loop says.
>
>> We'd need a separate set of optabs for such an interleaving
>> scheme?  In which case we might want to come up with a more
>> specific name than load_lane?
>
> Yeah, if someone has a single instruction that does your first example,
> then it would need a new optab.  The individual vector pairs could be
> represented using the current optab though, if each pair needs a
> separate instruction.  E.g. with your v2qi example, vec_load_lanessiv2qi
> would load:
>
>   v0[0], v1[0], v0[1], v1[1]
>
> and you could repeat for the others.  So load_lanes (as defined here)
> could be treated as a primitive, and your first example could be something
> like "repeat_load_lanes".
>
> If you don't like the name "load_lanes" though, I'm happy to use
> something else.

Ah, no - repeat_load_lanes sounds a good name for the new optab if
we need it at any point.

Richard.

> Richard
>


end of thread

Thread overview: 27+ messages
2011-04-12 13:21 [0/9] Direct support for loads and stores of interleaved vectors Richard Sandiford
2011-04-12 13:25 ` [1/9] Generalise vect_create_data_ref_ptr Richard Sandiford
2011-04-12 13:30   ` Richard Guenther
2011-04-12 13:28 ` [2/9] Reindent parts of vectorizable_load and vectorizable_store Richard Sandiford
2011-04-12 13:33   ` Richard Guenther
2011-04-12 14:39     ` Richard Sandiford
2011-04-12 13:40 ` [3/9] STMT_VINFO_RELATED_STMT handling in vectorizable_store Richard Sandiford
2011-04-17 10:25   ` Ira Rosen
2011-04-12 13:44 ` [4/9] Move power-of-two checks for interleaving Richard Sandiford
2011-04-12 13:57   ` Richard Guenther
2011-04-12 13:59 ` [5/9] Main target-independent support for direct interleaving Richard Sandiford
2011-04-17 14:26   ` Ira Rosen
2011-04-18 11:54   ` Richard Guenther
2011-04-18 11:57     ` Richard Sandiford
2011-04-18 12:54       ` Richard Guenther
2011-04-18 12:58         ` Richard Sandiford
2011-04-18 13:22           ` Richard Guenther
2011-04-12 14:01 ` [6/9] NEON vec_load_lanes and vec_store_lanes patterns Richard Sandiford
2011-04-15 13:20   ` Richard Earnshaw
2011-04-12 14:14 ` [7/9] Testsuite: remove vect_{extract_even_odd,strided}_wide Richard Sandiford
2011-04-15 12:43   ` Richard Guenther
2011-04-12 14:19 ` [8/9] Testsuite: split tests for strided accesses Richard Sandiford
2011-04-15 12:44   ` Richard Guenther
2011-04-12 14:29 ` [9/9] Testsuite: Replace vect_strided with vect_stridedN Richard Sandiford
2011-04-15 12:44   ` Richard Guenther
2011-04-12 14:34 ` [10/9] Add tests for stride-3 accesses Richard Sandiford
2011-04-15 12:45   ` Richard Guenther
