public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures)
@ 2007-04-05 17:41 Doug Gregor
  2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
  2007-04-12  3:37 ` [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures) Mark Mitchell
  0 siblings, 2 replies; 30+ messages in thread
From: Doug Gregor @ 2007-04-05 17:41 UTC (permalink / raw)
  To: GCC Patches

This patch fixes two PRs related to failures in the canonical types
system, both with the same cause. In the original canonical types
patch, I failed to update TYPE_CANONICAL in c_build_qualified_type.
Thus, the canonical type system sometimes considered two array types
identical when their elements types actually had different cv
qualifiers. This didn't cause a huge number of failures because the
C++ front end only directly calls cp_build_qualified_type and
build_array_type, both of which already set canonical types properly.
PR 31078 got to c_build_qualified_type through fix_string_type, for
example, so the issue shows up with the type of a string literal.

This clears up the two PRs listed. Regtesting on
i386-apple-darwin8.8.1. Assuming no regressions, okay to commit to
mainline?

:ADDPATCH c/c++:

  - Doug

2007-04-05  Douglas Gregor  <doug.gregor@gmail.com>

	PR c++/31078
	PR c++/31103
	* c-common.c (c_build_qualified_type): Set canonical type
	appropriately.

2007-04-05  Douglas Gregor  <doug.gregor@gmail.com>

	* g++.dg/other/pr31078.C: New.

Index: testsuite/g++.dg/other/pr31078.C
===================================================================
--- testsuite/g++.dg/other/pr31078.C	(revision 0)
+++ testsuite/g++.dg/other/pr31078.C	(revision 0)
@@ -0,0 +1,31 @@
+typedef int SLONG;
+typedef char SCHAR;
+typedef short SSHORT;
+typedef char TEXT;
+typedef long ISC_STATUS;
+const SLONG gds_arg_string = 2;
+const SLONG gds_sys_request = 335544373L;
+enum jrd_blk_t
+{
+    type_str, type_dcc, type_sbm, type_smb, type_blb, type_irb, type_jrn,
+};
+struct blk
+{
+};
+template < class RPT, SSHORT BLOCK_TYPE = 0 > class pool_alloc_rpt:public blk
+{
+};
+class jrn:public pool_alloc_rpt < SCHAR, type_jrn >
+{
+public:ISC_STATUS * jrn_status_vector;
+  TEXT jrn_server[1];
+};
+typedef jrn *JRN;
+extern void IBERR_build_status (ISC_STATUS *, ISC_STATUS, ...);
+static void
+error (ISC_STATUS * status_vector, JRN journal, int status, TEXT * string)
+{
+  IBERR_build_status (status_vector, gds_sys_request, gds_arg_string, string,
+		      gds_arg_string, (journal) ? journal->jrn_server : "",
+		      0);
+}
Index: c-common.c
===================================================================
--- c-common.c	(revision 123513)
+++ c-common.c	(working copy)
@@ -2894,8 +2894,26 @@ c_build_qualified_type (tree type, int t
 	}
       if (!t)
 	{
+          tree domain = TYPE_DOMAIN (type);
+
 	  t = build_variant_type_copy (type);
 	  TREE_TYPE (t) = element_type;
+
+          if (TYPE_STRUCTURAL_EQUALITY_P (element_type)
+              || (domain && TYPE_STRUCTURAL_EQUALITY_P (domain)))
+            SET_TYPE_STRUCTURAL_EQUALITY (t);
+          else if (TYPE_CANONICAL (element_type) != element_type
+                   || (domain && TYPE_CANONICAL (domain) != domain))
+            {
+              tree unqualified_canon
+                = build_array_type (TYPE_CANONICAL (element_type),
+                                    domain? TYPE_CANONICAL (domain)
+                                          : NULL_TREE);
+              TYPE_CANONICAL (t)
+                = c_build_qualified_type (unqualified_canon, type_quals);
+            }
+          else
+            TYPE_CANONICAL (t) = t;
 	}
       return t;
     }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch] Vectorizer cost model implementation
  2007-04-05 17:41 [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures) Doug Gregor
@ 2007-04-05 19:42 ` Linthicum, Tony
  2007-04-05 19:54   ` Steven Bosscher
                     ` (3 more replies)
  2007-04-12  3:37 ` [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures) Mark Mitchell
  1 sibling, 4 replies; 30+ messages in thread
From: Linthicum, Tony @ 2007-04-05 19:42 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 341 bytes --]

Hello All,

This patch contains the initial implementation for a cost model for the
vectorizer.  The patch is currently missing loop versioning for
dynamically trip counted loops and some cost model specific tests.  My
colleague, Harsha Jagasia, will be providing follow on patches for those
things shortly.

Thank you.

Tony



[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 19331 bytes --]

Index: gcc/testsuite/gcc.dg/vect/vect.exp
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect.exp	(revision 123477)
+++ gcc/testsuite/gcc.dg/vect/vect.exp	(working copy)
@@ -23,7 +23,7 @@
 set DEFAULT_VECTCFLAGS ""
 
 # These flags are used for all targets.
-lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
+lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fno-vect-cost-model"
 
 # If the target system supports vector instructions, the default action
 # for a test is 'run', otherwise it's 'compile'.  Save current default.
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	(revision 123477)
+++ gcc/tree-vectorizer.c	(working copy)
@@ -1367,6 +1367,9 @@
   else
     STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
+  STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_MEM_COUNT_FOR_COST (res) = 0;
   DR_GROUP_FIRST_DR (res) = NULL_TREE;
   DR_GROUP_NEXT_DR (res) = NULL_TREE;
   DR_GROUP_SIZE (res) = 0;
@@ -2216,6 +2219,7 @@
      only over initial loops skipping newly generated ones.  */
   FOR_EACH_LOOP (li, loop, 0)
     {
+      int required_iters;
       loop_vec_info loop_vinfo;
 
       vect_loop_location = find_loop_location (loop);
@@ -2225,7 +2229,25 @@
       if (!loop_vinfo || !LOOP_VINFO_VECTORIZABLE_P (loop_vinfo))
 	continue;
 
+      required_iters = vect_model_required_iters (loop_vinfo);
+      if (required_iters < 0)
+	{
+	  if (vect_print_dump_info (REPORT_DETAILS))
+	    fprintf (vect_dump, "Vector body less profitable than scalar.\n");
+	  continue;
+	}
+
+      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) &&
+	  LOOP_VINFO_INT_NITERS (loop_vinfo) < (unsigned) required_iters)
+	    {
+	        if (vect_print_dump_info (REPORT_DETAILS))
+		  fprintf (vect_dump,
+			   "Not vectorized: min iters: %d, actual: %lu\n",
+			   required_iters, LOOP_VINFO_INT_NITERS (loop_vinfo));
+		continue;
+	    }
       vect_transform_loop (loop_vinfo);
+
       num_vectorized_loops++;
     }
   vect_loop_location = UNKNOWN_LOC;
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	(revision 123477)
+++ gcc/tree-vectorizer.h	(working copy)
@@ -264,6 +264,14 @@
   /* For loads only, if there is a store with the same location, this field is
      TRUE.  */
   bool read_write_dep;
+
+  /* Vectorization costs associated with statement */
+  struct  
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+    int mem_count;           /* Strided load/store count for cost analysis */
+  } cost;
 } *stmt_vec_info;
 
 /* Access Functions.  */
@@ -296,7 +304,19 @@
 #define DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
 
 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_loop)
+#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
+#define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
+#define STMT_VINFO_MEM_COUNT_FOR_COST(S)   (S)->cost.mem_count
 
+/* These are some defines for the initial implementation of the vectorizer's
+   cost model.  These will later be target specific hooks.  */
+#define TARG_COND_BRANCH_COST        3
+#define TARG_VEC_STMT_COST           1
+#define TARG_VEC_TO_SCALAR_COST      1
+#define TARG_VEC_LOAD_COST           1
+#define TARG_VEC_UNALIGNED_LOAD_COST 2
+#define TARG_VEC_STORE_COST          1
+
 static inline void set_stmt_info (stmt_ann_t ann, stmt_vec_info stmt_info);
 static inline stmt_vec_info vinfo_for_stmt (tree stmt);
 
@@ -428,6 +448,7 @@
 extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
+extern int  vect_model_required_iters (loop_vec_info);
 /* Driver for transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 123477)
+++ gcc/common.opt	(working copy)
@@ -1101,6 +1101,10 @@
 Common Report Var(flag_tree_vectorize) Optimization
 Enable loop vectorization on trees
 
+fno-vect-cost-model
+Common Report Var(flag_no_vect_cost_model) Optimization
+Disable use of cost model in vectorization
+
 ftree-vect-loop-version
 Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
 Enable loop versioning when doing loop vectorization on trees
Index: gcc/tree-vect-transform.c
===================================================================
--- gcc/tree-vect-transform.c	(revision 123477)
+++ gcc/tree-vect-transform.c	(working copy)
@@ -74,6 +74,349 @@
 static int vect_min_worthwhile_factor (enum tree_code);
 
 
+/* Function vect_model_required_iters
+
+   Return the number of iterations required for the vector version of the
+   loop to be profitable relative to the cost of the scalar version of the
+   loop.  */
+
+int
+vect_model_required_iters (loop_vec_info loop_vinfo)
+{
+  int i;
+  int min_iters;
+  div_t overhead;
+  int peel_iters;
+  int vec_inside_cost = 0;
+  int vec_outside_cost = 0;
+  int scalar_single_iter_cost = 0;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+  int nbbs = loop->num_nodes;
+
+  /* Cost model disabled.  */
+  if (flag_no_vect_cost_model)
+    return 0;
+
+  /* Requires a loop to finish up remaining iterations after
+     vector loop.  Add additional cost for the test. */
+  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+    vec_outside_cost += TARG_COND_BRANCH_COST;
+
+  /* Requires loop versioning tests.  */
+  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
+    vec_outside_cost += TARG_COND_BRANCH_COST;
+
+  /* Requires guard code for peeling.  */
+  if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
+    vec_outside_cost += TARG_COND_BRANCH_COST;
+
+  /* Count statements in scalar loop.  Using this as scalar cost for a single
+     iteration for now.  */
+  for (i = 0; i < nbbs; i++)
+    {
+      block_stmt_iterator si;
+      basic_block bb = bbs[i];
+
+      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+	{
+	  tree stmt = bsi_stmt (si);
+          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+	  if (!STMT_VINFO_RELEVANT_P (stmt_info)
+	      && !STMT_VINFO_LIVE_P (stmt_info))
+	    continue;
+	  scalar_single_iter_cost++;
+	  vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
+	  vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
+	}
+    }
+
+  /* Add in costs of peeled instructions, if any.  */
+  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  vec_outside_cost += peel_iters * scalar_single_iter_cost;
+
+  /* Calculate number of iterations required to make the vector version 
+     profitable, relative to the loop bodies only.  */
+  if (scalar_single_iter_cost > vec_inside_cost)
+    min_iters = 1;
+  else
+    {
+      div_t min_tmp = div (vec_inside_cost, scalar_single_iter_cost);
+      min_iters = min_tmp.rem == 0 ? min_tmp.quot : min_tmp.quot + 1;
+    }
+
+  /* If the minimum number of iterations required for loop body profitability
+     is greater than the vectorization factor, then the vector version can
+     never be profitable.  */
+  if (min_iters > vf)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+	fprintf (vect_dump, "Cost: min iterations %d greater than VF %d\n",
+		 min_iters, vf);
+      return -1;
+    }
+
+  else
+    {
+      /* Now we've determined the minimum number of iterations required for the
+	 body of the vector loop to be profitable.  We must now determine how 
+	 many more iterations are required to overcome the additional overhead
+	 that the vector loop requires, if any.  The overhead is contained in 
+	 the "outside of loop" costs gathered above.  */
+      if ((min_iters * scalar_single_iter_cost) <
+	  (vec_inside_cost + vec_outside_cost))
+	{
+	  overhead = div (vec_outside_cost, scalar_single_iter_cost);
+
+	  if (overhead.rem > 0)
+	    min_iters += (overhead.quot + 1);
+	  else
+	    min_iters += overhead.quot;
+	}
+    }
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Cost model analysis: \n");
+      fprintf (vect_dump, "  Vector inside of loop cost: %d\n",
+	       vec_inside_cost);
+      fprintf (vect_dump, "  Vector outside of loop cost: %d\n",
+	       vec_outside_cost);
+      fprintf (vect_dump, "  Scalar cost: %d\n", scalar_single_iter_cost);
+      fprintf (vect_dump, "  Calculated minimum iters for profitability: %d\n",
+	       min_iters);
+      fprintf (vect_dump, "  Actual minimum iters for profitability: %d\n",
+	       min_iters < vf ? vf : min_iters);
+    }
+
+  return min_iters < vf ? vf : min_iters;
+}
+
+    
+/* Function vect_model_reduction_cost.  
+
+   Models cost for a reduction operation, including the vector ops 
+   generated within the strip-mine loop, the initial definition before
+   the loop, and the epilog code that must be generated.  */
+
+static void
+vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
+			   int ncopies)
+{
+  int outer_cost = 0;
+  enum tree_code code;
+  optab optab;
+  tree vectype;
+  tree orig_stmt;
+  tree reduction_op;
+  enum machine_mode mode;
+  tree operation = GIMPLE_STMT_OPERAND (STMT_VINFO_STMT (stmt_info), 1);
+  int op_type = TREE_CODE_LENGTH (TREE_CODE (operation));
+
+  /* Cost of reduction op inside loop */
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) += ncopies * TARG_VEC_STMT_COST;
+
+  reduction_op = TREE_OPERAND (operation, op_type-1);
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
+  mode = TYPE_MODE (vectype);
+  orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
+  if (!orig_stmt) 
+    orig_stmt = STMT_VINFO_STMT (stmt_info);
+
+  code = TREE_CODE (GIMPLE_STMT_OPERAND (orig_stmt, 1));
+
+  /* Add in cost for intial definition.  */
+  outer_cost += TARG_VEC_STMT_COST;
+
+  /* Determine if the initial definition will be used to adjust the final
+     result in the epilog.  */
+#ifdef ADJUST_IN_EPILOG
+  if (code != MIN_EXPR || code != MAX_EXPR)
+    {
+      outer_cost += TARG_VEC_STMT_COST;
+    }
+#endif	
+
+  /* Determine cost of epilog code.  */
+  if (reduc_code < NUM_TREE_CODES) 
+    /* We have a reduction operator that will reduce the vector in one 
+       statement.  Also requires scalar extract.  */
+    outer_cost += 2 * TARG_VEC_STMT_COST;
+  else 
+    {
+      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+      tree bitsize =
+	TYPE_SIZE (TREE_TYPE ( GIMPLE_STMT_OPERAND (orig_stmt, 0)));
+      int element_bitsize = tree_low_cst (bitsize, 1);
+      int nelements = vec_size_in_bits / element_bitsize;
+
+      optab = optab_for_tree_code (code, vectype);
+
+      /* We have a whole vector shift available */
+      if (!VECTOR_MODE_P (mode) || optab->handlers[mode].insn_code ==
+	  CODE_FOR_nothing) 
+	{
+	  /* Final reduction via vector shifts and the reduction operator.  
+	     Also requires scalar extract.  */
+	  outer_cost += ((exact_log2(nelements) * 2 + 1) * TARG_VEC_STMT_COST);
+	} 
+      else
+	{
+	  /* Use extracts and reduction op for final reduction.  For N 
+	     elements, we have N extracts and N-1 reduction ops.  */
+	  outer_cost += ((nelements + nelements - 1) * TARG_VEC_STMT_COST);
+	}
+    }
+
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+}
+
+
+/* Function vect_model_simple_cost.  
+
+   Models cost for simple operations, i.e. those that only emit ncopies of a 
+   single op.  Right now, this does not account for multiple insns that could
+   be generated for the single vector op.  We will handle that shortly.  */
+
+static void
+vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+}
+
+/* Function vect_cost_strided_group_size 
+ 
+   For a load or store, returns the group_size if it's the last store, and 1
+   if it is not.  */
+static int
+vect_cost_strided_group_size (stmt_vec_info stmt_info)
+{
+  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+  stmt_vec_info first_info = vinfo_for_stmt (first_stmt);
+  int group_size = DR_GROUP_SIZE (first_info);
+  STMT_VINFO_MEM_COUNT_FOR_COST (first_info)++;
+    
+  /* Is it the last in the group?  */
+  if (STMT_VINFO_MEM_COUNT_FOR_COST (first_info) < group_size)
+    group_size = 1;
+
+  return group_size;
+}
+
+
+/* Function vect_model_store_cost
+
+   Models cost for stores.  In the case of strided accesses, the last access
+   has the overhead of the strided access attributed to it.  */
+
+static void
+vect_model_store_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  int cost = 0;
+  int group_size;
+
+  /* Strided access?  */
+
+  if (DR_GROUP_FIRST_DR (stmt_info)) 
+    {
+      group_size = vect_cost_strided_group_size (stmt_info);
+    }
+  else 
+    /* Not a strided access.  */
+    group_size = 1;
+
+  /* Is this the last access in a group of stores providing strided acess?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses a high and low interleave operation for each needed permute.  */
+      cost = ncopies * exact_log2(group_size) * group_size *
+	TARG_VEC_TO_SCALAR_COST;
+    }
+
+  /* Costs of the stores.  */
+  cost += ncopies * TARG_VEC_STORE_COST;
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+}
+
+/* Function vect_model_load_cost
+
+   Models cost for loads.  In the case of strided accesses, the last access
+   has the overhead of the strided access attributed to it.  Since unaligned
+   accesses are supported for loads, we also account for the costs of the 
+   access scheme chosen.  */
+
+static void
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies)
+		 
+{
+  int inner_cost = 0;
+  int group_size;
+  int alignment_support_cheme;
+  tree first_stmt;
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
+
+  /* Strided acesses? */
+  first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+  if (first_stmt)
+    {
+      group_size = vect_cost_strided_group_size (stmt_info);
+      first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
+    }
+  else
+    {
+      /* Not a strided access */
+      group_size = 1;
+      first_dr = dr;
+    }
+  alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
+
+  /* Is this the last access in a group of loads providing strided acess?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses a high and low interleave operation for each needed permute.  */
+      inner_cost = ncopies * exact_log2(group_size) * group_size *
+	TARG_VEC_STMT_COST;
+    }
+
+  /* The loads themselves */
+  switch (alignment_support_cheme)
+    {
+    case dr_aligned:
+      inner_cost += ncopies * TARG_VEC_LOAD_COST;
+      break;
+
+    case dr_unaligned_supported:
+      /* Here, we assign an additional cost for the unaligned load.  */
+      inner_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST;
+
+    case dr_unaligned_software_pipeline:
+      {
+	int outer_cost;
+
+	/* Unaligned software pipeline has a load of an address, an intial
+	   load, and possibly a mask operation to "prime" the loop.  Inside
+	   the loop, there is a load op and a realignment op.  */
+	outer_cost = 2 * TARG_VEC_STMT_COST;
+	if (targetm.vectorize.builtin_mask_for_load)
+	  outer_cost += TARG_VEC_STMT_COST;
+	STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+	
+	inner_cost += (2*ncopies) * TARG_VEC_LOAD_COST;
+	break;
+      }
+
+    default:
+      gcc_unreachable ();
+    }
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+}
+
 /* Function vect_get_new_vect_var.
 
    Returns a name for a new variable. The current naming scheme appends the 
@@ -1695,6 +2038,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+      vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies);
       return true;
     }
 
@@ -1889,9 +2233,13 @@
 
   gcc_assert (ZERO_SSA_OPERANDS (stmt, SSA_OP_ALL_VIRTUALS));
 
+  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	     / TYPE_VECTOR_SUBPARTS (vectype_out));
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -1900,8 +2248,6 @@
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "transform operation.");
 
-  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	     / TYPE_VECTOR_SUBPARTS (vectype_out));
   gcc_assert (ncopies >= 1);
 
   /* Handle def.  */
@@ -2158,6 +2504,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2357,6 +2704,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2581,6 +2929,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
                                                                                 
@@ -2795,6 +3144,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
+      vect_model_simple_cost (stmt_info, 2*ncopies);
       return true;
     }
 
@@ -3102,14 +3452,12 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
+      vect_model_store_cost (stmt_info, ncopies);
       return true;
     }
 
   /** Transform.  **/
 
-  if (vect_print_dump_info (REPORT_DETAILS))
-    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
-
   if (strided_store)
     {
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -3134,6 +3482,10 @@
       group_size = 1;
     }
   
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
+
   dr_chain = VEC_alloc (tree, heap, group_size);
   oprnds = VEC_alloc (tree, heap, group_size);
 
@@ -3764,14 +4116,14 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
+      vect_model_load_cost (stmt_info, ncopies);
       return true;
     }
 
-  /** Transform.  **/
-
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "transform load.");
 
+  /** Transform.  **/
   if (strided_load)
     {
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -3793,6 +4145,7 @@
     }
 
   alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
+
   gcc_assert (alignment_support_cheme);
 
 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
@ 2007-04-05 19:54   ` Steven Bosscher
  2007-04-05 20:06     ` Linthicum, Tony
  2007-04-05 21:35   ` Eric Christopher
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Steven Bosscher @ 2007-04-05 19:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Linthicum, Tony

On Thursday 05 April 2007 21:41, Linthicum, Tony wrote:
> Hello All,
>
> This patch contains the initial implementation for a cost model for the
> vectorizer.  The patch is currently missing loop versioning for
> dynamically trip counted loops and some cost model specific tests.  My
> colleague, Harsha Jagasia, will be providing follow on patches for those
> things shortly.

Is it your plan to have this accepted for the trunk or for a branch?

If so, where's the ChangeLog entry?  How did you test this?

Gr.
Steven

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-05 19:54   ` Steven Bosscher
@ 2007-04-05 20:06     ` Linthicum, Tony
  0 siblings, 0 replies; 30+ messages in thread
From: Linthicum, Tony @ 2007-04-05 20:06 UTC (permalink / raw)
  To: Steven Bosscher, gcc-patches

> -----Original Message-----
> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
> 
> Is it your plan to have this accepted for the trunk or for a branch?

The plan is to have it accepted to the trunk, I believe.

> 
> If so, where's the ChangeLog entry?  

Oops!  I forgot.  My apologies.

>
> How did you test this?
> 

I used the vect tests ... a number of them "fail" with the cost model
enabled, and the cost model is correct in its assessment.  I disabled
the cost model for the test suite for now so that those tests can
continue to function as intended.  I.e. test specific vectorization
capabilities.  Those tests will be made into cost model tests.
Additionally, more tests should be added as well that target the cost
model.

I know that these issues should have been addressed before the patch was
submitted and at the very least I should have done a better job of
explaining the situation in my submission.  So, I must again apologize.
I'm not going to be working on this project any longer, and it was
desired that I submit the patch under my name prior to leaving.  As I
said, though, my colleague Harsha will be addressing any shortcomings in
this patch and will be supporting it going forward.

Thanks for you comments.

Tony



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
  2007-04-05 19:54   ` Steven Bosscher
@ 2007-04-05 21:35   ` Eric Christopher
  2007-04-06  0:24   ` Sebastian Pop
  2007-04-08 14:37   ` Dorit Nuzman
  3 siblings, 0 replies; 30+ messages in thread
From: Eric Christopher @ 2007-04-05 21:35 UTC (permalink / raw)
  To: Linthicum, Tony; +Cc: GCC Patches

>

(thanks for doing this work!)

A few style things I noticed:

a)

+      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) &&
+	  LOOP_VINFO_INT_NITERS (loop_vinfo) < (unsigned) required_iters)

coding standard nit. The && needs to be on the next line. And instead  
of casting
to unsigned, why not just compare < since we already know that  
required_iters
is >= 0?

b)

+#ifdef ADJUST_IN_EPILOG

We actually use the 'epilogue' spelling in the rest of the compiler.

c)

+      if (!VECTOR_MODE_P (mode) || optab->handlers[mode].insn_code ==
+	  CODE_FOR_nothing)

Put the full || body on the second line.

d)

+      cost = ncopies * exact_log2(group_size) * group_size *
+	TARG_VEC_TO_SCALAR_COST;

Same sort of thing here.

Otherwise it looks nice. Not that I can approve it.

-eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
  2007-04-05 19:54   ` Steven Bosscher
  2007-04-05 21:35   ` Eric Christopher
@ 2007-04-06  0:24   ` Sebastian Pop
  2007-04-06 15:41     ` Jagasia, Harsha
  2007-04-08 13:24     ` Dorit Nuzman
  2007-04-08 14:37   ` Dorit Nuzman
  3 siblings, 2 replies; 30+ messages in thread
From: Sebastian Pop @ 2007-04-06  0:24 UTC (permalink / raw)
  To: Linthicum, Tony; +Cc: Jagasia, Harsha, GCC Patches

On 4/5/07, Linthicum, Tony <tony.linthicum@amd.com> wrote:
> This patch contains the initial implementation for a cost model for the
> vectorizer.

Nice.  Actually, I'm thinking that we'll need to count statements in a similar
way, for cost functions of other loop transformations.

The new option fno-vect-cost-model should be named fvect-cost-model
otherwise you'll get also the fno-no-vect-cost-model ;-) and there should
be some lines of documentation for that option in doc/invoke.texi

+/* These are some defines for the initial implementation of the vectorizer's
+   cost model.  These will later be target specific hooks.  */
+#define TARG_COND_BRANCH_COST        3
+#define TARG_VEC_STMT_COST           1
+#define TARG_VEC_TO_SCALAR_COST      1
+#define TARG_VEC_LOAD_COST           1
+#define TARG_VEC_UNALIGNED_LOAD_COST 2
+#define TARG_VEC_STORE_COST          1

You should comment each of the "magic numbers" with something
similar to:

/* Overhead for generating a branch.  */
#define TARG_COND_BRANCH_COST        3

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-06  0:24   ` Sebastian Pop
@ 2007-04-06 15:41     ` Jagasia, Harsha
  2007-04-08 13:24     ` Dorit Nuzman
  1 sibling, 0 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-06 15:41 UTC (permalink / raw)
  To: Sebastian Pop, Linthicum, Tony, Meissner, Michael; +Cc: GCC Patches

>The new option fno-vect-cost-model should be named fvect-cost-model
>otherwise you'll get also the fno-no-vect-cost-model ;-) and there
should
>be some lines of documentation for that option in doc/invoke.texi

I will fix that :)

>
>+/* These are some defines for the initial implementation of the
>vectorizer's
>+   cost model.  These will later be target specific hooks.  */
>+#define TARG_COND_BRANCH_COST        3
>+#define TARG_VEC_STMT_COST           1
>+#define TARG_VEC_TO_SCALAR_COST      1
>+#define TARG_VEC_LOAD_COST           1
>+#define TARG_VEC_UNALIGNED_LOAD_COST 2
>+#define TARG_VEC_STORE_COST          1
>
>You should comment each of the "magic numbers" with something
>similar to:
>
>/* Overhead for generating a branch.  */
>#define TARG_COND_BRANCH_COST        3

Ok

Thanks,
Harsha


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-06  0:24   ` Sebastian Pop
  2007-04-06 15:41     ` Jagasia, Harsha
@ 2007-04-08 13:24     ` Dorit Nuzman
  1 sibling, 0 replies; 30+ messages in thread
From: Dorit Nuzman @ 2007-04-08 13:24 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: GCC Patches, Jagasia, Harsha, Linthicum, Tony

> On 4/5/07, Linthicum, Tony <tony.linthicum@amd.com> wrote:
> > This patch contains the initial implementation for a cost model for the
> > vectorizer.
>
> Nice.  Actually, I'm thinking that we'll need to count statements ina
similar
> way, for cost functions of other loop transformations.
>

Indeed. I'm hoping we could arrange a BOF around that at the summit, to
discuss that, and other utilities that can be useful for several loop
optimizations (data-reuse analysis, reduction-like scalar-cycles detection,
nested-loops transformations,...).

dorit


> The new option fno-vect-cost-model should be named fvect-cost-model
> otherwise you'll get also the fno-no-vect-cost-model ;-) and there should
> be some lines of documentation for that option in doc/invoke.texi
>
> +/* These are some defines for the initial implementation of the
vectorizer's
> +   cost model.  These will later be target specific hooks.  */
> +#define TARG_COND_BRANCH_COST        3
> +#define TARG_VEC_STMT_COST           1
> +#define TARG_VEC_TO_SCALAR_COST      1
> +#define TARG_VEC_LOAD_COST           1
> +#define TARG_VEC_UNALIGNED_LOAD_COST 2
> +#define TARG_VEC_STORE_COST          1
>
> You should comment each of the "magic numbers" with something
> similar to:
>
> /* Overhead for generating a branch.  */
> #define TARG_COND_BRANCH_COST        3
>
> Thanks,
> Sebastian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
                     ` (2 preceding siblings ...)
  2007-04-06  0:24   ` Sebastian Pop
@ 2007-04-08 14:37   ` Dorit Nuzman
  2007-04-11 21:25     ` Jagasia, Harsha
  2007-04-22  5:45     ` Jagasia, Harsha
  3 siblings, 2 replies; 30+ messages in thread
From: Dorit Nuzman @ 2007-04-08 14:37 UTC (permalink / raw)
  To: Linthicum, Tony; +Cc: GCC Patches, Jagasia, Harsha

> Hello All,
>
> This patch contains the initial implementation for a cost model for the
> vectorizer.

Thanks Tony for this work, and Harsha for taking it on!
As I told Tony off-line, I think this is an excellent start.  With this,
people can start experimenting and tuning the cost model on different
targets, and gradually develop the required target-specific APIs, and also
see how to generalize this for the use of other loop transformations.

Here are some comments to the patch:

> Index: gcc/testsuite/gcc.dg/vect/vect.exp
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect.exp  (revision 123477)
> +++ gcc/testsuite/gcc.dg/vect/vect.exp  (working copy)
> @@ -23,7 +23,7 @@
>  set DEFAULT_VECTCFLAGS ""
>
>  # These flags are used for all targets.
> -lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
> +lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
"-fno-vect-cost-model"

This is discussed separately (consider not to disable the cost-model by
default, and instead xfail the tests that fail to vectorize with cost model
enabled (if there's not too many of them), duplicating them etc...).


> @@ -2216,6 +2219,7 @@
>       only over initial loops skipping newly generated ones.  */
>    FOR_EACH_LOOP (li, loop, 0)
>      {
> +      int required_iters;

What do you think about renaming this variable to something like
"min_profitable_iters", or "estimated_min_iters" or anyhow something that
would indicate that it's about estimated profitability and not something
that's really inherently required?


> @@ -2225,7 +2229,25 @@
>        if (!loop_vinfo || !LOOP_VINFO_VECTORIZABLE_P (loop_vinfo))
>     continue;
>
> +      required_iters = vect_model_required_iters (loop_vinfo);
> +      if (required_iters < 0)
> +   {
> +     if (vect_print_dump_info (REPORT_DETAILS))
> +       fprintf (vect_dump, "Vector body less profitable than
scalar.\n");
> +     continue;
> +   }
> +
> +      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) &&
> +     LOOP_VINFO_INT_NITERS (loop_vinfo) < (unsigned) required_iters)
> +       {
> +           if (vect_print_dump_info (REPORT_DETAILS))
> +           fprintf (vect_dump,
> +                  "Not vectorized: min iters: %d, actual: %lu\n",
> +                  required_iters, LOOP_VINFO_INT_NITERS (loop_vinfo));
> +         continue;
> +       }
>        vect_transform_loop (loop_vinfo);
> +
>        num_vectorized_loops++;
>      }
>    vect_loop_location = UNKNOWN_LOC;

One comment is that we use REPORT_UNVECTORIZED_LOOPS (rather then
REPORT_DETAILS) to guard the last printout that explains why a loop wasn't
vectorized. The idea is that when you use -fdump-tree-vect-stats (or
-ftree-vectorizer-verbose=2) you'll get one printout per loop explaining
why it wasn't vectorized. I think that with the current patch, if a loop
doesn't get vectorized because of the cost model, you'll see an explanation
only when using -fdump-tree-vect-details (or -ftree-vectorizer-verbose=7).
So you can either change REPORT_DETAILS to REPORT_UNVECTORIZED_LOOPS in the
code above, or leave it as is and add another, more user friedly, printout
under REPORT_UNVECTORIZED_LOOPS (e.g. "not vectorized: vectorization not
profitable").

The other comment is that this piece of code would be better placed
together with the last part of the function vect_analyze_operations, where
we already do several similar loop-bound related checks (if the loop-bound
is known, we compare it against the VF or whatever the user specified via
--param min-vect-loop-bound, and other checks). So, please move it there,
or, if you want, factor all of this out to a separate function, to be
called after vect_analyze_operations.

This also reminds me that we need to consider what to do if the user asked
to vectorize when the loop-bound is greater than X (via --param
min-vect-loop-bound), but the cost model determined that it would be
profitable when the loop-bound is greater than Y, and X != Y. The --param
min-vect-loop-bound was originally added only until such time that we have
a cost model, but maybe we want to keep it around, at least until the cost
model has been tuned/matured. In the meantime we can decide to do one of
the following:
(1) always go with the user
(2) always go with the cost model
(3) go with the user if the user is more conservative than the cost-model
(i.e. X > Y)
(4) go with the user if the user is less conservative than the cost-model
(i.e. X < Y)

The first option is equivalent to turning off the cost model with
-fno-vect-cost-model, so I'd go with one of 2,3,4.


> +
> +  /* Vectorization costs associated with statement */

According to the coding conventions comments should be treated like real
sentences, so there should be a period at the end. Also, coding conventions
ask for two spaces before end of the comment. (There are a few other
occurrences of that in the patch).


> +extern int  vect_model_required_iters (loop_vec_info);

same comment as I made above - how about
'vect_estimate_min_profitable_iters'? (or something shorter in that spirit)


> +fno-vect-cost-model
> +Common Report Var(flag_no_vect_cost_model) Optimization
> +Disable use of cost model in vectorization
> +

As others already commented - the flag should be introduced as
-fvect-cost-model, and documented in the texi files.


> +  /* Requires loop versioning tests.  */
> +  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
> +    vec_outside_cost += TARG_COND_BRANCH_COST;

We can try to be a bit more "accurate" here and have the cost depend on the
number of stmts in the may_misalign list (the more datarefs in the list,
the heavier is the test we create).  Obviously, there are a lot of
improvements/fine-tuning like that that can be done, and these can be done
later.  For now, I would just ask that we record this in the code (add a
FIXME comment here).


> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +    vec_outside_cost += TARG_COND_BRANCH_COST;
...
> +  /* Requires guard code for peeling.  */
> +  if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
> +    vec_outside_cost += TARG_COND_BRANCH_COST;

In the smae spirit of the above comment - we actually generate 2 guards for
each peel-loop that we create, so 2*TARG_COND_BRANCH_COST would be more
"accurate".


> +  /* Count statements in scalar loop.  Using this as scalar cost for a
single
> +     iteration for now.  */
> +  for (i = 0; i < nbbs; i++)
> +    {
> +      block_stmt_iterator si;
> +      basic_block bb = bbs[i];
> +
> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
> +   {
> +     tree stmt = bsi_stmt (si);
> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +     if (!STMT_VINFO_RELEVANT_P (stmt_info)
> +         && !STMT_VINFO_LIVE_P (stmt_info))
> +       continue;
> +     scalar_single_iter_cost++;
> +     vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
> +     vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
> +   }
> +    }

This is more a note for myself - to remember to update this code when
looking at outer-loop vectorization (the stmts in the inner-loop execute
more times than the rest of the stmts). More adjustments would probably be
needed for outer-loop support.


> +  /* Add in costs of peeled instructions, if any.  */
> +  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
> +  vec_outside_cost += peel_iters * scalar_single_iter_cost;
> +

You'll have a problem here if the number of iterations to be peeled is not
known at compile time... in this case LOOP_PEELING_FOR_ALIGNEMNT is
negative, and the actual number of iterations is computed at run time.  So
in that case you also need to build an expression that would be evaluated
at runtime. I understand that at this point this patch handles only the
case that the number of iterations in known, so until it is extended you
just want to check if LOOP_PEELING_FOR_ALIGNMENT > 0.


> +static void
> +vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code
reduc_code,
> +                  int ncopies)
> +{

As discussed with Tony off-line - there's a close dependency between this
function and the reduction transformation functions, as is the case with
all the vect_model_*_cost functions - they all depend on how the respective
vectorizable_* functions are implemented. At some point we may want to
revisit this and see how to better design it.


> +  /* Add in cost for intial definition.  */

typo - initial


> +  /* Determine cost of epilog code.  */
> +  if (reduc_code < NUM_TREE_CODES)
> +    /* We have a reduction operator that will reduce the vector in one
> +       statement.  Also requires scalar extract.  */
> +    outer_cost += 2 * TARG_VEC_STMT_COST;

Did you want to use TARG_VEC_TO_SCALAR_COST for the scalar extract? (As
pointed out by others - it is not documented in tree-vectorizer.h).


> +static void
> +vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies)
> +{
> +  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies *
TARG_VEC_STMT_COST;
> +}
> +
> +/* Function vect_cost_strided_group_size
> +
> +   For a load or store, returns the group_size if it's the last store,
and 1
> +   if it is not.  */
> +static int
> +vect_cost_strided_group_size (stmt_vec_info stmt_info)
> +{

This is more of a convention within the tree-vect-* files: we keep 2 empty
lines between functions, and one empty line between the description of the
function and the function itself (you have kept this convention almost
everywhere else, but these are a couple such occurrences here and there).


> +/* Function vect_cost_strided_group_size
> +
> +   For a load or store, returns the group_size if it's the last store,
and 1
> +   if it is not.  */
> +static int
> +vect_cost_strided_group_size (stmt_vec_info stmt_info)
> +{
> +  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
> +  stmt_vec_info first_info = vinfo_for_stmt (first_stmt);
> +  int group_size = DR_GROUP_SIZE (first_info);
> +  STMT_VINFO_MEM_COUNT_FOR_COST (first_info)++;
> +
> +  /* Is it the last in the group?  */
> +  if (STMT_VINFO_MEM_COUNT_FOR_COST (first_info) < group_size)
> +    group_size = 1;
> +
> +  return group_size;
> +}

I think there's no need to keep track of whether this is the last
load/store in a group or not. it doesn't matter for which of the
stores/loads we return the actual group_size, as long as we do it exactly
once per group. So I think you could rewrite it as follows (this way you
can also get rid of the field MEM_COUNT_FOR_COST):

   static int
   vect_cost_strided_group_size (stmt_vec_info stmt_info)
   {
     tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);

     if (first_stmt == STMT_VINFO_STMT (stmt_info))
       return DR_GROUP_SIZE (stmt_info);

     return 1;
   }


> +  /* Is this the last access in a group of stores providing strided
acess?

typo - access. Appears a couple of times.


> +  if (group_size > 1)
> +    {
> +      /* Uses a high and low interleave operation for each needed
permute.  */
> +      cost = ncopies * exact_log2(group_size) * group_size *
> +   TARG_VEC_TO_SCALAR_COST;
> +    }

why are you using TARG_VEC_TO_SCALAR_COST? Again, it's not documented, so I
don't know what it's meant to represent, but permute/shuffle like
operations do not produce a scalar from a vector. Looks like
TARG_VEC_STMT_COST is more appropriate here?


> +      /* Uses a high and low interleave operation for each needed
permute.  */
> +      inner_cost = ncopies * exact_log2(group_size) * group_size *
> +   TARG_VEC_STMT_COST;

For the loads we actually use even and odd extract operations, not
interleave operations (and the cost for both should probably be the same)


> +    case dr_unaligned_software_pipeline:
> +      {
> +   int outer_cost;
> +
> +   /* Unaligned software pipeline has a load of an address, an intial

typo - initial


> +      load, and possibly a mask operation to "prime" the loop.  Inside
> +      the loop, there is a load op and a realignment op.  */
> +   outer_cost = 2 * TARG_VEC_STMT_COST;
> +   if (targetm.vectorize.builtin_mask_for_load)
> +     outer_cost += TARG_VEC_STMT_COST;
> +   STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;

if this is a group of loads (when vectorizing a strided access) - looks
like the outer-cost will be added by each load in the group? (it should be
added only once per group)


> +
> +   inner_cost += (2*ncopies) * TARG_VEC_LOAD_COST;

so you're using TARG_VEC_LOAD_COST also for the cost of the realignment op?
I think is should be a TARG_VEC_STMT_COST.


thanks!

dorit


> The patch is currently missing loop versioning for
> dynamically trip counted loops and some cost model specific tests.  My
> colleague, Harsha Jagasia, will be providing follow on patches for those
> things shortly.
>
> Thank you.
>
> Tony
>
>
> [attachment "patch" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-08 14:37   ` Dorit Nuzman
@ 2007-04-11 21:25     ` Jagasia, Harsha
  2007-04-22  5:45     ` Jagasia, Harsha
  1 sibling, 0 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-11 21:25 UTC (permalink / raw)
  To: Dorit Nuzman, Meissner, Michael; +Cc: GCC Patches

Just as a note, I am looking into fixes for the review suggestions you
have below and in related threads. Comments will follow.

Thanks,
Harsha

>-----Original Message-----
>From: Dorit Nuzman [mailto:DORIT@il.ibm.com]
>Sent: Sunday, April 08, 2007 9:39 AM
>To: Linthicum, Tony
>Cc: GCC Patches; Jagasia, Harsha
>Subject: Re: [patch] Vectorizer cost model implementation
>
>> Hello All,
>>
>> This patch contains the initial implementation for a cost model for
the
>> vectorizer.
>
>Thanks Tony for this work, and Harsha for taking it on!
>As I told Tony off-line, I think this is an excellent start.  With
this,
>people can start experimenting and tuning the cost model on different
>targets, and gradually develop the required target-specific APIs, and
also
>see how to generalize this for the use of other loop transformations.
>
>Here are some comments to the patch:
>
>> Index: gcc/testsuite/gcc.dg/vect/vect.exp
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/vect/vect.exp  (revision 123477)
>> +++ gcc/testsuite/gcc.dg/vect/vect.exp  (working copy)
>> @@ -23,7 +23,7 @@
>>  set DEFAULT_VECTCFLAGS ""
>>
>>  # These flags are used for all targets.
>> -lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
>> +lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
>"-fno-vect-cost-model"
>
>This is discussed separately (consider not to disable the cost-model by
>default, and instead xfail the tests that fail to vectorize with cost
model
>enabled (if there's not too many of them), duplicating them etc...).
>
>
>> @@ -2216,6 +2219,7 @@
>>       only over initial loops skipping newly generated ones.  */
>>    FOR_EACH_LOOP (li, loop, 0)
>>      {
>> +      int required_iters;
>
>What do you think about renaming this variable to something like
>"min_profitable_iters", or "estimated_min_iters" or anyhow something
that
>would indicate that it's about estimated profitability and not
something
>that's really inherently required?
>
>
>> @@ -2225,7 +2229,25 @@
>>        if (!loop_vinfo || !LOOP_VINFO_VECTORIZABLE_P (loop_vinfo))
>>     continue;
>>
>> +      required_iters = vect_model_required_iters (loop_vinfo);
>> +      if (required_iters < 0)
>> +   {
>> +     if (vect_print_dump_info (REPORT_DETAILS))
>> +       fprintf (vect_dump, "Vector body less profitable than
>scalar.\n");
>> +     continue;
>> +   }
>> +
>> +      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) &&
>> +     LOOP_VINFO_INT_NITERS (loop_vinfo) < (unsigned) required_iters)
>> +       {
>> +           if (vect_print_dump_info (REPORT_DETAILS))
>> +           fprintf (vect_dump,
>> +                  "Not vectorized: min iters: %d, actual: %lu\n",
>> +                  required_iters, LOOP_VINFO_INT_NITERS
(loop_vinfo));
>> +         continue;
>> +       }
>>        vect_transform_loop (loop_vinfo);
>> +
>>        num_vectorized_loops++;
>>      }
>>    vect_loop_location = UNKNOWN_LOC;
>
>One comment is that we use REPORT_UNVECTORIZED_LOOPS (rather then
>REPORT_DETAILS) to guard the last printout that explains why a loop
wasn't
>vectorized. The idea is that when you use -fdump-tree-vect-stats (or
>-ftree-vectorizer-verbose=2) you'll get one printout per loop
explaining
>why it wasn't vectorized. I think that with the current patch, if a
loop
>doesn't get vectorized because of the cost model, you'll see an
explanation
>only when using -fdump-tree-vect-details (or
-ftree-vectorizer-verbose=7).
>So you can either change REPORT_DETAILS to REPORT_UNVECTORIZED_LOOPS in
the
>code above, or leave it as is and add another, more user friedly,
printout
>under REPORT_UNVECTORIZED_LOOPS (e.g. "not vectorized: vectorization
not
>profitable").
>
>The other comment is that this piece of code would be better placed
>together with the last part of the function vect_analyze_operations,
where
>we already do several similar loop-bound related checks (if the
loop-bound
>is known, we compare it against the VF or whatever the user specified
via
>--param min-vect-loop-bound, and other checks). So, please move it
there,
>or, if you want, factor all of this out to a separate function, to be
>called after vect_analyze_operations.
>
>This also reminds me that we need to consider what to do if the user
asked
>to vectorize when the loop-bound is greater than X (via --param
>min-vect-loop-bound), but the cost model determined that it would be
>profitable when the loop-bound is greater than Y, and X != Y. The
--param
>min-vect-loop-bound was originally added only until such time that we
have
>a cost model, but maybe we want to keep it around, at least until the
cost
>model has been tuned/matured. In the meantime we can decide to do one
of
>the following:
>(1) always go with the user
>(2) always go with the cost model
>(3) go with the user if the user is more conservative than the
cost-model
>(i.e. X > Y)
>(4) go with the user if the user is less conservative than the
cost-model
>(i.e. X < Y)
>
>The first option is equivalent to turning off the cost model with
>-fno-vect-cost-model, so I'd go with one of 2,3,4.
>
>
>> +
>> +  /* Vectorization costs associated with statement */
>
>According to the coding conventions comments should be treated like
real
>sentences, so there should be a period at the end. Also, coding
conventions
>ask for two spaces before end of the comment. (There are a few other
>occurrences of that in the patch).
>
>
>> +extern int  vect_model_required_iters (loop_vec_info);
>
>same comment as I made above - how about
>'vect_estimate_min_profitable_iters'? (or something shorter in that
spirit)
>
>
>> +fno-vect-cost-model
>> +Common Report Var(flag_no_vect_cost_model) Optimization
>> +Disable use of cost model in vectorization
>> +
>
>As others already commented - the flag should be introduced as
>-fvect-cost-model, and documented in the texi files.
>
>
>> +  /* Requires loop versioning tests.  */
>> +  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>
>We can try to be a bit more "accurate" here and have the cost depend on
the
>number of stmts in the may_misalign list (the more datarefs in the
list,
>the heavier is the test we create).  Obviously, there are a lot of
>improvements/fine-tuning like that that can be done, and these can be
done
>later.  For now, I would just ask that we record this in the code (add
a
>FIXME comment here).
>
>
>> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>...
>> +  /* Requires guard code for peeling.  */
>> +  if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>
>In the smae spirit of the above comment - we actually generate 2 guards
for
>each peel-loop that we create, so 2*TARG_COND_BRANCH_COST would be more
>"accurate".
>
>
>> +  /* Count statements in scalar loop.  Using this as scalar cost for
a
>single
>> +     iteration for now.  */
>> +  for (i = 0; i < nbbs; i++)
>> +    {
>> +      block_stmt_iterator si;
>> +      basic_block bb = bbs[i];
>> +
>> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
>> +   {
>> +     tree stmt = bsi_stmt (si);
>> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +     if (!STMT_VINFO_RELEVANT_P (stmt_info)
>> +         && !STMT_VINFO_LIVE_P (stmt_info))
>> +       continue;
>> +     scalar_single_iter_cost++;
>> +     vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
>> +     vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST
(stmt_info);
>> +   }
>> +    }
>
>This is more a note for myself - to remember to update this code when
>looking at outer-loop vectorization (the stmts in the inner-loop
execute
>more times than the rest of the stmts). More adjustments would probably
be
>needed for outer-loop support.
>
>
>> +  /* Add in costs of peeled instructions, if any.  */
>> +  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
>> +  vec_outside_cost += peel_iters * scalar_single_iter_cost;
>> +
>
>You'll have a problem here if the number of iterations to be peeled is
not
>known at compile time... in this case LOOP_PEELING_FOR_ALIGNEMNT is
>negative, and the actual number of iterations is computed at run time.
So
>in that case you also need to build an expression that would be
evaluated
>at runtime. I understand that at this point this patch handles only the
>case that the number of iterations in known, so until it is extended
you
>just want to check if LOOP_PEELING_FOR_ALIGNMENT > 0.
>
>
>> +static void
>> +vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code
>reduc_code,
>> +                  int ncopies)
>> +{
>
>As discussed with Tony off-line - there's a close dependency between
this
>function and the reduction transformation functions, as is the case
with
>all the vect_model_*_cost functions - they all depend on how the
respective
>vectorizable_* functions are implemented. At some point we may want to
>revisit this and see how to better design it.
>
>
>> +  /* Add in cost for intial definition.  */
>
>typo - initial
>
>
>> +  /* Determine cost of epilog code.  */
>> +  if (reduc_code < NUM_TREE_CODES)
>> +    /* We have a reduction operator that will reduce the vector in
one
>> +       statement.  Also requires scalar extract.  */
>> +    outer_cost += 2 * TARG_VEC_STMT_COST;
>
>Did you want to use TARG_VEC_TO_SCALAR_COST for the scalar extract? (As
>pointed out by others - it is not documented in tree-vectorizer.h).
>
>
>> +static void
>> +vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies)
>> +{
>> +  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies *
>TARG_VEC_STMT_COST;
>> +}
>> +
>> +/* Function vect_cost_strided_group_size
>> +
>> +   For a load or store, returns the group_size if it's the last
store,
>and 1
>> +   if it is not.  */
>> +static int
>> +vect_cost_strided_group_size (stmt_vec_info stmt_info)
>> +{
>
>This is more of a convention within the tree-vect-* files: we keep 2
empty
>lines between functions, and one empty line between the description of
the
>function and the function itself (you have kept this convention almost
>everywhere else, but these are a couple such occurrences here and
there).
>
>
>> +/* Function vect_cost_strided_group_size
>> +
>> +   For a load or store, returns the group_size if it's the last
store,
>and 1
>> +   if it is not.  */
>> +static int
>> +vect_cost_strided_group_size (stmt_vec_info stmt_info)
>> +{
>> +  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
>> +  stmt_vec_info first_info = vinfo_for_stmt (first_stmt);
>> +  int group_size = DR_GROUP_SIZE (first_info);
>> +  STMT_VINFO_MEM_COUNT_FOR_COST (first_info)++;
>> +
>> +  /* Is it the last in the group?  */
>> +  if (STMT_VINFO_MEM_COUNT_FOR_COST (first_info) < group_size)
>> +    group_size = 1;
>> +
>> +  return group_size;
>> +}
>
>I think there's no need to keep track of whether this is the last
>load/store in a group or not. it doesn't matter for which of the
>stores/loads we return the actual group_size, as long as we do it
exactly
>once per group. So I think you could rewrite it as follows (this way
you
>can also get rid of the field MEM_COUNT_FOR_COST):
>
>   static int
>   vect_cost_strided_group_size (stmt_vec_info stmt_info)
>   {
>     tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
>
>     if (first_stmt == STMT_VINFO_STMT (stmt_info))
>       return DR_GROUP_SIZE (stmt_info);
>
>     return 1;
>   }
>
>
>> +  /* Is this the last access in a group of stores providing strided
>acess?
>
>typo - access. Appears a couple of times.
>
>
>> +  if (group_size > 1)
>> +    {
>> +      /* Uses a high and low interleave operation for each needed
>permute.  */
>> +      cost = ncopies * exact_log2(group_size) * group_size *
>> +   TARG_VEC_TO_SCALAR_COST;
>> +    }
>
>why are you using TARG_VEC_TO_SCALAR_COST? Again, it's not documented,
so I
>don't know what it's meant to represent, but permute/shuffle like
>operations do not produce a scalar from a vector. Looks like
>TARG_VEC_STMT_COST is more appropriate here?
>
>
>> +      /* Uses a high and low interleave operation for each needed
>permute.  */
>> +      inner_cost = ncopies * exact_log2(group_size) * group_size *
>> +   TARG_VEC_STMT_COST;
>
>For the loads we actually use even and odd extract operations, not
>interleave operations (and the cost for both should probably be the
same)
>
>
>> +    case dr_unaligned_software_pipeline:
>> +      {
>> +   int outer_cost;
>> +
>> +   /* Unaligned software pipeline has a load of an address, an
intial
>
>typo - initial
>
>
>> +      load, and possibly a mask operation to "prime" the loop.
Inside
>> +      the loop, there is a load op and a realignment op.  */
>> +   outer_cost = 2 * TARG_VEC_STMT_COST;
>> +   if (targetm.vectorize.builtin_mask_for_load)
>> +     outer_cost += TARG_VEC_STMT_COST;
>> +   STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
>
>if this is a group of loads (when vectorizing a strided access) - looks
>like the outer-cost will be added by each load in the group? (it should
be
>added only once per group)
>
>
>> +
>> +   inner_cost += (2*ncopies) * TARG_VEC_LOAD_COST;
>
>so you're using TARG_VEC_LOAD_COST also for the cost of the realignment
op?
>I think is should be a TARG_VEC_STMT_COST.
>
>
>thanks!
>
>dorit
>
>
>> The patch is currently missing loop versioning for
>> dynamically trip counted loops and some cost model specific tests.
My
>> colleague, Harsha Jagasia, will be providing follow on patches for
those
>> things shortly.
>>
>> Thank you.
>>
>> Tony
>>
>>
>> [attachment "patch" deleted by Dorit Nuzman/Haifa/IBM]
>
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures)
  2007-04-05 17:41 [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures) Doug Gregor
  2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
@ 2007-04-12  3:37 ` Mark Mitchell
  1 sibling, 0 replies; 30+ messages in thread
From: Mark Mitchell @ 2007-04-12  3:37 UTC (permalink / raw)
  To: Doug Gregor; +Cc: GCC Patches

Doug Gregor wrote:

> 2007-04-05  Douglas Gregor  <doug.gregor@gmail.com>
> 
>     PR c++/31078
>     PR c++/31103
>     * c-common.c (c_build_qualified_type): Set canonical type
>     appropriately.
> 
> 2007-04-05  Douglas Gregor  <doug.gregor@gmail.com>
> 
>     * g++.dg/other/pr31078.C: New.

:REVIEWMAIL: OK

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-08 14:37   ` Dorit Nuzman
  2007-04-11 21:25     ` Jagasia, Harsha
@ 2007-04-22  5:45     ` Jagasia, Harsha
  2007-04-22  6:23       ` Andrew Pinski
                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-22  5:45 UTC (permalink / raw)
  To: Dorit Nuzman, Linthicum, Tony, Meissner, Michael; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 13609 bytes --]

Hello All,

>> This patch contains the initial implementation for a cost model for
the
>> vectorizer.
>
>Thanks Tony for this work, and Harsha for taking it on!
>As I told Tony off-line, I think this is an excellent start.  With
this,
>people can start experimenting and tuning the cost model on different
>targets, and gradually develop the required target-specific APIs, and
also
>see how to generalize this for the use of other loop transformations.

Indeed, the next step for us is to start tuning it to get
target-specific API's going.

I have attached a modified patch with changes for Dorit's feedback
below. Comments inline below to discuss the changes.

>
>Here are some comments to the patch:
>
>> Index: gcc/testsuite/gcc.dg/vect/vect.exp
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/vect/vect.exp  (revision 123477)
>> +++ gcc/testsuite/gcc.dg/vect/vect.exp  (working copy)
>> @@ -23,7 +23,7 @@
>>  set DEFAULT_VECTCFLAGS ""
>>
>>  # These flags are used for all targets.
>> -lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
>> +lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
>"-fno-vect-cost-model"
>
>This is discussed separately (consider not to disable the cost-model by
>default, and instead xfail the tests that fail to vectorize with cost
model
>enabled (if there's not too many of them), duplicating them etc...).

Once I make the changes to the cost functions that were recommended
below, 9 of the currently existing tests fail to vectorize on x86-64,
because the cost model decides its not worthwhile to vectorize. All but
one of those 9 tests also fail on the x86 target. I have not evaluated
how many fail on other targets like powerpc, sparc or others. The
unknown-test behavior on other targets will be an issue if the cost
model is turned on by default in vect.exp. 

If the cost model is turned off by default, we could add separate tests
for it. However then the cost model may not get exercised very much.
Dorit and I think that it would be good to have a discussion on this
list about the two approaches.

For now, I have turned the cost model on in vect.exp for the vect-*
tests. For every vect-* test that failed, I duplicated it in
no-costmodel-vect-* test. For the duplicated test, I turned off the cost
model to allow it to pass. 

For every vect-* test that failed, I then added a dg-final test for
"vectorization not profitable" string. I removed the original dg-final
test (instead of xfail) because the gcc wiki suggests that xfail should
only be used for bugs that can not be fixed immediately. And the
behavior under focus is not a bug, but a correct estimation on part of
the cost model.

The test that failed to vectorize on x86-64, because the cost model
deemed it unprofitable, but passed on x86 is pr30843.c. This test uses
an unsigned long which is 4 bytes on x86 and 8 bytes on x86-64. It turns
on that the larger vectorization factor on x86 makes it profitable to
vectorize this loop. And hence the dg-final test I added for the
"vectorization not profitable" string passed on x86-64, but failed on
x86. So I had to explicitly target it to not be checked on x86. If
pr30843.c fails on other targets, they could also be added to this list.

I am relatively new to the dejagnu framework, so I would appreciate some
input on whether this implementation is on the right track.


>
>> @@ -2225,7 +2229,25 @@
>>        if (!loop_vinfo || !LOOP_VINFO_VECTORIZABLE_P (loop_vinfo))
>>     continue;
>>
>> +      required_iters = vect_model_required_iters (loop_vinfo);
>> +      if (required_iters < 0)
>> +   {
>> +     if (vect_print_dump_info (REPORT_DETAILS))
>> +       fprintf (vect_dump, "Vector body less profitable than
>scalar.\n");
>> +     continue;
>> +   }
>> +
>> +      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) &&
>> +     LOOP_VINFO_INT_NITERS (loop_vinfo) < (unsigned) required_iters)
>> +       {
>> +           if (vect_print_dump_info (REPORT_DETAILS))
>> +           fprintf (vect_dump,
>> +                  "Not vectorized: min iters: %d, actual: %lu\n",
>> +                  required_iters, LOOP_VINFO_INT_NITERS
(loop_vinfo));
>> +         continue;
>> +       }
>>        vect_transform_loop (loop_vinfo);
>> +
>>        num_vectorized_loops++;
>>      }
>>    vect_loop_location = UNKNOWN_LOC;
>
>The other comment is that this piece of code would be better placed
>together with the last part of the function vect_analyze_operations,
where
>we already do several similar loop-bound related checks (if the
loop-bound
>is known, we compare it against the VF or whatever the user specified
via
>--param min-vect-loop-bound, and other checks). So, please move it
there,
>or, if you want, factor all of this out to a separate function, to be
>called after vect_analyze_operations.
>
>This also reminds me that we need to consider what to do if the user
asked
>to vectorize when the loop-bound is greater than X (via --param
>min-vect-loop-bound), but the cost model determined that it would be
>profitable when the loop-bound is greater than Y, and X != Y. The
--param
>min-vect-loop-bound was originally added only until such time that we
have
>a cost model, but maybe we want to keep it around, at least until the
cost
>model has been tuned/matured. In the meantime we can decide to do one
of
>the following:
>(1) always go with the user
>(2) always go with the cost model
>(3) go with the user if the user is more conservative than the
cost-model
>(i.e. X > Y)
>(4) go with the user if the user is less conservative than the
cost-model
>(i.e. X < Y)
>
>The first option is equivalent to turning off the cost model with
>-fno-vect-cost-model, so I'd go with one of 2,3,4.

I have moved the code to vect_analyze_operations. Also for now I have
chosen option 3. Down the road, I plan to test the cost model for
performance with benchmarks like spec/netlib etc to see if some other
choice would work better.


>> +fno-vect-cost-model
>> +Common Report Var(flag_no_vect_cost_model) Optimization
>> +Disable use of cost model in vectorization
>> +
>
>As others already commented - the flag should be introduced as
>-fvect-cost-model, and documented in the texi files.

Done.

>
>
>> +  /* Requires loop versioning tests.  */
>> +  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>
>We can try to be a bit more "accurate" here and have the cost depend on
the
>number of stmts in the may_misalign list (the more datarefs in the
list,
>the heavier is the test we create).  Obviously, there are a lot of
>improvements/fine-tuning like that that can be done, and these can be
done
>later.  For now, I would just ask that we record this in the code (add
a
>FIXME comment here).

For now I have added a FIXME.

>
>
>> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>...
>> +  /* Requires guard code for peeling.  */
>> +  if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>
>In the smae spirit of the above comment - we actually generate 2 guards
for
>each peel-loop that we create, so 2*TARG_COND_BRANCH_COST would be more
>"accurate".

Done.


>
>
>> +  /* Count statements in scalar loop.  Using this as scalar cost for
a
>single
>> +     iteration for now.  */
>> +  for (i = 0; i < nbbs; i++)
>> +    {
>> +      block_stmt_iterator si;
>> +      basic_block bb = bbs[i];
>> +
>> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
>> +   {
>> +     tree stmt = bsi_stmt (si);
>> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +     if (!STMT_VINFO_RELEVANT_P (stmt_info)
>> +         && !STMT_VINFO_LIVE_P (stmt_info))
>> +       continue;
>> +     scalar_single_iter_cost++;
>> +     vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
>> +     vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST
(stmt_info);
>> +   }
>> +    }
>
>This is more a note for myself - to remember to update this code when
>looking at outer-loop vectorization (the stmts in the inner-loop
execute
>more times than the rest of the stmts). More adjustments would probably
be
>needed for outer-loop support.

For now I have added a TODO.

>
>
>> +  /* Add in costs of peeled instructions, if any.  */
>> +  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
>> +  vec_outside_cost += peel_iters * scalar_single_iter_cost;
>> +
>
>You'll have a problem here if the number of iterations to be peeled is
not
>known at compile time... in this case LOOP_PEELING_FOR_ALIGNEMNT is
>negative, and the actual number of iterations is computed at run time.
So
>in that case you also need to build an expression that would be
evaluated
>at runtime. I understand that at this point this patch handles only the
>case that the number of iterations in known, so until it is extended
you
>just want to check if LOOP_PEELING_FOR_ALIGNMENT > 0.

Added the check.

>
>
>> +static void
>> +vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code
>reduc_code,
>> +                  int ncopies)
>> +{
>
>As discussed with Tony off-line - there's a close dependency between
this
>function and the reduction transformation functions, as is the case
with
>all the vect_model_*_cost functions - they all depend on how the
respective
>vectorizable_* functions are implemented. At some point we may want to
>revisit this and see how to better design it.

Indeed, for now I have added a TODO comment.

>> +  /* Determine cost of epilog code.  */
>> +  if (reduc_code < NUM_TREE_CODES)
>> +    /* We have a reduction operator that will reduce the vector in
one
>> +       statement.  Also requires scalar extract.  */
>> +    outer_cost += 2 * TARG_VEC_STMT_COST;
>
>Did you want to use TARG_VEC_TO_SCALAR_COST for the scalar extract? (As
>pointed out by others - it is not documented in tree-vectorizer.h).

Fixed to use TARG_VEC_TO_SCALAR_COST and documented the target costs.

>
>> +/* Function vect_cost_strided_group_size
>> +
>> +   For a load or store, returns the group_size if it's the last
store,
>and 1
>> +   if it is not.  */
>> +static int
>> +vect_cost_strided_group_size (stmt_vec_info stmt_info)
>> +{
>> +  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
>> +  stmt_vec_info first_info = vinfo_for_stmt (first_stmt);
>> +  int group_size = DR_GROUP_SIZE (first_info);
>> +  STMT_VINFO_MEM_COUNT_FOR_COST (first_info)++;
>> +
>> +  /* Is it the last in the group?  */
>> +  if (STMT_VINFO_MEM_COUNT_FOR_COST (first_info) < group_size)
>> +    group_size = 1;
>> +
>> +  return group_size;
>> +}
>
>I think there's no need to keep track of whether this is the last
>load/store in a group or not. it doesn't matter for which of the
>stores/loads we return the actual group_size, as long as we do it
exactly
>once per group. So I think you could rewrite it as follows (this way
you
>can also get rid of the field MEM_COUNT_FOR_COST):
>
>   static int
>   vect_cost_strided_group_size (stmt_vec_info stmt_info)
>   {
>     tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
>
>     if (first_stmt == STMT_VINFO_STMT (stmt_info))
>       return DR_GROUP_SIZE (stmt_info);
>
>     return 1;
>   }
>

Done.

>
>> +  if (group_size > 1)
>> +    {
>> +      /* Uses a high and low interleave operation for each needed
>permute.  */
>> +      cost = ncopies * exact_log2(group_size) * group_size *
>> +   TARG_VEC_TO_SCALAR_COST;
>> +    }
>
>why are you using TARG_VEC_TO_SCALAR_COST? Again, it's not documented,
so I
>don't know what it's meant to represent, but permute/shuffle like
>operations do not produce a scalar from a vector. Looks like
>TARG_VEC_STMT_COST is more appropriate here?

Fixed to use TARG_VEC_STMT_COST.


>
>
>> +      /* Uses a high and low interleave operation for each needed
>permute.  */
>> +      inner_cost = ncopies * exact_log2(group_size) * group_size *
>> +   TARG_VEC_STMT_COST;
>
>For the loads we actually use even and odd extract operations, not
>interleave operations (and the cost for both should probably be the
same)

Fixed the comment and left the cost the same.

>
>> +      load, and possibly a mask operation to "prime" the loop.
Inside
>> +      the loop, there is a load op and a realignment op.  */
>> +   outer_cost = 2 * TARG_VEC_STMT_COST;
>> +   if (targetm.vectorize.builtin_mask_for_load)
>> +     outer_cost += TARG_VEC_STMT_COST;
>> +   STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
>
>if this is a group of loads (when vectorizing a strided access) - looks
>like the outer-cost will be added by each load in the group? (it should
be
>added only once per group)

I have fixed this to only be counted if the load is not strided or it's
the one in the stride group that's been marked with the group size.

>
>
>> +
>> +   inner_cost += (2*ncopies) * TARG_VEC_LOAD_COST;
>
>so you're using TARG_VEC_LOAD_COST also for the cost of the realignment
op?
>I think is should be a TARG_VEC_STMT_COST.

Fixed.


Added a check for the dynamically trip counted loops in
vect_do_peeling_for_loop_bound.

Also, I have pulled out the check for ADJUST_IN_EPILOG and the code
enabled by it.

The other comments on typo's, naming and such have also been fixed.

Thanks,
Harsha

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 52430 bytes --]

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 123494)
+++ doc/invoke.texi	(working copy)
@@ -354,7 +354,7 @@
 -fcheck-data-deps @gol
 -ftree-dominator-opts -ftree-dse -ftree-copyrename -ftree-sink @gol
 -ftree-ch -ftree-sra -ftree-ter -ftree-fre -ftree-vectorize @gol
--ftree-vect-loop-version -ftree-salias -fipa-pta -fweb @gol
+-ftree-vect-loop-version -fvect-cost-model -ftree-salias -fipa-pta -fweb @gol
 -ftree-copy-prop -ftree-store-ccp -ftree-store-copy-prop -fwhole-program @gol
 --param @var{name}=@var{value}
 -O  -O0  -O1  -O2  -O3  -Os}
@@ -5528,6 +5528,9 @@
 to control which version is executed.  This option is enabled by default
 except at level @option{-Os} where it is disabled.
 
+@item -fvect-cost-model
+Enable cost model for vectorization.
+
 @item -ftree-vrp
 Perform Value Range Propagation on trees.  This is similar to the
 constant propagation pass, but instead of values, ranges of values are
Index: testsuite/gcc.dg/vect/vect-105.c
===================================================================
--- testsuite/gcc.dg/vect/vect-105.c	(revision 123494)
+++ testsuite/gcc.dg/vect/vect-105.c	(working copy)
@@ -60,7 +60,7 @@
   return main1 (N);
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
 /*  { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 2 "vect" { target vect_no_align } } } */
 /* { dg-final { scan-tree-dump-times "possible dependence between data-refs" 0 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-vect-105.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-vect-105.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-vect-105.c	(revision 0)
@@ -0,0 +1,67 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdlib.h>
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 4
+
+struct extraction
+{
+  int a[N][N];
+  int b[N][N];
+};
+
+static int a[N][N] = {{1,2,3,11},{4,5,6,12},{7,8,9,13},{34,45,67,83}};
+static int b[N][N] = {{17,28,15,23},{0,2,3,24},{4,31,82,25},{29,31,432,256}};
+static int c[N][N] = {{1,2,3,11},{4,9,13,34},{45,67,83,13},{34,45,67,83}};
+
+int main1 (int x) {
+  int i,j;
+  struct extraction *p;
+  p = (struct extraction *) malloc (sizeof (struct extraction));
+
+  for (i = 0; i < N; i++)
+   {
+    for (j = 0; j < N; j++)
+     {
+       p->a[i][j] = a[i][j];
+       p->b[i][j] = b[i][j];
+       if (x == 135)
+	 abort (); /* to avoid vectorization  */
+     }
+   }
+
+  /* Vectorizable: distance > number of iterations.  */
+  for (i = 1; i < N; i++)
+  {
+    for (j = 0; j < N; j++)
+    {
+       *((int *)p + x + i + j) = *((int *)p + x + i + j + 5);
+    }
+  }
+
+  /* check results: */
+  for (i = 0; i < N; i++)
+   {
+    for (j = 0; j < N; j++)
+     {
+       if (p->a[i][j] != c[i][j])
+         abort();
+     }
+  }
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+
+  return main1 (N);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/*  { dg-final { scan-tree-dump-times "Alignment of access forced using versioning" 2 "vect" { target vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "possible dependence between data-refs" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-92.c
===================================================================
--- testsuite/gcc.dg/vect/vect-92.c	(revision 123494)
+++ testsuite/gcc.dg/vect/vect-92.c	(working copy)
@@ -90,7 +90,8 @@
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 3 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-vect-reduc-1char.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-vect-reduc-1char.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-vect-reduc-1char.c	(revision 0)
@@ -0,0 +1,50 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+#define DIFF 242
+
+void
+main1 (unsigned char x, unsigned char max_result, unsigned char min_result)
+{
+  int i;
+  unsigned char ub[N] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  unsigned char udiff = 2;
+  unsigned char umax = x;
+  unsigned char umin = x;
+
+  for (i = 0; i < N; i++) {
+    udiff += (unsigned char)(ub[i] - uc[i]);
+  }
+
+  for (i = 0; i < N; i++) {
+    umax = umax < uc[i] ? uc[i] : umax;
+  }
+
+  for (i = 0; i < N; i++) {
+    umin = umin > uc[i] ? uc[i] : umin;
+  }
+
+  /* check results:  */
+  if (udiff != DIFF)
+    abort ();
+  if (umax != max_result)
+    abort ();
+  if (umin != min_result)
+    abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (100, 100, 1);
+  main1 (0, 15, 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-section-anchors-vect-69.c
===================================================================
--- testsuite/gcc.dg/vect/no-section-anchors-vect-69.c	(revision 123494)
+++ testsuite/gcc.dg/vect/no-section-anchors-vect-69.c	(working copy)
@@ -111,7 +111,8 @@
   return main1 ();
 } 
 
-/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 3 "vect" } }  */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 3 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-vect-76.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-vect-76.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-vect-76.c	(revision 0)
@@ -0,0 +1,74 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 8
+#define OFF 4
+
+/* Check handling of accesses for which the "initial condition" -
+   the expression that represents the first location accessed - is
+   more involved than just an ssa_name.  */
+
+int ib[N+OFF] __attribute__ ((__aligned__(16))) = {0, 1, 3, 5, 7, 11, 13, 17, 0, 2, 6, 10};
+
+int main1 (int *pib)
+{
+  int i;
+  int ia[N+OFF];
+  int ic[N+OFF] = {0, 1, 3, 5, 7, 11, 13, 17, 0, 2, 6, 10};
+
+  for (i = OFF; i < N; i++)
+    {
+      ia[i] = pib[i - OFF];
+    }
+
+
+  /* check results:  */
+  for (i = OFF; i < N; i++)
+    {
+     if (ia[i] != pib[i - OFF])
+        abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      ia[i] = pib[i - OFF];
+    }
+
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+     if (ia[i] != pib[i - OFF])
+        abort ();
+    }
+
+  for (i = OFF; i < N; i++)
+    {
+      ia[i] = ic[i - OFF];
+    }
+
+
+  /* check results:  */
+  for (i = OFF; i < N; i++)
+    {
+     if (ia[i] != ic[i - OFF])
+        abort ();  
+    }
+
+  return 0;  
+}
+
+int main (void)
+{
+  check_vect ();
+
+  main1 (&ib[OFF]);
+  return 0;
+}
+
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail vect_no_align } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-pr30843.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-pr30843.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-pr30843.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_long } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+
+void dacP98FillRGBMap (unsigned char *pBuffer)
+{
+    unsigned long dw, dw1;
+    unsigned long *pdw = (unsigned long *)(pBuffer);
+
+    for( dw = 256, dw1 = 0; dw; dw--, dw1 += 0x01010101) 
+    {
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: testsuite/gcc.dg/vect/vect-76.c
===================================================================
--- testsuite/gcc.dg/vect/vect-76.c	(revision 123494)
+++ testsuite/gcc.dg/vect/vect-76.c	(working copy)
@@ -69,6 +69,7 @@
 }
 
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 2 "vect" { xfail vect_no_align } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-no-section-anchors-vect-69.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-no-section-anchors-vect-69.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-no-section-anchors-vect-69.c	(revision 0)
@@ -0,0 +1,117 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 8
+
+struct s{
+  int m;
+  int n[N][N][N];
+};
+
+struct s2{
+  int m;
+  int n[N-1][N-1][N-1];
+};
+
+struct test1{
+  struct s a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s e; /* array e.n is aligned */
+};
+
+struct test2{
+  struct s2 a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s2 e; /* array e.n is aligned */
+};
+
+
+struct test1 tmp1[4];
+struct test2 tmp2[4];
+
+int main1 ()
+{  
+  int i,j;
+
+  /* 1. unaligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1[2].a.n[1][2][i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N; i++)
+    {
+      if (tmp1[2].a.n[1][2][i] != 5)
+        abort ();
+    }
+
+  /* 2. aligned */
+  for (i = 3; i < N-1; i++)
+    {
+      tmp1[2].a.n[1][2][i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 3; i < N-1; i++)
+    {
+      if (tmp1[2].a.n[1][2][i] != 6)
+        abort ();
+    }
+
+  /* 3. aligned */
+  for (i = 0; i < N; i++)
+    {
+      for (j = 0; j < N; j++)
+	{
+          tmp1[2].e.n[1][i][j] = 8;
+	}
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      for (j = 0; j < N; j++)
+	{
+          if (tmp1[2].e.n[1][i][j] != 8)
+	    abort ();
+	}
+    }
+
+  /* 4. unaligned */
+  for (i = 0; i < N-4; i++)
+    {
+      for (j = 0; j < N-4; j++)
+	{
+          tmp2[2].e.n[1][i][j] = 8;
+	}
+    }
+
+  /* check results:  */
+  for (i = 0; i < N-4; i++)
+    {
+      for (j = 0; j < N-4; j++)
+	{
+          if (tmp2[2].e.n[1][i][j] != 8)
+	    abort ();
+	}
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 3 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/pr30843.c
===================================================================
--- testsuite/gcc.dg/vect/pr30843.c	(revision 123494)
+++ testsuite/gcc.dg/vect/pr30843.c	(working copy)
@@ -20,6 +20,6 @@
     }
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_interleave } } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { vect_interleave && !i?86-*-* } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
Index: testsuite/gcc.dg/vect/no-costmodel-vect-70.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-vect-70.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-vect-70.c	(revision 0)
@@ -0,0 +1,67 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 12
+
+struct s{
+  int m;
+  int n[N][N][N];
+};
+
+struct test1{
+  struct s a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s e[N]; /* array e.n is aligned */
+};
+
+int main1 ()
+{
+  int i,j;
+  struct test1 tmp1;
+
+  for (i = 0; i < N; i++)
+    for (j = 3; j < N-3; j++)
+      {
+        tmp1.e[i].n[1][2][j] = 8;
+      }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    for (j = 3; j < N-3; j++)
+    {
+      if (tmp1.e[i].n[1][2][j] != 8)
+          abort ();
+    }
+  
+  /* not consecutive */
+  for (i = 0; i < N; i++)
+    for (j = 3; j < N-3; j++)
+      { 
+        tmp1.e[j].n[1][2][j] = 8;
+      }
+  
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    for (j = 3; j < N-3; j++)
+    {
+      if (tmp1.e[j].n[1][2][j] != 8)
+          abort ();
+    }
+    
+  return 0;
+}
+       
+int main (void)
+{
+  check_vect ();
+    
+  return main1 ();
+}
+          
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-70.c
===================================================================
--- testsuite/gcc.dg/vect/vect-70.c	(revision 123494)
+++ testsuite/gcc.dg/vect/vect-70.c	(working copy)
@@ -61,7 +61,7 @@
   return main1 ();
 }
           
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } }  */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-reduc-1char.c
===================================================================
--- testsuite/gcc.dg/vect/vect-reduc-1char.c	(revision 123494)
+++ testsuite/gcc.dg/vect/vect-reduc-1char.c	(working copy)
@@ -46,5 +46,6 @@
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" { xfail { vect_no_int_max } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect.exp
===================================================================
--- testsuite/gcc.dg/vect/vect.exp	(revision 123494)
+++ testsuite/gcc.dg/vect/vect.exp	(working copy)
@@ -25,6 +25,12 @@
 # These flags are used for all targets.
 lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize"
 
+# Set up cost model enablement flag.  
+set COSTMODEL_ON_FLAG "-fvect-cost-model"
+
+# Set up cost model disablement flag.
+set COSTMODEL_OFF_FLAG "-fno-vect-cost-model"
+
 # If the target system supports vector instructions, the default action
 # for a test is 'run', otherwise it's 'compile'.  Save current default.
 # Executing vector instructions on a system without hardware vector support
@@ -88,9 +94,13 @@
 
 # Main loop.
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/pr*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-costmodel-pr*.\[cS\]]]  \
+	$COSTMODEL_OFF_FLAG $DEFAULT_VECTCFLAGS
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vect-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS 
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-costmodel-vect-*.\[cS\]]]  \
+	$COSTMODEL_OFF_FLAG $DEFAULT_VECTCFLAGS
 
 #### Tests with special options
 global SAVED_DEFAULT_VECTCFLAGS
@@ -100,78 +110,82 @@
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-ffast-math"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/fast-math-vect*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fno-math-errno tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-math-errno"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-math-errno-vect*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fwrapv tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fwrapv"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/wrapv-vect*.\[cS\]]]  \
-        "" $DEFAULT_VECTCFLAGS
+        $COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-costmodel-wrapv-vect*.\[cS\]]]  \
+        $COSTMODEL_OFF_FLAG $DEFAULT_VECTCFLAGS
 
 # -ftrapv tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-ftrapv"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/trapv-vect*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fdump-tree-dceloop-details tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fdump-tree-dceloop-details"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/dump-tree-dceloop-*.\[cS\]]]  \
-        "" $DEFAULT_VECTCFLAGS
+        $COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fno-tree-dce tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-tree-dce"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-dce-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fsection-anchors tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fsection-anchors"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/section-anchors-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fno-section-anchors tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-section-anchors"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-section-anchors-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-costmodel-no-section-anchors-*.\[cS\]]]  \
+	$COSTMODEL_OFF_FLAG $DEFAULT_VECTCFLAGS
 
 # -funswitch-loops tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-funswitch-loops"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/unswitch-loops-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fno-trapping-math tests
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-trapping-math"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-trapping-math-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fno-tree-scev-cprop
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-scev-cprop-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # -fno-tree-dominator-opts
 set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
 lappend DEFAULT_VECTCFLAGS "-fno-tree-dominator-opts"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-dom-*.\[cS\]]]  \
-	"" $DEFAULT_VECTCFLAGS
+	$COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # With -Os
 lappend DEFAULT_VECTCFLAGS "-Os"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/Os-vect-*.\[cS\]]]  \
-        "" $DEFAULT_VECTCFLAGS
+        $COSTMODEL_ON_FLAG $DEFAULT_VECTCFLAGS
 
 # Clean up.
 set dg-do-what-default ${save-dg-do-what-default}
Index: testsuite/gcc.dg/vect/no-costmodel-wrapv-vect-reduc-2char.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-wrapv-vect-reduc-2char.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-wrapv-vect-reduc-2char.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+#define DIFF 121
+
+void main1 (signed char x, signed char max_result, signed char min_result)
+{
+  int i;
+  signed char b[N] = {1,2,3,6,8,10,12,14,16,18,20,22,24,26,28,30};
+  signed char c[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  signed char diff = 2;
+  signed char max = x;
+  signed char min = x;
+
+  for (i = 0; i < N; i++) {
+    diff += (signed char)(b[i] - c[i]);
+  }
+
+  for (i = 0; i < N; i++) {
+    max = max < c[i] ? c[i] : max;
+  }
+
+  for (i = 0; i < N; i++) {
+    min = min > c[i] ? c[i] : min;
+  }
+
+  /* check results:  */
+  if (diff != DIFF)
+    abort ();
+  if (max != max_result)
+    abort ();
+  if (min != min_result)
+    abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (100, 100, 1);
+  main1 (0, 15, 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-vect-66.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-vect-66.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-vect-66.c	(revision 0)
@@ -0,0 +1,82 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 16
+
+int main1 ()
+{
+  int i, j;
+  int ib[6] = {0,3,6,9,12,15};
+  int ia[8][5][6];
+  int ic[16][16][5][6];
+
+  /* Multidimensional array. Aligned. */
+  for (i = 0; i < 16; i++)
+    {
+      for (j = 0; j < 4; j++)
+        {
+           ia[2][6][j] = 5;
+        }
+    }
+
+  /* check results: */  
+  for (i = 0; i < 16; i++)
+    {
+      for (j = 0; j < 4; j++)
+        {
+           if (ia[2][6][j] != 5)
+                abort();
+        }
+    }
+  /* Multidimensional array. Aligned. */
+  for (i = 0; i < 16; i++)
+    {
+      for (j = 0; j < 4; j++)
+           ia[3][6][j+2] = 5;
+    }
+
+  /* check results: */  
+  for (i = 0; i < 16; i++)
+    {
+      for (j = 2; j < 6; j++)
+        {
+           if (ia[3][6][j] != 5)
+                abort();
+        }
+    }
+
+  /* Multidimensional array. Not aligned. */
+  for (i = 0; i < 16; i++)
+    {
+      for (j = 0; j < 4; j++)
+        {
+           ic[2][1][6][j+1] = 5;
+        }
+    }
+
+  /* check results: */  
+  for (i = 0; i < 16; i++)
+    {
+      for (j = 0; j < 4; j++)
+        {
+           if (ic[2][1][6][j+1] != 5)
+                abort();
+        }
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-66.c
===================================================================
--- testsuite/gcc.dg/vect/vect-66.c	(revision 123494)
+++ testsuite/gcc.dg/vect/vect-66.c	(working copy)
@@ -76,7 +76,8 @@
   return main1 ();
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-costmodel-vect-92.c
===================================================================
--- testsuite/gcc.dg/vect/no-costmodel-vect-92.c	(revision 0)
+++ testsuite/gcc.dg/vect/no-costmodel-vect-92.c	(revision 0)
@@ -0,0 +1,96 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+float pa[N] __attribute__ ((__aligned__(16)));
+float pb[N] __attribute__ ((__aligned__(16))) = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57};
+float pc[N] __attribute__ ((__aligned__(16))) = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19};
+
+/* Check handling of unaligned accesses when the misalignment is
+   known at compile time and different accesses have the same
+   misalignment (e.g. peeling to align one access will align all
+   accesses with the same misalignment.  Also, the number of 
+   peeled iterations is known in this case, and the vectorizer
+   can use this information (generate prolog and epilog loops
+   with known number of iterations, and only if needed).  */
+
+int
+main1 ()
+{
+  int i;
+
+  for (i = 0; i < 5; i++)
+    {
+      pa[i+1] = pb[i+1] * pc[i+1];
+    }
+
+  /* check results:  */
+  for (i = 0; i < 5; i++)
+    {
+      if (pa[i+1] != (pb[i+1] * pc[i+1]))
+	abort ();
+    }
+
+  return 0;
+}
+
+int
+main2 ()
+{
+  int i;
+
+  for (i = 0; i < 6; i++)
+    {
+      pa[i+1] = pb[i+1] * pc[i+1];
+    }
+
+  /* check results:  */
+  for (i = 0; i < 6; i++)
+    {
+      if (pa[i+1] != (pb[i+1] * pc[i+1]))
+	abort ();
+    }
+
+  return 0;
+}
+
+int
+main3 (int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    {
+      pa[i+1] = pb[i+1] * pc[i+1];
+    }
+
+  /* check results:  */
+  for (i = 0; i < n; i++)
+    {
+      if (pa[i+1] != (pb[i+1] * pc[i+1]))
+	abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{
+  int i;
+
+  check_vect ();
+
+  main1 ();
+  main2 ();
+  main3 (N-1);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 3 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/wrapv-vect-reduc-2char.c
===================================================================
--- testsuite/gcc.dg/vect/wrapv-vect-reduc-2char.c	(revision 123494)
+++ testsuite/gcc.dg/vect/wrapv-vect-reduc-2char.c	(working copy)
@@ -45,5 +45,6 @@
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" { xfail { vect_no_int_max } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 123494)
+++ tree-vectorizer.c	(working copy)
@@ -1367,6 +1367,8 @@
   else
     STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
+  STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
   DR_GROUP_FIRST_DR (res) = NULL_TREE;
   DR_GROUP_NEXT_DR (res) = NULL_TREE;
   DR_GROUP_SIZE (res) = 0;
Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 123494)
+++ tree-vectorizer.h	(working copy)
@@ -264,6 +264,13 @@
   /* For loads only, if there is a store with the same location, this field is
      TRUE.  */
   bool read_write_dep;
+
+  /* Vectorization costs associated with statement.  */
+  struct  
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+  } cost;
 } *stmt_vec_info;
 
 /* Access Functions.  */
@@ -296,7 +303,31 @@
 #define DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
 
 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_loop)
+#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
+#define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
 
+/* These are some defines for the initial implementation of the vectorizer's
+   cost model.  These will later be target specific hooks.  */
+
+/* Cost of conditional branch.  */
+#define TARG_COND_BRANCH_COST        3
+
+/* Cost of any vector operation, excluding load, store or vector to scalar
+   operation.  */ 
+#define TARG_VEC_STMT_COST           1
+
+/* Cost of vector to scalar operation.  */
+#define TARG_VEC_TO_SCALAR_COST      1
+
+/* Cost of aligned vector load.  */
+#define TARG_VEC_LOAD_COST           1
+
+/* Cost of misaligned vector load.  */
+#define TARG_VEC_UNALIGNED_LOAD_COST 2
+
+/* Cost of vector store.  */
+#define TARG_VEC_STORE_COST          1
+
 static inline void set_stmt_info (stmt_ann_t ann, stmt_vec_info stmt_info);
 static inline stmt_vec_info vinfo_for_stmt (tree stmt);
 
@@ -428,6 +459,7 @@
 extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
+extern int  vect_estimate_min_profitable_iters (loop_vec_info);
 /* Driver for transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 
Index: tree-vect-analyze.c
===================================================================
--- tree-vect-analyze.c	(revision 123494)
+++ tree-vect-analyze.c	(working copy)
@@ -298,6 +298,8 @@
   tree phi;
   stmt_vec_info stmt_info;
   bool need_to_vectorize = false;
+  int min_profitable_iters;
+  int min_scalar_loop_bound;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_analyze_operations ===");
@@ -419,8 +421,6 @@
 	} /* stmts in bb */
     } /* bbs */
 
-  /* TODO: Analyze cost. Decide if worth while to vectorize.  */
-
   /* All operations in the loop are either irrelevant (deal with loop
      control, or dead), or only used outside the loop and can be moved
      out of the loop (e.g. invariants, inductions).  The loop can be 
@@ -444,16 +444,79 @@
         vectorization_factor, LOOP_VINFO_INT_NITERS (loop_vinfo));
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && ((LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor)
-	  || (LOOP_VINFO_INT_NITERS (loop_vinfo) <=
-		((unsigned) (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) 
-					   * vectorization_factor))))
+      && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
     {
       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
-	fprintf (vect_dump, "not vectorized: iteration count too small.");
+        fprintf (vect_dump, "not vectorized: iteration count too small.");
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump,
+                 "not vectorized: iteration count smaller than vectorization factor.");
       return false;
     }
 
+  /* Analyze cost. Decide if worth while to vectorize.  */
+
+  min_profitable_iters = vect_estimate_min_profitable_iters (loop_vinfo);
+
+  if (min_profitable_iters < 0)
+    {
+      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+        fprintf (vect_dump, 
+                 "not vectorized: vectorization not profitable.");
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, 
+                 "not vectorized: minimum profitable iterations greater than vectorization factor.");
+      return false;
+    }
+
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND))
+                          * vectorization_factor;
+
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+    {
+      /* Cost model is enabled.  */
+      if (min_profitable_iters)
+        {
+          /* If run-time loop bound param is defined and is more conservative
+             than minimum profitable iterations determined by the cost model,
+             then use the run-time loop bound param as the lower bound to check
+             iteration count.  Else use the minimum profitable iterations
+             determined by the cost model as the lower bound.  */
+
+          if ((min_scalar_loop_bound 
+               && (min_scalar_loop_bound < min_profitable_iters)
+               && (LOOP_VINFO_INT_NITERS (loop_vinfo)
+                   < ((unsigned) min_scalar_loop_bound)))
+              || (LOOP_VINFO_INT_NITERS (loop_vinfo)
+                  < ((unsigned) min_profitable_iters)))
+            {
+              if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))	      
+                fprintf (vect_dump,
+                         "not vectorized: vectorization not profitable.");
+              if (vect_print_dump_info (REPORT_DETAILS))	      
+                fprintf (vect_dump,
+                         "not vectorized: iteration count smaller than run-time loop bound parameter or minimum profitable iterations (whichever is more conservative).");
+	      return false;
+            }          
+        }
+      /* Cost model is disabled.  */
+      else
+        {
+          /* Check relative to run-time loop bound param.  */
+          if (LOOP_VINFO_INT_NITERS (loop_vinfo) 
+              < ((unsigned) min_scalar_loop_bound))
+            {
+              if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+                fprintf (vect_dump,
+                         "not vectorized: iteration count smaller than run-time loop bound parameter.");
+              if (vect_print_dump_info (REPORT_DETAILS))
+                fprintf (vect_dump,
+                         "not vectorized: iteration count smaller than run-time loop bound parameter.");
+              return false;
+            }
+        }
+    }
+
   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       || LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0
       || LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
Index: common.opt
===================================================================
--- common.opt	(revision 123494)
+++ common.opt	(working copy)
@@ -1101,6 +1101,10 @@
 Common Report Var(flag_tree_vectorize) Optimization
 Enable loop vectorization on trees
 
+fvect-cost-model
+Common Report Var(flag_vect_cost_model) Optimization
+Enable use of cost model in vectorization
+
 ftree-vect-loop-version
 Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
 Enable loop versioning when doing loop vectorization on trees
Index: tree-vect-transform.c
===================================================================
--- tree-vect-transform.c	(revision 123494)
+++ tree-vect-transform.c	(working copy)
@@ -74,6 +74,359 @@
 static int vect_min_worthwhile_factor (enum tree_code);
 
 
+/* Function vect_estimate_min_profitable_iters
+
+   Return the number of iterations required for the vector version of the
+   loop to be profitable relative to the cost of the scalar version of the
+   loop.  */
+
+int
+vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo)
+{
+  int i;
+  int min_profitable_iters;
+  div_t overhead;
+  int peel_iters;
+  int vec_inside_cost = 0;
+  int vec_outside_cost = 0;
+  int scalar_single_iter_cost = 0;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+  int nbbs = loop->num_nodes;
+
+  /* Cost model disabled.  */
+  if (!flag_vect_cost_model)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model disabled\n");      
+      return 0;
+    }
+
+  /* Requires a loop to finish up remaining iterations after
+     vector loop.  Add additional cost for the test.  */
+  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+    vec_outside_cost += TARG_COND_BRANCH_COST;
+
+  /* Requires loop versioning tests.  */
+  /* FIXME: Make cost depend on number of stmts in may_misalign list.  */
+  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
+    vec_outside_cost += TARG_COND_BRANCH_COST;
+
+  /* Requires 2 guards for each peel loop.  */
+  if (LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
+    vec_outside_cost += 2*TARG_COND_BRANCH_COST;
+
+  /* Count statements in scalar loop.  Using this as scalar cost for a single
+     iteration for now.  */
+  /* TODO: Add outer loop support.  */
+  for (i = 0; i < nbbs; i++)
+    {
+      block_stmt_iterator si;
+      basic_block bb = bbs[i];
+
+      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+        {
+          tree stmt = bsi_stmt (si);
+          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+          if (!STMT_VINFO_RELEVANT_P (stmt_info)
+              && !STMT_VINFO_LIVE_P (stmt_info))
+            continue;
+          scalar_single_iter_cost++;
+          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
+          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
+        }
+    }
+
+  /* Add in costs of peeled instructions, if any.  */
+  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  /* TODO: Support if number of peeled iters is only known at run time.  */
+  if (peel_iters > 0)
+    vec_outside_cost += peel_iters * scalar_single_iter_cost;
+
+  /* Calculate number of iterations required to make the vector version 
+     profitable, relative to the loop bodies only.  */
+  if (scalar_single_iter_cost > vec_inside_cost)
+    min_profitable_iters = 1;
+  else
+    {
+      div_t min_tmp = div (vec_inside_cost, scalar_single_iter_cost);
+      min_profitable_iters = min_tmp.rem == 0 
+                             ? min_tmp.quot : min_tmp.quot + 1;
+    }
+
+  /* If the minimum number of iterations required for loop body profitability
+     is greater than the vectorization factor, then the vector version can
+     never be profitable.  */
+  if (min_profitable_iters > vf)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: min profitable iterations %d greater than VF %d\n", min_profitable_iters, vf);
+      return -1;
+    }
+  else
+    {
+      /* Now we've determined the minimum number of iterations required for the
+         body of the vector loop to be profitable.  We must now determine how 
+         many more iterations are required to overcome the additional overhead
+         that the vector loop requires, if any.  The overhead is contained in
+         the "outside of loop" costs gathered above.  */
+      if ((min_profitable_iters * scalar_single_iter_cost) <
+          (vec_inside_cost + vec_outside_cost))
+        {
+          overhead = div (vec_outside_cost, scalar_single_iter_cost);
+
+          if (overhead.rem > 0)
+            min_profitable_iters += (overhead.quot + 1);
+          else
+            min_profitable_iters += overhead.quot;
+        }
+    }
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Cost model analysis: \n");
+      fprintf (vect_dump, "  Vector inside of loop cost: %d\n",
+	       vec_inside_cost);
+      fprintf (vect_dump, "  Vector outside of loop cost: %d\n",
+	       vec_outside_cost);
+      fprintf (vect_dump, "  Scalar cost: %d\n", scalar_single_iter_cost);
+      fprintf (vect_dump, "  Calculated minimum iters for profitability: %d\n",
+	       min_profitable_iters);
+      fprintf (vect_dump, "  Actual minimum iters for profitability: %d\n",
+	       min_profitable_iters < vf ? vf : min_profitable_iters);
+    }
+
+  return min_profitable_iters < vf ? vf : min_profitable_iters;
+}
+
+
+/* TODO: Close dependency between vect_model_*_cost and vectorizable_* 
+   functions. Design better to avoid maintainence issues.  */
+    
+/* Function vect_model_reduction_cost.  
+
+   Models cost for a reduction operation, including the vector ops 
+   generated within the strip-mine loop, the initial definition before
+   the loop, and the epilog code that must be generated.  */
+
+static void
+vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
+			   int ncopies)
+{
+  int outer_cost = 0;
+  enum tree_code code;
+  optab optab;
+  tree vectype;
+  tree orig_stmt;
+  tree reduction_op;
+  enum machine_mode mode;
+  tree operation = GIMPLE_STMT_OPERAND (STMT_VINFO_STMT (stmt_info), 1);
+  int op_type = TREE_CODE_LENGTH (TREE_CODE (operation));
+
+  /* Cost of reduction op inside loop.  */
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) += ncopies * TARG_VEC_STMT_COST;
+
+  reduction_op = TREE_OPERAND (operation, op_type-1);
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
+  mode = TYPE_MODE (vectype);
+  orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
+  if (!orig_stmt) 
+    orig_stmt = STMT_VINFO_STMT (stmt_info);
+
+  code = TREE_CODE (GIMPLE_STMT_OPERAND (orig_stmt, 1));
+
+  /* Add in cost for initial definition.  */
+  outer_cost += TARG_VEC_STMT_COST;
+
+  /* Determine cost of epilog code.  */
+  if (reduc_code < NUM_TREE_CODES) 
+    /* We have a reduction operator that will reduce the vector in one 
+       statement.  Also requires scalar extract.  */
+    outer_cost += TARG_VEC_STMT_COST + TARG_VEC_TO_SCALAR_COST;
+  else 
+    {
+      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+      tree bitsize =
+	TYPE_SIZE (TREE_TYPE ( GIMPLE_STMT_OPERAND (orig_stmt, 0)));
+      int element_bitsize = tree_low_cst (bitsize, 1);
+      int nelements = vec_size_in_bits / element_bitsize;
+
+      optab = optab_for_tree_code (code, vectype);
+
+      /* We have a whole vector shift available.  */
+      if (!VECTOR_MODE_P (mode) 
+          || optab->handlers[mode].insn_code == CODE_FOR_nothing)
+        {
+	  /* Final reduction via vector shifts and the reduction operator.  
+	     Also requires scalar extract.  */
+	  outer_cost += ((exact_log2(nelements) * 2 + 1) * TARG_VEC_STMT_COST);
+        } 
+      else
+	{
+	  /* Use extracts and reduction op for final reduction.  For N 
+	     elements, we have N extracts and N-1 reduction ops.  */
+	  outer_cost += ((nelements + nelements - 1) * TARG_VEC_STMT_COST);
+	}
+    }
+
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+}
+
+
+/* Function vect_model_simple_cost.  
+
+   Models cost for simple operations, i.e. those that only emit ncopies of a 
+   single op.  Right now, this does not account for multiple insns that could
+   be generated for the single vector op.  We will handle that shortly.  */
+
+static void
+vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+}
+
+
+/* Function vect_cost_strided_group_size 
+ 
+   For strided load or store, return the group_size only if it is the first
+   load or store of a group, else return 1.  This ensures that group size is
+   only returned once per group.  */
+
+static int
+vect_cost_strided_group_size (stmt_vec_info stmt_info)
+{
+  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+
+  if (first_stmt == STMT_VINFO_STMT (stmt_info))
+    return DR_GROUP_SIZE (stmt_info);
+
+  return 1;
+}
+
+
+/* Function vect_model_store_cost
+
+   Models cost for stores.  In the case of strided accesses, the last access
+   has the overhead of the strided access attributed to it.  */
+
+static void
+vect_model_store_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  int cost = 0;
+  int group_size;
+
+  /* Strided access?  */
+
+  if (DR_GROUP_FIRST_DR (stmt_info)) 
+    {
+      group_size = vect_cost_strided_group_size (stmt_info);
+    }
+  else 
+    /* Not a strided access.  */
+    group_size = 1;
+
+  /* Is this an access in a group of stores, which provide strided access?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses a high and low interleave operation for each needed permute.  */
+      cost = ncopies * exact_log2(group_size) * group_size 
+             * TARG_VEC_STMT_COST;	
+    }
+
+  /* Costs of the stores.  */
+  cost += ncopies * TARG_VEC_STORE_COST;
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+}
+
+
+/* Function vect_model_load_cost
+
+   Models cost for loads.  In the case of strided accesses, the last access
+   has the overhead of the strided access attributed to it.  Since unaligned
+   accesses are supported for loads, we also account for the costs of the 
+   access scheme chosen.  */
+
+static void
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies)
+		 
+{
+  int inner_cost = 0;
+  int group_size;
+  int alignment_support_cheme;
+  tree first_stmt;
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
+
+  /* Strided accesses?  */
+  first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+  if (first_stmt)
+    {
+      group_size = vect_cost_strided_group_size (stmt_info);
+      first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
+    }
+  else
+    {
+      /* Not a strided access.  */
+      group_size = 1;
+      first_dr = dr;
+    }
+  alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
+
+  /* Is this an access in a group of loads providing strided access?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses an even and odd extract operations for each needed permute.  */
+      inner_cost = ncopies * exact_log2(group_size) * group_size *
+                   TARG_VEC_STMT_COST;
+    }
+
+  /* The loads themselves.  */
+  switch (alignment_support_cheme)
+    {
+    case dr_aligned:
+      inner_cost += ncopies * TARG_VEC_LOAD_COST;
+      break;
+
+    case dr_unaligned_supported:
+      /* Here, we assign an additional cost for the unaligned load.  */
+      inner_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST;
+
+    case dr_unaligned_software_pipeline:
+      {
+        int outer_cost = 0;
+
+        /* Unaligned software pipeline has a load of an address, an initial
+           load, and possibly a mask operation to "prime" the loop. However,
+           if this is an access in a group of loads, which provide strided
+           acccess, then the above cost should only be considered for one
+           access in the group. Inside the loop, there is a load op
+           and a realignment op.  */
+
+        if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1)
+          {
+            outer_cost = 2*TARG_VEC_STMT_COST;
+            if (targetm.vectorize.builtin_mask_for_load)
+              outer_cost += TARG_VEC_STMT_COST;
+          }
+        
+        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+
+        inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
+        break;
+      }
+
+    default:
+      gcc_unreachable ();
+    }
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+}
+
+
 /* Function vect_get_new_vect_var.
 
    Returns a name for a new variable. The current naming scheme appends the 
@@ -1695,6 +2048,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+      vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies);
       return true;
     }
 
@@ -1889,9 +2243,13 @@
 
   gcc_assert (ZERO_SSA_OPERANDS (stmt, SSA_OP_ALL_VIRTUALS));
 
+  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	     / TYPE_VECTOR_SUBPARTS (vectype_out));
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -1900,8 +2258,6 @@
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "transform operation.");
 
-  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	     / TYPE_VECTOR_SUBPARTS (vectype_out));
   gcc_assert (ncopies >= 1);
 
   /* Handle def.  */
@@ -2158,6 +2514,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2357,6 +2714,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2581,6 +2939,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
                                                                                 
@@ -2795,6 +3154,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
+      vect_model_simple_cost (stmt_info, 2*ncopies);
       return true;
     }
 
@@ -3102,14 +3462,12 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
+      vect_model_store_cost (stmt_info, ncopies);
       return true;
     }
 
   /** Transform.  **/
 
-  if (vect_print_dump_info (REPORT_DETAILS))
-    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
-
   if (strided_store)
     {
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -3134,6 +3492,9 @@
       group_size = 1;
     }
   
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
+
   dr_chain = VEC_alloc (tree, heap, group_size);
   oprnds = VEC_alloc (tree, heap, group_size);
 
@@ -3764,14 +4125,15 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
+      vect_model_load_cost (stmt_info, ncopies);
       return true;
     }
 
-  /** Transform.  **/
-
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "transform load.");
 
+  /** Transform.  **/
+
   if (strided_load)
     {
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -4654,6 +5016,8 @@
   basic_block preheader;
   int loop_num;
   unsigned int th;
+  int min_scalar_loop_bound;
+  int min_profitable_iters;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_do_peeling_for_loop_bound ===");
@@ -4669,11 +5033,37 @@
 				   &ratio_mult_vf_name, ratio);
 
   loop_num  = loop->num; 
-  /* Threshold for vectorized loop.  */
-  th = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) * 
-			LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  /* Analyze cost to set threshhold for vectorized loop.  */
+
+  min_profitable_iters = vect_estimate_min_profitable_iters (loop_vinfo);
+
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) * 
+                          LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  /* Cost model is enabled.  */
+  if (min_profitable_iters)
+    {
+      /* If run-time loop bound param is defined and is more conservative
+         than minimum profitable iterations determined by the cost model,
+         then use it as the lower bound to check iteration count.  Else
+         use the minimum profitable iterations determined by the cost
+         model as the lower bound.  */
+
+      if (min_scalar_loop_bound 
+          && min_scalar_loop_bound < min_profitable_iters)
+        th = min_scalar_loop_bound;
+      else
+        th = min_profitable_iters;
+    }
+  /* Cost model is disabled.  */
+  else
+    /* Set threshold to run-time loop bound param.  */
+    th = min_scalar_loop_bound;
+
   new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
-					    ratio_mult_vf_name, ni_name, false, th);
+                                            ratio_mult_vf_name, ni_name, false,
+                                            th);
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
 #ifdef ENABLE_CHECKING

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-22  5:45     ` Jagasia, Harsha
@ 2007-04-22  6:23       ` Andrew Pinski
  2007-04-22  9:09         ` Eric Christopher
  2007-04-22 15:57       ` Paul Brook
  2007-04-22 22:39       ` Dorit Nuzman
  2 siblings, 1 reply; 30+ messages in thread
From: Andrew Pinski @ 2007-04-22  6:23 UTC (permalink / raw)
  To: Jagasia, Harsha
  Cc: Dorit Nuzman, Linthicum, Tony, Meissner, Michael, GCC Patches

On 4/21/07, Jagasia, Harsha <harsha.jagasia@amd.com> wrote:
> For now, I have turned the cost model on in vect.exp for the vect-*
> tests. For every vect-* test that failed, I duplicated it in
> no-costmodel-vect-* test. For the duplicated test, I turned off the cost
> model to allow it to pass.

I rather you do the opposite in that you create a costmodel-vect-* and
the tests that already exist just turn off the cost model.  This is
better to compare testresults between the trunk (and 4.3) and the
earlier releases.

Tommorrow (the 22nd) sometime I will run this patch on spu-elf and
report the results of the testsuite (for the vect-* tests).

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-22  6:23       ` Andrew Pinski
@ 2007-04-22  9:09         ` Eric Christopher
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Christopher @ 2007-04-22  9:09 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Jagasia, Harsha, Dorit Nuzman, Linthicum, Tony, Meissner,
	Michael, GCC Patches


On Apr 21, 2007, at 11:07 PM, Andrew Pinski wrote:

> On 4/21/07, Jagasia, Harsha <harsha.jagasia@amd.com> wrote:
>> For now, I have turned the cost model on in vect.exp for the vect-*
>> tests. For every vect-* test that failed, I duplicated it in
>> no-costmodel-vect-* test. For the duplicated test, I turned off  
>> the cost
>> model to allow it to pass.
>
> I rather you do the opposite in that you create a costmodel-vect-* and
> the tests that already exist just turn off the cost model.  This is
> better to compare testresults between the trunk (and 4.3) and the
> earlier releases.

I concur here. We want the existing testcases to continue testing what
they were previously.

-eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-22  5:45     ` Jagasia, Harsha
  2007-04-22  6:23       ` Andrew Pinski
@ 2007-04-22 15:57       ` Paul Brook
  2007-04-22 22:39       ` Dorit Nuzman
  2 siblings, 0 replies; 30+ messages in thread
From: Paul Brook @ 2007-04-22 15:57 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jagasia, Harsha, Dorit Nuzman, Linthicum, Tony, Meissner, Michael


> For now, I have turned the cost model on in vect.exp for the vect-*
> tests. For every vect-* test that failed, I duplicated it in
> no-costmodel-vect-* test. For the duplicated test, I turned off the cost
> model to allow it to pass.
>
> For every vect-* test that failed, I then added a dg-final test for
> "vectorization not profitable" string. I removed the original dg-final
> test (instead of xfail) because the gcc wiki suggests that xfail should
> only be used for bugs that can not be fixed immediately. And the
> behavior under focus is not a bug, but a correct estimation on part of
> the cost model.

> I am relatively new to the dejagnu framework, so I would appreciate some
> input on whether this implementation is on the right track.

I'm worried about the amount of effort this approach is going to take to 
maintain, and whether it's going to scale to all the target gcc supports.

GCC already supports at many different SIMD capable implementations (x86, ppc, 
mips and cell/spu), arm will support it soon, and we (CodeSourey) have at 
least 1 local SIMD capable port. It's also more than possible that future 
implementations of these architectures (eg. a new x86 or ppc core) could 
change the cost model for the existing architectures.

Paul

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-22  5:45     ` Jagasia, Harsha
  2007-04-22  6:23       ` Andrew Pinski
  2007-04-22 15:57       ` Paul Brook
@ 2007-04-22 22:39       ` Dorit Nuzman
  2007-04-24 20:14         ` Jagasia, Harsha
  2 siblings, 1 reply; 30+ messages in thread
From: Dorit Nuzman @ 2007-04-22 22:39 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: GCC Patches, Meissner, Michael, Linthicum, Tony

Hi Harsha,

> I have attached a modified patch with changes for Dorit's feedback
> below. Comments inline below to discuss the changes.
>

Thanks for addressing the comments,

> The test that failed to vectorize on x86-64, because the cost model
> deemed it unprofitable, but passed on x86 is pr30843.c. This test uses
> an unsigned long which is 4 bytes on x86 and 8 bytes on x86-64. It turns
> on that the larger vectorization factor on x86 makes it profitable to
> vectorize this loop. And hence the dg-final test I added for the
> "vectorization not profitable" string passed on x86-64, but failed on
> x86. So I had to explicitly target it to not be checked on x86. If
> pr30843.c fails on other targets, they could also be added to this list.
>

I think this is probably a good example why this approach (enabling the
cost-model by default for the generic multi-target tests) would be hard to
maintain... sounds like this is what other people feel as well. I'm sorry
for asking you to consider switching to this approach - maybe we want to
revert back to the scheme in the original patch you sent after all...?

I can test the patch on powerpc-linux. Will try to do it early this week.


> +  /* Add in costs of peeled instructions, if any.  */
> +  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
> +  /* TODO: Support if number of peeled iters is only known at run time.
*/
> +  if (peel_iters > 0)
> +    vec_outside_cost += peel_iters * scalar_single_iter_cost;

So this means that when we don't know the value of peel_iters we assume it
is 0. Maybe we prefer to be more conservative and rather assume that it is
VF-1? (we know that peel_iters is between 0 and VF-1).

Also (something I haven't noticed before) - we'd probably want to do the
same thing to account for the cost of the scalar peel-loop we create to
align the number of iterations (i.e. when !LOOP_VINFO_NITERS_KNOWN_P or
when the loop count doesn't evenly divide by VF). Currently we only have:
> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +    vec_outside_cost += TARG_COND_BRANCH_COST;


But the thing that bothers me much more than this, is that I can't make
sense of the computation in vect_estimate_min_profitable_iters. AFAIU,
vect_estimate_min_profitable_iters computes the min_profitable_iters as
follows:
      (*) min_profitable_iters = VIC/SIC + VOC/SIC

where:
VIC = vector iteration cost
SIC = scalar iteration cost
VOC = vector outside cost

which looks wrong... (how come there's no dependency on VF?)

If you start from this equation:
      (**) SIC*niters > VIC*niters/VF + VOC

you end up with this:
      (***) min_profitable_iters = VOC/(SIC - VIC/VF)

(I somehow managed to convince myself that
vect_estimate_min_profitable_iters made sense when I first read it, but now
it just doesn't make sense to me anymore...)


The rest of the comments are really minor stuff:

> +  for (i = 0; i < nbbs; i++)
> +    {
> +      block_stmt_iterator si;
> +      basic_block bb = bbs[i];
> +
> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
> +        {
> +          tree stmt = bsi_stmt (si);
> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +          if (!STMT_VINFO_RELEVANT_P (stmt_info)
> +              && !STMT_VINFO_LIVE_P (stmt_info))
> +            continue;
> +          scalar_single_iter_cost++;
> +          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
> +          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST
(stmt_info);
> +        }
> +    }

It just occurred to me that it's a bit "unfair" how all scalar stmts have
the same cost (=1), and for vector stmts we have potentially different
costs for different kinds of stmt... It's ok for now, maybe with a TODO
comment to consider assigning different costs in the future.


> >This also reminds me that we need to consider what to do if the user
> asked
> >to vectorize when the loop-bound is greater than X (via --param
> >min-vect-loop-bound), but the cost model determined that it would be
> >profitable when the loop-bound is greater than Y, and X != Y. The
> --param
> >min-vect-loop-bound was originally added only until such time that we
> have
> >a cost model, but maybe we want to keep it around, at least until the
> cost
> >model has been tuned/matured. In the meantime we can decide to do one
> of
> >the following:
> >(1) always go with the user
> >(2) always go with the cost model
> >(3) go with the user if the user is more conservative than the
> cost-model
> >(i.e. X > Y)
> >(4) go with the user if the user is less conservative than the
> cost-model
> >(i.e. X < Y)
> >
> >The first option is equivalent to turning off the cost model with
> >-fno-vect-cost-model, so I'd go with one of 2,3,4.
>
> I have moved the code to vect_analyze_operations. Also for now I have
> chosen option 3. Down the road, I plan to test the cost model for
> performance with benchmarks like spec/netlib etc to see if some other
> choice would work better.
>

ok (not sure there's so many important vectorizable loops in spec, but at
least I remember the vectorizer degrades performance in eon, so that's
something to look at).


> +  /* Cost model is enabled.  */
> +  if (min_profitable_iters)
> +    {
> +      /* If run-time loop bound param is defined and is more
conservative
> +         than minimum profitable iterations determined by the cost
model,
> +         then use it as the lower bound to check iteration count.  Else
> +         use the minimum profitable iterations determined by the cost
> +         model as the lower bound.  */
> +
> +      if (min_scalar_loop_bound
> +          && min_scalar_loop_bound < min_profitable_iters)
> +        th = min_scalar_loop_bound;
> +      else
> +        th = min_profitable_iters;
> +    }
> +  /* Cost model is disabled.  */
> +  else
> +    /* Set threshold to run-time loop bound param.  */
> +    th = min_scalar_loop_bound;

My only comment here is that the loop bound param is also used to control
vectorization at compile time, so I think maybe calling it a "run-time loop
bound param" can be a bit misleading. A more accurate description would be
"user specified loop bound param". (This appears in another place in the
patch).


Just a coding style issue:

> +  /* Strided access?  */
> +
> +  if (DR_GROUP_FIRST_DR (stmt_info))
> +    {
> +      group_size = vect_cost_strided_group_size (stmt_info);
> +    }
> +  else
> +    /* Not a strided access.  */
> +    group_size = 1;

no need to use { } if you have only one stmt in the then clause.

Lets also add a TODO note somewhere to look into taking profile info or
range info into account if available.

thanks,
dorit

>
> Thanks,
> Harsha
> [attachment "patch" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-22 22:39       ` Dorit Nuzman
@ 2007-04-24 20:14         ` Jagasia, Harsha
  2007-04-25 11:49           ` Dorit Nuzman
  0 siblings, 1 reply; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-24 20:14 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: GCC Patches, Meissner, Michael, Linthicum, Tony

Hi Dorit,
>Thanks for addressing the comments,
:)

>> The test that failed to vectorize on x86-64, because the cost model
>> deemed it unprofitable, but passed on x86 is pr30843.c. This test
uses
>> an unsigned long which is 4 bytes on x86 and 8 bytes on x86-64. It
turns
>> on that the larger vectorization factor on x86 makes it profitable to
>> vectorize this loop. And hence the dg-final test I added for the
>> "vectorization not profitable" string passed on x86-64, but failed on
>> x86. So I had to explicitly target it to not be checked on x86. If
>> pr30843.c fails on other targets, they could also be added to this
list.
>>
>
>I think this is probably a good example why this approach (enabling the
>cost-model by default for the generic multi-target tests) would be hard
to
>maintain... sounds like this is what other people feel as well. I'm
sorry
>for asking you to consider switching to this approach - maybe we want
to
>revert back to the scheme in the original patch you sent after all...?

No worries; should be easy to revert back to the old approach. I will
create subdirectories for specific targets within testsuite/gcc.dg/vect.
And then add the tests for atleast x86 and x86_64. 

>I can test the patch on powerpc-linux. Will try to do it early this
week.

As you suggested in an offline mail to me, it may be a good idea to hold
off on the tests until I fix the patch for the issues you have brought
up below.

>
>
>> +  /* Add in costs of peeled instructions, if any.  */
>> +  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
>> +  /* TODO: Support if number of peeled iters is only known at run
time.
>*/
>> +  if (peel_iters > 0)
>> +    vec_outside_cost += peel_iters * scalar_single_iter_cost;
>
>So this means that when we don't know the value of peel_iters we assume
it
>is 0. Maybe we prefer to be more conservative and rather assume that it
is
>VF-1? (we know that peel_iters is between 0 and VF-1).

Just to clarify, are you recommending adding an else clause to "if
(peel_iters > 0)" and assuming that peel_iters is VF-1 in that code
path, until the real fix comes along?

>
>Also (something I haven't noticed before) - we'd probably want to do
the
>same thing to account for the cost of the scalar peel-loop we create to
>align the number of iterations (i.e. when !LOOP_VINFO_NITERS_KNOWN_P or
>when the loop count doesn't evenly divide by VF). Currently we only
have:
>> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> +    vec_outside_cost += TARG_COND_BRANCH_COST;

Ok, I will add the cost of the scalar peel-loop. I will also change the
condition in the snippet immediately above, which adds the cost of the
branch for the scalar peel loop, to alternatively check if the loop
count doesn't evenly divide by VF. Also I think it's better to use 2*
TARG_COND_BRANCH_COST because 2 guards would be generated.

>
>But the thing that bothers me much more than this, is that I can't make
>sense of the computation in vect_estimate_min_profitable_iters. AFAIU,
>vect_estimate_min_profitable_iters computes the min_profitable_iters as
>follows:
>      (*) min_profitable_iters = VIC/SIC + VOC/SIC
>
>where:
>VIC = vector iteration cost
>SIC = scalar iteration cost
>VOC = vector outside cost
>
>which looks wrong... (how come there's no dependency on VF?)
>
>If you start from this equation:
>      (**) SIC*niters > VIC*niters/VF + VOC
>
>you end up with this:
>      (***) min_profitable_iters = VOC/(SIC - VIC/VF)
>
>(I somehow managed to convince myself that
>vect_estimate_min_profitable_iters made sense when I first read it, but
now
>it just doesn't make sense to me anymore...)
>

We had some offline mails about this, but indeed what you have stated
here is correct. The current code is an approximation, but there are
cases where it won't yield the right answer.

I will use the *** equation to calculate the minimum profitable iters
and will also test if SIC > VIC/VF.


>The rest of the comments are really minor stuff:
>
>> +  for (i = 0; i < nbbs; i++)
>> +    {
>> +      block_stmt_iterator si;
>> +      basic_block bb = bbs[i];
>> +
>> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
>> +        {
>> +          tree stmt = bsi_stmt (si);
>> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> +          if (!STMT_VINFO_RELEVANT_P (stmt_info)
>> +              && !STMT_VINFO_LIVE_P (stmt_info))
>> +            continue;
>> +          scalar_single_iter_cost++;
>> +          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST
(stmt_info);
>> +          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST
>(stmt_info);
>> +        }
>> +    }
>
>It just occurred to me that it's a bit "unfair" how all scalar stmts
have
>the same cost (=1), and for vector stmts we have potentially different
>costs for different kinds of stmt... It's ok for now, maybe with a TODO
>comment to consider assigning different costs in the future.

I will add a TODO for now.

>
>> >This also reminds me that we need to consider what to do if the user
>> asked
>> >to vectorize when the loop-bound is greater than X (via --param
>> >min-vect-loop-bound), but the cost model determined that it would be
>> >profitable when the loop-bound is greater than Y, and X != Y. The
>> --param
>> >min-vect-loop-bound was originally added only until such time that
we
>> have
>> >a cost model, but maybe we want to keep it around, at least until
the
>> cost
>> >model has been tuned/matured. In the meantime we can decide to do
one
>> of
>> >the following:
>> >(1) always go with the user
>> >(2) always go with the cost model
>> >(3) go with the user if the user is more conservative than the
>> cost-model
>> >(i.e. X > Y)
>> >(4) go with the user if the user is less conservative than the
>> cost-model
>> >(i.e. X < Y)
>> >
>> >The first option is equivalent to turning off the cost model with
>> >-fno-vect-cost-model, so I'd go with one of 2,3,4.
>>
>> I have moved the code to vect_analyze_operations. Also for now I have
>> chosen option 3. Down the road, I plan to test the cost model for
>> performance with benchmarks like spec/netlib etc to see if some other
>> choice would work better.
>>
>
>ok (not sure there's so many important vectorizable loops in spec, but
at
>least I remember the vectorizer degrades performance in eon, so that's
>something to look at).

I was thinking of cpu2006; but I have no details to offer on whether any
degradation occurs yet or (perhaps outside the scope of the current cost
model) if any expected improvements are not observed. But, we can look
into eon as well.

>Lets also add a TODO note somewhere to look into taking profile info or
>range info into account if available.

Ok, I have added a comment before vect_estimate_min_profitable_iters.

Thanks,
Harsha


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-24 20:14         ` Jagasia, Harsha
@ 2007-04-25 11:49           ` Dorit Nuzman
  2007-04-25 15:37             ` Jagasia, Harsha
  2007-06-06 16:36             ` Jagasia, Harsha
  0 siblings, 2 replies; 30+ messages in thread
From: Dorit Nuzman @ 2007-04-25 11:49 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: GCC Patches, Meissner, Michael, Linthicum, Tony

"Jagasia, Harsha" <harsha.jagasia@amd.com> wrote on 24/04/2007 23:00:51:

> Hi Dorit,
> >Thanks for addressing the comments,
> :)
...
> >I think this is probably a good example why this approach (enabling the
> >cost-model by default for the generic multi-target tests) would be hard
> to
> >maintain... sounds like this is what other people feel as well. I'm
> sorry
> >for asking you to consider switching to this approach - maybe we want
> to
> >revert back to the scheme in the original patch you sent after all...?
>
> No worries; should be easy to revert back to the old approach. I will
> create subdirectories for specific targets within testsuite/gcc.dg/vect.
> And then add the tests for atleast x86 and x86_64.
>
> >I can test the patch on powerpc-linux. Will try to do it early this
> week.
>
> As you suggested in an offline mail to me, it may be a good idea to hold
> off on the tests until I fix the patch for the issues you have brought
> up below.
>

ok

> >
> >
> >> +  /* Add in costs of peeled instructions, if any.  */
> >> +  peel_iters = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
> >> +  /* TODO: Support if number of peeled iters is only known at run
> time.
> >*/
> >> +  if (peel_iters > 0)
> >> +    vec_outside_cost += peel_iters * scalar_single_iter_cost;
> >
> >So this means that when we don't know the value of peel_iters we assume
> it
> >is 0. Maybe we prefer to be more conservative and rather assume that it
> is
> >VF-1? (we know that peel_iters is between 0 and VF-1).
>
> Just to clarify, are you recommending adding an else clause to "if
> (peel_iters > 0)" and assuming that peel_iters is VF-1 in that code
> path, until the real fix comes along?
>

I guess I mean something like:

   /* FORNOW: If we don't know what is the value of peel_iters at compile
time - we assume the worst.
      TODO: Build an expression that represents peel_iters to be used in a
run-time test.  */
   if (peel_iters < 0)
     peel_iters = VF - 1;
   vec_outside_cost += peel_iters * scalar_single_iter_cost;


> >
> >Also (something I haven't noticed before) - we'd probably want to do
> the
> >same thing to account for the cost of the scalar peel-loop we create to
> >align the number of iterations (i.e. when !LOOP_VINFO_NITERS_KNOWN_P or
> >when the loop count doesn't evenly divide by VF). Currently we only
> have:
> >> +  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> >> +    vec_outside_cost += TARG_COND_BRANCH_COST;
>
> Ok, I will add the cost of the scalar peel-loop. I will also change the
> condition in the snippet immediately above, which adds the cost of the
> branch for the scalar peel loop, to alternatively check if the loop
> count doesn't evenly divide by VF.

you mean to do the following, right?:

      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
          || LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0
)
        vec_outside_cost += 2*TARG_COND_BRANCH_COST;

> Also I think it's better to use 2*
> TARG_COND_BRANCH_COST because 2 guards would be generated.
>

agreed.

thanks,

dorit

> >
> >But the thing that bothers me much more than this, is that I can't make
> >sense of the computation in vect_estimate_min_profitable_iters. AFAIU,
> >vect_estimate_min_profitable_iters computes the min_profitable_iters as
> >follows:
> >      (*) min_profitable_iters = VIC/SIC + VOC/SIC
> >
> >where:
> >VIC = vector iteration cost
> >SIC = scalar iteration cost
> >VOC = vector outside cost
> >
> >which looks wrong... (how come there's no dependency on VF?)
> >
> >If you start from this equation:
> >      (**) SIC*niters > VIC*niters/VF + VOC
> >
> >you end up with this:
> >      (***) min_profitable_iters = VOC/(SIC - VIC/VF)
> >
> >(I somehow managed to convince myself that
> >vect_estimate_min_profitable_iters made sense when I first read it, but
> now
> >it just doesn't make sense to me anymore...)
> >
>
> We had some offline mails about this, but indeed what you have stated
> here is correct. The current code is an approximation, but there are
> cases where it won't yield the right answer.
>
> I will use the *** equation to calculate the minimum profitable iters
> and will also test if SIC > VIC/VF.
>
>
> >The rest of the comments are really minor stuff:
> >
> >> +  for (i = 0; i < nbbs; i++)
> >> +    {
> >> +      block_stmt_iterator si;
> >> +      basic_block bb = bbs[i];
> >> +
> >> +      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
> >> +        {
> >> +          tree stmt = bsi_stmt (si);
> >> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >> +          if (!STMT_VINFO_RELEVANT_P (stmt_info)
> >> +              && !STMT_VINFO_LIVE_P (stmt_info))
> >> +            continue;
> >> +          scalar_single_iter_cost++;
> >> +          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST
> (stmt_info);
> >> +          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST
> >(stmt_info);
> >> +        }
> >> +    }
> >
> >It just occurred to me that it's a bit "unfair" how all scalar stmts
> have
> >the same cost (=1), and for vector stmts we have potentially different
> >costs for different kinds of stmt... It's ok for now, maybe with a TODO
> >comment to consider assigning different costs in the future.
>
> I will add a TODO for now.
>
> >
> >> >This also reminds me that we need to consider what to do if the user
> >> asked
> >> >to vectorize when the loop-bound is greater than X (via --param
> >> >min-vect-loop-bound), but the cost model determined that it would be
> >> >profitable when the loop-bound is greater than Y, and X != Y. The
> >> --param
> >> >min-vect-loop-bound was originally added only until such time that
> we
> >> have
> >> >a cost model, but maybe we want to keep it around, at least until
> the
> >> cost
> >> >model has been tuned/matured. In the meantime we can decide to do
> one
> >> of
> >> >the following:
> >> >(1) always go with the user
> >> >(2) always go with the cost model
> >> >(3) go with the user if the user is more conservative than the
> >> cost-model
> >> >(i.e. X > Y)
> >> >(4) go with the user if the user is less conservative than the
> >> cost-model
> >> >(i.e. X < Y)
> >> >
> >> >The first option is equivalent to turning off the cost model with
> >> >-fno-vect-cost-model, so I'd go with one of 2,3,4.
> >>
> >> I have moved the code to vect_analyze_operations. Also for now I have
> >> chosen option 3. Down the road, I plan to test the cost model for
> >> performance with benchmarks like spec/netlib etc to see if some other
> >> choice would work better.
> >>
> >
> >ok (not sure there's so many important vectorizable loops in spec, but
> at
> >least I remember the vectorizer degrades performance in eon, so that's
> >something to look at).
>
> I was thinking of cpu2006; but I have no details to offer on whether any
> degradation occurs yet or (perhaps outside the scope of the current cost
> model) if any expected improvements are not observed. But, we can look
> into eon as well.
>
> >Lets also add a TODO note somewhere to look into taking profile info or
> >range info into account if available.
>
> Ok, I have added a comment before vect_estimate_min_profitable_iters.
>
> Thanks,
> Harsha
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-25 11:49           ` Dorit Nuzman
@ 2007-04-25 15:37             ` Jagasia, Harsha
  2007-06-06 16:36             ` Jagasia, Harsha
  1 sibling, 0 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-25 15:37 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: GCC Patches, Meissner, Michael, Linthicum, Tony

>you mean to do the following, right?:
>
>      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>          || LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor
!= 0
>)
>        vec_outside_cost += 2*TARG_COND_BRANCH_COST;

Yes.

Thanks,
Harsha


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-25 11:49           ` Dorit Nuzman
  2007-04-25 15:37             ` Jagasia, Harsha
@ 2007-06-06 16:36             ` Jagasia, Harsha
  2007-06-07  2:09               ` Eric Christopher
  2007-06-07 19:55               ` Dorit Nuzman
  1 sibling, 2 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-06-06 16:36 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: GCC Patches, Meissner, Michael

[-- Attachment #1: Type: text/plain, Size: 2235 bytes --]

Hello,

>> >I can test the patch on powerpc-linux. Will try to do it early this
>> week.
>>
>> As you suggested in an offline mail to me, it may be a good idea to
hold
>> off on the tests until I fix the patch for the issues you have
brought
>> up below.
>>
>
>ok

I have attached the cost model patch. The test cases have been reduced
to a smaller subset which I think should give us a fairly good coverage
to start with. Currently I have only added the tests on x86 and x86-64. 

>you mean to do the following, right?:
>
>      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>          || LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor
!= 0
>)
>        vec_outside_cost += 2*TARG_COND_BRANCH_COST;
>
>> Also I think it's better to use 2*
>> TARG_COND_BRANCH_COST because 2 guards would be generated.
>>
>
>agreed.

With this patch, the branch and peel costs are calculated conservatively
if the alignment is unknown or if the loop iterations are unknown.


>> >But the thing that bothers me much more than this, is that I can't
make
>> >sense of the computation in vect_estimate_min_profitable_iters.
AFAIU,
>> >vect_estimate_min_profitable_iters computes the min_profitable_iters
as
>> >follows:
>> >      (*) min_profitable_iters = VIC/SIC + VOC/SIC
>> >
>> >where:
>> >VIC = vector iteration cost
>> >SIC = scalar iteration cost
>> >VOC = vector outside cost
>> >
>> >which looks wrong... (how come there's no dependency on VF?)
>> >
>> >If you start from this equation:
>> >      (**) SIC*niters > VIC*niters/VF + VOC
>> >
>> >you end up with this:
>> >      (***) min_profitable_iters = VOC/(SIC - VIC/VF)
>> >
>> >(I somehow managed to convince myself that
>> >vect_estimate_min_profitable_iters made sense when I first read it,
but
>> now
>> >it just doesn't make sense to me anymore...)
>> >
>>
>> We had some offline mails about this, but indeed what you have stated
>> here is correct. The current code is an approximation, but there are
>> cases where it won't yield the right answer.
>>
>> I will use the *** equation to calculate the minimum profitable iters
>> and will also test if SIC > VIC/VF.

This has been fixed.

Thanks,
Harsha

[-- Attachment #2: costmodel.0606.patch --]
[-- Type: application/octet-stream, Size: 55999 bytes --]

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 125492)
+++ gcc/doc/invoke.texi	(working copy)
@@ -357,7 +357,7 @@
 -fcheck-data-deps @gol
 -ftree-dominator-opts -ftree-dse -ftree-copyrename -ftree-sink @gol
 -ftree-ch -ftree-sra -ftree-ter -ftree-fre -ftree-vectorize @gol
--ftree-vect-loop-version -ftree-salias -fipa-pta -fweb @gol
+-ftree-vect-loop-version -fvect-cost-model -ftree-salias -fipa-pta -fweb @gol
 -ftree-copy-prop -ftree-store-ccp -ftree-store-copy-prop -fwhole-program @gol
 --param @var{name}=@var{value}
 -O  -O0  -O1  -O2  -O3  -Os}
@@ -5666,6 +5666,9 @@
 to control which version is executed.  This option is enabled by default
 except at level @option{-Os} where it is disabled.
 
+@item -fvect-cost-model
+Enable cost model for vectorization.
+
 @item -ftree-vrp
 Perform Value Range Propagation on trees.  This is similar to the
 constant propagation pass, but instead of values, ranges of values are
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 125492)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,39 @@
+2007-06-06  Harsha Jagsia <harsha.jagasia@amd.com>
+	    Tony Linthicum <tony.linthicum@amd.com>
+		   
+	* common.opt: Add -fvect-cost-model flag.	
+	* tree-vectorizer.c: Initialize inside and outside cost fields in
+	stmt_vec_info struct for STMT.
+	* tree-vectorizer.h: Define inside and outside cost fields in
+	stmt_vec_info struct and access functions for the same. 
+	Add target costs for vector stataments to the cost model.
+	Define vect_estimate_min_profitable_iters function.
+	* tree-vect-analyze.c (vect_analyze_operations): Add a compile-time
+	check to evaluate if loop iterations are less than minimum profitable 
+	iterations determined by cost model or minimum vect loop bound defined
+	by user, whichever is more conservative.
+	* tree-vect-transform.c (vect_do_peeling_for_loop_bound): Add a
+	run-time check to evaluate if loop iterations are less than minimum
+	profitable iterations determined by cost model or minimum vect loop
+	bound defined by user, whichever is more conservative.
+	(vect_estimate_min_profitable_iterations): New function.
+	(vect_model_induction_cost): New function.
+	(vect_model_simple_cost): New function.
+	(vect_cost_strided_group_size): New function.
+	(vect_model_store_cost): New function.
+	(vect_model_load_cost): New function.
+	(vectorizable_reduction): Call vect_model_reduction_cost during
+	analysis phase.
+	(vectorizable_reduction): Call vect_model_reduction_cost during
+	analysis phase.
+	(vectorizable_induction): Call vect_model_induction_cost during
+	analysis phase.
+	(vectorizable_load): Call vect_model_load_cost during analysis phase.
+	(vectorizable_store): Call vect_model_store_cost during analysis phase.
+	(vectorizable_call, vectorizable_assignment, vectorizable_operation,
+	 vectorizable_promotion, vectorizable_demotion):
+	Call vect_model_simple_cost during analysis phase.
+
 2007-06-06  Zdenek Dvorak  <dvorakz@suse.cz>
 
 	* haifa-sched.c (restore_bb_notes): Clear bb field of the notes
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c	(revision 0)
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct s{
+  int m;
+  int n[N][N][N];
+};
+
+struct test1{
+  struct s a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s e; /* array e.n is aligned */
+};
+
+int main1 ()
+{  
+  int i,j;
+  struct test1 tmp1;
+
+  /* 1. unaligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.a.n[1][2][i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 5)
+        abort ();
+    }
+
+  /* 2. aligned */
+  for (i = 3; i < N-1; i++)
+    {
+      tmp1.a.n[1][2][i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 3; i < N-1; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 6)
+        abort ();
+    }
+
+  /* 3. aligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.e.n[1][2][i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 7)
+        abort ();
+    }
+
+  /* 4. unaligned */
+  for (i = 3; i < N-3; i++)
+    {
+      tmp1.e.n[1][2][i] = 8;
+    }
+ 
+  /* check results:  */
+  for (i = 3; i <N-3; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c	(revision 0)
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdlib.h>
+#include "../../tree-vect.h"
+
+void interp_pitch(float *exc, float *interp, int pitch, int len)
+{
+   int i,k;
+   int maxj;
+
+   maxj=3;
+   for (i=0;i<len;i++)
+   {
+      float tmp = 0;
+      for (k=0;k<7;k++)
+      {
+         tmp += exc[i-pitch+k+maxj-6];
+      }
+      interp[i] = tmp;
+   }
+}
+
+int main()
+{
+   float *exc = calloc(126,sizeof(float));
+   float *interp = calloc(80,sizeof(float));
+   int pitch = -35;
+
+   check_vect ();
+
+   interp_pitch(exc, interp, pitch, 80);
+   free(exc);
+   free(interp);
+   return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c	(revision 0)
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+#define DIFF 242
+
+void
+main1 (unsigned char x, unsigned char max_result, unsigned char min_result)
+{
+  int i;
+  unsigned char ub[N] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  unsigned char udiff = 2;
+  unsigned char umax = x;
+  unsigned char umin = x;
+
+  for (i = 0; i < N; i++) {
+    udiff += (unsigned char)(ub[i] - uc[i]);
+  }
+
+  for (i = 0; i < N; i++) {
+    umax = umax < uc[i] ? uc[i] : umax;
+  }
+
+  for (i = 0; i < N; i++) {
+    umin = umin > uc[i] ? uc[i] : umin;
+  }
+
+  /* check results:  */
+  if (udiff != DIFF)
+    abort ();
+  if (umax != max_result)
+    abort ();
+  if (umin != min_result)
+    abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (100, 100, 1);
+  main1 (0, 15, 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp	(revision 0)
@@ -0,0 +1,67 @@
+# Copyright (C) 1997, 2004, 2005, 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a x86 target.
+if { ![istarget i?86*-*-*] && ![istarget x86_64-*-*] } then {
+  return
+}
+
+# Set up flags used for tests that don't specify options.
+set DEFAULT_VECTCFLAGS ""
+
+# These flags are used for all targets.
+lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model"
+
+# If the target system supports vector instructions, the default action
+# for a test is 'run', otherwise it's 'compile'.  Save current default.
+# Executing vector instructions on a system without hardware vector support
+# is also disabled by a call to check_vect, but disabling execution here is
+# more efficient.
+global dg-do-what-default
+set save-dg-do-what-default ${dg-do-what-default}
+
+lappend DEFAULT_VECTCFLAGS "-msse2"
+set dg-do-what-default run
+
+# Initialize `dg'.
+dg-init
+
+lappend DEFAULT_VECTCFLAGS "-fdump-tree-vect-details"
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-vect-*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+#### Tests with special options
+global SAVED_DEFAULT_VECTCFLAGS
+set SAVED_DEFAULT_VECTCFLAGS $DEFAULT_VECTCFLAGS
+
+# -ffast-math tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-fast-math-vect*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+# Clean up.
+set dg-do-what-default ${save-dg-do-what-default}
+
+# All done.
+dg-finish
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c	(revision 0)
@@ -0,0 +1,91 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct t{
+  int k[N];
+  int l; 
+};
+  
+struct s{
+  char a;	/* aligned */
+  char b[N-1];  /* unaligned (offset 1B) */
+  char c[N];    /* aligned (offset NB) */
+  struct t d;   /* aligned (offset 2NB) */
+  struct t e;   /* unaligned (offset 2N+4N+4 B) */
+};
+ 
+int main1 ()
+{  
+  int i;
+  struct s tmp;
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.b[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.b[i] != 5)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.c[i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.c[i] != 6)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.d.k[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.d.k[i] != 7)
+        abort ();
+    }
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.e.k[i] = 8;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.e.k[i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" } }
+ */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c	(revision 0)
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+struct test {
+  char ca[N];
+};
+
+extern struct test s;
+ 
+int main1 ()
+{  
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      s.ca[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (s.ca[i] != 5)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp	(revision 0)
@@ -0,0 +1,70 @@
+# Copyright (C) 1997, 2004, 2005, 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a x86 target.
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || ![is-effective-target lp64] } then {
+  return
+}
+
+# Set up flags used for tests that don't specify options.
+set DEFAULT_VECTCFLAGS ""
+
+# These flags are used for all targets.
+lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model"
+
+# If the target system supports vector instructions, the default action
+# for a test is 'run', otherwise it's 'compile'.  Save current default.
+# Executing vector instructions on a system without hardware vector support
+# is also disabled by a call to check_vect, but disabling execution here is
+# more efficient.
+global dg-do-what-default
+set save-dg-do-what-default ${dg-do-what-default}
+
+lappend DEFAULT_VECTCFLAGS "-msse2"
+set dg-do-what-default run
+
+# Initialize `dg'.
+dg-init
+
+lappend DEFAULT_VECTCFLAGS "-fdump-tree-vect-details"
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-pr*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-vect-*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+#### Tests with special options
+global SAVED_DEFAULT_VECTCFLAGS
+set SAVED_DEFAULT_VECTCFLAGS $DEFAULT_VECTCFLAGS
+
+# -ffast-math tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-fast-math-vect*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+# Clean up.
+set dg-do-what-default ${save-dg-do-what-default}
+
+# All done.
+dg-finish
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c	(revision 0)
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct s{
+  int m;
+  int n[N][N][N];
+};
+
+struct test1{
+  struct s a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s e; /* array e.n is aligned */
+};
+
+int main1 ()
+{  
+  int i,j;
+  struct test1 tmp1;
+
+  /* 1. unaligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.a.n[1][2][i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 5)
+        abort ();
+    }
+
+  /* 2. aligned */
+  for (i = 3; i < N-1; i++)
+    {
+      tmp1.a.n[1][2][i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 3; i < N-1; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 6)
+        abort ();
+    }
+
+  /* 3. aligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.e.n[1][2][i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 7)
+        abort ();
+    }
+
+  /* 4. unaligned */
+  for (i = 3; i < N-3; i++)
+    {
+      tmp1.e.n[1][2][i] = 8;
+    }
+ 
+  /* check results:  */
+  for (i = 3; i <N-3; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c	(revision 0)
@@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdlib.h>
+#include "../../tree-vect.h"
+
+void interp_pitch(float *exc, float *interp, int pitch, int len)
+{
+   int i,k;
+   int maxj;
+
+   maxj=3;
+   for (i=0;i<len;i++)
+   {
+      float tmp = 0;
+      for (k=0;k<7;k++)
+      {
+         tmp += exc[i-pitch+k+maxj-6];
+      }
+      interp[i] = tmp;
+   }
+}
+
+int main()
+{
+   float *exc = calloc(126,sizeof(float));
+   float *interp = calloc(80,sizeof(float));
+   int pitch = -35;
+
+   check_vect ();
+
+   interp_pitch(exc, interp, pitch, 80);
+   free(exc);
+   free(interp);
+   return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c	(revision 0)
@@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+#define DIFF 242
+
+void
+main1 (unsigned char x, unsigned char max_result, unsigned char min_result)
+{
+  int i;
+  unsigned char ub[N] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  unsigned char udiff = 2;
+  unsigned char umax = x;
+  unsigned char umin = x;
+
+  for (i = 0; i < N; i++) {
+    udiff += (unsigned char)(ub[i] - uc[i]);
+  }
+
+  for (i = 0; i < N; i++) {
+    umax = umax < uc[i] ? uc[i] : umax;
+  }
+
+  for (i = 0; i < N; i++) {
+    umin = umin > uc[i] ? uc[i] : umin;
+  }
+
+  /* check results:  */
+  if (udiff != DIFF)
+    abort ();
+  if (umax != max_result)
+    abort ();
+  if (umin != min_result)
+    abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (100, 100, 1);
+  main1 (0, 15, 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c	(revision 0)
@@ -0,0 +1,91 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct t{
+  int k[N];
+  int l; 
+};
+  
+struct s{
+  char a;	/* aligned */
+  char b[N-1];  /* unaligned (offset 1B) */
+  char c[N];    /* aligned (offset NB) */
+  struct t d;   /* aligned (offset 2NB) */
+  struct t e;   /* unaligned (offset 2N+4N+4 B) */
+};
+ 
+int main1 ()
+{  
+  int i;
+  struct s tmp;
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.b[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.b[i] != 5)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.c[i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.c[i] != 6)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.d.k[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.d.k[i] != 7)
+        abort ();
+    }
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.e.k[i] = 8;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.e.k[i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" } }
+ */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_long } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+
+void dacP98FillRGBMap (unsigned char *pBuffer)
+{
+    unsigned long dw, dw1;
+    unsigned long *pdw = (unsigned long *)(pBuffer);
+
+    for( dw = 256, dw1 = 0; dw; dw--, dw1 += 0x01010101) 
+    {
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target vect_interleave
+} } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c	(revision 0)
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+struct test {
+  char ca[N];
+};
+
+extern struct test s;
+ 
+int main1 ()
+{  
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      s.ca[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (s.ca[i] != 5)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog	(revision 125492)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,27 @@
+2007-06-06  Harsha Jagsia <harsha.jagasia@amd.com>
+
+	* gcc.dg/vect/costmodel: New directory.
+	* gcc.dg/vect/costmodel/i386: New directory.
+	* gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: New testsuite.
+	* gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c:
+	New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c: New test.
+	* gcc.dg/vect/costmodel/x86_64: New directory.
+	* gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp:
+	New testsuite.	
+	* gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c:
+	New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: New test.
+	
 2007-06-06  Ian Lance Taylor  <iant@google.com>
 
 	* g++.dg/conversion/enum1.C: New test.
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	(revision 125492)
+++ gcc/tree-vectorizer.c	(working copy)
@@ -1351,6 +1351,8 @@
   else
     STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
   STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
+  STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
   DR_GROUP_FIRST_DR (res) = NULL_TREE;
   DR_GROUP_NEXT_DR (res) = NULL_TREE;
   DR_GROUP_SIZE (res) = 0;
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	(revision 125492)
+++ gcc/tree-vectorizer.h	(working copy)
@@ -1,5 +1,5 @@
 /* Loop Vectorization
-   Copyright (C) 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
+   Copyright (C) 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
    Contributed by Dorit Naishlos <dorit@il.ibm.com>
 
 This file is part of GCC.
@@ -268,6 +268,13 @@
   /* For loads only, if there is a store with the same location, this field is
      TRUE.  */
   bool read_write_dep;
+
+  /* Vectorization costs associated with statement.  */
+  struct  
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+  } cost;
 } *stmt_vec_info;
 
 /* Access Functions.  */
@@ -300,7 +307,43 @@
 #define DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
 
 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_loop)
+#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
+#define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
 
+/* These are some defines for the initial implementation of the vectorizer's
+   cost model.  These will later be target specific hooks.  */
+
+/* Cost of conditional branch.  */
+#ifndef TARG_COND_BRANCH_COST
+#define TARG_COND_BRANCH_COST        3
+#endif
+
+/* Cost of any vector operation, excluding load, store or vector to scalar
+   operation.  */ 
+#ifndef TARG_VEC_STMT_COST
+#define TARG_VEC_STMT_COST           1
+#endif
+
+/* Cost of vector to scalar operation.  */
+#ifndef TARG_VEC_TO_SCALAR_COST
+#define TARG_VEC_TO_SCALAR_COST      1
+#endif
+
+/* Cost of aligned vector load.  */
+#ifndef TARG_VEC_LOAD_COST
+#define TARG_VEC_LOAD_COST           1
+#endif
+
+/* Cost of misaligned vector load.  */
+#ifndef TARG_VEC_UNALIGNED_LOAD_COST
+#define TARG_VEC_UNALIGNED_LOAD_COST 2
+#endif
+
+/* Cost of vector store.  */
+#ifndef TARG_VEC_STORE_COST
+#define TARG_VEC_STORE_COST          1
+#endif
+
 static inline void set_stmt_info (stmt_ann_t ann, stmt_vec_info stmt_info);
 static inline stmt_vec_info vinfo_for_stmt (tree stmt);
 
@@ -437,6 +480,7 @@
 extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_induction (tree, block_stmt_iterator *, tree *);
+extern int  vect_estimate_min_profitable_iters (loop_vec_info);
 /* Driver for transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 
Index: gcc/tree-vect-analyze.c
===================================================================
--- gcc/tree-vect-analyze.c	(revision 125492)
+++ gcc/tree-vect-analyze.c	(working copy)
@@ -1,5 +1,5 @@
 /* Analysis Utilities for Loop Vectorization.
-   Copyright (C) 2003,2004,2005,2006 Free Software Foundation, Inc.
+   Copyright (C) 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
    Contributed by Dorit Naishlos <dorit@il.ibm.com>
 
 This file is part of GCC.
@@ -293,6 +293,9 @@
   tree phi;
   stmt_vec_info stmt_info;
   bool need_to_vectorize = false;
+  int min_profitable_iters;
+  int min_scalar_loop_bound;
+  unsigned int th;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_analyze_operations ===");
@@ -436,8 +439,6 @@
 	} /* stmts in bb */
     } /* bbs */
 
-  /* TODO: Analyze cost. Decide if worth while to vectorize.  */
-
   /* All operations in the loop are either irrelevant (deal with loop
      control, or dead), or only used outside the loop and can be moved
      out of the loop (e.g. invariants, inductions).  The loop can be 
@@ -461,16 +462,55 @@
         vectorization_factor, LOOP_VINFO_INT_NITERS (loop_vinfo));
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && ((LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor)
-	  || (LOOP_VINFO_INT_NITERS (loop_vinfo) <=
-		((unsigned) (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) 
-					   * vectorization_factor))))
+      && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
     {
       if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
-	fprintf (vect_dump, "not vectorized: iteration count too small.");
+        fprintf (vect_dump, "not vectorized: iteration count too small.");
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump,"not vectorized: iteration count smaller than "
+                 "vectorization factor.");
       return false;
     }
 
+  /* Analyze cost. Decide if worth while to vectorize.  */
+
+  min_profitable_iters = vect_estimate_min_profitable_iters (loop_vinfo);
+
+  if (min_profitable_iters < 0)
+    {
+      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+        fprintf (vect_dump, "not vectorized: vectorization not profitable.");
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "not vectorized: vector version will never be "
+                 "profitable.");
+      return false;
+    }
+
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND))
+                          * vectorization_factor;
+
+  /* Use the cost model only if it is more conservative than user specified
+     threshold.  */
+
+  th = (unsigned) min_scalar_loop_bound;
+  if (min_profitable_iters 
+      && (!min_scalar_loop_bound
+          || min_profitable_iters > min_scalar_loop_bound))
+    th = (unsigned) min_profitable_iters;
+
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
+    {
+      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))	      
+        fprintf (vect_dump, "not vectorized: vectorization not "
+                 "profitable.");
+      if (vect_print_dump_info (REPORT_DETAILS))	      
+        fprintf (vect_dump, "not vectorized: iteration count smaller than "
+                 "user specified loop bound parameter or minimum "
+                 "profitable iterations (whichever is more conservative).");
+      return false;
+    }  
+
   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       || LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0
       || LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 125492)
+++ gcc/common.opt	(working copy)
@@ -1110,6 +1110,10 @@
 Common Report Var(flag_tree_vectorize) Optimization
 Enable loop vectorization on trees
 
+fvect-cost-model
+Common Report Var(flag_vect_cost_model) Optimization
+Enable use of cost model in vectorization
+
 ftree-vect-loop-version
 Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
 Enable loop versioning when doing loop vectorization on trees
Index: gcc/tree-vect-transform.c
===================================================================
--- gcc/tree-vect-transform.c	(revision 125492)
+++ gcc/tree-vect-transform.c	(working copy)
@@ -74,6 +74,490 @@
 static int vect_min_worthwhile_factor (enum tree_code);
 
 
+/* Function vect_estimate_min_profitable_iters
+
+   Return the number of iterations required for the vector version of the
+   loop to be profitable relative to the cost of the scalar version of the
+   loop.
+
+   TODO: Take profile info into account before making vectorization
+   decisions, if available.  */
+
+int
+vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo)
+{
+  int i;
+  int min_profitable_iters;
+  int peel_iters_prologue;
+  int peel_iters_epilogue;
+  int vec_inside_cost = 0;
+  int vec_outside_cost = 0;
+  int scalar_single_iter_cost = 0;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+  int nbbs = loop->num_nodes;
+
+  /* Cost model disabled.  */
+  if (!flag_vect_cost_model)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model disabled.");      
+      return 0;
+    }
+
+  /* Requires loop versioning tests to handle misalignment.
+     FIXME: Make cost depend on number of stmts in may_misalign list.  */
+
+  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
+    {
+      vec_outside_cost += TARG_COND_BRANCH_COST;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: Adding cost of checks for loop "
+                 "versioning.\n");
+    }
+
+  /* Requires a prologue loop when peeling to handle mislignment. Add cost of
+     two guards, one for the peeled loop and one for the vector loop.  */
+
+  peel_iters_prologue = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  if (peel_iters_prologue)
+    {
+      vec_outside_cost += 2 * TARG_COND_BRANCH_COST;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: Adding cost of checks for "
+                 "prologue.\n");
+    }
+
+ /* Requires an epilogue loop to finish up remaining iterations after vector
+    loop. Add cost of two guards, one for the peeled loop and one for the
+    vector loop.  */
+
+  if ((peel_iters_prologue < 0)
+      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      || LOOP_VINFO_INT_NITERS (loop_vinfo) % vf)
+    {
+      vec_outside_cost += 2 * TARG_COND_BRANCH_COST;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model : Adding cost of checks for "
+                 "epilogue.\n");
+    }
+
+  /* Count statements in scalar loop.  Using this as scalar cost for a single
+     iteration for now.
+
+     TODO: Add outer loop support.
+
+     TODO: Consider assigning different costs to different scalar
+     statements.  */
+
+  for (i = 0; i < nbbs; i++)
+    {
+      block_stmt_iterator si;
+      basic_block bb = bbs[i];
+
+      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+        {
+          tree stmt = bsi_stmt (si);
+          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+          if (!STMT_VINFO_RELEVANT_P (stmt_info)
+              && !STMT_VINFO_LIVE_P (stmt_info))
+            continue;
+          scalar_single_iter_cost++;
+          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
+          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
+        }
+    }
+
+  /* Add additional cost for the peeled instructions in prologue and epilogue
+     loop.
+
+     FORNOW: If we dont know the value of peel_iters for prologue or epilogue
+     at compile-time - we assume the worst.  
+
+     TODO: Build an expression that represents peel_iters for prologue and
+     epilogue to be used in a run-time test.  */
+
+  peel_iters_prologue = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
+
+  if (peel_iters_prologue < 0)
+    {
+      peel_iters_prologue = vf - 1;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: "
+                 "prologue peel iters set conservatively.");
+
+      /* If peeling for alignment is unknown, loop bound of main loop becomes
+         unkown.  */
+      peel_iters_epilogue = vf - 1;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: "
+                 "epilogue peel iters set conservatively because "
+                 "peeling for alignment is unknown .");
+    }
+  else 
+    {
+      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+        {
+          peel_iters_epilogue = vf - 1;
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump, "cost model: "
+                     "epilogue peel iters set conservatively because "
+                     "loop iterations are unknown .");
+        }
+      else      
+        peel_iters_epilogue = 
+                     (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_iters_prologue)
+                     % vf;
+    }
+
+  vec_outside_cost += (peel_iters_prologue * scalar_single_iter_cost)
+                      + (peel_iters_epilogue * scalar_single_iter_cost);
+
+  /* Calculate number of iterations required to make the vector version 
+     profitable, relative to the loop bodies only. The following condition
+     must hold true: ((SIC*VF)-VIC)*niters > VOC*VF, where
+     SIC = scalar iteration cost, VIC = vector iteration cost,
+     VOC = vector outside cost and VF = vectorization factor.  */
+
+  if ((scalar_single_iter_cost * vf) > vec_inside_cost)
+    {
+      if (vec_outside_cost == 0)
+        min_profitable_iters = 1;
+      else
+        {
+          min_profitable_iters = (vec_outside_cost * vf)
+                                 / ((scalar_single_iter_cost * vf)
+                                    - vec_inside_cost);
+
+          if ((scalar_single_iter_cost * vf * min_profitable_iters)
+              <= ((vec_inside_cost * min_profitable_iters)
+                  + (vec_outside_cost * vf)))
+            min_profitable_iters++;
+        }
+    }
+  /* vector version will never be profitable.  */
+  else
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: vector iteration cost = %d "
+                 "is divisible by scalar iteration cost = %d by a factor "
+                 "greater than or equal to the vectorization factor = %d .",
+                 vec_inside_cost, scalar_single_iter_cost, vf);
+      return -1;
+    }
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Cost model analysis: \n");
+      fprintf (vect_dump, "  Vector inside of loop cost: %d\n",
+	       vec_inside_cost);
+      fprintf (vect_dump, "  Vector outside of loop cost: %d\n",
+	       vec_outside_cost);
+      fprintf (vect_dump, "  Scalar cost: %d\n", scalar_single_iter_cost);
+      fprintf (vect_dump, "  prologue iterations: %d\n",
+               peel_iters_prologue);
+      fprintf (vect_dump, "  epilogue iterations: %d\n",
+               peel_iters_epilogue);
+      fprintf (vect_dump, "  Calculated minimum iters for profitability: %d\n",
+	       min_profitable_iters);
+      fprintf (vect_dump, "  Actual minimum iters for profitability: %d\n",
+	       min_profitable_iters < vf ? vf : min_profitable_iters);
+    }
+
+  return min_profitable_iters < vf ? vf : min_profitable_iters;
+}
+
+
+/* TODO: Close dependency between vect_model_*_cost and vectorizable_* 
+   functions. Design better to avoid maintainence issues.  */
+    
+/* Function vect_model_reduction_cost.  
+
+   Models cost for a reduction operation, including the vector ops 
+   generated within the strip-mine loop, the initial definition before
+   the loop, and the epilogue code that must be generated.  */
+
+static void
+vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
+			   int ncopies)
+{
+  int outer_cost = 0;
+  enum tree_code code;
+  optab optab;
+  tree vectype;
+  tree orig_stmt;
+  tree reduction_op;
+  enum machine_mode mode;
+  tree operation = GIMPLE_STMT_OPERAND (STMT_VINFO_STMT (stmt_info), 1);
+  int op_type = TREE_CODE_LENGTH (TREE_CODE (operation));
+
+  /* Cost of reduction op inside loop.  */
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) += ncopies * TARG_VEC_STMT_COST;
+
+  reduction_op = TREE_OPERAND (operation, op_type-1);
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
+  mode = TYPE_MODE (vectype);
+  orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
+  if (!orig_stmt) 
+    orig_stmt = STMT_VINFO_STMT (stmt_info);
+
+  code = TREE_CODE (GIMPLE_STMT_OPERAND (orig_stmt, 1));
+
+  /* Add in cost for initial definition.  */
+  outer_cost += TARG_VEC_STMT_COST;
+
+  /* Determine cost of epilogue code.
+
+     We have a reduction operator that will reduce the vector in one statement.
+     Also requires scalar extract.  */
+
+  if (reduc_code < NUM_TREE_CODES) 
+    outer_cost += TARG_VEC_STMT_COST + TARG_VEC_TO_SCALAR_COST;
+  else 
+    {
+      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+      tree bitsize =
+	TYPE_SIZE (TREE_TYPE ( GIMPLE_STMT_OPERAND (orig_stmt, 0)));
+      int element_bitsize = tree_low_cst (bitsize, 1);
+      int nelements = vec_size_in_bits / element_bitsize;
+
+      optab = optab_for_tree_code (code, vectype);
+
+      /* We have a whole vector shift available.  */
+      if (!VECTOR_MODE_P (mode) 
+          || optab->handlers[mode].insn_code == CODE_FOR_nothing)
+        /* Final reduction via vector shifts and the reduction operator. Also
+           requires scalar extract.  */
+	outer_cost += ((exact_log2(nelements) * 2 + 1) * TARG_VEC_STMT_COST); 
+      else
+	/* Use extracts and reduction op for final reduction.  For N elements,
+           we have N extracts and N-1 reduction ops.  */
+	outer_cost += ((nelements + nelements - 1) * TARG_VEC_STMT_COST);
+    }
+
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_reduction_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_model_induction_cost.
+
+   Models cost for induction operations.  */
+
+static void
+vect_model_induction_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  /* loop cost for vec_loop.  */
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+  /* prologue cost for vec_init and vec_step.  */
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = 2 * TARG_VEC_STMT_COST;
+  
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_induction_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_model_simple_cost.  
+
+   Models cost for simple operations, i.e. those that only emit ncopies of a 
+   single op.  Right now, this does not account for multiple insns that could
+   be generated for the single vector op.  We will handle that shortly.  */
+
+static void
+vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_cost_strided_group_size 
+ 
+   For strided load or store, return the group_size only if it is the first
+   load or store of a group, else return 1.  This ensures that group size is
+   only returned once per group.  */
+
+static int
+vect_cost_strided_group_size (stmt_vec_info stmt_info)
+{
+  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+
+  if (first_stmt == STMT_VINFO_STMT (stmt_info))
+    return DR_GROUP_SIZE (stmt_info);
+
+  return 1;
+}
+
+
+/* Function vect_model_store_cost
+
+   Models cost for stores.  In the case of strided accesses, one access
+   has the overhead of the strided access attributed to it.  */
+
+static void
+vect_model_store_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  int cost = 0;
+  int group_size;
+
+  /* Strided access?  */
+  if (DR_GROUP_FIRST_DR (stmt_info)) 
+    group_size = vect_cost_strided_group_size (stmt_info);
+  /* Not a strided access.  */
+  else
+    group_size = 1;
+
+  /* Is this an access in a group of stores, which provide strided access?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses a high and low interleave operation for each needed permute.  */
+      cost = ncopies * exact_log2(group_size) * group_size 
+             * TARG_VEC_STMT_COST;
+
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
+                 group_size);
+
+    }
+
+  /* Costs of the stores.  */
+  cost += ncopies * TARG_VEC_STORE_COST;
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_model_load_cost
+
+   Models cost for loads.  In the case of strided accesses, the last access
+   has the overhead of the strided access attributed to it.  Since unaligned
+   accesses are supported for loads, we also account for the costs of the 
+   access scheme chosen.  */
+
+static void
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies)
+		 
+{
+  int inner_cost = 0;
+  int group_size;
+  int alignment_support_cheme;
+  tree first_stmt;
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
+
+  /* Strided accesses?  */
+  first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+  if (first_stmt)
+    {
+      group_size = vect_cost_strided_group_size (stmt_info);
+      first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
+    }
+  /* Not a strided access.  */
+  else
+    {
+      group_size = 1;
+      first_dr = dr;
+    }
+
+  alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
+
+  /* Is this an access in a group of loads providing strided access?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses an even and odd extract operations for each needed permute.  */
+      inner_cost = ncopies * exact_log2(group_size) * group_size
+                   * TARG_VEC_STMT_COST;
+
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
+                 group_size);
+
+    }
+
+  /* The loads themselves.  */
+  switch (alignment_support_cheme)
+    {
+    case dr_aligned:
+      {
+        inner_cost += ncopies * TARG_VEC_LOAD_COST;
+
+        if (vect_print_dump_info (REPORT_DETAILS))
+          fprintf (vect_dump, "vect_model_load_cost: aligned.");
+
+        break;
+      }
+    case dr_unaligned_supported:
+      {
+        /* Here, we assign an additional cost for the unaligned load.  */
+        inner_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST;
+
+        if (vect_print_dump_info (REPORT_DETAILS))
+          fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
+                   "hardware.");
+
+        break;
+      }
+    case dr_unaligned_software_pipeline:
+      {
+        int outer_cost = 0;
+
+        if (vect_print_dump_info (REPORT_DETAILS))
+          fprintf (vect_dump, "vect_model_load_cost: unaligned software "
+                   "pipelined.");
+
+        /* Unaligned software pipeline has a load of an address, an initial
+           load, and possibly a mask operation to "prime" the loop. However,
+           if this is an access in a group of loads, which provide strided
+           acccess, then the above cost should only be considered for one
+           access in the group. Inside the loop, there is a load op
+           and a realignment op.  */
+
+        if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1)
+          {
+            outer_cost = 2*TARG_VEC_STMT_COST;
+            if (targetm.vectorize.builtin_mask_for_load)
+              outer_cost += TARG_VEC_STMT_COST;
+          }
+        
+        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+
+        inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
+
+        break;
+      }
+
+    default:
+      gcc_unreachable ();
+    }
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+
+}
+
+
 /* Function vect_get_new_vect_var.
 
    Returns a name for a new variable. The current naming scheme appends the 
@@ -1655,6 +2139,7 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+      vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies);
       return true;
     }
 
@@ -1862,9 +2347,15 @@
 
   gcc_assert (ZERO_SSA_OPERANDS (stmt, SSA_OP_ALL_VIRTUALS));
 
+  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	     / TYPE_VECTOR_SUBPARTS (vectype_out));
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_call ===");
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -1873,8 +2364,6 @@
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "transform operation.");
 
-  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	     / TYPE_VECTOR_SUBPARTS (vectype_out));
   gcc_assert (ncopies >= 1);
 
   /* Handle def.  */
@@ -2301,6 +2790,9 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_assignment ===");
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2391,6 +2883,9 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_induction ===");
+      vect_model_induction_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2554,6 +3049,9 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_operation ===");
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2773,6 +3271,9 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_demotion ===");
+      vect_model_simple_cost (stmt_info, ncopies);
       return true;
     }
 
@@ -2932,6 +3433,9 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_promotion ===");
+      vect_model_simple_cost (stmt_info, 2*ncopies);
       return true;
     }
 
@@ -3252,14 +3756,12 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
+      vect_model_store_cost (stmt_info, ncopies);
       return true;
     }
 
   /** Transform.  **/
 
-  if (vect_print_dump_info (REPORT_DETAILS))
-    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
-
   if (strided_store)
     {
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -3284,6 +3786,9 @@
       group_size = 1;
     }
   
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
+
   dr_chain = VEC_alloc (tree, heap, group_size);
   oprnds = VEC_alloc (tree, heap, group_size);
 
@@ -3915,14 +4420,15 @@
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
+      vect_model_load_cost (stmt_info, ncopies);
       return true;
     }
 
-  /** Transform.  **/
-
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "transform load.");
 
+  /** Transform.  **/
+
   if (strided_load)
     {
       first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -4807,6 +5313,8 @@
   basic_block preheader;
   int loop_num;
   unsigned int th;
+  int min_scalar_loop_bound;
+  int min_profitable_iters;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "=== vect_do_peeling_for_loop_bound ===");
@@ -4822,11 +5330,28 @@
 				   &ratio_mult_vf_name, ratio);
 
   loop_num  = loop->num; 
-  /* Threshold for vectorized loop.  */
-  th = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) * 
-			LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  /* Analyze cost to set threshhold for vectorized loop.  */
+  min_profitable_iters = vect_estimate_min_profitable_iters (loop_vinfo);
+
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND))
+                          * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  /* Use the cost model only if it is more conservative than user specified
+     threshold.  */
+
+  th = (unsigned) min_scalar_loop_bound;
+  if (min_profitable_iters
+      && (!min_scalar_loop_bound
+          || min_profitable_iters > min_scalar_loop_bound))
+    th = (unsigned) min_profitable_iters;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vectorization may not be profitable.");
+
   new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
-					    ratio_mult_vf_name, ni_name, false, th);
+                                            ratio_mult_vf_name, ni_name, false,
+                                            th);
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
 #ifdef ENABLE_CHECKING
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 125492)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2007-06-06  Harsha Jagsia <harsha.jagasia@amd.com>
+
+        * doc/extend.texi: Add -fvect-cost-model flag.
+
 2007-06-01  Steve Ellcey  <sje@cup.hp.com>
 
 	* libtool.m4 (LT_CMD_MAX_LEN): Try using getconf to set

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-06-06 16:36             ` Jagasia, Harsha
@ 2007-06-07  2:09               ` Eric Christopher
  2007-06-07 19:55               ` Dorit Nuzman
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Christopher @ 2007-06-07  2:09 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: Dorit Nuzman, GCC Patches, Meissner, Michael

>
> I have attached the cost model patch. The test cases have been reduced
> to a smaller subset which I think should give us a fairly good  
> coverage
> to start with. Currently I have only added the tests on x86 and  
> x86-64.
>

Want to add them in numerical order maybe?

And a few comments:

+  /* Cost model disabled.  */
+  if (!flag_vect_cost_model)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model disabled.");
+      return 0;
+    }

Want to add this where vect_estimate_min_profitable_iters is called from
rather than inside? Lot of code to just return 0.

And in a similar vein:

        STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+      vect_model_reduction_cost (stmt_info, epilog_reduc_code,  
ncopies);

Shouldn't we determine if we're going to compute costs before doing so?

Otherwise it looks OK to me (note I can't approve it).

-eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-06-06 16:36             ` Jagasia, Harsha
  2007-06-07  2:09               ` Eric Christopher
@ 2007-06-07 19:55               ` Dorit Nuzman
  2007-06-07 20:11                 ` Jagasia, Harsha
  2007-06-08 19:21                 ` [patch] Vectorizer cost model implementation : committed Jagasia, Harsha
  1 sibling, 2 replies; 30+ messages in thread
From: Dorit Nuzman @ 2007-06-07 19:55 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: GCC Patches, Meissner, Michael

"Jagasia, Harsha" <harsha.jagasia@amd.com> wrote on 06/06/2007 19:27:49:

> Hello,
>
...
>
> This has been fixed.
>

Hi Harsha,

This looks good now. There are just a few style issues to clean up:


> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog     (revision 125492)
> +++ gcc/ChangeLog     (working copy)
> @@ -1,3 +1,39 @@
> +2007-06-06  Harsha Jagsia <harsha.jagasia@amd.com>

I think you mispelled your name? :-)
(also in the testsuite ChangeLog)

(by the way, the ChangeLog should be pasted as text in the email, instead
of included in the patch)


> +       Tony Linthicum <tony.linthicum@amd.com>
> +
> +   * common.opt: Add -fvect-cost-model flag.

I think the format should be:
      * common.opt (fvect-cost-model): New flag.


> +   * tree-vectorizer.c: Initialize inside and outside cost fields in
> +   stmt_vec_info struct for STMT.

You should indicate the name of the function where the change is made:
      * tree-vectorizer.c (new_stmt_vec_info): ...


> +   * tree-vectorizer.h: Define inside and outside cost fields in
> +   stmt_vec_info struct and access functions for the same.

Same here...:
      * tree-vectorizer.h (stmt_vec_info): ...
      (vect_estimate_min_profitable_iters): ...
      (TARG_COND_BRANCH_COST):...
      (TARG_VEC_STMT_COST):...
      ...


> +   Add target costs for vector stataments to the cost model.

typo: stataments --> statements


> +   (vectorizable_reduction): Call vect_model_reduction_cost during
> +   analysis phase.
> +   (vectorizable_reduction): Call vect_model_reduction_cost during
> +   analysis phase.

duplicate line...


About:
> +   * gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: New testsuite.
> +   * gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp:
> +   New testsuite.

Maybe you'd want to look later into how to reduce duplication between the
*.exp files for the different targets (have them share a .exp in the
gcc.dg/vect/costmodel/ or something?)


> +/* Cost of any vector operation, excluding load, store or vector to
scalar
> +   operation.  */
> +#ifndef TARG_VEC_STMT_COST
> +#define TARG_VEC_STMT_COST           1
> +#endif
> +

As we discussed in the past, these are fine for now; probably one of the
first follow-up patches would be to replace these with target builtin
functions whose default implementation could be these costs. (for example,
a cost of 1 for the TARG_VEC_TO_SCALAR_COST is way off for Altivec, where
this is done through memory...)


> +  /* Requires a prologue loop when peeling to handle mislignment. Add
cost of

typo: mislignment --> misalignment


I had tested on powerpc an earlier version of this patch (from about a
month ago). I think since then you only made a few fixes, so I think it
should be ok to go ahead and commit it (with the above fixes). I will add
some powerpc testcases soon after that.

thanks,

dorit


> Thanks,
> Harsha
> [attachment "costmodel.0606.patch" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-06-07 19:55               ` Dorit Nuzman
@ 2007-06-07 20:11                 ` Jagasia, Harsha
  2007-06-08 19:21                 ` [patch] Vectorizer cost model implementation : committed Jagasia, Harsha
  1 sibling, 0 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-06-07 20:11 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: GCC Patches, Meissner, Michael

Hi Dorit,

>
>This looks good now. There are just a few style issues to clean up:
>
>
>> Index: gcc/ChangeLog
>> ===================================================================
>> --- gcc/ChangeLog     (revision 125492)
>> +++ gcc/ChangeLog     (working copy)
>> @@ -1,3 +1,39 @@
>> +2007-06-06  Harsha Jagsia <harsha.jagasia@amd.com>
>
>I think you mispelled your name? :-)
>(also in the testsuite ChangeLog)
>
>(by the way, the ChangeLog should be pasted as text in the email,
instead
>of included in the patch)
>
>
>> +       Tony Linthicum <tony.linthicum@amd.com>
>> +
>> +   * common.opt: Add -fvect-cost-model flag.
>
>I think the format should be:
>      * common.opt (fvect-cost-model): New flag.
>
>
>> +   * tree-vectorizer.c: Initialize inside and outside cost fields in
>> +   stmt_vec_info struct for STMT.
>
>You should indicate the name of the function where the change is made:
>      * tree-vectorizer.c (new_stmt_vec_info): ...
>
>
>> +   * tree-vectorizer.h: Define inside and outside cost fields in
>> +   stmt_vec_info struct and access functions for the same.
>
>Same here...:
>      * tree-vectorizer.h (stmt_vec_info): ...
>      (vect_estimate_min_profitable_iters): ...
>      (TARG_COND_BRANCH_COST):...
>      (TARG_VEC_STMT_COST):...
>      ...
>
>
>> +   Add target costs for vector stataments to the cost model.
>
>typo: stataments --> statements
>
>
>> +   (vectorizable_reduction): Call vect_model_reduction_cost during
>> +   analysis phase.
>> +   (vectorizable_reduction): Call vect_model_reduction_cost during
>> +   analysis phase.
>
>duplicate line...
>

My bad :) I will fix these.

>
>About:
>> +   * gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: New
testsuite.
>> +   * gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp:
>> +   New testsuite.
>
>Maybe you'd want to look later into how to reduce duplication between
the
>*.exp files for the different targets (have them share a .exp in the
>gcc.dg/vect/costmodel/ or something?)
>

Yes, that's in the works.

>> +/* Cost of any vector operation, excluding load, store or vector to
>scalar
>> +   operation.  */
>> +#ifndef TARG_VEC_STMT_COST
>> +#define TARG_VEC_STMT_COST           1
>> +#endif
>> +
>
>As we discussed in the past, these are fine for now; probably one of
the
>first follow-up patches would be to replace these with target builtin
>functions whose default implementation could be these costs. (for
example,
>a cost of 1 for the TARG_VEC_TO_SCALAR_COST is way off for Altivec,
where
>this is done through memory...)
>
>

Indeed, I will share some thoughts on that soon.

>> +  /* Requires a prologue loop when peeling to handle mislignment.
Add
>cost of
>
>typo: mislignment --> misalignment
>
>
>I had tested on powerpc an earlier version of this patch (from about a
>month ago). I think since then you only made a few fixes, so I think it
>should be ok to go ahead and commit it (with the above fixes). I will
add
>some powerpc testcases soon after that.

Ok. I think another thing we had talked about is turning on the cost
model by default for the vectorizer, to widen its coverage. I am
currently working on a patch for this. 

Since there was some concern on the maintenance of tests on different
targets, if the cost model is on by default, I plan to turn the cost
model off for all the tests under testsuite that use -free-vectorize.
But as new tests are added, individual contributors would need to turn
the cost model off explicitly, if the intent is to test a specific
vectorization feature irrespective of the costs.

Thanks,
Harsha



^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation : committed
  2007-06-07 19:55               ` Dorit Nuzman
  2007-06-07 20:11                 ` Jagasia, Harsha
@ 2007-06-08 19:21                 ` Jagasia, Harsha
  1 sibling, 0 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-06-08 19:21 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: GCC Patches, Meissner, Michael

>should be ok to go ahead and commit it (with the above fixes). I will
add
>some powerpc testcases soon after that.

Patch committed.

Thanks,
Harsha


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-05 22:26 ` Jagasia, Harsha
  2007-04-05 22:29   ` Eric Christopher
@ 2007-04-08 13:23   ` Dorit Nuzman
  1 sibling, 0 replies; 30+ messages in thread
From: Dorit Nuzman @ 2007-04-08 13:23 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: echristo, gcc-patches, Meissner, Michael, Linthicum, Tony

> >b)
> >
> >+#ifdef ADJUST_IN_EPILOG
> >
> >We actually use the 'epilogue' spelling in the rest of the compiler.
>
> ADJUST_IN_EPILOG has been defined outside of this patch, so we left it
> as is.
>

This is intended to disappear with this patch:
http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00045.html

> >
> >c)
> >+      if (!VECTOR_MODE_P (mode) || optab->handlers[mode].insn_code ==
> >+     CODE_FOR_nothing)
> >
> >Put the full || body on the second line.
>
> Ok
>
> >
> >d)
> >+      cost = ncopies * exact_log2(group_size) * group_size *
> >+   TARG_VEC_TO_SCALAR_COST;
> >
> >Same sort of thing here.
>
> Ok
>
> >Otherwise it looks nice. Not that I can approve it.
>
> Thanks for the review. AFAIK, Dorit has reviewed the patch (she may be
> on a break this week). If a MAINTAINER can approve as well, that would
> be great.

Indeed I looked over the patch, and was in close touch with Tony throughout
it's design/development, but as I told Tony, I do have some comments (to be
sent shortly), so please don't consider the patch approved yet on my part
:-) (and of course, it would require an actual approval from a maintainer).
(It is indeed holidays over here - which is why I didn't get to type out my
comments earlier).

thanks,
dorit

>
> Thanks,
> Harsha
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-05 20:38 ` [patch] Vectorizer cost model implementation Jagasia, Harsha
@ 2007-04-08 13:19   ` Dorit Nuzman
  0 siblings, 0 replies; 30+ messages in thread
From: Dorit Nuzman @ 2007-04-08 13:19 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: gcc-patches, Meissner, Michael

> Currently 6 of the existing vector tests do not get vectorized and hence
> do not pass, because the cost model (correctly) estimates that it's not
> profitable to vectorize them. So we have disabled the cost model on all
> of the existing vect tests. I plan to start by adding appropriate copies
> of those tests that fail, with the cost model enabled.

As Tony and I had discussed off line, there are two options to handle this,
in a way that would preserve the current behavior of the testcases. Say you
have a testcase vect-n.c that is failing to vectorize with the cost model;
we can do one of the following:
1) Add "-fno-vect-cost-model" to the default vectorizer flags in vect.exp.
This way vect-n.c will preserve it's current behavior. If it is an
interesting testcase cost-model-wise, duplicate the test to a new test
named costmodel-vect-n.c, and add to vect.exp the logic to compile tests
prefixed with "costmodel-" without the flag.
or...
2) Change vect-n.c to xfail, with the expected string (i.e. "non
profitable" or whatever you print out), and duplicate it into a new test
called no-costmodel-n.c, which will be compiled with the new flag, to test
the current (old) behavior.

I don't have a strong preference here. You chose the first option, which is
fine.  The second option however exercises the cost model more, so I have a
slight preference towards that (especially that you say there are only 6
tests failing, so it will not be a lot of work to create 6 duplicate
tests). Also, I think we intend to turn the cost model on by default in
gcc, so maybe it will be less confusing if the default behavior of the
testcases will be the same.


> I will also add
> more tests to verify various decisions that the cost model makes.
>

great. Please also add a check that these testcase do not get vectorized
for the expected reason (i.e. because of profitability considerations).

thanks!

dorit


> Please continue to provide review on the patch.
>
> Thanks,
> Harsha Jagasia
>
>
> >-----Original Message-----
> >From: Jagasia, Harsha
> >Sent: Thursday, April 05, 2007 3:21 PM
> >To: Jagasia, Harsha
> >Subject:
> >
> >> -----Original Message-----
> >> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
> >>
> >> Is it your plan to have this accepted for the trunk or for a branch?
> >
> >The plan is to have it accepted to the trunk, I believe.
> >
> >>
> >> If so, where's the ChangeLog entry?
> >
> >Oops!  I forgot.  My apologies.
> >
> >>
> >> How did you test this?
> >>
> >
> >I used the vect tests ... a number of them "fail" with the cost model
> >enabled, and the cost model is correct in its assessment.  I disabled
> >the cost model for the test suite for now so that those tests can
> >continue to function as intended.  I.e. test specific vectorization
> >capabilities.  Those tests will be made into cost model tests.
> >Additionally, more tests should be added as well that target the cost
> >model.
> >
> >I know that these issues should have been addressed before the patch
> was
> >submitted and at the very least I should have done a better job of
> >explaining the situation in my submission.  So, I must again apologize.
> >I'm not going to be working on this project any longer, and it was
> >desired that I submit the patch under my name prior to leaving.  As I
> >said, though, my colleague Harsha will be addressing any shortcomings
> in
> >this patch and will be supporting it going forward.
> >
> >Thanks for you comments.
> >
> >Tony
> >
> >
> >Regards,
> >Harsha
> >
> >--------------------------------------------------
> >Harsha Jagasia               GNU Tools Team (SEE)
> >512-602-1775                 Advanced Micro Devices
> >harsha.jagasia@amd.com       Austin, TX
> >
>
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
  2007-04-05 22:29   ` Eric Christopher
@ 2007-04-05 22:31     ` Jagasia, Harsha
  0 siblings, 0 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-05 22:31 UTC (permalink / raw)
  To: Eric Christopher; +Cc: gcc-patches, Meissner, Michael, Linthicum, Tony

>>>
>>> +#ifdef ADJUST_IN_EPILOG
>>>
>>> We actually use the 'epilogue' spelling in the rest of the compiler.
>>
>> ADJUST_IN_EPILOG has been defined outside of this patch, so we left
it
>> as is.
>
>Fair enough, but you have other spellings of "epilog" in the patch as
>well. I mean, I can go back and fix them up, but you're there already
:)

I will fix them :)

>
>-eric
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
  2007-04-05 22:26 ` Jagasia, Harsha
@ 2007-04-05 22:29   ` Eric Christopher
  2007-04-05 22:31     ` Jagasia, Harsha
  2007-04-08 13:23   ` Dorit Nuzman
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Christopher @ 2007-04-05 22:29 UTC (permalink / raw)
  To: Jagasia, Harsha; +Cc: gcc-patches, Meissner, Michael, Linthicum, Tony

>>
>> +#ifdef ADJUST_IN_EPILOG
>>
>> We actually use the 'epilogue' spelling in the rest of the compiler.
>
> ADJUST_IN_EPILOG has been defined outside of this patch, so we left it
> as is.

Fair enough, but you have other spellings of "epilog" in the patch as  
well. I mean, I can go back and fix them up, but you're there already :)

-eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Vectorizer cost model implementation
       [not found] <D5B24B5251882048AD03DDFA431BB790012F0031@SAUSEXMB3.amd.com>
@ 2007-04-05 22:26 ` Jagasia, Harsha
  2007-04-05 22:29   ` Eric Christopher
  2007-04-08 13:23   ` Dorit Nuzman
  0 siblings, 2 replies; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-05 22:26 UTC (permalink / raw)
  To: gcc-patches, echristo; +Cc: Meissner, Michael, Linthicum, Tony

>A few style things I noticed:
>
>a)
>+      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) &&
>+	  LOOP_VINFO_INT_NITERS (loop_vinfo) < (unsigned)
required_iters)
>
>coding standard nit. The && needs to be on the next line. And instead
of
>casting
>to unsigned, why not just compare < since we already know that
>required_iters
>is >= 0?

Thanks for pointing this out, we will fix it.

>
>b)
>
>+#ifdef ADJUST_IN_EPILOG
>
>We actually use the 'epilogue' spelling in the rest of the compiler.

ADJUST_IN_EPILOG has been defined outside of this patch, so we left it
as is.

>
>c)
>+      if (!VECTOR_MODE_P (mode) || optab->handlers[mode].insn_code ==
>+	  CODE_FOR_nothing)
>
>Put the full || body on the second line.

Ok

>
>d)
>+      cost = ncopies * exact_log2(group_size) * group_size *
>+	TARG_VEC_TO_SCALAR_COST;
>
>Same sort of thing here.

Ok

>Otherwise it looks nice. Not that I can approve it.

Thanks for the review. AFAIK, Dorit has reviewed the patch (she may be
on a break this week). If a MAINTAINER can approve as well, that would
be great.

Thanks,
Harsha


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [patch] Vectorizer cost model implementation
       [not found] <D5B24B5251882048AD03DDFA431BB790012F002E@SAUSEXMB3.amd.com>
@ 2007-04-05 20:38 ` Jagasia, Harsha
  2007-04-08 13:19   ` Dorit Nuzman
  0 siblings, 1 reply; 30+ messages in thread
From: Jagasia, Harsha @ 2007-04-05 20:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: Meissner, Michael

Currently 6 of the existing vector tests do not get vectorized and hence
do not pass, because the cost model (correctly) estimates that it's not
profitable to vectorize them. So we have disabled the cost model on all
of the existing vect tests. I plan to start by adding appropriate copies
of those tests that fail, with the cost model enabled. I will also add
more tests to verify various decisions that the cost model makes.

Please continue to provide review on the patch.

Thanks,
Harsha Jagasia


>-----Original Message-----
>From: Jagasia, Harsha
>Sent: Thursday, April 05, 2007 3:21 PM
>To: Jagasia, Harsha
>Subject:
>
>> -----Original Message-----
>> From: Steven Bosscher [mailto:stevenb.gcc@gmail.com]
>>
>> Is it your plan to have this accepted for the trunk or for a branch?
>
>The plan is to have it accepted to the trunk, I believe.
>
>>
>> If so, where's the ChangeLog entry?
>
>Oops!  I forgot.  My apologies.
>
>>
>> How did you test this?
>>
>
>I used the vect tests ... a number of them "fail" with the cost model
>enabled, and the cost model is correct in its assessment.  I disabled
>the cost model for the test suite for now so that those tests can
>continue to function as intended.  I.e. test specific vectorization
>capabilities.  Those tests will be made into cost model tests.
>Additionally, more tests should be added as well that target the cost
>model.
>
>I know that these issues should have been addressed before the patch
was
>submitted and at the very least I should have done a better job of
>explaining the situation in my submission.  So, I must again apologize.
>I'm not going to be working on this project any longer, and it was
>desired that I submit the patch under my name prior to leaving.  As I
>said, though, my colleague Harsha will be addressing any shortcomings
in
>this patch and will be supporting it going forward.
>
>Thanks for you comments.
>
>Tony
>
>
>Regards,
>Harsha
>
>--------------------------------------------------
>Harsha Jagasia               GNU Tools Team (SEE)
>512-602-1775                 Advanced Micro Devices
>harsha.jagasia@amd.com       Austin, TX
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2007-06-08 19:13 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-05 17:41 [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures) Doug Gregor
2007-04-05 19:42 ` [patch] Vectorizer cost model implementation Linthicum, Tony
2007-04-05 19:54   ` Steven Bosscher
2007-04-05 20:06     ` Linthicum, Tony
2007-04-05 21:35   ` Eric Christopher
2007-04-06  0:24   ` Sebastian Pop
2007-04-06 15:41     ` Jagasia, Harsha
2007-04-08 13:24     ` Dorit Nuzman
2007-04-08 14:37   ` Dorit Nuzman
2007-04-11 21:25     ` Jagasia, Harsha
2007-04-22  5:45     ` Jagasia, Harsha
2007-04-22  6:23       ` Andrew Pinski
2007-04-22  9:09         ` Eric Christopher
2007-04-22 15:57       ` Paul Brook
2007-04-22 22:39       ` Dorit Nuzman
2007-04-24 20:14         ` Jagasia, Harsha
2007-04-25 11:49           ` Dorit Nuzman
2007-04-25 15:37             ` Jagasia, Harsha
2007-06-06 16:36             ` Jagasia, Harsha
2007-06-07  2:09               ` Eric Christopher
2007-06-07 19:55               ` Dorit Nuzman
2007-06-07 20:11                 ` Jagasia, Harsha
2007-06-08 19:21                 ` [patch] Vectorizer cost model implementation : committed Jagasia, Harsha
2007-04-12  3:37 ` [PATCH] Fix PRs c++/31078 and c++/31103 (canonical types failures) Mark Mitchell
     [not found] <D5B24B5251882048AD03DDFA431BB790012F002E@SAUSEXMB3.amd.com>
2007-04-05 20:38 ` [patch] Vectorizer cost model implementation Jagasia, Harsha
2007-04-08 13:19   ` Dorit Nuzman
     [not found] <D5B24B5251882048AD03DDFA431BB790012F0031@SAUSEXMB3.amd.com>
2007-04-05 22:26 ` Jagasia, Harsha
2007-04-05 22:29   ` Eric Christopher
2007-04-05 22:31     ` Jagasia, Harsha
2007-04-08 13:23   ` Dorit Nuzman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).