public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [patch] Loop-aware SLP 4/5
       [not found] <OF21828808.D8D7629C-ONC2257337.00487532-C2257337.0048D04A@LocalDomain>
@ 2007-08-18 18:40 ` Dorit Nuzman
  0 siblings, 0 replies; 7+ messages in thread
From: Dorit Nuzman @ 2007-08-18 18:40 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19:

> This part adds an adjustment of vectorization cost model to SLP.
>
> SLP costs are calculated for each SLP computation tree recursively
> during its analysis, and added to the overall costs if we decide to SLP.
>
> Thanks,
> Ira
>

+/* SLP costs are calculated according to SLP instance unrolling factor
(i.e.,
+   the number of created vector stmts depends on the unrolling factor).
However,
+   the actual number of vector stmts for every SLP node depends on VF
which is
+   set later in vect_analyze_operations(). Hence, SLP costs should be
updated.
+   In this function we assume that the inside costs calculated in
+   vect_model_xxx_cost are linear in ncopies.  */

can you explain why this assumption holds (or when it may be broken, and
assert against that)?


+      /* We assume that costs are linear in ncopies.  */
+      if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf)
+     SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
+       / SLP_INSTANCE_UNROLLING_FACTOR (instance);

to me it's less confusing if you remove the 'if' (when unrolling==vf you'll
just multiply by 1, so no need for the special check).


and lastly, about the following repeating change:

+  int *inside_cost_field, *outside_cost_field;
+
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
...
-        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+        *outside_cost_field = outer_cost;

         inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);

@@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
       gcc_unreachable ();
     }

-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+  *inside_cost_field = inner_cost;


Again, in an attempt to reduce "if (slp_node)" switches in the code, I'd
replace
      STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
      STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
with
      stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost);
      stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost);

...and hide the "if (slp_node)" switch there:
      stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost)
        {
          if (slp_node)
            SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
          else
            STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
        }

(for your consideration...)

ok otherwise,

thanks,
dorit

> ChangeLog:
>
>  * tree-vectorizer.h (_slp_tree): Add new fields for costs and their
>  access functions.
>  (_slp_instance): Likewise.
>  (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
>  Declare (make extern).
>  * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New.
>  (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf.
>  (vect_get_and_check_slp_defs): Calculate costs. Add arguments.
>  (vect_build_slp_tree): Likewise.
>  (vect_analyze_slp_instance): Initialize cost fields. Update
>  arguments of vect_build_slp_tree.
>  * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
>  SLP costs into account.
>  (vect_model_simple_cost): Make extern, add SLP parameter and handle
>  SLP.
>  (vect_model_store_cost, vect_model_load_cost): Likewise.
>       (vectorizable_call): Add argument to vect_model_simple_cost.
>       (vectorizable_assignment): Call vect_model_simple_cost only for not
>       pure SLP stmts.
>       (vectorizable_operation): Likewise.
>       (vectorizable_type_demotion): Add argument to
>  vect_model_simple_cost.
>       (vectorizable_type_promotion): Likewise.
>       (vectorizable_store): Call vect_model_simple_cost only for not pure
>  SLP stmts.
>       (vectorizable_load): Likewise.
>
> [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Loop-aware SLP 4/5
       [not found] <OF56DBF560.643A33F4-ONC2257351.003B4089-C2257351.003BEBE7@LocalDomain>
@ 2007-09-09 11:54 ` Dorit Nuzman
  0 siblings, 0 replies; 7+ messages in thread
From: Dorit Nuzman @ 2007-09-09 11:54 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches


Ira Rosen/Haifa/IBM wrote on 09/09/2007 13:54:30:

> Dorit Nuzman/Haifa/IBM wrote on 09/09/2007 11:00:13:
>
> > Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40:
> >
> > > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27:
> > >
> > > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19:
> > > >
> > > > and lastly, about the following repeating change:
> > > >
> > > > +  int *inside_cost_field, *outside_cost_field;
> > > > +
> > > > +  /* Take addresses of relevant fields to update in the function.
*/
> > > > +  if (slp_node)
> > > > +    {
> > > > +      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST
(slp_node));
> > > > +      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST
(slp_node));
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST
> (stmt_info));
> > > > +      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST
> > (stmt_info));
> > > > +    }
> > > > ...
> > > > -        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
> > > > +        *outside_cost_field = outer_cost;
> > > >
> > > >          inner_cost += ncopies * (TARG_VEC_LOAD_COST +
> > TARG_VEC_STMT_COST);
> > > >
> > > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
> > > >        gcc_unreachable ();
> > > >      }
> > > >
> > > > -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
> > > > +  *inside_cost_field = inner_cost;
> > > >
> > > > Again, in an attempt to reduce "if (slp_node)" switches in the
code,
> > > > I'd replace
> > > >  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> > > >  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
> > > > with
> > > >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost);
> > > >  stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost);
> > > >
> > > > ...and hide the "if (slp_node)" switch there:
> > > >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost)
> > > >    {
> > > >      if (slp_node)
> > > >   SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
> > > >      else
> > > >   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> > > >    }
> > > >
> > > > (for your consideration...)
> > >
> > > I moved the choice of the relevant field to a different
> (inlined) function.
> > >
> >
> > I prefer the alternative I suggested above - it's a bit less code
> > (no need for the call to the selection function) and more
> > importantly the whole field selection machinery is entirely hidden.
> > Please consider changing (or explain why you prefer the other
solution).
> >
> > In the meantime, ok to commit this patch and all the other parts as
> > well (the above change can be applied separately later).
>
> Here is the patch.
> Bootstrapped and tested on x86_64-linux.
> OK for mainline?
>

yes

thanks,
dorit

> Thanks,
> Ira
>
> ChangeLog:
>
>  * tree-vectorizer.h (stmt_vinfo_set_inside_of_loop_cost,
>  stmt_vinfo_set_outside_of_loop_cost): New functions.
>  * tree-vect-transform.c (vect_get_cost_fields): Remove.
>  (vect_model_simple_cost): Call
stmt_vinfo_set_inside/outside_of_loop_cost
>  to set the relevant cost field instead of calling vect_get_cost_fields.
>  (vect_model_store_cost, vect_model_load_cost): Likewise.
>
> [attachment "costs.txt" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Loop-aware SLP 4/5
       [not found] <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain>
  2007-09-09  9:05 ` Ira Rosen
@ 2007-09-09 11:32 ` Ira Rosen
  1 sibling, 0 replies; 7+ messages in thread
From: Ira Rosen @ 2007-09-09 11:32 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3038 bytes --]



Dorit Nuzman/Haifa/IBM wrote on 09/09/2007 11:00:13:

> Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40:
>
> > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27:
> >
> > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19:
> > >
> > > and lastly, about the following repeating change:
> > >
> > > +  int *inside_cost_field, *outside_cost_field;
> > > +
> > > +  /* Take addresses of relevant fields to update in the function.
*/
> > > +  if (slp_node)
> > > +    {
> > > +      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST
(slp_node));
> > > +      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST
(slp_node));
> > > +    }
> > > +  else
> > > +    {
> > > +      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST
(stmt_info));
> > > +      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST
> (stmt_info));
> > > +    }
> > > ...
> > > -        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
> > > +        *outside_cost_field = outer_cost;
> > >
> > >          inner_cost += ncopies * (TARG_VEC_LOAD_COST +
> TARG_VEC_STMT_COST);
> > >
> > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
> > >        gcc_unreachable ();
> > >      }
> > >
> > > -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
> > > +  *inside_cost_field = inner_cost;
> > >
> > > Again, in an attempt to reduce "if (slp_node)" switches in the code,
> > > I'd replace
> > >  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> > >  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
> > > with
> > >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost);
> > >  stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost);
> > >
> > > ...and hide the "if (slp_node)" switch there:
> > >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost)
> > >    {
> > >      if (slp_node)
> > >   SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
> > >      else
> > >   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> > >    }
> > >
> > > (for your consideration...)
> >
> > I moved the choice of the relevant field to a different (inlined)
function.
> >
>
> I prefer the alternative I suggested above - it's a bit less code
> (no need for the call to the selection function) and more
> importantly the whole field selection machinery is entirely hidden.
> Please consider changing (or explain why you prefer the other solution).
>
> In the meantime, ok to commit this patch and all the other parts as
> well (the above change can be applied separately later).

Here is the patch.
Bootstrapped and tested on x86_64-linux.
OK for mainline?

Thanks,
Ira

ChangeLog:

      * tree-vectorizer.h (stmt_vinfo_set_inside_of_loop_cost,
      stmt_vinfo_set_outside_of_loop_cost): New functions.
      * tree-vect-transform.c (vect_get_cost_fields): Remove.
      (vect_model_simple_cost): Call
stmt_vinfo_set_inside/outside_of_loop_cost
      to set the relevant cost field instead of calling
vect_get_cost_fields.
      (vect_model_store_cost, vect_model_load_cost): Likewise.

(See attached file: costs.txt)

[-- Attachment #2: costs.txt --]
[-- Type: text/plain, Size: 9050 bytes --]

Index: tree-vectorizer.h
===================================================================
--- tree-vectorizer.h	(revision 128289)
+++ tree-vectorizer.h	(working copy)
@@ -561,6 +561,27 @@ is_loop_header_bb_p (basic_block bb)
   return false;
 }
 
+static inline void 
+stmt_vinfo_set_inside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node, 
+				    int cost)
+{
+  if (slp_node)
+    SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
+  else
+    STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+}     
+
+static inline void 
+stmt_vinfo_set_outside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node, 
+				     int cost)
+{
+  if (slp_node)
+    SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node) = cost;
+  else
+    STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
+}     
+
+
 /*-----------------------------------------------------------------*/
 /* Info on data references alignment.                              */
 /*-----------------------------------------------------------------*/
Index: tree-vect-transform.c
===================================================================
--- tree-vect-transform.c	(revision 128289)
+++ tree-vect-transform.c	(working copy)
@@ -462,26 +462,6 @@ vect_model_induction_cost (stmt_vec_info
 }
 
 
-/* Return addresses of the cost fields of SLP_NODE if it's not NULL, and of
-   the stmt otherwise.  */
-
-static inline void
-vect_get_cost_fields (stmt_vec_info stmt_info, slp_tree slp_node, 
-		      int **inside_cost_field, int **outside_cost_field)
-{
-  if (slp_node)
-    {
-      *inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
-      *outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
-    }
-  else
-    {
-      *inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
-      *outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
-    }
-}
-
-
 /* Function vect_model_simple_cost.  
 
    Models cost for simple operations, i.e. those that only emit ncopies of a 
@@ -493,24 +473,24 @@ vect_model_simple_cost (stmt_vec_info st
 			enum vect_def_type *dt, slp_tree slp_node)
 {
   int i;
-  int *inside_cost_field, *outside_cost_field;
+  int inside_cost = 0, outside_cost = 0;
 
-  /* Take addresses of relevant fields to update in the function.  */
-  vect_get_cost_fields (stmt_info, slp_node, &inside_cost_field, 
-			&outside_cost_field);
-
-  *inside_cost_field = ncopies * TARG_VEC_STMT_COST;
+  inside_cost = ncopies * TARG_VEC_STMT_COST;
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
   for (i = 0; i < 2; i++)
     {
       if (dt[i] == vect_constant_def || dt[i] == vect_invariant_def)
-	*outside_cost_field += TARG_SCALAR_TO_VEC_COST; 
+	outside_cost += TARG_SCALAR_TO_VEC_COST; 
     }
   
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
-             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
+             "outside_cost = %d .", inside_cost, outside_cost);
+
+  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
+  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
+  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
 }
 
 
@@ -541,16 +521,11 @@ void
 vect_model_store_cost (stmt_vec_info stmt_info, int ncopies, 
 		       enum vect_def_type dt, slp_tree slp_node)
 {
-  int cost = 0;
   int group_size;
-  int *inside_cost_field, *outside_cost_field;
-
-  /* Take addresses of relevant fields to update in the function.  */
-  vect_get_cost_fields (stmt_info, slp_node, &inside_cost_field, 
-			&outside_cost_field);
+  int inside_cost = 0, outside_cost = 0;
 
   if (dt == vect_constant_def || dt == vect_invariant_def)
-    *outside_cost_field = TARG_SCALAR_TO_VEC_COST;
+    outside_cost = TARG_SCALAR_TO_VEC_COST;
 
   /* Strided access?  */
   if (DR_GROUP_FIRST_DR (stmt_info)) 
@@ -564,7 +539,7 @@ vect_model_store_cost (stmt_vec_info stm
   if (group_size > 1) 
     {
       /* Uses a high and low interleave operation for each needed permute.  */
-      cost = ncopies * exact_log2(group_size) * group_size 
+      inside_cost = ncopies * exact_log2(group_size) * group_size 
              * TARG_VEC_STMT_COST;
 
       if (vect_print_dump_info (REPORT_DETAILS))
@@ -574,13 +549,15 @@ vect_model_store_cost (stmt_vec_info stm
     }
 
   /* Costs of the stores.  */
-  cost += ncopies * TARG_VEC_STORE_COST;
-
-  *inside_cost_field = cost;
+  inside_cost += ncopies * TARG_VEC_STORE_COST;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
-             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
+             "outside_cost = %d .", inside_cost, outside_cost);
+
+  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
+  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
+  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
 }
 
 
@@ -595,16 +572,11 @@ void
 vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, slp_tree slp_node)
 		 
 {
-  int inner_cost = 0;
   int group_size;
   int alignment_support_cheme;
   tree first_stmt;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
-  int *inside_cost_field, *outside_cost_field;
-
-  /* Take addresses of relevant fields to update in the function.  */
-  vect_get_cost_fields (stmt_info, slp_node, &inside_cost_field, 
-			&outside_cost_field);
+  int inside_cost = 0, outside_cost = 0;
 
   /* Strided accesses?  */
   first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@@ -627,8 +599,8 @@ vect_model_load_cost (stmt_vec_info stmt
   if (group_size > 1) 
     {
       /* Uses an even and odd extract operations for each needed permute.  */
-      inner_cost = ncopies * exact_log2(group_size) * group_size
-                   * TARG_VEC_STMT_COST;
+      inside_cost = ncopies * exact_log2(group_size) * group_size
+	* TARG_VEC_STMT_COST;
 
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
@@ -641,7 +613,7 @@ vect_model_load_cost (stmt_vec_info stmt
     {
     case dr_aligned:
       {
-        inner_cost += ncopies * TARG_VEC_LOAD_COST;
+        inside_cost += ncopies * TARG_VEC_LOAD_COST;
 
         if (vect_print_dump_info (REPORT_DETAILS))
           fprintf (vect_dump, "vect_model_load_cost: aligned.");
@@ -651,7 +623,7 @@ vect_model_load_cost (stmt_vec_info stmt
     case dr_unaligned_supported:
       {
         /* Here, we assign an additional cost for the unaligned load.  */
-        inner_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST;
+        inside_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST;
 
         if (vect_print_dump_info (REPORT_DETAILS))
           fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
@@ -661,20 +633,18 @@ vect_model_load_cost (stmt_vec_info stmt
       }
     case dr_explicit_realign:
       {
-        inner_cost += ncopies * (2*TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
+        inside_cost += ncopies * (2*TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
 
         /* FIXME: If the misalignment remains fixed across the iterations of
            the containing loop, the following cost should be added to the
            outside costs.  */
         if (targetm.vectorize.builtin_mask_for_load)
-          inner_cost += TARG_VEC_STMT_COST;
+          inside_cost += TARG_VEC_STMT_COST;
 
         break;
       }
     case dr_explicit_realign_optimized:
       {
-        int outer_cost = 0;
-
         if (vect_print_dump_info (REPORT_DETAILS))
           fprintf (vect_dump, "vect_model_load_cost: unaligned software "
                    "pipelined.");
@@ -688,14 +658,12 @@ vect_model_load_cost (stmt_vec_info stmt
 
         if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1 || slp_node)
           {
-            outer_cost = 2*TARG_VEC_STMT_COST;
+            outside_cost = 2*TARG_VEC_STMT_COST;
             if (targetm.vectorize.builtin_mask_for_load)
-              outer_cost += TARG_VEC_STMT_COST;
+              outside_cost += TARG_VEC_STMT_COST;
           }
-        
-        *outside_cost_field = outer_cost;
 
-        inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
+        inside_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
 
         break;
       }
@@ -703,13 +671,14 @@ vect_model_load_cost (stmt_vec_info stmt
     default:
       gcc_unreachable ();
     }
-
-  *inside_cost_field = inner_cost;
-
+  
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
-             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
+             "outside_cost = %d .", inside_cost, outside_cost);
 
+  /* Set the costs either in STMT_INFO or SLP_NODE (if exists).  */
+  stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost);
+  stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost);
 }
 
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Loop-aware SLP 4/5
       [not found] <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain>
@ 2007-09-09  9:05 ` Ira Rosen
  2007-09-09 11:32 ` Ira Rosen
  1 sibling, 0 replies; 7+ messages in thread
From: Ira Rosen @ 2007-09-09  9:05 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: gcc-patches



Dorit Nuzman/Haifa/IBM wrote on 09/09/2007 11:00:13:

> Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40:
>
> > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27:
> >
> > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19:
> > >
> > > > This part adds an adjustment of vectorization cost model to SLP.
> > > >
> > > > SLP costs are calculated for each SLP computation tree recursively
> > > > during its analysis, and added to the overall costs if we decide to
SLP.
> > > >
> > > > Thanks,
> > > > Ira
> > > >
> > >
> > > +/* SLP costs are calculated according to SLP instance unrolling
> > factor (i.e.,
> > > +   the number of created vector stmts depends on the unrolling
> > > factor). However,
> > > +   the actual number of vector stmts for every SLP node depends on
> > > VF which is
> > > +   set later in vect_analyze_operations(). Hence, SLP costs should
> > > be updated.
> > > +   In this function we assume that the inside costs calculated in
> > > +   vect_model_xxx_cost are linear in ncopies.  */
> > >
> > > can you explain why this assumption holds (or when it may be broken,
> > > and assert against that)?
> > >
> > > +      /* We assume that costs are linear in ncopies.  */
> > > +      if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf)
> > > + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
> > > +   / SLP_INSTANCE_UNROLLING_FACTOR (instance);
> > >
> > > to me it's less confusing if you remove the 'if' (when unrolling==vf
> > > you'll just multiply by 1, so no need for the special check).
> > >
> >
> > Done.
> >
> > > and lastly, about the following repeating change:
> > >
> > > +  int *inside_cost_field, *outside_cost_field;
> > > +
> > > +  /* Take addresses of relevant fields to update in the function.
*/
> > > +  if (slp_node)
> > > +    {
> > > +      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST
(slp_node));
> > > +      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST
(slp_node));
> > > +    }
> > > +  else
> > > +    {
> > > +      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST
(stmt_info));
> > > +      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST
> (stmt_info));
> > > +    }
> > > ...
> > > -        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
> > > +        *outside_cost_field = outer_cost;
> > >
> > >          inner_cost += ncopies * (TARG_VEC_LOAD_COST +
> TARG_VEC_STMT_COST);
> > >
> > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
> > >        gcc_unreachable ();
> > >      }
> > >
> > > -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
> > > +  *inside_cost_field = inner_cost;
> > >
> > > Again, in an attempt to reduce "if (slp_node)" switches in the code,
> > > I'd replace
> > >  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> > >  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
> > > with
> > >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost);
> > >  stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost);
> > >
> > > ...and hide the "if (slp_node)" switch there:
> > >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost)
> > >    {
> > >      if (slp_node)
> > >   SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
> > >      else
> > >   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> > >    }
> > >
> > > (for your consideration...)
> >
> > I moved the choice of the relevant field to a different (inlined)
function.
> >
>
> I prefer the alternative I suggested above - it's a bit less code
> (no need for the call to the selection function) and more
> importantly the whole field selection machinery is entirely hidden.
> Please consider changing (or explain why you prefer the other solution).
>
> In the meantime, ok to commit this patch and all the other parts as
> well (the above change can be applied separately later).

O.K. I will reconsider it.
Committed as is meantime.

Thanks,
Ira


>
> thanks for addressing the comments,
>
> dorit
>
> > Thanks,
> > Ira
> >
> > >
> > > ok otherwise,
> > >
> > > thanks,
> > > dorit
> > >
> > > > ChangeLog:
> > > >
> > > >  * tree-vectorizer.h (_slp_tree): Add new fields for costs and
their
> > > >  access functions.
> > > >  (_slp_instance): Likewise.
> > > >  (vect_model_simple_cost, vect_model_store_cost,
vect_model_load_cost):
> > > >  Declare (make extern).
> > > >  * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf):
New.
> > > >  (vect_analyze_operations): Call
vect_update_slp_costs_according_to_vf.
> > > >  (vect_get_and_check_slp_defs): Calculate costs. Add arguments.
> > > >  (vect_build_slp_tree): Likewise.
> > > >  (vect_analyze_slp_instance): Initialize cost fields. Update
> > > >  arguments of vect_build_slp_tree.
> > > >  * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
> > > >  SLP costs into account.
> > > >  (vect_model_simple_cost): Make extern, add SLP parameter and
handle
> > > >  SLP.
> > > >  (vect_model_store_cost, vect_model_load_cost): Likewise.
> > > >       (vectorizable_call): Add argument to vect_model_simple_cost.
> > > >       (vectorizable_assignment): Call vect_model_simple_cost
> only for not
> > > >       pure SLP stmts.
> > > >       (vectorizable_operation): Likewise.
> > > >       (vectorizable_type_demotion): Add argument to
> > > >  vect_model_simple_cost.
> > > >       (vectorizable_type_promotion): Likewise.
> > > >       (vectorizable_store): Call vect_model_simple_cost only
> for not pure
> > > >  SLP stmts.
> > > >       (vectorizable_load): Likewise.
> > > >
> > > > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Loop-aware SLP 4/5
       [not found] <OF988F5AAC.5E4646D5-ONC225734D.002EB0F3-C2257351.00279CD2@LocalDomain>
@ 2007-09-09  8:59 ` Dorit Nuzman
  0 siblings, 0 replies; 7+ messages in thread
From: Dorit Nuzman @ 2007-09-09  8:59 UTC (permalink / raw)
  To: Ira Rosen; +Cc: gcc-patches

Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40:

> Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27:
>
> > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19:
> >
> > > This part adds an adjustment of vectorization cost model to SLP.
> > >
> > > SLP costs are calculated for each SLP computation tree recursively
> > > during its analysis, and added to the overall costs if we decide to
SLP.
> > >
> > > Thanks,
> > > Ira
> > >
> >
> > +/* SLP costs are calculated according to SLP instance unrolling
> factor (i.e.,
> > +   the number of created vector stmts depends on the unrolling
> > factor). However,
> > +   the actual number of vector stmts for every SLP node depends on
> > VF which is
> > +   set later in vect_analyze_operations(). Hence, SLP costs should
> > be updated.
> > +   In this function we assume that the inside costs calculated in
> > +   vect_model_xxx_cost are linear in ncopies.  */
> >
> > can you explain why this assumption holds (or when it may be broken,
> > and assert against that)?
> >
> > +      /* We assume that costs are linear in ncopies.  */
> > +      if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf)
> > + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
> > +   / SLP_INSTANCE_UNROLLING_FACTOR (instance);
> >
> > to me it's less confusing if you remove the 'if' (when unrolling==vf
> > you'll just multiply by 1, so no need for the special check).
> >
>
> Done.
>
> > and lastly, about the following repeating change:
> >
> > +  int *inside_cost_field, *outside_cost_field;
> > +
> > +  /* Take addresses of relevant fields to update in the function.  */
> > +  if (slp_node)
> > +    {
> > +      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
> > +      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST
(slp_node));
> > +    }
> > +  else
> > +    {
> > +      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST
(stmt_info));
> > +      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST
(stmt_info));
> > +    }
> > ...
> > -        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
> > +        *outside_cost_field = outer_cost;
> >
> >          inner_cost += ncopies * (TARG_VEC_LOAD_COST +
TARG_VEC_STMT_COST);
> >
> > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
> >        gcc_unreachable ();
> >      }
> >
> > -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
> > +  *inside_cost_field = inner_cost;
> >
> > Again, in an attempt to reduce "if (slp_node)" switches in the code,
> > I'd replace
> >  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> >  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
> > with
> >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost);
> >  stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost);
> >
> > ...and hide the "if (slp_node)" switch there:
> >  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost)
> >    {
> >      if (slp_node)
> >   SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
> >      else
> >   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
> >    }
> >
> > (for your consideration...)
>
> I moved the choice of the relevant field to a different (inlined)
function.
>

I prefer the alternative I suggested above - it's a bit less code (no need
for the call to the selection function) and more importantly the whole
field selection machinery is entirely hidden. Please consider changing (or
explain why you prefer the other solution).

In the meantime, ok to commit this patch and all the other parts as well
(the above change can be applied separately later).

thanks for addressing the comments,

dorit

> Thanks,
> Ira
>
> >
> > ok otherwise,
> >
> > thanks,
> > dorit
> >
> > > ChangeLog:
> > >
> > >  * tree-vectorizer.h (_slp_tree): Add new fields for costs and their
> > >  access functions.
> > >  (_slp_instance): Likewise.
> > >  (vect_model_simple_cost, vect_model_store_cost,
vect_model_load_cost):
> > >  Declare (make extern).
> > >  * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New.
> > >  (vect_analyze_operations): Call
vect_update_slp_costs_according_to_vf.
> > >  (vect_get_and_check_slp_defs): Calculate costs. Add arguments.
> > >  (vect_build_slp_tree): Likewise.
> > >  (vect_analyze_slp_instance): Initialize cost fields. Update
> > >  arguments of vect_build_slp_tree.
> > >  * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
> > >  SLP costs into account.
> > >  (vect_model_simple_cost): Make extern, add SLP parameter and handle
> > >  SLP.
> > >  (vect_model_store_cost, vect_model_load_cost): Likewise.
> > >       (vectorizable_call): Add argument to vect_model_simple_cost.
> > >       (vectorizable_assignment): Call vect_model_simple_cost only for
not
> > >       pure SLP stmts.
> > >       (vectorizable_operation): Likewise.
> > >       (vectorizable_type_demotion): Add argument to
> > >  vect_model_simple_cost.
> > >       (vectorizable_type_promotion): Likewise.
> > >       (vectorizable_store): Call vect_model_simple_cost only for not
pure
> > >  SLP stmts.
> > >       (vectorizable_load): Likewise.
> > >
> > > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Loop-aware SLP 4/5
       [not found] <OFFE7E55A6.7B3C8820-ONC225733B.006323F7-C225733B.00663708@LocalDomain>
@ 2007-09-09  7:57 ` Ira Rosen
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Rosen @ 2007-09-09  7:57 UTC (permalink / raw)
  To: Dorit Nuzman; +Cc: gcc-patches



Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27:

> Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19:
>
> > This part adds an adjustment of vectorization cost model to SLP.
> >
> > SLP costs are calculated for each SLP computation tree recursively
> > during its analysis, and added to the overall costs if we decide to
SLP.
> >
> > Thanks,
> > Ira
> >
>
> +/* SLP costs are calculated according to SLP instance unrolling factor
(i.e.,
> +   the number of created vector stmts depends on the unrolling
> factor). However,
> +   the actual number of vector stmts for every SLP node depends on
> VF which is
> +   set later in vect_analyze_operations(). Hence, SLP costs should
> be updated.
> +   In this function we assume that the inside costs calculated in
> +   vect_model_xxx_cost are linear in ncopies.  */
>
> can you explain why this assumption holds (or when it may be broken,
> and assert against that)?
>
> +      /* We assume that costs are linear in ncopies.  */
> +      if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf)
> + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf
> +   / SLP_INSTANCE_UNROLLING_FACTOR (instance);
>
> to me it's less confusing if you remove the 'if' (when unrolling==vf
> you'll just multiply by 1, so no need for the special check).
>

Done.

> and lastly, about the following repeating change:
>
> +  int *inside_cost_field, *outside_cost_field;
> +
> +  /* Take addresses of relevant fields to update in the function.  */
> +  if (slp_node)
> +    {
> +      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
> +      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
> +    }
> +  else
> +    {
> +      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
> +      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST
(stmt_info));
> +    }
> ...
> -        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
> +        *outside_cost_field = outer_cost;
>
>          inner_cost += ncopies * (TARG_VEC_LOAD_COST +
TARG_VEC_STMT_COST);
>
> @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
>        gcc_unreachable ();
>      }
>
> -  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
> +  *inside_cost_field = inner_cost;
>
> Again, in an attempt to reduce "if (slp_node)" switches in the code,
> I'd replace
>  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
>  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost;
> with
>  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost);
>  stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost);
>
> ...and hide the "if (slp_node)" switch there:
>  stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost)
>    {
>      if (slp_node)
>   SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost;
>      else
>   STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
>    }
>
> (for your consideration...)

I moved the choice of the relevant field to a different (inlined) function.

Thanks,
Ira


>
> ok otherwise,
>
> thanks,
> dorit
>
> > ChangeLog:
> >
> >  * tree-vectorizer.h (_slp_tree): Add new fields for costs and their
> >  access functions.
> >  (_slp_instance): Likewise.
> >  (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost):
> >  Declare (make extern).
> >  * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New.
> >  (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf.
> >  (vect_get_and_check_slp_defs): Calculate costs. Add arguments.
> >  (vect_build_slp_tree): Likewise.
> >  (vect_analyze_slp_instance): Initialize cost fields. Update
> >  arguments of vect_build_slp_tree.
> >  * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
> >  SLP costs into account.
> >  (vect_model_simple_cost): Make extern, add SLP parameter and handle
> >  SLP.
> >  (vect_model_store_cost, vect_model_load_cost): Likewise.
> >       (vectorizable_call): Add argument to vect_model_simple_cost.
> >       (vectorizable_assignment): Call vect_model_simple_cost only for
not
> >       pure SLP stmts.
> >       (vectorizable_operation): Likewise.
> >       (vectorizable_type_demotion): Add argument to
> >  vect_model_simple_cost.
> >       (vectorizable_type_promotion): Likewise.
> >       (vectorizable_store): Call vect_model_simple_cost only for not
pure
> >  SLP stmts.
> >       (vectorizable_load): Likewise.
> >
> > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Loop-aware SLP 4/5
  2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
@ 2007-08-14 13:16 ` Ira Rosen
  0 siblings, 0 replies; 7+ messages in thread
From: Ira Rosen @ 2007-08-14 13:16 UTC (permalink / raw)
  To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1582 bytes --]

This part adds an adjustment of vectorization cost model to SLP.

SLP costs are calculated for each SLP computation tree recursively during
its analysis, and added to the overall costs if we decide to SLP.

Thanks,
Ira

ChangeLog:

      * tree-vectorizer.h (_slp_tree): Add new fields for costs and their
      access functions.
      (_slp_instance): Likewise.
      (vect_model_simple_cost, vect_model_store_cost,
vect_model_load_cost):
      Declare (make extern).
      * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New.
      (vect_analyze_operations): Call
vect_update_slp_costs_according_to_vf.
      (vect_get_and_check_slp_defs): Calculate costs. Add arguments.
      (vect_build_slp_tree): Likewise.
      (vect_analyze_slp_instance): Initialize cost fields. Update
      arguments of vect_build_slp_tree.
      * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take
      SLP costs into account.
      (vect_model_simple_cost): Make extern, add SLP parameter and handle
      SLP.
      (vect_model_store_cost, vect_model_load_cost): Likewise.
      (vectorizable_call): Add argument to vect_model_simple_cost.
      (vectorizable_assignment): Call vect_model_simple_cost only for not
      pure SLP stmts.
      (vectorizable_operation): Likewise.
      (vectorizable_type_demotion): Add argument to
      vect_model_simple_cost.
      (vectorizable_type_promotion): Likewise.
      (vectorizable_store): Call vect_model_simple_cost only for not pure
      SLP stmts.
      (vectorizable_load): Likewise.

(See attached file: slp-part4.txt)

[-- Attachment #2: slp-part4.txt --]
[-- Type: text/plain, Size: 19284 bytes --]

--- tree-vectorizer.h.patch2	2007-08-14 15:43:29.000000000 +0300
+++ tree-vectorizer.h	2007-08-14 15:44:42.000000000 +0300
@@ -120,6 +120,12 @@ typedef struct _slp_instance {
   /* The unrolling factor required to vectorized this SLP instance.  */
   unsigned int unrolling_factor;
 
+  /* Vectorization costs associated with SLP instance.  */
+  struct  
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+  } cost;
 } *slp_instance;
 
 DEF_VEC_P(slp_instance);
@@ -129,12 +135,16 @@ DEF_VEC_ALLOC_P(slp_instance, heap);
 #define SLP_INSTANCE_TREE(S)                     (S)->root
 #define SLP_INSTANCE_GROUP_SIZE(S)               (S)->group_size
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
+#define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S)     (S)->cost.outside_of_loop
+#define SLP_INSTANCE_INSIDE_OF_LOOP_COST(S)      (S)->cost.inside_of_loop
 
 #define SLP_TREE_LEFT(S)                         (S)->left
 #define SLP_TREE_RIGHT(S)                        (S)->right
 #define SLP_TREE_SCALAR_STMTS(S)                 (S)->stmts
 #define SLP_TREE_VEC_STMTS(S)                    (S)->vec_stmts
 #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)          (S)->vec_stmts_size
+#define SLP_TREE_OUTSIDE_OF_LOOP_COST(S)         (S)->cost.outside_of_loop
+#define SLP_TREE_INSIDE_OF_LOOP_COST(S)          (S)->cost.inside_of_loop
 
 /*-----------------------------------------------------------------*/
 /* Info on vectorized loops.                                       */
@@ -598,6 +608,11 @@ extern bool vectorizable_live_operation 
 extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_induction (tree, block_stmt_iterator *, tree *);
 extern int  vect_estimate_min_profitable_iters (loop_vec_info);
+extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *, 
+				    slp_tree);
+extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type, 
+				   slp_tree);
+extern void vect_model_load_cost (stmt_vec_info, int, slp_tree);
 /* Driver for transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 
--- tree-vect-analyze.c.patch4	2007-08-14 15:42:42.000000000 +0300
+++ tree-vect-analyze.c	2007-08-14 15:43:26.000000000 +0300
@@ -350,6 +350,33 @@ vect_can_advance_ivs_p (loop_vec_info lo
 }
 
 
+/* SLP costs are calculated according to SLP instance unrolling factor (i.e., 
+   the number of created vector stmts depends on the unrolling factor). However,
+   the actual number of vector stmts for every SLP node depends on VF which is
+   set later in vect_analyze_operations(). Hence, SLP costs should be updated.
+   In this function we assume that the inside costs calculated in 
+   vect_model_xxx_cost are linear in ncopies.  */
+
+static void
+vect_update_slp_costs_according_to_vf (loop_vec_info loop_vinfo)
+{
+  unsigned int i, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  slp_instance instance;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ===");
+
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    {
+      /* We assume that costs are linear in ncopies.  */
+      if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf)
+	SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf 
+	  / SLP_INSTANCE_UNROLLING_FACTOR (instance);	  
+    }
+}
+
+
 /* Function vect_analyze_operations.
 
    Scan the loop stmts and make sure they are all vectorizable.  */
@@ -569,6 +596,10 @@ vect_analyze_operations (loop_vec_info l
       LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
     }
 
+  /* After VF is set, SLP costs should be updated since the number of created
+     vector stmts depends on VF.  */
+  vect_update_slp_costs_according_to_vf (loop_vinfo);
+
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump,
@@ -2248,7 +2279,8 @@ vect_get_and_check_slp_defs (loop_vec_in
 			     enum vect_def_type *first_stmt_dt1,
 			     tree *first_stmt_def0_type, 
 			     tree *first_stmt_def1_type,
-			     tree *first_stmt_const_oprnd)
+			     tree *first_stmt_const_oprnd,
+			     int ncopies_for_cost)
 {
   tree oprnd;
   enum operation_type op_type = TREE_OPERAND_LENGTH (rhs);
@@ -2291,6 +2323,14 @@ vect_get_and_check_slp_defs (loop_vec_in
 	    *first_stmt_def0_type = TREE_TYPE (def);
 	  else
 	    *first_stmt_const_oprnd = oprnd;
+
+	  /* Analyze costs (for the first stmt of the group only).  */
+	  if (op_type)
+	    /* Not memory operation (we don't call this functions for loads).  */
+	    vect_model_simple_cost (stmt_info, ncopies_for_cost, dt, slp_node);
+	  else
+	    /* Store.  */
+	    vect_model_store_cost (stmt_info, ncopies_for_cost, dt[0], slp_node);
 	}
       
       else
@@ -2369,7 +2409,9 @@ vect_get_and_check_slp_defs (loop_vec_in
 
 static bool
 vect_build_slp_tree (loop_vec_info loop_vinfo, slp_tree *node, 
-		     unsigned int group_size, bool *slp_impossible)
+		     unsigned int group_size, bool *slp_impossible,
+		     int *inside_cost, int *outside_cost,
+		     int ncopies_for_cost)
 {
   VEC (tree, heap) *def_stmts0 = VEC_alloc (tree, heap, group_size);
   VEC (tree, heap) *def_stmts1 =  VEC_alloc (tree, heap, group_size);
@@ -2492,7 +2534,8 @@ vect_build_slp_tree (loop_vec_info loop_
 						&first_stmt_dt1, 
 						&first_stmt_def0_type, 
 						&first_stmt_def1_type,
-						&first_stmt_const_oprnd))
+						&first_stmt_const_oprnd,
+						ncopies_for_cost))
 		return false;
 	    }
 	    else
@@ -2515,6 +2558,10 @@ vect_build_slp_tree (loop_vec_info loop_
 
 			return false;
 		      }
+
+		    /* Analyze costs (for the first stmt in the group).  */
+		    vect_model_load_cost (vinfo_for_stmt (stmt), 
+					  ncopies_for_cost, *node);
 		  }
 		else
 		  {
@@ -2572,11 +2619,16 @@ vect_build_slp_tree (loop_vec_info loop_
 					    &first_stmt_dt1, 
 					    &first_stmt_def0_type, 
 					    &first_stmt_def1_type,
-					    &first_stmt_const_oprnd))
+					    &first_stmt_const_oprnd,
+					    ncopies_for_cost))
 	    return false;
 	}
     }
 
+  /* Add the costs of the node to the overall instance costs.  */
+  *inside_cost += SLP_TREE_INSIDE_OF_LOOP_COST (*node); 
+  *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node);
+
   /* Strided loads were reached - stop the recursion.  */
   if (stop_recursion)
     return true;
@@ -2589,8 +2641,11 @@ vect_build_slp_tree (loop_vec_info loop_
       SLP_TREE_VEC_STMTS (left_node) = NULL;
       SLP_TREE_LEFT (left_node) = NULL;
       SLP_TREE_RIGHT (left_node) = NULL;
+      SLP_TREE_OUTSIDE_OF_LOOP_COST (left_node) = 0;
+      SLP_TREE_INSIDE_OF_LOOP_COST (left_node) = 0;
       if (!vect_build_slp_tree (loop_vinfo, &left_node, group_size, 
-				slp_impossible))
+				slp_impossible, inside_cost, outside_cost,
+				ncopies_for_cost))
 	return false;
       
       SLP_TREE_LEFT (*node) = left_node;
@@ -2603,8 +2658,11 @@ vect_build_slp_tree (loop_vec_info loop_
       SLP_TREE_VEC_STMTS (right_node) = NULL;
       SLP_TREE_LEFT (right_node) = NULL;
       SLP_TREE_RIGHT (right_node) = NULL;
+      SLP_TREE_OUTSIDE_OF_LOOP_COST (right_node) = 0;
+      SLP_TREE_INSIDE_OF_LOOP_COST (right_node) = 0;
       if (!vect_build_slp_tree (loop_vinfo, &right_node, group_size,
-				slp_impossible))
+				slp_impossible, inside_cost, outside_cost,
+				ncopies_for_cost))
 	return false;
       
       SLP_TREE_RIGHT (*node) = right_node;
@@ -2672,6 +2730,7 @@ vect_analyze_slp_instance (loop_vec_info
   tree vectype, scalar_type, next;
   unsigned int vectorization_factor = 0, ncopies;
   bool slp_impossible = false; 
+  int inside_cost = 0, outside_cost = 0, ncopies_for_cost;
 
   /* FORNOW: multiple types are not supported.  */
   scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))));
@@ -2701,18 +2760,28 @@ vect_analyze_slp_instance (loop_vec_info
   SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0;
   SLP_TREE_LEFT (node) = NULL;
   SLP_TREE_RIGHT (node) = NULL;
+  SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0;
+  SLP_TREE_INSIDE_OF_LOOP_COST (node) = 0;
 
   /* Calculate the unrolling factor.  */
   unrolling_factor = least_common_multiple (nunits, group_size) / group_size;
 	
+  /* Calculate the number of vector stmts to create based on the unrolling
+     factor (number of vectors is 1 if NUNITS >= GROUP_SIZE, and is
+     GROUP_SIZE / NUNITS otherwise.  */
+  ncopies_for_cost = unrolling_factor * group_size / nunits;
+
   /* Build the tree for the SLP instance.  */
-  if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible))
+  if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible,
+			   &inside_cost, &outside_cost, ncopies_for_cost))
     {
       /* Create a new SLP instance.  */  
       new_instance = XNEW (struct _slp_instance);
       SLP_INSTANCE_TREE (new_instance) = node;
       SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size;
       SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
+      SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost;
+      SLP_INSTANCE_INSIDE_OF_LOOP_COST (new_instance) = inside_cost;
       VEC_safe_push (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo), 
 		     new_instance);
       if (vect_print_dump_info (REPORT_DETAILS))
--- tree-vect-transform.c.patch4	2007-08-14 15:42:48.000000000 +0300
+++ tree-vect-transform.c	2007-08-14 15:45:55.000000000 +0300
@@ -99,6 +99,8 @@ vect_estimate_min_profitable_iters (loop
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
   int byte_misalign;
+  VEC (slp_instance, heap) *slp_instances;
+  slp_instance instance;
 
   /* Cost model disabled.  */
   if (!flag_vect_cost_model)
@@ -250,6 +252,14 @@ vect_estimate_min_profitable_iters (loop
 		  targetm.vectorize.builtin_vectorization_cost (runtime_test));
     }
 
+  /* Add SLP costs.  */
+  slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
+  for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++)
+    {
+      vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance);
+      vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance);
+    }
+
   /* Calculate number of iterations required to make the vector version 
      profitable, relative to the loop bodies only. The following condition
      must hold true: ((SIC*VF)-VIC)*niters > VOC*VF, where
@@ -416,25 +426,37 @@ vect_model_induction_cost (stmt_vec_info
    single op.  Right now, this does not account for multiple insns that could
    be generated for the single vector op.  We will handle that shortly.  */
 
-static void
+void
 vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies, 
-                        enum vect_def_type *dt)
+			enum vect_def_type *dt, slp_tree slp_node)
 {
   int i;
+  int *inside_cost_field, *outside_cost_field;
 
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
+
+  *inside_cost_field = ncopies * TARG_VEC_STMT_COST;
 
   /* FORNOW: Assuming maximum 2 args per stmts.  */
   for (i = 0; i < 2; i++)
     {
       if (dt[i] == vect_constant_def || dt[i] == vect_invariant_def)
-	STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) += TARG_SCALAR_TO_VEC_COST; 
+	*outside_cost_field += TARG_SCALAR_TO_VEC_COST; 
     }
   
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
 }
 
 
@@ -461,18 +483,31 @@ vect_cost_strided_group_size (stmt_vec_i
    Models cost for stores.  In the case of strided accesses, one access
    has the overhead of the strided access attributed to it.  */
 
-static void
+void
 vect_model_store_cost (stmt_vec_info stmt_info, int ncopies, 
-                       enum vect_def_type dt)
+		       enum vect_def_type dt, slp_tree slp_node)
 {
   int cost = 0;
   int group_size;
+  int *inside_cost_field, *outside_cost_field;
+
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
 
   if (dt == vect_constant_def || dt == vect_invariant_def)
-    STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = TARG_SCALAR_TO_VEC_COST;
+    *outside_cost_field = TARG_SCALAR_TO_VEC_COST;
 
   /* Strided access?  */
-  if (STMT_VINFO_STRIDED_ACCESS (stmt_info)) 
+  if (DR_GROUP_FIRST_DR (stmt_info)) 
     group_size = vect_cost_strided_group_size (stmt_info);
   /* Not a strided access.  */
   else
@@ -495,12 +530,11 @@ vect_model_store_cost (stmt_vec_info stm
   /* Costs of the stores.  */
   cost += ncopies * TARG_VEC_STORE_COST;
 
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+  *inside_cost_field = cost;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
 }
 
 
@@ -511,8 +545,8 @@ vect_model_store_cost (stmt_vec_info stm
    accesses are supported for loads, we also account for the costs of the 
    access scheme chosen.  */
 
-static void
-vect_model_load_cost (stmt_vec_info stmt_info, int ncopies)
+void
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, slp_tree slp_node)
 		 
 {
   int inner_cost = 0;
@@ -520,10 +554,23 @@ vect_model_load_cost (stmt_vec_info stmt
   int alignment_support_cheme;
   tree first_stmt;
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
+  int *inside_cost_field, *outside_cost_field;
+
+  /* Take addresses of relevant fields to update in the function.  */
+  if (slp_node)
+    {
+      inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node));
+      outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node));
+    }
+  else
+    {
+      inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info));
+      outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+    }
 
   /* Strided accesses?  */
   first_stmt = DR_GROUP_FIRST_DR (stmt_info);
-  if (first_stmt)
+  if (first_stmt && !slp_node)
     {
       group_size = vect_cost_strided_group_size (stmt_info);
       first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
@@ -589,14 +636,14 @@ vect_model_load_cost (stmt_vec_info stmt
            access in the group. Inside the loop, there is a load op
            and a realignment op.  */
 
-        if ((!STMT_VINFO_STRIDED_ACCESS (stmt_info)) || group_size > 1)
+        if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1 || slp_node)
           {
             outer_cost = 2*TARG_VEC_STMT_COST;
             if (targetm.vectorize.builtin_mask_for_load)
               outer_cost += TARG_VEC_STMT_COST;
           }
         
-        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+        *outside_cost_field = outer_cost;
 
         inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
 
@@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt
       gcc_unreachable ();
     }
 
-  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+  *inside_cost_field = inner_cost;
 
   if (vect_print_dump_info (REPORT_DETAILS))
     fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
-             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
-             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+             "outside_cost = %d .", *inside_cost_field, *outside_cost_field);
+
 }
 
 
@@ -2658,7 +2705,7 @@ vectorizable_call (tree stmt, block_stmt
       STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_call ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3194,7 +3241,7 @@ vectorizable_assignment (tree stmt, bloc
       STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_assignment ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3461,7 +3508,7 @@ vectorizable_operation (tree stmt, block
       STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_operation ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3719,7 +3766,7 @@ vectorizable_type_demotion (tree stmt, b
       STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_demotion ===");
-      vect_model_simple_cost (stmt_info, ncopies, dt);
+      vect_model_simple_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -3882,7 +3929,7 @@ vectorizable_type_promotion (tree stmt, 
       STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "=== vectorizable_promotion ===");
-      vect_model_simple_cost (stmt_info, 2*ncopies, dt);
+      vect_model_simple_cost (stmt_info, 2*ncopies, dt, NULL);
       return true;
     }
 
@@ -4212,7 +4259,8 @@ vectorizable_store (tree stmt, block_stm
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
-      vect_model_store_cost (stmt_info, ncopies, dt);
+      if (!STMT_VINFO_PURE_SLP (stmt_info))
+	vect_model_store_cost (stmt_info, ncopies, dt, NULL);
       return true;
     }
 
@@ -4893,7 +4941,8 @@ vectorizable_load (tree stmt, block_stmt
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
-      vect_model_load_cost (stmt_info, ncopies);
+      if (!STMT_VINFO_PURE_SLP (stmt_info))
+	vect_model_load_cost (stmt_info, ncopies, NULL);
       return true;
     }
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-09-09 11:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <OF21828808.D8D7629C-ONC2257337.00487532-C2257337.0048D04A@LocalDomain>
2007-08-18 18:40 ` [patch] Loop-aware SLP 4/5 Dorit Nuzman
     [not found] <OF56DBF560.643A33F4-ONC2257351.003B4089-C2257351.003BEBE7@LocalDomain>
2007-09-09 11:54 ` Dorit Nuzman
     [not found] <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain>
2007-09-09  9:05 ` Ira Rosen
2007-09-09 11:32 ` Ira Rosen
     [not found] <OF988F5AAC.5E4646D5-ONC225734D.002EB0F3-C2257351.00279CD2@LocalDomain>
2007-09-09  8:59 ` Dorit Nuzman
     [not found] <OFFE7E55A6.7B3C8820-ONC225733B.006323F7-C225733B.00663708@LocalDomain>
2007-09-09  7:57 ` Ira Rosen
2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen
2007-08-14 13:16 ` [patch] Loop-aware SLP 4/5 Ira Rosen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).