* Re: [patch] Loop-aware SLP 4/5 [not found] <OF988F5AAC.5E4646D5-ONC225734D.002EB0F3-C2257351.00279CD2@LocalDomain> @ 2007-09-09 8:59 ` Dorit Nuzman 0 siblings, 0 replies; 7+ messages in thread From: Dorit Nuzman @ 2007-09-09 8:59 UTC (permalink / raw) To: Ira Rosen; +Cc: gcc-patches Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40: > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27: > > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19: > > > > > This part adds an adjustment of vectorization cost model to SLP. > > > > > > SLP costs are calculated for each SLP computation tree recursively > > > during its analysis, and added to the overall costs if we decide to SLP. > > > > > > Thanks, > > > Ira > > > > > > > +/* SLP costs are calculated according to SLP instance unrolling > factor (i.e., > > + the number of created vector stmts depends on the unrolling > > factor). However, > > + the actual number of vector stmts for every SLP node depends on > > VF which is > > + set later in vect_analyze_operations(). Hence, SLP costs should > > be updated. > > + In this function we assume that the inside costs calculated in > > + vect_model_xxx_cost are linear in ncopies. */ > > > > can you explain why this assumption holds (or when it may be broken, > > and assert against that)? > > > > + /* We assume that costs are linear in ncopies. */ > > + if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf) > > + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf > > + / SLP_INSTANCE_UNROLLING_FACTOR (instance); > > > > to me it's less confusing if you remove the 'if' (when unrolling==vf > > you'll just multiply by 1, so no need for the special check). > > > > Done. > > > and lastly, about the following repeating change: > > > > + int *inside_cost_field, *outside_cost_field; > > + > > + /* Take addresses of relevant fields to update in the function. */ > > + if (slp_node) > > + { > > + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); > > + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); > > + } > > + else > > + { > > + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); > > + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); > > + } > > ... > > - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; > > + *outside_cost_field = outer_cost; > > > > inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); > > > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt > > gcc_unreachable (); > > } > > > > - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; > > + *inside_cost_field = inner_cost; > > > > Again, in an attempt to reduce "if (slp_node)" switches in the code, > > I'd replace > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; > > with > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost); > > stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost); > > > > ...and hide the "if (slp_node)" switch there: > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost) > > { > > if (slp_node) > > SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; > > else > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > } > > > > (for your consideration...) > > I moved the choice of the relevant field to a different (inlined) function. > I prefer the alternative I suggested above - it's a bit less code (no need for the call to the selection function) and more importantly the whole field selection machinery is entirely hidden. Please consider changing (or explain why you prefer the other solution). In the meantime, ok to commit this patch and all the other parts as well (the above change can be applied separately later). thanks for addressing the comments, dorit > Thanks, > Ira > > > > > ok otherwise, > > > > thanks, > > dorit > > > > > ChangeLog: > > > > > > * tree-vectorizer.h (_slp_tree): Add new fields for costs and their > > > access functions. > > > (_slp_instance): Likewise. > > > (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): > > > Declare (make extern). > > > * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New. > > > (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf. > > > (vect_get_and_check_slp_defs): Calculate costs. Add arguments. > > > (vect_build_slp_tree): Likewise. > > > (vect_analyze_slp_instance): Initialize cost fields. Update > > > arguments of vect_build_slp_tree. > > > * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take > > > SLP costs into account. > > > (vect_model_simple_cost): Make extern, add SLP parameter and handle > > > SLP. > > > (vect_model_store_cost, vect_model_load_cost): Likewise. > > > (vectorizable_call): Add argument to vect_model_simple_cost. > > > (vectorizable_assignment): Call vect_model_simple_cost only for not > > > pure SLP stmts. > > > (vectorizable_operation): Likewise. > > > (vectorizable_type_demotion): Add argument to > > > vect_model_simple_cost. > > > (vectorizable_type_promotion): Likewise. > > > (vectorizable_store): Call vect_model_simple_cost only for not pure > > > SLP stmts. > > > (vectorizable_load): Likewise. > > > > > > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM] ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <OF56DBF560.643A33F4-ONC2257351.003B4089-C2257351.003BEBE7@LocalDomain>]
* Re: [patch] Loop-aware SLP 4/5 [not found] <OF56DBF560.643A33F4-ONC2257351.003B4089-C2257351.003BEBE7@LocalDomain> @ 2007-09-09 11:54 ` Dorit Nuzman 0 siblings, 0 replies; 7+ messages in thread From: Dorit Nuzman @ 2007-09-09 11:54 UTC (permalink / raw) To: Ira Rosen; +Cc: gcc-patches Ira Rosen/Haifa/IBM wrote on 09/09/2007 13:54:30: > Dorit Nuzman/Haifa/IBM wrote on 09/09/2007 11:00:13: > > > Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40: > > > > > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27: > > > > > > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19: > > > > > > > > and lastly, about the following repeating change: > > > > > > > > + int *inside_cost_field, *outside_cost_field; > > > > + > > > > + /* Take addresses of relevant fields to update in the function. */ > > > > + if (slp_node) > > > > + { > > > > + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); > > > > + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); > > > > + } > > > > + else > > > > + { > > > > + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST > (stmt_info)); > > > > + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST > > (stmt_info)); > > > > + } > > > > ... > > > > - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; > > > > + *outside_cost_field = outer_cost; > > > > > > > > inner_cost += ncopies * (TARG_VEC_LOAD_COST + > > TARG_VEC_STMT_COST); > > > > > > > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt > > > > gcc_unreachable (); > > > > } > > > > > > > > - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; > > > > + *inside_cost_field = inner_cost; > > > > > > > > Again, in an attempt to reduce "if (slp_node)" switches in the code, > > > > I'd replace > > > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > > > STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; > > > > with > > > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost); > > > > stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost); > > > > > > > > ...and hide the "if (slp_node)" switch there: > > > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost) > > > > { > > > > if (slp_node) > > > > SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; > > > > else > > > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > > > } > > > > > > > > (for your consideration...) > > > > > > I moved the choice of the relevant field to a different > (inlined) function. > > > > > > > I prefer the alternative I suggested above - it's a bit less code > > (no need for the call to the selection function) and more > > importantly the whole field selection machinery is entirely hidden. > > Please consider changing (or explain why you prefer the other solution). > > > > In the meantime, ok to commit this patch and all the other parts as > > well (the above change can be applied separately later). > > Here is the patch. > Bootstrapped and tested on x86_64-linux. > OK for mainline? > yes thanks, dorit > Thanks, > Ira > > ChangeLog: > > * tree-vectorizer.h (stmt_vinfo_set_inside_of_loop_cost, > stmt_vinfo_set_outside_of_loop_cost): New functions. > * tree-vect-transform.c (vect_get_cost_fields): Remove. > (vect_model_simple_cost): Call stmt_vinfo_set_inside/outside_of_loop_cost > to set the relevant cost field instead of calling vect_get_cost_fields. > (vect_model_store_cost, vect_model_load_cost): Likewise. > > [attachment "costs.txt" deleted by Dorit Nuzman/Haifa/IBM] ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain>]
* Re: [patch] Loop-aware SLP 4/5 [not found] <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain> @ 2007-09-09 9:05 ` Ira Rosen 2007-09-09 11:32 ` Ira Rosen 1 sibling, 0 replies; 7+ messages in thread From: Ira Rosen @ 2007-09-09 9:05 UTC (permalink / raw) To: Dorit Nuzman; +Cc: gcc-patches Dorit Nuzman/Haifa/IBM wrote on 09/09/2007 11:00:13: > Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40: > > > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27: > > > > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19: > > > > > > > This part adds an adjustment of vectorization cost model to SLP. > > > > > > > > SLP costs are calculated for each SLP computation tree recursively > > > > during its analysis, and added to the overall costs if we decide to SLP. > > > > > > > > Thanks, > > > > Ira > > > > > > > > > > +/* SLP costs are calculated according to SLP instance unrolling > > factor (i.e., > > > + the number of created vector stmts depends on the unrolling > > > factor). However, > > > + the actual number of vector stmts for every SLP node depends on > > > VF which is > > > + set later in vect_analyze_operations(). Hence, SLP costs should > > > be updated. > > > + In this function we assume that the inside costs calculated in > > > + vect_model_xxx_cost are linear in ncopies. */ > > > > > > can you explain why this assumption holds (or when it may be broken, > > > and assert against that)? > > > > > > + /* We assume that costs are linear in ncopies. */ > > > + if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf) > > > + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf > > > + / SLP_INSTANCE_UNROLLING_FACTOR (instance); > > > > > > to me it's less confusing if you remove the 'if' (when unrolling==vf > > > you'll just multiply by 1, so no need for the special check). > > > > > > > Done. > > > > > and lastly, about the following repeating change: > > > > > > + int *inside_cost_field, *outside_cost_field; > > > + > > > + /* Take addresses of relevant fields to update in the function. */ > > > + if (slp_node) > > > + { > > > + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); > > > + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); > > > + } > > > + else > > > + { > > > + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); > > > + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST > (stmt_info)); > > > + } > > > ... > > > - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; > > > + *outside_cost_field = outer_cost; > > > > > > inner_cost += ncopies * (TARG_VEC_LOAD_COST + > TARG_VEC_STMT_COST); > > > > > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt > > > gcc_unreachable (); > > > } > > > > > > - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; > > > + *inside_cost_field = inner_cost; > > > > > > Again, in an attempt to reduce "if (slp_node)" switches in the code, > > > I'd replace > > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > > STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; > > > with > > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost); > > > stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost); > > > > > > ...and hide the "if (slp_node)" switch there: > > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost) > > > { > > > if (slp_node) > > > SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; > > > else > > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > > } > > > > > > (for your consideration...) > > > > I moved the choice of the relevant field to a different (inlined) function. > > > > I prefer the alternative I suggested above - it's a bit less code > (no need for the call to the selection function) and more > importantly the whole field selection machinery is entirely hidden. > Please consider changing (or explain why you prefer the other solution). > > In the meantime, ok to commit this patch and all the other parts as > well (the above change can be applied separately later). O.K. I will reconsider it. Committed as is meantime. Thanks, Ira > > thanks for addressing the comments, > > dorit > > > Thanks, > > Ira > > > > > > > > ok otherwise, > > > > > > thanks, > > > dorit > > > > > > > ChangeLog: > > > > > > > > * tree-vectorizer.h (_slp_tree): Add new fields for costs and their > > > > access functions. > > > > (_slp_instance): Likewise. > > > > (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): > > > > Declare (make extern). > > > > * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New. > > > > (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf. > > > > (vect_get_and_check_slp_defs): Calculate costs. Add arguments. > > > > (vect_build_slp_tree): Likewise. > > > > (vect_analyze_slp_instance): Initialize cost fields. Update > > > > arguments of vect_build_slp_tree. > > > > * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take > > > > SLP costs into account. > > > > (vect_model_simple_cost): Make extern, add SLP parameter and handle > > > > SLP. > > > > (vect_model_store_cost, vect_model_load_cost): Likewise. > > > > (vectorizable_call): Add argument to vect_model_simple_cost. > > > > (vectorizable_assignment): Call vect_model_simple_cost > only for not > > > > pure SLP stmts. > > > > (vectorizable_operation): Likewise. > > > > (vectorizable_type_demotion): Add argument to > > > > vect_model_simple_cost. > > > > (vectorizable_type_promotion): Likewise. > > > > (vectorizable_store): Call vect_model_simple_cost only > for not pure > > > > SLP stmts. > > > > (vectorizable_load): Likewise. > > > > > > > > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] Loop-aware SLP 4/5 [not found] <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain> 2007-09-09 9:05 ` Ira Rosen @ 2007-09-09 11:32 ` Ira Rosen 1 sibling, 0 replies; 7+ messages in thread From: Ira Rosen @ 2007-09-09 11:32 UTC (permalink / raw) To: Dorit Nuzman; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 3038 bytes --] Dorit Nuzman/Haifa/IBM wrote on 09/09/2007 11:00:13: > Ira Rosen/Haifa/IBM wrote on 09/09/2007 10:12:40: > > > Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27: > > > > > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19: > > > > > > and lastly, about the following repeating change: > > > > > > + int *inside_cost_field, *outside_cost_field; > > > + > > > + /* Take addresses of relevant fields to update in the function. */ > > > + if (slp_node) > > > + { > > > + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); > > > + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); > > > + } > > > + else > > > + { > > > + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); > > > + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST > (stmt_info)); > > > + } > > > ... > > > - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; > > > + *outside_cost_field = outer_cost; > > > > > > inner_cost += ncopies * (TARG_VEC_LOAD_COST + > TARG_VEC_STMT_COST); > > > > > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt > > > gcc_unreachable (); > > > } > > > > > > - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; > > > + *inside_cost_field = inner_cost; > > > > > > Again, in an attempt to reduce "if (slp_node)" switches in the code, > > > I'd replace > > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > > STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; > > > with > > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost); > > > stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost); > > > > > > ...and hide the "if (slp_node)" switch there: > > > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost) > > > { > > > if (slp_node) > > > SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; > > > else > > > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > > > } > > > > > > (for your consideration...) > > > > I moved the choice of the relevant field to a different (inlined) function. > > > > I prefer the alternative I suggested above - it's a bit less code > (no need for the call to the selection function) and more > importantly the whole field selection machinery is entirely hidden. > Please consider changing (or explain why you prefer the other solution). > > In the meantime, ok to commit this patch and all the other parts as > well (the above change can be applied separately later). Here is the patch. Bootstrapped and tested on x86_64-linux. OK for mainline? Thanks, Ira ChangeLog: * tree-vectorizer.h (stmt_vinfo_set_inside_of_loop_cost, stmt_vinfo_set_outside_of_loop_cost): New functions. * tree-vect-transform.c (vect_get_cost_fields): Remove. (vect_model_simple_cost): Call stmt_vinfo_set_inside/outside_of_loop_cost to set the relevant cost field instead of calling vect_get_cost_fields. (vect_model_store_cost, vect_model_load_cost): Likewise. (See attached file: costs.txt) [-- Attachment #2: costs.txt --] [-- Type: text/plain, Size: 9050 bytes --] Index: tree-vectorizer.h =================================================================== --- tree-vectorizer.h (revision 128289) +++ tree-vectorizer.h (working copy) @@ -561,6 +561,27 @@ is_loop_header_bb_p (basic_block bb) return false; } +static inline void +stmt_vinfo_set_inside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node, + int cost) +{ + if (slp_node) + SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; + else + STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; +} + +static inline void +stmt_vinfo_set_outside_of_loop_cost (stmt_vec_info stmt_info, slp_tree slp_node, + int cost) +{ + if (slp_node) + SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node) = cost; + else + STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; +} + + /*-----------------------------------------------------------------*/ /* Info on data references alignment. */ /*-----------------------------------------------------------------*/ Index: tree-vect-transform.c =================================================================== --- tree-vect-transform.c (revision 128289) +++ tree-vect-transform.c (working copy) @@ -462,26 +462,6 @@ vect_model_induction_cost (stmt_vec_info } -/* Return addresses of the cost fields of SLP_NODE if it's not NULL, and of - the stmt otherwise. */ - -static inline void -vect_get_cost_fields (stmt_vec_info stmt_info, slp_tree slp_node, - int **inside_cost_field, int **outside_cost_field) -{ - if (slp_node) - { - *inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); - *outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); - } - else - { - *inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); - *outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); - } -} - - /* Function vect_model_simple_cost. Models cost for simple operations, i.e. those that only emit ncopies of a @@ -493,24 +473,24 @@ vect_model_simple_cost (stmt_vec_info st enum vect_def_type *dt, slp_tree slp_node) { int i; - int *inside_cost_field, *outside_cost_field; + int inside_cost = 0, outside_cost = 0; - /* Take addresses of relevant fields to update in the function. */ - vect_get_cost_fields (stmt_info, slp_node, &inside_cost_field, - &outside_cost_field); - - *inside_cost_field = ncopies * TARG_VEC_STMT_COST; + inside_cost = ncopies * TARG_VEC_STMT_COST; /* FORNOW: Assuming maximum 2 args per stmts. */ for (i = 0; i < 2; i++) { if (dt[i] == vect_constant_def || dt[i] == vect_invariant_def) - *outside_cost_field += TARG_SCALAR_TO_VEC_COST; + outside_cost += TARG_SCALAR_TO_VEC_COST; } if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, " - "outside_cost = %d .", *inside_cost_field, *outside_cost_field); + "outside_cost = %d .", inside_cost, outside_cost); + + /* Set the costs either in STMT_INFO or SLP_NODE (if exists). */ + stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost); + stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost); } @@ -541,16 +521,11 @@ void vect_model_store_cost (stmt_vec_info stmt_info, int ncopies, enum vect_def_type dt, slp_tree slp_node) { - int cost = 0; int group_size; - int *inside_cost_field, *outside_cost_field; - - /* Take addresses of relevant fields to update in the function. */ - vect_get_cost_fields (stmt_info, slp_node, &inside_cost_field, - &outside_cost_field); + int inside_cost = 0, outside_cost = 0; if (dt == vect_constant_def || dt == vect_invariant_def) - *outside_cost_field = TARG_SCALAR_TO_VEC_COST; + outside_cost = TARG_SCALAR_TO_VEC_COST; /* Strided access? */ if (DR_GROUP_FIRST_DR (stmt_info)) @@ -564,7 +539,7 @@ vect_model_store_cost (stmt_vec_info stm if (group_size > 1) { /* Uses a high and low interleave operation for each needed permute. */ - cost = ncopies * exact_log2(group_size) * group_size + inside_cost = ncopies * exact_log2(group_size) * group_size * TARG_VEC_STMT_COST; if (vect_print_dump_info (REPORT_DETAILS)) @@ -574,13 +549,15 @@ vect_model_store_cost (stmt_vec_info stm } /* Costs of the stores. */ - cost += ncopies * TARG_VEC_STORE_COST; - - *inside_cost_field = cost; + inside_cost += ncopies * TARG_VEC_STORE_COST; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, " - "outside_cost = %d .", *inside_cost_field, *outside_cost_field); + "outside_cost = %d .", inside_cost, outside_cost); + + /* Set the costs either in STMT_INFO or SLP_NODE (if exists). */ + stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost); + stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost); } @@ -595,16 +572,11 @@ void vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, slp_tree slp_node) { - int inner_cost = 0; int group_size; int alignment_support_cheme; tree first_stmt; struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr; - int *inside_cost_field, *outside_cost_field; - - /* Take addresses of relevant fields to update in the function. */ - vect_get_cost_fields (stmt_info, slp_node, &inside_cost_field, - &outside_cost_field); + int inside_cost = 0, outside_cost = 0; /* Strided accesses? */ first_stmt = DR_GROUP_FIRST_DR (stmt_info); @@ -627,8 +599,8 @@ vect_model_load_cost (stmt_vec_info stmt if (group_size > 1) { /* Uses an even and odd extract operations for each needed permute. */ - inner_cost = ncopies * exact_log2(group_size) * group_size - * TARG_VEC_STMT_COST; + inside_cost = ncopies * exact_log2(group_size) * group_size + * TARG_VEC_STMT_COST; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .", @@ -641,7 +613,7 @@ vect_model_load_cost (stmt_vec_info stmt { case dr_aligned: { - inner_cost += ncopies * TARG_VEC_LOAD_COST; + inside_cost += ncopies * TARG_VEC_LOAD_COST; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_load_cost: aligned."); @@ -651,7 +623,7 @@ vect_model_load_cost (stmt_vec_info stmt case dr_unaligned_supported: { /* Here, we assign an additional cost for the unaligned load. */ - inner_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST; + inside_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_load_cost: unaligned supported by " @@ -661,20 +633,18 @@ vect_model_load_cost (stmt_vec_info stmt } case dr_explicit_realign: { - inner_cost += ncopies * (2*TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); + inside_cost += ncopies * (2*TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); /* FIXME: If the misalignment remains fixed across the iterations of the containing loop, the following cost should be added to the outside costs. */ if (targetm.vectorize.builtin_mask_for_load) - inner_cost += TARG_VEC_STMT_COST; + inside_cost += TARG_VEC_STMT_COST; break; } case dr_explicit_realign_optimized: { - int outer_cost = 0; - if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_load_cost: unaligned software " "pipelined."); @@ -688,14 +658,12 @@ vect_model_load_cost (stmt_vec_info stmt if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1 || slp_node) { - outer_cost = 2*TARG_VEC_STMT_COST; + outside_cost = 2*TARG_VEC_STMT_COST; if (targetm.vectorize.builtin_mask_for_load) - outer_cost += TARG_VEC_STMT_COST; + outside_cost += TARG_VEC_STMT_COST; } - - *outside_cost_field = outer_cost; - inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); + inside_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); break; } @@ -703,13 +671,14 @@ vect_model_load_cost (stmt_vec_info stmt default: gcc_unreachable (); } - - *inside_cost_field = inner_cost; - + if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, " - "outside_cost = %d .", *inside_cost_field, *outside_cost_field); + "outside_cost = %d .", inside_cost, outside_cost); + /* Set the costs either in STMT_INFO or SLP_NODE (if exists). */ + stmt_vinfo_set_inside_of_loop_cost (stmt_info, slp_node, inside_cost); + stmt_vinfo_set_outside_of_loop_cost (stmt_info, slp_node, outside_cost); } ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <OFFE7E55A6.7B3C8820-ONC225733B.006323F7-C225733B.00663708@LocalDomain>]
* Re: [patch] Loop-aware SLP 4/5 [not found] <OFFE7E55A6.7B3C8820-ONC225733B.006323F7-C225733B.00663708@LocalDomain> @ 2007-09-09 7:57 ` Ira Rosen 0 siblings, 0 replies; 7+ messages in thread From: Ira Rosen @ 2007-09-09 7:57 UTC (permalink / raw) To: Dorit Nuzman; +Cc: gcc-patches Dorit Nuzman/Haifa/IBM wrote on 18/08/2007 21:36:27: > Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19: > > > This part adds an adjustment of vectorization cost model to SLP. > > > > SLP costs are calculated for each SLP computation tree recursively > > during its analysis, and added to the overall costs if we decide to SLP. > > > > Thanks, > > Ira > > > > +/* SLP costs are calculated according to SLP instance unrolling factor (i.e., > + the number of created vector stmts depends on the unrolling > factor). However, > + the actual number of vector stmts for every SLP node depends on > VF which is > + set later in vect_analyze_operations(). Hence, SLP costs should > be updated. > + In this function we assume that the inside costs calculated in > + vect_model_xxx_cost are linear in ncopies. */ > > can you explain why this assumption holds (or when it may be broken, > and assert against that)? > > + /* We assume that costs are linear in ncopies. */ > + if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf) > + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf > + / SLP_INSTANCE_UNROLLING_FACTOR (instance); > > to me it's less confusing if you remove the 'if' (when unrolling==vf > you'll just multiply by 1, so no need for the special check). > Done. > and lastly, about the following repeating change: > > + int *inside_cost_field, *outside_cost_field; > + > + /* Take addresses of relevant fields to update in the function. */ > + if (slp_node) > + { > + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); > + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); > + } > + else > + { > + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); > + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); > + } > ... > - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; > + *outside_cost_field = outer_cost; > > inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); > > @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt > gcc_unreachable (); > } > > - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; > + *inside_cost_field = inner_cost; > > Again, in an attempt to reduce "if (slp_node)" switches in the code, > I'd replace > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; > with > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost); > stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost); > > ...and hide the "if (slp_node)" switch there: > stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost) > { > if (slp_node) > SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; > else > STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; > } > > (for your consideration...) I moved the choice of the relevant field to a different (inlined) function. Thanks, Ira > > ok otherwise, > > thanks, > dorit > > > ChangeLog: > > > > * tree-vectorizer.h (_slp_tree): Add new fields for costs and their > > access functions. > > (_slp_instance): Likewise. > > (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): > > Declare (make extern). > > * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New. > > (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf. > > (vect_get_and_check_slp_defs): Calculate costs. Add arguments. > > (vect_build_slp_tree): Likewise. > > (vect_analyze_slp_instance): Initialize cost fields. Update > > arguments of vect_build_slp_tree. > > * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take > > SLP costs into account. > > (vect_model_simple_cost): Make extern, add SLP parameter and handle > > SLP. > > (vect_model_store_cost, vect_model_load_cost): Likewise. > > (vectorizable_call): Add argument to vect_model_simple_cost. > > (vectorizable_assignment): Call vect_model_simple_cost only for not > > pure SLP stmts. > > (vectorizable_operation): Likewise. > > (vectorizable_type_demotion): Add argument to > > vect_model_simple_cost. > > (vectorizable_type_promotion): Likewise. > > (vectorizable_store): Call vect_model_simple_cost only for not pure > > SLP stmts. > > (vectorizable_load): Likewise. > > > > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM] ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <OF21828808.D8D7629C-ONC2257337.00487532-C2257337.0048D04A@LocalDomain>]
* Re: [patch] Loop-aware SLP 4/5 [not found] <OF21828808.D8D7629C-ONC2257337.00487532-C2257337.0048D04A@LocalDomain> @ 2007-08-18 18:40 ` Dorit Nuzman 0 siblings, 0 replies; 7+ messages in thread From: Dorit Nuzman @ 2007-08-18 18:40 UTC (permalink / raw) To: Ira Rosen; +Cc: gcc-patches Ira Rosen/Haifa/IBM wrote on 14/08/2007 16:15:19: > This part adds an adjustment of vectorization cost model to SLP. > > SLP costs are calculated for each SLP computation tree recursively > during its analysis, and added to the overall costs if we decide to SLP. > > Thanks, > Ira > +/* SLP costs are calculated according to SLP instance unrolling factor (i.e., + the number of created vector stmts depends on the unrolling factor). However, + the actual number of vector stmts for every SLP node depends on VF which is + set later in vect_analyze_operations(). Hence, SLP costs should be updated. + In this function we assume that the inside costs calculated in + vect_model_xxx_cost are linear in ncopies. */ can you explain why this assumption holds (or when it may be broken, and assert against that)? + /* We assume that costs are linear in ncopies. */ + if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf) + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf + / SLP_INSTANCE_UNROLLING_FACTOR (instance); to me it's less confusing if you remove the 'if' (when unrolling==vf you'll just multiply by 1, so no need for the special check). and lastly, about the following repeating change: + int *inside_cost_field, *outside_cost_field; + + /* Take addresses of relevant fields to update in the function. */ + if (slp_node) + { + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); + } + else + { + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + } ... - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; + *outside_cost_field = outer_cost; inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt gcc_unreachable (); } - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; + *inside_cost_field = inner_cost; Again, in an attempt to reduce "if (slp_node)" switches in the code, I'd replace STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = cost; with stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost); stmt_vinfo_outside_of_loop_cost (stmt_info, slp_node, cost); ...and hide the "if (slp_node)" switch there: stmt_vinfo_inside_of_loop_cost (stmt_info, slp_node, cost) { if (slp_node) SLP_TREE_INSIDE_OF_LOOP_COST (slp_node) = cost; else STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; } (for your consideration...) ok otherwise, thanks, dorit > ChangeLog: > > * tree-vectorizer.h (_slp_tree): Add new fields for costs and their > access functions. > (_slp_instance): Likewise. > (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): > Declare (make extern). > * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New. > (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf. > (vect_get_and_check_slp_defs): Calculate costs. Add arguments. > (vect_build_slp_tree): Likewise. > (vect_analyze_slp_instance): Initialize cost fields. Update > arguments of vect_build_slp_tree. > * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take > SLP costs into account. > (vect_model_simple_cost): Make extern, add SLP parameter and handle > SLP. > (vect_model_store_cost, vect_model_load_cost): Likewise. > (vectorizable_call): Add argument to vect_model_simple_cost. > (vectorizable_assignment): Call vect_model_simple_cost only for not > pure SLP stmts. > (vectorizable_operation): Likewise. > (vectorizable_type_demotion): Add argument to > vect_model_simple_cost. > (vectorizable_type_promotion): Likewise. > (vectorizable_store): Call vect_model_simple_cost only for not pure > SLP stmts. > (vectorizable_load): Likewise. > > [attachment "slp-part4.txt" deleted by Dorit Nuzman/Haifa/IBM] ^ permalink raw reply [flat|nested] 7+ messages in thread
* [patch] Loop-aware SLP @ 2007-08-14 13:01 Ira Rosen 2007-08-14 13:16 ` [patch] Loop-aware SLP 4/5 Ira Rosen 0 siblings, 1 reply; 7+ messages in thread From: Ira Rosen @ 2007-08-14 13:01 UTC (permalink / raw) To: gcc-patches; +Cc: Dorit Nuzman This patch adds a support of straight-line code vectorization inside loops. This a merge from autovect branch. So, for example, if you have an unrolled loop like this: for (i=0; i<N; i++){ a[4*i] = b[4*i] + x0; a[4*i+1] = b[4*i+1] + x1; a[4*i+2] = b[4*i+2] + x2; a[4*i+3] = b[4*i+3] + x3; } it will get vectorized we follows: for (i=0; i<N; i++){ va[4*i:4*i+3] = vb[4*i:4*i+3] + {x0,x1,x2,x3}; } Note that, as opposed to loop vectorization, the loop count is not divided by the vectorization factor. This implementation also handles cases in which there are not enough scalar stmts to fill up a vector, and unrolls the loop as necessary during the vectorization analysis. So the following loop: for (i=0; i<N; i++){ a[2*i] = b[2*i] + x0; a[2*i+1] = b[2*i+1] + x1; } is vectorized into the following: for (i=0; i<N/2; i++){ va[4*i:4*i+3] = vb[4*i:4*i+3] + {x0,x1,x0,x1}; } SLP opportunities are discovered based on strided access support analysis; we start off from groups of adjacent memory-references (which is something we already have), and analyze their def-use chain in search for sequences of isomorphic operations that can be packed into vector operations. The main differences between this implementation and the approach described in the original SLP paper (http://portal.acm.org/citation.cfm?id=349320) are the following: - We focus on loops, rather then any basic-block in the program. We do that because this way we can reuse all the existing loop-based vectorization infrastructure. Also we expect that most of the worthwhile opportunities are in loops, and in this case you loose if you work without the loop context. - The original SLP algorithm starts off from "seeds" of pairs of adjacent loads/stores, and continues growing them and further merge in the Combination stage of the algorithm (to get VS (Vector Size) sized groups). Our seed is a groups of adjacent (interleaved) stores identified beforehand by the interleaving analysis, so we don't need a combination stage at all. We don't yet look at loads, and not sure we'd need to. - Our approach combines both SLP-based and loop-based vectorization. So, for example, if we have a loop like this: for (i=0; i<N; i++){ a[4*i] = b[4*i] + x0; a[4*i+1] = b[4*i+1] + x1; a[4*i+2] = b[4*i+2] + x2; a[4*i+3] = b[4*i+3] + x3; c[i] = 0; } we vectorize it as follows: for (i=0; i<N/4; i++){ va[16*i:16*i+3] = vb[16*i:16*i+3] + {x0,x1,x2,x3}; va[16*i+4:16*i+7] = vb[16*i+4:16*i+7] + {x0,x1,x2,x3}; va[16*i+8:16*i+11] = vb[16*i+8:16*i+11] + {x0,x1,x2,x3}; va[16*i+12:16*i+15] = vb[16*i+12:16*i+15] + {x0,x1,x2,x3}; vc[4*i:4*i+3] = {0,0,0,0}; } (note that stmts 1,2,3,4 got vectorized using SLP, and stmt 5 got vectorized using regular loop-vectorization). We plan to extend this initial "SLP-like vectorization in loops" in the following directions: - support reduction - support data permutations - support interleaved accesses with gaps - support loops with multiple types More information can be found in our summit paper ( http://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=get&target=GCC2007-Proceedings.pdf). This is a joint work with Dorit. The patch is divided into 5 parts and is relative to this cleanup patch ( http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00806.html). I am going to submit these parts as a reply to this note. Each part can be compiled (but with warnings about unused arguments and variables) and passes vectorization tests. The whole patch passes bootstrap with vectorization enabled and vectorization testcases on x86_64-linux. I am going to run full regtesting and bootstrap on PowerPC. O.K. for mainline once the testing is completed? Thanks, Ira :ADDPATCH SSA (vectorizer): ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] Loop-aware SLP 4/5 2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen @ 2007-08-14 13:16 ` Ira Rosen 0 siblings, 0 replies; 7+ messages in thread From: Ira Rosen @ 2007-08-14 13:16 UTC (permalink / raw) To: Ira Rosen; +Cc: Dorit Nuzman, gcc-patches [-- Attachment #1: Type: text/plain, Size: 1582 bytes --] This part adds an adjustment of vectorization cost model to SLP. SLP costs are calculated for each SLP computation tree recursively during its analysis, and added to the overall costs if we decide to SLP. Thanks, Ira ChangeLog: * tree-vectorizer.h (_slp_tree): Add new fields for costs and their access functions. (_slp_instance): Likewise. (vect_model_simple_cost, vect_model_store_cost, vect_model_load_cost): Declare (make extern). * tree-vect-analyze.c (vect_update_slp_costs_according_to_vf): New. (vect_analyze_operations): Call vect_update_slp_costs_according_to_vf. (vect_get_and_check_slp_defs): Calculate costs. Add arguments. (vect_build_slp_tree): Likewise. (vect_analyze_slp_instance): Initialize cost fields. Update arguments of vect_build_slp_tree. * tree-vect-transform.c (vect_estimate_min_profitable_iters): Take SLP costs into account. (vect_model_simple_cost): Make extern, add SLP parameter and handle SLP. (vect_model_store_cost, vect_model_load_cost): Likewise. (vectorizable_call): Add argument to vect_model_simple_cost. (vectorizable_assignment): Call vect_model_simple_cost only for not pure SLP stmts. (vectorizable_operation): Likewise. (vectorizable_type_demotion): Add argument to vect_model_simple_cost. (vectorizable_type_promotion): Likewise. (vectorizable_store): Call vect_model_simple_cost only for not pure SLP stmts. (vectorizable_load): Likewise. (See attached file: slp-part4.txt) [-- Attachment #2: slp-part4.txt --] [-- Type: text/plain, Size: 19284 bytes --] --- tree-vectorizer.h.patch2 2007-08-14 15:43:29.000000000 +0300 +++ tree-vectorizer.h 2007-08-14 15:44:42.000000000 +0300 @@ -120,6 +120,12 @@ typedef struct _slp_instance { /* The unrolling factor required to vectorized this SLP instance. */ unsigned int unrolling_factor; + /* Vectorization costs associated with SLP instance. */ + struct + { + int outside_of_loop; /* Statements generated outside loop. */ + int inside_of_loop; /* Statements generated inside loop. */ + } cost; } *slp_instance; DEF_VEC_P(slp_instance); @@ -129,12 +135,16 @@ DEF_VEC_ALLOC_P(slp_instance, heap); #define SLP_INSTANCE_TREE(S) (S)->root #define SLP_INSTANCE_GROUP_SIZE(S) (S)->group_size #define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor +#define SLP_INSTANCE_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop +#define SLP_INSTANCE_INSIDE_OF_LOOP_COST(S) (S)->cost.inside_of_loop #define SLP_TREE_LEFT(S) (S)->left #define SLP_TREE_RIGHT(S) (S)->right #define SLP_TREE_SCALAR_STMTS(S) (S)->stmts #define SLP_TREE_VEC_STMTS(S) (S)->vec_stmts #define SLP_TREE_NUMBER_OF_VEC_STMTS(S) (S)->vec_stmts_size +#define SLP_TREE_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop +#define SLP_TREE_INSIDE_OF_LOOP_COST(S) (S)->cost.inside_of_loop /*-----------------------------------------------------------------*/ /* Info on vectorized loops. */ @@ -598,6 +608,11 @@ extern bool vectorizable_live_operation extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *); extern bool vectorizable_induction (tree, block_stmt_iterator *, tree *); extern int vect_estimate_min_profitable_iters (loop_vec_info); +extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *, + slp_tree); +extern void vect_model_store_cost (stmt_vec_info, int, enum vect_def_type, + slp_tree); +extern void vect_model_load_cost (stmt_vec_info, int, slp_tree); /* Driver for transformation stage. */ extern void vect_transform_loop (loop_vec_info); --- tree-vect-analyze.c.patch4 2007-08-14 15:42:42.000000000 +0300 +++ tree-vect-analyze.c 2007-08-14 15:43:26.000000000 +0300 @@ -350,6 +350,33 @@ vect_can_advance_ivs_p (loop_vec_info lo } +/* SLP costs are calculated according to SLP instance unrolling factor (i.e., + the number of created vector stmts depends on the unrolling factor). However, + the actual number of vector stmts for every SLP node depends on VF which is + set later in vect_analyze_operations(). Hence, SLP costs should be updated. + In this function we assume that the inside costs calculated in + vect_model_xxx_cost are linear in ncopies. */ + +static void +vect_update_slp_costs_according_to_vf (loop_vec_info loop_vinfo) +{ + unsigned int i, vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); + VEC (slp_instance, heap) *slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo); + slp_instance instance; + + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "=== vect_update_slp_costs_according_to_vf ==="); + + for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++) + { + /* We assume that costs are linear in ncopies. */ + if (SLP_INSTANCE_UNROLLING_FACTOR (instance) != vf) + SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance) *= vf + / SLP_INSTANCE_UNROLLING_FACTOR (instance); + } +} + + /* Function vect_analyze_operations. Scan the loop stmts and make sure they are all vectorizable. */ @@ -569,6 +596,10 @@ vect_analyze_operations (loop_vec_info l LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor; } + /* After VF is set, SLP costs should be updated since the number of created + vector stmts depends on VF. */ + vect_update_slp_costs_according_to_vf (loop_vinfo); + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, @@ -2248,7 +2279,8 @@ vect_get_and_check_slp_defs (loop_vec_in enum vect_def_type *first_stmt_dt1, tree *first_stmt_def0_type, tree *first_stmt_def1_type, - tree *first_stmt_const_oprnd) + tree *first_stmt_const_oprnd, + int ncopies_for_cost) { tree oprnd; enum operation_type op_type = TREE_OPERAND_LENGTH (rhs); @@ -2291,6 +2323,14 @@ vect_get_and_check_slp_defs (loop_vec_in *first_stmt_def0_type = TREE_TYPE (def); else *first_stmt_const_oprnd = oprnd; + + /* Analyze costs (for the first stmt of the group only). */ + if (op_type) + /* Not memory operation (we don't call this functions for loads). */ + vect_model_simple_cost (stmt_info, ncopies_for_cost, dt, slp_node); + else + /* Store. */ + vect_model_store_cost (stmt_info, ncopies_for_cost, dt[0], slp_node); } else @@ -2369,7 +2409,9 @@ vect_get_and_check_slp_defs (loop_vec_in static bool vect_build_slp_tree (loop_vec_info loop_vinfo, slp_tree *node, - unsigned int group_size, bool *slp_impossible) + unsigned int group_size, bool *slp_impossible, + int *inside_cost, int *outside_cost, + int ncopies_for_cost) { VEC (tree, heap) *def_stmts0 = VEC_alloc (tree, heap, group_size); VEC (tree, heap) *def_stmts1 = VEC_alloc (tree, heap, group_size); @@ -2492,7 +2534,8 @@ vect_build_slp_tree (loop_vec_info loop_ &first_stmt_dt1, &first_stmt_def0_type, &first_stmt_def1_type, - &first_stmt_const_oprnd)) + &first_stmt_const_oprnd, + ncopies_for_cost)) return false; } else @@ -2515,6 +2558,10 @@ vect_build_slp_tree (loop_vec_info loop_ return false; } + + /* Analyze costs (for the first stmt in the group). */ + vect_model_load_cost (vinfo_for_stmt (stmt), + ncopies_for_cost, *node); } else { @@ -2572,11 +2619,16 @@ vect_build_slp_tree (loop_vec_info loop_ &first_stmt_dt1, &first_stmt_def0_type, &first_stmt_def1_type, - &first_stmt_const_oprnd)) + &first_stmt_const_oprnd, + ncopies_for_cost)) return false; } } + /* Add the costs of the node to the overall instance costs. */ + *inside_cost += SLP_TREE_INSIDE_OF_LOOP_COST (*node); + *outside_cost += SLP_TREE_OUTSIDE_OF_LOOP_COST (*node); + /* Strided loads were reached - stop the recursion. */ if (stop_recursion) return true; @@ -2589,8 +2641,11 @@ vect_build_slp_tree (loop_vec_info loop_ SLP_TREE_VEC_STMTS (left_node) = NULL; SLP_TREE_LEFT (left_node) = NULL; SLP_TREE_RIGHT (left_node) = NULL; + SLP_TREE_OUTSIDE_OF_LOOP_COST (left_node) = 0; + SLP_TREE_INSIDE_OF_LOOP_COST (left_node) = 0; if (!vect_build_slp_tree (loop_vinfo, &left_node, group_size, - slp_impossible)) + slp_impossible, inside_cost, outside_cost, + ncopies_for_cost)) return false; SLP_TREE_LEFT (*node) = left_node; @@ -2603,8 +2658,11 @@ vect_build_slp_tree (loop_vec_info loop_ SLP_TREE_VEC_STMTS (right_node) = NULL; SLP_TREE_LEFT (right_node) = NULL; SLP_TREE_RIGHT (right_node) = NULL; + SLP_TREE_OUTSIDE_OF_LOOP_COST (right_node) = 0; + SLP_TREE_INSIDE_OF_LOOP_COST (right_node) = 0; if (!vect_build_slp_tree (loop_vinfo, &right_node, group_size, - slp_impossible)) + slp_impossible, inside_cost, outside_cost, + ncopies_for_cost)) return false; SLP_TREE_RIGHT (*node) = right_node; @@ -2672,6 +2730,7 @@ vect_analyze_slp_instance (loop_vec_info tree vectype, scalar_type, next; unsigned int vectorization_factor = 0, ncopies; bool slp_impossible = false; + int inside_cost = 0, outside_cost = 0, ncopies_for_cost; /* FORNOW: multiple types are not supported. */ scalar_type = TREE_TYPE (DR_REF (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)))); @@ -2701,18 +2760,28 @@ vect_analyze_slp_instance (loop_vec_info SLP_TREE_NUMBER_OF_VEC_STMTS (node) = 0; SLP_TREE_LEFT (node) = NULL; SLP_TREE_RIGHT (node) = NULL; + SLP_TREE_OUTSIDE_OF_LOOP_COST (node) = 0; + SLP_TREE_INSIDE_OF_LOOP_COST (node) = 0; /* Calculate the unrolling factor. */ unrolling_factor = least_common_multiple (nunits, group_size) / group_size; + /* Calculate the number of vector stmts to create based on the unrolling + factor (number of vectors is 1 if NUNITS >= GROUP_SIZE, and is + GROUP_SIZE / NUNITS otherwise. */ + ncopies_for_cost = unrolling_factor * group_size / nunits; + /* Build the tree for the SLP instance. */ - if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible)) + if (vect_build_slp_tree (loop_vinfo, &node, group_size, &slp_impossible, + &inside_cost, &outside_cost, ncopies_for_cost)) { /* Create a new SLP instance. */ new_instance = XNEW (struct _slp_instance); SLP_INSTANCE_TREE (new_instance) = node; SLP_INSTANCE_GROUP_SIZE (new_instance) = group_size; SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor; + SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (new_instance) = outside_cost; + SLP_INSTANCE_INSIDE_OF_LOOP_COST (new_instance) = inside_cost; VEC_safe_push (slp_instance, heap, LOOP_VINFO_SLP_INSTANCES (loop_vinfo), new_instance); if (vect_print_dump_info (REPORT_DETAILS)) --- tree-vect-transform.c.patch4 2007-08-14 15:42:48.000000000 +0300 +++ tree-vect-transform.c 2007-08-14 15:45:55.000000000 +0300 @@ -99,6 +99,8 @@ vect_estimate_min_profitable_iters (loop basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); int nbbs = loop->num_nodes; int byte_misalign; + VEC (slp_instance, heap) *slp_instances; + slp_instance instance; /* Cost model disabled. */ if (!flag_vect_cost_model) @@ -250,6 +252,14 @@ vect_estimate_min_profitable_iters (loop targetm.vectorize.builtin_vectorization_cost (runtime_test)); } + /* Add SLP costs. */ + slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo); + for (i = 0; VEC_iterate (slp_instance, slp_instances, i, instance); i++) + { + vec_outside_cost += SLP_INSTANCE_OUTSIDE_OF_LOOP_COST (instance); + vec_inside_cost += SLP_INSTANCE_INSIDE_OF_LOOP_COST (instance); + } + /* Calculate number of iterations required to make the vector version profitable, relative to the loop bodies only. The following condition must hold true: ((SIC*VF)-VIC)*niters > VOC*VF, where @@ -416,25 +426,37 @@ vect_model_induction_cost (stmt_vec_info single op. Right now, this does not account for multiple insns that could be generated for the single vector op. We will handle that shortly. */ -static void +void vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies, - enum vect_def_type *dt) + enum vect_def_type *dt, slp_tree slp_node) { int i; + int *inside_cost_field, *outside_cost_field; - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST; + /* Take addresses of relevant fields to update in the function. */ + if (slp_node) + { + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); + } + else + { + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + } + + *inside_cost_field = ncopies * TARG_VEC_STMT_COST; /* FORNOW: Assuming maximum 2 args per stmts. */ for (i = 0; i < 2; i++) { if (dt[i] == vect_constant_def || dt[i] == vect_invariant_def) - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) += TARG_SCALAR_TO_VEC_COST; + *outside_cost_field += TARG_SCALAR_TO_VEC_COST; } if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, " - "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info), - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + "outside_cost = %d .", *inside_cost_field, *outside_cost_field); } @@ -461,18 +483,31 @@ vect_cost_strided_group_size (stmt_vec_i Models cost for stores. In the case of strided accesses, one access has the overhead of the strided access attributed to it. */ -static void +void vect_model_store_cost (stmt_vec_info stmt_info, int ncopies, - enum vect_def_type dt) + enum vect_def_type dt, slp_tree slp_node) { int cost = 0; int group_size; + int *inside_cost_field, *outside_cost_field; + + /* Take addresses of relevant fields to update in the function. */ + if (slp_node) + { + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); + } + else + { + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + } if (dt == vect_constant_def || dt == vect_invariant_def) - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = TARG_SCALAR_TO_VEC_COST; + *outside_cost_field = TARG_SCALAR_TO_VEC_COST; /* Strided access? */ - if (STMT_VINFO_STRIDED_ACCESS (stmt_info)) + if (DR_GROUP_FIRST_DR (stmt_info)) group_size = vect_cost_strided_group_size (stmt_info); /* Not a strided access. */ else @@ -495,12 +530,11 @@ vect_model_store_cost (stmt_vec_info stm /* Costs of the stores. */ cost += ncopies * TARG_VEC_STORE_COST; - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost; + *inside_cost_field = cost; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, " - "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info), - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + "outside_cost = %d .", *inside_cost_field, *outside_cost_field); } @@ -511,8 +545,8 @@ vect_model_store_cost (stmt_vec_info stm accesses are supported for loads, we also account for the costs of the access scheme chosen. */ -static void -vect_model_load_cost (stmt_vec_info stmt_info, int ncopies) +void +vect_model_load_cost (stmt_vec_info stmt_info, int ncopies, slp_tree slp_node) { int inner_cost = 0; @@ -520,10 +554,23 @@ vect_model_load_cost (stmt_vec_info stmt int alignment_support_cheme; tree first_stmt; struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr; + int *inside_cost_field, *outside_cost_field; + + /* Take addresses of relevant fields to update in the function. */ + if (slp_node) + { + inside_cost_field = &(SLP_TREE_INSIDE_OF_LOOP_COST (slp_node)); + outside_cost_field = &(SLP_TREE_OUTSIDE_OF_LOOP_COST (slp_node)); + } + else + { + inside_cost_field = &(STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info)); + outside_cost_field = &(STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + } /* Strided accesses? */ first_stmt = DR_GROUP_FIRST_DR (stmt_info); - if (first_stmt) + if (first_stmt && !slp_node) { group_size = vect_cost_strided_group_size (stmt_info); first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt)); @@ -589,14 +636,14 @@ vect_model_load_cost (stmt_vec_info stmt access in the group. Inside the loop, there is a load op and a realignment op. */ - if ((!STMT_VINFO_STRIDED_ACCESS (stmt_info)) || group_size > 1) + if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1 || slp_node) { outer_cost = 2*TARG_VEC_STMT_COST; if (targetm.vectorize.builtin_mask_for_load) outer_cost += TARG_VEC_STMT_COST; } - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost; + *outside_cost_field = outer_cost; inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST); @@ -607,12 +654,12 @@ vect_model_load_cost (stmt_vec_info stmt gcc_unreachable (); } - STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost; + *inside_cost_field = inner_cost; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, " - "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info), - STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info)); + "outside_cost = %d .", *inside_cost_field, *outside_cost_field); + } @@ -2658,7 +2705,7 @@ vectorizable_call (tree stmt, block_stmt STMT_VINFO_TYPE (stmt_info) = call_vec_info_type; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vectorizable_call ==="); - vect_model_simple_cost (stmt_info, ncopies, dt); + vect_model_simple_cost (stmt_info, ncopies, dt, NULL); return true; } @@ -3194,7 +3241,7 @@ vectorizable_assignment (tree stmt, bloc STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vectorizable_assignment ==="); - vect_model_simple_cost (stmt_info, ncopies, dt); + vect_model_simple_cost (stmt_info, ncopies, dt, NULL); return true; } @@ -3461,7 +3508,7 @@ vectorizable_operation (tree stmt, block STMT_VINFO_TYPE (stmt_info) = op_vec_info_type; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vectorizable_operation ==="); - vect_model_simple_cost (stmt_info, ncopies, dt); + vect_model_simple_cost (stmt_info, ncopies, dt, NULL); return true; } @@ -3719,7 +3766,7 @@ vectorizable_type_demotion (tree stmt, b STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vectorizable_demotion ==="); - vect_model_simple_cost (stmt_info, ncopies, dt); + vect_model_simple_cost (stmt_info, ncopies, dt, NULL); return true; } @@ -3882,7 +3929,7 @@ vectorizable_type_promotion (tree stmt, STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vectorizable_promotion ==="); - vect_model_simple_cost (stmt_info, 2*ncopies, dt); + vect_model_simple_cost (stmt_info, 2*ncopies, dt, NULL); return true; } @@ -4212,7 +4259,8 @@ vectorizable_store (tree stmt, block_stm if (!vec_stmt) /* transformation not required. */ { STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; - vect_model_store_cost (stmt_info, ncopies, dt); + if (!STMT_VINFO_PURE_SLP (stmt_info)) + vect_model_store_cost (stmt_info, ncopies, dt, NULL); return true; } @@ -4893,7 +4941,8 @@ vectorizable_load (tree stmt, block_stmt if (!vec_stmt) /* transformation not required. */ { STMT_VINFO_TYPE (stmt_info) = load_vec_info_type; - vect_model_load_cost (stmt_info, ncopies); + if (!STMT_VINFO_PURE_SLP (stmt_info)) + vect_model_load_cost (stmt_info, ncopies, NULL); return true; } ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-09-09 11:00 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <OF988F5AAC.5E4646D5-ONC225734D.002EB0F3-C2257351.00279CD2@LocalDomain> 2007-09-09 8:59 ` [patch] Loop-aware SLP 4/5 Dorit Nuzman [not found] <OF56DBF560.643A33F4-ONC2257351.003B4089-C2257351.003BEBE7@LocalDomain> 2007-09-09 11:54 ` Dorit Nuzman [not found] <OFADE7D20D.3F117D3E-ONC2257351.002B67C0-C2257351.002BF742@LocalDomain> 2007-09-09 9:05 ` Ira Rosen 2007-09-09 11:32 ` Ira Rosen [not found] <OFFE7E55A6.7B3C8820-ONC225733B.006323F7-C225733B.00663708@LocalDomain> 2007-09-09 7:57 ` Ira Rosen [not found] <OF21828808.D8D7629C-ONC2257337.00487532-C2257337.0048D04A@LocalDomain> 2007-08-18 18:40 ` Dorit Nuzman 2007-08-14 13:01 [patch] Loop-aware SLP Ira Rosen 2007-08-14 13:16 ` [patch] Loop-aware SLP 4/5 Ira Rosen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).