* [PATCH 0/2] Loop distribution for memset zero
@ 2010-07-30 20:41 Sebastian Pop
  2010-07-30 20:43 ` [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-30 20:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: matz, Sebastian Pop

Hi,

Michael Matz proposed that it would be a good idea for some CPU2006
benchmarks to add a separate heuristic for the loop distribution pass
for the memset zero pattern, and to enable that at -O3 in order to
exercise the loop distribution code. The following two patches
implement on top of the current loop distribution pass the heuristic,
and enable it at -O3.

The new pass starts by adding to the partitions working list the data
references that are initialized to zero. These partitions are then
code generated in different loops, and the current loop distribution
detects the memset zero pattern.

Regstrapped on amd64-linux.

SPEC2006 passed with -O3 (except the dealII compile fail that I
haven't fixed in my sources yet...).

Bootstrap failed with BOOT_CFLAGS="-g -O3", but then when I tried also
without these two patches it also failed with the same miscompiled
files, so bootstrap of trunk is broken at -O3, see
http://gcc.gnu.org/PR45146

Ok for trunk?

Thanks,
Sebastian

    Add pass_loop_distribute_memset_zero.
    Enable flag_tree_loop_distribute_memset_zero at -O3.

 gcc/common.opt               |    4 ++
 gcc/doc/invoke.texi          |   23 ++++++++++++++-
 gcc/opts.c                   |    1 +
 gcc/passes.c                 |    1 +
 gcc/tree-data-ref.c          |   26 +++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   63 ++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-pass.h              |    1 +
 8 files changed, 119 insertions(+), 1 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread
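For readers more at home in C than in the Fortran used by the documentation hunks in this thread, the transformation described in the cover letter can be sketched roughly as follows. This is only an illustration of the intended source-level effect, not output of the pass: the function names are hypothetical, indices are zero-based here, and the `restrict` qualifiers stand in for the assumption that the two arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Original form: a single loop zeroes a[] and computes b[] from it.  */
void
init_fused (int *restrict a, int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = 0;
      b[i] = a[i] + (int) i;
    }
}

/* Distributed form: the zero store has been split into its own loop,
   which is then expressed as a memset call; the rest of the body stays
   in a second loop.  */
void
init_distributed (int *restrict a, int *restrict b, size_t n)
{
  memset (a, 0, n * sizeof *a);
  for (size_t i = 0; i < n; i++)
    b[i] = a[i] + (int) i;
}

/* Return nonzero when both forms fill their arrays identically for
   N elements (N at most 16 in this sketch).  */
int
forms_agree (size_t n)
{
  int a1[16], b1[16], a2[16], b2[16];

  assert (n <= 16);
  init_fused (a1, b1, n);
  init_distributed (a2, b2, n);
  return memcmp (a1, a2, n * sizeof *a1) == 0
	 && memcmp (b1, b2, n * sizeof *b1) == 0;
}
```

Calling `forms_agree (8)` compares the two forms on small local arrays. If the arrays were allowed to overlap at an offset, the two forms could differ, which is why the sketch states the no-overlap assumption explicitly.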
* [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3.
  2010-07-30 20:41 [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
@ 2010-07-30 20:43 ` Sebastian Pop
  2010-07-30 20:52 ` [PATCH 1/2] Add pass_loop_distribute_memset_zero Sebastian Pop
  2010-07-31 11:35 ` [PATCH 0/2] Loop distribution for memset zero Richard Guenther
  2 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-30 20:43 UTC (permalink / raw)
  To: gcc-patches; +Cc: matz, Sebastian Pop

---
 gcc/doc/invoke.texi |    1 +
 gcc/opts.c          |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2d61382..ca3238c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6944,6 +6944,7 @@ DO I = 1, N
 ENDDO
 @end smallexample
 and the initialization loop is transformed into a call to memset zero.
+This flag is enabled by default at @option{-O3}.
 
 @item -ftree-loop-im
 @opindex ftree-loop-im
diff --git a/gcc/opts.c b/gcc/opts.c
index 07d7a23..16a337c 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -862,6 +862,7 @@ decode_options (unsigned int argc, const char **argv,
 
   /* -O3 optimizations. */
   opt3 = (optimize >= 3);
+  flag_tree_loop_distribute_memset_zero = opt3;
   flag_predictive_commoning = opt3;
   flag_inline_functions = opt3;
   flag_unswitch_loops = opt3;
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 24+ messages in thread
* [PATCH 1/2] Add pass_loop_distribute_memset_zero.
  2010-07-30 20:41 [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
  2010-07-30 20:43 ` [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
@ 2010-07-30 20:52 ` Sebastian Pop
  2010-07-31 11:35 ` [PATCH 0/2] Loop distribution for memset zero Richard Guenther
  2 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-30 20:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: matz, Sebastian Pop

---
 gcc/common.opt               |    4 ++
 gcc/doc/invoke.texi          |   22 ++++++++++++++-
 gcc/passes.c                 |    1 +
 gcc/tree-data-ref.c          |   26 +++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   63 ++++++++++++++++++++++++++++++++++++++++++
 gcc/tree-pass.h              |    1 +
 7 files changed, 117 insertions(+), 1 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 41a9838..77cf58e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1337,6 +1337,10 @@ ftree-loop-distribution
 Common Report Var(flag_tree_loop_distribution) Optimization
 Enable loop distribution on trees
 
+ftree-loop-distribute-memset-zero
+Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
+Enable loop distribution of initialization loops using memset zero
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73051de..2d61382 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
--ftree-phiprop -ftree-loop-distribution @gol
+-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6925,6 +6925,26 @@ DO I = 1, N
 ENDDO
 @end smallexample
 
+@item -ftree-loop-distribute-memset-zero
+Perform loop distribution of initialization loops and code generate
+them with a call to memset zero. For example, the loop
+@smallexample
+DO I = 1, N
+  A(I) = 0
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+is transformed to
+@smallexample
+DO I = 1, N
+  A(I) = 0
+ENDDO
+DO I = 1, N
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+and the initialization loop is transformed into a call to memset zero.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees. This pass moves only invariants that
diff --git a/gcc/passes.c b/gcc/passes.c
index 72e9b5a..7aed7e2 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -897,6 +897,7 @@ init_optimization_passes (void)
           NEXT_PASS (pass_scev_cprop);
           NEXT_PASS (pass_record_bounds);
           NEXT_PASS (pass_check_data_deps);
+          NEXT_PASS (pass_loop_distribute_memset_zero);
           NEXT_PASS (pass_loop_distribution);
           NEXT_PASS (pass_linear_transform);
           NEXT_PASS (pass_graphite_transforms);
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e7aa277..2656350 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5038,6 +5038,32 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
+/* Initialize STMTS with all the statements of LOOP that contain a
+   store to memory of the form "A[i] = 0". */
+
+void
+stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
+{
+  unsigned int i;
+  basic_block bb;
+  gimple_stmt_iterator si;
+  gimple stmt;
+  tree op;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+    for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      if ((stmt = gsi_stmt (si))
+          && gimple_vdef (stmt)
+          && is_gimple_assign (stmt)
+          && gimple_assign_rhs_code (stmt) == INTEGER_CST
+          && (op = gimple_assign_rhs1 (stmt))
+          && (integer_zerop (op) || real_zerop (op)))
+        VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
+
+  free (bbs);
+}
+
 /* For a data reference REF, return the declaration of its base
    address or NULL_TREE if the base is not determined. */
 
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index eff5348..9e18e26 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -564,6 +564,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
 }
 
 void stores_from_loop (struct loop *, VEC (gimple, heap) **);
+void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 099a7fe..abc2ac9 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1238,3 +1238,66 @@ struct gimple_opt_pass pass_loop_distribution =
   TODO_dump_func /* todo_flags_finish */
  }
 };
+
+/* Distribute all the loops containing initializations to zero. */
+
+static unsigned int
+tree_loop_distribute_memset_zero (void)
+{
+  struct loop *loop;
+  loop_iterator li;
+
+  FOR_EACH_LOOP (li, loop, 0)
+    {
+      VEC (gimple, heap) *work_list = VEC_alloc (gimple, heap, 3);
+      int nb_generated_loops = 0;
+
+      /* With the following working list, we're asking distribute_loop
+         to separate from the rest of the loop the stores of the form
+         "A[i] = 0". */
+      stores_zero_from_loop (loop, &work_list);
+
+      if (VEC_length (gimple, work_list) > 0)
+        nb_generated_loops = distribute_loop (loop, work_list);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          if (nb_generated_loops > 1)
+            fprintf (dump_file, "Loop %d distributed: split to %d loops.\n",
+                     loop->num, nb_generated_loops);
+          else
+            fprintf (dump_file, "Loop %d is the same.\n", loop->num);
+        }
+
+      verify_loop_structure ();
+
+      VEC_free (gimple, heap, work_list);
+    }
+
+  return 0;
+}
+
+static bool
+gate_ldist_memset_zero (void)
+{
+  return flag_tree_loop_distribute_memset_zero != 0;
+}
+
+struct gimple_opt_pass pass_loop_distribute_memset_zero =
+{
+ {
+  GIMPLE_PASS,
+  "ldist-memset-zero", /* name */
+  gate_ldist_memset_zero, /* gate */
+  tree_loop_distribute_memset_zero, /* execute */
+  NULL, /* sub */
+  NULL, /* next */
+  0, /* static_pass_number */
+  TV_TREE_LOOP_DISTRIBUTION, /* tv_id */
+  PROP_cfg | PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_dump_func /* todo_flags_finish */
+ }
+};
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index c72d7cf..3384aad 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -375,6 +375,7 @@ extern struct gimple_opt_pass pass_empty_loop;
 extern struct gimple_opt_pass pass_record_bounds;
 extern struct gimple_opt_pass pass_graphite_transforms;
 extern struct gimple_opt_pass pass_if_conversion;
+extern struct gimple_opt_pass pass_loop_distribute_memset_zero;
 extern struct gimple_opt_pass pass_loop_distribution;
 extern struct gimple_opt_pass pass_vectorize;
 extern struct gimple_opt_pass pass_slp_vectorize;
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 24+ messages in thread
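The filter in stores_zero_from_loop from the patch above can be read as a predicate over statements. The following is a toy model of that condition only: every type and name in it (`toy_stmt`, `toy_code`, `toy_is_zero_store`, `toy_stores_zero`) is hypothetical and merely stands in for the GIMPLE accessors the patch uses; it is not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the code of a statement's right-hand side.  */
enum toy_code { TOY_INTEGER_CST, TOY_REAL_CST, TOY_OTHER };

/* Hypothetical, much simplified statement record.  */
struct toy_stmt
{
  bool has_vdef;          /* writes memory (models gimple_vdef) */
  bool is_assign;         /* is an assignment (models is_gimple_assign) */
  enum toy_code rhs_code; /* models gimple_assign_rhs_code */
  double rhs_value;       /* value of the constant rhs, if any */
};

/* Mirror of the condition in the patch: a memory store, by assignment,
   whose rhs code is an integer constant with value zero.  */
bool
toy_is_zero_store (const struct toy_stmt *s)
{
  assert (s);
  return s->has_vdef
	 && s->is_assign
	 && s->rhs_code == TOY_INTEGER_CST
	 && s->rhs_value == 0.0;
}

/* Store in OUT the indices of the N statements in STMTS that qualify,
   and return how many were found.  OUT must have room for N entries.  */
int
toy_stores_zero (const struct toy_stmt *stmts, int n, int *out)
{
  int count = 0;

  for (int i = 0; i < n; i++)
    if (toy_is_zero_store (&stmts[i]))
      out[count++] = i;
  return count;
}
```

In this model a store of a floating-point zero (`TOY_REAL_CST`) is not collected, because the rhs-code test mirrors the patch's comparison against INTEGER_CST. Whether that is intended is not stated in the thread.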
* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-30 20:41 [PATCH 0/2] Loop distribution for memset zero Sebastian Pop
  2010-07-30 20:43 ` [PATCH 2/2] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
  2010-07-30 20:52 ` [PATCH 1/2] Add pass_loop_distribute_memset_zero Sebastian Pop
@ 2010-07-31 11:35 ` Richard Guenther
  2010-07-31 15:28   ` Sebastian Pop
  2 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2010-07-31 11:35 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Fri, Jul 30, 2010 at 10:40 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> Michael Matz proposed that it would be a good idea for some CPU2006
> benchmarks to add a separate heuristic for the loop distribution pass
> for the memset zero pattern, and to enable that at -O3 in order to
> exercise the loop distribution code. The following two patches
> implement on top of the current loop distribution pass the heuristic,
> and enable it at -O3.
>
> The new pass starts by adding to the partitions working list the data
> references that are initialized to zero. These partitions are then
> code generated in different loops, and the current loop distribution
> detects the memset zero pattern.
>
> Regstrapped on amd64-linux.
>
> SPEC2006 passed with -O3 (except the dealII compile fail that I
> haven't fixed in my sources yet...).
>
> Bootstrap failed with BOOT_CFLAGS="-g -O3", but then when I tried also
> without these two patches it also failed with the same miscompiled
> files, so bootstrap of trunk is broken at -O3, see
> http://gcc.gnu.org/PR45146
>
> Ok for trunk?

The new pass should be disabled when loop-distribution is enabled, no?
Thus, I think it would make more sense to fold it into the existing pass
which then runs in different modes depending on the flags used.

The flag should be named more general, like -ftree-loop-distribute-patterns
as we probably want to add memcpy or array sin/cos operations as well
here.

Now the code looks very specific at the moment, with
stores_zero_from_loop. I suppose we can't ask loop distribution
to separate stores as is but then only generate separate code for
the memset and ask it to keep the other pieces together?

Thanks,
Richard.

> Thanks,
> Sebastian
>
>    Add pass_loop_distribute_memset_zero.
>    Enable flag_tree_loop_distribute_memset_zero at -O3.
>
>  gcc/common.opt               |    4 ++
>  gcc/doc/invoke.texi          |   23 ++++++++++++++-
>  gcc/opts.c                   |    1 +
>  gcc/passes.c                 |    1 +
>  gcc/tree-data-ref.c          |   26 +++++++++++++++++
>  gcc/tree-data-ref.h          |    1 +
>  gcc/tree-loop-distribution.c |   63 ++++++++++++++++++++++++++++++++++++++++++
>  gcc/tree-pass.h              |    1 +
>  8 files changed, 119 insertions(+), 1 deletions(-)
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 11:35 ` [PATCH 0/2] Loop distribution for memset zero Richard Guenther
@ 2010-07-31 15:28   ` Sebastian Pop
  2010-07-31 15:28     ` Sebastian Pop
                       ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Sat, Jul 31, 2010 at 05:01, Richard Guenther
<richard.guenther@gmail.com> wrote:
> The new pass should be disabled when loop-distribution is enabled, no?
> Thus, I think it would make more sense to fold it into the existing pass
> which then runs in different modes depending on the flags used.

I will do that and resubmit the patch.

>
> The flag should be named more general, like -ftree-loop-distribute-patterns

What about adding both -ftree-loop-distribute-patterns and
-ftree-loop-distribute-memset-zero, such that we can control what
patterns are detected, and to enable all these patterns together we'll
have -ftree-loop-distribute-patterns.

> as we probably want to add memcpy or array sin/cos operations as well
> here.

I can imagine the memcpy pattern, but could you please provide an
example for sin/cos patterns?

> Now the code looks very specific at the moment, with
> stores_zero_from_loop.  I suppose we can't ask loop distribution
> to separate stores as is but then only generate separate code for
> the memset and ask it to keep the other pieces together?

That is the intent of the worklist: the worklist contains the roots of
the partitions that have to be generated. The partitions are then
augmented with only those dependences that form a cycle (SCC) around
the root. For A[i] = 0, there is no way to aggregate around it
anything else. The rest of the loop is generated in the default
remaining partition.

Sebastian

^ permalink raw reply	[flat|nested] 24+ messages in thread
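The worklist behaviour described in the message above can be sketched on a toy dependence graph. This is only a model of that description, under the assumption that a partition is the strongly connected component containing its root. The structure and function names (`toy_rdg`, `toy_closure`, `toy_build_partitions`) are hypothetical and do not correspond to GCC's RDG data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_STMTS 8

/* Toy dependence graph: dep[u][v] is true when statement v depends
   on statement u.  */
struct toy_rdg
{
  int n;
  bool dep[TOY_MAX_STMTS][TOY_MAX_STMTS];
};

/* Fill REACH so that reach[u][v] is true when v can be reached from u
   along one or more dependence edges (Warshall's algorithm).  */
void
toy_closure (const struct toy_rdg *g,
	     bool reach[TOY_MAX_STMTS][TOY_MAX_STMTS])
{
  for (int u = 0; u < g->n; u++)
    for (int v = 0; v < g->n; v++)
      reach[u][v] = g->dep[u][v];

  for (int k = 0; k < g->n; k++)
    for (int u = 0; u < g->n; u++)
      for (int v = 0; v < g->n; v++)
	if (reach[u][k] && reach[k][v])
	  reach[u][v] = true;
}

/* Seed one partition per root in ROOTS and add to it every statement
   that lies on a dependence cycle with the root.  PART[s] receives the
   partition number of statement s, with 0 meaning the default
   remaining partition.  Return the total number of partitions.  */
int
toy_build_partitions (const struct toy_rdg *g, const int *roots,
		      int n_roots, int *part)
{
  bool reach[TOY_MAX_STMTS][TOY_MAX_STMTS];
  int n_parts = 0;
  bool has_rest = false;

  assert (g->n >= 0 && g->n <= TOY_MAX_STMTS);
  toy_closure (g, reach);

  for (int s = 0; s < g->n; s++)
    part[s] = 0;

  for (int r = 0; r < n_roots; r++)
    {
      int root = roots[r];

      assert (root >= 0 && root < g->n);
      if (part[root] != 0)
	continue;	/* Already absorbed by an earlier root.  */

      part[root] = ++n_parts;
      for (int s = 0; s < g->n; s++)
	if (part[s] == 0 && reach[root][s] && reach[s][root])
	  part[s] = n_parts;
    }

  for (int s = 0; s < g->n; s++)
    if (part[s] == 0)
      has_rest = true;

  return n_parts + (has_rest ? 1 : 0);
}
```

For a loop body modelled as three statements with a zero store as statement 0 and a dependence chain 0 -> 1 -> 2, using statement 0 as the only root leaves it alone in partition 1, because nothing forms a cycle with it. Statements 1 and 2 fall into the default partition 0.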
* Re: [PATCH 0/2] Loop distribution for memset zero
  2010-07-31 15:28   ` Sebastian Pop
@ 2010-07-31 15:28     ` Sebastian Pop
  2010-07-31 15:28     ` [PATCH 1/3] Add pass_loop_distribute_memset_zero Sebastian Pop
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

Hi,

here is the updated patch set. Ok for trunk after regstrap?

Thanks,
Sebastian

    Add pass_loop_distribute_memset_zero.
    Enable flag_tree_loop_distribute_memset_zero at -O3.
    Add -ftree-loop-distribute-patterns.

 gcc/common.opt               |    8 +++++++
 gcc/doc/invoke.texi          |   28 ++++++++++++++++++++++++-
 gcc/opts.c                   |    1 +
 gcc/tree-data-ref.c          |   26 +++++++++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   46 ++++++++++++++++++++++++++++++-----------
 6 files changed, 96 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread
* [PATCH 1/3] Add pass_loop_distribute_memset_zero.
  2010-07-31 15:28   ` Sebastian Pop
  2010-07-31 15:28     ` Sebastian Pop
@ 2010-07-31 15:28     ` Sebastian Pop
  2010-07-31 15:31     ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

---
 gcc/common.opt               |    4 +++
 gcc/doc/invoke.texi          |   22 +++++++++++++++++++-
 gcc/tree-data-ref.c          |   26 ++++++++++++++++++++++++
 gcc/tree-data-ref.h          |    1 +
 gcc/tree-loop-distribution.c |   45 +++++++++++++++++++++++++++++------------
 5 files changed, 84 insertions(+), 14 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 41a9838..77cf58e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1337,6 +1337,10 @@ ftree-loop-distribution
 Common Report Var(flag_tree_loop_distribution) Optimization
 Enable loop distribution on trees
 
+ftree-loop-distribute-memset-zero
+Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
+Enable loop distribution of initialization loops using memset zero
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73051de..2d61382 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
--ftree-phiprop -ftree-loop-distribution @gol
+-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6925,6 +6925,26 @@ DO I = 1, N
 ENDDO
 @end smallexample
 
+@item -ftree-loop-distribute-memset-zero
+Perform loop distribution of initialization loops and code generate
+them with a call to memset zero. For example, the loop
+@smallexample
+DO I = 1, N
+  A(I) = 0
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+is transformed to
+@smallexample
+DO I = 1, N
+  A(I) = 0
+ENDDO
+DO I = 1, N
+  B(I) = A(I) + I
+ENDDO
+@end smallexample
+and the initialization loop is transformed into a call to memset zero.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees. This pass moves only invariants that
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index e7aa277..2656350 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5038,6 +5038,32 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
   free (bbs);
 }
 
+/* Initialize STMTS with all the statements of LOOP that contain a
+   store to memory of the form "A[i] = 0". */
+
+void
+stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts)
+{
+  unsigned int i;
+  basic_block bb;
+  gimple_stmt_iterator si;
+  gimple stmt;
+  tree op;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+    for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      if ((stmt = gsi_stmt (si))
+          && gimple_vdef (stmt)
+          && is_gimple_assign (stmt)
+          && gimple_assign_rhs_code (stmt) == INTEGER_CST
+          && (op = gimple_assign_rhs1 (stmt))
+          && (integer_zerop (op) || real_zerop (op)))
+        VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si));
+
+  free (bbs);
+}
+
 /* For a data reference REF, return the declaration of its base
    address or NULL_TREE if the base is not determined. */
 
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index eff5348..9e18e26 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -564,6 +564,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest)
 }
 
 void stores_from_loop (struct loop *, VEC (gimple, heap) **);
+void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **);
 void remove_similar_memory_refs (VEC (gimple, heap) **);
 bool rdg_defs_used_in_other_loops_p (struct graph *, int);
 bool have_similar_memory_accesses (gimple, gimple);
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 099a7fe..920f744 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1184,18 +1184,36 @@ tree_loop_distribution (void)
     {
       VEC (gimple, heap) *work_list = VEC_alloc (gimple, heap, 3);
 
-      /* With the following working list, we're asking distribute_loop
-         to separate the stores of the loop: when dependences allow,
-         it will end on having one store per loop. */
-      stores_from_loop (loop, &work_list);
-
-      /* A simple heuristic for cache locality is to not split stores
-         to the same array. Without this call, an unrolled loop would
-         be split into as many loops as unroll factor, each loop
-         storing in the same array. */
-      remove_similar_memory_refs (&work_list);
-
-      nb_generated_loops = distribute_loop (loop, work_list);
+      /* If both flag_tree_loop_distribute_memset_zero and
+         flag_tree_loop_distribution are set, then only memset_zero is
+         executed. */
+      if (flag_tree_loop_distribute_memset_zero)
+        {
+          /* With the following working list, we're asking
+             distribute_loop to separate from the rest of the loop the
+             stores of the form "A[i] = 0". */
+          stores_zero_from_loop (loop, &work_list);
+
+          /* If there is nothing to be distributed */
+          if (VEC_length (gimple, work_list) > 0)
+            nb_generated_loops = distribute_loop (loop, work_list);
+        }
+      else if (flag_tree_loop_distribution)
+        {
+          /* With the following working list, we're asking
+             distribute_loop to separate the stores of the loop: when
+             dependences allow, it will end on having one store per
+             loop. */
+          stores_from_loop (loop, &work_list);
+
+          /* A simple heuristic for cache locality is to not split
+             stores to the same array. Without this call, an unrolled
+             loop would be split into as many loops as unroll factor,
+             each loop storing in the same array. */
+          remove_similar_memory_refs (&work_list);
+
+          nb_generated_loops = distribute_loop (loop, work_list);
+        }
 
       if (dump_file && (dump_flags & TDF_DETAILS))
         {
@@ -1217,7 +1235,8 @@ tree_loop_distribution (void)
 static bool
 gate_tree_loop_distribution (void)
 {
-  return flag_tree_loop_distribution != 0;
+  return flag_tree_loop_distribution
+    || flag_tree_loop_distribute_memset_zero;
 }
 
 struct gimple_opt_pass pass_loop_distribution =
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 24+ messages in thread
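The flag precedence that the tree-loop-distribution.c hunk above introduces can be summarised in a small model. This is a sketch only: the enum and function names are hypothetical, and the two boolean parameters stand in for flag_tree_loop_distribute_memset_zero and flag_tree_loop_distribution.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the work list the pass would build.  */
enum toy_ldist_mode
{
  TOY_LDIST_NONE,	  /* pass is gated off */
  TOY_LDIST_ZERO_STORES,  /* work list from stores_zero_from_loop */
  TOY_LDIST_ALL_STORES	  /* work list from stores_from_loop */
};

/* Model of the gate: the pass runs when either flag is set.  */
bool
toy_gate (bool memset_zero, bool distribution)
{
  return distribution || memset_zero;
}

/* Model of the branch order in the hunk: the memset-zero flag is
   tested first, so it wins when both flags are set.  */
enum toy_ldist_mode
toy_select_mode (bool memset_zero, bool distribution)
{
  enum toy_ldist_mode mode = TOY_LDIST_NONE;

  if (memset_zero)
    mode = TOY_LDIST_ZERO_STORES;
  else if (distribution)
    mode = TOY_LDIST_ALL_STORES;

  /* The gate and the mode selection must agree on "off".  */
  assert (toy_gate (memset_zero, distribution) == (mode != TOY_LDIST_NONE));
  return mode;
}
```

With both flags set, `toy_select_mode (true, true)` yields `TOY_LDIST_ZERO_STORES`, which matches the comment in the hunk saying that only memset_zero is executed in that case.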
* [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-07-31 15:28   ` Sebastian Pop
  2010-07-31 15:28     ` Sebastian Pop
  2010-07-31 15:28     ` [PATCH 1/3] Add pass_loop_distribute_memset_zero Sebastian Pop
@ 2010-07-31 15:31     ` Sebastian Pop
  2010-07-31 15:45       ` Sebastian Pop
  2010-07-31 15:34     ` [PATCH 2/3] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop
                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:31 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

---
 gcc/common.opt               |    4 ++++
 gcc/doc/invoke.texi          |    5 +++++
 gcc/tree-loop-distribution.c |    3 ++-
 3 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 77cf58e..a9fcdd2 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
 Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
 Enable loop distribution of initialization loops using memset zero
 
+ftree-loop-distribute-patterns
+Common Report Var(flag_tree_loop_distribute_patterns) Optimization
+Enable loop distribution of patterns code generated with calls to a library
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ca3238c..b9b8b22 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
+-ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
 -ftree-sink -ftree-sra -ftree-switch-conversion @gol
@@ -6946,6 +6947,10 @@ ENDDO
 and the initialization loop is transformed into a call to memset zero.
 This flag is enabled by default at @option{-O3}.
 
+@item -ftree-loop-distribute-patterns
+Perform loop distribution of patterns that can be code generated with
+calls to a library. This enables @option{-ftree-loop-distribute-memset-zero}.
+
 @item -ftree-loop-im
 @opindex ftree-loop-im
 Perform loop invariant motion on trees. This pass moves only invariants that
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 920f744..c677ecb 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
       /* If both flag_tree_loop_distribute_memset_zero and
          flag_tree_loop_distribution are set, then only memset_zero is
          executed. */
-      if (flag_tree_loop_distribute_memset_zero)
+      if (flag_tree_loop_distribute_memset_zero
+          || flag_tree_loop_distribute_patterns)
         {
           /* With the following working list, we're asking
              distribute_loop to separate from the rest of the loop the
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-07-31 15:31     ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop
@ 2010-07-31 15:45       ` Sebastian Pop
  2010-08-01 12:06         ` Richard Guenther
  0 siblings, 1 reply; 24+ messages in thread
From: Sebastian Pop @ 2010-07-31 15:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop

I forgot this part in the patch below:

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index c677ecb..34d6e21 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1237,7 +1237,8 @@ static bool
 gate_tree_loop_distribution (void)
 {
   return flag_tree_loop_distribution
-    || flag_tree_loop_distribute_memset_zero;
+    || flag_tree_loop_distribute_memset_zero
+    || flag_tree_loop_distribute_patterns;
 }
 
 struct gimple_opt_pass pass_loop_distribution =


On Sat, Jul 31, 2010 at 10:27, Sebastian Pop <sebpop@gmail.com> wrote:
> ---
>  gcc/common.opt               |    4 ++++
>  gcc/doc/invoke.texi          |    5 +++++
>  gcc/tree-loop-distribution.c |    3 ++-
>  3 files changed, 11 insertions(+), 1 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 77cf58e..a9fcdd2 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
>  Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
>  Enable loop distribution of initialization loops using memset zero
>
> +ftree-loop-distribute-patterns
> +Common Report Var(flag_tree_loop_distribute_patterns) Optimization
> +Enable loop distribution of patterns code generated with calls to a library
> +
>  ftree-loop-im
>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>  Enable loop invariant motion on trees
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ca3238c..b9b8b22 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
> +-ftree-loop-distribute-patterns @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
>  -ftree-sink -ftree-sra -ftree-switch-conversion @gol
> @@ -6946,6 +6947,10 @@ ENDDO
>  and the initialization loop is transformed into a call to memset zero.
>  This flag is enabled by default at @option{-O3}.
>
> +@item -ftree-loop-distribute-patterns
> +Perform loop distribution of patterns that can be code generated with
> +calls to a library. This enables @option{-ftree-loop-distribute-memset-zero}.
> +
>  @item -ftree-loop-im
>  @opindex ftree-loop-im
>  Perform loop invariant motion on trees. This pass moves only invariants that
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 920f744..c677ecb 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
>        /* If both flag_tree_loop_distribute_memset_zero and
>           flag_tree_loop_distribution are set, then only memset_zero is
>           executed. */
> -      if (flag_tree_loop_distribute_memset_zero)
> +      if (flag_tree_loop_distribute_memset_zero
> +          || flag_tree_loop_distribute_patterns)
>          {
>            /* With the following working list, we're asking
>               distribute_loop to separate from the rest of the loop the
> --
> 1.7.0.4
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns.
  2010-07-31 15:45       ` Sebastian Pop
@ 2010-08-01 12:06         ` Richard Guenther
  2010-08-02  9:21           ` Richard Guenther
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Guenther @ 2010-08-01 12:06 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Sat, Jul 31, 2010 at 5:30 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> I forgot this part in the patch below:

I thought of renaming -ftree-loop-distribute-memset-zero to
-ftree-loop-distribute-patterns, not adding an unused option.

Richard.

> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index c677ecb..34d6e21 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -1237,7 +1237,8 @@ static bool
>  gate_tree_loop_distribution (void)
>  {
>    return flag_tree_loop_distribution
> -    || flag_tree_loop_distribute_memset_zero;
> +    || flag_tree_loop_distribute_memset_zero
> +    || flag_tree_loop_distribute_patterns;
>  }
>
>  struct gimple_opt_pass pass_loop_distribution =
>
>
> On Sat, Jul 31, 2010 at 10:27, Sebastian Pop <sebpop@gmail.com> wrote:
>> ---
>>  gcc/common.opt               |    4 ++++
>>  gcc/doc/invoke.texi          |    5 +++++
>>  gcc/tree-loop-distribution.c |    3 ++-
>>  3 files changed, 11 insertions(+), 1 deletions(-)
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 77cf58e..a9fcdd2 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero
>>  Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization
>>  Enable loop distribution of initialization loops using memset zero
>>
>> +ftree-loop-distribute-patterns
>> +Common Report Var(flag_tree_loop_distribute_patterns) Optimization
>> +Enable loop distribution of patterns code generated with calls to a library
>> +
>>  ftree-loop-im
>>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>>  Enable loop invariant motion on trees
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index ca3238c..b9b8b22 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
>>  -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
>>  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol
>> +-ftree-loop-distribute-patterns @gol
>>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
>>  -ftree-sink -ftree-sra -ftree-switch-conversion @gol
>> @@ -6946,6 +6947,10 @@ ENDDO
>>  and the initialization loop is transformed into a call to memset zero.
>>  This flag is enabled by default at @option{-O3}.
>>
>> +@item -ftree-loop-distribute-patterns
>> +Perform loop distribution of patterns that can be code generated with
>> +calls to a library. This enables @option{-ftree-loop-distribute-memset-zero}.
>> +
>>  @item -ftree-loop-im
>>  @opindex ftree-loop-im
>>  Perform loop invariant motion on trees. This pass moves only invariants that
>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
>> index 920f744..c677ecb 100644
>> --- a/gcc/tree-loop-distribution.c
>> +++ b/gcc/tree-loop-distribution.c
>> @@ -1187,7 +1187,8 @@ tree_loop_distribution (void)
>>        /* If both flag_tree_loop_distribute_memset_zero and
>>           flag_tree_loop_distribution are set, then only memset_zero is
>>           executed. */
>> -      if (flag_tree_loop_distribute_memset_zero)
>> +      if (flag_tree_loop_distribute_memset_zero
>> +          || flag_tree_loop_distribute_patterns)
>>          {
>>            /* With the following working list, we're asking
>>               distribute_loop to separate from the rest of the loop the
>> --
>> 1.7.0.4
>>
>>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns. 2010-08-01 12:06 ` Richard Guenther @ 2010-08-02 9:21 ` Richard Guenther 2010-08-02 15:26 ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop 2010-08-02 16:22 ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop 0 siblings, 2 replies; 24+ messages in thread From: Richard Guenther @ 2010-08-02 9:21 UTC (permalink / raw) To: Sebastian Pop; +Cc: gcc-patches On Sun, Aug 1, 2010 at 2:05 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Sat, Jul 31, 2010 at 5:30 PM, Sebastian Pop <sebpop@gmail.com> wrote: >> I forgot this part in the patch below: > > I thought of renaming -ftree-loop-distribute-memset-zero to > -ftree-loop-distribute-patterns, not adding an unused option. Btw, the patchset is ok with that change, -ftree-loop-distribute-memset removed and -ftree-loop-distribute-patterns enabled at -O3. Thanks, Richard. > Richard. > >> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c >> index c677ecb..34d6e21 100644 >> --- a/gcc/tree-loop-distribution.c >> +++ b/gcc/tree-loop-distribution.c >> @@ -1237,7 +1237,8 @@ static bool >> gate_tree_loop_distribution (void) >> { >> return flag_tree_loop_distribution >> - || flag_tree_loop_distribute_memset_zero; >> + || flag_tree_loop_distribute_memset_zero >> + || flag_tree_loop_distribute_patterns; >> } >> >> struct gimple_opt_pass pass_loop_distribution = >> >> >> On Sat, Jul 31, 2010 at 10:27, Sebastian Pop <sebpop@gmail.com> wrote: >>> --- >>> gcc/common.opt | 4 ++++ >>> gcc/doc/invoke.texi | 5 +++++ >>> gcc/tree-loop-distribution.c | 3 ++- >>> 3 files changed, 11 insertions(+), 1 deletions(-) >>> >>> diff --git a/gcc/common.opt b/gcc/common.opt >>> index 77cf58e..a9fcdd2 100644 >>> --- a/gcc/common.opt >>> +++ b/gcc/common.opt >>> @@ -1341,6 +1341,10 @@ ftree-loop-distribute-memset-zero >>> Common Report Var(flag_tree_loop_distribute_memset_zero) Optimization >>> Enable loop distribution of 
initialization loops using memset zero >>> >>> +ftree-loop-distribute-patterns >>> +Common Report Var(flag_tree_loop_distribute_patterns) Optimization >>> +Enable loop distribution of patterns code generated with calls to a library >>> + >>> ftree-loop-im >>> Common Report Var(flag_tree_loop_im) Init(1) Optimization >>> Enable loop invariant motion on trees >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi >>> index ca3238c..b9b8b22 100644 >>> --- a/gcc/doc/invoke.texi >>> +++ b/gcc/doc/invoke.texi >>> @@ -385,6 +385,7 @@ Objective-C and Objective-C++ Dialects}. >>> -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol >>> -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol >>> -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-memset-zero @gol >>> +-ftree-loop-distribute-patterns @gol >>> -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol >>> -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol >>> -ftree-sink -ftree-sra -ftree-switch-conversion @gol >>> @@ -6946,6 +6947,10 @@ ENDDO >>> and the initialization loop is transformed into a call to memset zero. >>> This flag is enabled by default at @option{-O3}. >>> >>> +@item -ftree-loop-distribute-patterns >>> +Perform loop distribution of patterns that can be code generated with >>> +calls to a library. This enables @option{-ftree-loop-distribute-memset-zero}. >>> + >>> @item -ftree-loop-im >>> @opindex ftree-loop-im >>> Perform loop invariant motion on trees. This pass moves only invariants that >>> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c >>> index 920f744..c677ecb 100644 >>> --- a/gcc/tree-loop-distribution.c >>> +++ b/gcc/tree-loop-distribution.c >>> @@ -1187,7 +1187,8 @@ tree_loop_distribution (void) >>> /* If both flag_tree_loop_distribute_memset_zero and >>> flag_tree_loop_distribution are set, then only memset_zero is >>> executed. 
*/ >>> - if (flag_tree_loop_distribute_memset_zero) >>> + if (flag_tree_loop_distribute_memset_zero >>> + || flag_tree_loop_distribute_patterns) >>> { >>> /* With the following working list, we're asking >>> distribute_loop to separate from the rest of the loop the >>> -- >>> 1.7.0.4 >>> >>> >> > ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3. 2010-08-02 9:21 ` Richard Guenther @ 2010-08-02 15:26 ` Sebastian Pop 2010-08-07 17:49 ` Gerald Pfeifer 2010-10-21 0:30 ` H.J. Lu 2010-08-02 16:22 ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop 1 sibling, 2 replies; 24+ messages in thread From: Sebastian Pop @ 2010-08-02 15:26 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop Hi, Here is the patch that I am testing on amd64-linux. I will commit this patch to trunk after regstrap. Sebastian --- gcc/common.opt | 4 +++ gcc/doc/invoke.texi | 25 ++++++++++++++++++++++- gcc/opts.c | 1 + gcc/tree-data-ref.c | 26 ++++++++++++++++++++++++ gcc/tree-data-ref.h | 1 + gcc/tree-loop-distribution.c | 45 +++++++++++++++++++++++++++++------------ 6 files changed, 88 insertions(+), 14 deletions(-) diff --git a/gcc/common.opt b/gcc/common.opt index 41a9838..8cb09ab 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1337,6 +1337,10 @@ ftree-loop-distribution Common Report Var(flag_tree_loop_distribution) Optimization Enable loop distribution on trees +ftree-loop-distribute-patterns +Common Report Var(flag_tree_loop_distribute_patterns) Optimization +Enable loop distribution for patterns transformed into a library call + ftree-loop-im Common Report Var(flag_tree_loop_im) Init(1) Optimization Enable loop invariant motion on trees diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 73051de..68b64db 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. 
-ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol --ftree-phiprop -ftree-loop-distribution @gol +-ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol -ftree-sink -ftree-sra -ftree-switch-conversion @gol @@ -6925,6 +6925,29 @@ DO I = 1, N ENDDO @end smallexample +@item -ftree-loop-distribute-patterns +Perform loop distribution of patterns that can be code generated with +calls to a library. This flag is enabled by default at @option{-O3}. + +This pass distributes the initialization loops and generates a call to +memset zero. For example, the loop +@smallexample +DO I = 1, N + A(I) = 0 + B(I) = A(I) + I +ENDDO +@end smallexample +is transformed to +@smallexample +DO I = 1, N + A(I) = 0 +ENDDO +DO I = 1, N + B(I) = A(I) + I +ENDDO +@end smallexample +and the initialization loop is transformed into a call to memset zero. + @item -ftree-loop-im @opindex ftree-loop-im Perform loop invariant motion on trees. This pass moves only invariants that diff --git a/gcc/opts.c b/gcc/opts.c index 07d7a23..2579e9f 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -862,6 +862,7 @@ decode_options (unsigned int argc, const char **argv, /* -O3 optimizations. */ opt3 = (optimize >= 3); + flag_tree_loop_distribute_patterns = opt3; flag_predictive_commoning = opt3; flag_inline_functions = opt3; flag_unswitch_loops = opt3; diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index e7aa277..2656350 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -5038,6 +5038,32 @@ stores_from_loop (struct loop *loop, VEC (gimple, heap) **stmts) free (bbs); } +/* Initialize STMTS with all the statements of LOOP that contain a + store to memory of the form "A[i] = 0". 
*/ + +void +stores_zero_from_loop (struct loop *loop, VEC (gimple, heap) **stmts) +{ + unsigned int i; + basic_block bb; + gimple_stmt_iterator si; + gimple stmt; + tree op; + basic_block *bbs = get_loop_body_in_dom_order (loop); + + for (i = 0; i < loop->num_nodes; i++) + for (bb = bbs[i], si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) + if ((stmt = gsi_stmt (si)) + && gimple_vdef (stmt) + && is_gimple_assign (stmt) + && gimple_assign_rhs_code (stmt) == INTEGER_CST + && (op = gimple_assign_rhs1 (stmt)) + && (integer_zerop (op) || real_zerop (op))) + VEC_safe_push (gimple, heap, *stmts, gsi_stmt (si)); + + free (bbs); +} + /* For a data reference REF, return the declaration of its base address or NULL_TREE if the base is not determined. */ diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index eff5348..9e18e26 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -564,6 +564,7 @@ index_in_loop_nest (int var, VEC (loop_p, heap) *loop_nest) } void stores_from_loop (struct loop *, VEC (gimple, heap) **); +void stores_zero_from_loop (struct loop *, VEC (gimple, heap) **); void remove_similar_memory_refs (VEC (gimple, heap) **); bool rdg_defs_used_in_other_loops_p (struct graph *, int); bool have_similar_memory_accesses (gimple, gimple); diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 099a7fe..5905406 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -1184,18 +1184,36 @@ tree_loop_distribution (void) { VEC (gimple, heap) *work_list = VEC_alloc (gimple, heap, 3); - /* With the following working list, we're asking distribute_loop - to separate the stores of the loop: when dependences allow, - it will end on having one store per loop. */ - stores_from_loop (loop, &work_list); - - /* A simple heuristic for cache locality is to not split stores - to the same array. Without this call, an unrolled loop would - be split into as many loops as unroll factor, each loop - storing in the same array. 
*/ - remove_similar_memory_refs (&work_list); - - nb_generated_loops = distribute_loop (loop, work_list); + /* If both flag_tree_loop_distribute_patterns and + flag_tree_loop_distribution are set, then only + distribute_patterns is executed. */ + if (flag_tree_loop_distribute_patterns) + { + /* With the following working list, we're asking + distribute_loop to separate from the rest of the loop the + stores of the form "A[i] = 0". */ + stores_zero_from_loop (loop, &work_list); + + /* Do nothing if there are no patterns to be distributed. */ + if (VEC_length (gimple, work_list) > 0) + nb_generated_loops = distribute_loop (loop, work_list); + } + else if (flag_tree_loop_distribution) + { + /* With the following working list, we're asking + distribute_loop to separate the stores of the loop: when + dependences allow, it will end on having one store per + loop. */ + stores_from_loop (loop, &work_list); + + /* A simple heuristic for cache locality is to not split + stores to the same array. Without this call, an unrolled + loop would be split into as many loops as unroll factor, + each loop storing in the same array. */ + remove_similar_memory_refs (&work_list); + + nb_generated_loops = distribute_loop (loop, work_list); + } if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -1217,7 +1235,8 @@ tree_loop_distribution (void) static bool gate_tree_loop_distribution (void) { - return flag_tree_loop_distribution != 0; + return flag_tree_loop_distribution + || flag_tree_loop_distribute_patterns; } struct gimple_opt_pass pass_loop_distribution = -- 1.7.0.4 ^ permalink raw reply [flat|nested] 24+ messages in thread
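For readers following the invoke.texi hunk in the patch above, the documented transformation can be sketched in C. This is only a hand-written illustration, not output from the compiler: the function names init_fused, init_distributed and forms_agree are hypothetical, the arrays are assumed not to alias, and indexing is 0-based rather than the 1-based Fortran of the documentation example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Original form: one loop zero-initializes a[] and also computes b[].
   Assumes a and b do not overlap.  */
void
init_fused (int *a, int *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; i++)
    {
      a[i] = 0;
      b[i] = a[i] + (int) i;
    }
}

/* Hand-written equivalent of the distributed form: the "a[i] = 0" store
   is split into its own loop, which is then expressed as a memset call;
   the rest of the body stays in the remaining loop.  */
void
init_distributed (int *a, int *b, size_t n)
{
  assert (a != NULL && b != NULL);
  memset (a, 0, n * sizeof *a);
  for (size_t i = 0; i < n; i++)
    b[i] = a[i] + (int) i;
}

/* Returns 1 when both forms leave identical contents in a[] and b[].  */
int
forms_agree (void)
{
  enum { N = 16 };
  int a1[N], b1[N], a2[N], b2[N];

  init_fused (a1, b1, N);
  init_distributed (a2, b2, N);
  return memcmp (a1, a2, sizeof a1) == 0
	 && memcmp (b1, b2, sizeof b1) == 0;
}
```

The split is only valid because the zero store does not depend on anything else computed in the loop body, which is the property the work list of "A[i] = 0" stores relies on.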
* Re: [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3. 2010-08-02 15:26 ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop @ 2010-08-07 17:49 ` Gerald Pfeifer 2010-10-21 0:30 ` H.J. Lu 1 sibling, 0 replies; 24+ messages in thread From: Gerald Pfeifer @ 2010-08-07 17:49 UTC (permalink / raw) To: Sebastian Pop; +Cc: gcc-patches On Mon, 2 Aug 2010, Sebastian Pop wrote: > Here is the patch that I am testing on amd64-linux. I will commit > this patch to trunk after regstrap. I think this would be worthwhile to add to http://gcc.gnu.org/gcc-4.6/changes.html ? Gerald ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3. 2010-08-02 15:26 ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop 2010-08-07 17:49 ` Gerald Pfeifer @ 2010-10-21 0:30 ` H.J. Lu 2010-12-18 17:48 ` H.J. Lu 1 sibling, 1 reply; 24+ messages in thread From: H.J. Lu @ 2010-10-21 0:30 UTC (permalink / raw) To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches On Mon, Aug 2, 2010 at 8:25 AM, Sebastian Pop <sebpop@gmail.com> wrote: > Hi, > > Here is the patch that I am testing on amd64-linux. I will commit > this patch to trunk after regstrap. > > Sebastian > This new option caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46107 -- H.J. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3. 2010-10-21 0:30 ` H.J. Lu @ 2010-12-18 17:48 ` H.J. Lu 0 siblings, 0 replies; 24+ messages in thread From: H.J. Lu @ 2010-12-18 17:48 UTC (permalink / raw) To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches On Wed, Oct 20, 2010 at 4:06 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Mon, Aug 2, 2010 at 8:25 AM, Sebastian Pop <sebpop@gmail.com> wrote: >> Hi, >> >> Here is the patch that I am testing on amd64-linux. I will commit >> this patch to trunk after regstrap. >> >> Sebastian >> > > This new option caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46107 > This also caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47002 -- H.J. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/3] Add -ftree-loop-distribute-patterns. 2010-08-02 9:21 ` Richard Guenther 2010-08-02 15:26 ` [PATCH] Add -ftree-loop-distribute-patterns enabled at -O3 Sebastian Pop @ 2010-08-02 16:22 ` Sebastian Pop 1 sibling, 0 replies; 24+ messages in thread From: Sebastian Pop @ 2010-08-02 16:22 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc-patches On Mon, Aug 2, 2010 at 04:21, Richard Guenther <richard.guenther@gmail.com> wrote: > Btw, the patchset is ok with that change, -ftree-loop-distribute-memset > removed and -ftree-loop-distribute-patterns enabled at -O3. > Committed r162822. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 2/3] Enable flag_tree_loop_distribute_memset_zero at -O3. 2010-07-31 15:28 ` Sebastian Pop ` (2 preceding siblings ...) 2010-07-31 15:31 ` [PATCH 3/3] Add -ftree-loop-distribute-patterns Sebastian Pop @ 2010-07-31 15:34 ` Sebastian Pop 2010-07-31 17:21 ` [PATCH 0/2] Loop distribution for memset zero Michael Matz 2010-08-01 12:19 ` Richard Guenther 5 siblings, 0 replies; 24+ messages in thread From: Sebastian Pop @ 2010-07-31 15:34 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc-patches, Sebastian Pop --- gcc/doc/invoke.texi | 1 + gcc/opts.c | 1 + 2 files changed, 2 insertions(+), 0 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 2d61382..ca3238c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -6944,6 +6944,7 @@ DO I = 1, N ENDDO @end smallexample and the initialization loop is transformed into a call to memset zero. +This flag is enabled by default at @option{-O3}. @item -ftree-loop-im @opindex ftree-loop-im diff --git a/gcc/opts.c b/gcc/opts.c index 07d7a23..16a337c 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -862,6 +862,7 @@ decode_options (unsigned int argc, const char **argv, /* -O3 optimizations. */ opt3 = (optimize >= 3); + flag_tree_loop_distribute_memset_zero = opt3; flag_predictive_commoning = opt3; flag_inline_functions = opt3; flag_unswitch_loops = opt3; -- 1.7.0.4 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-07-31 15:28 ` Sebastian Pop ` (3 preceding siblings ...) 2010-07-31 15:34 ` [PATCH 2/3] Enable flag_tree_loop_distribute_memset_zero at -O3 Sebastian Pop @ 2010-07-31 17:21 ` Michael Matz 2010-07-31 17:36 ` Sebastian Pop 2010-07-31 20:01 ` Joseph S. Myers 2010-08-01 12:19 ` Richard Guenther 5 siblings, 2 replies; 24+ messages in thread From: Michael Matz @ 2010-07-31 17:21 UTC (permalink / raw) To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches Hi, On Sat, 31 Jul 2010, Sebastian Pop wrote: > > as we probably want to add memcpy or array sin/cos operations as well > > here. > > I can imagine the memcpy pattern, but could you please provide an > example for sin/cos patterns? Some math libraries (the one from AMD at least for instance) provide not only vectorized intrinsics for a fixed vector size (e.g. 4 float elements), but also for a generic arbitrarily sized array. For instance: void vrsa_expf(int n, float *src, float *dest); is equivalent to: for (i = 0; i < n; i++) dest[i] = expf (src[i]); Ciao, Michael. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-07-31 17:21 ` [PATCH 0/2] Loop distribution for memset zero Michael Matz @ 2010-07-31 17:36 ` Sebastian Pop 2010-08-01 12:10 ` Richard Guenther 2010-07-31 20:01 ` Joseph S. Myers 1 sibling, 1 reply; 24+ messages in thread From: Sebastian Pop @ 2010-07-31 17:36 UTC (permalink / raw) To: Michael Matz; +Cc: Richard Guenther, gcc-patches On Sat, Jul 31, 2010 at 11:43, Michael Matz <matz@suse.de> wrote: > Some math libraries (the one from AMD at least for instance) provide not > only vectorized intrinsics for a fixed vector size (e.g. 4 float > elements), but also for a generic arbitrarily sized array. > For instance: > > void vrsa_expf(int n, float *src, float *dest); > > is equivalent to: > > for (i = 0; i < n; i++) > dest[i] = expf (src[i]); > I see. I think it would not be difficult to detect this kind of pattern as well. I would need some help on the code generation part, as I don't know how to generate the calls. For the memset and memcpy we have BUILT_IN_MEMSET and BUILT_IN_MEMCPY. How should I generate code for the vrsa_expf function? Thanks, Sebastian ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-07-31 17:36 ` Sebastian Pop @ 2010-08-01 12:10 ` Richard Guenther 0 siblings, 0 replies; 24+ messages in thread From: Richard Guenther @ 2010-08-01 12:10 UTC (permalink / raw) To: Sebastian Pop; +Cc: Michael Matz, gcc-patches On Sat, Jul 31, 2010 at 6:55 PM, Sebastian Pop <sebpop@gmail.com> wrote: > On Sat, Jul 31, 2010 at 11:43, Michael Matz <matz@suse.de> wrote: >> Some math libraries (the one from AMD at least for instance) provide not >> only vectorized intrinsics for a fixed vector size (e.g. 4 float >> elements), but also for a generic arbitrarily sized array. >> For instance: >> >> void vrsa_expf(int n, float *src, float *dest); >> >> is equivalent to: >> >> for (i = 0; i < n; i++) >> dest[i] = expf (src[i]); >> > > I see. I think it would not be difficult to detect this kind of > pattern as well. > > I would need some help on the code generation part, as I don't know > how to generate the calls. For the memset and memcpy we have > BUILT_IN_MEMSET and BUILT_IN_MEMCPY. How should I > generate code for the vrsa_expf function? Conditional on -mveclibabi=acml you simply assume the availability and ABI of acml (you need to build a function type and decl, look at ix86_veclibabi_acml in i386.c). Richard. > Thanks, > Sebastian > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-07-31 17:21 ` [PATCH 0/2] Loop distribution for memset zero Michael Matz 2010-07-31 17:36 ` Sebastian Pop @ 2010-07-31 20:01 ` Joseph S. Myers 2010-08-01 16:19 ` Michael Matz 1 sibling, 1 reply; 24+ messages in thread From: Joseph S. Myers @ 2010-07-31 20:01 UTC (permalink / raw) To: Michael Matz; +Cc: Sebastian Pop, Richard Guenther, gcc-patches On Sat, 31 Jul 2010, Michael Matz wrote: > Some math libraries (the one from AMD at least for instance) provide not > only vectorized intrinsics for a fixed vector size (e.g. 4 float > elements), but also for a generic arbitrarily sized array. > For instance: > > void vrsa_expf(int n, float *src, float *dest); > > is equivalent to: > > for (i = 0; i < n; i++) > dest[i] = expf (src[i]); Exactly equivalent to that C code even with overlaps, or is it really (int n, const float *restrict src, float *restrict dest) with no overlap permitted? -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-07-31 20:01 ` Joseph S. Myers @ 2010-08-01 16:19 ` Michael Matz 0 siblings, 0 replies; 24+ messages in thread From: Michael Matz @ 2010-08-01 16:19 UTC (permalink / raw) To: Joseph S. Myers; +Cc: Sebastian Pop, Richard Guenther, gcc-patches Hi, On Sat, 31 Jul 2010, Joseph S. Myers wrote: > On Sat, 31 Jul 2010, Michael Matz wrote: > > > Some math libraries (the one from AMD at least for instance) provide not > > only vectorized intrinsics for a fixed vector size (e.g. 4 float > > elements), but also for a generic arbitrarily sized array. > > For instance: > > > > void vrsa_expf(int n, float *src, float *dest); > > > > is equivalent to: > > > > for (i = 0; i < n; i++) > > dest[i] = expf (src[i]); > > Exactly equivalent to that C code even with overlaps, or is it really (int > n, const float *restrict src, float *restrict dest) with no overlap > permitted? The current vrsa_expf happens to be implemented with a forward walk through the two arrays without checking for overlap. I think it's safe to assume that nobody thought about this aspect and hence the specification should include that no partial overlap is permitted (it would work with exact overlap). Ciao, Michael. ^ permalink raw reply [flat|nested] 24+ messages in thread
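The overlap question discussed above can be made concrete with a small C sketch. It does not call vrsa_expf or any real library: op is a stand-in for expf (chosen so no math library is needed), map_forward is the scalar loop quoted in the thread, and map_snapshot is a hypothetical model of an implementation that reads its inputs in a batch before storing results, as a vectorized routine might. All names here are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for expf, so the sketch does not depend on libm.  */
static float
op (float x)
{
  return x + 1.0f;
}

/* The scalar loop from the thread: a plain forward walk.  */
void
map_forward (int n, const float *src, float *dest)
{
  for (int i = 0; i < n; i++)
    dest[i] = op (src[i]);
}

/* Hypothetical model of a batched implementation: all inputs are read
   before any output is written.  */
void
map_snapshot (int n, const float *src, float *dest)
{
  enum { MAX_N = 64 };
  float tmp[MAX_N];

  assert (n >= 0 && n <= MAX_N);
  memcpy (tmp, src, (size_t) n * sizeof *tmp);
  for (int i = 0; i < n; i++)
    dest[i] = op (tmp[i]);
}

/* Returns 1 if both versions produce the same array contents when
   dest == src + shift, and 0 otherwise.  */
int
overlap_agrees (int shift)
{
  float a[8], b[8];

  assert (shift >= 0 && shift <= 4);
  for (int i = 0; i < 8; i++)
    a[i] = b[i] = 1.0f;
  map_forward (4, a, a + shift);
  map_snapshot (4, b, b + shift);
  return memcmp (a, b, sizeof a) == 0;
}
```

With shift 0 (exact overlap) the two versions agree; with shift 1 (partial overlap) the forward walk re-reads values it has just written and the results differ. That is why a replacement of the loop by a library call would need either a no-overlap guarantee or an exact match of the loop's semantics.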
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-07-31 15:28 ` Sebastian Pop ` (4 preceding siblings ...) 2010-07-31 17:21 ` [PATCH 0/2] Loop distribution for memset zero Michael Matz @ 2010-08-01 12:19 ` Richard Guenther 2010-08-01 15:21 ` Sebastian Pop 5 siblings, 1 reply; 24+ messages in thread From: Richard Guenther @ 2010-08-01 12:19 UTC (permalink / raw) To: Sebastian Pop; +Cc: gcc-patches, matz On Sat, Jul 31, 2010 at 5:00 PM, Sebastian Pop <sebpop@gmail.com> wrote: > On Sat, Jul 31, 2010 at 05:01, Richard Guenther > <richard.guenther@gmail.com> wrote: >> The new pass should be disabled when loop-distribution is enabled, no? >> Thus, I think it would make more sense to fold it into the existing pass >> which then runs in different modes depending on the flags used. > > I will do that and resubmit the patch. > >> >> The flag should be named more general, like -ftree-loop-distribute-patterns > > What about adding both -ftree-loop-distribute-patterns and > -ftree-loop-distribute-memset-zero, such that we can control what > patterns are detected, and to enable all these patterns together we'll > have -ftree-loop-distribute-patterns. Hm. I don't like inflation of command-line arguments too much, but it might make sense ... >> as we probably want to add memcpy or array sin/cos operations as well >> here. > > I can imagine the memcpy pattern, but could you please provide an > example for sin/cos patterns? See other responses. Can we detect for example daxpy? for (i=0; i<n; ++i) dy[i] = dy[i] + da * dx[i]; ? In principle all the blas routines have one destination, so we'd need to distribute all stores, like with regular loop distribution but then after analyzing the partitions and detecting which ones we recognize we need to fuse the unhandled parts together again. Can we do that from inside the loop distribution machinery? The Fortran frontend already has -fexternal-blas which it uses to implement matmul on large arrays. 
>> Now the code looks very specific at the moment, with >> stores_zero_from_loop. I suppose we can't ask loop distribution >> to separate stores as is but then only generate separate code for >> the memset and ask it to keep the other pieces together? > > That is the intent of the worklist: the worklist contains the roots of > the partitions that have to be generated. The partitions are then > augmented with only those dependences that form a cycle (SCC) around > the root. For A[i] = 0, there is no way to aggregate around it > anything else. The rest of the loop is generated in the default > remaining partition. Ok, so we'd need to do the pattern recognition before distributing the loop? But we need to make sure that the partition only contains side-effects the replacement function has. Consider for (i = 0; i < n ; ++i) { dx[i] = i; dy[i] = dy[i] + da * dx[i]; } will loop distribution include the assignment to dx[i] in the partition if the worklist contains dy[i]? Richard. > > Sebastian > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/2] Loop distribution for memset zero 2010-08-01 12:19 ` Richard Guenther @ 2010-08-01 15:21 ` Sebastian Pop 0 siblings, 0 replies; 24+ messages in thread From: Sebastian Pop @ 2010-08-01 15:21 UTC (permalink / raw) To: Richard Guenther; +Cc: gcc-patches, matz On Sun, Aug 1, 2010 at 07:19, Richard Guenther <richard.guenther@gmail.com> wrote: > Hm. I don't like inflation of command-line arguments too much, but Ok, so for now let's have only -ftree-loop-distribute-patterns. > See other responses. Can we detect for example daxpy? > > for (i=0; i<n; ++i) > dy[i] = dy[i] + da * dx[i]; > Yes we could detect these patterns. > ? In principle all the blas routines have one destination, so we'd need > to distribute all stores, like with regular loop distribution but then > after analyzing the partitions and detecting which ones we recognize > we need to fuse the unhandled parts together again. Can we do > that from inside the loop distribution machinery? > The loop distribution does not fuse back the partitions if, due to other dependences, we have to pull into the same partition more data references than in the analysis part, and so the kernel to be code generated is not exactly the one for which we have the lib function. I think it would not be difficult to implement the fusion of the partitions that do not match the patterns with the default partition. > Ok, so we'd need to do the pattern recognition before distributing > the loop? We have to detect the pattern both before we create the partitions, as this would create the initial root of the partition, and then after the partition is created by aggregation of other data references, we have to run the pattern matching again to make sure the pattern still matches. > But we need to make sure that the partition only contains > side-effects the replacement function has. 
Consider > > for (i = 0; i < n ; ++i) > { > dx[i] = i; > dy[i] = dy[i] + da * dx[i]; > } > > will loop distribution include the assignment to dx[i] in the partition > if the worklist contains dy[i]? I think that in this case you would get two partitions, because there is no SCC in the data dep graph that would require the write to dx to be in the same partition as the write to dy, and the distribution would lead to: for (i = 0; i < n ; ++i) dx[i] = i; for (i = 0; i < n ; ++i) dy[i] = dy[i] + da * dx[i]; Sebastian ^ permalink raw reply [flat|nested] 24+ messages in thread
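The distribution described in the reply above can be written out as a compilable C sketch. This is a hand-written illustration and not what the pass emits: kernel_fused, kernel_distributed and distribution_preserves_result are hypothetical names, and dx and dy are assumed not to alias.

```c
#include <assert.h>
#include <string.h>

/* The loop from the question: both stores in one body.  */
void
kernel_fused (int n, double da, double *dx, double *dy)
{
  for (int i = 0; i < n; i++)
    {
      dx[i] = i;
      dy[i] = dy[i] + da * dx[i];
    }
}

/* The two partitions from the answer: the write to dx is not part of a
   dependence cycle with the write to dy, so it can run in its own loop
   first; the daxpy-like loop then reads the finished dx.  */
void
kernel_distributed (int n, double da, double *dx, double *dy)
{
  for (int i = 0; i < n; i++)
    dx[i] = i;
  for (int i = 0; i < n; i++)
    dy[i] = dy[i] + da * dx[i];
}

/* Returns 1 when both forms leave identical contents in dx[] and dy[].  */
int
distribution_preserves_result (void)
{
  enum { N = 8 };
  double dx1[N], dy1[N], dx2[N], dy2[N];

  for (int i = 0; i < N; i++)
    {
      dx1[i] = dx2[i] = -1.0;
      dy1[i] = dy2[i] = 10.0 * i;
    }
  kernel_fused (N, 2.0, dx1, dy1);
  kernel_distributed (N, 2.0, dx2, dy2);
  return memcmp (dx1, dx2, sizeof dx1) == 0
	 && memcmp (dy1, dy2, sizeof dy1) == 0;
}
```

After the split, the second loop has exactly the daxpy shape asked about earlier in the thread, with no extra side effects left in its partition.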