* [PATCH, PR66851] Handle double reduction in parloops
From: Tom de Vries @ 2015-07-13 14:56 UTC
To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 447 bytes --]

Hi,

this patch fixes PR66851.

In parloops, we manage to parallelize outer loops, but not if the inner
loop contains a reduction. There is an xfail in autopar/outer-4.c for this:
...
/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1
"parloops" { xfail *-*-* } } } */
...

This patch allows outer loops with a reduction in the inner loop to be
parallelized.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0001-Handle-double-reduction-in-parloops.patch --]
[-- Type: text/x-patch, Size: 7050 bytes --]

Handle double reduction in parloops

2015-07-13  Tom de Vries  <tom@codesourcery.com>

        PR tree-optimization/66851
        * tree-parloops.c (reduc_stmt_res): New function.
        (initialize_reductions, add_field_for_reduction)
        (create_phi_for_local_result, create_loads_for_reductions)
        (create_stores_for_reduction, build_new_reduction): Handle case that
        reduc_stmt is a phi.
        (gather_scalar_reductions): Allow double_reduc reductions.

        * gcc.dg/autopar/outer-4.c (parloop): Remove superfluous noinline
        attribute.  Remove xfail on scan for parallelizing outer loop.
        (main): Remove.

        * testsuite/libgomp.c/outer-4.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/outer-4.c | 17 ++++------------
 gcc/tree-parloops.c                    | 37 +++++++++++++++++++++++++---------
 libgomp/testsuite/libgomp.c/outer-4.c  | 36 +++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+), 22 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/outer-4.c

diff --git a/gcc/testsuite/gcc.dg/autopar/outer-4.c b/gcc/testsuite/gcc.dg/autopar/outer-4.c
index 6fd37c5..f435080 100644
--- a/gcc/testsuite/gcc.dg/autopar/outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/outer-4.c
@@ -6,15 +6,13 @@ void abort (void);
 int g_sum=0;
 int x[500][500];
 
-__attribute__((noinline))
-void parloop (int N)
+void
+parloop (int N)
 {
   int i, j;
   int sum;
 
-  /* Double reduction is currently not supported, outer loop is not
-     parallelized.  Inner reduction is detected, inner loop is
-     parallelized.  */
+  /* Double reduction is detected, outer loop is parallelized.  */
   sum = 0;
   for (i = 0; i < N; i++)
     for (j = 0; j < N; j++)
@@ -23,13 +21,6 @@ void parloop (int N)
   g_sum = sum;
 }
 
-int main(void)
-{
-  parloop(500);
-
-  return 0;
-}
-
 
-/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 21ed17b..db7da62 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -560,6 +560,14 @@ take_address_of (tree obj, tree type, edge entry,
   return name;
 }
 
+static tree
+reduc_stmt_res (gimple stmt)
+{
+  return (gimple_code (stmt) == GIMPLE_PHI
+          ? gimple_phi_result (stmt)
+          : gimple_assign_lhs (stmt));
+}
+
 /* Callback for htab_traverse.  Create the initialization statement
    for reduction described in SLOT, and place it at the preheader of
    the loop described in DATA.  */
@@ -586,7 +594,7 @@ initialize_reductions (reduction_info **slot, struct loop *loop)
   c = build_omp_clause (gimple_location (reduc->reduc_stmt),
                         OMP_CLAUSE_REDUCTION);
   OMP_CLAUSE_REDUCTION_CODE (c) = reduc->reduction_code;
-  OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (gimple_assign_lhs (reduc->reduc_stmt));
+  OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (reduc_stmt_res (reduc->reduc_stmt));
 
   init = omp_reduction_init (c, TREE_TYPE (bvar));
   reduc->init = init;
@@ -993,7 +1001,7 @@ add_field_for_reduction (reduction_info **slot, tree type)
 {
 
   struct reduction_info *const red = *slot;
-  tree var = gimple_assign_lhs (red->reduc_stmt);
+  tree var = reduc_stmt_res (red->reduc_stmt);
   tree field = build_decl (gimple_location (red->reduc_stmt), FIELD_DECL,
                            SSA_NAME_IDENTIFIER (var), TREE_TYPE (var));
 
@@ -1053,12 +1061,12 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
     e = EDGE_PRED (store_bb, 1);
   else
     e = EDGE_PRED (store_bb, 0);
-  local_res = copy_ssa_name (gimple_assign_lhs (reduc->reduc_stmt));
+  tree lhs = reduc_stmt_res (reduc->reduc_stmt);
+  local_res = copy_ssa_name (lhs);
   locus = gimple_location (reduc->reduc_stmt);
   new_phi = create_phi_node (local_res, store_bb);
   add_phi_arg (new_phi, reduc->init, e, locus);
-  add_phi_arg (new_phi, gimple_assign_lhs (reduc->reduc_stmt),
-               FALLTHRU_EDGE (loop->latch), locus);
+  add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus);
   reduc->new_phi = new_phi;
 
   return 1;
@@ -1151,7 +1159,7 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data)
   struct reduction_info *const red = *slot;
   gimple stmt;
   gimple_stmt_iterator gsi;
-  tree type = TREE_TYPE (gimple_assign_lhs (red->reduc_stmt));
+  tree type = TREE_TYPE (reduc_stmt_res (red->reduc_stmt));
   tree load_struct;
   tree name;
   tree x;
@@ -1212,7 +1220,7 @@ create_stores_for_reduction (reduction_info **slot, struct clsn_data *clsn_data)
   tree t;
   gimple stmt;
   gimple_stmt_iterator gsi;
-  tree type = TREE_TYPE (gimple_assign_lhs (red->reduc_stmt));
+  tree type = TREE_TYPE (reduc_stmt_res (red->reduc_stmt));
 
   gsi = gsi_last_bb (clsn_data->store_bb);
   t = build3 (COMPONENT_REF, type, clsn_data->store, red->field, NULL_TREE);
@@ -2321,6 +2329,7 @@ build_new_reduction (reduction_info_table_type *reduction_list,
 {
   reduction_info **slot;
   struct reduction_info *new_reduction;
+  enum tree_code reduction_code;
 
   gcc_assert (reduc_stmt);
 
@@ -2332,12 +2341,22 @@ build_new_reduction (reduction_info_table_type *reduction_list,
       fprintf (dump_file, "\n");
     }
 
+  if (gimple_code (reduc_stmt) == GIMPLE_PHI)
+    {
+      tree op1 = PHI_ARG_DEF (reduc_stmt, 0);
+      gimple def1 = SSA_NAME_DEF_STMT (op1);
+      reduction_code = gimple_assign_rhs_code (def1);
+    }
+
+  else
+    reduction_code = gimple_assign_rhs_code (reduc_stmt);
+
   new_reduction = XCNEW (struct reduction_info);
 
   new_reduction->reduc_stmt = reduc_stmt;
   new_reduction->reduc_phi = phi;
   new_reduction->reduc_version = SSA_NAME_VERSION (gimple_phi_result (phi));
-  new_reduction->reduction_code = gimple_assign_rhs_code (reduc_stmt);
+  new_reduction->reduction_code = reduction_code;
   slot = reduction_list->find_slot (new_reduction, INSERT);
   *slot = new_reduction;
 }
@@ -2378,7 +2397,7 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
           gimple reduc_stmt = vect_force_simple_reduction (simple_loop_info,
                                                            phi, true,
                                                            &double_reduc);
-          if (reduc_stmt && !double_reduc)
+          if (reduc_stmt)
             build_new_reduction (reduction_list, reduc_stmt, phi);
         }
     }
diff --git a/libgomp/testsuite/libgomp.c/outer-4.c b/libgomp/testsuite/libgomp.c/outer-4.c
new file mode 100644
index 0000000..f77f634
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/outer-4.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=2" } */
+
+void abort (void);
+
+int g_sum = 1;
+
+int x[500][500];
+
+void __attribute__((noinline,noclone))
+parloop (int N)
+{
+  int i, j;
+  int sum;
+
+  /* Double reduction is detected, outer loop is parallelized.  */
+  sum = 0;
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      sum += x[i][j];
+
+  g_sum = sum;
+}
+
+int
+main (void)
+{
+  x[234][432] = 2;
+
+  parloop (500);
+
+  if (g_sum != 2)
+    abort ();
+
+  return 0;
+}
-- 
1.9.1
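For readers unfamiliar with the term, here is a rough sketch of what parallelizing the outer loop of a double reduction amounts to. It is not the code parloops generates. The names ROWS, COLS, sum_sequential and sum_chunked are made up for illustration, the workers are modelled by a plain loop rather than real threads, and unsigned int is used so that regrouping the additions stays well-defined even on wraparound.

```c
#include <assert.h>

#define ROWS 8
#define COLS 8

/* Sequential double reduction, same shape as parloop () in the test:
   SUM is carried around both the inner and the outer loop.  */
static unsigned int
sum_sequential (unsigned int x[ROWS][COLS])
{
  unsigned int sum = 0;
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      sum += x[i][j];
  return sum;
}

/* Hypothetical model of the outer loop split over NCHUNKS workers.
   Each worker owns a contiguous range of rows and a private partial
   sum that starts at the identity of the reduction (0 for addition).
   The partial sums are combined once all workers are done.  */
static unsigned int
sum_chunked (unsigned int x[ROWS][COLS], int nchunks)
{
  assert (nchunks > 0);

  unsigned int total = 0;
  for (int t = 0; t < nchunks; t++)
    {
      int lo = t * ROWS / nchunks;        /* First row of this chunk.  */
      int hi = (t + 1) * ROWS / nchunks;  /* One past its last row.  */
      unsigned int local = 0;

      for (int i = lo; i < hi; i++)
        for (int j = 0; j < COLS; j++)
          local += x[i][j];

      total += local;                     /* Final combine step.  */
    }
  return total;
}
```

With x zeroed except for x[2][3] = 2 and x[7][7] = 5, both functions return 7 for any positive chunk count, which is the property the new libgomp run test checks on the real transformation.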
* [PING][PATCH, PR66851] Handle double reduction in parloops
From: Tom de Vries @ 2015-07-24 10:55 UTC
To: gcc-patches

On 13/07/15 16:55, Tom de Vries wrote:
> Hi,
>
> this patch fixes PR66851.
>
> In parloops, we manage to parallelize outer loops, but not if the inner
> loop contains a reduction. There is an xfail in autopar/outer-4.c for this:
> ...
> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1
> "parloops" { xfail *-*-* } } } */
> ...
>
> This patch allows outer loops with a reduction in the inner loop to be
> parallelized.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
>

Ping (original posting at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01057.html ).

Thanks,
- Tom

> 0001-Handle-double-reduction-in-parloops.patch
>
>
> Handle double reduction in parloops
>
> 2015-07-13  Tom de Vries  <tom@codesourcery.com>
>
>         PR tree-optimization/66851
>         * tree-parloops.c (reduc_stmt_res): New function.
>         (initialize_reductions, add_field_for_reduction)
>         (create_phi_for_local_result, create_loads_for_reductions)
>         (create_stores_for_reduction, build_new_reduction): Handle case that
>         reduc_stmt is a phi.
>         (gather_scalar_reductions): Allow double_reduc reductions.
>
>         * gcc.dg/autopar/outer-4.c (parloop): Remove superfluous noinline
>         attribute.  Remove xfail on scan for parallelizing outer loop.
>         (main): Remove.
>
>         * testsuite/libgomp.c/outer-4.c: New test.
* Re: [PING][PATCH, PR66851] Handle double reduction in parloops
From: Tom de Vries @ 2015-07-27 23:12 UTC
To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 801 bytes --]

On 24/07/15 12:30, Tom de Vries wrote:
> On 13/07/15 16:55, Tom de Vries wrote:
>> Hi,
>>
>> this patch fixes PR66851.
>>
>> In parloops, we manage to parallelize outer loops, but not if the inner
>> loop contains a reduction. There is an xfail in autopar/outer-4.c for
>> this:
>> ...
>> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1
>> "parloops" { xfail *-*-* } } } */
>> ...
>>
>> This patch allows outer loops with a reduction in the inner loop to be
>> parallelized.
>>

Updated patch checks that we actually have an inner reduction that we
can parallelize. So, uns-outer-4.c with unsigned int reduction will be
parallelized, while outer-4.c with signed int reduction will not be
parallelized.

Bootstrapped on x86_64, reg-test in progress.

OK for trunk?

Thanks,
- Tom

[-- Attachment #2: 0001-Handle-double-reduction-in-parloops.patch --]
[-- Type: text/x-patch, Size: 8277 bytes --]

Handle double reduction in parloops

2015-07-27  Tom de Vries  <tom@codesourcery.com>

        * tree-parloops.c (reduc_stmt_res): New function.
        (initialize_reductions, add_field_for_reduction)
        (create_phi_for_local_result, create_loads_for_reductions)
        (create_stores_for_reduction, build_new_reduction): Handle case that
        reduc_stmt is a phi.
        (gather_scalar_reductions): Allow double_reduc reductions.

        * gcc.dg/autopar/uns-outer-4.c: Remove xfail on scan for parallelizing
        outer loop.

        * testsuite/libgomp.c/uns-outer-4.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/uns-outer-4.c |  6 +--
 gcc/tree-parloops.c                        | 73 ++++++++++++++++++++++++++----
 libgomp/testsuite/libgomp.c/uns-outer-4.c  | 36 +++++++++++++++
 3 files changed, 102 insertions(+), 13 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/uns-outer-4.c

diff --git a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
index 30ead25..5eb67ea 100644
--- a/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
+++ b/gcc/testsuite/gcc.dg/autopar/uns-outer-4.c
@@ -12,9 +12,7 @@ parloop (int N)
   int i, j;
   unsigned int sum;
 
-  /* Double reduction is currently not supported, outer loop is not
-     parallelized.  Inner reduction is detected, inner loop is
-     parallelized.  */
+  /* Double reduction is detected, outer loop is parallelized.  */
   sum = 0;
   for (i = 0; i < N; i++)
     for (j = 0; j < N; j++)
@@ -23,5 +21,5 @@ parloop (int N)
   g_sum = sum;
 }
 
-/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index daf23f2..b06265c 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -549,6 +549,14 @@ take_address_of (tree obj, tree type, edge entry,
   return name;
 }
 
+static tree
+reduc_stmt_res (gimple stmt)
+{
+  return (gimple_code (stmt) == GIMPLE_PHI
+          ? gimple_phi_result (stmt)
+          : gimple_assign_lhs (stmt));
+}
+
 /* Callback for htab_traverse.  Create the initialization statement
    for reduction described in SLOT, and place it at the preheader of
    the loop described in DATA.  */
@@ -575,7 +583,7 @@ initialize_reductions (reduction_info **slot, struct loop *loop)
   c = build_omp_clause (gimple_location (reduc->reduc_stmt),
                         OMP_CLAUSE_REDUCTION);
   OMP_CLAUSE_REDUCTION_CODE (c) = reduc->reduction_code;
-  OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (gimple_assign_lhs (reduc->reduc_stmt));
+  OMP_CLAUSE_DECL (c) = SSA_NAME_VAR (reduc_stmt_res (reduc->reduc_stmt));
 
   init = omp_reduction_init (c, TREE_TYPE (bvar));
   reduc->init = init;
@@ -982,7 +990,7 @@ add_field_for_reduction (reduction_info **slot, tree type)
 {
 
   struct reduction_info *const red = *slot;
-  tree var = gimple_assign_lhs (red->reduc_stmt);
+  tree var = reduc_stmt_res (red->reduc_stmt);
   tree field = build_decl (gimple_location (red->reduc_stmt), FIELD_DECL,
                            SSA_NAME_IDENTIFIER (var), TREE_TYPE (var));
 
@@ -1042,12 +1050,12 @@ create_phi_for_local_result (reduction_info **slot, struct loop *loop)
     e = EDGE_PRED (store_bb, 1);
   else
     e = EDGE_PRED (store_bb, 0);
-  local_res = copy_ssa_name (gimple_assign_lhs (reduc->reduc_stmt));
+  tree lhs = reduc_stmt_res (reduc->reduc_stmt);
+  local_res = copy_ssa_name (lhs);
   locus = gimple_location (reduc->reduc_stmt);
   new_phi = create_phi_node (local_res, store_bb);
   add_phi_arg (new_phi, reduc->init, e, locus);
-  add_phi_arg (new_phi, gimple_assign_lhs (reduc->reduc_stmt),
-               FALLTHRU_EDGE (loop->latch), locus);
+  add_phi_arg (new_phi, lhs, FALLTHRU_EDGE (loop->latch), locus);
   reduc->new_phi = new_phi;
 
   return 1;
@@ -1140,7 +1148,7 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data)
   struct reduction_info *const red = *slot;
   gimple stmt;
   gimple_stmt_iterator gsi;
-  tree type = TREE_TYPE (gimple_assign_lhs (red->reduc_stmt));
+  tree type = TREE_TYPE (reduc_stmt_res (red->reduc_stmt));
   tree load_struct;
   tree name;
   tree x;
@@ -1205,7 +1213,7 @@ create_stores_for_reduction (reduction_info **slot, struct clsn_data *clsn_data)
   tree t;
   gimple stmt;
   gimple_stmt_iterator gsi;
-  tree type = TREE_TYPE (gimple_assign_lhs (red->reduc_stmt));
+  tree type = TREE_TYPE (reduc_stmt_res (red->reduc_stmt));
 
   gsi = gsi_last_bb (clsn_data->store_bb);
   t = build3 (COMPONENT_REF, type, clsn_data->store, red->field, NULL_TREE);
@@ -2330,6 +2338,7 @@ build_new_reduction (reduction_info_table_type *reduction_list,
 {
   reduction_info **slot;
   struct reduction_info *new_reduction;
+  enum tree_code reduction_code;
 
   gcc_assert (reduc_stmt);
 
@@ -2341,12 +2350,22 @@ build_new_reduction (reduction_info_table_type *reduction_list,
       fprintf (dump_file, "\n");
     }
 
+  if (gimple_code (reduc_stmt) == GIMPLE_PHI)
+    {
+      tree op1 = PHI_ARG_DEF (reduc_stmt, 0);
+      gimple def1 = SSA_NAME_DEF_STMT (op1);
+      reduction_code = gimple_assign_rhs_code (def1);
+    }
+
+  else
+    reduction_code = gimple_assign_rhs_code (reduc_stmt);
+
   new_reduction = XCNEW (struct reduction_info);
 
   new_reduction->reduc_stmt = reduc_stmt;
   new_reduction->reduc_phi = phi;
   new_reduction->reduc_version = SSA_NAME_VERSION (gimple_phi_result (phi));
-  new_reduction->reduction_code = gimple_assign_rhs_code (reduc_stmt);
+  new_reduction->reduction_code = reduction_code;
   slot = reduction_list->find_slot (new_reduction, INSERT);
   *slot = new_reduction;
 }
@@ -2368,6 +2387,8 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
 {
   gphi_iterator gsi;
   loop_vec_info simple_loop_info;
+  loop_vec_info simple_inner_loop_info = NULL;
+  bool allow_double_reduc = true;
 
   simple_loop_info = vect_analyze_loop_form (loop);
   if (simple_loop_info == NULL)
@@ -2389,12 +2410,46 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list
       gimple reduc_stmt = vect_force_simple_reduction (simple_loop_info,
                                                        phi, true,
                                                        &double_reduc, true);
-      if (!reduc_stmt || double_reduc)
+      if (!reduc_stmt)
         continue;
 
+      if (double_reduc)
+        {
+          if (!allow_double_reduc
+              || loop->inner->inner != NULL)
+            continue;
+
+          if (!simple_inner_loop_info)
+            {
+              simple_inner_loop_info = vect_analyze_loop_form (loop->inner);
+              if (!simple_inner_loop_info)
+                {
+                  allow_double_reduc = false;
+                  continue;
+                }
+            }
+
+          use_operand_p use_p;
+          gimple inner_stmt;
+          bool single_use_p = single_imm_use (res, &use_p, &inner_stmt);
+          gcc_assert (single_use_p);
+          gphi *inner_phi = as_a <gphi *> (inner_stmt);
+          if (simple_iv (loop->inner, loop->inner, PHI_RESULT (inner_phi),
+                         &iv, true))
+            continue;
+
+          gimple inner_reduc_stmt
+            = vect_force_simple_reduction (simple_inner_loop_info, inner_phi,
+                                           true, &double_reduc, true);
+          gcc_assert (!double_reduc);
+          if (inner_reduc_stmt == NULL)
+            continue;
+        }
+
       build_new_reduction (reduction_list, reduc_stmt, phi);
     }
   destroy_loop_vec_info (simple_loop_info, true);
+  destroy_loop_vec_info (simple_inner_loop_info, true);
 
   /* As gimple_uid is used by the vectorizer in between vect_analyze_loop_form
      and destroy_loop_vec_info, we can set gimple_uid of reduc_phi stmts
diff --git a/libgomp/testsuite/libgomp.c/uns-outer-4.c b/libgomp/testsuite/libgomp.c/uns-outer-4.c
new file mode 100644
index 0000000..cd646a5
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/uns-outer-4.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-additional-options "-ftree-parallelize-loops=2" } */
+
+void abort (void);
+
+unsigned int g_sum = 1;
+
+unsigned int x[500][500];
+
+void __attribute__((noinline,noclone))
+parloop (int N)
+{
+  int i, j;
+  unsigned int sum;
+
+  /* Double reduction is detected, outer loop is parallelized.  */
+  sum = 0;
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      sum += x[i][j];
+
+  g_sum = sum;
+}
+
+int
+main (void)
+{
+  x[234][432] = 2;
+
+  parloop (500);
+
+  if (g_sum != 2)
+    abort ();
+
+  return 0;
+}
-- 
1.9.1
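The message does not say why the unsigned reduction is parallelized while the signed one is left alone. One plausible reading, which is an assumption here and not something stated in the thread, is that splitting a sum into per-thread partial sums regroups the additions: for unsigned int that is always safe because the arithmetic wraps, while for signed int a regrouped partial sum could overflow even when the sequential order does not. The sketch below only demonstrates the unsigned half of that argument; sum_in_order and sum_regrouped are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Sum N values left to right, the order the sequential loop uses.  */
static unsigned int
sum_in_order (const unsigned int *v, int n)
{
  unsigned int sum = 0;
  for (int i = 0; i < n; i++)
    sum += v[i];
  return sum;
}

/* Sum the same values as two partial sums split at SPLIT, the way two
   workers would, then combine them.  Unsigned arithmetic wraps modulo
   UINT_MAX + 1, so the result cannot depend on the grouping.

   The signed analogue is not safe in general.  For { -1, INT_MAX, 1 }
   the left-to-right sum never overflows, but the partial sum
   INT_MAX + 1 of the last two elements does, which is undefined
   behavior in C.  */
static unsigned int
sum_regrouped (const unsigned int *v, int n, int split)
{
  assert (split >= 0 && split <= n);

  unsigned int first = sum_in_order (v, split);
  unsigned int second = sum_in_order (v + split, n - split);
  return first + second;
}
```

For the values { UINT_MAX, 2, 5 } the in-order sum wraps to 6, and both possible two-way splits give the same 6.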
* Re: [PING][PATCH, PR66851] Handle double reduction in parloops
From: Richard Biener @ 2015-07-28 8:01 UTC
To: Tom de Vries; +Cc: gcc-patches

On Tue, Jul 28, 2015 at 12:32 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 24/07/15 12:30, Tom de Vries wrote:
>>
>> On 13/07/15 16:55, Tom de Vries wrote:
>>>
>>> Hi,
>>>
>>> this patch fixes PR66851.
>>>
>>> In parloops, we manage to parallelize outer loops, but not if the inner
>>> loop contains a reduction. There is an xfail in autopar/outer-4.c for
>>> this:
>>> ...
>>> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1
>>> "parloops" { xfail *-*-* } } } */
>>> ...
>>>
>>> This patch allows outer loops with a reduction in the inner loop to be
>>> parallelized.
>>>
>
> Updated patch checks that we actually have an inner reduction that we can
> parallelize. So, uns-outer-4.c with unsigned int reduction will be
> parallelized, while outer-4.c with signed int reduction will not be
> parallelized.
>
> Bootstrapped on x86_64, reg-test in progress.
>
>
> OK for trunk?

Ok.

Richard.

> Thanks,
> - Tom
>
Thread overview: 4+ messages
2015-07-13 14:56 [PATCH, PR66851] Handle double reduction in parloops Tom de Vries
2015-07-24 10:55 ` [PING][PATCH, " Tom de Vries
2015-07-27 23:12   ` Tom de Vries
2015-07-28  8:01     ` Richard Biener