* [PATCH GCC][04/13]Sort statements in topological order for loop distribution @ 2017-06-12 17:03 Bin Cheng 2017-06-13 10:59 ` Richard Biener 0 siblings, 1 reply; 6+ messages in thread From: Bin Cheng @ 2017-06-12 17:03 UTC (permalink / raw) To: gcc-patches; +Cc: nd [-- Attachment #1: Type: text/plain, Size: 684 bytes --] Hi, During the work I ran into a latent bug for distributing. For the moment we sort statements in dominance order, but that's not enough because basic blocks may be sorted in reverse order of execution flow. This results in wrong data dependence direction later. This patch fixes the issue by sorting in topological order. Bootstrap and test on x86_64 and AArch64. Is it OK? Thanks, bin 2017-06-07 Bin Cheng <bin.cheng@arm.com> * tree-loop-distribution.c (bb_top_order_index): New. (bb_top_order_index_size, bb_top_order_cmp): New. (stmts_from_loop): Use topological order. (pass_loop_distribution::execute): Compute topological order for. basic blocks. [-- Attachment #2: 0004-sort-stmts-in-top-order-20170607.txt --] [-- Type: text/plain, Size: 3520 bytes --] From 4bb233239e080eca956b3db7836cdf64da486dbf Mon Sep 17 00:00:00 2001 From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com> Date: Wed, 7 Jun 2017 13:47:52 +0100 Subject: [PATCH 04/14] sort-stmts-in-top-order-20170607.txt --- gcc/tree-loop-distribution.c | 58 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 52 insertions(+), 6 deletions(-) diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index b0b9d66..a32253c 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -373,16 +373,39 @@ create_rdg_vertices (struct graph *rdg, vec<gimple *> stmts, loop_p loop, return true; } -/* Initialize STMTS with all the statements of LOOP. The order in - which we discover statements is important as - generate_loops_for_partition is using the same traversal for - identifying statements in loop copies. */ +/* Array mapping basic block's index to its topological order. */ +static int *bb_top_order_index; +/* And size of the array. */ +static int bb_top_order_index_size; + +/* If X has a smaller topological sort number than Y, returns -1; + if greater, returns 1. */ + +static int +bb_top_order_cmp (const void *x, const void *y) +{ + basic_block bb1 = *(const basic_block *) x; + basic_block bb2 = *(const basic_block *) y; + + gcc_assert (bb1->index < bb_top_order_index_size + && bb2->index < bb_top_order_index_size); + gcc_assert (bb1 == bb2 + || bb_top_order_index[bb1->index] + != bb_top_order_index[bb2->index]); + + return (bb_top_order_index[bb1->index] - bb_top_order_index[bb2->index]); +} + +/* Initialize STMTS with all the statements of LOOP. We use topological + order to discover all statements. The order is important because + generate_loops_for_partition is using the same traversal for identifying + statements in loop copies. */ static void stmts_from_loop (struct loop *loop, vec<gimple *> *stmts) { unsigned int i; - basic_block *bbs = get_loop_body_in_dom_order (loop); + basic_block *bbs = get_loop_body_in_custom_order (loop, bb_top_order_cmp); for (i = 0; i < loop->num_nodes; i++) { @@ -1764,6 +1787,22 @@ pass_loop_distribution::execute (function *fun) if (number_of_loops (fun) <= 1) return 0; + /* Compute topological order for basic blocks. Topological order is + needed because data dependence is computed for data references in + lexicographical order. */ + if (bb_top_order_index == NULL) + { + int *rpo = XNEWVEC (int, last_basic_block_for_fn (cfun)); + + bb_top_order_index = XNEWVEC (int, last_basic_block_for_fn (cfun)); + bb_top_order_index_size + = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, true); + for (int i = 0; i < bb_top_order_index_size; i++) + bb_top_order_index[rpo[i]] = i; + + free (rpo); + } + FOR_ALL_BB_FN (bb, fun) { gimple_stmt_iterator gsi; @@ -1881,13 +1920,20 @@ out: if (cd) delete cd; + if (bb_top_order_index != NULL) + { + free (bb_top_order_index); + bb_top_order_index = NULL; + bb_top_order_index_size = 0; + } + if (changed) { /* Destroy loop bodies that could not be reused. Do this late as we otherwise can end up refering to stale data in control dependences. */ unsigned i; FOR_EACH_VEC_ELT (loops_to_be_destroyed, i, loop) - destroy_loop (loop); + destroy_loop (loop); /* Cached scalar evolutions now may refer to wrong or non-existing loops. */ -- 1.9.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution 2017-06-12 17:03 [PATCH GCC][04/13]Sort statements in topological order for loop distribution Bin Cheng @ 2017-06-13 10:59 ` Richard Biener 2017-06-14 7:53 ` Bin.Cheng 0 siblings, 1 reply; 6+ messages in thread From: Richard Biener @ 2017-06-13 10:59 UTC (permalink / raw) To: Bin Cheng; +Cc: gcc-patches, nd On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: > Hi, > During the work I ran into a latent bug for distributing. For the moment we sort statements > in dominance order, but that's not enough because basic blocks may be sorted in reverse order > of execution flow. This results in wrong data dependence direction later. This patch fixes > the issue by sorting in topological order. > > Bootstrap and test on x86_64 and AArch64. Is it OK? I suppose you are fixing static int pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, vec<data_reference_p> drs1, vec<data_reference_p> drs2) { ... /* Re-shuffle data-refs to be in dominator order. */ if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) { std::swap (dr1, dr2); this_dir = -this_dir; } but then for stmts that are not "ordered" by RPO or DOM like if (flag) ... = dr1; else ... = dr2; this doesn't avoid spurious swaps? Also the code was basically copied from tree-data-refs.c:find_data_references_in_loop which does iterate over get_loop_body_in_dom_order as well. So isn't the issue latent there as well? That said, what about those "unordered" stmts? I suppose dependence analysis still happily computes a dependence distance but in reality we'd have to consider both execution orders? Thanks, Richard. > > Thanks, > bin > 2017-06-07 Bin Cheng <bin.cheng@arm.com> > > * tree-loop-distribution.c (bb_top_order_index): New. > (bb_top_order_index_size, bb_top_order_cmp): New. > (stmts_from_loop): Use topological order. > (pass_loop_distribution::execute): Compute topological order for. > basic blocks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution 2017-06-13 10:59 ` Richard Biener @ 2017-06-14 7:53 ` Bin.Cheng 2017-06-14 9:15 ` Richard Biener 0 siblings, 1 reply; 6+ messages in thread From: Bin.Cheng @ 2017-06-14 7:53 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >> Hi, >> During the work I ran into a latent bug for distributing. For the moment we sort statements >> in dominance order, but that's not enough because basic blocks may be sorted in reverse order >> of execution flow. This results in wrong data dependence direction later. This patch fixes >> the issue by sorting in topological order. >> >> Bootstrap and test on x86_64 and AArch64. Is it OK? > > I suppose you are fixing > > static int > pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, > vec<data_reference_p> drs1, > vec<data_reference_p> drs2) > { > ... > /* Re-shuffle data-refs to be in dominator order. */ > if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > > but then for stmts that are not "ordered" by RPO or DOM like > > if (flag) > ... = dr1; > else > ... = dr2; > > this doesn't avoid spurious swaps? Also the code was basically No, this is mainly for below case: if (flag) { partition1: arr[i] = x; } partition2: arr[i] = y; function pg_add_dependence_edges is like: /* Re-shuffle data-refs to be in dominator order. */ if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) { std::swap (dr1, dr2); this_dir = -this_dir; } //... else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) { if (DDR_REVERSED_P (ddr)) { std::swap (dr1, dr2); this_dir = -this_dir; } /* Known dependences can still be unordered througout the iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ if (DDR_NUM_DIST_VECTS (ddr) != 1) this_dir = 2; /* If the overlap is exact preserve stmt order. */ else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) ; else { /* Else as the distance vector is lexicographic positive swap the dependence direction. */ this_dir = -this_dir; } } For non-ZERO distance vector dependence, the second part always computes src->dest dependence info correctly, as well as edge direction of PG. For ZERO distance vector dependence, we rely on the swap part (thus topological order) to get correct dependence direction. For mentioned case, the two data references are unordered in dominance relation, but ordered in RPO. This is why DOM is not enough. For if-then-else case, the order actually doesn't matter, and the references are unordered in either dominance relation or RPO. Specifically, src->dest is always computed correctly for non-ZERO distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to data dependence analyzer. As for ZERO distance vector (exact overlap), the order doesn't matter either because they control dependent on the same condition. We can simply assume an arbitrary order for it. > copied from tree-data-refs.c:find_data_references_in_loop which > does iterate over get_loop_body_in_dom_order as well. So isn't the > issue latent there as well? In theory maybe. In practice, this is not a problem at all since loop distribution is the only one handles control dependence so far. > > That said, what about those "unordered" stmts? I suppose > dependence analysis still happily computes a dependence > distance but in reality we'd have to consider both execution > orders? As explained, there is no need to consider both orders. GCC doesn't really support control dependence parallelization? Thanks, bin > > Thanks, > Richard. > > >> >> Thanks, >> bin >> 2017-06-07 Bin Cheng <bin.cheng@arm.com> >> >> * tree-loop-distribution.c (bb_top_order_index): New. >> (bb_top_order_index_size, bb_top_order_cmp): New. >> (stmts_from_loop): Use topological order. >> (pass_loop_distribution::execute): Compute topological order for. >> basic blocks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution 2017-06-14 7:53 ` Bin.Cheng @ 2017-06-14 9:15 ` Richard Biener 2017-06-14 9:25 ` Bin.Cheng 0 siblings, 1 reply; 6+ messages in thread From: Richard Biener @ 2017-06-14 9:15 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc-patches On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >>> Hi, >>> During the work I ran into a latent bug for distributing. For the moment we sort statements >>> in dominance order, but that's not enough because basic blocks may be sorted in reverse order >>> of execution flow. This results in wrong data dependence direction later. This patch fixes >>> the issue by sorting in topological order. >>> >>> Bootstrap and test on x86_64 and AArch64. Is it OK? >> >> I suppose you are fixing >> >> static int >> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, >> vec<data_reference_p> drs1, >> vec<data_reference_p> drs2) >> { >> ... >> /* Re-shuffle data-refs to be in dominator order. */ >> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> >> but then for stmts that are not "ordered" by RPO or DOM like >> >> if (flag) >> ... = dr1; >> else >> ... = dr2; >> >> this doesn't avoid spurious swaps? Also the code was basically > No, this is mainly for below case: > if (flag) > { > partition1: arr[i] = x; > } > partition2: arr[i] = y; > > function pg_add_dependence_edges is like: > /* Re-shuffle data-refs to be in dominator order. */ > if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) > > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > //... > else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) > { > if (DDR_REVERSED_P (ddr)) > { > std::swap (dr1, dr2); > this_dir = -this_dir; > } > /* Known dependences can still be unordered througout the > iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ > if (DDR_NUM_DIST_VECTS (ddr) != 1) > this_dir = 2; > /* If the overlap is exact preserve stmt order. */ > else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) > ; > else > { > /* Else as the distance vector is lexicographic positive > swap the dependence direction. */ > this_dir = -this_dir; > } > } > For non-ZERO distance vector dependence, the second part always > computes src->dest dependence info correctly, as well as edge > direction of PG. For ZERO distance vector dependence, we rely on the > swap part (thus topological order) to get correct dependence > direction. For mentioned case, the two data references are unordered > in dominance relation, but ordered in RPO. This is why DOM is not > enough. For if-then-else case, the order actually doesn't matter, and > the references are unordered in either dominance relation or RPO. > Specifically, src->dest is always computed correctly for non-ZERO > distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to > data dependence analyzer. As for ZERO distance vector (exact > overlap), the order doesn't matter either because they control > dependent on the same condition. We can simply assume an arbitrary > order for it. Ok, if it only is an issue for zero-distance then yes, I agree. >> copied from tree-data-refs.c:find_data_references_in_loop which >> does iterate over get_loop_body_in_dom_order as well. So isn't the >> issue latent there as well? > In theory maybe. In practice, this is not a problem at all since loop > distribution is the only one handles control dependence so far. You mean the only one running into the bogus BB ordering case. I don't see how handling control dependences factor in here. >> >> That said, what about those "unordered" stmts? I suppose >> dependence analysis still happily computes a dependence >> distance but in reality we'd have to consider both execution >> orders? > As explained, there is no need to consider both orders. GCC doesn't > really support control dependence parallelization? I think autopar supports an arbitrary CFG inside the loops but as it will never split them it won't change stmt ordering for zero-distance. That said, if dependence info can be incorrect if applied to a loop we should fixup tree-data-refs.c as well. It might be useful to make get_loop_body_in_rpo_order available then (and eventually all _in_dom_order users can work with rpo order as well thus we can replace it entirely as a second step). Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >> >>> >>> Thanks, >>> bin >>> 2017-06-07 Bin Cheng <bin.cheng@arm.com> >>> >>> * tree-loop-distribution.c (bb_top_order_index): New. >>> (bb_top_order_index_size, bb_top_order_cmp): New. >>> (stmts_from_loop): Use topological order. >>> (pass_loop_distribution::execute): Compute topological order for. >>> basic blocks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution 2017-06-14 9:15 ` Richard Biener @ 2017-06-14 9:25 ` Bin.Cheng 2017-06-14 9:52 ` Richard Biener 0 siblings, 1 reply; 6+ messages in thread From: Bin.Cheng @ 2017-06-14 9:25 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >>>> Hi, >>>> During the work I ran into a latent bug for distributing. For the moment we sort statements >>>> in dominance order, but that's not enough because basic blocks may be sorted in reverse order >>>> of execution flow. This results in wrong data dependence direction later. This patch fixes >>>> the issue by sorting in topological order. >>>> >>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>> >>> I suppose you are fixing >>> >>> static int >>> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, >>> vec<data_reference_p> drs1, >>> vec<data_reference_p> drs2) >>> { >>> ... >>> /* Re-shuffle data-refs to be in dominator order. */ >>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> >>> but then for stmts that are not "ordered" by RPO or DOM like >>> >>> if (flag) >>> ... = dr1; >>> else >>> ... = dr2; >>> >>> this doesn't avoid spurious swaps? Also the code was basically >> No, this is mainly for below case: >> if (flag) >> { >> partition1: arr[i] = x; >> } >> partition2: arr[i] = y; >> >> function pg_add_dependence_edges is like: >> /* Re-shuffle data-refs to be in dominator order. */ >> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> //... >> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) >> { >> if (DDR_REVERSED_P (ddr)) >> { >> std::swap (dr1, dr2); >> this_dir = -this_dir; >> } >> /* Known dependences can still be unordered througout the >> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ >> if (DDR_NUM_DIST_VECTS (ddr) != 1) >> this_dir = 2; >> /* If the overlap is exact preserve stmt order. */ >> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >> ; >> else >> { >> /* Else as the distance vector is lexicographic positive >> swap the dependence direction. */ >> this_dir = -this_dir; >> } >> } >> For non-ZERO distance vector dependence, the second part always >> computes src->dest dependence info correctly, as well as edge >> direction of PG. For ZERO distance vector dependence, we rely on the >> swap part (thus topological order) to get correct dependence >> direction. For mentioned case, the two data references are unordered >> in dominance relation, but ordered in RPO. This is why DOM is not >> enough. For if-then-else case, the order actually doesn't matter, and >> the references are unordered in either dominance relation or RPO. >> Specifically, src->dest is always computed correctly for non-ZERO >> distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to >> data dependence analyzer. As for ZERO distance vector (exact >> overlap), the order doesn't matter either because they control >> dependent on the same condition. We can simply assume an arbitrary >> order for it. > > Ok, if it only is an issue for zero-distance then yes, I agree. Yeah, so for distribution, I think we can further simplify pg_add_dependence_edge because swap is only necessary for zero-distance cases. > >>> copied from tree-data-refs.c:find_data_references_in_loop which >>> does iterate over get_loop_body_in_dom_order as well. So isn't the >>> issue latent there as well? >> In theory maybe. In practice, this is not a problem at all since loop >> distribution is the only one handles control dependence so far. > > You mean the only one running into the bogus BB ordering case. > I don't see how handling control dependences factor in here. Yes, that's what I meant. The ordering issue doesn't exist if there is no control dependence between basic blocks in loop body I think. I didn't realize the autopar pass. > >>> >>> That said, what about those "unordered" stmts? I suppose >>> dependence analysis still happily computes a dependence >>> distance but in reality we'd have to consider both execution >>> orders? >> As explained, there is no need to consider both orders. GCC doesn't >> really support control dependence parallelization? > > I think autopar supports an arbitrary CFG inside the loops but as it > will never split them it won't change stmt ordering for zero-distance. > > That said, if dependence info can be incorrect if applied to a loop > we should fixup tree-data-refs.c as well. It might be useful > to make get_loop_body_in_rpo_order available then (and eventually > all _in_dom_order users can work with rpo order as well thus we > can replace it entirely as a second step). I can work on that as a followup patch. Is it OK? Thanks, bin > > Thanks, > Richard. > >> Thanks, >> bin >>> >>> Thanks, >>> Richard. >>> >>> >>>> >>>> Thanks, >>>> bin >>>> 2017-06-07 Bin Cheng <bin.cheng@arm.com> >>>> >>>> * tree-loop-distribution.c (bb_top_order_index): New. >>>> (bb_top_order_index_size, bb_top_order_cmp): New. >>>> (stmts_from_loop): Use topological order. >>>> (pass_loop_distribution::execute): Compute topological order for. >>>> basic blocks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH GCC][04/13]Sort statements in topological order for loop distribution 2017-06-14 9:25 ` Bin.Cheng @ 2017-06-14 9:52 ` Richard Biener 0 siblings, 0 replies; 6+ messages in thread From: Richard Biener @ 2017-06-14 9:52 UTC (permalink / raw) To: Bin.Cheng; +Cc: gcc-patches On Wed, Jun 14, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Wed, Jun 14, 2017 at 10:15 AM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Wed, Jun 14, 2017 at 9:53 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Tue, Jun 13, 2017 at 11:59 AM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Jun 12, 2017 at 7:02 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >>>>> Hi, >>>>> During the work I ran into a latent bug for distributing. For the moment we sort statements >>>>> in dominance order, but that's not enough because basic blocks may be sorted in reverse order >>>>> of execution flow. This results in wrong data dependence direction later. This patch fixes >>>>> the issue by sorting in topological order. >>>>> >>>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>>> >>>> I suppose you are fixing >>>> >>>> static int >>>> pg_add_dependence_edges (struct graph *rdg, vec<loop_p> loops, int dir, >>>> vec<data_reference_p> drs1, >>>> vec<data_reference_p> drs2) >>>> { >>>> ... >>>> /* Re-shuffle data-refs to be in dominator order. */ >>>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>>> { >>>> std::swap (dr1, dr2); >>>> this_dir = -this_dir; >>>> } >>>> >>>> but then for stmts that are not "ordered" by RPO or DOM like >>>> >>>> if (flag) >>>> ... = dr1; >>>> else >>>> ... = dr2; >>>> >>>> this doesn't avoid spurious swaps? Also the code was basically >>> No, this is mainly for below case: >>> if (flag) >>> { >>> partition1: arr[i] = x; >>> } >>> partition2: arr[i] = y; >>> >>> function pg_add_dependence_edges is like: >>> /* Re-shuffle data-refs to be in dominator order. */ >>> if (rdg_vertex_for_stmt (rdg, DR_STMT (dr1)) >>> > rdg_vertex_for_stmt (rdg, DR_STMT (dr2))) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> //... >>> else if (DDR_ARE_DEPENDENT (ddr) == NULL_TREE) >>> { >>> if (DDR_REVERSED_P (ddr)) >>> { >>> std::swap (dr1, dr2); >>> this_dir = -this_dir; >>> } >>> /* Known dependences can still be unordered througout the >>> iteration space, see gcc.dg/tree-ssa/ldist-16.c. */ >>> if (DDR_NUM_DIST_VECTS (ddr) != 1) >>> this_dir = 2; >>> /* If the overlap is exact preserve stmt order. */ >>> else if (lambda_vector_zerop (DDR_DIST_VECT (ddr, 0), 1)) >>> ; >>> else >>> { >>> /* Else as the distance vector is lexicographic positive >>> swap the dependence direction. */ >>> this_dir = -this_dir; >>> } >>> } >>> For non-ZERO distance vector dependence, the second part always >>> computes src->dest dependence info correctly, as well as edge >>> direction of PG. For ZERO distance vector dependence, we rely on the >>> swap part (thus topological order) to get correct dependence >>> direction. For mentioned case, the two data references are unordered >>> in dominance relation, but ordered in RPO. This is why DOM is not >>> enough. For if-then-else case, the order actually doesn't matter, and >>> the references are unordered in either dominance relation or RPO. >>> Specifically, src->dest is always computed correctly for non-ZERO >>> distance vector cases, no matter <dr1, dr2> or <dr2, dr1> is passed to >>> data dependence analyzer. As for ZERO distance vector (exact >>> overlap), the order doesn't matter either because they control >>> dependent on the same condition. We can simply assume an arbitrary >>> order for it. >> >> Ok, if it only is an issue for zero-distance then yes, I agree. > Yeah, so for distribution, I think we can further simplify > pg_add_dependence_edge because swap is only necessary for > zero-distance cases. >> >>>> copied from tree-data-refs.c:find_data_references_in_loop which >>>> does iterate over get_loop_body_in_dom_order as well. So isn't the >>>> issue latent there as well? >>> In theory maybe. In practice, this is not a problem at all since loop >>> distribution is the only one handles control dependence so far. >> >> You mean the only one running into the bogus BB ordering case. >> I don't see how handling control dependences factor in here. > Yes, that's what I meant. The ordering issue doesn't exist if there > is no control dependence between basic blocks in loop body I think. I > didn't realize the autopar pass. >> >>>> >>>> That said, what about those "unordered" stmts? I suppose >>>> dependence analysis still happily computes a dependence >>>> distance but in reality we'd have to consider both execution >>>> orders? >>> As explained, there is no need to consider both orders. GCC doesn't >>> really support control dependence parallelization? >> >> I think autopar supports an arbitrary CFG inside the loops but as it >> will never split them it won't change stmt ordering for zero-distance. >> >> That said, if dependence info can be incorrect if applied to a loop >> we should fixup tree-data-refs.c as well. It might be useful >> to make get_loop_body_in_rpo_order available then (and eventually >> all _in_dom_order users can work with rpo order as well thus we >> can replace it entirely as a second step). > I can work on that as a followup patch. Is it OK? Yes. Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >> >>> Thanks, >>> bin >>>> >>>> Thanks, >>>> Richard. >>>> >>>> >>>>> >>>>> Thanks, >>>>> bin >>>>> 2017-06-07 Bin Cheng <bin.cheng@arm.com> >>>>> >>>>> * tree-loop-distribution.c (bb_top_order_index): New. >>>>> (bb_top_order_index_size, bb_top_order_cmp): New. >>>>> (stmts_from_loop): Use topological order. >>>>> (pass_loop_distribution::execute): Compute topological order for. >>>>> basic blocks. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-06-14 9:52 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-06-12 17:03 [PATCH GCC][04/13]Sort statements in topological order for loop distribution Bin Cheng 2017-06-13 10:59 ` Richard Biener 2017-06-14 7:53 ` Bin.Cheng 2017-06-14 9:15 ` Richard Biener 2017-06-14 9:25 ` Bin.Cheng 2017-06-14 9:52 ` Richard Biener
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).