* Question about SMS scheduling windows @ 2011-07-27 10:21 Richard Sandiford 2011-07-27 13:34 ` Richard Sandiford 2011-07-27 13:40 ` Revital1 Eres 0 siblings, 2 replies; 9+ messages in thread From: Richard Sandiford @ 2011-07-27 10:21 UTC (permalink / raw) To: gcc; +Cc: zaks, eres I've been looking at SMS, and have a question about get_sched_window. When there are previously-scheduled predessors, we use: if (e->data_type == MEM_DEP) end = MIN (end, SCHED_TIME (v_node) + ii - 1); to get an upper bound on the scheduling window that is permitted by memory dependencies. I think this: SCHED_TIME (v_node) + ii - 1 is an inclusive bound, in that scheduling the node at that time would not break the memory dependence, whereas scheduling at SCHED_TIME (v_node) would. Is that right? I ask because in the final range: start = early_start; end = MIN (end, early_start + ii); /* Schedule the node close to it's predecessors. */ step = 1; END is an exclusive bound. It seems like we might be double-counting here, and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2. While I'm here, I was also curious about: /* If there are more successors than predecessors schedule the node close to it's successors. */ if (count_succs >= count_preds) { int old_start = start; start = end - 1; end = old_start - 1; step = -1; } This doesn't seem to be in the paper, and the comment suggests "count_succs > count_preds" rather than "count_succs >= count_preds". Is the ">=" vs ">" important? Thanks, Richard ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-07-27 10:21 Question about SMS scheduling windows Richard Sandiford @ 2011-07-27 13:34 ` Richard Sandiford 2011-07-27 13:40 ` Revital1 Eres 1 sibling, 0 replies; 9+ messages in thread From: Richard Sandiford @ 2011-07-27 13:34 UTC (permalink / raw) To: gcc; +Cc: zaks, eres Richard Sandiford <richard.sandiford@linaro.org> writes: > I've been looking at SMS, and have a question about get_sched_window. > When there are previously-scheduled predessors, we use: > > if (e->data_type == MEM_DEP) > end = MIN (end, SCHED_TIME (v_node) + ii - 1); > > to get an upper bound on the scheduling window that is permitted > by memory dependencies. I think this: > > SCHED_TIME (v_node) + ii - 1 > > is an inclusive bound, in that scheduling the node at that time > would not break the memory dependence, whereas scheduling at > SCHED_TIME (v_node) would. Is that right? I meant of course "scheduling at SCHED_TIME (v_node) + ii would". Richard ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-07-27 10:21 Question about SMS scheduling windows Richard Sandiford 2011-07-27 13:34 ` Richard Sandiford @ 2011-07-27 13:40 ` Revital1 Eres 2011-07-27 16:42 ` Ayal Zaks 1 sibling, 1 reply; 9+ messages in thread From: Revital1 Eres @ 2011-07-27 13:40 UTC (permalink / raw) To: Richard Sandiford; +Cc: ayal.zaks, gcc Hello Richard, > I ask because in the final range: > > start = early_start; > end = MIN (end, early_start + ii); > /* Schedule the node close to it's predecessors. */ > step = 1; > > END is an exclusive bound. It seems like we might be double-counting here, > and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2. Yes, I think it indeed should be fixed. Thanks for reporting on this. Revital ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-07-27 13:40 ` Revital1 Eres @ 2011-07-27 16:42 ` Ayal Zaks 2011-08-04 9:03 ` Richard Sandiford 0 siblings, 1 reply; 9+ messages in thread From: Ayal Zaks @ 2011-07-27 16:42 UTC (permalink / raw) To: Revital1 Eres; +Cc: Richard Sandiford, gcc (sorry for replicated submissions, had to convert to plain text) >2011/7/27 Revital1 Eres <ERES@il.ibm.com> > >Hello Richard, > > >> I ask because in the final range: >> >> start = early_start; >> end = MIN (end, early_start + ii); >> /* Schedule the node close to it's predecessors. */ >> step = 1; >> >> END is an exclusive bound. It seems like we might be double-counting here, >> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2. > > >Yes, I think it indeed should be fixed. Thanks for reporting on this. > >Revital Agreed; if (e->data_type == MEM_DEP) end = MIN (end, SCHED_TIME (v_node) + ii - 1); should be replaced with if (e->data_type == MEM_DEP) end = MIN (end, p_st + ii); also for the (3rd) case when there are both previously-scheduled predessors and previously-scheduled successors. The range is inclusive of start and exclusive of end: for (c = start; c != end; c += step)... >This doesn't seem to be in the paper, and the comment suggests >"count_succs > count_preds" rather than "count_succs >= count_preds". >Is the ">=" vs ">" important? I think not: in both cases you'll be trying to minimize the same number of live ranges. But indeed it's good to be consistent with the comment. Thanks, Ayal. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-07-27 16:42 ` Ayal Zaks @ 2011-08-04 9:03 ` Richard Sandiford 2011-08-04 16:27 ` Ayal Zaks 0 siblings, 1 reply; 9+ messages in thread From: Richard Sandiford @ 2011-08-04 9:03 UTC (permalink / raw) To: Ayal Zaks; +Cc: Revital1 Eres, gcc Hi Ayal, Thanks to you and Revital for the replies. The reason I asked is that I wanted to rewrite gen_sched_window so that it has only one loop over the PSPs and one loop over the PSSs. I have a follow-up patch to use iv analysis to reduce the number of memory dependencies (or at least increase the distance between them), and it was easier to do that after this change. Ayal Zaks <ayal.zaks@gmail.com> writes: >>> I ask because in the final range: >>> >>> start = early_start; >>> end = MIN (end, early_start + ii); >>> /* Schedule the node close to it's predecessors. */ >>> step = 1; >>> >>> END is an exclusive bound. It seems like we might be double-counting > here, >>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2. >> >> >>Yes, I think it indeed should be fixed. Thanks for reporting on this. >> >>Revital > > Agreed; > > if (e->data_type == MEM_DEP) > end = MIN (end, SCHED_TIME (v_node) + ii - 1); > > should be replaced with > > if (e->data_type == MEM_DEP) > end = MIN (end, p_st + ii); > > also for the (3rd) case when there are both previously-scheduled > predessors and previously-scheduled successors. The range is inclusive > of start and exclusive of end: for (c = start; c != end; c += step)... OK. For the follow-on iv patch, it seemed easier to keep both bounds inclusive for the loop, then make the "end" exclusive when setting the out parameters. (Note that there shouldn't be any problem with overflow when making the bound exclusive, because the size of the range has been restricted to "ii" by that stage. The follow-on patch does use saturating maths for all ops though.) >>This doesn't seem to be in the paper, and the comment suggests >>"count_succs > count_preds" rather than "count_succs >= count_preds". >>Is the ">=" vs ">" important? > > I think not: in both cases you'll be trying to minimize the same > number of live ranges. But indeed it's good to be consistent with the > comment. OK. For no particular reason other than cowardice, I ended up keeping this as: ! if (count_succs && count_succs >= count_preds) The reason for asking was that: ! if (count_succs > count_preds) seemed more natural, and would match the existing comment. I'm happy to test that instead if you prefer. I don't have powerpc hardware that I can do meaningful performance testing on, but I did run it through a Popular* Embedded Benchmark on an ARM Cortex-A8 board with -O3 -fmodulo-sched -fmodulo-sched-allow-regmoves. There were no changes. (And this is a benchmark that does benefit from modulo scheduling, in some cases by a significant amount.) Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the additional patch: Index: gcc/opts.c =================================================================== --- gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 +++ gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 @@ -449,6 +449,8 @@ static const struct default_options defa { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 }, + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 }, + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 }, /* -O2 optimizations. */ { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 }, applied for both the "before" and "after" runs. OK to install? Thanks, Richard gcc/ * modulo-sched.c (get_sched_window): Use just one loop for predecessors and one loop for successors. Fix upper bound of memory range. Index: gcc/modulo-sched.c =================================================================== *** gcc/modulo-sched.c 2011-08-04 09:09:29.000000000 +0100 --- gcc/modulo-sched.c 2011-08-04 09:49:16.000000000 +0100 *************** #define MAX_SPLIT_NUM 10 *** 1630,1638 **** static int get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p) { int start, step, end; ddg_edge_ptr e; sbitmap psp = sbitmap_alloc (ps->g->num_nodes); sbitmap pss = sbitmap_alloc (ps->g->num_nodes); --- 1630,1640 ---- static int get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, ! int *end_p) { int start, step, end; + int early_start, late_start; ddg_edge_ptr e; sbitmap psp = sbitmap_alloc (ps->g->num_nodes); sbitmap pss = sbitmap_alloc (ps->g->num_nodes); *************** get_sched_window (partial_schedule_ptr p *** 1640,1645 **** --- 1642,1649 ---- sbitmap u_node_succs = NODE_SUCCESSORS (u_node); int psp_not_empty; int pss_not_empty; + int count_preds; + int count_succs; /* 1. compute sched window for u (start, end, step). */ sbitmap_zero (psp); *************** get_sched_window (partial_schedule_ptr p *** 1647,1861 **** psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); ! if (psp_not_empty && !pss_not_empty) ! { ! int early_start = INT_MIN; ! ! end = INT_MAX; ! for (e = u_node->in; e != 0; e = e->next_in) ! { ! ddg_node_ptr v_node = e->src; ! ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge: "); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in psp_not_empty," ! " checking p %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int p_st = SCHED_TIME (v_node); ! ! early_start = ! MAX (early_start, p_st + e->latency - (e->distance * ii)); ! ! if (dump_file) ! fprintf (dump_file, ! "pred st = %d; early_start = %d; latency: %d", ! p_st, early_start, e->latency); ! ! if (e->data_type == MEM_DEP) ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! start = early_start; ! end = MIN (end, early_start + ii); ! /* Schedule the node close to it's predecessors. */ ! step = 1; ! ! if (dump_file) ! fprintf (dump_file, ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); ! } ! ! else if (!psp_not_empty && pss_not_empty) ! { ! int late_start = INT_MAX; ! ! end = INT_MIN; ! for (e = u_node->out; e != 0; e = e->next_out) ! { ! ddg_node_ptr v_node = e->dest; ! ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in pss_not_empty," ! " checking s %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int s_st = SCHED_TIME (v_node); ! ! late_start = MIN (late_start, ! s_st - e->latency + (e->distance * ii)); ! ! if (dump_file) ! fprintf (dump_file, ! "succ st = %d; late_start = %d; latency = %d", ! s_st, late_start, e->latency); ! ! if (e->data_type == MEM_DEP) ! end = MAX (end, SCHED_TIME (v_node) - ii + 1); ! if (dump_file) ! fprintf (dump_file, "end = %d\n", end); ! ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! start = late_start; ! end = MAX (end, late_start - ii); ! /* Schedule the node close to it's successors. */ ! step = -1; ! if (dump_file) ! fprintf (dump_file, ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); ! } ! else if (psp_not_empty && pss_not_empty) ! { ! int early_start = INT_MIN; ! int late_start = INT_MAX; ! int count_preds = 0; ! int count_succs = 0; ! start = INT_MIN; ! end = INT_MAX; ! for (e = u_node->in; e != 0; e = e->next_in) ! { ! ddg_node_ptr v_node = e->src; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); fprintf (dump_file, ! "\nScheduling %d (%d) in psp_pss_not_empty," ! " checking p %d (%d): ", u_node->cuid, INSN_UID ! (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int p_st = SCHED_TIME (v_node); ! ! early_start = MAX (early_start, ! p_st + e->latency ! - (e->distance * ii)); ! ! if (dump_file) ! fprintf (dump_file, ! "pred st = %d; early_start = %d; latency = %d", ! p_st, early_start, e->latency); ! ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_preds++; ! if (e->data_type == MEM_DEP) ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! for (e = u_node->out; e != 0; e = e->next_out) ! { ! ddg_node_ptr v_node = e->dest; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in psp_pss_not_empty," ! " checking s %d (%d): ", u_node->cuid, INSN_UID ! (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int s_st = SCHED_TIME (v_node); ! late_start = MIN (late_start, ! s_st - e->latency ! + (e->distance * ii)); ! if (dump_file) ! fprintf (dump_file, ! "succ st = %d; late_start = %d; latency = %d", ! s_st, late_start, e->latency); ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_succs++; ! if (e->data_type == MEM_DEP) ! start = MAX (start, SCHED_TIME (v_node) - ii + 1); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! start = MAX (start, early_start); ! end = MIN (end, MIN (early_start + ii, late_start + 1)); ! step = 1; ! /* If there are more successors than predecessors schedule the ! node close to it's successors. */ ! if (count_succs >= count_preds) ! { ! int old_start = start; ! start = end - 1; ! end = old_start - 1; ! step = -1; ! } ! } ! else /* psp is empty && pss is empty. */ ! { ! start = SCHED_ASAP (u_node); ! end = start + ii; ! step = 1; } *start_p = start; *step_p = step; *end_p = end; --- 1651,1769 ---- psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); ! early_start = INT_MIN; ! late_start = INT_MAX; ! start = INT_MIN; ! end = INT_MAX; ! count_preds = psp_not_empty; ! count_succs = pss_not_empty; ! ! /* Calculate early_start and limit end. Both bounds are inclusive. */ ! if (psp_not_empty) ! for (e = u_node->in; e != 0; e = e->next_in) ! { ! ddg_node_ptr v_node = e->src; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge: "); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in psp_not_empty," ! " checking p %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int p_st = SCHED_TIME (v_node); ! early_start = MAX (early_start, ! p_st + e->latency - (e->distance * ii)); ! if (e->data_type == MEM_DEP) ! end = MIN (end, p_st + ii - 1); ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_preds++; ! if (dump_file) fprintf (dump_file, ! "pred st = %d; early_start = %d; latency: %d;" ! " end: %d\n", p_st, early_start, e->latency, end); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! /* Calculate late_start and limit start. Both bounds are inclusive. */ ! if (pss_not_empty) ! for (e = u_node->out; e != 0; e = e->next_out) ! { ! ddg_node_ptr v_node = e->dest; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in pss_not_empty," ! " checking s %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int s_st = SCHED_TIME (v_node); ! late_start = MIN (late_start, ! s_st - e->latency + (e->distance * ii)); ! if (e->data_type == MEM_DEP) ! start = MAX (start, s_st - ii + 1); ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_succs++; ! if (dump_file) ! fprintf (dump_file, ! "succ st = %d; late_start = %d; latency = %d;" ! " start=%d", s_st, late_start, e->latency, start); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! /* Get a target scheduling window no bigger than ii. */ ! if (early_start == INT_MIN && late_start == INT_MAX) ! early_start = SCHED_ASAP (u_node); ! else if (early_start == INT_MIN) ! early_start = late_start - (ii - 1); ! late_start = MIN (early_start + (ii - 1), late_start); ! ! /* Apply memory dependence limits. */ ! start = MAX (start, early_start); ! end = MIN (end, late_start); ! step = 1; ! ! /* If there are at least as many successors as predecessors, schedule the ! node close to its successors. */ ! if (count_succs && count_succs >= count_preds) ! { ! int tmp = end; ! end = start; ! start = tmp; ! step = -1; } + /* Now that we've finalized the window, make END an exclusive rather + than an inclusive bound. */ + end += step; + *start_p = start; *step_p = step; *end_p = end; *************** get_sched_window (partial_schedule_ptr p *** 1867,1876 **** if (dump_file) fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", start, end, step); ! return -1; } ! return 0; } /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the --- 1775,1784 ---- if (dump_file) fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", start, end, step); ! return -1; } ! return 0; } /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-08-04 9:03 ` Richard Sandiford @ 2011-08-04 16:27 ` Ayal Zaks 2011-08-08 9:30 ` Richard Sandiford 0 siblings, 1 reply; 9+ messages in thread From: Ayal Zaks @ 2011-08-04 16:27 UTC (permalink / raw) To: Ayal Zaks, Revital1 Eres, gcc, richard.sandiford Hi Richard, 2011/8/4 Richard Sandiford <richard.sandiford@linaro.org> > > Hi Ayal, > > Thanks to you and Revital for the replies. The reason I asked is that > I wanted to rewrite gen_sched_window so that it has only one loop over > the PSPs and one loop over the PSSs. This rewrite makes perfect sense regardless of any follow-up patches, as it clarifies and reuses code. > > I have a follow-up patch to use > iv analysis to reduce the number of memory dependencies (or at least > increase the distance between them), and it was easier to do that > after this change. Reducing memory dependencies is definitely important. > > Ayal Zaks <ayal.zaks@gmail.com> writes: > >>> I ask because in the final range: > >>> > >>> start = early_start; > >>> end = MIN (end, early_start + ii); > >>> /* Schedule the node close to it's predecessors. */ > >>> step = 1; > >>> > >>> END is an exclusive bound. It seems like we might be double-counting > > here, > >>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2. > >> > >> > >>Yes, I think it indeed should be fixed. Thanks for reporting on this. > >> > >>Revital > > > > Agreed; > > > > if (e->data_type == MEM_DEP) > > end = MIN (end, SCHED_TIME (v_node) + ii - 1); > > > > should be replaced with > > > > if (e->data_type == MEM_DEP) > > end = MIN (end, p_st + ii); > > > > also for the (3rd) case when there are both previously-scheduled > > predessors and previously-scheduled successors. The range is inclusive > > of start and exclusive of end: for (c = start; c != end; c += step)... > > OK. For the follow-on iv patch, it seemed easier to keep both bounds > inclusive for the loop, then make the "end" exclusive when setting the > out parameters. (Note that there shouldn't be any problem with overflow > when making the bound exclusive, because the size of the range has been > restricted to "ii" by that stage. The follow-on patch does use > saturating maths for all ops though.) Sure, no problem having "end" inclusive, as it simplifies things. We can even keep it inclusive all the way through, iterating over: for (c = start; c != end+step; c += step)... > > >>This doesn't seem to be in the paper, and the comment suggests > >>"count_succs > count_preds" rather than "count_succs >= count_preds". > >>Is the ">=" vs ">" important? > > > > I think not: in both cases you'll be trying to minimize the same > > number of live ranges. But indeed it's good to be consistent with the > > comment. > > OK. For no particular reason other than cowardice, I ended up keeping > this as: > > ! if (count_succs && count_succs >= count_preds) > > The reason for asking was that: > > ! if (count_succs > count_preds) > > seemed more natural, and would match the existing comment. I'm happy > to test that instead if you prefer. > I wouldn't worry about this tie breaker, unless there's a reason (in which case the reason should hopefully provide a secondary criteria). > > I don't have powerpc hardware that I can do meaningful performance > testing on, but I did run it through a Popular* Embedded Benchmark > on an ARM Cortex-A8 board with -O3 -fmodulo-sched > -fmodulo-sched-allow-regmoves. There were no changes. (And this is > a benchmark that does benefit from modulo scheduling, in some cases > by a significant amount.) > Ok, the patch should be neutral performance-wise. One could check btw if SMS produces exactly the same output in both cases, even w/o a powerpc (or any other) HW. > > Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the > additional patch: > > Index: gcc/opts.c > =================================================================== > --- gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 > +++ gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 > @@ -449,6 +449,8 @@ static const struct default_options defa > { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 }, > { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 }, > + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 }, > + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 }, > > /* -O2 optimizations. */ > { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 }, > > applied for both the "before" and "after" runs. OK to install? Yes, with just a couple of minor non-mandatory comments: 1. Setting "count_preds = psp_not_empty;" and "count_succs = pss_not_empty;" suggests that you want to decrement non-interesting neighbors instead of incrementing interesting ones. Otherwise can initialize to zero (doesn't really matter, just a bit confusing). 2. I find it easier to read "late_start = MIN (late_start, early_start + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1), late_start);". 3. Suggest to set step=1 at the beginning, explaining from the start that we first compute a forward range (start<end), and may later possibly revert it. 4. "late_start" would better be renamed "late_end". 5. While we're at it, as you've unified the treatments, we can now actually do with "start" and "end" only, and do away with early_start and late_start altogether... Thanks, Ayal. > > Thanks, > Richard > > > gcc/ > * modulo-sched.c (get_sched_window): Use just one loop for predecessors > and one loop for successors. Fix upper bound of memory range. > > Index: gcc/modulo-sched.c > =================================================================== > *** gcc/modulo-sched.c 2011-08-04 09:09:29.000000000 +0100 > --- gcc/modulo-sched.c 2011-08-04 09:49:16.000000000 +0100 > *************** #define MAX_SPLIT_NUM 10 > *** 1630,1638 **** > > static int > get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, > ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p) > { > int start, step, end; > ddg_edge_ptr e; > sbitmap psp = sbitmap_alloc (ps->g->num_nodes); > sbitmap pss = sbitmap_alloc (ps->g->num_nodes); > --- 1630,1640 ---- > > static int > get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, > ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, > ! int *end_p) > { > int start, step, end; > + int early_start, late_start; > ddg_edge_ptr e; > sbitmap psp = sbitmap_alloc (ps->g->num_nodes); > sbitmap pss = sbitmap_alloc (ps->g->num_nodes); > *************** get_sched_window (partial_schedule_ptr p > *** 1640,1645 **** > --- 1642,1649 ---- > sbitmap u_node_succs = NODE_SUCCESSORS (u_node); > int psp_not_empty; > int pss_not_empty; > + int count_preds; > + int count_succs; > > /* 1. compute sched window for u (start, end, step). */ > sbitmap_zero (psp); > *************** get_sched_window (partial_schedule_ptr p > *** 1647,1861 **** > psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); > pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); > > ! if (psp_not_empty && !pss_not_empty) > ! { > ! int early_start = INT_MIN; > ! > ! end = INT_MAX; > ! for (e = u_node->in; e != 0; e = e->next_in) > ! { > ! ddg_node_ptr v_node = e->src; > ! > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge: "); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_not_empty," > ! " checking p %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > ! > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int p_st = SCHED_TIME (v_node); > ! > ! early_start = > ! MAX (early_start, p_st + e->latency - (e->distance * ii)); > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "pred st = %d; early_start = %d; latency: %d", > ! p_st, early_start, e->latency); > ! > ! if (e->data_type == MEM_DEP) > ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > ! } > ! start = early_start; > ! end = MIN (end, early_start + ii); > ! /* Schedule the node close to it's predecessors. */ > ! step = 1; > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", > ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); > ! } > ! > ! else if (!psp_not_empty && pss_not_empty) > ! { > ! int late_start = INT_MAX; > ! > ! end = INT_MIN; > ! for (e = u_node->out; e != 0; e = e->next_out) > ! { > ! ddg_node_ptr v_node = e->dest; > ! > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in pss_not_empty," > ! " checking s %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > ! > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int s_st = SCHED_TIME (v_node); > ! > ! late_start = MIN (late_start, > ! s_st - e->latency + (e->distance * ii)); > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "succ st = %d; late_start = %d; latency = %d", > ! s_st, late_start, e->latency); > ! > ! if (e->data_type == MEM_DEP) > ! end = MAX (end, SCHED_TIME (v_node) - ii + 1); > ! if (dump_file) > ! fprintf (dump_file, "end = %d\n", end); > ! > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > > ! } > ! start = late_start; > ! end = MAX (end, late_start - ii); > ! /* Schedule the node close to it's successors. */ > ! step = -1; > > ! if (dump_file) > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", > ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); > > ! } > > ! else if (psp_not_empty && pss_not_empty) > ! { > ! int early_start = INT_MIN; > ! int late_start = INT_MAX; > ! int count_preds = 0; > ! int count_succs = 0; > > ! start = INT_MIN; > ! end = INT_MAX; > ! for (e = u_node->in; e != 0; e = e->next_in) > ! { > ! ddg_node_ptr v_node = e->src; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_pss_not_empty," > ! " checking p %d (%d): ", u_node->cuid, INSN_UID > ! (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > ! > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int p_st = SCHED_TIME (v_node); > ! > ! early_start = MAX (early_start, > ! p_st + e->latency > ! - (e->distance * ii)); > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "pred st = %d; early_start = %d; latency = %d", > ! p_st, early_start, e->latency); > ! > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_preds++; > > ! if (e->data_type == MEM_DEP) > ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > > ! } > ! for (e = u_node->out; e != 0; e = e->next_out) > ! { > ! ddg_node_ptr v_node = e->dest; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_pss_not_empty," > ! " checking s %d (%d): ", u_node->cuid, INSN_UID > ! (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int s_st = SCHED_TIME (v_node); > > ! late_start = MIN (late_start, > ! s_st - e->latency > ! + (e->distance * ii)); > > ! if (dump_file) > ! fprintf (dump_file, > ! "succ st = %d; late_start = %d; latency = %d", > ! s_st, late_start, e->latency); > > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_succs++; > > ! if (e->data_type == MEM_DEP) > ! start = MAX (start, SCHED_TIME (v_node) - ii + 1); > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > > ! } > ! start = MAX (start, early_start); > ! end = MIN (end, MIN (early_start + ii, late_start + 1)); > ! step = 1; > ! /* If there are more successors than predecessors schedule the > ! node close to it's successors. */ > ! if (count_succs >= count_preds) > ! { > ! int old_start = start; > > ! start = end - 1; > ! end = old_start - 1; > ! step = -1; > ! } > ! } > ! else /* psp is empty && pss is empty. */ > ! { > ! start = SCHED_ASAP (u_node); > ! end = start + ii; > ! step = 1; > } > > *start_p = start; > *step_p = step; > *end_p = end; > --- 1651,1769 ---- > psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); > pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); > > ! early_start = INT_MIN; > ! late_start = INT_MAX; > ! start = INT_MIN; > ! end = INT_MAX; > ! count_preds = psp_not_empty; > ! count_succs = pss_not_empty; > ! > ! /* Calculate early_start and limit end. Both bounds are inclusive. */ > ! if (psp_not_empty) > ! for (e = u_node->in; e != 0; e = e->next_in) > ! { > ! ddg_node_ptr v_node = e->src; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge: "); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_not_empty," > ! " checking p %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int p_st = SCHED_TIME (v_node); > > ! early_start = MAX (early_start, > ! p_st + e->latency - (e->distance * ii)); > > ! if (e->data_type == MEM_DEP) > ! end = MIN (end, p_st + ii - 1); > > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_preds++; > > ! if (dump_file) > fprintf (dump_file, > ! "pred st = %d; early_start = %d; latency: %d;" > ! " end: %d\n", p_st, early_start, e->latency, end); > > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > ! } > > ! /* Calculate late_start and limit start. Both bounds are inclusive. */ > ! if (pss_not_empty) > ! for (e = u_node->out; e != 0; e = e->next_out) > ! { > ! ddg_node_ptr v_node = e->dest; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in pss_not_empty," > ! " checking s %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int s_st = SCHED_TIME (v_node); > > ! late_start = MIN (late_start, > ! s_st - e->latency + (e->distance * ii)); > > ! if (e->data_type == MEM_DEP) > ! start = MAX (start, s_st - ii + 1); > > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_succs++; > > ! if (dump_file) > ! fprintf (dump_file, > ! "succ st = %d; late_start = %d; latency = %d;" > ! " start=%d", s_st, late_start, e->latency, start); > > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > ! } > > ! /* Get a target scheduling window no bigger than ii. */ > ! if (early_start == INT_MIN && late_start == INT_MAX) > ! early_start = SCHED_ASAP (u_node); > ! else if (early_start == INT_MIN) > ! early_start = late_start - (ii - 1); > ! late_start = MIN (early_start + (ii - 1), late_start); > ! > ! /* Apply memory dependence limits. */ > ! start = MAX (start, early_start); > ! end = MIN (end, late_start); > ! step = 1; > ! > ! /* If there are at least as many successors as predecessors, schedule the > ! node close to its successors. */ > ! if (count_succs && count_succs >= count_preds) > ! { > ! int tmp = end; > ! end = start; > ! start = tmp; > ! step = -1; > } > > + /* Now that we've finalized the window, make END an exclusive rather > + than an inclusive bound. */ > + end += step; > + > *start_p = start; > *step_p = step; > *end_p = end; > *************** get_sched_window (partial_schedule_ptr p > *** 1867,1876 **** > if (dump_file) > fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", > start, end, step); > ! return -1; > } > > ! return 0; > } > > /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the > --- 1775,1784 ---- > if (dump_file) > fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", > start, end, step); > ! return -1; > } > > ! return 0; > } > > /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-08-04 16:27 ` Ayal Zaks @ 2011-08-08 9:30 ` Richard Sandiford 2011-08-08 19:36 ` Ayal Zaks 0 siblings, 1 reply; 9+ messages in thread From: Richard Sandiford @ 2011-08-08 9:30 UTC (permalink / raw) To: Ayal Zaks; +Cc: Revital1 Eres, gcc Ayal Zaks <ayal.zaks@gmail.com> writes: >> OK. For the follow-on iv patch, it seemed easier to keep both bounds >> inclusive for the loop, then make the "end" exclusive when setting the >> out parameters. (Note that there shouldn't be any problem with overflow >> when making the bound exclusive, because the size of the range has been >> restricted to "ii" by that stage. The follow-on patch does use >> saturating maths for all ops though.) > > Sure, no problem having "end" inclusive, as it simplifies things. We > can even keep it inclusive all the way through, iterating over: for (c > = start; c != end+step; c += step)... Yeah, I'd prefer that too. I might do it once Revital's and Roman's changes are in. >> I don't have powerpc hardware that I can do meaningful performance >> testing on, but I did run it through a Popular* Embedded Benchmark >> on an ARM Cortex-A8 board with -O3 -fmodulo-sched >> -fmodulo-sched-allow-regmoves. There were no changes. (And this is >> a benchmark that does benefit from modulo scheduling, in some cases >> by a significant amount.) >> > Ok, the patch should be neutral performance-wise. One could check btw > if SMS produces exactly the same output in both cases, even w/o a > powerpc (or any other) HW. OK. The patch does change the output a little because of the MEM_DEP range thing. To compensate for that, I tried compiling libav on ARM with: Index: gcc/modulo-sched.c =================================================================== --- gcc/modulo-sched.c 2011-08-05 11:23:16.000000000 +0100 +++ gcc/modulo-sched.c 2011-08-05 11:27:57.000000000 +0100 @@ -1680,7 +1680,7 @@ get_sched_window (partial_schedule_ptr p p_st, early_start, e->latency); if (e->data_type == MEM_DEP) - end = MIN (end, SCHED_TIME (v_node) + ii - 1); + end = MIN (end, SCHED_TIME (v_node) + ii); } else if (dump_file) fprintf (dump_file, "the node is not scheduled\n"); @@ -1729,7 +1729,7 @@ get_sched_window (partial_schedule_ptr p s_st, late_start, e->latency); if (e->data_type == MEM_DEP) - end = MAX (end, SCHED_TIME (v_node) - ii + 1); + end = MAX (end, SCHED_TIME (v_node) - ii); if (dump_file) fprintf (dump_file, "end = %d\n", end); @@ -1791,7 +1791,7 @@ get_sched_window (partial_schedule_ptr p count_preds++; if (e->data_type == MEM_DEP) - end = MIN (end, SCHED_TIME (v_node) + ii - 1); + end = MIN (end, SCHED_TIME (v_node) + ii); } else if (dump_file) fprintf (dump_file, "the node is not scheduled\n"); applied, then tried the same thing with the revised patch below. The output was the same. (FWIW, libav did show up extra differences when using the patch that I'd originally submitted. They were due to the count_preds and count_succs thing that you picked up in your review.) >> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the >> additional patch: >> >> Index: gcc/opts.c >> =================================================================== >> --- gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 >> +++ gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 >> @@ -449,6 +449,8 @@ static const struct default_options defa >> { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 }, >> { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 }, >> { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 }, >> + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 }, >> + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 }, >> >> /* -O2 optimizations. */ >> { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 }, >> >> applied for both the "before" and "after" runs. OK to install? > > > Yes, with just a couple of minor non-mandatory comments: > 1. Setting "count_preds = psp_not_empty;" and "count_succs = > pss_not_empty;" suggests that you want to decrement non-interesting > neighbors instead of incrementing interesting ones. Otherwise can > initialize to zero (doesn't really matter, just a bit confusing). Yeah, good point. I initialised them to zero, like you said, and changed the reversal condition to: if (pss_not_empty && count_succs >= count_preds) Which I have admit makes a lot more sense than what I submitted, and avoids some unwanted changes in output. > 2. I find it easier to read "late_start = MIN (late_start, early_start > + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1), > late_start);". Oops, that was accidental. Fixed. > 3. Suggest to set step=1 at the beginning, explaining from the start > that we first compute a forward range (start<end), and may later > possibly revert it. Yeah, that's better. > 4. "late_start" would better be renamed "late_end". Since you said this was optional, I decided to leave it as it is. The name and value of "late start" come directly from the paper. > 5. While we're at it, as you've unified the treatments, we can now > actually do with "start" and "end" only, and do away with early_start > and late_start altogether... The IV patch needs the same distinction as we have now, so I'd prefer to keep it. Thanks for the review. Here's what I applied after testing on powerpc-ibm-aix5.3. Richard gcc/ * modulo-sched.c (get_sched_window): Use just one loop for predecessors and one loop for successors. Fix upper bound of memory range. Index: gcc/modulo-sched.c =================================================================== *** gcc/modulo-sched.c 2011-08-05 10:39:27.000000000 +0100 --- gcc/modulo-sched.c 2011-08-05 10:40:49.000000000 +0100 *************** #define MAX_SPLIT_NUM 10 *** 1630,1638 **** static int get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p) { int start, step, end; ddg_edge_ptr e; sbitmap psp = sbitmap_alloc (ps->g->num_nodes); sbitmap pss = sbitmap_alloc (ps->g->num_nodes); --- 1630,1640 ---- static int get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, ! int *end_p) { int start, step, end; + int early_start, late_start; ddg_edge_ptr e; sbitmap psp = sbitmap_alloc (ps->g->num_nodes); sbitmap pss = sbitmap_alloc (ps->g->num_nodes); *************** get_sched_window (partial_schedule_ptr p *** 1640,1645 **** --- 1642,1649 ---- sbitmap u_node_succs = NODE_SUCCESSORS (u_node); int psp_not_empty; int pss_not_empty; + int count_preds; + int count_succs; /* 1. compute sched window for u (start, end, step). */ sbitmap_zero (psp); *************** get_sched_window (partial_schedule_ptr p *** 1647,1861 **** psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); ! if (psp_not_empty && !pss_not_empty) ! { ! int early_start = INT_MIN; ! ! end = INT_MAX; ! for (e = u_node->in; e != 0; e = e->next_in) ! { ! ddg_node_ptr v_node = e->src; ! ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge: "); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in psp_not_empty," ! " checking p %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int p_st = SCHED_TIME (v_node); ! ! early_start = ! MAX (early_start, p_st + e->latency - (e->distance * ii)); ! ! if (dump_file) ! fprintf (dump_file, ! "pred st = %d; early_start = %d; latency: %d", ! p_st, early_start, e->latency); ! ! if (e->data_type == MEM_DEP) ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! start = early_start; ! end = MIN (end, early_start + ii); ! /* Schedule the node close to it's predecessors. */ ! step = 1; ! ! if (dump_file) ! fprintf (dump_file, ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); ! } ! ! else if (!psp_not_empty && pss_not_empty) ! { ! int late_start = INT_MAX; ! ! end = INT_MIN; ! for (e = u_node->out; e != 0; e = e->next_out) ! { ! ddg_node_ptr v_node = e->dest; ! ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in pss_not_empty," ! " checking s %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int s_st = SCHED_TIME (v_node); ! ! late_start = MIN (late_start, ! s_st - e->latency + (e->distance * ii)); ! ! if (dump_file) ! fprintf (dump_file, ! "succ st = %d; late_start = %d; latency = %d", ! s_st, late_start, e->latency); ! ! if (e->data_type == MEM_DEP) ! end = MAX (end, SCHED_TIME (v_node) - ii + 1); ! if (dump_file) ! fprintf (dump_file, "end = %d\n", end); ! ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! start = late_start; ! end = MAX (end, late_start - ii); ! /* Schedule the node close to it's successors. */ ! step = -1; ! if (dump_file) ! fprintf (dump_file, ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); ! } ! else if (psp_not_empty && pss_not_empty) ! { ! int early_start = INT_MIN; ! int late_start = INT_MAX; ! int count_preds = 0; ! int count_succs = 0; ! start = INT_MIN; ! end = INT_MAX; ! for (e = u_node->in; e != 0; e = e->next_in) ! { ! ddg_node_ptr v_node = e->src; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); fprintf (dump_file, ! "\nScheduling %d (%d) in psp_pss_not_empty," ! " checking p %d (%d): ", u_node->cuid, INSN_UID ! (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int p_st = SCHED_TIME (v_node); ! ! early_start = MAX (early_start, ! p_st + e->latency ! - (e->distance * ii)); ! ! if (dump_file) ! fprintf (dump_file, ! "pred st = %d; early_start = %d; latency = %d", ! p_st, early_start, e->latency); ! ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_preds++; ! if (e->data_type == MEM_DEP) ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! for (e = u_node->out; e != 0; e = e->next_out) ! { ! ddg_node_ptr v_node = e->dest; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in psp_pss_not_empty," ! " checking s %d (%d): ", u_node->cuid, INSN_UID ! (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int s_st = SCHED_TIME (v_node); ! late_start = MIN (late_start, ! s_st - e->latency ! + (e->distance * ii)); ! if (dump_file) ! fprintf (dump_file, ! "succ st = %d; late_start = %d; latency = %d", ! s_st, late_start, e->latency); ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_succs++; ! if (e->data_type == MEM_DEP) ! start = MAX (start, SCHED_TIME (v_node) - ii + 1); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! start = MAX (start, early_start); ! end = MIN (end, MIN (early_start + ii, late_start + 1)); ! step = 1; ! /* If there are more successors than predecessors schedule the ! node close to it's successors. */ ! if (count_succs >= count_preds) ! { ! int old_start = start; ! start = end - 1; ! end = old_start - 1; ! step = -1; ! } ! } ! else /* psp is empty && pss is empty. */ ! { ! start = SCHED_ASAP (u_node); ! end = start + ii; ! step = 1; } *start_p = start; *step_p = step; *end_p = end; --- 1651,1772 ---- psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); ! /* We first compute a forward range (start <= end), then decide whether ! to reverse it. */ ! early_start = INT_MIN; ! late_start = INT_MAX; ! start = INT_MIN; ! end = INT_MAX; ! step = 1; ! ! count_preds = 0; ! count_succs = 0; ! ! /* Calculate early_start and limit end. Both bounds are inclusive. */ ! if (psp_not_empty) ! for (e = u_node->in; e != 0; e = e->next_in) ! { ! ddg_node_ptr v_node = e->src; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge: "); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in psp_not_empty," ! " checking p %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int p_st = SCHED_TIME (v_node); ! early_start = MAX (early_start, ! p_st + e->latency - (e->distance * ii)); ! if (e->data_type == MEM_DEP) ! end = MIN (end, p_st + ii - 1); ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_preds++; ! if (dump_file) fprintf (dump_file, ! "pred st = %d; early_start = %d; latency: %d;" ! " end: %d\n", p_st, early_start, e->latency, end); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! /* Calculate late_start and limit start. Both bounds are inclusive. */ ! if (pss_not_empty) ! for (e = u_node->out; e != 0; e = e->next_out) ! { ! ddg_node_ptr v_node = e->dest; ! if (dump_file) ! { ! fprintf (dump_file, "\nProcessing edge:"); ! print_ddg_edge (dump_file, e); ! fprintf (dump_file, ! "\nScheduling %d (%d) in pss_not_empty," ! " checking s %d (%d): ", u_node->cuid, ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID ! (v_node->insn)); ! } ! if (TEST_BIT (sched_nodes, v_node->cuid)) ! { ! int s_st = SCHED_TIME (v_node); ! late_start = MIN (late_start, ! s_st - e->latency + (e->distance * ii)); ! if (e->data_type == MEM_DEP) ! start = MAX (start, s_st - ii + 1); ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) ! count_succs++; ! if (dump_file) ! fprintf (dump_file, ! "succ st = %d; late_start = %d; latency = %d;" ! " start=%d", s_st, late_start, e->latency, start); ! } ! else if (dump_file) ! fprintf (dump_file, "the node is not scheduled\n"); ! } ! /* Get a target scheduling window no bigger than ii. */ ! if (early_start == INT_MIN && late_start == INT_MAX) ! early_start = SCHED_ASAP (u_node); ! else if (early_start == INT_MIN) ! early_start = late_start - (ii - 1); ! late_start = MIN (late_start, early_start + (ii - 1)); ! ! /* Apply memory dependence limits. */ ! start = MAX (start, early_start); ! end = MIN (end, late_start); ! ! /* If there are at least as many successors as predecessors, schedule the ! node close to its successors. */ ! if (pss_not_empty && count_succs >= count_preds) ! { ! int tmp = end; ! end = start; ! start = tmp; ! step = -1; } + /* Now that we've finalized the window, make END an exclusive rather + than an inclusive bound. */ + end += step; + *start_p = start; *step_p = step; *end_p = end; *************** get_sched_window (partial_schedule_ptr p *** 1867,1876 **** if (dump_file) fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", start, end, step); ! return -1; } ! return 0; } /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the --- 1778,1787 ---- if (dump_file) fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", start, end, step); ! return -1; } ! return 0; } /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-08-08 9:30 ` Richard Sandiford @ 2011-08-08 19:36 ` Ayal Zaks 2011-08-09 11:19 ` Richard Sandiford 0 siblings, 1 reply; 9+ messages in thread From: Ayal Zaks @ 2011-08-08 19:36 UTC (permalink / raw) To: Ayal Zaks, Revital1 Eres, gcc, richard.sandiford 2011/8/8 Richard Sandiford <richard.sandiford@linaro.org> > > Ayal Zaks <ayal.zaks@gmail.com> writes: > >> OK. For the follow-on iv patch, it seemed easier to keep both bounds > >> inclusive for the loop, then make the "end" exclusive when setting the > >> out parameters. (Note that there shouldn't be any problem with overflow > >> when making the bound exclusive, because the size of the range has been > >> restricted to "ii" by that stage. The follow-on patch does use > >> saturating maths for all ops though.) > > > > Sure, no problem having "end" inclusive, as it simplifies things. We > > can even keep it inclusive all the way through, iterating over: for (c > > = start; c != end+step; c += step)... > > Yeah, I'd prefer that too. I might do it once Revital's and > Roman's changes are in. > > >> I don't have powerpc hardware that I can do meaningful performance > >> testing on, but I did run it through a Popular* Embedded Benchmark > >> on an ARM Cortex-A8 board with -O3 -fmodulo-sched > >> -fmodulo-sched-allow-regmoves. There were no changes. (And this is > >> a benchmark that does benefit from modulo scheduling, in some cases > >> by a significant amount.) > >> > > Ok, the patch should be neutral performance-wise. One could check btw > > if SMS produces exactly the same output in both cases, even w/o a > > powerpc (or any other) HW. > > OK. The patch does change the output a little because of the MEM_DEP > range thing. Ahh, right. > > To compensate for that, I tried compiling libav on ARM with: > > Index: gcc/modulo-sched.c > =================================================================== > --- gcc/modulo-sched.c 2011-08-05 11:23:16.000000000 +0100 > +++ gcc/modulo-sched.c 2011-08-05 11:27:57.000000000 +0100 > @@ -1680,7 +1680,7 @@ get_sched_window (partial_schedule_ptr p > p_st, early_start, e->latency); > > if (e->data_type == MEM_DEP) > - end = MIN (end, SCHED_TIME (v_node) + ii - 1); > + end = MIN (end, SCHED_TIME (v_node) + ii); > } > else if (dump_file) > fprintf (dump_file, "the node is not scheduled\n"); > @@ -1729,7 +1729,7 @@ get_sched_window (partial_schedule_ptr p > s_st, late_start, e->latency); > > if (e->data_type == MEM_DEP) > - end = MAX (end, SCHED_TIME (v_node) - ii + 1); > + end = MAX (end, SCHED_TIME (v_node) - ii); > if (dump_file) > fprintf (dump_file, "end = %d\n", end); > > @@ -1791,7 +1791,7 @@ get_sched_window (partial_schedule_ptr p > count_preds++; > > if (e->data_type == MEM_DEP) > - end = MIN (end, SCHED_TIME (v_node) + ii - 1); > + end = MIN (end, SCHED_TIME (v_node) + ii); > } > else if (dump_file) > fprintf (dump_file, "the node is not scheduled\n"); > > applied, then tried the same thing with the revised patch below. > The output was the same. ok, that's a good sanity check. > > (FWIW, libav did show up extra differences when using the patch > that I'd originally submitted. They were due to the count_preds > and count_succs thing that you picked up in your review.) (These differences had no noticable consequences performance-wise, right?) > > >> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the > >> additional patch: > >> > >> Index: gcc/opts.c > >> =================================================================== > >> --- gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 > >> +++ gcc/opts.c 2011-08-03 10:56:50.000000000 +0100 > >> @@ -449,6 +449,8 @@ static const struct default_options defa > >> { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 }, > >> { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 }, > >> { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 }, > >> + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 }, > >> + { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 }, > >> > >> /* -O2 optimizations. */ > >> { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 }, > >> > >> applied for both the "before" and "after" runs. OK to install? > > > > > > Yes, with just a couple of minor non-mandatory comments: > > 1. Setting "count_preds = psp_not_empty;" and "count_succs = > > pss_not_empty;" suggests that you want to decrement non-interesting > > neighbors instead of incrementing interesting ones. Otherwise can > > initialize to zero (doesn't really matter, just a bit confusing). > > Yeah, good point. I initialised them to zero, like you said, > and changed the reversal condition to: > > if (pss_not_empty && count_succs >= count_preds) > > Which I have admit makes a lot more sense than what I submitted, > and avoids some unwanted changes in output. > > > 2. I find it easier to read "late_start = MIN (late_start, early_start > > + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1), > > late_start);". > > Oops, that was accidental. Fixed. > > > 3. Suggest to set step=1 at the beginning, explaining from the start > > that we first compute a forward range (start<end), and may later > > possibly revert it. > > Yeah, that's better. > > > 4. "late_start" would better be renamed "late_end". > > Since you said this was optional, I decided to leave it as it is. > The name and value of "late start" come directly from the paper. Ok. We can write another paper ;-) > > > 5. While we're at it, as you've unified the treatments, we can now > > actually do with "start" and "end" only, and do away with early_start > > and late_start altogether... > > The IV patch needs the same distinction as we have now, so I'd prefer > to keep it. Ok, looking forward to the IV patch then... > > Thanks for the review. Here's what I applied after testing on > powerpc-ibm-aix5.3. > Thanks for looking into this, Ayal. > > Richard > > > gcc/ > * modulo-sched.c (get_sched_window): Use just one loop for predecessors > and one loop for successors. Fix upper bound of memory range. > > Index: gcc/modulo-sched.c > =================================================================== > *** gcc/modulo-sched.c 2011-08-05 10:39:27.000000000 +0100 > --- gcc/modulo-sched.c 2011-08-05 10:40:49.000000000 +0100 > *************** #define MAX_SPLIT_NUM 10 > *** 1630,1638 **** > > static int > get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, > ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p) > { > int start, step, end; > ddg_edge_ptr e; > sbitmap psp = sbitmap_alloc (ps->g->num_nodes); > sbitmap pss = sbitmap_alloc (ps->g->num_nodes); > --- 1630,1640 ---- > > static int > get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node, > ! sbitmap sched_nodes, int ii, int *start_p, int *step_p, > ! int *end_p) > { > int start, step, end; > + int early_start, late_start; > ddg_edge_ptr e; > sbitmap psp = sbitmap_alloc (ps->g->num_nodes); > sbitmap pss = sbitmap_alloc (ps->g->num_nodes); > *************** get_sched_window (partial_schedule_ptr p > *** 1640,1645 **** > --- 1642,1649 ---- > sbitmap u_node_succs = NODE_SUCCESSORS (u_node); > int psp_not_empty; > int pss_not_empty; > + int count_preds; > + int count_succs; > > /* 1. compute sched window for u (start, end, step). */ > sbitmap_zero (psp); > *************** get_sched_window (partial_schedule_ptr p > *** 1647,1861 **** > psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); > pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); > > ! if (psp_not_empty && !pss_not_empty) > ! { > ! int early_start = INT_MIN; > ! > ! end = INT_MAX; > ! for (e = u_node->in; e != 0; e = e->next_in) > ! { > ! ddg_node_ptr v_node = e->src; > ! > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge: "); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_not_empty," > ! " checking p %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > ! > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int p_st = SCHED_TIME (v_node); > ! > ! early_start = > ! MAX (early_start, p_st + e->latency - (e->distance * ii)); > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "pred st = %d; early_start = %d; latency: %d", > ! p_st, early_start, e->latency); > ! > ! if (e->data_type == MEM_DEP) > ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > ! } > ! start = early_start; > ! end = MIN (end, early_start + ii); > ! /* Schedule the node close to it's predecessors. */ > ! step = 1; > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", > ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); > ! } > ! > ! else if (!psp_not_empty && pss_not_empty) > ! { > ! int late_start = INT_MAX; > ! > ! end = INT_MIN; > ! for (e = u_node->out; e != 0; e = e->next_out) > ! { > ! ddg_node_ptr v_node = e->dest; > ! > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in pss_not_empty," > ! " checking s %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > ! > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int s_st = SCHED_TIME (v_node); > ! > ! late_start = MIN (late_start, > ! s_st - e->latency + (e->distance * ii)); > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "succ st = %d; late_start = %d; latency = %d", > ! s_st, late_start, e->latency); > ! > ! if (e->data_type == MEM_DEP) > ! end = MAX (end, SCHED_TIME (v_node) - ii + 1); > ! if (dump_file) > ! fprintf (dump_file, "end = %d\n", end); > ! > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > > ! } > ! start = late_start; > ! end = MAX (end, late_start - ii); > ! /* Schedule the node close to it's successors. */ > ! step = -1; > > ! if (dump_file) > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in a window (%d..%d) with step %d\n", > ! u_node->cuid, INSN_UID (u_node->insn), start, end, step); > > ! } > > ! else if (psp_not_empty && pss_not_empty) > ! { > ! int early_start = INT_MIN; > ! int late_start = INT_MAX; > ! int count_preds = 0; > ! int count_succs = 0; > > ! start = INT_MIN; > ! end = INT_MAX; > ! for (e = u_node->in; e != 0; e = e->next_in) > ! { > ! ddg_node_ptr v_node = e->src; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_pss_not_empty," > ! " checking p %d (%d): ", u_node->cuid, INSN_UID > ! (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > ! > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int p_st = SCHED_TIME (v_node); > ! > ! early_start = MAX (early_start, > ! p_st + e->latency > ! - (e->distance * ii)); > ! > ! if (dump_file) > ! fprintf (dump_file, > ! "pred st = %d; early_start = %d; latency = %d", > ! p_st, early_start, e->latency); > ! > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_preds++; > > ! if (e->data_type == MEM_DEP) > ! end = MIN (end, SCHED_TIME (v_node) + ii - 1); > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > > ! } > ! for (e = u_node->out; e != 0; e = e->next_out) > ! { > ! ddg_node_ptr v_node = e->dest; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_pss_not_empty," > ! " checking s %d (%d): ", u_node->cuid, INSN_UID > ! (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int s_st = SCHED_TIME (v_node); > > ! late_start = MIN (late_start, > ! s_st - e->latency > ! + (e->distance * ii)); > > ! if (dump_file) > ! fprintf (dump_file, > ! "succ st = %d; late_start = %d; latency = %d", > ! s_st, late_start, e->latency); > > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_succs++; > > ! if (e->data_type == MEM_DEP) > ! start = MAX (start, SCHED_TIME (v_node) - ii + 1); > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > > ! } > ! start = MAX (start, early_start); > ! end = MIN (end, MIN (early_start + ii, late_start + 1)); > ! step = 1; > ! /* If there are more successors than predecessors schedule the > ! node close to it's successors. */ > ! if (count_succs >= count_preds) > ! { > ! int old_start = start; > > ! start = end - 1; > ! end = old_start - 1; > ! step = -1; > ! } > ! } > ! else /* psp is empty && pss is empty. */ > ! { > ! start = SCHED_ASAP (u_node); > ! end = start + ii; > ! step = 1; > } > > *start_p = start; > *step_p = step; > *end_p = end; > --- 1651,1772 ---- > psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes); > pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes); > > ! /* We first compute a forward range (start <= end), then decide whether > ! to reverse it. */ > ! early_start = INT_MIN; > ! late_start = INT_MAX; > ! start = INT_MIN; > ! end = INT_MAX; > ! step = 1; > ! > ! count_preds = 0; > ! count_succs = 0; > ! > ! /* Calculate early_start and limit end. Both bounds are inclusive. */ > ! if (psp_not_empty) > ! for (e = u_node->in; e != 0; e = e->next_in) > ! { > ! ddg_node_ptr v_node = e->src; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge: "); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in psp_not_empty," > ! " checking p %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int p_st = SCHED_TIME (v_node); > > ! early_start = MAX (early_start, > ! p_st + e->latency - (e->distance * ii)); > > ! if (e->data_type == MEM_DEP) > ! end = MIN (end, p_st + ii - 1); > > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_preds++; > > ! if (dump_file) > fprintf (dump_file, > ! "pred st = %d; early_start = %d; latency: %d;" > ! " end: %d\n", p_st, early_start, e->latency, end); > > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > ! } > > ! /* Calculate late_start and limit start. Both bounds are inclusive. */ > ! if (pss_not_empty) > ! for (e = u_node->out; e != 0; e = e->next_out) > ! { > ! ddg_node_ptr v_node = e->dest; > > ! if (dump_file) > ! { > ! fprintf (dump_file, "\nProcessing edge:"); > ! print_ddg_edge (dump_file, e); > ! fprintf (dump_file, > ! "\nScheduling %d (%d) in pss_not_empty," > ! " checking s %d (%d): ", u_node->cuid, > ! INSN_UID (u_node->insn), v_node->cuid, INSN_UID > ! (v_node->insn)); > ! } > > ! if (TEST_BIT (sched_nodes, v_node->cuid)) > ! { > ! int s_st = SCHED_TIME (v_node); > > ! late_start = MIN (late_start, > ! s_st - e->latency + (e->distance * ii)); > > ! if (e->data_type == MEM_DEP) > ! start = MAX (start, s_st - ii + 1); > > ! if (e->type == TRUE_DEP && e->data_type == REG_DEP) > ! count_succs++; > > ! if (dump_file) > ! fprintf (dump_file, > ! "succ st = %d; late_start = %d; latency = %d;" > ! " start=%d", s_st, late_start, e->latency, start); > > ! } > ! else if (dump_file) > ! fprintf (dump_file, "the node is not scheduled\n"); > ! } > > ! /* Get a target scheduling window no bigger than ii. */ > ! if (early_start == INT_MIN && late_start == INT_MAX) > ! early_start = SCHED_ASAP (u_node); > ! else if (early_start == INT_MIN) > ! early_start = late_start - (ii - 1); > ! late_start = MIN (late_start, early_start + (ii - 1)); > ! > ! /* Apply memory dependence limits. */ > ! start = MAX (start, early_start); > ! end = MIN (end, late_start); > ! > ! /* If there are at least as many successors as predecessors, schedule the > ! node close to its successors. */ > ! if (pss_not_empty && count_succs >= count_preds) > ! { > ! int tmp = end; > ! end = start; > ! start = tmp; > ! step = -1; > } > > + /* Now that we've finalized the window, make END an exclusive rather > + than an inclusive bound. */ > + end += step; > + > *start_p = start; > *step_p = step; > *end_p = end; > *************** get_sched_window (partial_schedule_ptr p > *** 1867,1876 **** > if (dump_file) > fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", > start, end, step); > ! return -1; > } > > ! return 0; > } > > /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the > --- 1778,1787 ---- > if (dump_file) > fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n", > start, end, step); > ! return -1; > } > > ! return 0; > } > > /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Question about SMS scheduling windows 2011-08-08 19:36 ` Ayal Zaks @ 2011-08-09 11:19 ` Richard Sandiford 0 siblings, 0 replies; 9+ messages in thread From: Richard Sandiford @ 2011-08-09 11:19 UTC (permalink / raw) To: Ayal Zaks; +Cc: Revital1 Eres, gcc Ayal Zaks <ayal.zaks@gmail.com> writes: >> (FWIW, libav did show up extra differences when using the patch >> that I'd originally submitted. They were due to the count_preds >> and count_succs thing that you picked up in your review.) > > > (These differences had no noticable consequences performance-wise, right?) Well, because they were unintentional, I'm afraid I didn't measure them. The committed version didn't have these differences. (It was an assembly source comparison, so I don't have a ready way of measuring the performance of each affected loop.) Richard ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-08-09 11:19 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-07-27 10:21 Question about SMS scheduling windows Richard Sandiford 2011-07-27 13:34 ` Richard Sandiford 2011-07-27 13:40 ` Revital1 Eres 2011-07-27 16:42 ` Ayal Zaks 2011-08-04 9:03 ` Richard Sandiford 2011-08-04 16:27 ` Ayal Zaks 2011-08-08 9:30 ` Richard Sandiford 2011-08-08 19:36 ` Ayal Zaks 2011-08-09 11:19 ` Richard Sandiford
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).