public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Question about SMS scheduling windows
@ 2011-07-27 10:21 Richard Sandiford
  2011-07-27 13:34 ` Richard Sandiford
  2011-07-27 13:40 ` Revital1 Eres
  0 siblings, 2 replies; 9+ messages in thread
From: Richard Sandiford @ 2011-07-27 10:21 UTC (permalink / raw)
  To: gcc; +Cc: zaks, eres

I've been looking at SMS, and have a question about get_sched_window.
When there are previously-scheduled predessors, we use:

	      if (e->data_type == MEM_DEP)
		end = MIN (end, SCHED_TIME (v_node) + ii - 1);

to get an upper bound on the scheduling window that is permitted
by memory dependencies.  I think this:

    SCHED_TIME (v_node) + ii - 1

is an inclusive bound, in that scheduling the node at that time
would not break the memory dependence, whereas scheduling at
SCHED_TIME (v_node) would.  Is that right?

I ask because in the final range:

      start = early_start;
      end = MIN (end, early_start + ii);
      /* Schedule the node close to it's predecessors.  */
      step = 1;

END is an exclusive bound.  It seems like we might be double-counting here,
and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.

While I'm here, I was also curious about:

      /* If there are more successors than predecessors schedule the
         node close to it's successors.  */
      if (count_succs >= count_preds)
        {
          int old_start = start;

          start = end - 1;
          end = old_start - 1;
          step = -1;
        }

This doesn't seem to be in the paper, and the comment suggests
"count_succs > count_preds" rather than "count_succs >= count_preds".
Is the ">=" vs ">" important?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-07-27 10:21 Question about SMS scheduling windows Richard Sandiford
@ 2011-07-27 13:34 ` Richard Sandiford
  2011-07-27 13:40 ` Revital1 Eres
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2011-07-27 13:34 UTC (permalink / raw)
  To: gcc; +Cc: zaks, eres

Richard Sandiford <richard.sandiford@linaro.org> writes:
> I've been looking at SMS, and have a question about get_sched_window.
> When there are previously-scheduled predessors, we use:
>
> 	      if (e->data_type == MEM_DEP)
> 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
>
> to get an upper bound on the scheduling window that is permitted
> by memory dependencies.  I think this:
>
>     SCHED_TIME (v_node) + ii - 1
>
> is an inclusive bound, in that scheduling the node at that time
> would not break the memory dependence, whereas scheduling at
> SCHED_TIME (v_node) would.  Is that right?

I meant of course "scheduling at SCHED_TIME (v_node) + ii would".

Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-07-27 10:21 Question about SMS scheduling windows Richard Sandiford
  2011-07-27 13:34 ` Richard Sandiford
@ 2011-07-27 13:40 ` Revital1 Eres
  2011-07-27 16:42   ` Ayal Zaks
  1 sibling, 1 reply; 9+ messages in thread
From: Revital1 Eres @ 2011-07-27 13:40 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: ayal.zaks, gcc

Hello Richard,

> I ask because in the final range:
>
>       start = early_start;
>       end = MIN (end, early_start + ii);
>       /* Schedule the node close to it's predecessors.  */
>       step = 1;
>
> END is an exclusive bound.  It seems like we might be double-counting
here,
> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.

Yes, I think it indeed should be fixed. Thanks for reporting on this.

Revital

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-07-27 13:40 ` Revital1 Eres
@ 2011-07-27 16:42   ` Ayal Zaks
  2011-08-04  9:03     ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Ayal Zaks @ 2011-07-27 16:42 UTC (permalink / raw)
  To: Revital1 Eres; +Cc: Richard Sandiford, gcc

(sorry for replicated submissions, had to convert to plain text)

>2011/7/27 Revital1 Eres <ERES@il.ibm.com>
>
>Hello Richard,
>
>
>> I ask because in the final range:
>>
>>       start = early_start;
>>       end = MIN (end, early_start + ii);
>>       /* Schedule the node close to it's predecessors.  */
>>       step = 1;
>>
>> END is an exclusive bound.  It seems like we might be double-counting
here,
>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.
>
>
>Yes, I think it indeed should be fixed. Thanks for reporting on this.
>
>Revital

Agreed;

                      if (e->data_type == MEM_DEP)
                                end = MIN (end, SCHED_TIME (v_node) + ii - 1);

should be replaced with

                      if (e->data_type == MEM_DEP)
                                end = MIN (end, p_st + ii);

also for the (3rd) case when there are both previously-scheduled
predessors and previously-scheduled successors. The range is inclusive
of start and exclusive of end: for (c = start; c != end; c += step)...


>This doesn't seem to be in the paper, and the comment suggests
>"count_succs > count_preds" rather than "count_succs >= count_preds".
>Is the ">=" vs ">" important?

I think not: in both cases you'll be trying to minimize the same
number of live ranges. But indeed it's good to be consistent with the
comment.

Thanks,
Ayal.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-07-27 16:42   ` Ayal Zaks
@ 2011-08-04  9:03     ` Richard Sandiford
  2011-08-04 16:27       ` Ayal Zaks
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2011-08-04  9:03 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: Revital1 Eres, gcc

Hi Ayal,

Thanks to you and Revital for the replies.  The reason I asked is that
I wanted to rewrite gen_sched_window so that it has only one loop over
the PSPs and one loop over the PSSs.  I have a follow-up patch to use
iv analysis to reduce the number of memory dependencies (or at least
increase the distance between them), and it was easier to do that
after this change.

Ayal Zaks <ayal.zaks@gmail.com> writes:
>>> I ask because in the final range:
>>>
>>>       start = early_start;
>>>       end = MIN (end, early_start + ii);
>>>       /* Schedule the node close to it's predecessors.  */
>>>       step = 1;
>>>
>>> END is an exclusive bound.  It seems like we might be double-counting
> here,
>>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.
>>
>>
>>Yes, I think it indeed should be fixed. Thanks for reporting on this.
>>
>>Revital
>
> Agreed;
>
>                       if (e->data_type == MEM_DEP)
>                                 end = MIN (end, SCHED_TIME (v_node) + ii - 1);
>
> should be replaced with
>
>                       if (e->data_type == MEM_DEP)
>                                 end = MIN (end, p_st + ii);
>
> also for the (3rd) case when there are both previously-scheduled
> predessors and previously-scheduled successors. The range is inclusive
> of start and exclusive of end: for (c = start; c != end; c += step)...

OK.  For the follow-on iv patch, it seemed easier to keep both bounds
inclusive for the loop, then make the "end" exclusive when setting the
out parameters.  (Note that there shouldn't be any problem with overflow
when making the bound exclusive, because the size of the range has been
restricted to "ii" by that stage.  The follow-on patch does use
saturating maths for all ops though.)

>>This doesn't seem to be in the paper, and the comment suggests
>>"count_succs > count_preds" rather than "count_succs >= count_preds".
>>Is the ">=" vs ">" important?
>
> I think not: in both cases you'll be trying to minimize the same
> number of live ranges. But indeed it's good to be consistent with the
> comment.

OK.  For no particular reason other than cowardice, I ended up keeping
this as:

!   if (count_succs && count_succs >= count_preds)

The reason for asking was that:

!   if (count_succs > count_preds)

seemed more natural, and would match the existing comment.  I'm happy
to test that instead if you prefer.

I don't have powerpc hardware that I can do meaningful performance
testing on, but I did run it through a Popular* Embedded Benchmark
on an ARM Cortex-A8 board with -O3 -fmodulo-sched
-fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
a benchmark that does benefit from modulo scheduling, in some cases
by a significant amount.)

Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
additional patch:

Index: gcc/opts.c
===================================================================
--- gcc/opts.c	2011-08-03 10:56:50.000000000 +0100
+++ gcc/opts.c	2011-08-03 10:56:50.000000000 +0100
@@ -449,6 +449,8 @@ static const struct default_options defa
     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
 
     /* -O2 optimizations.  */
     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },

applied for both the "before" and "after" runs.  OK to install?

Thanks,
Richard


gcc/
	* modulo-sched.c (get_sched_window): Use just one loop for predecessors
	and one loop for successors.  Fix upper bound of memory range.

Index: gcc/modulo-sched.c
===================================================================
*** gcc/modulo-sched.c	2011-08-04 09:09:29.000000000 +0100
--- gcc/modulo-sched.c	2011-08-04 09:49:16.000000000 +0100
*************** #define MAX_SPLIT_NUM 10
*** 1630,1638 ****
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
! 		  sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p)
  {
    int start, step, end;
    ddg_edge_ptr e;
    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
--- 1630,1640 ----
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
! 		  sbitmap sched_nodes, int ii, int *start_p, int *step_p,
! 		  int *end_p)
  {
    int start, step, end;
+   int early_start, late_start;
    ddg_edge_ptr e;
    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
*************** get_sched_window (partial_schedule_ptr p
*** 1640,1645 ****
--- 1642,1649 ----
    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
    int psp_not_empty;
    int pss_not_empty;
+   int count_preds;
+   int count_succs;
  
    /* 1. compute sched window for u (start, end, step).  */
    sbitmap_zero (psp);
*************** get_sched_window (partial_schedule_ptr p
*** 1647,1861 ****
    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
  
!   if (psp_not_empty && !pss_not_empty)
!     {
!       int early_start = INT_MIN;
! 
!       end = INT_MAX;
!       for (e = u_node->in; e != 0; e = e->next_in)
! 	{
! 	  ddg_node_ptr v_node = e->src;
! 
!           if (dump_file)
!             {
! 	      fprintf (dump_file, "\nProcessing edge: ");
!               print_ddg_edge (dump_file, e);
! 	      fprintf (dump_file,
! 		       "\nScheduling %d (%d) in psp_not_empty,"
! 		       " checking p %d (%d): ", u_node->cuid,
! 		       INSN_UID (u_node->insn), v_node->cuid, INSN_UID
! 		       (v_node->insn));
!             }
! 
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int p_st = SCHED_TIME (v_node);
! 
!               early_start =
!                 MAX (early_start, p_st + e->latency - (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "pred st = %d; early_start = %d; latency: %d",
!                          p_st, early_start, e->latency);
! 
! 	      if (e->data_type == MEM_DEP)
! 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
! 	    }
!          else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
! 	}
!       start = early_start;
!       end = MIN (end, early_start + ii);
!       /* Schedule the node close to it's predecessors.  */
!       step = 1;
! 
!       if (dump_file)
!         fprintf (dump_file,
! 		 "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
! 		 u_node->cuid, INSN_UID (u_node->insn), start, end, step);
!     }
! 
!   else if (!psp_not_empty && pss_not_empty)
!     {
!       int late_start = INT_MAX;
! 
!       end = INT_MIN;
!       for (e = u_node->out; e != 0; e = e->next_out)
! 	{
! 	  ddg_node_ptr v_node = e->dest;
! 
!           if (dump_file)
!             {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
!               fprintf (dump_file,
!                        "\nScheduling %d (%d) in pss_not_empty,"
!                        " checking s %d (%d): ", u_node->cuid,
!                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
!                        (v_node->insn));
!             }
! 
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int s_st = SCHED_TIME (v_node);
! 
!               late_start = MIN (late_start,
!                                 s_st - e->latency + (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "succ st = %d; late_start = %d; latency = %d",
!                          s_st, late_start, e->latency);
! 
! 	      if (e->data_type == MEM_DEP)
! 		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
!              if (dump_file)
!                  fprintf (dump_file, "end = %d\n", end);
! 
! 	    }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
! 	}
!       start = late_start;
!       end = MAX (end, late_start - ii);
!       /* Schedule the node close to it's successors.  */
!       step = -1;
  
!       if (dump_file)
!         fprintf (dump_file,
!                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
!                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
  
!     }
  
!   else if (psp_not_empty && pss_not_empty)
!     {
!       int early_start = INT_MIN;
!       int late_start = INT_MAX;
!       int count_preds = 0;
!       int count_succs = 0;
  
!       start = INT_MIN;
!       end = INT_MAX;
!       for (e = u_node->in; e != 0; e = e->next_in)
! 	{
! 	  ddg_node_ptr v_node = e->src;
  
! 	  if (dump_file)
! 	    {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
  	      fprintf (dump_file,
! 		       "\nScheduling %d (%d) in psp_pss_not_empty,"
! 		       " checking p %d (%d): ", u_node->cuid, INSN_UID
! 		       (u_node->insn), v_node->cuid, INSN_UID
! 		       (v_node->insn));
! 	    }
! 
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int p_st = SCHED_TIME (v_node);
! 
! 	      early_start = MAX (early_start,
! 				 p_st + e->latency
! 				 - (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "pred st = %d; early_start = %d; latency = %d",
!                          p_st, early_start, e->latency);
! 
!               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!                 count_preds++;
  
! 	      if (e->data_type == MEM_DEP)
! 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
! 	    }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
! 	}
!       for (e = u_node->out; e != 0; e = e->next_out)
! 	{
! 	  ddg_node_ptr v_node = e->dest;
  
! 	  if (dump_file)
! 	    {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
! 	      fprintf (dump_file,
! 		       "\nScheduling %d (%d) in psp_pss_not_empty,"
! 		       " checking s %d (%d): ", u_node->cuid, INSN_UID
! 		       (u_node->insn), v_node->cuid, INSN_UID
! 		       (v_node->insn));
! 	    }
  
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int s_st = SCHED_TIME (v_node);
  
! 	      late_start = MIN (late_start,
! 				s_st - e->latency
! 				+ (e->distance * ii));
  
!               if (dump_file)
!                 fprintf (dump_file,
!                          "succ st = %d; late_start = %d; latency = %d",
!                          s_st, late_start, e->latency);
  
!                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!                  count_succs++;
  
! 	      if (e->data_type == MEM_DEP)
! 		start = MAX (start, SCHED_TIME (v_node) - ii + 1);
! 	    }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
! 	}
!       start = MAX (start, early_start);
!       end = MIN (end, MIN (early_start + ii, late_start + 1));
!       step = 1;
!       /* If there are more successors than predecessors schedule the
!          node close to it's successors.  */
!       if (count_succs >= count_preds)
!         {
!           int old_start = start;
  
!           start = end - 1;
!           end = old_start - 1;
!           step = -1;
!         }
!     }
!   else /* psp is empty && pss is empty.  */
!     {
!       start = SCHED_ASAP (u_node);
!       end = start + ii;
!       step = 1;
      }
  
    *start_p = start;
    *step_p = step;
    *end_p = end;
--- 1651,1769 ----
    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
  
!   early_start = INT_MIN;
!   late_start = INT_MAX;
!   start = INT_MIN;
!   end = INT_MAX;
!   count_preds = psp_not_empty;
!   count_succs = pss_not_empty;
! 
!   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
!   if (psp_not_empty)
!     for (e = u_node->in; e != 0; e = e->next_in)
!       {
! 	ddg_node_ptr v_node = e->src;
  
! 	if (dump_file)
! 	  {
! 	    fprintf (dump_file, "\nProcessing edge: ");
! 	    print_ddg_edge (dump_file, e);
! 	    fprintf (dump_file,
! 		     "\nScheduling %d (%d) in psp_not_empty,"
! 		     " checking p %d (%d): ", u_node->cuid,
! 		     INSN_UID (u_node->insn), v_node->cuid, INSN_UID
! 		     (v_node->insn));
! 	  }
  
! 	if (TEST_BIT (sched_nodes, v_node->cuid))
! 	  {
! 	    int p_st = SCHED_TIME (v_node);
  
! 	    early_start = MAX (early_start,
! 			       p_st + e->latency - (e->distance * ii));
  
! 	    if (e->data_type == MEM_DEP)
! 	      end = MIN (end, p_st + ii - 1);
  
! 	    if (e->type == TRUE_DEP && e->data_type == REG_DEP)
! 	      count_preds++;
  
! 	    if (dump_file)
  	      fprintf (dump_file,
! 		       "pred st = %d; early_start = %d; latency: %d;"
! 		       " end: %d\n", p_st, early_start, e->latency, end);
  
! 	  }
! 	else if (dump_file)
! 	  fprintf (dump_file, "the node is not scheduled\n");
!       }
  
!   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
!   if (pss_not_empty)
!     for (e = u_node->out; e != 0; e = e->next_out)
!       {
! 	ddg_node_ptr v_node = e->dest;
  
! 	if (dump_file)
! 	  {
! 	    fprintf (dump_file, "\nProcessing edge:");
! 	    print_ddg_edge (dump_file, e);
! 	    fprintf (dump_file,
! 		     "\nScheduling %d (%d) in pss_not_empty,"
! 		     " checking s %d (%d): ", u_node->cuid,
! 		     INSN_UID (u_node->insn), v_node->cuid, INSN_UID
! 		     (v_node->insn));
! 	  }
  
! 	if (TEST_BIT (sched_nodes, v_node->cuid))
! 	  {
! 	    int s_st = SCHED_TIME (v_node);
  
! 	    late_start = MIN (late_start,
! 			      s_st - e->latency + (e->distance * ii));
  
! 	    if (e->data_type == MEM_DEP)
! 	      start = MAX (start, s_st - ii + 1);
  
! 	    if (e->type == TRUE_DEP && e->data_type == REG_DEP)
! 	      count_succs++;
  
! 	    if (dump_file)
! 	      fprintf (dump_file,
! 		       "succ st = %d; late_start = %d; latency = %d;"
! 		       " start=%d", s_st, late_start, e->latency, start);
  
! 	  }
! 	else if (dump_file)
! 	  fprintf (dump_file, "the node is not scheduled\n");
!       }
  
!   /* Get a target scheduling window no bigger than ii.  */
!   if (early_start == INT_MIN && late_start == INT_MAX)
!     early_start = SCHED_ASAP (u_node);
!   else if (early_start == INT_MIN)
!     early_start = late_start - (ii - 1);
!   late_start = MIN (early_start + (ii - 1), late_start);
! 
!   /* Apply memory dependence limits.  */
!   start = MAX (start, early_start);
!   end = MIN (end, late_start);
!   step = 1;
! 
!   /* If there are at least as many successors as predecessors, schedule the
!      node close to its successors.  */
!   if (count_succs && count_succs >= count_preds)
!     {
!       int tmp = end;
!       end = start;
!       start = tmp;
!       step = -1;
      }
  
+   /* Now that we've finalized the window, make END an exclusive rather
+      than an inclusive bound.  */
+   end += step;
+ 
    *start_p = start;
    *step_p = step;
    *end_p = end;
*************** get_sched_window (partial_schedule_ptr p
*** 1867,1876 ****
        if (dump_file)
  	fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
  		 start, end, step);
!     return -1;
      }
  
!     return 0;
  }
  
  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
--- 1775,1784 ----
        if (dump_file)
  	fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
  		 start, end, step);
!       return -1;
      }
  
!   return 0;
  }
  
  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-08-04  9:03     ` Richard Sandiford
@ 2011-08-04 16:27       ` Ayal Zaks
  2011-08-08  9:30         ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Ayal Zaks @ 2011-08-04 16:27 UTC (permalink / raw)
  To: Ayal Zaks, Revital1 Eres, gcc, richard.sandiford

Hi Richard,

2011/8/4 Richard Sandiford <richard.sandiford@linaro.org>
>
> Hi Ayal,
>
> Thanks to you and Revital for the replies.  The reason I asked is that
> I wanted to rewrite gen_sched_window so that it has only one loop over
> the PSPs and one loop over the PSSs.


This rewrite makes perfect sense regardless of any follow-up patches,
as it clarifies and reuses code.

>
> I have a follow-up patch to use
> iv analysis to reduce the number of memory dependencies (or at least
> increase the distance between them), and it was easier to do that
> after this change.


Reducing memory dependencies is definitely important.

>
> Ayal Zaks <ayal.zaks@gmail.com> writes:
> >>> I ask because in the final range:
> >>>
> >>>       start = early_start;
> >>>       end = MIN (end, early_start + ii);
> >>>       /* Schedule the node close to it's predecessors.  */
> >>>       step = 1;
> >>>
> >>> END is an exclusive bound.  It seems like we might be double-counting
> > here,
> >>> and effectively limiting the schedule to SCHED_TIME (v_node) + ii - 2.
> >>
> >>
> >>Yes, I think it indeed should be fixed. Thanks for reporting on this.
> >>
> >>Revital
> >
> > Agreed;
> >
> >                       if (e->data_type == MEM_DEP)
> >                                 end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> >
> > should be replaced with
> >
> >                       if (e->data_type == MEM_DEP)
> >                                 end = MIN (end, p_st + ii);
> >
> > also for the (3rd) case when there are both previously-scheduled
> > predessors and previously-scheduled successors. The range is inclusive
> > of start and exclusive of end: for (c = start; c != end; c += step)...
>
> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
> inclusive for the loop, then make the "end" exclusive when setting the
> out parameters.  (Note that there shouldn't be any problem with overflow
> when making the bound exclusive, because the size of the range has been
> restricted to "ii" by that stage.  The follow-on patch does use
> saturating maths for all ops though.)


Sure, no problem having "end" inclusive, as it simplifies things. We
can even keep it inclusive all the way through, iterating over: for (c
= start; c != end+step; c += step)...
>
> >>This doesn't seem to be in the paper, and the comment suggests
> >>"count_succs > count_preds" rather than "count_succs >= count_preds".
> >>Is the ">=" vs ">" important?
> >
> > I think not: in both cases you'll be trying to minimize the same
> > number of live ranges. But indeed it's good to be consistent with the
> > comment.
>
> OK.  For no particular reason other than cowardice, I ended up keeping
> this as:
>
> !   if (count_succs && count_succs >= count_preds)
>
> The reason for asking was that:
>
> !   if (count_succs > count_preds)
>
> seemed more natural, and would match the existing comment.  I'm happy
> to test that instead if you prefer.
>
I wouldn't worry about this tie breaker, unless there's a reason (in
which case the reason should hopefully provide a secondary criteria).

>
> I don't have powerpc hardware that I can do meaningful performance
> testing on, but I did run it through a Popular* Embedded Benchmark
> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
> a benchmark that does benefit from modulo scheduling, in some cases
> by a significant amount.)
>
Ok, the patch should be neutral performance-wise. One could check btw
if SMS produces exactly the same output in both cases, even w/o a
powerpc (or any other) HW.

>
> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
> additional patch:
>
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> +++ gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> @@ -449,6 +449,8 @@ static const struct default_options defa
>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
>
>     /* -O2 optimizations.  */
>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
>
> applied for both the "before" and "after" runs.  OK to install?


Yes, with just a couple of minor non-mandatory comments:
1. Setting "count_preds = psp_not_empty;" and "count_succs =
pss_not_empty;" suggests that you want to decrement non-interesting
neighbors instead of incrementing interesting ones. Otherwise can
initialize to zero (doesn't really matter, just a bit confusing).
2. I find it easier to read "late_start = MIN (late_start, early_start
+ (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
late_start);".
3. Suggest to set step=1 at the beginning, explaining from the start
that we first compute a forward range (start<end), and may later
possibly revert it.
4. "late_start" would better be renamed "late_end".
5. While we're at it, as you've unified the treatments, we can now
actually do with "start" and "end" only, and do away with early_start
and late_start altogether...

Thanks,
Ayal.

>
> Thanks,
> Richard
>
>
> gcc/
>        * modulo-sched.c (get_sched_window): Use just one loop for predecessors
>        and one loop for successors.  Fix upper bound of memory range.
>
> Index: gcc/modulo-sched.c
> ===================================================================
> *** gcc/modulo-sched.c  2011-08-04 09:09:29.000000000 +0100
> --- gcc/modulo-sched.c  2011-08-04 09:49:16.000000000 +0100
> *************** #define MAX_SPLIT_NUM 10
> *** 1630,1638 ****
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p)
>  {
>    int start, step, end;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> --- 1630,1640 ----
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p,
> !                 int *end_p)
>  {
>    int start, step, end;
> +   int early_start, late_start;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1640,1645 ****
> --- 1642,1649 ----
>    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
>    int psp_not_empty;
>    int pss_not_empty;
> +   int count_preds;
> +   int count_succs;
>
>    /* 1. compute sched window for u (start, end, step).  */
>    sbitmap_zero (psp);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1647,1861 ****
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   if (psp_not_empty && !pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
> !
> !           if (dump_file)
> !             {
> !             fprintf (dump_file, "\nProcessing edge: ");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid,
> !                      INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !               early_start =
> !                 MAX (early_start, p_st + e->latency - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency: %d",
> !                          p_st, early_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !          else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
> !       }
> !       start = early_start;
> !       end = MIN (end, early_start + ii);
> !       /* Schedule the node close to it's predecessors.  */
> !       step = 1;
> !
> !       if (dump_file)
> !         fprintf (dump_file,
> !                "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                u_node->cuid, INSN_UID (u_node->insn), start, end, step);
> !     }
> !
> !   else if (!psp_not_empty && pss_not_empty)
> !     {
> !       int late_start = INT_MAX;
> !
> !       end = INT_MIN;
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
> !
> !           if (dump_file)
> !             {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !               fprintf (dump_file,
> !                        "\nScheduling %d (%d) in pss_not_empty,"
> !                        " checking s %d (%d): ", u_node->cuid,
> !                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                        (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
> !
> !               late_start = MIN (late_start,
> !                                 s_st - e->latency + (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> !              if (dump_file)
> !                  fprintf (dump_file, "end = %d\n", end);
> !
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = late_start;
> !       end = MAX (end, late_start - ii);
> !       /* Schedule the node close to it's successors.  */
> !       step = -1;
>
> !       if (dump_file)
> !         fprintf (dump_file,
> !                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
>
> !     }
>
> !   else if (psp_not_empty && pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !       int late_start = INT_MAX;
> !       int count_preds = 0;
> !       int count_succs = 0;
>
> !       start = INT_MIN;
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
>              fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !             early_start = MAX (early_start,
> !                                p_st + e->latency
> !                                - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency = %d",
> !                          p_st, early_start, e->latency);
> !
> !               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                 count_preds++;
>
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking s %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
>
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
>
> !             late_start = MIN (late_start,
> !                               s_st - e->latency
> !                               + (e->distance * ii));
>
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
>
> !                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                  count_succs++;
>
> !             if (e->data_type == MEM_DEP)
> !               start = MAX (start, SCHED_TIME (v_node) - ii + 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = MAX (start, early_start);
> !       end = MIN (end, MIN (early_start + ii, late_start + 1));
> !       step = 1;
> !       /* If there are more successors than predecessors schedule the
> !          node close to it's successors.  */
> !       if (count_succs >= count_preds)
> !         {
> !           int old_start = start;
>
> !           start = end - 1;
> !           end = old_start - 1;
> !           step = -1;
> !         }
> !     }
> !   else /* psp is empty && pss is empty.  */
> !     {
> !       start = SCHED_ASAP (u_node);
> !       end = start + ii;
> !       step = 1;
>      }
>
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> --- 1651,1769 ----
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   early_start = INT_MIN;
> !   late_start = INT_MAX;
> !   start = INT_MIN;
> !   end = INT_MAX;
> !   count_preds = psp_not_empty;
> !   count_succs = pss_not_empty;
> !
> !   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
> !   if (psp_not_empty)
> !     for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !       ddg_node_ptr v_node = e->src;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge: ");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in psp_not_empty,"
> !                    " checking p %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int p_st = SCHED_TIME (v_node);
>
> !           early_start = MAX (early_start,
> !                              p_st + e->latency - (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             end = MIN (end, p_st + ii - 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_preds++;
>
> !           if (dump_file)
>              fprintf (dump_file,
> !                      "pred st = %d; early_start = %d; latency: %d;"
> !                      " end: %d\n", p_st, early_start, e->latency, end);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
> !   if (pss_not_empty)
> !     for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !       ddg_node_ptr v_node = e->dest;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge:");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in pss_not_empty,"
> !                    " checking s %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int s_st = SCHED_TIME (v_node);
>
> !           late_start = MIN (late_start,
> !                             s_st - e->latency + (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             start = MAX (start, s_st - ii + 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_succs++;
>
> !           if (dump_file)
> !             fprintf (dump_file,
> !                      "succ st = %d; late_start = %d; latency = %d;"
> !                      " start=%d", s_st, late_start, e->latency, start);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Get a target scheduling window no bigger than ii.  */
> !   if (early_start == INT_MIN && late_start == INT_MAX)
> !     early_start = SCHED_ASAP (u_node);
> !   else if (early_start == INT_MIN)
> !     early_start = late_start - (ii - 1);
> !   late_start = MIN (early_start + (ii - 1), late_start);
> !
> !   /* Apply memory dependence limits.  */
> !   start = MAX (start, early_start);
> !   end = MIN (end, late_start);
> !   step = 1;
> !
> !   /* If there are at least as many successors as predecessors, schedule the
> !      node close to its successors.  */
> !   if (count_succs && count_succs >= count_preds)
> !     {
> !       int tmp = end;
> !       end = start;
> !       start = tmp;
> !       step = -1;
>      }
>
> +   /* Now that we've finalized the window, make END an exclusive rather
> +      than an inclusive bound.  */
> +   end += step;
> +
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> *************** get_sched_window (partial_schedule_ptr p
> *** 1867,1876 ****
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !     return -1;
>      }
>
> !     return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
> --- 1775,1784 ----
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !       return -1;
>      }
>
> !   return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-08-04 16:27       ` Ayal Zaks
@ 2011-08-08  9:30         ` Richard Sandiford
  2011-08-08 19:36           ` Ayal Zaks
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2011-08-08  9:30 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: Revital1 Eres, gcc

Ayal Zaks <ayal.zaks@gmail.com> writes:
>> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
>> inclusive for the loop, then make the "end" exclusive when setting the
>> out parameters.  (Note that there shouldn't be any problem with overflow
>> when making the bound exclusive, because the size of the range has been
>> restricted to "ii" by that stage.  The follow-on patch does use
>> saturating maths for all ops though.)
>
> Sure, no problem having "end" inclusive, as it simplifies things. We
> can even keep it inclusive all the way through, iterating over: for (c
> = start; c != end+step; c += step)...

Yeah, I'd prefer that too.  I might do it once Revital's and
Roman's changes are in.

>> I don't have powerpc hardware that I can do meaningful performance
>> testing on, but I did run it through a Popular* Embedded Benchmark
>> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
>> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
>> a benchmark that does benefit from modulo scheduling, in some cases
>> by a significant amount.)
>>
> Ok, the patch should be neutral performance-wise. One could check btw
> if SMS produces exactly the same output in both cases, even w/o a
> powerpc (or any other) HW.

OK.  The patch does change the output a little because of the MEM_DEP
range thing.  To compensate for that, I tried compiling libav on ARM with:

Index: gcc/modulo-sched.c
===================================================================
--- gcc/modulo-sched.c	2011-08-05 11:23:16.000000000 +0100
+++ gcc/modulo-sched.c	2011-08-05 11:27:57.000000000 +0100
@@ -1680,7 +1680,7 @@ get_sched_window (partial_schedule_ptr p
                          p_st, early_start, e->latency);
 
 	      if (e->data_type == MEM_DEP)
-		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
+		end = MIN (end, SCHED_TIME (v_node) + ii);
 	    }
          else if (dump_file)
             fprintf (dump_file, "the node is not scheduled\n");
@@ -1729,7 +1729,7 @@ get_sched_window (partial_schedule_ptr p
                          s_st, late_start, e->latency);
 
 	      if (e->data_type == MEM_DEP)
-		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
+		end = MAX (end, SCHED_TIME (v_node) - ii);
              if (dump_file)
                  fprintf (dump_file, "end = %d\n", end);
 
@@ -1791,7 +1791,7 @@ get_sched_window (partial_schedule_ptr p
                 count_preds++;
 
 	      if (e->data_type == MEM_DEP)
-		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
+		end = MIN (end, SCHED_TIME (v_node) + ii);
 	    }
           else if (dump_file)
             fprintf (dump_file, "the node is not scheduled\n");

applied, then tried the same thing with the revised patch below.
The output was the same.

(FWIW, libav did show up extra differences when using the patch
that I'd originally submitted.  They were due to the count_preds
and count_succs thing that you picked up in your review.)

>> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
>> additional patch:
>>
>> Index: gcc/opts.c
>> ===================================================================
>> --- gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
>> +++ gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
>> @@ -449,6 +449,8 @@ static const struct default_options defa
>>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
>>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
>>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
>> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
>> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
>>
>>     /* -O2 optimizations.  */
>>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
>>
>> applied for both the "before" and "after" runs.  OK to install?
>
>
> Yes, with just a couple of minor non-mandatory comments:
> 1. Setting "count_preds = psp_not_empty;" and "count_succs =
> pss_not_empty;" suggests that you want to decrement non-interesting
> neighbors instead of incrementing interesting ones. Otherwise can
> initialize to zero (doesn't really matter, just a bit confusing).

Yeah, good point.  I initialised them to zero, like you said,
and changed the reversal condition to:

  if (pss_not_empty && count_succs >= count_preds)

Which I have admit makes a lot more sense than what I submitted,
and avoids some unwanted changes in output.

> 2. I find it easier to read "late_start = MIN (late_start, early_start
> + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
> late_start);".

Oops, that was accidental.  Fixed.

> 3. Suggest to set step=1 at the beginning, explaining from the start
> that we first compute a forward range (start<end), and may later
> possibly revert it.

Yeah, that's better.

> 4. "late_start" would better be renamed "late_end".

Since you said this was optional, I decided to leave it as it is.
The name and value of "late start" come directly from the paper.

> 5. While we're at it, as you've unified the treatments, we can now
> actually do with "start" and "end" only, and do away with early_start
> and late_start altogether...

The IV patch needs the same distinction as we have now, so I'd prefer
to keep it.

Thanks for the review.  Here's what I applied after testing on
powerpc-ibm-aix5.3.

Richard


gcc/
	* modulo-sched.c (get_sched_window): Use just one loop for predecessors
	and one loop for successors.  Fix upper bound of memory range.

Index: gcc/modulo-sched.c
===================================================================
*** gcc/modulo-sched.c	2011-08-05 10:39:27.000000000 +0100
--- gcc/modulo-sched.c	2011-08-05 10:40:49.000000000 +0100
*************** #define MAX_SPLIT_NUM 10
*** 1630,1638 ****
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
! 		  sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p)
  {
    int start, step, end;
    ddg_edge_ptr e;
    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
--- 1630,1640 ----
  
  static int
  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
! 		  sbitmap sched_nodes, int ii, int *start_p, int *step_p,
! 		  int *end_p)
  {
    int start, step, end;
+   int early_start, late_start;
    ddg_edge_ptr e;
    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
*************** get_sched_window (partial_schedule_ptr p
*** 1640,1645 ****
--- 1642,1649 ----
    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
    int psp_not_empty;
    int pss_not_empty;
+   int count_preds;
+   int count_succs;
  
    /* 1. compute sched window for u (start, end, step).  */
    sbitmap_zero (psp);
*************** get_sched_window (partial_schedule_ptr p
*** 1647,1861 ****
    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
  
!   if (psp_not_empty && !pss_not_empty)
!     {
!       int early_start = INT_MIN;
! 
!       end = INT_MAX;
!       for (e = u_node->in; e != 0; e = e->next_in)
! 	{
! 	  ddg_node_ptr v_node = e->src;
! 
!           if (dump_file)
!             {
! 	      fprintf (dump_file, "\nProcessing edge: ");
!               print_ddg_edge (dump_file, e);
! 	      fprintf (dump_file,
! 		       "\nScheduling %d (%d) in psp_not_empty,"
! 		       " checking p %d (%d): ", u_node->cuid,
! 		       INSN_UID (u_node->insn), v_node->cuid, INSN_UID
! 		       (v_node->insn));
!             }
! 
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int p_st = SCHED_TIME (v_node);
! 
!               early_start =
!                 MAX (early_start, p_st + e->latency - (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "pred st = %d; early_start = %d; latency: %d",
!                          p_st, early_start, e->latency);
! 
! 	      if (e->data_type == MEM_DEP)
! 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
! 	    }
!          else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
! 	}
!       start = early_start;
!       end = MIN (end, early_start + ii);
!       /* Schedule the node close to it's predecessors.  */
!       step = 1;
! 
!       if (dump_file)
!         fprintf (dump_file,
! 		 "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
! 		 u_node->cuid, INSN_UID (u_node->insn), start, end, step);
!     }
! 
!   else if (!psp_not_empty && pss_not_empty)
!     {
!       int late_start = INT_MAX;
! 
!       end = INT_MIN;
!       for (e = u_node->out; e != 0; e = e->next_out)
! 	{
! 	  ddg_node_ptr v_node = e->dest;
! 
!           if (dump_file)
!             {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
!               fprintf (dump_file,
!                        "\nScheduling %d (%d) in pss_not_empty,"
!                        " checking s %d (%d): ", u_node->cuid,
!                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
!                        (v_node->insn));
!             }
! 
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int s_st = SCHED_TIME (v_node);
! 
!               late_start = MIN (late_start,
!                                 s_st - e->latency + (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "succ st = %d; late_start = %d; latency = %d",
!                          s_st, late_start, e->latency);
! 
! 	      if (e->data_type == MEM_DEP)
! 		end = MAX (end, SCHED_TIME (v_node) - ii + 1);
!              if (dump_file)
!                  fprintf (dump_file, "end = %d\n", end);
! 
! 	    }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
! 	}
!       start = late_start;
!       end = MAX (end, late_start - ii);
!       /* Schedule the node close to it's successors.  */
!       step = -1;
  
!       if (dump_file)
!         fprintf (dump_file,
!                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
!                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
  
!     }
  
!   else if (psp_not_empty && pss_not_empty)
!     {
!       int early_start = INT_MIN;
!       int late_start = INT_MAX;
!       int count_preds = 0;
!       int count_succs = 0;
  
!       start = INT_MIN;
!       end = INT_MAX;
!       for (e = u_node->in; e != 0; e = e->next_in)
! 	{
! 	  ddg_node_ptr v_node = e->src;
  
! 	  if (dump_file)
! 	    {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
  	      fprintf (dump_file,
! 		       "\nScheduling %d (%d) in psp_pss_not_empty,"
! 		       " checking p %d (%d): ", u_node->cuid, INSN_UID
! 		       (u_node->insn), v_node->cuid, INSN_UID
! 		       (v_node->insn));
! 	    }
! 
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int p_st = SCHED_TIME (v_node);
! 
! 	      early_start = MAX (early_start,
! 				 p_st + e->latency
! 				 - (e->distance * ii));
! 
!               if (dump_file)
!                 fprintf (dump_file,
!                          "pred st = %d; early_start = %d; latency = %d",
!                          p_st, early_start, e->latency);
! 
!               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!                 count_preds++;
  
! 	      if (e->data_type == MEM_DEP)
! 		end = MIN (end, SCHED_TIME (v_node) + ii - 1);
! 	    }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
! 	}
!       for (e = u_node->out; e != 0; e = e->next_out)
! 	{
! 	  ddg_node_ptr v_node = e->dest;
  
! 	  if (dump_file)
! 	    {
!               fprintf (dump_file, "\nProcessing edge:");
!               print_ddg_edge (dump_file, e);
! 	      fprintf (dump_file,
! 		       "\nScheduling %d (%d) in psp_pss_not_empty,"
! 		       " checking s %d (%d): ", u_node->cuid, INSN_UID
! 		       (u_node->insn), v_node->cuid, INSN_UID
! 		       (v_node->insn));
! 	    }
  
! 	  if (TEST_BIT (sched_nodes, v_node->cuid))
! 	    {
!               int s_st = SCHED_TIME (v_node);
  
! 	      late_start = MIN (late_start,
! 				s_st - e->latency
! 				+ (e->distance * ii));
  
!               if (dump_file)
!                 fprintf (dump_file,
!                          "succ st = %d; late_start = %d; latency = %d",
!                          s_st, late_start, e->latency);
  
!                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
!                  count_succs++;
  
! 	      if (e->data_type == MEM_DEP)
! 		start = MAX (start, SCHED_TIME (v_node) - ii + 1);
! 	    }
!           else if (dump_file)
!             fprintf (dump_file, "the node is not scheduled\n");
  
! 	}
!       start = MAX (start, early_start);
!       end = MIN (end, MIN (early_start + ii, late_start + 1));
!       step = 1;
!       /* If there are more successors than predecessors schedule the
!          node close to it's successors.  */
!       if (count_succs >= count_preds)
!         {
!           int old_start = start;
  
!           start = end - 1;
!           end = old_start - 1;
!           step = -1;
!         }
!     }
!   else /* psp is empty && pss is empty.  */
!     {
!       start = SCHED_ASAP (u_node);
!       end = start + ii;
!       step = 1;
      }
  
    *start_p = start;
    *step_p = step;
    *end_p = end;
--- 1651,1772 ----
    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
  
!   /* We first compute a forward range (start <= end), then decide whether
!      to reverse it.  */
!   early_start = INT_MIN;
!   late_start = INT_MAX;
!   start = INT_MIN;
!   end = INT_MAX;
!   step = 1;
! 
!   count_preds = 0;
!   count_succs = 0;
! 
!   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
!   if (psp_not_empty)
!     for (e = u_node->in; e != 0; e = e->next_in)
!       {
! 	ddg_node_ptr v_node = e->src;
  
! 	if (dump_file)
! 	  {
! 	    fprintf (dump_file, "\nProcessing edge: ");
! 	    print_ddg_edge (dump_file, e);
! 	    fprintf (dump_file,
! 		     "\nScheduling %d (%d) in psp_not_empty,"
! 		     " checking p %d (%d): ", u_node->cuid,
! 		     INSN_UID (u_node->insn), v_node->cuid, INSN_UID
! 		     (v_node->insn));
! 	  }
  
! 	if (TEST_BIT (sched_nodes, v_node->cuid))
! 	  {
! 	    int p_st = SCHED_TIME (v_node);
  
! 	    early_start = MAX (early_start,
! 			       p_st + e->latency - (e->distance * ii));
  
! 	    if (e->data_type == MEM_DEP)
! 	      end = MIN (end, p_st + ii - 1);
  
! 	    if (e->type == TRUE_DEP && e->data_type == REG_DEP)
! 	      count_preds++;
  
! 	    if (dump_file)
  	      fprintf (dump_file,
! 		       "pred st = %d; early_start = %d; latency: %d;"
! 		       " end: %d\n", p_st, early_start, e->latency, end);
  
! 	  }
! 	else if (dump_file)
! 	  fprintf (dump_file, "the node is not scheduled\n");
!       }
  
!   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
!   if (pss_not_empty)
!     for (e = u_node->out; e != 0; e = e->next_out)
!       {
! 	ddg_node_ptr v_node = e->dest;
  
! 	if (dump_file)
! 	  {
! 	    fprintf (dump_file, "\nProcessing edge:");
! 	    print_ddg_edge (dump_file, e);
! 	    fprintf (dump_file,
! 		     "\nScheduling %d (%d) in pss_not_empty,"
! 		     " checking s %d (%d): ", u_node->cuid,
! 		     INSN_UID (u_node->insn), v_node->cuid, INSN_UID
! 		     (v_node->insn));
! 	  }
  
! 	if (TEST_BIT (sched_nodes, v_node->cuid))
! 	  {
! 	    int s_st = SCHED_TIME (v_node);
  
! 	    late_start = MIN (late_start,
! 			      s_st - e->latency + (e->distance * ii));
  
! 	    if (e->data_type == MEM_DEP)
! 	      start = MAX (start, s_st - ii + 1);
  
! 	    if (e->type == TRUE_DEP && e->data_type == REG_DEP)
! 	      count_succs++;
  
! 	    if (dump_file)
! 	      fprintf (dump_file,
! 		       "succ st = %d; late_start = %d; latency = %d;"
! 		       " start=%d", s_st, late_start, e->latency, start);
  
! 	  }
! 	else if (dump_file)
! 	  fprintf (dump_file, "the node is not scheduled\n");
!       }
  
!   /* Get a target scheduling window no bigger than ii.  */
!   if (early_start == INT_MIN && late_start == INT_MAX)
!     early_start = SCHED_ASAP (u_node);
!   else if (early_start == INT_MIN)
!     early_start = late_start - (ii - 1);
!   late_start = MIN (late_start, early_start + (ii - 1));
! 
!   /* Apply memory dependence limits.  */
!   start = MAX (start, early_start);
!   end = MIN (end, late_start);
! 
!   /* If there are at least as many successors as predecessors, schedule the
!      node close to its successors.  */
!   if (pss_not_empty && count_succs >= count_preds)
!     {
!       int tmp = end;
!       end = start;
!       start = tmp;
!       step = -1;
      }
  
+   /* Now that we've finalized the window, make END an exclusive rather
+      than an inclusive bound.  */
+   end += step;
+ 
    *start_p = start;
    *step_p = step;
    *end_p = end;
*************** get_sched_window (partial_schedule_ptr p
*** 1867,1876 ****
        if (dump_file)
  	fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
  		 start, end, step);
!     return -1;
      }
  
!     return 0;
  }
  
  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
--- 1778,1787 ----
        if (dump_file)
  	fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
  		 start, end, step);
!       return -1;
      }
  
!   return 0;
  }
  
  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-08-08  9:30         ` Richard Sandiford
@ 2011-08-08 19:36           ` Ayal Zaks
  2011-08-09 11:19             ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Ayal Zaks @ 2011-08-08 19:36 UTC (permalink / raw)
  To: Ayal Zaks, Revital1 Eres, gcc, richard.sandiford

2011/8/8 Richard Sandiford <richard.sandiford@linaro.org>
>
> Ayal Zaks <ayal.zaks@gmail.com> writes:
> >> OK.  For the follow-on iv patch, it seemed easier to keep both bounds
> >> inclusive for the loop, then make the "end" exclusive when setting the
> >> out parameters.  (Note that there shouldn't be any problem with overflow
> >> when making the bound exclusive, because the size of the range has been
> >> restricted to "ii" by that stage.  The follow-on patch does use
> >> saturating maths for all ops though.)
> >
> > Sure, no problem having "end" inclusive, as it simplifies things. We
> > can even keep it inclusive all the way through, iterating over: for (c
> > = start; c != end+step; c += step)...
>
> Yeah, I'd prefer that too.  I might do it once Revital's and
> Roman's changes are in.
>
> >> I don't have powerpc hardware that I can do meaningful performance
> >> testing on, but I did run it through a Popular* Embedded Benchmark
> >> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
> >> -fmodulo-sched-allow-regmoves.  There were no changes.  (And this is
> >> a benchmark that does benefit from modulo scheduling, in some cases
> >> by a significant amount.)
> >>
> > Ok, the patch should be neutral performance-wise. One could check btw
> > if SMS produces exactly the same output in both cases, even w/o a
> > powerpc (or any other) HW.
>
> OK.  The patch does change the output a little because of the MEM_DEP
> range thing.


Ahh, right.

>
>  To compensate for that, I tried compiling libav on ARM with:
>
> Index: gcc/modulo-sched.c
> ===================================================================
> --- gcc/modulo-sched.c  2011-08-05 11:23:16.000000000 +0100
> +++ gcc/modulo-sched.c  2011-08-05 11:27:57.000000000 +0100
> @@ -1680,7 +1680,7 @@ get_sched_window (partial_schedule_ptr p
>                          p_st, early_start, e->latency);
>
>              if (e->data_type == MEM_DEP)
> -               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> +               end = MIN (end, SCHED_TIME (v_node) + ii);
>            }
>          else if (dump_file)
>             fprintf (dump_file, "the node is not scheduled\n");
> @@ -1729,7 +1729,7 @@ get_sched_window (partial_schedule_ptr p
>                          s_st, late_start, e->latency);
>
>              if (e->data_type == MEM_DEP)
> -               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> +               end = MAX (end, SCHED_TIME (v_node) - ii);
>              if (dump_file)
>                  fprintf (dump_file, "end = %d\n", end);
>
> @@ -1791,7 +1791,7 @@ get_sched_window (partial_schedule_ptr p
>                 count_preds++;
>
>              if (e->data_type == MEM_DEP)
> -               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> +               end = MIN (end, SCHED_TIME (v_node) + ii);
>            }
>           else if (dump_file)
>             fprintf (dump_file, "the node is not scheduled\n");
>
> applied, then tried the same thing with the revised patch below.
> The output was the same.


ok, that's a good sanity check.

>
> (FWIW, libav did show up extra differences when using the patch
> that I'd originally submitted.  They were due to the count_preds
> and count_succs thing that you picked up in your review.)


(These differences had no noticable consequences performance-wise, right?)

>
> >> Bootstrapped & regression-tested on powerpc-ibm-aix5.3 with the
> >> additional patch:
> >>
> >> Index: gcc/opts.c
> >> ===================================================================
> >> --- gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> >> +++ gcc/opts.c  2011-08-03 10:56:50.000000000 +0100
> >> @@ -449,6 +449,8 @@ static const struct default_options defa
> >>     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
> >>     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
> >>     { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
> >> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched, NULL, 1 },
> >> +    { OPT_LEVELS_1_PLUS, OPT_fmodulo_sched_allow_regmoves, NULL, 1 },
> >>
> >>     /* -O2 optimizations.  */
> >>     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
> >>
> >> applied for both the "before" and "after" runs.  OK to install?
> >
> >
> > Yes, with just a couple of minor non-mandatory comments:
> > 1. Setting "count_preds = psp_not_empty;" and "count_succs =
> > pss_not_empty;" suggests that you want to decrement non-interesting
> > neighbors instead of incrementing interesting ones. Otherwise can
> > initialize to zero (doesn't really matter, just a bit confusing).
>
> Yeah, good point.  I initialised them to zero, like you said,
> and changed the reversal condition to:
>
>  if (pss_not_empty && count_succs >= count_preds)
>
> Which I have admit makes a lot more sense than what I submitted,
> and avoids some unwanted changes in output.
>
> > 2. I find it easier to read "late_start = MIN (late_start, early_start
> > + (ii - 1));" instead of "late_start = MIN (early_start + (ii - 1),
> > late_start);".
>
> Oops, that was accidental.  Fixed.
>
> > 3. Suggest to set step=1 at the beginning, explaining from the start
> > that we first compute a forward range (start<end), and may later
> > possibly revert it.
>
> Yeah, that's better.
>
> > 4. "late_start" would better be renamed "late_end".
>
> Since you said this was optional, I decided to leave it as it is.
> The name and value of "late start" come directly from the paper.


Ok. We can write another paper ;-)

>
> > 5. While we're at it, as you've unified the treatments, we can now
> > actually do with "start" and "end" only, and do away with early_start
> > and late_start altogether...
>
> The IV patch needs the same distinction as we have now, so I'd prefer
> to keep it.


Ok, looking forward to the IV patch then...

>
> Thanks for the review.  Here's what I applied after testing on
> powerpc-ibm-aix5.3.
>


Thanks for looking into this,
Ayal.


>
> Richard
>
>
> gcc/
>        * modulo-sched.c (get_sched_window): Use just one loop for predecessors
>        and one loop for successors.  Fix upper bound of memory range.
>
> Index: gcc/modulo-sched.c
> ===================================================================
> *** gcc/modulo-sched.c  2011-08-05 10:39:27.000000000 +0100
> --- gcc/modulo-sched.c  2011-08-05 10:40:49.000000000 +0100
> *************** #define MAX_SPLIT_NUM 10
> *** 1630,1638 ****
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p, int *end_p)
>  {
>    int start, step, end;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> --- 1630,1640 ----
>
>  static int
>  get_sched_window (partial_schedule_ptr ps, ddg_node_ptr u_node,
> !                 sbitmap sched_nodes, int ii, int *start_p, int *step_p,
> !                 int *end_p)
>  {
>    int start, step, end;
> +   int early_start, late_start;
>    ddg_edge_ptr e;
>    sbitmap psp = sbitmap_alloc (ps->g->num_nodes);
>    sbitmap pss = sbitmap_alloc (ps->g->num_nodes);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1640,1645 ****
> --- 1642,1649 ----
>    sbitmap u_node_succs = NODE_SUCCESSORS (u_node);
>    int psp_not_empty;
>    int pss_not_empty;
> +   int count_preds;
> +   int count_succs;
>
>    /* 1. compute sched window for u (start, end, step).  */
>    sbitmap_zero (psp);
> *************** get_sched_window (partial_schedule_ptr p
> *** 1647,1861 ****
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   if (psp_not_empty && !pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
> !
> !           if (dump_file)
> !             {
> !             fprintf (dump_file, "\nProcessing edge: ");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid,
> !                      INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !               early_start =
> !                 MAX (early_start, p_st + e->latency - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency: %d",
> !                          p_st, early_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !          else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
> !       }
> !       start = early_start;
> !       end = MIN (end, early_start + ii);
> !       /* Schedule the node close to it's predecessors.  */
> !       step = 1;
> !
> !       if (dump_file)
> !         fprintf (dump_file,
> !                "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                u_node->cuid, INSN_UID (u_node->insn), start, end, step);
> !     }
> !
> !   else if (!psp_not_empty && pss_not_empty)
> !     {
> !       int late_start = INT_MAX;
> !
> !       end = INT_MIN;
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
> !
> !           if (dump_file)
> !             {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !               fprintf (dump_file,
> !                        "\nScheduling %d (%d) in pss_not_empty,"
> !                        " checking s %d (%d): ", u_node->cuid,
> !                        INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                        (v_node->insn));
> !             }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
> !
> !               late_start = MIN (late_start,
> !                                 s_st - e->latency + (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
> !
> !             if (e->data_type == MEM_DEP)
> !               end = MAX (end, SCHED_TIME (v_node) - ii + 1);
> !              if (dump_file)
> !                  fprintf (dump_file, "end = %d\n", end);
> !
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = late_start;
> !       end = MAX (end, late_start - ii);
> !       /* Schedule the node close to it's successors.  */
> !       step = -1;
>
> !       if (dump_file)
> !         fprintf (dump_file,
> !                  "\nScheduling %d (%d) in a window (%d..%d) with step %d\n",
> !                  u_node->cuid, INSN_UID (u_node->insn), start, end, step);
>
> !     }
>
> !   else if (psp_not_empty && pss_not_empty)
> !     {
> !       int early_start = INT_MIN;
> !       int late_start = INT_MAX;
> !       int count_preds = 0;
> !       int count_succs = 0;
>
> !       start = INT_MIN;
> !       end = INT_MAX;
> !       for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !         ddg_node_ptr v_node = e->src;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
>              fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking p %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
> !
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int p_st = SCHED_TIME (v_node);
> !
> !             early_start = MAX (early_start,
> !                                p_st + e->latency
> !                                - (e->distance * ii));
> !
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "pred st = %d; early_start = %d; latency = %d",
> !                          p_st, early_start, e->latency);
> !
> !               if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                 count_preds++;
>
> !             if (e->data_type == MEM_DEP)
> !               end = MIN (end, SCHED_TIME (v_node) + ii - 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !         ddg_node_ptr v_node = e->dest;
>
> !         if (dump_file)
> !           {
> !               fprintf (dump_file, "\nProcessing edge:");
> !               print_ddg_edge (dump_file, e);
> !             fprintf (dump_file,
> !                      "\nScheduling %d (%d) in psp_pss_not_empty,"
> !                      " checking s %d (%d): ", u_node->cuid, INSN_UID
> !                      (u_node->insn), v_node->cuid, INSN_UID
> !                      (v_node->insn));
> !           }
>
> !         if (TEST_BIT (sched_nodes, v_node->cuid))
> !           {
> !               int s_st = SCHED_TIME (v_node);
>
> !             late_start = MIN (late_start,
> !                               s_st - e->latency
> !                               + (e->distance * ii));
>
> !               if (dump_file)
> !                 fprintf (dump_file,
> !                          "succ st = %d; late_start = %d; latency = %d",
> !                          s_st, late_start, e->latency);
>
> !                if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !                  count_succs++;
>
> !             if (e->data_type == MEM_DEP)
> !               start = MAX (start, SCHED_TIME (v_node) - ii + 1);
> !           }
> !           else if (dump_file)
> !             fprintf (dump_file, "the node is not scheduled\n");
>
> !       }
> !       start = MAX (start, early_start);
> !       end = MIN (end, MIN (early_start + ii, late_start + 1));
> !       step = 1;
> !       /* If there are more successors than predecessors schedule the
> !          node close to it's successors.  */
> !       if (count_succs >= count_preds)
> !         {
> !           int old_start = start;
>
> !           start = end - 1;
> !           end = old_start - 1;
> !           step = -1;
> !         }
> !     }
> !   else /* psp is empty && pss is empty.  */
> !     {
> !       start = SCHED_ASAP (u_node);
> !       end = start + ii;
> !       step = 1;
>      }
>
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> --- 1651,1772 ----
>    psp_not_empty = sbitmap_a_and_b_cg (psp, u_node_preds, sched_nodes);
>    pss_not_empty = sbitmap_a_and_b_cg (pss, u_node_succs, sched_nodes);
>
> !   /* We first compute a forward range (start <= end), then decide whether
> !      to reverse it.  */
> !   early_start = INT_MIN;
> !   late_start = INT_MAX;
> !   start = INT_MIN;
> !   end = INT_MAX;
> !   step = 1;
> !
> !   count_preds = 0;
> !   count_succs = 0;
> !
> !   /* Calculate early_start and limit end.  Both bounds are inclusive.  */
> !   if (psp_not_empty)
> !     for (e = u_node->in; e != 0; e = e->next_in)
> !       {
> !       ddg_node_ptr v_node = e->src;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge: ");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in psp_not_empty,"
> !                    " checking p %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int p_st = SCHED_TIME (v_node);
>
> !           early_start = MAX (early_start,
> !                              p_st + e->latency - (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             end = MIN (end, p_st + ii - 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_preds++;
>
> !           if (dump_file)
>              fprintf (dump_file,
> !                      "pred st = %d; early_start = %d; latency: %d;"
> !                      " end: %d\n", p_st, early_start, e->latency, end);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Calculate late_start and limit start.  Both bounds are inclusive.  */
> !   if (pss_not_empty)
> !     for (e = u_node->out; e != 0; e = e->next_out)
> !       {
> !       ddg_node_ptr v_node = e->dest;
>
> !       if (dump_file)
> !         {
> !           fprintf (dump_file, "\nProcessing edge:");
> !           print_ddg_edge (dump_file, e);
> !           fprintf (dump_file,
> !                    "\nScheduling %d (%d) in pss_not_empty,"
> !                    " checking s %d (%d): ", u_node->cuid,
> !                    INSN_UID (u_node->insn), v_node->cuid, INSN_UID
> !                    (v_node->insn));
> !         }
>
> !       if (TEST_BIT (sched_nodes, v_node->cuid))
> !         {
> !           int s_st = SCHED_TIME (v_node);
>
> !           late_start = MIN (late_start,
> !                             s_st - e->latency + (e->distance * ii));
>
> !           if (e->data_type == MEM_DEP)
> !             start = MAX (start, s_st - ii + 1);
>
> !           if (e->type == TRUE_DEP && e->data_type == REG_DEP)
> !             count_succs++;
>
> !           if (dump_file)
> !             fprintf (dump_file,
> !                      "succ st = %d; late_start = %d; latency = %d;"
> !                      " start=%d", s_st, late_start, e->latency, start);
>
> !         }
> !       else if (dump_file)
> !         fprintf (dump_file, "the node is not scheduled\n");
> !       }
>
> !   /* Get a target scheduling window no bigger than ii.  */
> !   if (early_start == INT_MIN && late_start == INT_MAX)
> !     early_start = SCHED_ASAP (u_node);
> !   else if (early_start == INT_MIN)
> !     early_start = late_start - (ii - 1);
> !   late_start = MIN (late_start, early_start + (ii - 1));
> !
> !   /* Apply memory dependence limits.  */
> !   start = MAX (start, early_start);
> !   end = MIN (end, late_start);
> !
> !   /* If there are at least as many successors as predecessors, schedule the
> !      node close to its successors.  */
> !   if (pss_not_empty && count_succs >= count_preds)
> !     {
> !       int tmp = end;
> !       end = start;
> !       start = tmp;
> !       step = -1;
>      }
>
> +   /* Now that we've finalized the window, make END an exclusive rather
> +      than an inclusive bound.  */
> +   end += step;
> +
>    *start_p = start;
>    *step_p = step;
>    *end_p = end;
> *************** get_sched_window (partial_schedule_ptr p
> *** 1867,1876 ****
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !     return -1;
>      }
>
> !     return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the
> --- 1778,1787 ----
>        if (dump_file)
>        fprintf (dump_file, "\nEmpty window: start=%d, end=%d, step=%d\n",
>                 start, end, step);
> !       return -1;
>      }
>
> !   return 0;
>  }
>
>  /* Calculate MUST_PRECEDE/MUST_FOLLOW bitmaps of U_NODE; which is the

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about SMS scheduling windows
  2011-08-08 19:36           ` Ayal Zaks
@ 2011-08-09 11:19             ` Richard Sandiford
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2011-08-09 11:19 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: Revital1 Eres, gcc

Ayal Zaks <ayal.zaks@gmail.com> writes:
>> (FWIW, libav did show up extra differences when using the patch
>> that I'd originally submitted.  They were due to the count_preds
>> and count_succs thing that you picked up in your review.)
>
>
> (These differences had no noticable consequences performance-wise, right?)

Well, because they were unintentional, I'm afraid I didn't measure them.
The committed version didn't have these differences.  (It was an assembly
source comparison, so I don't have a ready way of measuring the performance
of each affected loop.)

Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-08-09 11:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-27 10:21 Question about SMS scheduling windows Richard Sandiford
2011-07-27 13:34 ` Richard Sandiford
2011-07-27 13:40 ` Revital1 Eres
2011-07-27 16:42   ` Ayal Zaks
2011-08-04  9:03     ` Richard Sandiford
2011-08-04 16:27       ` Ayal Zaks
2011-08-08  9:30         ` Richard Sandiford
2011-08-08 19:36           ` Ayal Zaks
2011-08-09 11:19             ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).