public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [0/4] Make SMS schedule register moves
@ 2011-08-30 12:45 Richard Sandiford
  2011-08-30 12:46 ` [1/4] SMS: remove register undo list Richard Sandiford
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 12:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: zaks

I'm seeing several cases in which SMS's register move handling is
causing it to generate worse code than the normal schedulers on ARM
Cortex-A8.  The problem is that we first schedule the loop without
taking the moves into account, then emit the required moves immediately
before the initial definition.

A simple example is:

    void
    loop (unsigned char *__restrict q, unsigned char *__restrict p, int n)
    {
      while (n > 0)
        {
          q[0] = (p[0] + p[1]) >> 1;
          q++;
          p += 2;
          n--;
        }
    }

(taken from libav).  Compiled with:

  -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp
  -mvectorize-with-neon-quad -O2 -ftree-vectorize -fno-auto-inc-dec
  -fmodulo-sched -fmodulo-sched-allow-regmoves

on arm-linux-gnueabi-gcc (with current trunk), the scheduled loop has an
ii of 27, a stage count of 6, and requires 14 register moves, 12 of which
are vector moves.  Vector moves cannot be dual-issued with most other
vector instructions, and the scheduled loop only has one free "slot"
for a vector move, so even in the best case, this loop theoretically
needs 27 + 12 - 1 cycles per iteration, significantly higher than the ii.
(It actually ends up much worse than that, because with so many live
vector values, we end up with lots of spills.)

The aim of the patches is to schedule the moves, and to reject the
current ii if this scheduling fails.  Revital pointed out that Mustafa
Hagog had tried the same thing, so this is effectively a reimplementation
of that idea.  For those who've seen Mustafa's patch, the main functional
differences are that:

  - Mustafa's version scheduled from low rows to high rows, with the
    high row being the one associated with the previous move.  These
    patches instead schedule high-to-low, which should leave a larger
    window for later moves.

  - The patches use a cyclic scheduling window.  E.g., for a move
    related to the instruction in column 1 of row 0, the patches first
    try row 0 (column 0 only), then row ii-1, etc.

  - The patches take instruction latency into account.
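To make the cyclic window concrete, here is an illustrative stand-alone sketch (not code from the patches; the function name is mine): rows are visited starting at the instruction's own row and walking downwards with wrap-around modulo ii.

```c
#include <assert.h>

/* Illustrative sketch (not from the patches): fill OUT with the II rows
   of a cyclic scheduling window, starting at START_ROW and walking
   high-to-low with wrap-around, as described above.  */
void
cyclic_window_rows (int start_row, int ii, int *out)
{
  int i;

  for (i = 0; i < ii; i++)
    /* Normalize the C remainder so the result is always in [0, ii).  */
    out[i] = ((start_row - i) % ii + ii) % ii;
}
```

For start_row 0 and ii 4 this visits rows 0, 3, 2, 1, matching the "row 0, then row ii-1, etc." order described above.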

On the loop above, we reject the ii of 27 and try again with an ii of 28.
This leads to a stage count of 3 and no register moves, which is
theoretically 10 cycles quicker than before.  The lack of spills means
that the real figures are much better though: on a BeagleBoard, the new
loop is 5.45 times faster than the old one.  I've seen similar improvements
in other "real" libav loops too, not all of them due to fewer spills.

(BTW, in the ii=27 version, most of the moves are for distance-1 true
dependencies between a stage N instruction in row R and a stage N+1
instruction in row R+1.)
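As a rough stand-alone rendering of the per-dependence move count that modulo-sched.c computes (the nreg_moves4e logic; parameter names here are mine): one move per ii cycles of scheduling-time separation between definition and use, one extra for a distance-1 dependence, and one fewer when the use is placed before the definition within the same row.

```c
#include <assert.h>

/* Rough illustrative rendering of the per-dependence register-move
   count.  T_SRC/T_DEST are the scheduled cycles of the definition and
   use, DISTANCE is the dependence distance, and the row/column pairs
   give their positions in the kernel.  */
int
moves_for_edge (int t_src, int t_dest, int distance, int ii,
                int row_src, int row_dest, int col_src, int col_dest)
{
  int n = (t_dest - t_src + (distance == 1 ? ii : 0)) / ii;

  /* If the use is placed before the definition in the same row, it
     reads the previous iteration's value and one move is saved.  */
  if (row_dest == row_src && col_dest < col_src)
    n--;
  return n;
}
```

E.g. a distance-1 true dependence whose use is scheduled one cycle after its definition, in the next row, needs exactly one move.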

As well as testing on a collection of libav-derived microbenchmarks like
the one above, I tested on a commercial embedded testsuite.  It showed a
significant improvement in one test and no change for the rest.

Bootstrapped & regression-tested on powerpc-ibm-aix5.3.0, using a
compiler that had -fmodulo-sched and -fmodulo-sched-allow-regmoves
turned on at -O and above.

Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [1/4] SMS: remove register undo list
  2011-08-30 12:45 [0/4] Make SMS schedule register moves Richard Sandiford
@ 2011-08-30 12:46 ` Richard Sandiford
  2011-08-30 13:04 ` [2/4] SMS: Use ids to represent ps_insns Richard Sandiford
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 12:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: zaks

This patch removes the (unused) undo_replace_buff_elem list.
Verges on the obvious, but still.

Patch 3 splits the generation of moves into two: one function to record
what needs to happen, and another function to actually do it.  It's then
easy to bail out if we decide that we don't want the moves.
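The record/apply split can be sketched like this (the types and names are illustrative, not GCC's):

```c
#include <assert.h>

/* Illustrative sketch of the record-then-apply split: planned
   replacements are collected first, and nothing is modified until
   apply_moves runs, so bailing out is just discarding the plan.  */
struct planned_move { int use_id; int new_reg; };

struct planned_move plan[16];
int n_planned;
int reg_of_use[16];   /* Stands in for the real insn stream.  */

void
record_move (int use_id, int new_reg)
{
  plan[n_planned].use_id = use_id;
  plan[n_planned].new_reg = new_reg;
  n_planned++;
}

void
apply_moves (void)
{
  int i;

  for (i = 0; i < n_planned; i++)
    reg_of_use[plan[i].use_id] = plan[i].new_reg;
}
```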

Richard


gcc/
	* modulo-sched.c (undo_replace_buff_elem): Delete.
	(generate_reg_moves): Don't build and return an undo list.
	(free_undo_replace_buff): Delete.
	(sms_schedule): Adjust call to generate_reg_moves.
	Don't call free_undo_replace_buff.

Index: gcc/modulo-sched.c
===================================================================
--- gcc/modulo-sched.c	2011-08-24 12:38:37.171362916 +0100
+++ gcc/modulo-sched.c	2011-08-24 12:56:17.754942951 +0100
@@ -165,17 +165,6 @@ struct partial_schedule
   int stage_count;  /* The stage count of the partial schedule.  */
 };
 
-/* We use this to record all the register replacements we do in
-   the kernel so we can undo SMS if it is not profitable.  */
-struct undo_replace_buff_elem
-{
-  rtx insn;
-  rtx orig_reg;
-  rtx new_reg;
-  struct undo_replace_buff_elem *next;
-};
-
-
 
 static partial_schedule_ptr create_partial_schedule (int ii, ddg_ptr, int history);
 static void free_partial_schedule (partial_schedule_ptr);
@@ -456,13 +445,12 @@ print_node_sched_params (FILE *file, int
    nreg_moves = ----------------------------------- + 1 - {   dependence.
                             ii                          { 1 if not.
 */
-static struct undo_replace_buff_elem *
+static void
 generate_reg_moves (partial_schedule_ptr ps, bool rescan)
 {
   ddg_ptr g = ps->g;
   int ii = ps->ii;
   int i;
-  struct undo_replace_buff_elem *reg_move_replaces = NULL;
 
   for (i = 0; i < g->num_nodes; i++)
     {
@@ -543,22 +531,6 @@ generate_reg_moves (partial_schedule_ptr
 
 	  EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi)
 	    {
-	      struct undo_replace_buff_elem *rep;
-
-	      rep = (struct undo_replace_buff_elem *)
-		    xcalloc (1, sizeof (struct undo_replace_buff_elem));
-	      rep->insn = g->nodes[i_use].insn;
-	      rep->orig_reg = old_reg;
-	      rep->new_reg = new_reg;
-
-	      if (! reg_move_replaces)
-		reg_move_replaces = rep;
-	      else
-		{
-		  rep->next = reg_move_replaces;
-		  reg_move_replaces = rep;
-		}
-
 	      replace_rtx (g->nodes[i_use].insn, old_reg, new_reg);
 	      if (rescan)
 		df_insn_rescan (g->nodes[i_use].insn);
@@ -568,21 +540,6 @@ generate_reg_moves (partial_schedule_ptr
 	}
       sbitmap_vector_free (uses_of_defs);
     }
-  return reg_move_replaces;
-}
-
-/* Free memory allocated for the undo buffer.  */
-static void
-free_undo_replace_buff (struct undo_replace_buff_elem *reg_move_replaces)
-{
-
-  while (reg_move_replaces)
-    {
-      struct undo_replace_buff_elem *rep = reg_move_replaces;
-
-      reg_move_replaces = reg_move_replaces->next;
-      free (rep);
-    }
 }
 
 /* Update the sched_params (time, row and stage) for node U using the II,
@@ -1472,8 +1429,6 @@ sms_schedule (void)
 	}
       else
 	{
-	  struct undo_replace_buff_elem *reg_move_replaces;
-
           if (!opt_sc_p)
             {
 	      /* Rotate the partial schedule to have the branch in row ii-1.  */
@@ -1523,13 +1478,11 @@ sms_schedule (void)
 	  /* The life-info is not valid any more.  */
 	  df_set_bb_dirty (g->bb);
 
-	  reg_move_replaces = generate_reg_moves (ps, true);
+	  generate_reg_moves (ps, true);
 	  if (dump_file)
 	    print_node_sched_params (dump_file, g->num_nodes, g);
 	  /* Generate prolog and epilog.  */
           generate_prolog_epilog (ps, loop, count_reg, count_init);
-
-	  free_undo_replace_buff (reg_move_replaces);
 	}
 
       free_partial_schedule (ps);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [2/4] SMS: Use ids to represent ps_insns
  2011-08-30 12:45 [0/4] Make SMS schedule register moves Richard Sandiford
  2011-08-30 12:46 ` [1/4] SMS: remove register undo list Richard Sandiford
@ 2011-08-30 13:04 ` Richard Sandiford
  2011-08-30 13:12 ` [3/4] SMS: Record moves in the partial schedule Richard Sandiford
  2011-08-30 13:17 ` [4/4] Make SMS schedule register moves Richard Sandiford
  3 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 13:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: zaks

Instructions in a partial schedule are currently represented as
ddg nodes.  This patch uses a more abstract id instead.  At the moment,
the ids map directly to ddg nodes, but the next patch will add register
moves to the end.

One slight advantage of using ids is that we can leave the ASAP value
on the node; we don't need to copy it across to the scheduling info.

(Later patches use the same scheduling info for moves, for which an ASAP
value would be wasted space.)

Richard



gcc/
	* modulo-sched.c (ps_insn): Replace node field with an identifier.
	(SCHED_ASAP): Replace with...
	(NODE_ASAP): ...this macro.
	(SCHED_PARAMS): New macro.
	(SCHED_TIME, SCHED_FIRST_REG_MOVE, SCHED_NREG_MOVES, SCHED_ROW)
	(SCHED_STAGE, SCHED_COLUMN): Redefine using SCHED_PARAMS.
	(node_sched_params): Remove asap.
	(ps_rtl_insn, ps_first_note): New functions.
	(set_node_sched_params): Use XCNEWVEC.  Don't copy across the
	asap values.
	(print_node_sched_params): Use SCHED_PARAMS and NODE_ASAP.
	(generate_reg_moves): Pass ids to the SCHED_* macros.
	(update_node_sched_params): Take a ps_insn identifier rather than
	a node as parameter.  Use ps_rtl_insn.
	(set_columns_for_ps): Update for above field and SCHED_* macro changes.
	(permute_partial_schedule): Use ps_rtl_insn and ps_first_note.
	(optimize_sc): Update for above field and SCHED_* macro changes.
	Update calls to try_scheduling_node_in_cycle and
	update_node_sched_params.
	(duplicate_insns_of_cycles): Adjust for above field and SCHED_*
	macro changes.  Use ps_rtl_insn and ps_first_note.
	(sms_schedule): Pass ids to the SCHED_* macros.
	(get_sched_window): Adjust for above field and SCHED_* macro changes.
	Use NODE_ASAP instead of SCHED_ASAP.
	(try_scheduling_node_in_cycle): Remove node parameter.  Update
	call to ps_add_node_check_conflicts.  Pass ids to the SCHED_*
	macros.
	(sms_schedule_by_order): Update call to try_scheduling_node_in_cycle.
	(ps_insert_empty_row): Adjust for above field changes.
	(compute_split_row): Use ids rather than nodes.
	(verify_partial_schedule): Adjust for above field changes.
	(print_partial_schedule): Use ps_rtl_insn.
	(create_ps_insn): Take an id rather than a node.
	(ps_insn_find_column): Adjust for above field changes.
	Use ps_rtl_insn.
	(ps_insn_advance_column): Adjust for above field changes.
	(add_node_to_ps): Remove node parameter.  Update call to
	create_ps_insn.
	(ps_has_conflicts): Use ps_rtl_insn.
	(ps_add_node_check_conflicts): Take an id rather than a node.

Index: gcc/modulo-sched.c
===================================================================
--- gcc/modulo-sched.c	2011-08-24 12:26:13.899926738 +0100
+++ gcc/modulo-sched.c	2011-08-24 13:29:01.956079514 +0100
@@ -124,8 +124,8 @@ #define PS_STAGE_COUNT(ps) (((partial_sc
 /* A single instruction in the partial schedule.  */
 struct ps_insn
 {
-  /* The corresponding DDG_NODE.  */
-  ddg_node_ptr node;
+  /* The number of the ddg node whose instruction is being scheduled.  */
+  int id;
 
   /* The (absolute) cycle in which the PS instruction is scheduled.
      Same as SCHED_TIME (node).  */
@@ -183,9 +183,7 @@ static void reset_partial_schedule (part
 void print_partial_schedule (partial_schedule_ptr, FILE *);
 static void verify_partial_schedule (partial_schedule_ptr, sbitmap);
 static ps_insn_ptr ps_add_node_check_conflicts (partial_schedule_ptr,
-						ddg_node_ptr node, int cycle,
-						sbitmap must_precede,
-						sbitmap must_follow);
+						int, int, sbitmap, sbitmap);
 static void rotate_partial_schedule (partial_schedule_ptr, int);
 void set_row_column_for_ps (partial_schedule_ptr);
 static void ps_insert_empty_row (partial_schedule_ptr, int, sbitmap);
@@ -208,25 +206,23 @@ static void calculate_must_precede_follo
 					   int, int, sbitmap, sbitmap, sbitmap);
 static int get_sched_window (partial_schedule_ptr, ddg_node_ptr, 
 			     sbitmap, int, int *, int *, int *);
-static bool try_scheduling_node_in_cycle (partial_schedule_ptr, ddg_node_ptr,
-					  int, int, sbitmap, int *, sbitmap,
-					  sbitmap);
+static bool try_scheduling_node_in_cycle (partial_schedule_ptr, int, int,
+					  sbitmap, int *, sbitmap, sbitmap);
 static bool remove_node_from_ps (partial_schedule_ptr, ps_insn_ptr);
 
-#define SCHED_ASAP(x) (((node_sched_params_ptr)(x)->aux.info)->asap)
-#define SCHED_TIME(x) (((node_sched_params_ptr)(x)->aux.info)->time)
-#define SCHED_FIRST_REG_MOVE(x) \
-	(((node_sched_params_ptr)(x)->aux.info)->first_reg_move)
-#define SCHED_NREG_MOVES(x) \
-	(((node_sched_params_ptr)(x)->aux.info)->nreg_moves)
-#define SCHED_ROW(x) (((node_sched_params_ptr)(x)->aux.info)->row)
-#define SCHED_STAGE(x) (((node_sched_params_ptr)(x)->aux.info)->stage)
-#define SCHED_COLUMN(x) (((node_sched_params_ptr)(x)->aux.info)->column)
+#define NODE_ASAP(node) ((node)->aux.count)
+
+#define SCHED_PARAMS(x) (&node_sched_params[x])
+#define SCHED_TIME(x) (SCHED_PARAMS (x)->time)
+#define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move)
+#define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves)
+#define SCHED_ROW(x) (SCHED_PARAMS (x)->row)
+#define SCHED_STAGE(x) (SCHED_PARAMS (x)->stage)
+#define SCHED_COLUMN(x) (SCHED_PARAMS (x)->column)
 
 /* The scheduling parameters held for each node.  */
 typedef struct node_sched_params
 {
-  int asap;	/* A lower-bound on the absolute scheduling cycle.  */
   int time;	/* The absolute scheduling cycle (time >= asap).  */
 
   /* The following field (first_reg_move) is a pointer to the first
@@ -295,6 +291,23 @@ static struct haifa_sched_info sms_sched
   0
 };
 
+/* Return the rtl instruction that is being scheduled by partial schedule
+   instruction ID, which belongs to schedule PS.  */
+static rtx
+ps_rtl_insn (partial_schedule_ptr ps, int id)
+{
+  return ps->g->nodes[id].insn;
+}
+
+/* Return the first instruction in the original (unscheduled) loop that
+   was associated with ps_rtl_insn (PS, ID).  If the instruction had
+   some notes before it, this is the first of those notes.  */
+static rtx
+ps_first_note (partial_schedule_ptr ps, int id)
+{
+  return ps->g->nodes[id].first_note;
+}
+
 /* Given HEAD and TAIL which are the first and last insns in a loop;
    return the register which controls the loop.  Return zero if it has
    more than one occurrence in the loop besides the control part or the
@@ -398,28 +411,11 @@ res_MII (ddg_ptr g)
 /* Points to the array that contains the sched data for each node.  */
 static node_sched_params_ptr node_sched_params;
 
-/* Allocate sched_params for each node and initialize it.  Assumes that
-   the aux field of each node contain the asap bound (computed earlier),
-   and copies it into the sched_params field.  */
+/* Allocate sched_params for each node and initialize it.  */
 static void
 set_node_sched_params (ddg_ptr g)
 {
-  int i;
-
-  /* Allocate for each node in the DDG a place to hold the "sched_data".  */
-  /* Initialize ASAP/ALAP/HIGHT to zero.  */
-  node_sched_params = (node_sched_params_ptr)
-		       xcalloc (g->num_nodes,
-				sizeof (struct node_sched_params));
-
-  /* Set the pointer of the general data of the node to point to the
-     appropriate sched_params structure.  */
-  for (i = 0; i < g->num_nodes; i++)
-    {
-      /* Watch out for aliasing problems?  */
-      node_sched_params[i].asap = g->nodes[i].aux.count;
-      g->nodes[i].aux.info = &node_sched_params[i];
-    }
+  node_sched_params = XCNEWVEC (struct node_sched_params, g->num_nodes);
 }
 
 static void
@@ -431,13 +427,13 @@ print_node_sched_params (FILE *file, int
     return;
   for (i = 0; i < num_nodes; i++)
     {
-      node_sched_params_ptr nsp = &node_sched_params[i];
+      node_sched_params_ptr nsp = SCHED_PARAMS (i);
       rtx reg_move = nsp->first_reg_move;
       int j;
 
       fprintf (file, "Node = %d; INSN = %d\n", i,
 	       (INSN_UID (g->nodes[i].insn)));
-      fprintf (file, " asap = %d:\n", nsp->asap);
+      fprintf (file, " asap = %d:\n", NODE_ASAP (&g->nodes[i]));
       fprintf (file, " time = %d:\n", nsp->time);
       fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
       for (j = 0; j < nsp->nreg_moves; j++)
@@ -482,15 +478,17 @@ generate_reg_moves (partial_schedule_ptr
       for (e = u->out; e; e = e->next_out)
 	if (e->type == TRUE_DEP && e->dest != e->src)
 	  {
-	    int nreg_moves4e = (SCHED_TIME (e->dest) - SCHED_TIME (e->src)) / ii;
+	    int nreg_moves4e = (SCHED_TIME (e->dest->cuid)
+				- SCHED_TIME (e->src->cuid)) / ii;
 
             if (e->distance == 1)
-              nreg_moves4e = (SCHED_TIME (e->dest) - SCHED_TIME (e->src) + ii) / ii;
+              nreg_moves4e = (SCHED_TIME (e->dest->cuid)
+			      - SCHED_TIME (e->src->cuid) + ii) / ii;
 
 	    /* If dest precedes src in the schedule of the kernel, then dest
 	       will read before src writes and we can save one reg_copy.  */
-	    if (SCHED_ROW (e->dest) == SCHED_ROW (e->src)
-		&& SCHED_COLUMN (e->dest) < SCHED_COLUMN (e->src))
+	    if (SCHED_ROW (e->dest->cuid) == SCHED_ROW (e->src->cuid)
+		&& SCHED_COLUMN (e->dest->cuid) < SCHED_COLUMN (e->src->cuid))
 	      nreg_moves4e--;
 
 	    nreg_moves = MAX (nreg_moves, nreg_moves4e);
@@ -508,13 +506,15 @@ generate_reg_moves (partial_schedule_ptr
       for (e = u->out; e; e = e->next_out)
 	if (e->type == TRUE_DEP && e->dest != e->src)
 	  {
-	    int dest_copy = (SCHED_TIME (e->dest) - SCHED_TIME (e->src)) / ii;
+	    int dest_copy = (SCHED_TIME (e->dest->cuid)
+			     - SCHED_TIME (e->src->cuid)) / ii;
 
 	    if (e->distance == 1)
-	      dest_copy = (SCHED_TIME (e->dest) - SCHED_TIME (e->src) + ii) / ii;
+	      dest_copy = (SCHED_TIME (e->dest->cuid)
+			   - SCHED_TIME (e->src->cuid) + ii) / ii;
 
-	    if (SCHED_ROW (e->dest) == SCHED_ROW (e->src)
-		&& SCHED_COLUMN (e->dest) < SCHED_COLUMN (e->src))
+	    if (SCHED_ROW (e->dest->cuid) == SCHED_ROW (e->src->cuid)
+		&& SCHED_COLUMN (e->dest->cuid) < SCHED_COLUMN (e->src->cuid))
 	      dest_copy--;
 
 	    if (dest_copy)
@@ -522,7 +522,7 @@ generate_reg_moves (partial_schedule_ptr
 	  }
 
       /* Now generate the reg_moves, attaching relevant uses to them.  */
-      SCHED_NREG_MOVES (u) = nreg_moves;
+      SCHED_NREG_MOVES (i) = nreg_moves;
       old_reg = prev_reg = copy_rtx (SET_DEST (single_set (u->insn)));
       /* Insert the reg-moves right before the notes which precede
          the insn they relates to.  */
@@ -538,8 +538,8 @@ generate_reg_moves (partial_schedule_ptr
 	  add_insn_before (reg_move, last_reg_move, NULL);
 	  last_reg_move = reg_move;
 
-	  if (!SCHED_FIRST_REG_MOVE (u))
-	    SCHED_FIRST_REG_MOVE (u) = reg_move;
+	  if (!SCHED_FIRST_REG_MOVE (i))
+	    SCHED_FIRST_REG_MOVE (i) = reg_move;
 
 	  EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi)
 	    {
@@ -591,7 +591,7 @@ free_undo_replace_buff (struct undo_repl
    SCHED_STAGE (u) = CALC_STAGE_COUNT (SCHED_TIME (u), min_cycle, ii);
    because the stages may not be aligned on cycle 0.  */
 static void
-update_node_sched_params (ddg_node_ptr u, int ii, int cycle, int min_cycle)
+update_node_sched_params (int u, int ii, int cycle, int min_cycle)
 {
   int sc_until_cycle_zero;
   int stage;
@@ -628,18 +628,19 @@ reset_sched_times (partial_schedule_ptr 
   for (row = 0; row < ii; row++)
     for (crr_insn = ps->rows[row]; crr_insn; crr_insn = crr_insn->next_in_row)
       {
-	ddg_node_ptr u = crr_insn->node;
+	int u = crr_insn->id;
 	int normalized_time = SCHED_TIME (u) - amount;
 	int new_min_cycle = PS_MIN_CYCLE (ps) - amount;
 
         if (dump_file)
           {
             /* Print the scheduling times after the rotation.  */
+	    rtx insn = ps_rtl_insn (ps, u);
+
             fprintf (dump_file, "crr_insn->node=%d (insn id %d), "
-                     "crr_insn->cycle=%d, min_cycle=%d", crr_insn->node->cuid,
-                     INSN_UID (crr_insn->node->insn), normalized_time,
-                     new_min_cycle);
-            if (JUMP_P (crr_insn->node->insn))
+                     "crr_insn->cycle=%d, min_cycle=%d", u,
+                     INSN_UID (insn), normalized_time, new_min_cycle);
+            if (JUMP_P (insn))
               fprintf (dump_file, " (branch)");
             fprintf (dump_file, "\n");
           }
@@ -664,7 +665,7 @@ set_columns_for_ps (partial_schedule_ptr
       int column = 0;
 
       for (; cur_insn; cur_insn = cur_insn->next_in_row)
-	SCHED_COLUMN (cur_insn->node) = column++;
+	SCHED_COLUMN (cur_insn->id) = column++;
     }
 }
 
@@ -680,9 +681,13 @@ permute_partial_schedule (partial_schedu
 
   for (row = 0; row < ii ; row++)
     for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
-      if (PREV_INSN (last) != ps_ij->node->insn)
-      	reorder_insns_nobb (ps_ij->node->first_note, ps_ij->node->insn,
-			    PREV_INSN (last));
+      {
+	rtx insn = ps_rtl_insn (ps, ps_ij->id);
+
+	if (PREV_INSN (last) != insn)
+	  reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn,
+			      PREV_INSN (last));
+      }
 }
 
 /* Set bitmaps TMP_FOLLOW and TMP_PRECEDE to MUST_FOLLOW and MUST_PRECEDE
@@ -731,7 +736,7 @@ optimize_sc (partial_schedule_ptr ps, dd
      to row ii-1.  If they are equal just bail out.  */
   stage_count = calculate_stage_count (ps, amount);
   stage_count_curr =
-    calculate_stage_count (ps, SCHED_TIME (g->closing_branch) - (ii - 1));
+    calculate_stage_count (ps, SCHED_TIME (g->closing_branch->cuid) - (ii - 1));
 
   if (stage_count == stage_count_curr)
     {
@@ -760,7 +765,7 @@ optimize_sc (partial_schedule_ptr ps, dd
       print_partial_schedule (ps, dump_file);
     }
 
-  if (SMODULO (SCHED_TIME (g->closing_branch), ii) == ii - 1)
+  if (SMODULO (SCHED_TIME (g->closing_branch->cuid), ii) == ii - 1)
     {
       ok = true;
       goto clear;
@@ -775,7 +780,7 @@ optimize_sc (partial_schedule_ptr ps, dd
     {
       bool success;
       ps_insn_ptr next_ps_i;
-      int branch_cycle = SCHED_TIME (g->closing_branch);
+      int branch_cycle = SCHED_TIME (g->closing_branch->cuid);
       int row = SMODULO (branch_cycle, ps->ii);
       int num_splits = 0;
       sbitmap must_precede, must_follow, tmp_precede, tmp_follow;
@@ -831,14 +836,13 @@ optimize_sc (partial_schedule_ptr ps, dd
          branch so we can remove it from it's current cycle.  */
       for (next_ps_i = ps->rows[row];
 	   next_ps_i; next_ps_i = next_ps_i->next_in_row)
-	if (next_ps_i->node->cuid == g->closing_branch->cuid)
+	if (next_ps_i->id == g->closing_branch->cuid)
 	  break;
 
       gcc_assert (next_ps_i);
       gcc_assert (remove_node_from_ps (ps, next_ps_i));
       success =
-	try_scheduling_node_in_cycle (ps, g->closing_branch,
-				      g->closing_branch->cuid, c,
+	try_scheduling_node_in_cycle (ps, g->closing_branch->cuid, c,
 				      sched_nodes, &num_splits,
 				      tmp_precede, tmp_follow);
       gcc_assert (num_splits == 0);
@@ -856,8 +860,7 @@ optimize_sc (partial_schedule_ptr ps, dd
 				   must_precede, branch_cycle, start, end,
 				   step);
 	  success =
-	    try_scheduling_node_in_cycle (ps, g->closing_branch,
-					  g->closing_branch->cuid,
+	    try_scheduling_node_in_cycle (ps, g->closing_branch->cuid,
 					  branch_cycle, sched_nodes,
 					  &num_splits, tmp_precede,
 					  tmp_follow);
@@ -871,7 +874,7 @@ optimize_sc (partial_schedule_ptr ps, dd
 	    fprintf (dump_file,
 		     "SMS success in moving branch to cycle %d\n", c);
 
-	  update_node_sched_params (g->closing_branch, ii, c,
+	  update_node_sched_params (g->closing_branch->cuid, ii, c,
 				    PS_MIN_CYCLE (ps));
 	  ok = true;
 	}
@@ -895,9 +898,10 @@ duplicate_insns_of_cycles (partial_sched
   for (row = 0; row < ps->ii; row++)
     for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
       {
-	ddg_node_ptr u_node = ps_ij->node;
+	int u = ps_ij->id;
 	int j, i_reg_moves;
 	rtx reg_move = NULL_RTX;
+	rtx u_insn;
 
         /* Do not duplicate any insn which refers to count_reg as it
            belongs to the control part.
@@ -905,52 +909,53 @@ duplicate_insns_of_cycles (partial_sched
            be ignored.
            TODO: This should be done by analyzing the control part of
            the loop.  */
-        if (reg_mentioned_p (count_reg, u_node->insn)
-            || JUMP_P (ps_ij->node->insn))
+	u_insn = ps_rtl_insn (ps, u);
+        if (reg_mentioned_p (count_reg, u_insn)
+            || JUMP_P (u_insn))
           continue;
 
 	if (for_prolog)
 	  {
-	    /* SCHED_STAGE (u_node) >= from_stage == 0.  Generate increasing
+	    /* SCHED_STAGE (u) >= from_stage == 0.  Generate increasing
 	       number of reg_moves starting with the second occurrence of
-	       u_node, which is generated if its SCHED_STAGE <= to_stage.  */
-	    i_reg_moves = to_stage - SCHED_STAGE (u_node) + 1;
+	       u, which is generated if its SCHED_STAGE <= to_stage.  */
+	    i_reg_moves = to_stage - SCHED_STAGE (u) + 1;
 	    i_reg_moves = MAX (i_reg_moves, 0);
-	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u_node));
+	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
 
 	    /* The reg_moves start from the *first* reg_move backwards.  */
 	    if (i_reg_moves)
 	      {
-		reg_move = SCHED_FIRST_REG_MOVE (u_node);
+		reg_move = SCHED_FIRST_REG_MOVE (u);
 		for (j = 1; j < i_reg_moves; j++)
 		  reg_move = PREV_INSN (reg_move);
 	      }
 	  }
 	else /* It's for the epilog.  */
 	  {
-	    /* SCHED_STAGE (u_node) <= to_stage.  Generate all reg_moves,
-	       starting to decrease one stage after u_node no longer occurs;
+	    /* SCHED_STAGE (u) <= to_stage.  Generate all reg_moves,
+	       starting to decrease one stage after u no longer occurs;
 	       that is, generate all reg_moves until
-	       SCHED_STAGE (u_node) == from_stage - 1.  */
-	    i_reg_moves = SCHED_NREG_MOVES (u_node)
-	    	       - (from_stage - SCHED_STAGE (u_node) - 1);
+	       SCHED_STAGE (u) == from_stage - 1.  */
+	    i_reg_moves = (SCHED_NREG_MOVES (u)
+			   - (from_stage - SCHED_STAGE (u) - 1));
 	    i_reg_moves = MAX (i_reg_moves, 0);
-	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u_node));
+	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
 
 	    /* The reg_moves start from the *last* reg_move forwards.  */
 	    if (i_reg_moves)
 	      {
-		reg_move = SCHED_FIRST_REG_MOVE (u_node);
-		for (j = 1; j < SCHED_NREG_MOVES (u_node); j++)
+		reg_move = SCHED_FIRST_REG_MOVE (u);
+		for (j = 1; j < SCHED_NREG_MOVES (u); j++)
 		  reg_move = PREV_INSN (reg_move);
 	      }
 	  }
 
 	for (j = 0; j < i_reg_moves; j++, reg_move = NEXT_INSN (reg_move))
 	  emit_insn (copy_rtx (PATTERN (reg_move)));
-	if (SCHED_STAGE (u_node) >= from_stage
-	    && SCHED_STAGE (u_node) <= to_stage)
-	  duplicate_insn_chain (u_node->first_note, u_node->insn);
+	if (SCHED_STAGE (u) >= from_stage
+	    && SCHED_STAGE (u) <= to_stage)
+	  duplicate_insn_chain (ps_first_note (ps, u), u_insn);
       }
 }
 
@@ -1417,8 +1422,6 @@ sms_schedule (void)
 	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
 		 rec_mii, mii, maxii);
 
-      /* After sms_order_nodes and before sms_schedule_by_order, to copy over
-	 ASAP.  */
       set_node_sched_params (g);
 
       ps = sms_schedule_by_order (g, mii, maxii, node_order);
@@ -1436,7 +1439,7 @@ sms_schedule (void)
 	  else
 	    {
 	      /* Bring the branch to cycle ii-1.  */
-	      int amount = SCHED_TIME (g->closing_branch) - (ps->ii - 1);
+	      int amount = SCHED_TIME (g->closing_branch->cuid) - (ps->ii - 1);
 	      
 	      if (dump_file)
 		fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
@@ -1472,7 +1475,7 @@ sms_schedule (void)
           if (!opt_sc_p)
             {
 	      /* Rotate the partial schedule to have the branch in row ii-1.  */
-              int amount = SCHED_TIME (g->closing_branch) - (ps->ii - 1);
+              int amount = SCHED_TIME (g->closing_branch->cuid) - (ps->ii - 1);
 	      
               reset_sched_times (ps, amount);
               rotate_partial_schedule (ps, amount);
@@ -1675,11 +1678,11 @@ get_sched_window (partial_schedule_ptr p
   if (psp_not_empty)
     for (e = u_node->in; e != 0; e = e->next_in)
       {
-	ddg_node_ptr v_node = e->src;
+	int v = e->src->cuid;
 
-	if (TEST_BIT (sched_nodes, v_node->cuid))
+	if (TEST_BIT (sched_nodes, v))
 	  {
-	    int p_st = SCHED_TIME (v_node);
+	    int p_st = SCHED_TIME (v);
 	    int earliest = p_st + e->latency - (e->distance * ii);
 	    int latest = (e->data_type == MEM_DEP ? p_st + ii - 1 : INT_MAX);
 
@@ -1703,11 +1706,11 @@ get_sched_window (partial_schedule_ptr p
   if (pss_not_empty)
     for (e = u_node->out; e != 0; e = e->next_out)
       {
-	ddg_node_ptr v_node = e->dest;
+	int v = e->dest->cuid;
 
-	if (TEST_BIT (sched_nodes, v_node->cuid))
+	if (TEST_BIT (sched_nodes, v))
 	  {
-	    int s_st = SCHED_TIME (v_node);
+	    int s_st = SCHED_TIME (v);
 	    int earliest = (e->data_type == MEM_DEP ? s_st - ii + 1 : INT_MIN);
 	    int latest = s_st - e->latency + (e->distance * ii);
 
@@ -1738,7 +1741,7 @@ get_sched_window (partial_schedule_ptr p
 
   /* Get a target scheduling window no bigger than ii.  */
   if (early_start == INT_MIN && late_start == INT_MAX)
-    early_start = SCHED_ASAP (u_node);
+    early_start = NODE_ASAP (u_node);
   else if (early_start == INT_MIN)
     early_start = late_start - (ii - 1);
   late_start = MIN (late_start, early_start + (ii - 1));
@@ -1835,7 +1838,7 @@ calculate_must_precede_follow (ddg_node_
       SCHED_TIME (e->src) - (e->distance * ii) == first_cycle_in_window  */
   for (e = u_node->in; e != 0; e = e->next_in)
     if (TEST_BIT (sched_nodes, e->src->cuid)
-	&& ((SCHED_TIME (e->src) - (e->distance * ii)) ==
+	&& ((SCHED_TIME (e->src->cuid) - (e->distance * ii)) ==
              first_cycle_in_window))
       {
 	if (dump_file)
@@ -1860,7 +1863,7 @@ calculate_must_precede_follow (ddg_node_
       SCHED_TIME (e->dest) + (e->distance * ii) == last_cycle_in_window  */
   for (e = u_node->out; e != 0; e = e->next_out)
     if (TEST_BIT (sched_nodes, e->dest->cuid)
-	&& ((SCHED_TIME (e->dest) + (e->distance * ii)) ==
+	&& ((SCHED_TIME (e->dest->cuid) + (e->distance * ii)) ==
              last_cycle_in_window))
       {
 	if (dump_file)
@@ -1884,7 +1887,7 @@ calculate_must_precede_follow (ddg_node_
    last row of the scheduling window)  */
 
 static bool
-try_scheduling_node_in_cycle (partial_schedule_ptr ps, ddg_node_ptr u_node,
+try_scheduling_node_in_cycle (partial_schedule_ptr ps,
 			      int u, int cycle, sbitmap sched_nodes,
 			      int *num_splits, sbitmap must_precede,
 			      sbitmap must_follow)
@@ -1893,11 +1896,10 @@ try_scheduling_node_in_cycle (partial_sc
   bool success = 0;
 
   verify_partial_schedule (ps, sched_nodes);
-  psi = ps_add_node_check_conflicts (ps, u_node, cycle,
-				     must_precede, must_follow);
+  psi = ps_add_node_check_conflicts (ps, u, cycle, must_precede, must_follow);
   if (psi)
     {
-      SCHED_TIME (u_node) = cycle;
+      SCHED_TIME (u) = cycle;
       SET_BIT (sched_nodes, u);
       success = 1;
       *num_splits = 0;
@@ -1977,7 +1979,7 @@ sms_schedule_by_order (ddg_ptr g, int mi
 		                           &tmp_precede, must_precede, 
                                            c, start, end, step);
                   success =
-                    try_scheduling_node_in_cycle (ps, u_node, u, c,
+                    try_scheduling_node_in_cycle (ps, u, c,
                                                   sched_nodes,
                                                   &num_splits, tmp_precede,
                                                   tmp_follow);
@@ -2077,7 +2079,7 @@ ps_insert_empty_row (partial_schedule_pt
       for (crr_insn = rows_new[row];
 	   crr_insn; crr_insn = crr_insn->next_in_row)
 	{
-	  ddg_node_ptr u = crr_insn->node;
+	  int u = crr_insn->id;
 	  int new_time = SCHED_TIME (u) + (SCHED_TIME (u) / ii);
 
 	  SCHED_TIME (u) = new_time;
@@ -2098,7 +2100,7 @@ ps_insert_empty_row (partial_schedule_pt
       for (crr_insn = rows_new[row + 1];
 	   crr_insn; crr_insn = crr_insn->next_in_row)
 	{
-	  ddg_node_ptr u = crr_insn->node;
+	  int u = crr_insn->id;
 	  int new_time = SCHED_TIME (u) + (SCHED_TIME (u) / ii) + 1;
 
 	  SCHED_TIME (u) = new_time;
@@ -2138,24 +2140,24 @@ compute_split_row (sbitmap sched_nodes, 
 {
   ddg_edge_ptr e;
   int lower = INT_MIN, upper = INT_MAX;
-  ddg_node_ptr crit_pred = NULL;
-  ddg_node_ptr crit_succ = NULL;
+  int crit_pred = -1;
+  int crit_succ = -1;
   int crit_cycle;
 
   for (e = u_node->in; e != 0; e = e->next_in)
     {
-      ddg_node_ptr v_node = e->src;
+      int v = e->src->cuid;
 
-      if (TEST_BIT (sched_nodes, v_node->cuid)
-	  && (low == SCHED_TIME (v_node) + e->latency - (e->distance * ii)))
-	if (SCHED_TIME (v_node) > lower)
+      if (TEST_BIT (sched_nodes, v)
+	  && (low == SCHED_TIME (v) + e->latency - (e->distance * ii)))
+	if (SCHED_TIME (v) > lower)
 	  {
-	    crit_pred = v_node;
-	    lower = SCHED_TIME (v_node);
+	    crit_pred = v;
+	    lower = SCHED_TIME (v);
 	  }
     }
 
-  if (crit_pred != NULL)
+  if (crit_pred >= 0)
     {
       crit_cycle = SCHED_TIME (crit_pred) + 1;
       return SMODULO (crit_cycle, ii);
@@ -2163,17 +2165,18 @@ compute_split_row (sbitmap sched_nodes, 
 
   for (e = u_node->out; e != 0; e = e->next_out)
     {
-      ddg_node_ptr v_node = e->dest;
-      if (TEST_BIT (sched_nodes, v_node->cuid)
-	  && (up == SCHED_TIME (v_node) - e->latency + (e->distance * ii)))
-	if (SCHED_TIME (v_node) < upper)
+      int v = e->dest->cuid;
+
+      if (TEST_BIT (sched_nodes, v)
+	  && (up == SCHED_TIME (v) - e->latency + (e->distance * ii)))
+	if (SCHED_TIME (v) < upper)
 	  {
-	    crit_succ = v_node;
-	    upper = SCHED_TIME (v_node);
+	    crit_succ = v;
+	    upper = SCHED_TIME (v);
 	  }
     }
 
-  if (crit_succ != NULL)
+  if (crit_succ >= 0)
     {
       crit_cycle = SCHED_TIME (crit_succ);
       return SMODULO (crit_cycle, ii);
@@ -2197,10 +2200,10 @@ verify_partial_schedule (partial_schedul
       
       for (crr_insn = ps->rows[row]; crr_insn; crr_insn = crr_insn->next_in_row)
 	{
-	  ddg_node_ptr u = crr_insn->node;
+	  int u = crr_insn->id;
 	  
 	  length++;
-	  gcc_assert (TEST_BIT (sched_nodes, u->cuid));
+	  gcc_assert (TEST_BIT (sched_nodes, u));
 	  /* ??? Test also that all nodes of sched_nodes are in ps, perhaps by
 	     popcount (sched_nodes) == number of insns in ps.  */
 	  gcc_assert (SCHED_TIME (u) >= ps->min_cycle);
@@ -2692,12 +2695,12 @@ print_partial_schedule (partial_schedule
       fprintf (dump, "\n[ROW %d ]: ", i);
       while (ps_i)
 	{
-	  if (JUMP_P (ps_i->node->insn))
-	    fprintf (dump, "%d (branch), ",
-		     INSN_UID (ps_i->node->insn));
+	  rtx insn = ps_rtl_insn (ps, ps_i->id);
+
+	  if (JUMP_P (insn))
+	    fprintf (dump, "%d (branch), ", INSN_UID (insn));
 	  else
-	    fprintf (dump, "%d, ",
-		     INSN_UID (ps_i->node->insn));
+	    fprintf (dump, "%d, ", INSN_UID (insn));
 	
 	  ps_i = ps_i->next_in_row;
 	}
@@ -2706,11 +2709,11 @@ print_partial_schedule (partial_schedule
 
 /* Creates an object of PS_INSN and initializes it to the given parameters.  */
 static ps_insn_ptr
-create_ps_insn (ddg_node_ptr node, int cycle)
+create_ps_insn (int id, int cycle)
 {
   ps_insn_ptr ps_i = XNEW (struct ps_insn);
 
-  ps_i->node = node;
+  ps_i->id = id;
   ps_i->next_in_row = NULL;
   ps_i->prev_in_row = NULL;
   ps_i->cycle = cycle;
@@ -2779,10 +2782,11 @@ ps_insn_find_column (partial_schedule_pt
        next_ps_i;
        next_ps_i = next_ps_i->next_in_row)
     {
-      if (must_follow && TEST_BIT (must_follow, next_ps_i->node->cuid)
+      if (must_follow
+	  && TEST_BIT (must_follow, next_ps_i->id)
 	  && ! first_must_follow)
         first_must_follow = next_ps_i;
-      if (must_precede && TEST_BIT (must_precede, next_ps_i->node->cuid))
+      if (must_precede && TEST_BIT (must_precede, next_ps_i->id))
         {
           /* If we have already met a node that must follow, then
 	     there is no possible column.  */
@@ -2793,8 +2797,8 @@ ps_insn_find_column (partial_schedule_pt
         }
       /* The closing branch must be the last in the row.  */
       if (must_precede 
-	  && TEST_BIT (must_precede, next_ps_i->node->cuid) 
-	  && JUMP_P (next_ps_i->node->insn))     
+	  && TEST_BIT (must_precede, next_ps_i->id)
+	  && JUMP_P (ps_rtl_insn (ps, next_ps_i->id)))
 	return false;
              
        last_in_row = next_ps_i;
@@ -2803,7 +2807,7 @@ ps_insn_find_column (partial_schedule_pt
   /* The closing branch is scheduled as well.  Make sure there is no
      dependent instruction after it as the branch should be the last
      instruction in the row.  */
-  if (JUMP_P (ps_i->node->insn)) 
+  if (JUMP_P (ps_rtl_insn (ps, ps_i->id)))
     {
       if (first_must_follow)
 	return false;
@@ -2854,7 +2858,6 @@ ps_insn_advance_column (partial_schedule
 {
   ps_insn_ptr prev, next;
   int row;
-  ddg_node_ptr next_node;
 
   if (!ps || !ps_i)
     return false;
@@ -2864,11 +2867,9 @@ ps_insn_advance_column (partial_schedule
   if (! ps_i->next_in_row)
     return false;
 
-  next_node = ps_i->next_in_row->node;
-
   /* Check if next_in_row is dependent on ps_i, both having same sched
      times (typically ANTI_DEP).  If so, ps_i cannot skip over it.  */
-  if (must_follow && TEST_BIT (must_follow, next_node->cuid))
+  if (must_follow && TEST_BIT (must_follow, ps_i->next_in_row->id))
     return false;
 
   /* Advance PS_I over its next_in_row in the doubly linked list.  */
@@ -2899,7 +2900,7 @@ ps_insn_advance_column (partial_schedule
    before/after (respectively) the node pointed to by PS_I when scheduled
    in the same cycle.  */
 static ps_insn_ptr
-add_node_to_ps (partial_schedule_ptr ps, ddg_node_ptr node, int cycle,
+add_node_to_ps (partial_schedule_ptr ps, int id, int cycle,
 		sbitmap must_precede, sbitmap must_follow)
 {
   ps_insn_ptr ps_i;
@@ -2908,7 +2909,7 @@ add_node_to_ps (partial_schedule_ptr ps,
   if (ps->rows_length[row] >= issue_rate)
     return NULL;
 
-  ps_i = create_ps_insn (node, cycle);
+  ps_i = create_ps_insn (id, cycle);
 
   /* Finds and inserts PS_I according to MUST_FOLLOW and
      MUST_PRECEDE.  */
@@ -2960,7 +2961,7 @@ ps_has_conflicts (partial_schedule_ptr p
 	   crr_insn;
 	   crr_insn = crr_insn->next_in_row)
 	{
-	  rtx insn = crr_insn->node->insn;
+	  rtx insn = ps_rtl_insn (ps, crr_insn->id);
 
 	  if (!NONDEBUG_INSN_P (insn))
 	    continue;
@@ -2997,7 +2998,7 @@ ps_has_conflicts (partial_schedule_ptr p
    cuid N must be come before/after (respectively) the node pointed to by
    PS_I when scheduled in the same cycle.  */
 ps_insn_ptr
-ps_add_node_check_conflicts (partial_schedule_ptr ps, ddg_node_ptr n,
+ps_add_node_check_conflicts (partial_schedule_ptr ps, int n,
    			     int c, sbitmap must_precede,
 			     sbitmap must_follow)
 {


* [3/4] SMS: Record moves in the partial schedule
  2011-08-30 12:45 [0/4] Make SMS schedule register moves Richard Sandiford
  2011-08-30 12:46 ` [1/4] SMS: remove register undo list Richard Sandiford
  2011-08-30 13:04 ` [2/4] SMS: Use ids to represent ps_insns Richard Sandiford
@ 2011-08-30 13:12 ` Richard Sandiford
  2011-08-30 13:17 ` [4/4] Make SMS schedule register moves Richard Sandiford
  3 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 13:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: zaks

This patch adds infrastructure that will be used by the final patch.
Specifically:

  - it splits the generation of register moves into two: schedule_reg_moves
    records moves in the partial schedule, while apply_reg_moves makes the
    register substitutions.

    This patch doesn't actually schedule the moves.  Instead, there's
    some throw-away code in apply_reg_moves that emits the moves in the
    same way as we do now; it will be removed in the final patch.

  - schedule_reg_moves is allowed to fail.  We then try again with the
    next ii (subject to the usual ii limits).

    In this patch, schedule_reg_moves always returns true.

  - The partial schedule uses ids to represent register moves.
    The first register move has id g->num_nodes.

Richard


gcc/
	* modulo-sched.c (ps_insn): Adjust comment.
	(ps_reg_move_info): New structure.
	(partial_schedule): Add reg_moves field.
	(SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params.
	(node_sched_params): Turn first_reg_move into an identifier.
	(ps_reg_move): New function.
	(ps_rtl_insn): Cope with register moves.
	(ps_first_note): Adjust comment and assert that the instruction
	isn't a register move.
	(node_sched_params): Replace with...
	(node_sched_param_vec): ...this vector.
	(set_node_sched_params): Adjust accordingly.
	(print_node_sched_params): Take a partial schedule instead of a ddg.
	Use ps_rtl_insn and ps_reg_move.
	(generate_reg_moves): Rename to...
	(schedule_reg_moves): ...this.  Remove rescan parameter.  Record each
	move in the partial schedule, but don't emit it here.  Don't perform
	register substitutions here either.
	(apply_reg_moves): New function.
	(duplicate_insns_of_cycles): Use register indices directly,
	rather than finding instructions using PREV_INSN.  Use ps_reg_move.
	(sms_schedule): Call schedule_reg_moves before committing to
	a partial schedule.   Try the next ii if the schedule fails.
	Use apply_reg_moves instead of generate_reg_moves.  Adjust
	call to print_node_sched_params.  Free node_sched_param_vec
	instead of node_sched_params.
	(create_partial_schedule): Initialize reg_moves.
	(free_partial_schedule): Free reg_moves.

Index: gcc/modulo-sched.c
===================================================================
*** gcc/modulo-sched.c	2011-08-30 11:32:13.924908138 +0100
--- gcc/modulo-sched.c	2011-08-30 13:06:36.528669762 +0100
*************** #define PS_STAGE_COUNT(ps) (((partial_sc
*** 124,130 ****
  /* A single instruction in the partial schedule.  */
  struct ps_insn
  {
!   /* The number of the ddg node whose instruction is being scheduled.  */
    int id;
  
    /* The (absolute) cycle in which the PS instruction is scheduled.
--- 124,132 ----
  /* A single instruction in the partial schedule.  */
  struct ps_insn
  {
!   /* Identifies the instruction to be scheduled.  Values smaller than
!      the ddg's num_nodes refer directly to ddg nodes.  A value of
!      X - num_nodes refers to register move X.  */
    int id;
  
    /* The (absolute) cycle in which the PS instruction is scheduled.
*************** struct ps_insn
*** 137,142 ****
--- 139,170 ----
  
  };
  
+ /* Information about a register move that has been added to a partial
+    schedule.  */
+ struct ps_reg_move_info
+ {
+   /* The dependencies exist between the ps_insn with id DEF and the
+      ps_insns with the ids in USES.  */
+   int def;
+   sbitmap uses;
+ 
+   /* DEF's instruction defines OLD_REG.  The original form of
+      USES' instructions used it.  */
+   rtx old_reg;
+ 
+   /* USES's instructions must now use NEW_REG instead of OLD_REG.  */
+   rtx new_reg;
+ 
+   /* An instruction that sets NEW_REG to the correct value.  The first
+      move associated with DEF will have an rhs of OLD_REG; later moves
+      use the result of the previous move.  */
+   rtx insn;
+ };
+ 
+ typedef struct ps_reg_move_info ps_reg_move_info;
+ DEF_VEC_O (ps_reg_move_info);
+ DEF_VEC_ALLOC_O (ps_reg_move_info, heap);
+ 
  /* Holds the partial schedule as an array of II rows.  Each entry of the
     array points to a linked list of PS_INSNs, which represents the
     instructions that are scheduled for that row.  */
*************** struct partial_schedule
*** 148,153 ****
--- 176,185 ----
    /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
    ps_insn_ptr *rows;
  
+   /* All the moves added for this partial schedule.  Index X has
+      a ps_insn id of X + g->num_nodes.  */
+   VEC (ps_reg_move_info, heap) *reg_moves;
+ 
    /*  rows_length[i] holds the number of instructions in the row.
        It is used only (as an optimization) to back off quickly from
        trying to schedule a node in a full row; that is, to avoid running
*************** static bool remove_node_from_ps (partial
*** 201,207 ****
  
  #define NODE_ASAP(node) ((node)->aux.count)
  
! #define SCHED_PARAMS(x) (&node_sched_params[x])
  #define SCHED_TIME(x) (SCHED_PARAMS (x)->time)
  #define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move)
  #define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves)
--- 233,239 ----
  
  #define NODE_ASAP(node) ((node)->aux.count)
  
! #define SCHED_PARAMS(x) VEC_index (node_sched_params, node_sched_param_vec, x)
  #define SCHED_TIME(x) (SCHED_PARAMS (x)->time)
  #define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move)
  #define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves)
*************** typedef struct node_sched_params
*** 214,227 ****
  {
    int time;	/* The absolute scheduling cycle (time >= asap).  */
  
!   /* The following field (first_reg_move) is a pointer to the first
       register-move instruction added to handle the modulo-variable-expansion
       of the register defined by this node.  This register-move copies the
       original register defined by the node.  */
!   rtx first_reg_move;
  
!   /* The number of register-move instructions added, immediately preceding
!      first_reg_move.  */
    int nreg_moves;
  
    int row;    /* Holds time % ii.  */
--- 246,258 ----
  {
    int time;	/* The absolute scheduling cycle (time >= asap).  */
  
!   /* The following field (first_reg_move) is the ps_insn id of the first
       register-move instruction added to handle the modulo-variable-expansion
       of the register defined by this node.  This register-move copies the
       original register defined by the node.  */
!   int first_reg_move;
  
!   /* The number of register-move instructions added.  */
    int nreg_moves;
  
    int row;    /* Holds time % ii.  */
*************** typedef struct node_sched_params
*** 232,237 ****
--- 263,271 ----
    int column;
  } *node_sched_params_ptr;
  
+ typedef struct node_sched_params node_sched_params;
+ DEF_VEC_O (node_sched_params);
+ DEF_VEC_ALLOC_O (node_sched_params, heap);
  \f
  /* The following three functions are copied from the current scheduler
     code in order to use sched_analyze() for computing the dependencies.
*************** static struct haifa_sched_info sms_sched
*** 280,299 ****
    0
  };
  
  /* Return the rtl instruction that is being scheduled by partial schedule
     instruction ID, which belongs to schedule PS.  */
  static rtx
  ps_rtl_insn (partial_schedule_ptr ps, int id)
  {
!   return ps->g->nodes[id].insn;
  }
  
! /* Return the first instruction in the original (unscheduled) loop that
!    was associated with ps_rtl_insn (PS, ID).  If the instruction had
!    some notes before it, this is the first of those notes.  */
  static rtx
  ps_first_note (partial_schedule_ptr ps, int id)
  {
    return ps->g->nodes[id].first_note;
  }
  
--- 314,348 ----
    0
  };
  
+ /* Partial schedule instruction ID in PS is a register move.  Return
+    information about it.  */
+ static struct ps_reg_move_info *
+ ps_reg_move (partial_schedule_ptr ps, int id)
+ {
+   gcc_checking_assert (id >= ps->g->num_nodes);
+   return VEC_index (ps_reg_move_info, ps->reg_moves, id - ps->g->num_nodes);
+ }
+ 
  /* Return the rtl instruction that is being scheduled by partial schedule
     instruction ID, which belongs to schedule PS.  */
  static rtx
  ps_rtl_insn (partial_schedule_ptr ps, int id)
  {
!   if (id < ps->g->num_nodes)
!     return ps->g->nodes[id].insn;
!   else
!     return ps_reg_move (ps, id)->insn;
  }
  
! /* Partial schedule instruction ID, which belongs to PS, occurred in
!    the original (unscheduled) loop.  Return the first instruction
!    in the loop that was associated with ps_rtl_insn (PS, ID).
!    If the instruction had some notes before it, this is the first
!    of those notes.  */
  static rtx
  ps_first_note (partial_schedule_ptr ps, int id)
  {
+   gcc_assert (id < ps->g->num_nodes);
    return ps->g->nodes[id].first_note;
  }
  
*************** res_MII (ddg_ptr g)
*** 397,414 ****
  }
  
  
! /* Points to the array that contains the sched data for each node.  */
! static node_sched_params_ptr node_sched_params;
  
  /* Allocate sched_params for each node and initialize it.  */
  static void
  set_node_sched_params (ddg_ptr g)
  {
!   node_sched_params = XCNEWVEC (struct node_sched_params, g->num_nodes);
  }
  
  static void
! print_node_sched_params (FILE *file, int num_nodes, ddg_ptr g)
  {
    int i;
  
--- 446,465 ----
  }
  
  
! /* A vector that contains the sched data for each ps_insn.  */
! static VEC (node_sched_params, heap) *node_sched_param_vec;
  
  /* Allocate sched_params for each node and initialize it.  */
  static void
  set_node_sched_params (ddg_ptr g)
  {
!   VEC_truncate (node_sched_params, node_sched_param_vec, 0);
!   VEC_safe_grow_cleared (node_sched_params, heap,
! 			 node_sched_param_vec, g->num_nodes);
  }
  
  static void
! print_node_sched_params (FILE *file, int num_nodes, partial_schedule_ptr ps)
  {
    int i;
  
*************** print_node_sched_params (FILE *file, int
*** 417,435 ****
    for (i = 0; i < num_nodes; i++)
      {
        node_sched_params_ptr nsp = SCHED_PARAMS (i);
-       rtx reg_move = nsp->first_reg_move;
        int j;
  
        fprintf (file, "Node = %d; INSN = %d\n", i,
! 	       (INSN_UID (g->nodes[i].insn)));
!       fprintf (file, " asap = %d:\n", NODE_ASAP (&g->nodes[i]));
        fprintf (file, " time = %d:\n", nsp->time);
        fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
        for (j = 0; j < nsp->nreg_moves; j++)
  	{
  	  fprintf (file, " reg_move = ");
! 	  print_rtl_single (file, reg_move);
! 	  reg_move = PREV_INSN (reg_move);
  	}
      }
  }
--- 468,486 ----
    for (i = 0; i < num_nodes; i++)
      {
        node_sched_params_ptr nsp = SCHED_PARAMS (i);
        int j;
  
        fprintf (file, "Node = %d; INSN = %d\n", i,
! 	       INSN_UID (ps_rtl_insn (ps, i)));
!       fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i]));
        fprintf (file, " time = %d:\n", nsp->time);
        fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
        for (j = 0; j < nsp->nreg_moves; j++)
  	{
+ 	  ps_reg_move_info *move = ps_reg_move (ps, nsp->first_reg_move + j);
+ 
  	  fprintf (file, " reg_move = ");
! 	  print_rtl_single (file, move->insn);
  	}
      }
  }
*************** print_node_sched_params (FILE *file, int
*** 445,452 ****
     nreg_moves = ----------------------------------- + 1 - {   dependence.
                              ii                          { 1 if not.
  */
! static void
! generate_reg_moves (partial_schedule_ptr ps, bool rescan)
  {
    ddg_ptr g = ps->g;
    int ii = ps->ii;
--- 496,503 ----
     nreg_moves = ----------------------------------- + 1 - {   dependence.
                              ii                          { 1 if not.
  */
! static bool
! schedule_reg_moves (partial_schedule_ptr ps)
  {
    ddg_ptr g = ps->g;
    int ii = ps->ii;
*************** generate_reg_moves (partial_schedule_ptr
*** 457,465 ****
        ddg_node_ptr u = &g->nodes[i];
        ddg_edge_ptr e;
        int nreg_moves = 0, i_reg_move;
-       sbitmap *uses_of_defs;
-       rtx last_reg_move;
        rtx prev_reg, old_reg;
  
        /* Compute the number of reg_moves needed for u, by looking at life
  	 ranges started at u (excluding self-loops).  */
--- 508,515 ----
        ddg_node_ptr u = &g->nodes[i];
        ddg_edge_ptr e;
        int nreg_moves = 0, i_reg_move;
        rtx prev_reg, old_reg;
+       int first_move;
  
        /* Compute the number of reg_moves needed for u, by looking at life
  	 ranges started at u (excluding self-loops).  */
*************** generate_reg_moves (partial_schedule_ptr
*** 485,496 ****
        if (nreg_moves == 0)
  	continue;
  
        /* Every use of the register defined by node may require a different
  	 copy of this register, depending on the time the use is scheduled.
! 	 Set a bitmap vector, telling which nodes use each copy of this
! 	 register.  */
!       uses_of_defs = sbitmap_vector_alloc (nreg_moves, g->num_nodes);
!       sbitmap_vector_zero (uses_of_defs, nreg_moves);
        for (e = u->out; e; e = e->next_out)
  	if (e->type == TRUE_DEP && e->dest != e->src)
  	  {
--- 535,569 ----
        if (nreg_moves == 0)
  	continue;
  
+       /* Create NREG_MOVES register moves.  */
+       first_move = VEC_length (ps_reg_move_info, ps->reg_moves);
+       VEC_safe_grow_cleared (ps_reg_move_info, heap, ps->reg_moves,
+ 			     first_move + nreg_moves);
+ 
+       /* Record the moves associated with this node.  */
+       first_move += ps->g->num_nodes;
+       SCHED_FIRST_REG_MOVE (i) = first_move;
+       SCHED_NREG_MOVES (i) = nreg_moves;
+ 
+       /* Generate each move.  */
+       old_reg = prev_reg = SET_DEST (single_set (u->insn));
+       for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
+ 	{
+ 	  ps_reg_move_info *move = ps_reg_move (ps, first_move + i_reg_move);
+ 
+ 	  move->def = i;
+ 	  move->uses = sbitmap_alloc (g->num_nodes);
+ 	  move->old_reg = old_reg;
+ 	  move->new_reg = gen_reg_rtx (GET_MODE (prev_reg));
+ 	  move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg));
+ 	  sbitmap_zero (move->uses);
+ 
+ 	  prev_reg = move->new_reg;
+ 	}
+ 
        /* Every use of the register defined by node may require a different
  	 copy of this register, depending on the time the use is scheduled.
! 	 Record which uses require which move results.  */
        for (e = u->out; e; e = e->next_out)
  	if (e->type == TRUE_DEP && e->dest != e->src)
  	  {
*************** generate_reg_moves (partial_schedule_ptr
*** 506,545 ****
  	      dest_copy--;
  
  	    if (dest_copy)
! 	      SET_BIT (uses_of_defs[dest_copy - 1], e->dest->cuid);
! 	  }
! 
!       /* Now generate the reg_moves, attaching relevant uses to them.  */
!       SCHED_NREG_MOVES (i) = nreg_moves;
!       old_reg = prev_reg = copy_rtx (SET_DEST (single_set (u->insn)));
!       /* Insert the reg-moves right before the notes which precede
!          the insn they relates to.  */
!       last_reg_move = u->first_note;
! 
!       for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
! 	{
! 	  unsigned int i_use = 0;
! 	  rtx new_reg = gen_reg_rtx (GET_MODE (prev_reg));
! 	  rtx reg_move = gen_move_insn (new_reg, prev_reg);
! 	  sbitmap_iterator sbi;
  
! 	  add_insn_before (reg_move, last_reg_move, NULL);
! 	  last_reg_move = reg_move;
  
! 	  if (!SCHED_FIRST_REG_MOVE (i))
! 	    SCHED_FIRST_REG_MOVE (i) = reg_move;
  
! 	  EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi)
! 	    {
! 	      replace_rtx (g->nodes[i_use].insn, old_reg, new_reg);
! 	      if (rescan)
! 		df_insn_rescan (g->nodes[i_use].insn);
! 	    }
  
! 	  prev_reg = new_reg;
  	}
-       sbitmap_vector_free (uses_of_defs);
      }
  }
  
  /* Update the sched_params (time, row and stage) for node U using the II,
--- 579,617 ----
  	      dest_copy--;
  
  	    if (dest_copy)
! 	      {
! 		ps_reg_move_info *move;
  
! 		move = ps_reg_move (ps, first_move + dest_copy - 1);
! 		SET_BIT (move->uses, e->dest->cuid);
! 	      }
! 	  }
!     }
!   return true;
! }
  
! /* Emit the moves associated with PS.  Apply the substitutions
!    associated with them.  */
! static void
! apply_reg_moves (partial_schedule_ptr ps)
! {
!   ps_reg_move_info *move;
!   int i;
  
!   FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
!     {
!       unsigned int i_use;
!       sbitmap_iterator sbi;
  
!       EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, i_use, sbi)
! 	{
! 	  replace_rtx (ps->g->nodes[i_use].insn, move->old_reg, move->new_reg);
! 	  df_insn_rescan (ps->g->nodes[i_use].insn);
  	}
      }
+ 
+   FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
+     add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
  }
  
  /* Update the sched_params (time, row and stage) for node U using the II,
*************** duplicate_insns_of_cycles (partial_sched
*** 856,863 ****
      for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
        {
  	int u = ps_ij->id;
! 	int j, i_reg_moves;
! 	rtx reg_move = NULL_RTX;
  	rtx u_insn;
  
          /* Do not duplicate any insn which refers to count_reg as it
--- 928,934 ----
      for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
        {
  	int u = ps_ij->id;
! 	int j, i_reg_moves, i_reg_move;
  	rtx u_insn;
  
          /* Do not duplicate any insn which refers to count_reg as it
*************** duplicate_insns_of_cycles (partial_sched
*** 881,892 ****
  	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
  
  	    /* The reg_moves start from the *first* reg_move backwards.  */
! 	    if (i_reg_moves)
! 	      {
! 		reg_move = SCHED_FIRST_REG_MOVE (u);
! 		for (j = 1; j < i_reg_moves; j++)
! 		  reg_move = PREV_INSN (reg_move);
! 	      }
  	  }
  	else /* It's for the epilog.  */
  	  {
--- 952,958 ----
  	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
  
  	    /* The reg_moves start from the *first* reg_move backwards.  */
! 	    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1);
  	  }
  	else /* It's for the epilog.  */
  	  {
*************** duplicate_insns_of_cycles (partial_sched
*** 900,915 ****
  	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
  
  	    /* The reg_moves start from the *last* reg_move forwards.  */
! 	    if (i_reg_moves)
! 	      {
! 		reg_move = SCHED_FIRST_REG_MOVE (u);
! 		for (j = 1; j < SCHED_NREG_MOVES (u); j++)
! 		  reg_move = PREV_INSN (reg_move);
! 	      }
  	  }
  
! 	for (j = 0; j < i_reg_moves; j++, reg_move = NEXT_INSN (reg_move))
! 	  emit_insn (copy_rtx (PATTERN (reg_move)));
  	if (SCHED_STAGE (u) >= from_stage
  	    && SCHED_STAGE (u) <= to_stage)
  	  duplicate_insn_chain (ps_first_note (ps, u), u_insn);
--- 966,980 ----
  	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
  
  	    /* The reg_moves start from the *last* reg_move forwards.  */
! 	    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (SCHED_NREG_MOVES (u) - 1);
  	  }
  
! 	for (j = 0; j < i_reg_moves; j++)
! 	  {
! 	    ps_reg_move_info *move = ps_reg_move (ps, i_reg_move - j);
! 
! 	    emit_insn (copy_rtx (PATTERN (move->insn)));
! 	  }
  	if (SCHED_STAGE (u) >= from_stage
  	    && SCHED_STAGE (u) <= to_stage)
  	  duplicate_insn_chain (ps_first_note (ps, u), u_insn);
*************** sms_schedule (void)
*** 1300,1308 ****
        rtx head, tail;
        rtx count_reg, count_init;
        int mii, rec_mii;
!       unsigned stage_count = 0;
        HOST_WIDEST_INT loop_count = 0;
!       bool opt_sc_p = false;
  
        if (! (g = g_arr[loop->num]))
          continue;
--- 1365,1373 ----
        rtx head, tail;
        rtx count_reg, count_init;
        int mii, rec_mii;
!       unsigned stage_count;
        HOST_WIDEST_INT loop_count = 0;
!       bool opt_sc_p;
  
        if (! (g = g_arr[loop->num]))
          continue;
*************** sms_schedule (void)
*** 1379,1432 ****
  	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
  		 rec_mii, mii, maxii);
  
!       set_node_sched_params (g);
! 
!       ps = sms_schedule_by_order (g, mii, maxii, node_order);
!       
!       if (ps)
  	{
! 	  /* Try to achieve optimized SC by normalizing the partial
! 	     schedule (having the cycles start from cycle zero).
! 	     The branch location must be placed in row ii-1 in the
! 	     final scheduling.	If failed, shift all instructions to
! 	     position the branch in row ii-1.  */
! 	  opt_sc_p = optimize_sc (ps, g);
! 	  if (opt_sc_p)
! 	    stage_count = calculate_stage_count (ps, 0);
! 	  else
  	    {
! 	      /* Bring the branch to cycle ii-1.  */
! 	      int amount = SCHED_TIME (g->closing_branch->cuid) - (ps->ii - 1);
! 	      
! 	      if (dump_file)
! 		fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
! 	      
! 	      stage_count = calculate_stage_count (ps, amount);
  	    }
! 	  
! 	  gcc_assert (stage_count >= 1);
! 	  PS_STAGE_COUNT (ps) = stage_count;
! 	}
!       
!       /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
! 	 1 means that there is no interleaving between iterations thus
! 	 we let the scheduling passes do the job in this case.  */
!       if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
! 	  || (count_init && (loop_count <= stage_count))
! 	  || (flag_branch_probabilities && (trip_count <= stage_count)))
! 	{
! 	  if (dump_file)
  	    {
! 	      fprintf (dump_file, "SMS failed... \n");
! 	      fprintf (dump_file, "SMS sched-failed (stage-count=%d, loop-count=", stage_count);
! 	      fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, loop_count);
! 	      fprintf (dump_file, ", trip-count=");
! 	      fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
! 	      fprintf (dump_file, ")\n");
  	    }
! 	}
!       else
! 	{
            if (!opt_sc_p)
              {
  	      /* Rotate the partial schedule to have the branch in row ii-1.  */
--- 1444,1503 ----
  	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
  		 rec_mii, mii, maxii);
  
!       for (;;)
  	{
! 	  set_node_sched_params (g);
! 
! 	  stage_count = 0;
! 	  opt_sc_p = false;
! 	  ps = sms_schedule_by_order (g, mii, maxii, node_order);
! 
! 	  if (ps)
  	    {
! 	      /* Try to achieve optimized SC by normalizing the partial
! 		 schedule (having the cycles start from cycle zero).
! 		 The branch location must be placed in row ii-1 in the
! 		 final scheduling.	If failed, shift all instructions to
! 		 position the branch in row ii-1.  */
! 	      opt_sc_p = optimize_sc (ps, g);
! 	      if (opt_sc_p)
! 		stage_count = calculate_stage_count (ps, 0);
! 	      else
! 		{
! 		  /* Bring the branch to cycle ii-1.  */
! 		  int amount = (SCHED_TIME (g->closing_branch->cuid)
! 				- (ps->ii - 1));
! 
! 		  if (dump_file)
! 		    fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
! 
! 		  stage_count = calculate_stage_count (ps, amount);
! 		}
! 
! 	      gcc_assert (stage_count >= 1);
! 	      PS_STAGE_COUNT (ps) = stage_count;
  	    }
! 
! 	  /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
! 	     1 means that there is no interleaving between iterations thus
! 	     we let the scheduling passes do the job in this case.  */
! 	  if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
! 	      || (count_init && (loop_count <= stage_count))
! 	      || (flag_branch_probabilities && (trip_count <= stage_count)))
  	    {
! 	      if (dump_file)
! 		{
! 		  fprintf (dump_file, "SMS failed... \n");
! 		  fprintf (dump_file, "SMS sched-failed (stage-count=%d,"
! 			   " loop-count=", stage_count);
! 		  fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, loop_count);
! 		  fprintf (dump_file, ", trip-count=");
! 		  fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
! 		  fprintf (dump_file, ")\n");
! 		}
! 	      break;
  	    }
! 
            if (!opt_sc_p)
              {
  	      /* Rotate the partial schedule to have the branch in row ii-1.  */
*************** sms_schedule (void)
*** 1438,1443 ****
--- 1509,1521 ----
  	  
  	  set_columns_for_ps (ps);
  
+ 	  if (!schedule_reg_moves (ps))
+ 	    {
+ 	      mii = ps->ii + 1;
+ 	      free_partial_schedule (ps);
+ 	      continue;
+ 	    }
+ 
  	  canon_loop (loop);
  
            if (dump_file)
*************** sms_schedule (void)
*** 1476,1490 ****
  	  /* The life-info is not valid any more.  */
  	  df_set_bb_dirty (g->bb);
  
! 	  generate_reg_moves (ps, true);
  	  if (dump_file)
! 	    print_node_sched_params (dump_file, g->num_nodes, g);
  	  /* Generate prolog and epilog.  */
            generate_prolog_epilog (ps, loop, count_reg, count_init);
  	}
  
        free_partial_schedule (ps);
!       free (node_sched_params);
        free (node_order);
        free_ddg (g);
      }
--- 1554,1569 ----
  	  /* The life-info is not valid any more.  */
  	  df_set_bb_dirty (g->bb);
  
! 	  apply_reg_moves (ps);
  	  if (dump_file)
! 	    print_node_sched_params (dump_file, g->num_nodes, ps);
  	  /* Generate prolog and epilog.  */
            generate_prolog_epilog (ps, loop, count_reg, count_init);
+ 	  break;
  	}
  
        free_partial_schedule (ps);
!       VEC_free (node_sched_params, heap, node_sched_param_vec);
        free (node_order);
        free_ddg (g);
      }
*************** create_partial_schedule (int ii, ddg_ptr
*** 2571,2576 ****
--- 2650,2656 ----
    partial_schedule_ptr ps = XNEW (struct partial_schedule);
    ps->rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
    ps->rows_length = (int *) xcalloc (ii, sizeof (int));
+   ps->reg_moves = NULL;
    ps->ii = ii;
    ps->history = history;
    ps->min_cycle = INT_MAX;
*************** free_ps_insns (partial_schedule_ptr ps)
*** 2605,2612 ****
--- 2685,2700 ----
  static void
  free_partial_schedule (partial_schedule_ptr ps)
  {
+   ps_reg_move_info *move;
+   unsigned int i;
+ 
    if (!ps)
      return;
+ 
+   FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
+     sbitmap_free (move->uses);
+   VEC_free (ps_reg_move_info, heap, ps->reg_moves);
+ 
    free_ps_insns (ps);
    free (ps->rows);
    free (ps->rows_length);


* [4/4] Make SMS schedule register moves
  2011-08-30 12:45 [0/4] Make SMS schedule register moves Richard Sandiford
                   ` (2 preceding siblings ...)
  2011-08-30 13:12 ` [3/4] SMS: Record moves in the partial schedule Richard Sandiford
@ 2011-08-30 13:17 ` Richard Sandiford
  2011-08-30 13:22   ` Richard Guenther
  3 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 13:17 UTC (permalink / raw)
  To: gcc-patches; +Cc: zaks

This is the move-scheduling patch itself.  It should be fairly
self-explanatory.  Let me know if it isn't, and I'll try to improve
the commentary.

One potentially controversial change is to the way we handle moves
in the prologue and epilogue.  The current code uses a conservative
check to decide which stages need which moves.  This check copes
with values that are live before the loop, and emits some moves
that aren't actually needed.  The code also emits chains of moves
that can be trivially optimised through propagation.  We rely on
later patches to clean this up for us (and they do).

So, rather than keep a rather complicated check here, I've simply
emitted the moves for all stages.  In principle, that will generate
a little more dead code than now in cases where chains of two moves
are needed, but that seems to be very rare (when moves are scheduled).

Richard


gcc/
	* modulo-sched.c (extend_node_sched_params): New function.
	(print_node_sched_params): Print the stage.
	(set_columns_for_row, schedule_reg_move): New functions.
	(set_columns_for_ps): Move up file and use set_columns_for_row.
	(schedule_reg_moves): Call extend_node_sched_params and
	schedule_reg_move.  Extend size of uses bitmap.  Return false
	if a move could not be scheduled.
	(apply_reg_moves): Don't emit moves here.
	(permute_partial_schedule): Handle register moves.
	(duplicate_insns_of_cycles): Remove for_prolog.  Always emit moves.
	(generate_prolog_epilog): Update calls accordingly.

Index: gcc/modulo-sched.c
===================================================================
--- gcc/modulo-sched.c	2011-08-30 13:06:36.528669762 +0100
+++ gcc/modulo-sched.c	2011-08-30 13:22:08.029124537 +0100
@@ -220,8 +220,6 @@ static partial_schedule_ptr sms_schedule
 static void permute_partial_schedule (partial_schedule_ptr, rtx);
 static void generate_prolog_epilog (partial_schedule_ptr, struct loop *,
                                     rtx, rtx);
-static void duplicate_insns_of_cycles (partial_schedule_ptr,
-				       int, int, int, rtx);
 static int calculate_stage_count (partial_schedule_ptr, int);
 static void calculate_must_precede_follow (ddg_node_ptr, int, int,
 					   int, int, sbitmap, sbitmap, sbitmap);
@@ -458,6 +456,15 @@ set_node_sched_params (ddg_ptr g)
 			 node_sched_param_vec, g->num_nodes);
 }
 
+/* Make sure that node_sched_param_vec has an entry for every move in PS.  */
+static void
+extend_node_sched_params (partial_schedule_ptr ps)
+{
+  VEC_safe_grow_cleared (node_sched_params, heap, node_sched_param_vec,
+			 ps->g->num_nodes + VEC_length (ps_reg_move_info,
+							ps->reg_moves));
+}
+
 static void
 print_node_sched_params (FILE *file, int num_nodes, partial_schedule_ptr ps)
 {
@@ -474,6 +481,7 @@ print_node_sched_params (FILE *file, int
 	       INSN_UID (ps_rtl_insn (ps, i)));
       fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i]));
       fprintf (file, " time = %d:\n", nsp->time);
+      fprintf (file, " stage = %d:\n", nsp->stage);
       fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
       for (j = 0; j < nsp->nreg_moves; j++)
 	{
@@ -485,6 +493,168 @@ print_node_sched_params (FILE *file, int
     }
 }
 
+/* Set SCHED_COLUMN for each instruction in row ROW of PS.  */
+static void
+set_columns_for_row (partial_schedule_ptr ps, int row)
+{
+  ps_insn_ptr cur_insn;
+  int column;
+
+  column = 0;
+  for (cur_insn = ps->rows[row]; cur_insn; cur_insn = cur_insn->next_in_row)
+    SCHED_COLUMN (cur_insn->id) = column++;
+}
+
+/* Set SCHED_COLUMN for each instruction in PS.  */
+static void
+set_columns_for_ps (partial_schedule_ptr ps)
+{
+  int row;
+
+  for (row = 0; row < ps->ii; row++)
+    set_columns_for_row (ps, row);
+}
+
+/* Try to schedule the move with ps_insn identifier I_REG_MOVE in PS.
+   The source of the move is provided by I_MUST_FOLLOW, which has
+   already been scheduled.  MUST_FOLLOW is a scratch bitmap that
+   is big enough to hold I_MUST_FOLLOW.
+
+   Return true on success.  */
+static bool
+schedule_reg_move (partial_schedule_ptr ps, int i_reg_move,
+		   sbitmap must_follow, int i_must_follow)
+{
+  unsigned int u;
+  int i, start_row, end_row, this_start, this_end, this_row, latency;
+  int origin_row, origin_column;
+  sbitmap_iterator sbi;
+  ps_reg_move_info *move;
+
+  move = ps_reg_move (ps, i_reg_move);
+
+  /* The cyclic lifetime of move->new_reg starts and ends at move->def
+     (the instruction that defines move->old_reg).  We need to decide
+     where in that cyclic lifetime the move should go.  The position
+     is limited by I_MUST_FOLLOW (which defines the source of the move)
+     and the nodes in move->uses (which consume the result).
+
+     Specifically, the lowest possible row (relative to move->def)
+     is the maximum of:
+
+       a) the row in which the result of the previous iteration's
+	  I_MUST_FOLLOW becomes available
+       b) the first row in which every use of the previous iteration's
+	  move->new_reg has completed.
+
+     The highest possible row (again relative to move->def) is
+     the minimum of:
+
+       c) the row in which the current iteration's I_MUST_FOLLOW
+	  executes (and thus makes the source of the move unavailable)
+       d) the last row in which every use of this iteration's
+	  move->new_reg could be satisfied without delay.
+
+     Because all positions are relative to move->def, this function
+     uses ROW - ps->ii to represent positions that come after move->def.
+     The range includes origin_row twice: "origin_row - ps->ii" is for
+     the positions in origin_row after move->def, while "origin_row"
+     itself is for the positions before move->def.  */
+  origin_row = SCHED_ROW (move->def);
+  origin_column = SCHED_COLUMN (move->def);
+  start_row = origin_row - ps->ii;
+  end_row = origin_row;
+
+  if (dump_file)
+    fprintf (dump_file, "Scheduling register move INSN %d with cyclic lifetime"
+	     " [%d, %d]\n", INSN_UID (move->insn), start_row, end_row);
+
+  /* Limit the range to [a, c].  */
+  this_end = SCHED_ROW (i_must_follow);
+  if (this_end > origin_row
+      || (this_end == origin_row
+	  && SCHED_COLUMN (i_must_follow) > origin_column))
+    this_end -= ps->ii;
+  latency = insn_latency (ps_rtl_insn (ps, i_must_follow), move->insn);
+  this_start = this_end - ps->ii + latency;
+
+  if (dump_file)
+    fprintf (dump_file, "  -- source produced by INSN %d with latency %d,"
+	     " range [%d, %d]\n", INSN_UID (ps_rtl_insn (ps, i_must_follow)),
+	     latency, this_start, this_end);
+
+  if (start_row < this_start)
+    start_row = this_start;
+  if (end_row > this_end)
+    end_row = this_end;
+
+  /* Limit the range to [b, d].  */
+  EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, u, sbi)
+    {
+      latency = insn_latency (move->insn, ps_rtl_insn (ps, u));
+      this_start = SCHED_ROW (u);
+      if (this_start > origin_row
+	  || (this_start == origin_row
+	      && SCHED_COLUMN (u) > origin_column))
+	this_start -= ps->ii;
+      this_end = this_start + ps->ii - latency;
+
+      if (dump_file)
+	fprintf (dump_file, "  -- destination used by INSN %d with latency"
+		 " %d, range [%d, %d]\n", INSN_UID (ps_rtl_insn (ps, u)),
+		 latency, this_start, this_end);
+
+      if (start_row < this_start)
+	start_row = this_start;
+      if (end_row > this_end)
+	end_row = this_end;
+    }
+
+  sbitmap_zero (must_follow);
+  SET_BIT (must_follow, i_must_follow);
+
+  if (dump_file)
+    fprintf (dump_file, "  -- trying to schedule in rows [%d, %d]\n",
+	     start_row, end_row);
+
+  for (i = end_row; i >= start_row; i--)
+    {
+      ps_insn_ptr psi;
+
+      this_row = i < 0 ? i + ps->ii : i;
+      /* origin_row - ps->ii represents the part of row origin_row after
+	 move->def.  In this case the move must come after both move->uses
+	 and move->def.  */
+      if (i == origin_row - ps->ii)
+	{
+	  gcc_checking_assert (i == start_row);
+	  gcc_checking_assert (!TEST_BIT (move->uses, move->def));
+	  SET_BIT (move->uses, move->def);
+	  psi = ps_add_node_check_conflicts (ps, i_reg_move, this_row,
+					     move->uses, NULL);
+	  RESET_BIT (move->uses, move->def);
+	}
+      else
+	psi = ps_add_node_check_conflicts (ps, i_reg_move, this_row,
+					   move->uses, must_follow);
+
+      if (psi)
+	{
+	  SCHED_ROW (i_reg_move) = this_row;
+	  if (dump_file)
+	    fprintf (dump_file, "  -- scheduled in row %d (%d)\n",
+		     i, this_row);
+	  set_columns_for_row (ps, this_row);
+	  return true;
+	}
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "  -- no available slot\n");
+
+  return false;
+}
+
 /*
    Breaking intra-loop register anti-dependences:
    Each intra-loop register anti-dependence implies a cross-iteration true
@@ -507,9 +677,10 @@ schedule_reg_moves (partial_schedule_ptr
     {
       ddg_node_ptr u = &g->nodes[i];
       ddg_edge_ptr e;
-      int nreg_moves = 0, i_reg_move;
+      int nreg_moves = 0, i_reg_move, i_must_follow;
       rtx prev_reg, old_reg;
       int first_move;
+      sbitmap must_follow;
 
       /* Compute the number of reg_moves needed for u, by looking at life
 	 ranges started at u (excluding self-loops).  */
@@ -539,6 +710,7 @@ schedule_reg_moves (partial_schedule_ptr
       first_move = VEC_length (ps_reg_move_info, ps->reg_moves);
       VEC_safe_grow_cleared (ps_reg_move_info, heap, ps->reg_moves,
 			     first_move + nreg_moves);
+      extend_node_sched_params (ps);
 
       /* Record the moves associated with this node.  */
       first_move += ps->g->num_nodes;
@@ -552,7 +724,7 @@ schedule_reg_moves (partial_schedule_ptr
 	  ps_reg_move_info *move = ps_reg_move (ps, first_move + i_reg_move);
 
 	  move->def = i;
-	  move->uses = sbitmap_alloc (g->num_nodes);
+	  move->uses = sbitmap_alloc (first_move + nreg_moves);
 	  move->old_reg = old_reg;
 	  move->new_reg = gen_reg_rtx (GET_MODE (prev_reg));
 	  move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg));
@@ -586,6 +758,19 @@ schedule_reg_moves (partial_schedule_ptr
 		SET_BIT (move->uses, e->dest->cuid);
 	      }
 	  }
+
+      must_follow = sbitmap_alloc (first_move + nreg_moves);
+      i_must_follow = i;
+      for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
+	{
+	  if (!schedule_reg_move (ps, first_move + i_reg_move,
+				  must_follow, i_must_follow))
+	    break;
+	  i_must_follow = first_move + i_reg_move;
+	}
+      sbitmap_free (must_follow);
+      if (i_reg_move < nreg_moves)
+	return false;
     }
   return true;
 }
@@ -609,9 +794,6 @@ apply_reg_moves (partial_schedule_ptr ps
 	  df_insn_rescan (ps->g->nodes[i_use].insn);
 	}
     }
-
-  FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
-    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
 }
 
 /* Update the sched_params (time, row and stage) for node U using the II,
@@ -682,22 +864,6 @@ reset_sched_times (partial_schedule_ptr 
       }
 }
  
-/* Set SCHED_COLUMN of each node according to its position in PS.  */
-static void
-set_columns_for_ps (partial_schedule_ptr ps)
-{
-  int row;
-
-  for (row = 0; row < ps->ii; row++)
-    {
-      ps_insn_ptr cur_insn = ps->rows[row];
-      int column = 0;
-
-      for (; cur_insn; cur_insn = cur_insn->next_in_row)
-	SCHED_COLUMN (cur_insn->id) = column++;
-    }
-}
-
 /* Permute the insns according to their order in PS, from row 0 to
    row ii-1, and position them right before LAST.  This schedules
    the insns of the loop kernel.  */
@@ -714,8 +880,13 @@ permute_partial_schedule (partial_schedu
 	rtx insn = ps_rtl_insn (ps, ps_ij->id);
 
 	if (PREV_INSN (last) != insn)
-	  reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn,
-			      PREV_INSN (last));
+	  {
+	    if (ps_ij->id < ps->g->num_nodes)
+	      reorder_insns_nobb (ps_first_note (ps, ps_ij->id), insn,
+				  PREV_INSN (last));
+	    else
+	      add_insn_before (insn, last, NULL);
+	  }
       }
 }
 
@@ -919,7 +1090,7 @@ optimize_sc (partial_schedule_ptr ps, dd
 
 static void
 duplicate_insns_of_cycles (partial_schedule_ptr ps, int from_stage,
-			   int to_stage, int for_prolog, rtx count_reg)
+			   int to_stage, rtx count_reg)
 {
   int row;
   ps_insn_ptr ps_ij;
@@ -928,7 +1099,6 @@ duplicate_insns_of_cycles (partial_sched
     for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
       {
 	int u = ps_ij->id;
-	int j, i_reg_moves, i_reg_move;
 	rtx u_insn;
 
         /* Do not duplicate any insn which refers to count_reg as it
@@ -942,42 +1112,18 @@ duplicate_insns_of_cycles (partial_sched
             || JUMP_P (u_insn))
           continue;
 
-	if (for_prolog)
-	  {
-	    /* SCHED_STAGE (u) >= from_stage == 0.  Generate increasing
-	       number of reg_moves starting with the second occurrence of
-	       u, which is generated if its SCHED_STAGE <= to_stage.  */
-	    i_reg_moves = to_stage - SCHED_STAGE (u) + 1;
-	    i_reg_moves = MAX (i_reg_moves, 0);
-	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
-
-	    /* The reg_moves start from the *first* reg_move backwards.  */
-	    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1);
-	  }
-	else /* It's for the epilog.  */
-	  {
-	    /* SCHED_STAGE (u) <= to_stage.  Generate all reg_moves,
-	       starting to decrease one stage after u no longer occurs;
-	       that is, generate all reg_moves until
-	       SCHED_STAGE (u) == from_stage - 1.  */
-	    i_reg_moves = (SCHED_NREG_MOVES (u)
-			   - (from_stage - SCHED_STAGE (u) - 1));
-	    i_reg_moves = MAX (i_reg_moves, 0);
-	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
-
-	    /* The reg_moves start from the *last* reg_move forwards.  */
-	    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (SCHED_NREG_MOVES (u) - 1);
-	  }
-
-	for (j = 0; j < i_reg_moves; j++)
+	if (u < ps->g->num_nodes)
 	  {
-	    ps_reg_move_info *move = ps_reg_move (ps, i_reg_move - j);
-
-	    emit_insn (copy_rtx (PATTERN (move->insn)));
+	    if (SCHED_STAGE (u) >= from_stage
+		&& SCHED_STAGE (u) <= to_stage)
+	      duplicate_insn_chain (ps_first_note (ps, u), u_insn);
 	  }
-	if (SCHED_STAGE (u) >= from_stage
-	    && SCHED_STAGE (u) <= to_stage)
-	  duplicate_insn_chain (ps_first_note (ps, u), u_insn);
+	else
+	  /* For simplicity, we generate all moves for every stage,
+	     even though some of them will be dead.  Later optimizations
+	     will remove the dead instructions and undo some of the
+	     unnecessary renaming that moves would otherwise produce.  */
+	  emit_insn (copy_rtx (PATTERN (u_insn)));
       }
 }
 
@@ -1011,7 +1157,7 @@ generate_prolog_epilog (partial_schedule
     }
 
   for (i = 0; i < last_stage; i++)
-    duplicate_insns_of_cycles (ps, 0, i, 1, count_reg);
+    duplicate_insns_of_cycles (ps, 0, i, count_reg);
 
   /* Put the prolog on the entry edge.  */
   e = loop_preheader_edge (loop);
@@ -1023,7 +1169,7 @@ generate_prolog_epilog (partial_schedule
   start_sequence ();
 
   for (i = 0; i < last_stage; i++)
-    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg);
+    duplicate_insns_of_cycles (ps, i + 1, last_stage, count_reg);
 
   /* Put the epilogue on the exit edge.  */
   gcc_assert (single_exit (loop));


* Re: [4/4] Make SMS schedule register moves
  2011-08-30 13:17 ` [4/4] Make SMS schedule register moves Richard Sandiford
@ 2011-08-30 13:22   ` Richard Guenther
  2011-08-30 13:43     ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-30 13:22 UTC (permalink / raw)
  To: gcc-patches, zaks, richard.sandiford

On Tue, Aug 30, 2011 at 2:29 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This is the move-scheduling patch itself.  It should be fairly
> self-explanatory.  Let me know if it isn't, and I'll try to improve
> the commentary.

Can you add some testcases?

Thanks,
Richard.

>       }
>  }
>
> @@ -1011,7 +1157,7 @@ generate_prolog_epilog (partial_schedule
>     }
>
>   for (i = 0; i < last_stage; i++)
> -    duplicate_insns_of_cycles (ps, 0, i, 1, count_reg);
> +    duplicate_insns_of_cycles (ps, 0, i, count_reg);
>
>   /* Put the prolog on the entry edge.  */
>   e = loop_preheader_edge (loop);
> @@ -1023,7 +1169,7 @@ generate_prolog_epilog (partial_schedule
>   start_sequence ();
>
>   for (i = 0; i < last_stage; i++)
> -    duplicate_insns_of_cycles (ps, i + 1, last_stage, 0, count_reg);
> +    duplicate_insns_of_cycles (ps, i + 1, last_stage, count_reg);
>
>   /* Put the epilogue on the exit edge.  */
>   gcc_assert (single_exit (loop));
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [4/4] Make SMS schedule register moves
  2011-08-30 13:22   ` Richard Guenther
@ 2011-08-30 13:43     ` Richard Sandiford
  2011-08-30 13:50       ` Richard Guenther
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 13:43 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, zaks

Richard Guenther <richard.guenther@gmail.com> writes:
> On Tue, Aug 30, 2011 at 2:29 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This is the move-scheduling patch itself.  It should be fairly
>> self-explanatory.  Let me know if it isn't, and I'll try to improve
>> the commentary.
>
> Can you add some testcases?

I don't think it's an easy thing to test for.  E.g. with the resampling
loop in the covering message, there might well be a reasonable ii=27
schedule that doesn't need as many moves.

Richard


* Re: [4/4] Make SMS schedule register moves
  2011-08-30 13:43     ` Richard Sandiford
@ 2011-08-30 13:50       ` Richard Guenther
  2011-08-30 13:57         ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-30 13:50 UTC (permalink / raw)
  To: Richard Guenther, gcc-patches, zaks, richard.sandiford

On Tue, Aug 30, 2011 at 2:43 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Guenther <richard.guenther@gmail.com> writes:
>> On Tue, Aug 30, 2011 at 2:29 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This is the move-scheduling patch itself.  It should be fairly
>>> self-explanatory.  Let me know if it isn't, and I'll try to improve
>>> the commentary.
>>
>> Can you add some testcases?
>
> I don't think it's an easy thing to test for.  E.g. with the resampling
> loop in the covering message, there might well be a reasonable ii=27
> schedule that doesn't need as many moves.

So, does it only improve code generation or does it expose more
loops as SMS candidates?  If the former, ok ...

Richard.

> Richard
>


* Re: [4/4] Make SMS schedule register moves
  2011-08-30 13:50       ` Richard Guenther
@ 2011-08-30 13:57         ` Richard Sandiford
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2011-08-30 13:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, zaks

Richard Guenther <richard.guenther@gmail.com> writes:
> On Tue, Aug 30, 2011 at 2:43 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Guenther <richard.guenther@gmail.com> writes:
>>> On Tue, Aug 30, 2011 at 2:29 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This is the move-scheduling patch itself.  It should be fairly
>>>> self-explanatory.  Let me know if it isn't, and I'll try to improve
>>>> the commentary.
>>>
>>> Can you add some testcases?
>>
>> I don't think it's an easy thing to test for.  E.g. with the resampling
>> loop in the covering message, there might well be a reasonable ii=27
>> schedule that doesn't need as many moves.
>
> So, does it only improve code generation or does it expose more
> loops as SMS candidates?  If the former, ok ...

Yeah, it just improves code generation.  The code I'm adding only kicks
in once we've found a "successful" SMS schedule.  At the moment, once
we've found that kind of schedule, we always emit it, regardless of how
many moves it needs.

The new code instead allows the schedule to be rejected.  So from that
point of view, it reduces the number of SMS candidates (but hopefully
only in cases where we would otherwise generate worse code).

Richard


* Re: [3/4] SMS: Record moves in the partial schedule
  2011-10-03 20:25         ` Ayal Zaks
@ 2011-10-04  8:06           ` Richard Sandiford
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2011-10-04  8:06 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: gcc-patches

Ayal Zaks <ayal.zaks@gmail.com> writes:
> On Wed, Sep 28, 2011 at 4:53 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Ayal Zaks <ayal.zaks@gmail.com> writes:
>>>>> Only request is to document that the register moves are
>>>>> placed/assigned-id's in a specific order.
>>>>
>>>>I suppose this is the downside of splitting the patches up, sorry,
>>>>but the ids are only ordered for the throw-away loop:
>>>>
>>>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>>>>
>>>>and for the prologue/epilogue code.  Both are replaced in patch 4
>>>>with code that doesn't rely on the ordering of ids.
>>> Ok then, very well. I was mostly referring to the following in
>>> prologue/epilogue code, which merely relies on assigning all regmoves
>>> of a node consecutive id's:
>>
>> FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
>> & nreg_moves fields.
>>
>> Here's a slightly updated patch in line with your 4/4 comments.
>> The move->def is now always the id of the predecessor, rather than
>> the id of the original ddg node producer.  I've adapted the throw-away
>> loop accordingly, but there are no other changes.
>>
>
> This is ok.

Thanks.

> Just to make sure I follow, the changes were (for this patch):
>
> 1. setting up a different move->def for each move
>
>> +         move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i;
>
> instead of the same original def for all
>
>> +      move->def = i;
>
> 2. inserting each move right before its own def, bottom-up:
>
>> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
>> +    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>
> instead of inserting each move right before the original def, top-down:
>
>>>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

Yeah, those were the ones.

Richard


* Re: [3/4] SMS: Record moves in the partial schedule
  2011-09-28 16:16       ` Richard Sandiford
@ 2011-10-03 20:25         ` Ayal Zaks
  2011-10-04  8:06           ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Ayal Zaks @ 2011-10-03 20:25 UTC (permalink / raw)
  To: Ayal Zaks, gcc-patches, richard.sandiford

On Wed, Sep 28, 2011 at 4:53 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Ayal Zaks <ayal.zaks@gmail.com> writes:
>>>> Only request is to document that the register moves are
>>>> placed/assigned-id's in a specific order.
>>>
>>>I suppose this is the downside of splitting the patches up, sorry,
>>>but the ids are only ordered for the throw-away loop:
>>>
>>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>>>
>>>and for the prologue/epilogue code.  Both are replaced in patch 4
>>>with code that doesn't rely on the ordering of ids.
>> Ok then, very well. I was mostly referring to the following in
>> prologue/epiloque code, which merely relies on assigning all regmoves
>> of a node consecutive id's:
>
> FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
> & nreg_moves fields.
>
> Here's a slightly updated patch in line with your 4/4 comments.
> The move->def is now always the id of the predecessor, rather than
> the id of the original ddg node producer.  I've adapted the throw-away
> loop accordingly, but there are no other changes.
>

This is ok.

Just to make sure I follow, the changes were (for this patch):

1. setting up a different move->def for each move

> +         move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i;

instead of the same original def for all

> +      move->def = i;

2. inserting each move right before its own def, bottom-up:

> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
> +    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

instead of inserting each move right before the original def, top-down:

>>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

Thanks,
Ayal.


> Tested on powerpc64-linux-gnu and with ARM benchmarks.
>
> Richard
>
> gcc/
>        * modulo-sched.c (ps_insn): Adjust comment.
>        (ps_reg_move_info): New structure.
>        (partial_schedule): Add reg_moves field.
>        (SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params.
>        (node_sched_params): Turn first_reg_move into an identifier.
>        (ps_reg_move): New function.
>        (ps_rtl_insn): Cope with register moves.
>        (ps_first_note): Adjust comment and assert that the instruction
>        isn't a register move.
>        (node_sched_params): Replace with...
>        (node_sched_param_vec): ...this vector.
>        (set_node_sched_params): Adjust accordingly.
>        (print_node_sched_params): Take a partial schedule instead of a ddg.
>        Use ps_rtl_insn and ps_reg_move.
>        (generate_reg_moves): Rename to...
>        (schedule_reg_moves): ...this.  Remove rescan parameter.  Record each
>        move in the partial schedule, but don't emit it here.  Don't perform
>        register substitutions here either.
>        (apply_reg_moves): New function.
>        (duplicate_insns_of_cycles): Use register indices directly,
>        rather than finding instructions using PREV_INSN.  Use ps_reg_move.
>        (sms_schedule): Call schedule_reg_moves before committing to
>        a partial schedule.   Try the next ii if the schedule fails.
>        Use apply_reg_moves instead of generate_reg_moves.  Adjust
>        call to print_node_sched_params.  Free node_sched_param_vec
>        instead of node_sched_params.
>        (create_partial_schedule): Initialize reg_moves.
>        (free_partial_schedule): Free reg_moves.
>
> Index: gcc/modulo-sched.c
> ===================================================================
> --- gcc/modulo-sched.c  2011-09-28 09:03:10.334301485 +0100
> +++ gcc/modulo-sched.c  2011-09-28 11:24:26.530300781 +0100
> @@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc
>  /* A single instruction in the partial schedule.  */
>  struct ps_insn
>  {
> -  /* The number of the ddg node whose instruction is being scheduled.  */
> +  /* Identifies the instruction to be scheduled.  Values smaller than
> +     the ddg's num_nodes refer directly to ddg nodes.  A value of
> +     X - num_nodes refers to register move X.  */
>   int id;
>
>   /* The (absolute) cycle in which the PS instruction is scheduled.
> @@ -137,6 +139,30 @@ struct ps_insn
>
>  };
>
> +/* Information about a register move that has been added to a partial
> +   schedule.  */
> +struct ps_reg_move_info
> +{
> +  /* The source of the move is defined by the ps_insn with id DEF.
> +     The destination is used by the ps_insns with the ids in USES.  */
> +  int def;
> +  sbitmap uses;
> +
> +  /* The original form of USES' instructions used OLD_REG, but they
> +     should now use NEW_REG.  */
> +  rtx old_reg;
> +  rtx new_reg;
> +
> +  /* An instruction that sets NEW_REG to the correct value.  The first
> +     move associated with DEF will have an rhs of OLD_REG; later moves
> +     use the result of the previous move.  */
> +  rtx insn;
> +};
> +
> +typedef struct ps_reg_move_info ps_reg_move_info;
> +DEF_VEC_O (ps_reg_move_info);
> +DEF_VEC_ALLOC_O (ps_reg_move_info, heap);
> +
>  /* Holds the partial schedule as an array of II rows.  Each entry of the
>    array points to a linked list of PS_INSNs, which represents the
>    instructions that are scheduled for that row.  */
> @@ -148,6 +174,10 @@ struct partial_schedule
>   /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
>   ps_insn_ptr *rows;
>
> +  /* All the moves added for this partial schedule.  Index X has
> +     a ps_insn id of X + g->num_nodes.  */
> +  VEC (ps_reg_move_info, heap) *reg_moves;
> +
>   /*  rows_length[i] holds the number of instructions in the row.
>       It is used only (as an optimization) to back off quickly from
>       trying to schedule a node in a full row; that is, to avoid running
> @@ -201,7 +231,7 @@ static void remove_node_from_ps (partial
>
>  #define NODE_ASAP(node) ((node)->aux.count)
>
> -#define SCHED_PARAMS(x) (&node_sched_params[x])
> +#define SCHED_PARAMS(x) VEC_index (node_sched_params, node_sched_param_vec, x)
>  #define SCHED_TIME(x) (SCHED_PARAMS (x)->time)
>  #define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move)
>  #define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves)
> @@ -214,14 +244,13 @@ typedef struct node_sched_params
>  {
>   int time;    /* The absolute scheduling cycle.  */
>
> -  /* The following field (first_reg_move) is a pointer to the first
> +  /* The following field (first_reg_move) is the ps_insn id of the first
>      register-move instruction added to handle the modulo-variable-expansion
>      of the register defined by this node.  This register-move copies the
>      original register defined by the node.  */
> -  rtx first_reg_move;
> +  int first_reg_move;
>
> -  /* The number of register-move instructions added, immediately preceding
> -     first_reg_move.  */
> +  /* The number of register-move instructions added.  */
>   int nreg_moves;
>
>   int row;    /* Holds time % ii.  */
> @@ -232,6 +261,9 @@ typedef struct node_sched_params
>   int column;
>  } *node_sched_params_ptr;
>
> +typedef struct node_sched_params node_sched_params;
> +DEF_VEC_O (node_sched_params);
> +DEF_VEC_ALLOC_O (node_sched_params, heap);
>
>  /* The following three functions are copied from the current scheduler
>    code in order to use sched_analyze() for computing the dependencies.
> @@ -280,20 +312,35 @@ static struct haifa_sched_info sms_sched
>   0
>  };
>
> +/* Partial schedule instruction ID in PS is a register move.  Return
> +   information about it.  */
> +static struct ps_reg_move_info *
> +ps_reg_move (partial_schedule_ptr ps, int id)
> +{
> +  gcc_checking_assert (id >= ps->g->num_nodes);
> +  return VEC_index (ps_reg_move_info, ps->reg_moves, id - ps->g->num_nodes);
> +}
> +
>  /* Return the rtl instruction that is being scheduled by partial schedule
>    instruction ID, which belongs to schedule PS.  */
>  static rtx
>  ps_rtl_insn (partial_schedule_ptr ps, int id)
>  {
> -  return ps->g->nodes[id].insn;
> +  if (id < ps->g->num_nodes)
> +    return ps->g->nodes[id].insn;
> +  else
> +    return ps_reg_move (ps, id)->insn;
>  }
>
> -/* Return the first instruction in the original (unscheduled) loop that
> -   was associated with ps_rtl_insn (PS, ID).  If the instruction had
> -   some notes before it, this is the first of those notes.  */
> +/* Partial schedule instruction ID, which belongs to PS, occurred in
> +   the original (unscheduled) loop.  Return the first instruction
> +   in the loop that was associated with ps_rtl_insn (PS, ID).
> +   If the instruction had some notes before it, this is the first
> +   of those notes.  */
>  static rtx
>  ps_first_note (partial_schedule_ptr ps, int id)
>  {
> +  gcc_assert (id < ps->g->num_nodes);
>   return ps->g->nodes[id].first_note;
>  }
>
> @@ -397,18 +444,20 @@ res_MII (ddg_ptr g)
>  }
>
>
> -/* Points to the array that contains the sched data for each node.  */
> -static node_sched_params_ptr node_sched_params;
> +/* A vector that contains the sched data for each ps_insn.  */
> +static VEC (node_sched_params, heap) *node_sched_param_vec;
>
>  /* Allocate sched_params for each node and initialize it.  */
>  static void
>  set_node_sched_params (ddg_ptr g)
>  {
> -  node_sched_params = XCNEWVEC (struct node_sched_params, g->num_nodes);
> +  VEC_truncate (node_sched_params, node_sched_param_vec, 0);
> +  VEC_safe_grow_cleared (node_sched_params, heap,
> +                        node_sched_param_vec, g->num_nodes);
>  }
>
>  static void
> -print_node_sched_params (FILE *file, int num_nodes, ddg_ptr g)
> +print_node_sched_params (FILE *file, int num_nodes, partial_schedule_ptr ps)
>  {
>   int i;
>
> @@ -417,19 +466,19 @@ print_node_sched_params (FILE *file, int
>   for (i = 0; i < num_nodes; i++)
>     {
>       node_sched_params_ptr nsp = SCHED_PARAMS (i);
> -      rtx reg_move = nsp->first_reg_move;
>       int j;
>
>       fprintf (file, "Node = %d; INSN = %d\n", i,
> -              (INSN_UID (g->nodes[i].insn)));
> -      fprintf (file, " asap = %d:\n", NODE_ASAP (&g->nodes[i]));
> +              INSN_UID (ps_rtl_insn (ps, i)));
> +      fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i]));
>       fprintf (file, " time = %d:\n", nsp->time);
>       fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
>       for (j = 0; j < nsp->nreg_moves; j++)
>        {
> +         ps_reg_move_info *move = ps_reg_move (ps, nsp->first_reg_move + j);
> +
>          fprintf (file, " reg_move = ");
> -         print_rtl_single (file, reg_move);
> -         reg_move = PREV_INSN (reg_move);
> +         print_rtl_single (file, move->insn);
>        }
>     }
>  }
> @@ -445,8 +494,8 @@ print_node_sched_params (FILE *file, int
>    nreg_moves = ----------------------------------- + 1 - {   dependence.
>                             ii                          { 1 if not.
>  */
> -static void
> -generate_reg_moves (partial_schedule_ptr ps, bool rescan)
> +static bool
> +schedule_reg_moves (partial_schedule_ptr ps)
>  {
>   ddg_ptr g = ps->g;
>   int ii = ps->ii;
> @@ -457,9 +506,8 @@ generate_reg_moves (partial_schedule_ptr
>       ddg_node_ptr u = &g->nodes[i];
>       ddg_edge_ptr e;
>       int nreg_moves = 0, i_reg_move;
> -      sbitmap *uses_of_defs;
> -      rtx last_reg_move;
>       rtx prev_reg, old_reg;
> +      int first_move;
>
>       /* Compute the number of reg_moves needed for u, by looking at life
>         ranges started at u (excluding self-loops).  */
> @@ -485,12 +533,35 @@ generate_reg_moves (partial_schedule_ptr
>       if (nreg_moves == 0)
>        continue;
>
> +      /* Create NREG_MOVES register moves.  */
> +      first_move = VEC_length (ps_reg_move_info, ps->reg_moves);
> +      VEC_safe_grow_cleared (ps_reg_move_info, heap, ps->reg_moves,
> +                            first_move + nreg_moves);
> +
> +      /* Record the moves associated with this node.  */
> +      first_move += ps->g->num_nodes;
> +      SCHED_FIRST_REG_MOVE (i) = first_move;
> +      SCHED_NREG_MOVES (i) = nreg_moves;
> +
> +      /* Generate each move.  */
> +      old_reg = prev_reg = SET_DEST (single_set (u->insn));
> +      for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
> +       {
> +         ps_reg_move_info *move = ps_reg_move (ps, first_move + i_reg_move);
> +
> +         move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i;
> +         move->uses = sbitmap_alloc (g->num_nodes);
> +         move->old_reg = old_reg;
> +         move->new_reg = gen_reg_rtx (GET_MODE (prev_reg));
> +         move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg));
> +         sbitmap_zero (move->uses);
> +
> +         prev_reg = move->new_reg;
> +       }
> +
>       /* Every use of the register defined by node may require a different
>         copy of this register, depending on the time the use is scheduled.
> -        Set a bitmap vector, telling which nodes use each copy of this
> -        register.  */
> -      uses_of_defs = sbitmap_vector_alloc (nreg_moves, g->num_nodes);
> -      sbitmap_vector_zero (uses_of_defs, nreg_moves);
> +        Record which uses require which move results.  */
>       for (e = u->out; e; e = e->next_out)
>        if (e->type == TRUE_DEP && e->dest != e->src)
>          {
> @@ -506,40 +577,39 @@ generate_reg_moves (partial_schedule_ptr
>              dest_copy--;
>
>            if (dest_copy)
> -             SET_BIT (uses_of_defs[dest_copy - 1], e->dest->cuid);
> -         }
> -
> -      /* Now generate the reg_moves, attaching relevant uses to them.  */
> -      SCHED_NREG_MOVES (i) = nreg_moves;
> -      old_reg = prev_reg = copy_rtx (SET_DEST (single_set (u->insn)));
> -      /* Insert the reg-moves right before the notes which precede
> -         the insn they relates to.  */
> -      last_reg_move = u->first_note;
> -
> -      for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
> -       {
> -         unsigned int i_use = 0;
> -         rtx new_reg = gen_reg_rtx (GET_MODE (prev_reg));
> -         rtx reg_move = gen_move_insn (new_reg, prev_reg);
> -         sbitmap_iterator sbi;
> +             {
> +               ps_reg_move_info *move;
>
> -         add_insn_before (reg_move, last_reg_move, NULL);
> -         last_reg_move = reg_move;
> +               move = ps_reg_move (ps, first_move + dest_copy - 1);
> +               SET_BIT (move->uses, e->dest->cuid);
> +             }
> +         }
> +    }
> +  return true;
> +}
>
> -         if (!SCHED_FIRST_REG_MOVE (i))
> -           SCHED_FIRST_REG_MOVE (i) = reg_move;
> +/* Emit the moves associated with PS.  Apply the substitutions
> +   associated with them.  */
> +static void
> +apply_reg_moves (partial_schedule_ptr ps)
> +{
> +  ps_reg_move_info *move;
> +  int i;
>
> -         EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi)
> -           {
> -             replace_rtx (g->nodes[i_use].insn, old_reg, new_reg);
> -             if (rescan)
> -               df_insn_rescan (g->nodes[i_use].insn);
> -           }
> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
> +    {
> +      unsigned int i_use;
> +      sbitmap_iterator sbi;
>
> -         prev_reg = new_reg;
> +      EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, i_use, sbi)
> +       {
> +         replace_rtx (ps->g->nodes[i_use].insn, move->old_reg, move->new_reg);
> +         df_insn_rescan (ps->g->nodes[i_use].insn);
>        }
> -      sbitmap_vector_free (uses_of_defs);
>     }
> +
> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
> +    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>  }
>
>  /* Update the sched_params (time, row and stage) for node U using the II,
> @@ -855,8 +925,7 @@ duplicate_insns_of_cycles (partial_sched
>     for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
>       {
>        int u = ps_ij->id;
> -       int j, i_reg_moves;
> -       rtx reg_move = NULL_RTX;
> +       int j, i_reg_moves, i_reg_move;
>        rtx u_insn;
>
>         /* Do not duplicate any insn which refers to count_reg as it
> @@ -880,12 +949,7 @@ duplicate_insns_of_cycles (partial_sched
>            i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
>
>            /* The reg_moves start from the *first* reg_move backwards.  */
> -           if (i_reg_moves)
> -             {
> -               reg_move = SCHED_FIRST_REG_MOVE (u);
> -               for (j = 1; j < i_reg_moves; j++)
> -                 reg_move = PREV_INSN (reg_move);
> -             }
> +           i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1);
>          }
>        else /* It's for the epilog.  */
>          {
> @@ -899,16 +963,15 @@ duplicate_insns_of_cycles (partial_sched
>            i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
>
>            /* The reg_moves start from the *last* reg_move forwards.  */
> -           if (i_reg_moves)
> -             {
> -               reg_move = SCHED_FIRST_REG_MOVE (u);
> -               for (j = 1; j < SCHED_NREG_MOVES (u); j++)
> -                 reg_move = PREV_INSN (reg_move);
> -             }
> +           i_reg_move = SCHED_FIRST_REG_MOVE (u) + (SCHED_NREG_MOVES (u) - 1);
>          }
>
> -       for (j = 0; j < i_reg_moves; j++, reg_move = NEXT_INSN (reg_move))
> -         emit_insn (copy_rtx (PATTERN (reg_move)));
> +       for (j = 0; j < i_reg_moves; j++)
> +         {
> +           ps_reg_move_info *move = ps_reg_move (ps, i_reg_move - j);
> +
> +           emit_insn (copy_rtx (PATTERN (move->insn)));
> +         }
>        if (SCHED_STAGE (u) >= from_stage
>            && SCHED_STAGE (u) <= to_stage)
>          duplicate_insn_chain (ps_first_note (ps, u), u_insn);
> @@ -1299,9 +1362,9 @@ sms_schedule (void)
>       rtx head, tail;
>       rtx count_reg, count_init;
>       int mii, rec_mii;
> -      unsigned stage_count = 0;
> +      unsigned stage_count;
>       HOST_WIDEST_INT loop_count = 0;
> -      bool opt_sc_p = false;
> +      bool opt_sc_p;
>
>       if (! (g = g_arr[loop->num]))
>         continue;
> @@ -1378,54 +1441,60 @@ sms_schedule (void)
>        fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
>                 rec_mii, mii, maxii);
>
> -      set_node_sched_params (g);
> -
> -      ps = sms_schedule_by_order (g, mii, maxii, node_order);
> -
> -      if (ps)
> +      for (;;)
>        {
> -         /* Try to achieve optimized SC by normalizing the partial
> -            schedule (having the cycles start from cycle zero).
> -            The branch location must be placed in row ii-1 in the
> -            final scheduling.  If failed, shift all instructions to
> -            position the branch in row ii-1.  */
> -         opt_sc_p = optimize_sc (ps, g);
> -         if (opt_sc_p)
> -           stage_count = calculate_stage_count (ps, 0);
> -         else
> +         set_node_sched_params (g);
> +
> +         stage_count = 0;
> +         opt_sc_p = false;
> +         ps = sms_schedule_by_order (g, mii, maxii, node_order);
> +
> +         if (ps)
>            {
> -             /* Bring the branch to cycle ii-1.  */
> -             int amount = SCHED_TIME (g->closing_branch->cuid) - (ps->ii - 1);
> -
> -             if (dump_file)
> -               fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
> -
> -             stage_count = calculate_stage_count (ps, amount);
> +             /* Try to achieve optimized SC by normalizing the partial
> +                schedule (having the cycles start from cycle zero).
> +                The branch location must be placed in row ii-1 in the
> +                final scheduling.      If failed, shift all instructions to
> +                position the branch in row ii-1.  */
> +             opt_sc_p = optimize_sc (ps, g);
> +             if (opt_sc_p)
> +               stage_count = calculate_stage_count (ps, 0);
> +             else
> +               {
> +                 /* Bring the branch to cycle ii-1.  */
> +                 int amount = (SCHED_TIME (g->closing_branch->cuid)
> +                               - (ps->ii - 1));
> +
> +                 if (dump_file)
> +                   fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
> +
> +                 stage_count = calculate_stage_count (ps, amount);
> +               }
> +
> +             gcc_assert (stage_count >= 1);
> +             PS_STAGE_COUNT (ps) = stage_count;
>            }
> -
> -         gcc_assert (stage_count >= 1);
> -         PS_STAGE_COUNT (ps) = stage_count;
> -       }
> -
> -      /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
> -        1 means that there is no interleaving between iterations thus
> -        we let the scheduling passes do the job in this case.  */
> -      if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
> -         || (count_init && (loop_count <= stage_count))
> -         || (flag_branch_probabilities && (trip_count <= stage_count)))
> -       {
> -         if (dump_file)
> +
> +         /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
> +            1 means that there is no interleaving between iterations thus
> +            we let the scheduling passes do the job in this case.  */
> +         if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
> +             || (count_init && (loop_count <= stage_count))
> +             || (flag_branch_probabilities && (trip_count <= stage_count)))
>            {
> -             fprintf (dump_file, "SMS failed... \n");
> -             fprintf (dump_file, "SMS sched-failed (stage-count=%d, loop-count=", stage_count);
> -             fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, loop_count);
> -             fprintf (dump_file, ", trip-count=");
> -             fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
> -             fprintf (dump_file, ")\n");
> +             if (dump_file)
> +               {
> +                 fprintf (dump_file, "SMS failed... \n");
> +                 fprintf (dump_file, "SMS sched-failed (stage-count=%d,"
> +                          " loop-count=", stage_count);
> +                 fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, loop_count);
> +                 fprintf (dump_file, ", trip-count=");
> +                 fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
> +                 fprintf (dump_file, ")\n");
> +               }
> +             break;
>            }
> -       }
> -      else
> -       {
> +
>           if (!opt_sc_p)
>             {
>              /* Rotate the partial schedule to have the branch in row ii-1.  */
> @@ -1437,6 +1506,13 @@ sms_schedule (void)
>
>          set_columns_for_ps (ps);
>
> +         if (!schedule_reg_moves (ps))
> +           {
> +             mii = ps->ii + 1;
> +             free_partial_schedule (ps);
> +             continue;
> +           }
> +
>          canon_loop (loop);
>
>           if (dump_file)
> @@ -1475,15 +1551,16 @@ sms_schedule (void)
>          /* The life-info is not valid any more.  */
>          df_set_bb_dirty (g->bb);
>
> -         generate_reg_moves (ps, true);
> +         apply_reg_moves (ps);
>          if (dump_file)
> -           print_node_sched_params (dump_file, g->num_nodes, g);
> +           print_node_sched_params (dump_file, g->num_nodes, ps);
>          /* Generate prolog and epilog.  */
>           generate_prolog_epilog (ps, loop, count_reg, count_init);
> +         break;
>        }
>
>       free_partial_schedule (ps);
> -      free (node_sched_params);
> +      VEC_free (node_sched_params, heap, node_sched_param_vec);
>       free (node_order);
>       free_ddg (g);
>     }
> @@ -2570,6 +2647,7 @@ create_partial_schedule (int ii, ddg_ptr
>   partial_schedule_ptr ps = XNEW (struct partial_schedule);
>   ps->rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
>   ps->rows_length = (int *) xcalloc (ii, sizeof (int));
> +  ps->reg_moves = NULL;
>   ps->ii = ii;
>   ps->history = history;
>   ps->min_cycle = INT_MAX;
> @@ -2604,8 +2682,16 @@ free_ps_insns (partial_schedule_ptr ps)
>  static void
>  free_partial_schedule (partial_schedule_ptr ps)
>  {
> +  ps_reg_move_info *move;
> +  unsigned int i;
> +
>   if (!ps)
>     return;
> +
> +  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
> +    sbitmap_free (move->uses);
> +  VEC_free (ps_reg_move_info, heap, ps->reg_moves);
> +
>   free_ps_insns (ps);
>   free (ps->rows);
>   free (ps->rows_length);
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [3/4] SMS: Record moves in the partial schedule
  2011-09-22 16:48     ` Ayal Zaks
@ 2011-09-28 16:16       ` Richard Sandiford
  2011-10-03 20:25         ` Ayal Zaks
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2011-09-28 16:16 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: gcc-patches

Ayal Zaks <ayal.zaks@gmail.com> writes:
>>> Only request is to document that the register moves are
>>> placed/assigned-id's in a specific order.
>>
>>I suppose this is the downside of splitting the patches up, sorry,
>>but the ids are only ordered for the throw-away loop:
>>
>> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>>
>>and for the prologue/epilogue code.  Both are replaced in patch 4
>>with code that doesn't rely on the ordering of ids.
> Ok then, very well. I was mostly referring to the following in
> prologue/epilogue code, which merely relies on assigning all regmoves
> of a node consecutive id's:

FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move
& nreg_moves fields.

Here's a slightly updated patch in line with your 4/4 comments.
The move->def is now always the id of the predecessor, rather than
the id of the original ddg node producer.  I've adapted the throw-away
loop accordingly, but there are no other changes.

Tested on powerpc64-linux-gnu and with ARM benchmarks.

Richard

gcc/
	* modulo-sched.c (ps_insn): Adjust comment.
	(ps_reg_move_info): New structure.
	(partial_schedule): Add reg_moves field.
	(SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params.
	(node_sched_params): Turn first_reg_move into an identifier.
	(ps_reg_move): New function.
	(ps_rtl_insn): Cope with register moves.
	(ps_first_note): Adjust comment and assert that the instruction
	isn't a register move.
	(node_sched_params): Replace with...
	(node_sched_param_vec): ...this vector.
	(set_node_sched_params): Adjust accordingly.
	(print_node_sched_params): Take a partial schedule instead of a ddg.
	Use ps_rtl_insn and ps_reg_move.
	(generate_reg_moves): Rename to...
	(schedule_reg_moves): ...this.  Remove rescan parameter.  Record each
	move in the partial schedule, but don't emit it here.  Don't perform
	register substitutions here either.
	(apply_reg_moves): New function.
	(duplicate_insns_of_cycles): Use register indices directly,
	rather than finding instructions using PREV_INSN.  Use ps_reg_move.
	(sms_schedule): Call schedule_reg_moves before committing to
	a partial schedule.  Try the next ii if the schedule fails.
	Use apply_reg_moves instead of generate_reg_moves.  Adjust
	call to print_node_sched_params.  Free node_sched_param_vec
	instead of node_sched_params.
	(create_partial_schedule): Initialize reg_moves.
	(free_partial_schedule): Free reg_moves.

Index: gcc/modulo-sched.c
===================================================================
--- gcc/modulo-sched.c	2011-09-28 09:03:10.334301485 +0100
+++ gcc/modulo-sched.c	2011-09-28 11:24:26.530300781 +0100
@@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc
 /* A single instruction in the partial schedule.  */
 struct ps_insn
 {
-  /* The number of the ddg node whose instruction is being scheduled.  */
+  /* Identifies the instruction to be scheduled.  Values smaller than
+     the ddg's num_nodes refer directly to ddg nodes.  A value of
+     X - num_nodes refers to register move X.  */
   int id;
 
   /* The (absolute) cycle in which the PS instruction is scheduled.
@@ -137,6 +139,30 @@ struct ps_insn
 
 };
 
+/* Information about a register move that has been added to a partial
+   schedule.  */
+struct ps_reg_move_info
+{
+  /* The source of the move is defined by the ps_insn with id DEF.
+     The destination is used by the ps_insns with the ids in USES.  */
+  int def;
+  sbitmap uses;
+
+  /* The original form of USES' instructions used OLD_REG, but they
+     should now use NEW_REG.  */
+  rtx old_reg;
+  rtx new_reg;
+
+  /* An instruction that sets NEW_REG to the correct value.  The first
+     move associated with DEF will have an rhs of OLD_REG; later moves
+     use the result of the previous move.  */
+  rtx insn;
+};
+
+typedef struct ps_reg_move_info ps_reg_move_info;
+DEF_VEC_O (ps_reg_move_info);
+DEF_VEC_ALLOC_O (ps_reg_move_info, heap);
+
 /* Holds the partial schedule as an array of II rows.  Each entry of the
    array points to a linked list of PS_INSNs, which represents the
    instructions that are scheduled for that row.  */
@@ -148,6 +174,10 @@ struct partial_schedule
   /* rows[i] points to linked list of insns scheduled in row i (0<=i<ii).  */
   ps_insn_ptr *rows;
 
+  /* All the moves added for this partial schedule.  Index X has
+     a ps_insn id of X + g->num_nodes.  */
+  VEC (ps_reg_move_info, heap) *reg_moves;
+
   /*  rows_length[i] holds the number of instructions in the row.
       It is used only (as an optimization) to back off quickly from
       trying to schedule a node in a full row; that is, to avoid running
@@ -201,7 +231,7 @@ static void remove_node_from_ps (partial
 
 #define NODE_ASAP(node) ((node)->aux.count)
 
-#define SCHED_PARAMS(x) (&node_sched_params[x])
+#define SCHED_PARAMS(x) VEC_index (node_sched_params, node_sched_param_vec, x)
 #define SCHED_TIME(x) (SCHED_PARAMS (x)->time)
 #define SCHED_FIRST_REG_MOVE(x) (SCHED_PARAMS (x)->first_reg_move)
 #define SCHED_NREG_MOVES(x) (SCHED_PARAMS (x)->nreg_moves)
@@ -214,14 +244,13 @@ typedef struct node_sched_params
 {
   int time;	/* The absolute scheduling cycle.  */
 
-  /* The following field (first_reg_move) is a pointer to the first
+  /* The following field (first_reg_move) is the ps_insn id of the first
      register-move instruction added to handle the modulo-variable-expansion
      of the register defined by this node.  This register-move copies the
      original register defined by the node.  */
-  rtx first_reg_move;
+  int first_reg_move;
 
-  /* The number of register-move instructions added, immediately preceding
-     first_reg_move.  */
+  /* The number of register-move instructions added.  */
   int nreg_moves;
 
   int row;    /* Holds time % ii.  */
@@ -232,6 +261,9 @@ typedef struct node_sched_params
   int column;
 } *node_sched_params_ptr;
 
+typedef struct node_sched_params node_sched_params;
+DEF_VEC_O (node_sched_params);
+DEF_VEC_ALLOC_O (node_sched_params, heap);
 \f
 /* The following three functions are copied from the current scheduler
    code in order to use sched_analyze() for computing the dependencies.
@@ -280,20 +312,35 @@ static struct haifa_sched_info sms_sched
   0
 };
 
+/* Partial schedule instruction ID in PS is a register move.  Return
+   information about it.  */
+static struct ps_reg_move_info *
+ps_reg_move (partial_schedule_ptr ps, int id)
+{
+  gcc_checking_assert (id >= ps->g->num_nodes);
+  return VEC_index (ps_reg_move_info, ps->reg_moves, id - ps->g->num_nodes);
+}
+
 /* Return the rtl instruction that is being scheduled by partial schedule
    instruction ID, which belongs to schedule PS.  */
 static rtx
 ps_rtl_insn (partial_schedule_ptr ps, int id)
 {
-  return ps->g->nodes[id].insn;
+  if (id < ps->g->num_nodes)
+    return ps->g->nodes[id].insn;
+  else
+    return ps_reg_move (ps, id)->insn;
 }
 
-/* Return the first instruction in the original (unscheduled) loop that
-   was associated with ps_rtl_insn (PS, ID).  If the instruction had
-   some notes before it, this is the first of those notes.  */
+/* Partial schedule instruction ID, which belongs to PS, occurred in
+   the original (unscheduled) loop.  Return the first instruction
+   in the loop that was associated with ps_rtl_insn (PS, ID).
+   If the instruction had some notes before it, this is the first
+   of those notes.  */
 static rtx
 ps_first_note (partial_schedule_ptr ps, int id)
 {
+  gcc_assert (id < ps->g->num_nodes);
   return ps->g->nodes[id].first_note;
 }
 
@@ -397,18 +444,20 @@ res_MII (ddg_ptr g)
 }
 
 
-/* Points to the array that contains the sched data for each node.  */
-static node_sched_params_ptr node_sched_params;
+/* A vector that contains the sched data for each ps_insn.  */
+static VEC (node_sched_params, heap) *node_sched_param_vec;
 
 /* Allocate sched_params for each node and initialize it.  */
 static void
 set_node_sched_params (ddg_ptr g)
 {
-  node_sched_params = XCNEWVEC (struct node_sched_params, g->num_nodes);
+  VEC_truncate (node_sched_params, node_sched_param_vec, 0);
+  VEC_safe_grow_cleared (node_sched_params, heap,
+			 node_sched_param_vec, g->num_nodes);
 }
 
 static void
-print_node_sched_params (FILE *file, int num_nodes, ddg_ptr g)
+print_node_sched_params (FILE *file, int num_nodes, partial_schedule_ptr ps)
 {
   int i;
 
@@ -417,19 +466,19 @@ print_node_sched_params (FILE *file, int
   for (i = 0; i < num_nodes; i++)
     {
       node_sched_params_ptr nsp = SCHED_PARAMS (i);
-      rtx reg_move = nsp->first_reg_move;
       int j;
 
       fprintf (file, "Node = %d; INSN = %d\n", i,
-	       (INSN_UID (g->nodes[i].insn)));
-      fprintf (file, " asap = %d:\n", NODE_ASAP (&g->nodes[i]));
+	       INSN_UID (ps_rtl_insn (ps, i)));
+      fprintf (file, " asap = %d:\n", NODE_ASAP (&ps->g->nodes[i]));
       fprintf (file, " time = %d:\n", nsp->time);
       fprintf (file, " nreg_moves = %d:\n", nsp->nreg_moves);
       for (j = 0; j < nsp->nreg_moves; j++)
 	{
+	  ps_reg_move_info *move = ps_reg_move (ps, nsp->first_reg_move + j);
+
 	  fprintf (file, " reg_move = ");
-	  print_rtl_single (file, reg_move);
-	  reg_move = PREV_INSN (reg_move);
+	  print_rtl_single (file, move->insn);
 	}
     }
 }
@@ -445,8 +494,8 @@ print_node_sched_params (FILE *file, int
    nreg_moves = ----------------------------------- + 1 - {   dependence.
                             ii                          { 1 if not.
 */
-static void
-generate_reg_moves (partial_schedule_ptr ps, bool rescan)
+static bool
+schedule_reg_moves (partial_schedule_ptr ps)
 {
   ddg_ptr g = ps->g;
   int ii = ps->ii;
@@ -457,9 +506,8 @@ generate_reg_moves (partial_schedule_ptr
       ddg_node_ptr u = &g->nodes[i];
       ddg_edge_ptr e;
       int nreg_moves = 0, i_reg_move;
-      sbitmap *uses_of_defs;
-      rtx last_reg_move;
       rtx prev_reg, old_reg;
+      int first_move;
 
       /* Compute the number of reg_moves needed for u, by looking at life
 	 ranges started at u (excluding self-loops).  */
@@ -485,12 +533,35 @@ generate_reg_moves (partial_schedule_ptr
       if (nreg_moves == 0)
 	continue;
 
+      /* Create NREG_MOVES register moves.  */
+      first_move = VEC_length (ps_reg_move_info, ps->reg_moves);
+      VEC_safe_grow_cleared (ps_reg_move_info, heap, ps->reg_moves,
+			     first_move + nreg_moves);
+
+      /* Record the moves associated with this node.  */
+      first_move += ps->g->num_nodes;
+      SCHED_FIRST_REG_MOVE (i) = first_move;
+      SCHED_NREG_MOVES (i) = nreg_moves;
+
+      /* Generate each move.  */
+      old_reg = prev_reg = SET_DEST (single_set (u->insn));
+      for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
+	{
+	  ps_reg_move_info *move = ps_reg_move (ps, first_move + i_reg_move);
+
+	  move->def = i_reg_move > 0 ? first_move + i_reg_move - 1 : i;
+	  move->uses = sbitmap_alloc (g->num_nodes);
+	  move->old_reg = old_reg;
+	  move->new_reg = gen_reg_rtx (GET_MODE (prev_reg));
+	  move->insn = gen_move_insn (move->new_reg, copy_rtx (prev_reg));
+	  sbitmap_zero (move->uses);
+
+	  prev_reg = move->new_reg;
+	}
+
       /* Every use of the register defined by node may require a different
 	 copy of this register, depending on the time the use is scheduled.
-	 Set a bitmap vector, telling which nodes use each copy of this
-	 register.  */
-      uses_of_defs = sbitmap_vector_alloc (nreg_moves, g->num_nodes);
-      sbitmap_vector_zero (uses_of_defs, nreg_moves);
+	 Record which uses require which move results.  */
       for (e = u->out; e; e = e->next_out)
 	if (e->type == TRUE_DEP && e->dest != e->src)
 	  {
@@ -506,40 +577,39 @@ generate_reg_moves (partial_schedule_ptr
 	      dest_copy--;
 
 	    if (dest_copy)
-	      SET_BIT (uses_of_defs[dest_copy - 1], e->dest->cuid);
-	  }
-
-      /* Now generate the reg_moves, attaching relevant uses to them.  */
-      SCHED_NREG_MOVES (i) = nreg_moves;
-      old_reg = prev_reg = copy_rtx (SET_DEST (single_set (u->insn)));
-      /* Insert the reg-moves right before the notes which precede
-         the insn they relates to.  */
-      last_reg_move = u->first_note;
-
-      for (i_reg_move = 0; i_reg_move < nreg_moves; i_reg_move++)
-	{
-	  unsigned int i_use = 0;
-	  rtx new_reg = gen_reg_rtx (GET_MODE (prev_reg));
-	  rtx reg_move = gen_move_insn (new_reg, prev_reg);
-	  sbitmap_iterator sbi;
+	      {
+		ps_reg_move_info *move;
 
-	  add_insn_before (reg_move, last_reg_move, NULL);
-	  last_reg_move = reg_move;
+		move = ps_reg_move (ps, first_move + dest_copy - 1);
+		SET_BIT (move->uses, e->dest->cuid);
+	      }
+	  }
+    }
+  return true;
+}
 
-	  if (!SCHED_FIRST_REG_MOVE (i))
-	    SCHED_FIRST_REG_MOVE (i) = reg_move;
+/* Emit the moves associated with PS.  Apply the substitutions
+   associated with them.  */
+static void
+apply_reg_moves (partial_schedule_ptr ps)
+{
+  ps_reg_move_info *move;
+  int i;
 
-	  EXECUTE_IF_SET_IN_SBITMAP (uses_of_defs[i_reg_move], 0, i_use, sbi)
-	    {
-	      replace_rtx (g->nodes[i_use].insn, old_reg, new_reg);
-	      if (rescan)
-		df_insn_rescan (g->nodes[i_use].insn);
-	    }
+  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
+    {
+      unsigned int i_use;
+      sbitmap_iterator sbi;
 
-	  prev_reg = new_reg;
+      EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, i_use, sbi)
+	{
+	  replace_rtx (ps->g->nodes[i_use].insn, move->old_reg, move->new_reg);
+	  df_insn_rescan (ps->g->nodes[i_use].insn);
 	}
-      sbitmap_vector_free (uses_of_defs);
     }
+
+  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
+    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
 }
 
 /* Update the sched_params (time, row and stage) for node U using the II,
@@ -855,8 +925,7 @@ duplicate_insns_of_cycles (partial_sched
     for (ps_ij = ps->rows[row]; ps_ij; ps_ij = ps_ij->next_in_row)
       {
 	int u = ps_ij->id;
-	int j, i_reg_moves;
-	rtx reg_move = NULL_RTX;
+	int j, i_reg_moves, i_reg_move;
 	rtx u_insn;
 
         /* Do not duplicate any insn which refers to count_reg as it
@@ -880,12 +949,7 @@ duplicate_insns_of_cycles (partial_sched
 	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
 
 	    /* The reg_moves start from the *first* reg_move backwards.  */
-	    if (i_reg_moves)
-	      {
-		reg_move = SCHED_FIRST_REG_MOVE (u);
-		for (j = 1; j < i_reg_moves; j++)
-		  reg_move = PREV_INSN (reg_move);
-	      }
+	    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1);
 	  }
 	else /* It's for the epilog.  */
 	  {
@@ -899,16 +963,15 @@ duplicate_insns_of_cycles (partial_sched
 	    i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
 
 	    /* The reg_moves start from the *last* reg_move forwards.  */
-	    if (i_reg_moves)
-	      {
-		reg_move = SCHED_FIRST_REG_MOVE (u);
-		for (j = 1; j < SCHED_NREG_MOVES (u); j++)
-		  reg_move = PREV_INSN (reg_move);
-	      }
+	    i_reg_move = SCHED_FIRST_REG_MOVE (u) + (SCHED_NREG_MOVES (u) - 1);
 	  }
 
-	for (j = 0; j < i_reg_moves; j++, reg_move = NEXT_INSN (reg_move))
-	  emit_insn (copy_rtx (PATTERN (reg_move)));
+	for (j = 0; j < i_reg_moves; j++)
+	  {
+	    ps_reg_move_info *move = ps_reg_move (ps, i_reg_move - j);
+
+	    emit_insn (copy_rtx (PATTERN (move->insn)));
+	  }
 	if (SCHED_STAGE (u) >= from_stage
 	    && SCHED_STAGE (u) <= to_stage)
 	  duplicate_insn_chain (ps_first_note (ps, u), u_insn);
@@ -1299,9 +1362,9 @@ sms_schedule (void)
       rtx head, tail;
       rtx count_reg, count_init;
       int mii, rec_mii;
-      unsigned stage_count = 0;
+      unsigned stage_count;
       HOST_WIDEST_INT loop_count = 0;
-      bool opt_sc_p = false;
+      bool opt_sc_p;
 
       if (! (g = g_arr[loop->num]))
         continue;
@@ -1378,54 +1441,60 @@ sms_schedule (void)
 	fprintf (dump_file, "SMS iis %d %d %d (rec_mii, mii, maxii)\n",
 		 rec_mii, mii, maxii);
 
-      set_node_sched_params (g);
-
-      ps = sms_schedule_by_order (g, mii, maxii, node_order);
-      
-      if (ps)
+      for (;;)
 	{
-	  /* Try to achieve optimized SC by normalizing the partial
-	     schedule (having the cycles start from cycle zero).
-	     The branch location must be placed in row ii-1 in the
-	     final scheduling.	If failed, shift all instructions to
-	     position the branch in row ii-1.  */
-	  opt_sc_p = optimize_sc (ps, g);
-	  if (opt_sc_p)
-	    stage_count = calculate_stage_count (ps, 0);
-	  else
+	  set_node_sched_params (g);
+
+	  stage_count = 0;
+	  opt_sc_p = false;
+	  ps = sms_schedule_by_order (g, mii, maxii, node_order);
+
+	  if (ps)
 	    {
-	      /* Bring the branch to cycle ii-1.  */
-	      int amount = SCHED_TIME (g->closing_branch->cuid) - (ps->ii - 1);
-	      
-	      if (dump_file)
-		fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
-	      
-	      stage_count = calculate_stage_count (ps, amount);
+	      /* Try to achieve optimized SC by normalizing the partial
+		 schedule (having the cycles start from cycle zero).
+		 The branch location must be placed in row ii-1 in the
+		 final scheduling.	If failed, shift all instructions to
+		 position the branch in row ii-1.  */
+	      opt_sc_p = optimize_sc (ps, g);
+	      if (opt_sc_p)
+		stage_count = calculate_stage_count (ps, 0);
+	      else
+		{
+		  /* Bring the branch to cycle ii-1.  */
+		  int amount = (SCHED_TIME (g->closing_branch->cuid)
+				- (ps->ii - 1));
+
+		  if (dump_file)
+		    fprintf (dump_file, "SMS schedule branch at cycle ii-1\n");
+
+		  stage_count = calculate_stage_count (ps, amount);
+		}
+
+	      gcc_assert (stage_count >= 1);
+	      PS_STAGE_COUNT (ps) = stage_count;
 	    }
-	  
-	  gcc_assert (stage_count >= 1);
-	  PS_STAGE_COUNT (ps) = stage_count;
-	}
-      
-      /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
-	 1 means that there is no interleaving between iterations thus
-	 we let the scheduling passes do the job in this case.  */
-      if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
-	  || (count_init && (loop_count <= stage_count))
-	  || (flag_branch_probabilities && (trip_count <= stage_count)))
-	{
-	  if (dump_file)
+
+	  /* The default value of PARAM_SMS_MIN_SC is 2 as stage count of
+	     1 means that there is no interleaving between iterations thus
+	     we let the scheduling passes do the job in this case.  */
+	  if (stage_count < (unsigned) PARAM_VALUE (PARAM_SMS_MIN_SC)
+	      || (count_init && (loop_count <= stage_count))
+	      || (flag_branch_probabilities && (trip_count <= stage_count)))
 	    {
-	      fprintf (dump_file, "SMS failed... \n");
-	      fprintf (dump_file, "SMS sched-failed (stage-count=%d, loop-count=", stage_count);
-	      fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, loop_count);
-	      fprintf (dump_file, ", trip-count=");
-	      fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
-	      fprintf (dump_file, ")\n");
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "SMS failed... \n");
+		  fprintf (dump_file, "SMS sched-failed (stage-count=%d,"
+			   " loop-count=", stage_count);
+		  fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, loop_count);
+		  fprintf (dump_file, ", trip-count=");
+		  fprintf (dump_file, HOST_WIDEST_INT_PRINT_DEC, trip_count);
+		  fprintf (dump_file, ")\n");
+		}
+	      break;
 	    }
-	}
-      else
-	{
+
           if (!opt_sc_p)
             {
 	      /* Rotate the partial schedule to have the branch in row ii-1.  */
@@ -1437,6 +1506,13 @@ sms_schedule (void)
 	  
 	  set_columns_for_ps (ps);
 
+	  if (!schedule_reg_moves (ps))
+	    {
+	      mii = ps->ii + 1;
+	      free_partial_schedule (ps);
+	      continue;
+	    }
+
 	  canon_loop (loop);
 
           if (dump_file)
@@ -1475,15 +1551,16 @@ sms_schedule (void)
 	  /* The life-info is not valid any more.  */
 	  df_set_bb_dirty (g->bb);
 
-	  generate_reg_moves (ps, true);
+	  apply_reg_moves (ps);
 	  if (dump_file)
-	    print_node_sched_params (dump_file, g->num_nodes, g);
+	    print_node_sched_params (dump_file, g->num_nodes, ps);
 	  /* Generate prolog and epilog.  */
           generate_prolog_epilog (ps, loop, count_reg, count_init);
+	  break;
 	}
 
       free_partial_schedule (ps);
-      free (node_sched_params);
+      VEC_free (node_sched_params, heap, node_sched_param_vec);
       free (node_order);
       free_ddg (g);
     }
@@ -2570,6 +2647,7 @@ create_partial_schedule (int ii, ddg_ptr
   partial_schedule_ptr ps = XNEW (struct partial_schedule);
   ps->rows = (ps_insn_ptr *) xcalloc (ii, sizeof (ps_insn_ptr));
   ps->rows_length = (int *) xcalloc (ii, sizeof (int));
+  ps->reg_moves = NULL;
   ps->ii = ii;
   ps->history = history;
   ps->min_cycle = INT_MAX;
@@ -2604,8 +2682,16 @@ free_ps_insns (partial_schedule_ptr ps)
 static void
 free_partial_schedule (partial_schedule_ptr ps)
 {
+  ps_reg_move_info *move;
+  unsigned int i;
+
   if (!ps)
     return;
+
+  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
+    sbitmap_free (move->uses);
+  VEC_free (ps_reg_move_info, heap, ps->reg_moves);
+
   free_ps_insns (ps);
   free (ps->rows);
   free (ps->rows_length);


* Re: [3/4] SMS: Record moves in the partial schedule
  2011-09-22 16:38   ` Richard Sandiford
@ 2011-09-22 16:48     ` Ayal Zaks
  2011-09-28 16:16       ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Ayal Zaks @ 2011-09-22 16:48 UTC (permalink / raw)
  To: Ayal Zaks, gcc-patches, richard.sandiford

>> Only request is to document that the register moves are
>> placed/assigned-id's in a specific order.
>
>I suppose this is the downside of splitting the patches up, sorry,
>but the ids are only ordered for the throw-away loop:
>
> FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>   add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>
>and for the prologue/epilogue code.  Both are replaced in patch 4
>with code that doesn't rely on the ordering of ids.
Ok then, very well. I was mostly referring to the following in
prologue/epilogue code, which merely relies on assigning all regmoves
of a node consecutive id's:

>          i_reg_moves = MIN (i_reg_moves, SCHED_NREG_MOVES (u));
>
>          /* The reg_moves start from the *first* reg_move backwards.  */
> !        i_reg_move = SCHED_FIRST_REG_MOVE (u) + (i_reg_moves - 1);

Ayal.

2011/9/22 Richard Sandiford <richard.sandiford@linaro.org>
>
> Ayal Zaks <ayal.zaks@gmail.com> writes:
> > Richard Sandiford <richard.sandiford@linaro.org> wrote on 30/08/2011
> > 03:10:50 PM:
> >
> >> From: Richard Sandiford <richard.sandiford@linaro.org>
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Ayal Zaks/Haifa/IBM@IBMIL
> >> Date: 30/08/2011 03:10 PM
> >> Subject: [3/4] SMS: Record moves in the partial schedule
> >
> >>
> >> This patch adds infrastructure that will be used by the final patch.
> >> Specifically:
> >>
> >>   - it splits the generation of register moves into two:
> > schedule_reg_moves
> >>     records moves in the partial schedule, while apply_reg_moves makes
> > the
> >>     register substitutions.
> >>
> >>     This patch doesn't actually schedule the moves.  Instead, there's
> >>     some throw-away code in apply_reg_moves to emit the moves in the
> >>     same way as we do now.  That's throw-away code that will be removed
> >>     in the final patch.
> >>
> >>   - schedule_reg_moves is allowed to fail.  We then try again with the
> >>     next ii (subject to the usual ii limits).
> >>
> >>     In this patch, schedule_reg_moves always returns true.
> >>
> >>   - The partial schedule uses ids to represent register moves.
> >>     The first register move has id g->num_nodes.
> >>
> >> Richard
> > This is fine.
>
> Thanks.
>
> > Only request is to document that the register moves are
> > placed/assigned-id's in a specific order.
>
> I suppose this is the downside of splitting the patches up, sorry,
> but the ids are only ordered for the throw-away loop:
>
>  FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
>    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);
>
> and for the prologue/epilogue code.  Both are replaced in patch 4
> with code that doesn't rely on the ordering of ids.
>
> I'd rather the code didn't rely on any ordering here.  If any future
> code is added that needs to know more about the moves, I think it should
> use the move structure instead of the ids.  (There's already a fair bit
> of info in the structure, but we could add more later if we need it.)
>
> > Functionality-wise it results in code identical to the current version,
> > right?
>
> Yeah, that's right.  Only patch 4 does anything useful to the output.
>
> Richard
>
>


* Re: [3/4] SMS: Record moves in the partial schedule
  2011-09-22  0:29 ` [3/4] SMS: Record moves in the partial schedule Ayal Zaks
@ 2011-09-22 16:38   ` Richard Sandiford
  2011-09-22 16:48     ` Ayal Zaks
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2011-09-22 16:38 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: gcc-patches

Ayal Zaks <ayal.zaks@gmail.com> writes:
> Richard Sandiford <richard.sandiford@linaro.org> wrote on 30/08/2011
> 03:10:50 PM:
>
>> From: Richard Sandiford <richard.sandiford@linaro.org>
>> To: gcc-patches@gcc.gnu.org
>> Cc: Ayal Zaks/Haifa/IBM@IBMIL
>> Date: 30/08/2011 03:10 PM
>> Subject: [3/4] SMS: Record moves in the partial schedule
>
>>
>> This patch adds infrastructure that will be used by the final patch.
>> Specifically:
>>
>>   - it splits the generation of register moves into two:
> schedule_reg_moves
>>     records moves in the partial schedule, while apply_reg_moves makes
> the
>>     register substitutions.
>>
>>     This patch doesn't actually schedule the moves.  Instead, there's
>>     some throw-away code in apply_reg_moves to emit the moves in the
>>     same way as we do now.  That's throw-away code that will be removed
>>     in the final patch.
>>
>>   - schedule_reg_moves is allowed to fail.  We then try again with the
>>     next ii (subject to the usual ii limits).
>>
>>     In this patch, schedule_reg_moves always returns true.
>>
>>   - The partial schedule uses ids to represent register moves.
>>     The first register move has id g->num_nodes.
>>
>> Richard
> This is fine.

Thanks.

> Only request is to document that the register moves are
> placed/assigned-id's in a specific order.

I suppose this is the downside of splitting the patches up, sorry,
but the ids are only ordered for the throw-away loop:

  FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps->reg_moves, i, move)
    add_insn_before (move->insn, ps_first_note (ps, move->def), NULL);

and for the prologue/epilogue code.  Both are replaced in patch 4
with code that doesn't rely on the ordering of ids.

I'd rather the code didn't rely on any ordering here.  If any future
code is added that needs to know more about the moves, I think it should
use the move structure instead of the ids.  (There's already a fair bit
of info in the structure, but we could add more later if we need it.)

> Functionality-wise it results in identical code as current version,
> right?

Yeah, that's right.  Only patch 4 does anything useful to the output.

Richard



* Re: [3/4] SMS: Record moves in the partial schedule
       [not found] <OF6AE2D848.53314E9C-ONC2257912.007D2C1A-C2257912.007D3A5D@il.ibm.com>
@ 2011-09-22  0:29 ` Ayal Zaks
  2011-09-22 16:38   ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Ayal Zaks @ 2011-09-22  0:29 UTC (permalink / raw)
  To: richard.sandiford; +Cc: gcc-patches

Richard Sandiford <richard.sandiford@linaro.org> wrote on 30/08/2011
03:10:50 PM:

> From: Richard Sandiford <richard.sandiford@linaro.org>
> To: gcc-patches@gcc.gnu.org
> Cc: Ayal Zaks/Haifa/IBM@IBMIL
> Date: 30/08/2011 03:10 PM
> Subject: [3/4] SMS: Record moves in the partial schedule

>
> This patch adds infrastructure that will be used by the final patch.
> Specifically:
>
>   - it splits the generation of register moves into two:
schedule_reg_moves
>     records moves in the partial schedule, while apply_reg_moves makes
the
>     register substitutions.
>
>     This patch doesn't actually schedule the moves.  Instead, there's
>     some throw-away code in apply_reg_moves to emit the moves in the
>     same way as we do now.  That's throw-away code that will be removed
>     in the final patch.
>
>   - schedule_reg_moves is allowed to fail.  We then try again with the
>     next ii (subject to the usual ii limits).
>
>     In this patch, schedule_reg_moves always returns true.
>
>   - The partial schedule uses ids to represent register moves.
>     The first register move has id g->num_nodes.
>
> Richard
This is fine. Only request is to document that the register moves are
placed/assigned-id's in a specific order. Functionality-wise it
results in code identical to the current version, right?

Thanks,
Ayal.


end of thread, other threads:[~2011-10-04  7:50 UTC | newest]

Thread overview: 15+ messages
2011-08-30 12:45 [0/4] Make SMS schedule register moves Richard Sandiford
2011-08-30 12:46 ` [1/4] SMS: remove register undo list Richard Sandiford
2011-08-30 13:04 ` [2/4] SMS: Use ids to represent ps_insns Richard Sandiford
2011-08-30 13:12 ` [3/4] SMS: Record moves in the partial schedule Richard Sandiford
2011-08-30 13:17 ` [4/4] Make SMS schedule register moves Richard Sandiford
2011-08-30 13:22   ` Richard Guenther
2011-08-30 13:43     ` Richard Sandiford
2011-08-30 13:50       ` Richard Guenther
2011-08-30 13:57         ` Richard Sandiford
     [not found] <OF6AE2D848.53314E9C-ONC2257912.007D2C1A-C2257912.007D3A5D@il.ibm.com>
2011-09-22  0:29 ` [3/4] SMS: Record moves in the partial schedule Ayal Zaks
2011-09-22 16:38   ` Richard Sandiford
2011-09-22 16:48     ` Ayal Zaks
2011-09-28 16:16       ` Richard Sandiford
2011-10-03 20:25         ` Ayal Zaks
2011-10-04  8:06           ` Richard Sandiford
