public inbox for gcc@gcc.gnu.org
* Bad code generated by inter-block scheduling
@ 2004-09-22 18:52 Ulrich Weigand
  2004-09-22 21:36 ` Jeffrey A Law
  0 siblings, 1 reply; 6+ messages in thread
From: Ulrich Weigand @ 2004-09-22 18:52 UTC (permalink / raw)
  To: gcc

Hello,

current mainline miscompiles gcc.c-torture/execute/loop-14.c with -Os
due to what I believe to be a bug in inter-block scheduling in sched-rgn.c.

The test case is:

void f(int *a)
{
  int i;

  for (i=3; --i;)
    a[i] = 42 / i;
}

which gets transformed into (before scheduling):

basic block 0:
  reg51 = 3
  reg61 = 42
  reg66 = a + 12
  goto basic block 2

basic block 1:
  reg62 = reg61 divmod reg50       [*]
  mem(reg66) = subreg(reg62)
  fall through to basic block 2

basic block 2:
  reg50 = reg51 - 1
  reg51 = reg50
  reg66 = reg66 - 4
  if (reg50 != 0) goto basic block 1

Now, interblock scheduling moves the divmod instruction [*] into
basic block 2.  This causes a division by zero at runtime, since on
the final iteration reg50 is zero when basic block 2 executes.

While sched-rgn.c correctly detects that [*] is an insn that may
trap, it ignores this fact because it considers the move to be a
regular interblock motion, *not* a speculative one; i.e. it thinks
the divmod instruction would have been executed in any case even
before the move, so trapping doesn't matter.

The task of distinguishing between interblock and speculative motion
is done using the notion of 'potential-split-edges', as computed
by compute_dom_prob_ps in sched-rgn.c.  As I understand this code,
it looks for (generalized versions of) situations like this:

      [A]
     /   \
  [B]     [C]

If it finds such an edge A -> C, then a move from B to A would be
speculative, else it is a regular interblock motion.

Now, in this particular case, we obviously *do* have exactly that
situation, with [A] = bb 2, [B] = bb 1, [C] = the exit block.

However, the data structures available to compute_dom_prob_ps simply
do not contain the 'bb2 -> exit' edge in the first place!  This is
because where the private structures in sched-rgn.c are built
(build_control_flow), all edges leading to the exit block are 
completely ignored:

  nr_edges = 0;
  for (i = 0; i < num_edges; i++)
    {
      edge e = INDEX_EDGE (edge_list, i);

      if (e->dest != EXIT_BLOCK_PTR
          && e->src != ENTRY_BLOCK_PTR)
        new_edge (e->src->index, e->dest->index);
    }

From what I can see in the revision histories, the behaviour has
been the same since forever, so I don't know why this suddenly
shows up as a problem ...

Simply adding entry/exit edges to the sched-rgn.c tables is not completely
trivial due to the negative block index numbers; several global arrays in
sched-rgn.c are indexed by block number.  Any suggestions how to fix this?

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  weigand@informatik.uni-erlangen.de


* Re: Bad code generated by inter-block scheduling
  2004-09-22 18:52 Bad code generated by inter-block scheduling Ulrich Weigand
@ 2004-09-22 21:36 ` Jeffrey A Law
  2004-09-24  1:21   ` [PATCH] " Ulrich Weigand
  0 siblings, 1 reply; 6+ messages in thread
From: Jeffrey A Law @ 2004-09-22 21:36 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc

On Wed, 2004-09-22 at 12:34, Ulrich Weigand wrote:

> 
> The task of distinguishing between interblock and speculative motion
> is done using the notion of 'potential-split-edges', as computed
> by compute_dom_prob_ps in sched-rgn.c.  As I understand this code,
> it looks for (generalized versions of) situations like this:
> 
>       [A]
>      /   \
>   [B]     [C]
> 
> If it finds such an edge A -> C, then a move from B to A would be
> speculative, else it is a regular interblock motion.
> 
> Now, in this particular case, we obviously *do* have exactly that
> situation, with [A] = bb2, [B] = bb 1, [C] = exit block.
> 
> However, the data structures available to compute_dom_prob_ps simply
> do not contain the 'bb2 -> exit' edge in the first place!  This is
> because where the private structures in sched-rgn.c are built
> (build_control_flow), all edges leading to the exit block are 
> completely ignored:
> 
>   nr_edges = 0;
>   for (i = 0; i < num_edges; i++)
>     {
>       edge e = INDEX_EDGE (edge_list, i);
> 
>       if (e->dest != EXIT_BLOCK_PTR
>           && e->src != ENTRY_BLOCK_PTR)
>         new_edge (e->src->index, e->dest->index);
>     }
> 
> From what I see from the revision histories, the behaviour apparently
> has been the same since forever, so I don't know why this suddenly
> shows up as a problem ...
> 
> Simply adding entry/exit edges to the sched-rgn.c tables is not completely
> trivial due to the negative block index numbers; several global arrays in
> sched-rgn.c are indexed by block number.  Any suggestions how to fix this?
I've always hated the potential-split-edges nonsense in the
interblock scheduling code.  It seems to me what they're really
after is just post-dominance.

jeff


* [PATCH] Re: Bad code generated by inter-block scheduling
  2004-09-22 21:36 ` Jeffrey A Law
@ 2004-09-24  1:21   ` Ulrich Weigand
  2004-09-24  6:46     ` Jeffrey A Law
  0 siblings, 1 reply; 6+ messages in thread
From: Ulrich Weigand @ 2004-09-24  1:21 UTC (permalink / raw)
  To: law; +Cc: gcc, gcc-patches

Jeffrey A Law wrote:
>On Wed, 2004-09-22 at 12:34, Ulrich Weigand wrote:
>> Simply adding entry/exit edges to the sched-rgn.c tables is not completely
>> trivial due to the negative block index numbers; several global arrays in
>> sched-rgn.c are indexed by block number.  Any suggestions how to fix this?
>I've always hated the potential-split-edges nonsense in the
>interblock scheduling code.  It seems to me what they're really
>after is just post-dominance.

For the interblock vs. speculative decision, I agree that this
should just be post-dominance.  However, the split-edges stuff
is also used, in the speculative case, to find all the blocks
where speculative motion could cause a variable to become live
at block start that previously wasn't.

Maybe there's some way of retrieving this information from
the dominator graph as well, but right now I don't really want
to do actual algorithmic changes, just fix my bug ;-)

Instead of piling more hacks on top, I've tried to completely
remove 'haifa_edge' and everything related to it, and use the
standard CFG data structures instead -- this way we automatically
get the exit edges.  This went quite smoothly; I only had to
change a couple of other places to use basic_block or edge
pointers instead of indices.

Apart from the mechanical changes that follow from the new data
structures, the routines find_rgns, compute_dom_prob_ps,
compute_trg_info and propagate_deps had to be changed to traverse
the CFG instead of the haifa_edge tables.  The build_control_flow
routine is now completely gone, of course; the only thing it did
apart from building the haifa_edge tables, namely checking for
trivially unreachable blocks, has moved to is_cfg_nonregular.
I no longer need to call create_edge_list at all; the only other
user of the edge list, find_rgns, could easily be changed to use
the regular CFG data structures.

Except for the consideration of exit edges where appropriate,
this patch should have no algorithmic changes whatsoever.

Bootstrapped/regtested on s390-ibm-linux and s390x-ibm-linux,
fixes the loop-14.c -Os test case on s390x.

OK for mainline?

Bye,
Ulrich


ChangeLog:

	* sched-rgn.c (haifa_edge, edge_table, NEXT_IN, NEXT_OUT, FROM_BLOCK,
	TO_BLOCK, nr_edges, in_edges, out_edges, IN_EDGES, OUT_EDGES,
	build_control_flow, new_edge): Remove.
	(schedule_insns): Remove edge_table/in_edges/out_edges cleanup.
	(bitlst, bitlst_table_last, bitlst_table): Remove.
	(bblst): Store basic_block pointer instead of block index.
	(bblst_table): Likewise.
	(edgelst): Store edge pointer instead of edge index.
	(edgelst_table, edgelst_last): New variables.
	(extract_bitlst): Rename to ...
	(extract_edgelst): ... this.  Return edge pointers, not indices.
	(split_edges): Update call.
	(rgn_edges): Store edge pointers instead of indices.
	(edge_to_bit): Remove.
	(EDGE_TO_BIT): Store per-region edge index in edge->aux.
	(SET_EDGE_TO_BIT): New macro.
	(is_cfg_nonregular): Check for simple cases of unreachable blocks.
	(find_rgns): Remove edge_list parameter.  Traverse standard CFG
	data structures instead of haifa_edge et al.  Use edge pointers
	instead of edge indices everywhere.
	(compute_dom_prob_ps): Use standard CFG data structures.  Account
	for exit edges.
	(compute_trg_info): Likewise.
	(propagate_deps): Likewise.
	(debug_candidate): Account for bblst data structure change.
	(check_live_1, update_live_1, is_pfree): Likewise.
	(IS_REACHABLE): Use standard CFG data structures.
	(init_ready_list): Update bblst_table/edgelst_table allocation.
	(schedule_region): Update alloc/cleanup code to data structure
	changes.  Use edge->aux to store per-region edge index.
	(init_regions): No longer call build_control_flow.  Do not
	create edge list any more.


Index: gcc/sched-rgn.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/sched-rgn.c,v
retrieving revision 1.82
diff -c -p -r1.82 sched-rgn.c
*** gcc/sched-rgn.c	18 Sep 2004 19:47:10 -0000	1.82
--- gcc/sched-rgn.c	23 Sep 2004 20:44:26 -0000
*************** Software Foundation, 59 Temple Place - S
*** 87,122 ****
  /* nr_inter/spec counts interblock/speculative motion for the function.  */
  static int nr_inter, nr_spec;
  
- /* Control flow graph edges are kept in circular lists.  */
- typedef struct
- {
-   int from_block;
-   int to_block;
-   int next_in;
-   int next_out;
- }
- haifa_edge;
- static haifa_edge *edge_table;
- 
- #define NEXT_IN(edge) (edge_table[edge].next_in)
- #define NEXT_OUT(edge) (edge_table[edge].next_out)
- #define FROM_BLOCK(edge) (edge_table[edge].from_block)
- #define TO_BLOCK(edge) (edge_table[edge].to_block)
- 
- /* Number of edges in the control flow graph.  (In fact, larger than
-    that by 1, since edge 0 is unused.)  */
- static int nr_edges;
- 
- /* Circular list of incoming/outgoing edges of a block.  */
- static int *in_edges;
- static int *out_edges;
- 
- #define IN_EDGES(block) (in_edges[block])
- #define OUT_EDGES(block) (out_edges[block])
- 
  static int is_cfg_nonregular (void);
- static int build_control_flow (struct edge_list *);
- static void new_edge (int, int);
  static bool sched_is_disabled_for_current_region_p (void);
  
  /* A region is the main entity for interblock scheduling: insns
--- 87,93 ----
*************** static int *containing_rgn;
*** 154,160 ****
  
  void debug_regions (void);
  static void find_single_block_region (void);
! static void find_rgns (struct edge_list *);
  static bool too_large (int, int *, int *);
  
  extern void debug_live (int, int);
--- 125,131 ----
  
  void debug_regions (void);
  static void find_single_block_region (void);
! static void find_rgns (void);
  static bool too_large (int, int *, int *);
  
  extern void debug_live (int, int);
*************** static int current_blocks;
*** 166,190 ****
  /* The mapping from bb to block.  */
  #define BB_TO_BLOCK(bb) (rgn_bb_table[current_blocks + (bb)])
  
- typedef struct
- {
-   int *first_member;		/* Pointer to the list start in bitlst_table.  */
-   int nr_members;		/* The number of members of the bit list.  */
- }
- bitlst;
- 
- static int bitlst_table_last;
- static int *bitlst_table;
- 
- static void extract_bitlst (sbitmap, bitlst *);
- 
  /* Target info declarations.
  
     The block currently being scheduled is referred to as the "target" block,
     while other blocks in the region from which insns can be moved to the
     target are called "source" blocks.  The candidate structure holds info
     about such sources: are they valid?  Speculative?  Etc.  */
! typedef bitlst bblst;
  typedef struct
  {
    char is_valid;
--- 137,155 ----
  /* The mapping from bb to block.  */
  #define BB_TO_BLOCK(bb) (rgn_bb_table[current_blocks + (bb)])
  
  /* Target info declarations.
  
     The block currently being scheduled is referred to as the "target" block,
     while other blocks in the region from which insns can be moved to the
     target are called "source" blocks.  The candidate structure holds info
     about such sources: are they valid?  Speculative?  Etc.  */
! typedef struct
! {
!   basic_block *first_member;
!   int nr_members;
! }
! bblst;
! 
  typedef struct
  {
    char is_valid;
*************** static candidate *candidate_table;
*** 204,210 ****
  
     Lists of split and update blocks for each candidate of the current
     target are in array bblst_table.  */
! static int *bblst_table, bblst_size, bblst_last;
  
  #define IS_VALID(src) ( candidate_table[src].is_valid )
  #define IS_SPECULATIVE(src) ( candidate_table[src].is_speculative )
--- 169,176 ----
  
     Lists of split and update blocks for each candidate of the current
     target are in array bblst_table.  */
! static basic_block *bblst_table;
! static int bblst_size, bblst_last;
  
  #define IS_VALID(src) ( candidate_table[src].is_valid )
  #define IS_SPECULATIVE(src) ( candidate_table[src].is_speculative )
*************** static int *bblst_table, bblst_size, bbl
*** 214,220 ****
  static int target_bb;
  
  /* List of edges.  */
! typedef bitlst edgelst;
  
  /* Target info functions.  */
  static void split_edges (int, int, edgelst *);
--- 180,197 ----
  static int target_bb;
  
  /* List of edges.  */
! typedef struct
! {
!   edge *first_member;
!   int nr_members;
! }
! edgelst;
! 
! static edge *edgelst_table;
! static int edgelst_last;
! 
! static void extract_edgelst (sbitmap, edgelst *);
! 
  
  /* Target info functions.  */
  static void split_edges (int, int, edgelst *);
*************** typedef sbitmap edgeset;
*** 250,261 ****
  static int rgn_nr_edges;
  
  /* Array of size rgn_nr_edges.  */
! static int *rgn_edges;
! 
  
  /* Mapping from each edge in the graph to its number in the rgn.  */
! static int *edge_to_bit;
! #define EDGE_TO_BIT(edge) (edge_to_bit[edge])
  
  /* The split edges of a source bb is different for each target
     bb.  In order to compute this efficiently, the 'potential-split edges'
--- 227,237 ----
  static int rgn_nr_edges;
  
  /* Array of size rgn_nr_edges.  */
! static edge *rgn_edges;
  
  /* Mapping from each edge in the graph to its number in the rgn.  */
! #define EDGE_TO_BIT(edge) ((int)(size_t)(edge)->aux)
! #define SET_EDGE_TO_BIT(edge,nr) ((edge)->aux = (void *)(size_t)(nr))
  
  /* The split edges of a source bb is different for each target
     bb.  In order to compute this efficiently, the 'potential-split edges'
*************** static void free_pending_lists (void);
*** 308,315 ****
  /* Return 1 if control flow graph should not be constructed, 0 otherwise.
  
     We decide not to build the control flow graph if there is possibly more
!    than one entry to the function, if computed branches exist, of if we
!    have nonlocal gotos.  */
  
  static int
  is_cfg_nonregular (void)
--- 284,291 ----
  /* Return 1 if control flow graph should not be constructed, 0 otherwise.
  
     We decide not to build the control flow graph if there is possibly more
!    than one entry to the function, if computed branches exist, if we
!    have nonlocal gotos, or if we have an unreachable loop.  */
  
  static int
  is_cfg_nonregular (void)
*************** is_cfg_nonregular (void)
*** 360,500 ****
  	  break;
        }
  
-   /* All the tests passed.  Consider the cfg well structured.  */
-   return 0;
- }
- 
- /* Build the control flow graph and set nr_edges.
- 
-    Instead of trying to build a cfg ourselves, we rely on flow to
-    do it for us.  Stamp out useless code (and bug) duplication.
- 
-    Return nonzero if an irregularity in the cfg is found which would
-    prevent cross block scheduling.  */
- 
- static int
- build_control_flow (struct edge_list *edge_list)
- {
-   int i, unreachable, num_edges;
-   basic_block b;
- 
-   /* This already accounts for entry/exit edges.  */
-   num_edges = NUM_EDGES (edge_list);
- 
    /* Unreachable loops with more than one basic block are detected
       during the DFS traversal in find_rgns.
  
       Unreachable loops with a single block are detected here.  This
       test is redundant with the one in find_rgns, but it's much
!     cheaper to go ahead and catch the trivial case here.  */
!   unreachable = 0;
    FOR_EACH_BB (b)
      {
        if (b->pred == NULL
  	  || (b->pred->src == b
  	      && b->pred->pred_next == NULL))
! 	unreachable = 1;
!     }
! 
!   /* ??? We can kill these soon.  */
!   in_edges = xcalloc (last_basic_block, sizeof (int));
!   out_edges = xcalloc (last_basic_block, sizeof (int));
!   edge_table = xcalloc (num_edges, sizeof (haifa_edge));
! 
!   nr_edges = 0;
!   for (i = 0; i < num_edges; i++)
!     {
!       edge e = INDEX_EDGE (edge_list, i);
! 
!       if (e->dest != EXIT_BLOCK_PTR
! 	  && e->src != ENTRY_BLOCK_PTR)
! 	new_edge (e->src->index, e->dest->index);
!     }
! 
!   /* Increment by 1, since edge 0 is unused.  */
!   nr_edges++;
! 
!   return unreachable;
! }
! 
! /* Record an edge in the control flow graph from SOURCE to TARGET.
! 
!    In theory, this is redundant with the s_succs computed above, but
!    we have not converted all of haifa to use information from the
!    integer lists.  */
! 
! static void
! new_edge (int source, int target)
! {
!   int e, next_edge;
!   int curr_edge, fst_edge;
! 
!   /* Check for duplicates.  */
!   fst_edge = curr_edge = OUT_EDGES (source);
!   while (curr_edge)
!     {
!       if (FROM_BLOCK (curr_edge) == source
! 	  && TO_BLOCK (curr_edge) == target)
! 	{
! 	  return;
! 	}
! 
!       curr_edge = NEXT_OUT (curr_edge);
! 
!       if (fst_edge == curr_edge)
! 	break;
!     }
! 
!   e = ++nr_edges;
! 
!   FROM_BLOCK (e) = source;
!   TO_BLOCK (e) = target;
! 
!   if (OUT_EDGES (source))
!     {
!       next_edge = NEXT_OUT (OUT_EDGES (source));
!       NEXT_OUT (OUT_EDGES (source)) = e;
!       NEXT_OUT (e) = next_edge;
!     }
!   else
!     {
!       OUT_EDGES (source) = e;
!       NEXT_OUT (e) = e;
      }
  
!   if (IN_EDGES (target))
!     {
!       next_edge = NEXT_IN (IN_EDGES (target));
!       NEXT_IN (IN_EDGES (target)) = e;
!       NEXT_IN (e) = next_edge;
!     }
!   else
!     {
!       IN_EDGES (target) = e;
!       NEXT_IN (e) = e;
!     }
  }
  
! /* Translate a bit-set SET to a list BL of the bit-set members.  */
  
  static void
! extract_bitlst (sbitmap set, bitlst *bl)
  {
    int i;
  
!   /* bblst table space is reused in each call to extract_bitlst.  */
!   bitlst_table_last = 0;
  
!   bl->first_member = &bitlst_table[bitlst_table_last];
!   bl->nr_members = 0;
  
    /* Iterate over each word in the bitset.  */
    EXECUTE_IF_SET_IN_SBITMAP (set, 0, i,
    {
!     bitlst_table[bitlst_table_last++] = i;
!     (bl->nr_members)++;
    });
- 
  }
  
  /* Functions for the construction of regions.  */
--- 336,378 ----
  	  break;
        }
  
    /* Unreachable loops with more than one basic block are detected
       during the DFS traversal in find_rgns.
  
       Unreachable loops with a single block are detected here.  This
       test is redundant with the one in find_rgns, but it's much
!      cheaper to go ahead and catch the trivial case here.  */
    FOR_EACH_BB (b)
      {
        if (b->pred == NULL
  	  || (b->pred->src == b
  	      && b->pred->pred_next == NULL))
! 	return 1;
      }
  
!   /* All the tests passed.  Consider the cfg well structured.  */
!   return 0;
  }
  
! /* Extract list of edges from a bitmap containing EDGE_TO_BIT bits.  */
  
  static void
! extract_edgelst (sbitmap set, edgelst *el)
  {
    int i;
  
!   /* edgelst table space is reused in each call to extract_edgelst.  */
!   edgelst_last = 0;
  
!   el->first_member = &edgelst_table[edgelst_last];
!   el->nr_members = 0;
  
    /* Iterate over each word in the bitset.  */
    EXECUTE_IF_SET_IN_SBITMAP (set, 0, i,
    {
!     edgelst_table[edgelst_last++] = rgn_edges[i];
!     el->nr_members++;
    });
  }
  
  /* Functions for the construction of regions.  */
*************** too_large (int block, int *num_bbs, int 
*** 609,628 ****
     of edge tables.  That would simplify it somewhat.  */
  
  static void
! find_rgns (struct edge_list *edge_list)
  {
!   int *max_hdr, *dfs_nr, *stack, *degree;
    char no_loops = 1;
    int node, child, loop_head, i, head, tail;
    int count = 0, sp, idx = 0;
!   int current_edge = out_edges[ENTRY_BLOCK_PTR->succ->dest->index];
    int num_bbs, num_insns, unreachable;
    int too_large_failure;
    basic_block bb;
  
-   /* Note if an edge has been passed.  */
-   sbitmap passed;
- 
    /* Note if a block is a natural loop header.  */
    sbitmap header;
  
--- 487,504 ----
     of edge tables.  That would simplify it somewhat.  */
  
  static void
! find_rgns (void)
  {
!   int *max_hdr, *dfs_nr, *degree;
    char no_loops = 1;
    int node, child, loop_head, i, head, tail;
    int count = 0, sp, idx = 0;
!   edge current_edge = ENTRY_BLOCK_PTR->succ->dest->succ;
!   edge *stack;
    int num_bbs, num_insns, unreachable;
    int too_large_failure;
    basic_block bb;
  
    /* Note if a block is a natural loop header.  */
    sbitmap header;
  
*************** find_rgns (struct edge_list *edge_list)
*** 635,642 ****
    /* Note if a block is in the block queue.  */
    sbitmap in_stack;
  
-   int num_edges = NUM_EDGES (edge_list);
- 
    /* Perform a DFS traversal of the cfg.  Identify loop headers, inner loops
       and a mapping from block to its loop header (if the block is contained
       in a loop, else -1).
--- 511,516 ----
*************** find_rgns (struct edge_list *edge_list)
*** 649,655 ****
    /* Allocate and initialize variables for the first traversal.  */
    max_hdr = xmalloc (last_basic_block * sizeof (int));
    dfs_nr = xcalloc (last_basic_block, sizeof (int));
!   stack = xmalloc (nr_edges * sizeof (int));
  
    inner = sbitmap_alloc (last_basic_block);
    sbitmap_ones (inner);
--- 523,529 ----
    /* Allocate and initialize variables for the first traversal.  */
    max_hdr = xmalloc (last_basic_block * sizeof (int));
    dfs_nr = xcalloc (last_basic_block, sizeof (int));
!   stack = xmalloc (n_edges * sizeof (edge));
  
    inner = sbitmap_alloc (last_basic_block);
    sbitmap_ones (inner);
*************** find_rgns (struct edge_list *edge_list)
*** 657,665 ****
    header = sbitmap_alloc (last_basic_block);
    sbitmap_zero (header);
  
-   passed = sbitmap_alloc (nr_edges);
-   sbitmap_zero (passed);
- 
    in_queue = sbitmap_alloc (last_basic_block);
    sbitmap_zero (in_queue);
  
--- 531,536 ----
*************** find_rgns (struct edge_list *edge_list)
*** 669,699 ****
    for (i = 0; i < last_basic_block; i++)
      max_hdr[i] = -1;
  
    /* DFS traversal to find inner loops in the cfg.  */
  
    sp = -1;
    while (1)
      {
!       if (current_edge == 0 || TEST_BIT (passed, current_edge))
  	{
  	  /* We have reached a leaf node or a node that was already
  	     processed.  Pop edges off the stack until we find
  	     an edge that has not yet been processed.  */
! 	  while (sp >= 0
! 		 && (current_edge == 0 || TEST_BIT (passed, current_edge)))
  	    {
  	      /* Pop entry off the stack.  */
  	      current_edge = stack[sp--];
! 	      node = FROM_BLOCK (current_edge);
! 	      child = TO_BLOCK (current_edge);
  	      RESET_BIT (in_stack, child);
  	      if (max_hdr[child] >= 0 && TEST_BIT (in_stack, max_hdr[child]))
  		UPDATE_LOOP_RELATIONS (node, max_hdr[child]);
! 	      current_edge = NEXT_OUT (current_edge);
  	    }
  
  	  /* See if have finished the DFS tree traversal.  */
! 	  if (sp < 0 && TEST_BIT (passed, current_edge))
  	    break;
  
  	  /* Nope, continue the traversal with the popped node.  */
--- 540,574 ----
    for (i = 0; i < last_basic_block; i++)
      max_hdr[i] = -1;
  
+   #define EDGE_PASSED(E) (!(E) || (E)->aux)
+   #define SET_EDGE_PASSED(E) ((E)->aux = (E))
+ 
    /* DFS traversal to find inner loops in the cfg.  */
  
    sp = -1;
    while (1)
      {
!       if (EDGE_PASSED (current_edge))
  	{
  	  /* We have reached a leaf node or a node that was already
  	     processed.  Pop edges off the stack until we find
  	     an edge that has not yet been processed.  */
! 	  while (sp >= 0 && EDGE_PASSED (current_edge))
  	    {
  	      /* Pop entry off the stack.  */
  	      current_edge = stack[sp--];
! 	      node = current_edge->src->index;
! 	      gcc_assert (node != ENTRY_BLOCK);
! 	      child = current_edge->dest->index;
! 	      gcc_assert (child != EXIT_BLOCK);
  	      RESET_BIT (in_stack, child);
  	      if (max_hdr[child] >= 0 && TEST_BIT (in_stack, max_hdr[child]))
  		UPDATE_LOOP_RELATIONS (node, max_hdr[child]);
! 	      current_edge = current_edge->succ_next;
  	    }
  
  	  /* See if have finished the DFS tree traversal.  */
! 	  if (sp < 0 && EDGE_PASSED (current_edge))
  	    break;
  
  	  /* Nope, continue the traversal with the popped node.  */
*************** find_rgns (struct edge_list *edge_list)
*** 701,711 ****
  	}
  
        /* Process a node.  */
!       node = FROM_BLOCK (current_edge);
!       child = TO_BLOCK (current_edge);
        SET_BIT (in_stack, node);
        dfs_nr[node] = ++count;
  
        /* If the successor is in the stack, then we've found a loop.
  	 Mark the loop, if it is not a natural loop, then it will
  	 be rejected during the second traversal.  */
--- 576,595 ----
  	}
  
        /* Process a node.  */
!       node = current_edge->src->index;
!       gcc_assert (node != ENTRY_BLOCK);
        SET_BIT (in_stack, node);
        dfs_nr[node] = ++count;
  
+       /* We don't traverse to the exit block.  */
+       child = current_edge->dest->index;
+       if (child == EXIT_BLOCK)
+ 	{
+ 	  SET_EDGE_PASSED (current_edge);
+ 	  current_edge = current_edge->succ_next;
+ 	  continue;
+ 	}
+ 
        /* If the successor is in the stack, then we've found a loop.
  	 Mark the loop, if it is not a natural loop, then it will
  	 be rejected during the second traversal.  */
*************** find_rgns (struct edge_list *edge_list)
*** 714,721 ****
  	  no_loops = 0;
  	  SET_BIT (header, child);
  	  UPDATE_LOOP_RELATIONS (node, child);
! 	  SET_BIT (passed, current_edge);
! 	  current_edge = NEXT_OUT (current_edge);
  	  continue;
  	}
  
--- 598,605 ----
  	  no_loops = 0;
  	  SET_BIT (header, child);
  	  UPDATE_LOOP_RELATIONS (node, child);
! 	  SET_EDGE_PASSED (current_edge);
! 	  current_edge = current_edge->succ_next;
  	  continue;
  	}
  
*************** find_rgns (struct edge_list *edge_list)
*** 726,757 ****
  	{
  	  if (max_hdr[child] >= 0 && TEST_BIT (in_stack, max_hdr[child]))
  	    UPDATE_LOOP_RELATIONS (node, max_hdr[child]);
! 	  SET_BIT (passed, current_edge);
! 	  current_edge = NEXT_OUT (current_edge);
  	  continue;
  	}
  
        /* Push an entry on the stack and continue DFS traversal.  */
        stack[++sp] = current_edge;
!       SET_BIT (passed, current_edge);
!       current_edge = OUT_EDGES (child);
  
!       /* This is temporary until haifa is converted to use rth's new
! 	 cfg routines which have true entry/exit blocks and the
! 	 appropriate edges from/to those blocks.
! 
! 	 Generally we update dfs_nr for a node when we process its
! 	 out edge.  However, if the node has no out edge then we will
! 	 not set dfs_nr for that node.  This can confuse the scheduler
! 	 into thinking that we have unreachable blocks, which in turn
! 	 disables cross block scheduling.
! 
! 	 So, if we have a node with no out edges, go ahead and mark it
! 	 as reachable now.  */
!       if (current_edge == 0)
! 	dfs_nr[child] = ++count;
      }
  
    /* Another check for unreachable blocks.  The earlier test in
       is_cfg_nonregular only finds unreachable blocks that do not
       form a loop.
--- 610,635 ----
  	{
  	  if (max_hdr[child] >= 0 && TEST_BIT (in_stack, max_hdr[child]))
  	    UPDATE_LOOP_RELATIONS (node, max_hdr[child]);
! 	  SET_EDGE_PASSED (current_edge);
! 	  current_edge = current_edge->succ_next;
  	  continue;
  	}
  
        /* Push an entry on the stack and continue DFS traversal.  */
        stack[++sp] = current_edge;
!       SET_EDGE_PASSED (current_edge);
!       current_edge = current_edge->dest->succ;
!     }
  
!   /* Reset ->aux field used by EDGE_PASSED.  */
!   FOR_ALL_BB (bb)
!     {
!       edge e;
!       for (e = bb->succ; e; e = e->succ_next)
! 	e->aux = NULL;
      }
  
+ 
    /* Another check for unreachable blocks.  The earlier test in
       is_cfg_nonregular only finds unreachable blocks that do not
       form a loop.
*************** find_rgns (struct edge_list *edge_list)
*** 772,784 ****
    degree = dfs_nr;
  
    FOR_EACH_BB (bb)
-     degree[bb->index] = 0;
-   for (i = 0; i < num_edges; i++)
      {
!       edge e = INDEX_EDGE (edge_list, i);
! 
!       if (e->dest != EXIT_BLOCK_PTR)
! 	degree[e->dest->index]++;
      }
  
    /* Do not perform region scheduling if there are any unreachable
--- 650,660 ----
    degree = dfs_nr;
  
    FOR_EACH_BB (bb)
      {
!       edge e;
!       degree[bb->index] = 0;
!       for (e = bb->pred; e; e = e->pred_next)
! 	degree[bb->index]++;
      }
  
    /* Do not perform region scheduling if there are any unreachable
*************** find_rgns (struct edge_list *edge_list)
*** 1021,1027 ****
    free (max_hdr);
    free (dfs_nr);
    free (stack);
-   sbitmap_free (passed);
    sbitmap_free (header);
    sbitmap_free (inner);
    sbitmap_free (in_queue);
--- 897,902 ----
*************** find_rgns (struct edge_list *edge_list)
*** 1036,1043 ****
  static void
  compute_dom_prob_ps (int bb)
  {
!   int nxt_in_edge, fst_in_edge, pred;
!   int fst_out_edge, nxt_out_edge, nr_out_edges, nr_rgn_out_edges;
  
    prob[bb] = 0.0;
    if (IS_RGN_ENTRY (bb))
--- 911,919 ----
  static void
  compute_dom_prob_ps (int bb)
  {
!   int pred_bb;
!   int nr_out_edges, nr_rgn_out_edges;
!   edge in_edge, out_edge;
  
    prob[bb] = 0.0;
    if (IS_RGN_ENTRY (bb))
*************** compute_dom_prob_ps (int bb)
*** 1047,1089 ****
        return;
      }
  
-   fst_in_edge = nxt_in_edge = IN_EDGES (BB_TO_BLOCK (bb));
- 
    /* Initialize dom[bb] to '111..1'.  */
    sbitmap_ones (dom[bb]);
  
!   do
      {
!       pred = FROM_BLOCK (nxt_in_edge);
!       sbitmap_a_and_b (dom[bb], dom[bb], dom[BLOCK_TO_BB (pred)]);
!       sbitmap_a_or_b (ancestor_edges[bb], ancestor_edges[bb], ancestor_edges[BLOCK_TO_BB (pred)]);
  
!       SET_BIT (ancestor_edges[bb], EDGE_TO_BIT (nxt_in_edge));
  
!       nr_out_edges = 1;
!       nr_rgn_out_edges = 0;
!       fst_out_edge = OUT_EDGES (pred);
!       nxt_out_edge = NEXT_OUT (fst_out_edge);
  
!       sbitmap_a_or_b (pot_split[bb], pot_split[bb], pot_split[BLOCK_TO_BB (pred)]);
  
!       SET_BIT (pot_split[bb], EDGE_TO_BIT (fst_out_edge));
! 
!       /* The successor doesn't belong in the region?  */
!       if (CONTAINING_RGN (TO_BLOCK (fst_out_edge)) !=
! 	  CONTAINING_RGN (BB_TO_BLOCK (bb)))
! 	++nr_rgn_out_edges;
  
!       while (fst_out_edge != nxt_out_edge)
  	{
  	  ++nr_out_edges;
  	  /* The successor doesn't belong in the region?  */
! 	  if (CONTAINING_RGN (TO_BLOCK (nxt_out_edge)) !=
! 	      CONTAINING_RGN (BB_TO_BLOCK (bb)))
  	    ++nr_rgn_out_edges;
- 	  SET_BIT (pot_split[bb], EDGE_TO_BIT (nxt_out_edge));
- 	  nxt_out_edge = NEXT_OUT (nxt_out_edge);
  
  	}
  
        /* Now nr_rgn_out_edges is the number of region-exit edges from
--- 923,963 ----
        return;
      }
  
    /* Initialize dom[bb] to '111..1'.  */
    sbitmap_ones (dom[bb]);
  
!   for (in_edge = BASIC_BLOCK (BB_TO_BLOCK (bb))->pred;
!        in_edge;
!        in_edge = in_edge->pred_next)
      {
!       if (in_edge->src == ENTRY_BLOCK_PTR)
! 	continue;
  
!       pred_bb = BLOCK_TO_BB (in_edge->src->index);
!       sbitmap_a_and_b (dom[bb], dom[bb], dom[pred_bb]);
!       sbitmap_a_or_b (ancestor_edges[bb],
! 		      ancestor_edges[bb], ancestor_edges[pred_bb]);
  
!       SET_BIT (ancestor_edges[bb], EDGE_TO_BIT (in_edge));
  
!       sbitmap_a_or_b (pot_split[bb], pot_split[bb], pot_split[pred_bb]);
  
!       nr_out_edges = 0;
!       nr_rgn_out_edges = 0;
  
!       for (out_edge = in_edge->src->succ;
! 	   out_edge;
! 	   out_edge = out_edge->succ_next)
  	{
  	  ++nr_out_edges;
+ 
  	  /* The successor doesn't belong in the region?  */
! 	  if (out_edge->dest != EXIT_BLOCK_PTR
! 	      && CONTAINING_RGN (out_edge->dest->index)
! 		 != CONTAINING_RGN (BB_TO_BLOCK (bb)))
  	    ++nr_rgn_out_edges;
  
+ 	  SET_BIT (pot_split[bb], EDGE_TO_BIT (out_edge));
  	}
  
        /* Now nr_rgn_out_edges is the number of region-exit edges from
*************** compute_dom_prob_ps (int bb)
*** 1091,1102 ****
           not leaving the region.  */
        nr_out_edges -= nr_rgn_out_edges;
        if (nr_rgn_out_edges > 0)
! 	prob[bb] += 0.9 * prob[BLOCK_TO_BB (pred)] / nr_out_edges;
        else
! 	prob[bb] += prob[BLOCK_TO_BB (pred)] / nr_out_edges;
!       nxt_in_edge = NEXT_IN (nxt_in_edge);
      }
-   while (fst_in_edge != nxt_in_edge);
  
    SET_BIT (dom[bb], bb);
    sbitmap_difference (pot_split[bb], pot_split[bb], ancestor_edges[bb]);
--- 965,974 ----
           not leaving the region.  */
        nr_out_edges -= nr_rgn_out_edges;
        if (nr_rgn_out_edges > 0)
! 	prob[bb] += 0.9 * prob[pred_bb] / nr_out_edges;
        else
! 	prob[bb] += prob[pred_bb] / nr_out_edges;
      }
  
    SET_BIT (dom[bb], bb);
    sbitmap_difference (pot_split[bb], pot_split[bb], ancestor_edges[bb]);
*************** split_edges (int bb_src, int bb_trg, edg
*** 1118,1124 ****
    sbitmap_copy (src, pot_split[bb_src]);
  
    sbitmap_difference (src, src, pot_split[bb_trg]);
!   extract_bitlst (src, bl);
    sbitmap_free (src);
  }
  
--- 990,996 ----
    sbitmap_copy (src, pot_split[bb_src]);
  
    sbitmap_difference (src, src, pot_split[bb_trg]);
!   extract_edgelst (src, bl);
    sbitmap_free (src);
  }
  
*************** compute_trg_info (int trg)
*** 1131,1138 ****
  {
    candidate *sp;
    edgelst el;
!   int check_block, update_idx;
!   int i, j, k, fst_edge, nxt_edge;
  
    /* Define some of the fields for the target bb as well.  */
    sp = candidate_table + trg;
--- 1003,1011 ----
  {
    candidate *sp;
    edgelst el;
!   int i, j, k, update_idx;
!   basic_block block;
!   edge e;
  
    /* Define some of the fields for the target bb as well.  */
    sp = candidate_table + trg;
*************** compute_trg_info (int trg)
*** 1161,1175 ****
  
        if (sp->is_valid)
  	{
- 	  char *update_blocks;
- 
  	  /* Compute split blocks and store them in bblst_table.
  	     The TO block of every split edge is a split block.  */
  	  sp->split_bbs.first_member = &bblst_table[bblst_last];
  	  sp->split_bbs.nr_members = el.nr_members;
  	  for (j = 0; j < el.nr_members; bblst_last++, j++)
! 	    bblst_table[bblst_last] =
! 	      TO_BLOCK (rgn_edges[el.first_member[j]]);
  	  sp->update_bbs.first_member = &bblst_table[bblst_last];
  
  	  /* Compute update blocks and store them in bblst_table.
--- 1034,1045 ----
  
        if (sp->is_valid)
  	{
  	  /* Compute split blocks and store them in bblst_table.
  	     The TO block of every split edge is a split block.  */
  	  sp->split_bbs.first_member = &bblst_table[bblst_last];
  	  sp->split_bbs.nr_members = el.nr_members;
  	  for (j = 0; j < el.nr_members; bblst_last++, j++)
! 	    bblst_table[bblst_last] = el.first_member[j]->dest;
  	  sp->update_bbs.first_member = &bblst_table[bblst_last];
  
  	  /* Compute update blocks and store them in bblst_table.
*************** compute_trg_info (int trg)
*** 1178,1213 ****
  	     add the TO block to the update block list.  This list can end
  	     up with a lot of duplicates.  We need to weed them out to avoid
  	     overrunning the end of the bblst_table.  */
- 	  update_blocks = alloca (last_basic_block);
- 	  memset (update_blocks, 0, last_basic_block);
  
  	  update_idx = 0;
  	  for (j = 0; j < el.nr_members; j++)
  	    {
! 	      check_block = FROM_BLOCK (rgn_edges[el.first_member[j]]);
! 	      fst_edge = nxt_edge = OUT_EDGES (check_block);
! 	      do
  		{
! 		  if (! update_blocks[TO_BLOCK (nxt_edge)])
  		    {
  		      for (k = 0; k < el.nr_members; k++)
! 			if (EDGE_TO_BIT (nxt_edge) == el.first_member[k])
  			  break;
  
  		      if (k >= el.nr_members)
  			{
! 			  bblst_table[bblst_last++] = TO_BLOCK (nxt_edge);
! 			  update_blocks[TO_BLOCK (nxt_edge)] = 1;
  			  update_idx++;
  			}
  		    }
- 
- 		  nxt_edge = NEXT_OUT (nxt_edge);
  		}
- 	      while (fst_edge != nxt_edge);
  	    }
  	  sp->update_bbs.nr_members = update_idx;
  
  	  /* Make sure we didn't overrun the end of bblst_table.  */
  	  gcc_assert (bblst_last <= bblst_size);
  	}
--- 1048,1080 ----
  	     add the TO block to the update block list.  This list can end
  	     up with a lot of duplicates.  We need to weed them out to avoid
  	     overrunning the end of the bblst_table.  */
  
  	  update_idx = 0;
  	  for (j = 0; j < el.nr_members; j++)
  	    {
! 	      block = el.first_member[j]->src;
! 	      for (e = block->succ; e; e = e->succ_next)
  		{
! 		  if (!(e->dest->flags & BB_VISITED))
  		    {
  		      for (k = 0; k < el.nr_members; k++)
! 			if (e == el.first_member[k])
  			  break;
  
  		      if (k >= el.nr_members)
  			{
! 			  bblst_table[bblst_last++] = e->dest;
! 			  e->dest->flags |= BB_VISITED;
  			  update_idx++;
  			}
  		    }
  		}
  	    }
  	  sp->update_bbs.nr_members = update_idx;
  
+ 	  FOR_ALL_BB (block)
+ 	    block->flags &= ~BB_VISITED;
+ 
  	  /* Make sure we didn't overrun the end of bblst_table.  */
  	  gcc_assert (bblst_last <= bblst_size);
  	}
*************** debug_candidate (int i)
*** 1237,1243 ****
        fprintf (sched_dump, "split path: ");
        for (j = 0; j < candidate_table[i].split_bbs.nr_members; j++)
  	{
! 	  int b = candidate_table[i].split_bbs.first_member[j];
  
  	  fprintf (sched_dump, " %d ", b);
  	}
--- 1104,1110 ----
        fprintf (sched_dump, "split path: ");
        for (j = 0; j < candidate_table[i].split_bbs.nr_members; j++)
  	{
! 	  int b = candidate_table[i].split_bbs.first_member[j]->index;
  
  	  fprintf (sched_dump, " %d ", b);
  	}
*************** debug_candidate (int i)
*** 1246,1252 ****
        fprintf (sched_dump, "update path: ");
        for (j = 0; j < candidate_table[i].update_bbs.nr_members; j++)
  	{
! 	  int b = candidate_table[i].update_bbs.first_member[j];
  
  	  fprintf (sched_dump, " %d ", b);
  	}
--- 1113,1119 ----
        fprintf (sched_dump, "update path: ");
        for (j = 0; j < candidate_table[i].update_bbs.nr_members; j++)
  	{
! 	  int b = candidate_table[i].update_bbs.first_member[j]->index;
  
  	  fprintf (sched_dump, " %d ", b);
  	}
*************** check_live_1 (int src, rtx x)
*** 1323,1332 ****
  	    {
  	      for (i = 0; i < candidate_table[src].split_bbs.nr_members; i++)
  		{
! 		  int b = candidate_table[src].split_bbs.first_member[i];
  
! 		  if (REGNO_REG_SET_P (BASIC_BLOCK (b)->global_live_at_start,
! 				       regno + j))
  		    {
  		      return 0;
  		    }
--- 1190,1198 ----
  	    {
  	      for (i = 0; i < candidate_table[src].split_bbs.nr_members; i++)
  		{
! 		  basic_block b = candidate_table[src].split_bbs.first_member[i];
  
! 		  if (REGNO_REG_SET_P (b->global_live_at_start, regno + j))
  		    {
  		      return 0;
  		    }
*************** check_live_1 (int src, rtx x)
*** 1338,1346 ****
  	  /* Check for pseudo registers.  */
  	  for (i = 0; i < candidate_table[src].split_bbs.nr_members; i++)
  	    {
! 	      int b = candidate_table[src].split_bbs.first_member[i];
  
! 	      if (REGNO_REG_SET_P (BASIC_BLOCK (b)->global_live_at_start, regno))
  		{
  		  return 0;
  		}
--- 1204,1212 ----
  	  /* Check for pseudo registers.  */
  	  for (i = 0; i < candidate_table[src].split_bbs.nr_members; i++)
  	    {
! 	      basic_block b = candidate_table[src].split_bbs.first_member[i];
  
! 	      if (REGNO_REG_SET_P (b->global_live_at_start, regno))
  		{
  		  return 0;
  		}
*************** update_live_1 (int src, rtx x)
*** 1397,1406 ****
  	    {
  	      for (i = 0; i < candidate_table[src].update_bbs.nr_members; i++)
  		{
! 		  int b = candidate_table[src].update_bbs.first_member[i];
  
! 		  SET_REGNO_REG_SET (BASIC_BLOCK (b)->global_live_at_start,
! 				     regno + j);
  		}
  	    }
  	}
--- 1263,1271 ----
  	    {
  	      for (i = 0; i < candidate_table[src].update_bbs.nr_members; i++)
  		{
! 		  basic_block b = candidate_table[src].update_bbs.first_member[i];
  
! 		  SET_REGNO_REG_SET (b->global_live_at_start, regno + j);
  		}
  	    }
  	}
*************** update_live_1 (int src, rtx x)
*** 1408,1416 ****
  	{
  	  for (i = 0; i < candidate_table[src].update_bbs.nr_members; i++)
  	    {
! 	      int b = candidate_table[src].update_bbs.first_member[i];
  
! 	      SET_REGNO_REG_SET (BASIC_BLOCK (b)->global_live_at_start, regno);
  	    }
  	}
      }
--- 1273,1281 ----
  	{
  	  for (i = 0; i < candidate_table[src].update_bbs.nr_members; i++)
  	    {
! 	      basic_block b = candidate_table[src].update_bbs.first_member[i];
  
! 	      SET_REGNO_REG_SET (b->global_live_at_start, regno);
  	    }
  	}
      }
*************** update_live (rtx insn, int src)
*** 1467,1473 ****
    (bb_from == bb_to							\
     || IS_RGN_ENTRY (bb_from)						\
     || (TEST_BIT (ancestor_edges[bb_to],					\
! 		 EDGE_TO_BIT (IN_EDGES (BB_TO_BLOCK (bb_from))))))
  
  /* Turns on the fed_by_spec_load flag for insns fed by load_insn.  */
  
--- 1332,1338 ----
    (bb_from == bb_to							\
     || IS_RGN_ENTRY (bb_from)						\
     || (TEST_BIT (ancestor_edges[bb_to],					\
! 		 EDGE_TO_BIT (BASIC_BLOCK (BB_TO_BLOCK (bb_from))->pred))))
  
  /* Turns on the fed_by_spec_load flag for insns fed by load_insn.  */
  
*************** is_pfree (rtx load_insn, int bb_src, int
*** 1604,1610 ****
  		    /* insn2 is the similar load, in the target block.  */
  		    return 1;
  
! 		  if (*(candp->split_bbs.first_member) == BLOCK_NUM (insn2))
  		    /* insn2 is a similar load, in a split-block.  */
  		    return 1;
  		}
--- 1469,1475 ----
  		    /* insn2 is the similar load, in the target block.  */
  		    return 1;
  
! 		  if (*(candp->split_bbs.first_member) == BLOCK_FOR_INSN (insn2))
  		    /* insn2 is a similar load, in a split-block.  */
  		    return 1;
  		}
*************** init_ready_list (struct ready_list *read
*** 1738,1747 ****
  	 the TO blocks of region edges, so there can be at most rgn_nr_edges
  	 of them.  */
        bblst_size = (current_nr_blocks - target_bb) * rgn_nr_edges;
!       bblst_table = xmalloc (bblst_size * sizeof (int));
  
!       bitlst_table_last = 0;
!       bitlst_table = xmalloc (rgn_nr_edges * sizeof (int));
  
        compute_trg_info (target_bb);
      }
--- 1603,1612 ----
  	 the TO blocks of region edges, so there can be at most rgn_nr_edges
  	 of them.  */
        bblst_size = (current_nr_blocks - target_bb) * rgn_nr_edges;
!       bblst_table = xmalloc (bblst_size * sizeof (basic_block));
  
!       edgelst_last = 0;
!       edgelst_table = xmalloc (rgn_nr_edges * sizeof (edge));
  
        compute_trg_info (target_bb);
      }
*************** concat_insn_mem_list (rtx copy_insns, rt
*** 2119,2190 ****
  static void
  propagate_deps (int bb, struct deps *pred_deps)
  {
!   int b = BB_TO_BLOCK (bb);
!   int e, first_edge;
  
    /* bb's structures are inherited by its successors.  */
!   first_edge = e = OUT_EDGES (b);
!   if (e > 0)
!     do
!       {
! 	int b_succ = TO_BLOCK (e);
! 	int bb_succ = BLOCK_TO_BB (b_succ);
! 	struct deps *succ_deps = bb_deps + bb_succ;
! 	int reg;
! 
! 	/* Only bbs "below" bb, in the same region, are interesting.  */
! 	if (CONTAINING_RGN (b) != CONTAINING_RGN (b_succ)
! 	    || bb_succ <= bb)
! 	  {
! 	    e = NEXT_OUT (e);
! 	    continue;
! 	  }
  
! 	/* The reg_last lists are inherited by bb_succ.  */
! 	EXECUTE_IF_SET_IN_REG_SET (&pred_deps->reg_last_in_use, 0, reg,
! 	  {
! 	    struct deps_reg *pred_rl = &pred_deps->reg_last[reg];
! 	    struct deps_reg *succ_rl = &succ_deps->reg_last[reg];
  
! 	    succ_rl->uses = concat_INSN_LIST (pred_rl->uses, succ_rl->uses);
! 	    succ_rl->sets = concat_INSN_LIST (pred_rl->sets, succ_rl->sets);
! 	    succ_rl->clobbers = concat_INSN_LIST (pred_rl->clobbers,
! 						  succ_rl->clobbers);
! 	    succ_rl->uses_length += pred_rl->uses_length;
! 	    succ_rl->clobbers_length += pred_rl->clobbers_length;
! 	  });
! 	IOR_REG_SET (&succ_deps->reg_last_in_use, &pred_deps->reg_last_in_use);
! 
! 	/* Mem read/write lists are inherited by bb_succ.  */
! 	concat_insn_mem_list (pred_deps->pending_read_insns,
! 			      pred_deps->pending_read_mems,
! 			      &succ_deps->pending_read_insns,
! 			      &succ_deps->pending_read_mems);
! 	concat_insn_mem_list (pred_deps->pending_write_insns,
! 			      pred_deps->pending_write_mems,
! 			      &succ_deps->pending_write_insns,
! 			      &succ_deps->pending_write_mems);
! 
! 	succ_deps->last_pending_memory_flush
! 	  = concat_INSN_LIST (pred_deps->last_pending_memory_flush,
! 			      succ_deps->last_pending_memory_flush);
! 
! 	succ_deps->pending_lists_length += pred_deps->pending_lists_length;
! 	succ_deps->pending_flush_length += pred_deps->pending_flush_length;
! 
! 	/* last_function_call is inherited by bb_succ.  */
! 	succ_deps->last_function_call
! 	  = concat_INSN_LIST (pred_deps->last_function_call,
! 			      succ_deps->last_function_call);
  
! 	/* sched_before_next_call is inherited by bb_succ.  */
! 	succ_deps->sched_before_next_call
! 	  = concat_INSN_LIST (pred_deps->sched_before_next_call,
! 			      succ_deps->sched_before_next_call);
  
! 	e = NEXT_OUT (e);
!       }
!     while (e != first_edge);
  
    /* These lists should point to the right place, for correct
       freeing later.  */
--- 1984,2048 ----
  static void
  propagate_deps (int bb, struct deps *pred_deps)
  {
!   basic_block block = BASIC_BLOCK (BB_TO_BLOCK (bb));
!   edge e;
  
    /* bb's structures are inherited by its successors.  */
!   for (e = block->succ; e; e = e->succ_next)
!     {
!       struct deps *succ_deps;
!       int reg;
  
!       /* Only bbs "below" bb, in the same region, are interesting.  */
!       if (e->dest == EXIT_BLOCK_PTR
! 	  || CONTAINING_RGN (block->index) != CONTAINING_RGN (e->dest->index)
! 	  || BLOCK_TO_BB (e->dest->index) <= bb)
! 	continue;
  
!       succ_deps = bb_deps + BLOCK_TO_BB (e->dest->index);
  
!       /* The reg_last lists are inherited by successor.  */
!       EXECUTE_IF_SET_IN_REG_SET (&pred_deps->reg_last_in_use, 0, reg,
! 	{
! 	  struct deps_reg *pred_rl = &pred_deps->reg_last[reg];
! 	  struct deps_reg *succ_rl = &succ_deps->reg_last[reg];
  
! 	  succ_rl->uses = concat_INSN_LIST (pred_rl->uses, succ_rl->uses);
! 	  succ_rl->sets = concat_INSN_LIST (pred_rl->sets, succ_rl->sets);
! 	  succ_rl->clobbers = concat_INSN_LIST (pred_rl->clobbers,
! 						succ_rl->clobbers);
! 	  succ_rl->uses_length += pred_rl->uses_length;
! 	  succ_rl->clobbers_length += pred_rl->clobbers_length;
! 	});
!       IOR_REG_SET (&succ_deps->reg_last_in_use, &pred_deps->reg_last_in_use);
! 
!       /* Mem read/write lists are inherited by successor.  */
!       concat_insn_mem_list (pred_deps->pending_read_insns,
! 			    pred_deps->pending_read_mems,
! 			    &succ_deps->pending_read_insns,
! 			    &succ_deps->pending_read_mems);
!       concat_insn_mem_list (pred_deps->pending_write_insns,
! 			    pred_deps->pending_write_mems,
! 			    &succ_deps->pending_write_insns,
! 			    &succ_deps->pending_write_mems);
! 
!       succ_deps->last_pending_memory_flush
! 	= concat_INSN_LIST (pred_deps->last_pending_memory_flush,
! 			    succ_deps->last_pending_memory_flush);
! 
!       succ_deps->pending_lists_length += pred_deps->pending_lists_length;
!       succ_deps->pending_flush_length += pred_deps->pending_flush_length;
! 
!       /* last_function_call is inherited by successor.  */
!       succ_deps->last_function_call
! 	= concat_INSN_LIST (pred_deps->last_function_call,
! 			      succ_deps->last_function_call);
! 
!       /* sched_before_next_call is inherited by successor.  */
!       succ_deps->sched_before_next_call
! 	= concat_INSN_LIST (pred_deps->sched_before_next_call,
! 			    succ_deps->sched_before_next_call);
!     }
  
    /* These lists should point to the right place, for correct
       freeing later.  */
*************** sched_is_disabled_for_current_region_p (
*** 2351,2356 ****
--- 2209,2216 ----
  static void
  schedule_region (int rgn)
  {
+   basic_block block;
+   edge e;
    int bb;
    int rgn_n_insns = 0;
    int sched_rgn_n_insns = 0;
*************** schedule_region (int rgn)
*** 2400,2423 ****
    /* Compute interblock info: probabilities, split-edges, dominators, etc.  */
    if (current_nr_blocks > 1)
      {
-       int i;
- 
        prob = xmalloc ((current_nr_blocks) * sizeof (float));
  
        dom = sbitmap_vector_alloc (current_nr_blocks, current_nr_blocks);
        sbitmap_vector_zero (dom, current_nr_blocks);
!       /* Edge to bit.  */
        rgn_nr_edges = 0;
!       edge_to_bit = xmalloc (nr_edges * sizeof (int));
!       for (i = 1; i < nr_edges; i++)
! 	if (CONTAINING_RGN (FROM_BLOCK (i)) == rgn)
! 	  EDGE_TO_BIT (i) = rgn_nr_edges++;
!       rgn_edges = xmalloc (rgn_nr_edges * sizeof (int));
  
        rgn_nr_edges = 0;
!       for (i = 1; i < nr_edges; i++)
! 	if (CONTAINING_RGN (FROM_BLOCK (i)) == (rgn))
! 	  rgn_edges[rgn_nr_edges++] = i;
  
        /* Split edges.  */
        pot_split = sbitmap_vector_alloc (current_nr_blocks, rgn_nr_edges);
--- 2260,2289 ----
    /* Compute interblock info: probabilities, split-edges, dominators, etc.  */
    if (current_nr_blocks > 1)
      {
        prob = xmalloc ((current_nr_blocks) * sizeof (float));
  
        dom = sbitmap_vector_alloc (current_nr_blocks, current_nr_blocks);
        sbitmap_vector_zero (dom, current_nr_blocks);
! 
!       /* Use ->aux to implement EDGE_TO_BIT mapping.  */
        rgn_nr_edges = 0;
!       FOR_EACH_BB (block)
! 	{
! 	  if (CONTAINING_RGN (block->index) != rgn)
! 	    continue;
! 	  for (e = block->succ; e; e = e->succ_next)
! 	    SET_EDGE_TO_BIT (e, rgn_nr_edges++);
! 	}
  
+       rgn_edges = xmalloc (rgn_nr_edges * sizeof (edge));
        rgn_nr_edges = 0;
!       FOR_EACH_BB (block)
! 	{
! 	  if (CONTAINING_RGN (block->index) != rgn)
! 	    continue;
! 	  for (e = block->succ; e; e = e->succ_next)
! 	    rgn_edges[rgn_nr_edges++] = e;
! 	}
  
        /* Split edges.  */
        pot_split = sbitmap_vector_alloc (current_nr_blocks, rgn_nr_edges);
*************** schedule_region (int rgn)
*** 2493,2499 ****
  	{
  	  free (candidate_table);
  	  free (bblst_table);
! 	  free (bitlst_table);
  	}
      }
  
--- 2359,2365 ----
  	{
  	  free (candidate_table);
  	  free (bblst_table);
! 	  free (edgelst_table);
  	}
      }
  
*************** schedule_region (int rgn)
*** 2520,2530 ****
  
    if (current_nr_blocks > 1)
      {
        free (prob);
        sbitmap_vector_free (dom);
        sbitmap_vector_free (pot_split);
        sbitmap_vector_free (ancestor_edges);
-       free (edge_to_bit);
        free (rgn_edges);
      }
  }
--- 2386,2404 ----
  
    if (current_nr_blocks > 1)
      {
+       /* Cleanup ->aux used for EDGE_TO_BIT mapping.  */
+       FOR_EACH_BB (block)
+ 	{
+ 	  if (CONTAINING_RGN (block->index) != rgn)
+ 	    continue;
+ 	  for (e = block->succ; e; e = e->succ_next)
+ 	    e->aux = NULL;
+ 	}
+ 
        free (prob);
        sbitmap_vector_free (dom);
        sbitmap_vector_free (pot_split);
        sbitmap_vector_free (ancestor_edges);
        free (rgn_edges);
      }
  }
*************** init_regions (void)
*** 2550,2597 ****
    /* Compute regions for scheduling.  */
    if (reload_completed
        || n_basic_blocks == 1
!       || !flag_schedule_interblock)
      {
        find_single_block_region ();
      }
    else
      {
!       /* Verify that a 'good' control flow graph can be built.  */
!       if (is_cfg_nonregular ())
! 	{
! 	  find_single_block_region ();
! 	}
!       else
! 	{
! 	  struct edge_list *edge_list;
! 
! 	  /* The scheduler runs after estimate_probabilities; therefore, we
! 	     can't blindly call back into find_basic_blocks since doing so
! 	     could invalidate the branch probability info.  We could,
! 	     however, call cleanup_cfg.  */
! 	  edge_list = create_edge_list ();
! 
! 	  /* Compute the dominators and post dominators.  */
! 	  calculate_dominance_info (CDI_DOMINATORS);
! 
! 	  /* build_control_flow will return nonzero if it detects unreachable
! 	     blocks or any other irregularity with the cfg which prevents
! 	     cross block scheduling.  */
! 	  if (build_control_flow (edge_list) != 0)
! 	    find_single_block_region ();
! 	  else
! 	    find_rgns (edge_list);
  
! 	  if (sched_verbose >= 3)
! 	    debug_regions ();
  
! 	  /* We are done with flow's edge list.  */
! 	  free_edge_list (edge_list);
  
! 	  /* For now.  This will move as more and more of haifa is converted
! 	     to using the cfg code in flow.c.  */
! 	  free_dominance_info (CDI_DOMINATORS);
! 	}
      }
  
  
--- 2424,2448 ----
    /* Compute regions for scheduling.  */
    if (reload_completed
        || n_basic_blocks == 1
!       || !flag_schedule_interblock
!       || is_cfg_nonregular ())
      {
        find_single_block_region ();
      }
    else
      {
!       /* Compute the dominators and post dominators.  */
!       calculate_dominance_info (CDI_DOMINATORS);
  
!       /* Find regions.  */
!       find_rgns ();
  
!       if (sched_verbose >= 3)
! 	debug_regions ();
  
!       /* For now.  This will move as more and more of haifa is converted
! 	 to using the cfg code in flow.c.  */
!       free_dominance_info (CDI_DOMINATORS);
      }
  
  
*************** schedule_insns (FILE *dump_file)
*** 2740,2762 ****
  
    sched_finish ();
  
-   if (edge_table)
-     {
-       free (edge_table);
-       edge_table = NULL;
-     }
- 
-   if (in_edges)
-     {
-       free (in_edges);
-       in_edges = NULL;
-     }
-   if (out_edges)
-     {
-       free (out_edges);
-       out_edges = NULL;
-     }
- 
    sbitmap_free (blocks);
    sbitmap_free (large_region_blocks);
  }
--- 2591,2596 ----

-- 
  Dr. Ulrich Weigand
  weigand@informatik.uni-erlangen.de

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Re: Bad code generated by inter-block scheduling
  2004-09-24  1:21   ` [PATCH] " Ulrich Weigand
@ 2004-09-24  6:46     ` Jeffrey A Law
  2004-09-24 18:16       ` Ulrich Weigand
  0 siblings, 1 reply; 6+ messages in thread
From: Jeffrey A Law @ 2004-09-24  6:46 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc, gcc-patches

On Thu, 2004-09-23 at 18:11, Ulrich Weigand wrote:
> Maybe there's some way of retrieving this information from
> the dominator graph as well, but right now I don't really want
> to do actual algorithmic changes, just fix my bug ;-)
I understand...


> Instead of piling more hacks on top, I've tried to completely
> remove 'haifa_edge' and everything related to it, and use the
> standard CFG data structures instead -- this way we will
> automatically get the exit edges.  This went quite smoothly,
> I only had to change a couple of other places to use basic_block
> or edge pointers instead of indices.
Way way way cool.  That code has been on my hitlist for about 6
years now.  In the defense of the original haifa-sched developers,
their work pre-dates having a CFG, so they did just about everything
they needed inside haifa.  Removing that code as we've made the
CFG a first class structure within GCC hasn't progressed as well as
I would have liked.

I'll conditionally approve the patch -- however, the condition is
that it's updated for the edge vector changes after they go in, then
retested.  Please post the patch that actually gets installed.
[ The edge vector changes are bloody huge and I'd like to get them
  installed before we create conflicts for merging in that code. ]


> I now don't even need to call create_edge_list at all, the
> only other user of the edge_list, find_rgns, could easily
> be changed to just use the regular CFG data structures.
That would be a greatly appreciated follow-on patch.


Jeff


* Re: [PATCH] Re: Bad code generated by inter-block scheduling
  2004-09-24  6:46     ` Jeffrey A Law
@ 2004-09-24 18:16       ` Ulrich Weigand
  0 siblings, 0 replies; 6+ messages in thread
From: Ulrich Weigand @ 2004-09-24 18:16 UTC (permalink / raw)
  To: law; +Cc: Ulrich Weigand, gcc, gcc-patches

Jeff Law wrote:
> I'll conditionally approve the patch -- however, the condition is
> that it's updated for the edge vector changes after they go in, then
> retested.  Please post the patch that actually gets installed.
> [ The edge vector changes are bloody huge and I'd like to get them
>   installed before we create conflicts for merging in that code. ]

OK, I'll do that as soon as the branch is merged.  Thanks!

> > I now don't even need to call create_edge_list at all, the
> > only other user of the edge_list, find_rgns, could easily
> > be changed to just use the regular CFG data structures.
> That would be a greatly appreciated follow-on patch.

I meant that this is already part of the patch I posted ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  weigand@informatik.uni-erlangen.de


* Re: [PATCH] Re: Bad code generated by inter-block scheduling
@ 2004-09-27 14:01 Mircea Namolaru
  0 siblings, 0 replies; 6+ messages in thread
From: Mircea Namolaru @ 2004-09-27 14:01 UTC (permalink / raw)
  To: weigand; +Cc: law, gcc, gcc-patches, Ayal Zaks

> Instead of piling more hacks on top, I've tried to completely
> remove 'haifa_edge' and everything related to it, and use the
> standard CFG data structures instead -- this way we will
> automatically get the exit edges.

When we developed this code years ago the CFG was not maintained
during compilation. It's a good idea to use the standard CFG data 
structures.

> For the decision interblock vs. speculative, I agree that this
> should just be post-dominance.  However, the split-edges stuff
> is also used, in the speculative case, to find all the blocks
> where speculative motion could cause a variable to become live
> at start that previously wasn't.

For a speculative motion we must avoid moving a definition of a
register R into a place where it is currently live.

This is the main reason for computing the split edges. The split
edges also provide an easy way to determine if two basic blocks are 
equivalent without explicitly computing the post-dominance relation.

Mircea Namolaru



Thread overview: 6+ messages
-- links below jump to the message on this page --
2004-09-22 18:52 Bad code generated by inter-block scheduling Ulrich Weigand
2004-09-22 21:36 ` Jeffrey A Law
2004-09-24  1:21   ` [PATCH] " Ulrich Weigand
2004-09-24  6:46     ` Jeffrey A Law
2004-09-24 18:16       ` Ulrich Weigand
2004-09-27 14:01 Mircea Namolaru
