* Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
@ 2012-11-15 20:10 Teresa Johnson
From: Teresa Johnson @ 2012-11-15 20:10 UTC
  To: reply, davidxl, stevenb.gcc, matthew.gretton-dann,
	christophe.lyon, gcc-patches


Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code, contributed by Steven Bosscher, to ensure
that no cold blocks dominate hot blocks.
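
As a rough illustration of the invariant being verified (this is a toy model,
not code from the patch), the check can be phrased over an immediate-dominator
array. The names chk_partition and chk_count_cold_dominating_hot, and the
array-based CFG representation, are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for BB_HOT_PARTITION / BB_COLD_PARTITION (hypothetical).  */
enum chk_partition { CHK_HOT, CHK_COLD };

/* Count hot blocks that have a cold block somewhere on their dominator
   chain.  idom[i] is the immediate dominator of block i, or -1 for the
   entry block.  A result of 0 means the invariant holds.  */
size_t
chk_count_cold_dominating_hot (const int *idom,
                               const enum chk_partition *part, size_t n)
{
  size_t violations = 0;

  assert (idom != NULL && part != NULL);

  for (size_t i = 0; i < n; i++)
    {
      if (part[i] != CHK_HOT)
        continue;
      /* Walk up the dominator tree looking for a cold ancestor.  */
      for (int d = idom[i]; d >= 0; d = idom[d])
        if (part[d] == CHK_COLD)
          {
            violations++;
            break;
          }
    }
  return violations;
}
```

The real verification lives in rtl_verify_flow_info_1 and works on GCC's own
dominance info; the sketch only shows what "no cold block dominates a hot
block" means.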

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The code to fix up partitions (including the BB_PARTITION
assignment, region crossing jump notes, and switch text section notes) is
now handled in a few centralized locations, for example inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.
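
To make the idea of centralizing the fixup concrete, here is a minimal
self-contained sketch (a toy model, not the patch itself) in which the redirect
routine updates the crossing flag and jump note itself, so callers never have
to. The xing_* types and function are hypothetical; they loosely mirror
EDGE_CROSSING and REG_CROSSING_JUMP.

```c
#include <assert.h>
#include <stdbool.h>

enum xing_partition { XING_HOT, XING_COLD };

struct xing_block
{
  enum xing_partition partition;
  bool ends_in_jump;        /* models JUMP_P (BB_END (bb)) */
  bool crossing_jump_note;  /* models a REG_CROSSING_JUMP note */
};

struct xing_edge
{
  struct xing_block *src;
  struct xing_block *dest;
  bool crossing;            /* models EDGE_CROSSING */
};

/* Redirect E to TARGET and bring the crossing flag and note back in
   sync with the partitions of the two endpoints.  */
void
xing_redirect_edge (struct xing_edge *e, struct xing_block *target)
{
  assert (e && e->src && target);
  e->dest = target;

  if (e->src->partition != target->partition)
    {
      e->crossing = true;
      /* Only a jump can carry the note.  */
      if (e->src->ends_in_jump)
        e->src->crossing_jump_note = true;
    }
  else
    {
      e->crossing = false;
      e->src->crossing_jump_note = false;
    }
}
```

The caller just redirects the edge; whether the edge now crosses a partition
boundary is decided in one place.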

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculation
of the dominance relation. However, as far as I can tell, the routines which
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are typically invoked once per optimization pass (or a small number of times
in the case of try_optimize_cfg). Additionally, I compared the
-ftime-report output for some large FDO compilations and saw only minimal
increases in the dominance computation times, which were only a tiny percentage
of the overall compile time.
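
A minimal sketch of the global cleanup, assuming a toy CFG described by an
immediate-dominator array (this is not the fixup_partitions code from the
patch; fix_partition and fix_propagate_cold are hypothetical names): every
block with a cold block on its dominator chain is re-marked cold.

```c
#include <assert.h>
#include <stddef.h>

enum fix_partition { FIX_HOT, FIX_COLD };

/* Re-mark as cold every hot block that is dominated by a cold block.
   idom[i] is the immediate dominator of block i, or -1 for the entry
   block.  Returns the number of blocks that changed partition.  */
size_t
fix_propagate_cold (const int *idom, enum fix_partition *part, size_t n)
{
  size_t changed = 0;

  assert (idom != NULL && part != NULL);

  for (size_t i = 0; i < n; i++)
    {
      if (part[i] == FIX_COLD)
        continue;
      /* Any cold ancestor in the dominator tree forces block i cold.  */
      for (int d = idom[i]; d >= 0; d = idom[d])
        if (part[d] == FIX_COLD)
          {
            part[i] = FIX_COLD;
            changed++;
            break;
          }
    }
  return changed;
}
```

The patch walks dominator-tree children with a worklist instead of walking
ancestor chains, and then repairs the region crossing edges of each re-marked
block; the sketch only covers the re-marking step.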

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).
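
The difference between the two guards can be sketched as follows. This is a
toy model, not GCC code: fn_state and both helper functions are hypothetical,
with has_bb_partition standing in for the new rtl_data field.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-function state (crtl in GCC).  */
struct fn_state
{
  bool has_bb_partition;  /* set only once a cold block actually exists */
};

/* Old-style guard: keyed on the command-line option alone.  */
bool
fn_crossjump_allowed_old (bool option_enabled, bool reload_completed)
{
  return !(option_enabled && reload_completed);
}

/* New-style guard: keyed on whether this function was really split.  */
bool
fn_crossjump_allowed_new (const struct fn_state *fn, bool reload_completed)
{
  assert (fn);
  return !(fn->has_bb_partition && reload_completed);
}
```

With the option enabled, a completely hot function is blocked by the old guard
but allowed by the new one.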

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
benchmarks and internal Google benchmarks using profile feedback and
-freorder-blocks-and-partition to get more coverage. Ok for trunk?

Thanks,
Teresa

2012-11-14  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

	* cfghooks.h (cfg_layout_finalize): New parameter.
	* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
        parameter.
	* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
        as this is now done by redirect_edge_and_branch_force.
	* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
        barriers, new cfg_layout_finalize parameter, and don't store exit
        predecessor BB until after it is potentially split.
	* function.h (struct rtl_data): New flag has_bb_partition.
	* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
	* cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
        any blocks in function actually partitioned.
	(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
        up partitioning.
	* bb-reorder.c (connect_traces): Only look for partitions and skip
        block copying if any blocks in function actually partitioned.
	(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
        (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
        that no cold blocks dominate a hot block.
	(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
        as this is now done by force_nonfallthru_and_redirect.
	(add_reg_crossing_jump_notes): Handle the fact that some jumps may
        already be marked with region crossing note.
	(reorder_basic_blocks): Only need to verify partitions if any
        blocks in function actually partitioned.
	(insert_section_boundary_note): Only need to insert note if any
        blocks in function actually partitioned.
	(rest_of_handle_reorder_blocks): New cfg_layout_finalize
        parameter, and remove call to insert_section_boundary_note as this
        is now called via cfg_layout_finalize/fixup_reorder_chain.
	(duplicate_computed_gotos): New cfg_layout_finalize
        parameter.
	(partition_hot_cold_basic_blocks): Set flag indicating function
        has bb partitions.
	* bb-reorder.h: Declare insert_section_boundary_note and
        emit_barrier_after_bb, which are no longer static.
	* basic-block.h: Declare new function fixup_partitions.
	* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
	(fixup_partition_crossing): New function.
	(fixup_bb_partition): Ditto.
	(rtl_redirect_edge_and_branch): Fixup partition boundaries.
	(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
	(rtl_split_edge): Correctly fixup partition boundaries.
	(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_block, and set up insertion point correctly since
        block may now end in a jump.
	(commit_edge_insertions): Invoke fixup_partitions to sanitize partition
        boundaries after optimizations that modify cfg and before trying to
        verify the flow info.
	(fixup_partitions): New function.
	(rtl_verify_flow_info_1): Add verification that no cold bbs dominate
        hot bbs.
	(record_effective_endpoints): Remove region-crossing notes and set flag
        indicating that they need to be reinserted on exit from cfglayout mode.
	(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
	(fixup_reorder_chain): Call insert_section_boundary_note if necessary.
        Remove old code that attempted to fixup region crossing note as
        this is now handled in force_nonfallthru_and_redirect.
	(duplicate_insn_chain): Don't duplicate switch section notes.
	(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
	(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.

Index: cfghooks.h
===================================================================
--- cfghooks.h	(revision 193376)
+++ cfghooks.h	(working copy)
@@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
 void account_profile_record (struct profile_record *, int);
 
 extern void cfg_layout_initialize (unsigned int);
-extern void cfg_layout_finalize (void);
+extern void cfg_layout_finalize (bool);
 
 /* Hooks containers.  */
 extern struct cfg_hooks gimple_cfg_hooks;
@@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
 extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
-
Index: modulo-sched.c
===================================================================
--- modulo-sched.c	(revision 193376)
+++ modulo-sched.c	(working copy)
@@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
   free_dominance_info (CDI_DOMINATORS);
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 #endif /* INSN_SCHEDULING */
   return 0;
 }
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 193376)
+++ ifcvt.c	(working copy)
@@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }
 
   num_true_changes++;
Index: function.c
===================================================================
--- function.c	(revision 193376)
+++ function.c	(working copy)
@@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
 		    break;
 		if (e)
 		  {
-		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-						  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
 		    BB_COPY_PARTITION (copy_bb, e->src);
 		  }
 		else
@@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
 	if (cur_bb->index >= NUM_FIXED_BLOCKS
 	    && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
 	  cur_bb->aux = cur_bb->next_bb;
-      cfg_layout_finalize ();
+      cfg_layout_finalize (false);
     }
 
 epilogue_done:
@@ -6517,7 +6519,7 @@ epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;
 
       gcc_assert (entry_edge != orig_entry_edge);
@@ -6545,6 +6547,12 @@ epilogue_done:
 	    else
 	      pending_edge_cold = e;
 	  }
+      
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;
 
       FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
 	{
Index: function.h
===================================================================
--- function.h	(revision 193376)
+++ function.h	(working copy)
@@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
      sched2) and is useful only if the port defines LEAF_REGISTERS.  */
   bool uses_only_leaf_regs;
 
+  /* Nonzero if the function being compiled has undergone hot/cold partitioning
+     (under flag_reorder_blocks_and_partition) and has at least one cold
+     block.  */
+  bool has_bb_partition;
+
   /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
      asm.  Unlike regs_ever_live, elements of this array corresponding
      to eliminable regs (like the frame pointer) are set if an asm
Index: hw-doloop.c
===================================================================
--- hw-doloop.c	(revision 193376)
+++ hw-doloop.c	(working copy)
@@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
       else
 	bb->aux = NULL;
     }
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
   clear_aux_for_blocks ();
   df_analyze ();
 }
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 193376)
+++ cfgcleanup.c	(working copy)
@@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (flag_reorder_blocks_and_partition && reload_completed)
+  if (crtl->has_bb_partition && reload_completed)
     return false;
 
   /* Search backward through forwarder blocks.  We don't need to worry
@@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
 	      df_analyze ();
 	    }
 
+	  if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-	  if (changed)
-	    verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }
 
 	  changed_overall |= changed;
 	  first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 193376)
+++ bb-reorder.c	(working copy)
@@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
   current_partition = BB_PARTITION (traces[0].first);
   two_passes = false;
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     for (i = 0; i < n_traces && !two_passes; i++)
       if (BB_PARTITION (traces[0].first)
 	  != BB_PARTITION (traces[i].first))
@@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
 		      }
 		  }
 
-	      if (flag_reorder_blocks_and_partition)
+	      if (crtl->has_bb_partition)
 		try_copy = false;
 
 	      /* Copy tiny blocks always; copy larger blocks only when the
@@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
   return length;
 }
 
-/* Emit a barrier into the footer of BB.  */
+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
 
-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
 }
 
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
@@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
 {
   VEC(edge, heap) *crossing_edges = NULL;
   basic_block bb;
-  edge e;
-  edge_iterator ei;
+  edge e, e2;
+  edge_iterator ei, ei2;
+  unsigned int cold_bb_count = 0;
+  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
+  VEC (basic_block, heap) *bbs_newly_hot = NULL;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
+        }
     }
 
+  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
+     several different possibilities. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness.  */
+  if (cold_bb_count)
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      /* Keep examining hot bbs until we have either checked them all, or
+         re-marked all cold bbs hot.  */
+      while (! VEC_empty (basic_block, bbs_in_hot_partition)
+             && cold_bb_count)
+        {
+          basic_block dom_bb;
+
+          bb = VEC_pop (basic_block, bbs_in_hot_partition);
+          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+          /* If bb's immediate dominator is also hot then it is ok.  */
+          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked to hot.  */
+          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot dom_bb to see if it is also
+             dominated by a cold bb.  */
+          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
+
+          /* We should also adjust any cold blocks that the newly-hot bb
+             feeds and see if it makes sense to re-mark those as hot as
+             well.  */
+          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
+          while (! VEC_empty (basic_block, bbs_newly_hot))
+            {
+              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
+              /* Examine all successors of this newly-hot bb to see if they
+                 are cold and should be re-marked as hot.  */
+              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
+                {
+                  bool any_cold_preds = false;
+                  basic_block succ = e->dest;
+                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
+                    continue;
+                  /* Does this block have any cold predecessors now?  */
+                  FOR_EACH_EDGE (e2, ei2, succ->preds)
+                  {
+                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
+                      {
+                        any_cold_preds = true;
+                        break;
+                      }
+                  }
+                  if (any_cold_preds)
+                    continue;
+
+                  /* Here we have a successor of newly-hot bb that is cold
+                     but no longer has any cold predecessors. Since the original
+                     assignment of our newly-hot bb was incorrect, this successor's
+                     assignment as cold is also suspect. Go ahead and re-mark it
+                     as hot now too. Better heuristics may be in order here.  */
+                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
+                  cold_bb_count--;
+                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
+                  /* Examine this successor as a newly-hot bb.  */
+                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
+                }
+            }
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
@@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
 		      new_bb->aux = cur_bb->aux;
 		      cur_bb->aux = new_bb;
 
-		      /* Make sure new fall-through bb is in same
-			 partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+		      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));
 
-		      BB_COPY_PARTITION (new_bb, cur_bb);
 		      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
 		    }
 		  else
@@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-	  && JUMP_P (BB_END (e->src)))
+	  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
 	add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }
 
@@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     verify_hot_cold_block_grouping ();
 }
 
@@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */
 
-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
   rtx new_note;
   int first_partition = 0;
 
-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;
 
   FOR_EACH_BB (bb)
@@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
   FOR_EACH_BB (bb)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
-  cfg_layout_finalize ();
+  cfg_layout_finalize (true);
 
-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
   return 0;
 }
 
@@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
     }
 
 done:
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   BITMAP_FREE (candidates);
   return 0;
@@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
   if (crossing_edges == NULL)
     return 0;
 
+  crtl->has_bb_partition = true;
+
   /* Make sure the source of any crossing edge ends in a jump and the
      destination of any crossing edge has a label.  */
   add_labels_and_missing_jumps (crossing_edges);
Index: bb-reorder.h
===================================================================
--- bb-reorder.h	(revision 193376)
+++ bb-reorder.h	(working copy)
@@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
 
 extern int get_uncond_jump_length (void);
 
+extern void insert_section_boundary_note (void);
+
+extern void emit_barrier_after_bb (basic_block bb);
+
 #endif
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 193376)
+++ basic-block.h	(working copy)
@@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void fixup_partitions (void);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 193376)
+++ cfgrtl.c	(working copy)
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
    Only applicable if the CFG is in cfglayout mode.  */
 static GTY(()) rtx cfg_layout_function_footer;
 static GTY(()) rtx cfg_layout_function_header;
+static bool had_sec_boundary_notes;
 
 static rtx skip_insns_after_block (basic_block);
 static void record_effective_endpoints (void);
 static rtx label_for_bb (basic_block);
-static void fixup_reorder_chain (void);
+static void fixup_reorder_chain (bool finalize_reorder_blocks);
 
 void verify_insn_chain (void);
 static void fixup_fallthru_exit_predecessor (void);
@@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;
 
   /* We can replace or remove a complex jump only when we have exactly
@@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
   return e;
 }
 
+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);
+
+  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (target))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the region crossing note from jump at end of
+         e->src if it exists.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        remove_note (BB_END (e->src), note);
+    }
+}
+
+/* Called when block BB has been reassigned to a different partition,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e, e->dest);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e, e->dest);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.
 
@@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;
 
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;
 
-  if (e->dest == target)
+  if (dest == target)
     return e;
 
   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret, target);
       return ret;
     }
 
@@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;
 
   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret, target);
   return ret;
 }
 
@@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */
 
       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && JUMP_P (BB_END (jump_block))
-	  && !any_condjump_p (BB_END (jump_block))
-	  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
-	add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
 
       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
       new_edge->probability = probability;
       new_edge->count = count;
 
+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge, jump_block);
+
       /* Redirect old edge.  */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }
 
-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);
 
   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);
 
   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }
 
@@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;
 
   /* Abnormal edges cannot be split.  */
@@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest).  */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }
 
   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
 
+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);
 
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && e->src != ENTRY_BLOCK_PTR
-	  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-	  && !(e->flags & EDGE_CROSSING)
-	  && JUMP_P (after)
-	  && !any_condjump_p (after)
-	  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
-	add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+	before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }
 
   /* Now that we've found the spot, do the insertion.  */
@@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;
 
+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
 
   return end;
 }
-\f
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  VEC (basic_block, heap) *bbs_to_fix = NULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+        {
+          bb = VEC_pop (basic_block, bbs_in_cold_partition);
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here.  */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold.  */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings in the minimum number
+     of places, which can require forcing edges to be non fallthru.  */
+  while (! VEC_empty (basic_block, bbs_to_fix))
+    {
+      bb = VEC_pop (basic_block, bbs_to_fix);
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.
 
@@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;
 
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
 
 	  if (e->flags & EDGE_ABNORMAL)
 	    n_abnormal++;
+
+          have_partitions |= is_crossing;
 	}
 
       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
 	  }
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
+  if (! VEC_empty (basic_block, bbs_in_cold_partition))
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! VEC_empty (basic_block, bbs_in_cold_partition))
+        {
+          bb = VEC_pop (basic_block, bbs_in_cold_partition);
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;
 
+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;
 
       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
-	BB_HEADER (bb) = unlink_insn_chain (next_insn,
-					      PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode.  */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+              {
+                had_sec_boundary_notes |= true;
+                /* Remove note from chain. Grab new next_insn first.  */
+                if (next_insn == check_insn)
+                  next_insn = NEXT_INSN (check_insn);
+                /* Delete note.  */
+                delete_insn (check_insn);
+                /* There will only be one.  */
+                break;
+              }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* If we still have header instructions left after above loop.  */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
 	BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
 
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   return 0;
 }
@@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
 }
 \f
 
-/* Given a reorder chain, rearrange the code to match.  */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.  */
 
 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3150,7 +3373,7 @@ static void
 	  PREV_INSN (BB_HEADER (bb)) = insn;
 	  insn = BB_HEADER (bb);
 	  while (NEXT_INSN (insn))
-	    insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
 	}
       if (insn)
 	NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3175,6 +3398,11 @@ static void
     insn = NEXT_INSN (insn);
 
   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3187,7 +3415,7 @@ static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;
 
       if (EDGE_COUNT (bb->succs) == 0)
@@ -3322,7 +3550,6 @@ static void
       /* We got here if we need to add a new jump insn. 
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
@@ -3330,17 +3557,6 @@ static void
 	  bb->aux = nb;
 	  /* Don't process this new block.  */
 	  bb = nb;
-
-	  /* Make sure new bb is tagged for correct section (same as
-	     fall-thru source, since you cannot fall-thru across
-	     section boundaries).  */
-	  BB_COPY_PARTITION (src_bb, single_pred (bb));
-	  if (flag_reorder_blocks_and_partition
-	      && targetm_common.have_named_sections
-	      && JUMP_P (BB_END (bb))
-	      && !any_condjump_p (BB_END (bb))
-	      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-	    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
 	}
     }
 
@@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
 	    case NOTE_INSN_FUNCTION_BEG:
 	      /* There is always just single entry to function.  */
 	    case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      break;
 
 	    case NOTE_INSN_EPILOGUE_BEG:
-	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      emit_note_copy (insn);
 	      break;
 
@@ -3759,10 +3976,13 @@ break_superblocks (void)
 }
 
 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest.  */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes.  */
 
 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3775,7 +3995,7 @@ void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);
 
   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;
 
   if (!onlyjump_p (insn)

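As a rough illustration of the worklist in fixup_partitions above, here is a
standalone sketch on a toy dominator tree. struct toy_bb and
toy_fixup_partitions are hypothetical stand-ins, not GCC's basic_block or VEC
API. Unlike the patch, the sketch marks a block cold when it is pushed rather
than when it is popped, which keeps each block on the worklist at most once.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 16

/* Hypothetical stand-in for a basic block: its partition plus the
   dominator-tree links that first_dom_son/next_dom_son would return.  */
struct toy_bb
{
  bool cold;          /* True if the block is in the cold partition.  */
  int first_dom_son;  /* First child in the dominator tree, or -1.  */
  int next_dom_son;   /* Next sibling in the dominator tree, or -1.  */
};

/* Mark every block dominated by a cold block as cold.  Indices of the
   newly cold blocks are stored in FIXED (room for N entries); the
   return value is how many were stored.  */
int
toy_fixup_partitions (struct toy_bb *bbs, int n, int *fixed)
{
  int worklist[TOY_MAX_BLOCKS];
  int top = 0, n_fixed = 0;

  assert (n >= 0 && n <= TOY_MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (bbs[i].cold)
      worklist[top++] = i;

  while (top > 0)
    {
      int i = worklist[--top];

      /* Any block dominated by a cold block must also be cold.  Each
         block is pushed at most once, so the worklist cannot overflow.  */
      for (int son = bbs[i].first_dom_son; son != -1;
           son = bbs[son].next_dom_son)
        if (!bbs[son].cold)
          {
            bbs[son].cold = true;
            fixed[n_fixed++] = son;
            worklist[top++] = son;
          }
    }

  return n_fixed;
}
```

The FIXED array plays the role of bbs_to_fix: the region-crossing fixup is
deferred until every affected block has been re-marked.
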
--
This patch is available for review at http://codereview.appspot.com/6823047


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-15 20:10 Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047) Teresa Johnson
@ 2012-11-26 15:55 ` Teresa Johnson
  2012-11-26 16:25   ` Christophe Lyon
  0 siblings, 1 reply; 35+ messages in thread
From: Teresa Johnson @ 2012-11-26 15:55 UTC (permalink / raw)
  To: reply, David Li, Steven Bosscher, Matthew Gretton-Dann,
	Christophe Lyon, gcc-patches

Ping.
Teresa

On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code, contributed by Steven Bosscher, to
> ensure that no cold blocks dominate hot blocks.
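
The invariant being verified can be sketched in isolation as follows. This is
only an illustration under stated assumptions: the cold[] and idom[] arrays
and toy_find_partition_violation are hypothetical, and the check walks up the
immediate-dominator chain from each hot block, whereas the patch walks down
the dominator tree from each cold block.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the index of the first hot block that has a cold dominator,
   or -1 if no cold block dominates a hot block.  IDOM[i] is the
   immediate dominator of block i; the entry block has IDOM == -1.  */
int
toy_find_partition_violation (const bool *cold, const int *idom, int n)
{
  assert (n >= 0);

  for (int i = 0; i < n; i++)
    {
      if (cold[i])
        continue;

      /* Walk up the dominator chain of this hot block.  */
      for (int d = idom[i]; d != -1; d = idom[d])
        if (cold[d])
          return i;
    }

  return -1;
}
```

A nonnegative result corresponds to the "non-cold basic block dominated by a
block in the cold partition" error.
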
>
> I attempted to make the handling of partition updates through the optimization
> passes much more consistent, removing a number of partial fixes in the code
> stream in the process. The code to fixup partitions (including the BB_PARTITION
> assignment, region crossing jump notes, and switch text section notes) is
> now handled in a few centralized locations. For example, inside
> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> don't need to attempt the fixup themselves.
>
> For optimization passes that make adjustments to the cfg while in cfg layout
> mode that are not easy to fix up incrementally, the new routine
> fixup_partitions handles the cleanup globally. This does require calculation
> of the dominance relation; however, as far as I can tell the routines which
> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> are invoked typically once (or a small number of times in the case of
> try_optimize_cfg) per optimization pass. Additionally, I compared the
> -ftime-report output for some large fdo compilations and saw only minimal
> increases in the dominance computation times, which were only a tiny percentage
> of the overall compile time.
>
> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
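
The effect of the new flag on a check such as the one in
try_crossjump_to_edge can be sketched as below. struct toy_rtl_data and both
helper functions are hypothetical names for illustration; only the two
conditions themselves are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-function rtl_data structure.  */
struct toy_rtl_data
{
  bool has_bb_partition;  /* True once hot/cold partitions were formed.  */
};

/* Old condition: skip the optimization whenever the option is enabled,
   even for functions that ended up with no cold blocks.  */
bool
toy_skip_crossjump_old (bool flag_reorder_blocks_and_partition,
                        bool reload_completed)
{
  return flag_reorder_blocks_and_partition && reload_completed;
}

/* New condition: skip only if this function was actually partitioned.  */
bool
toy_skip_crossjump_new (const struct toy_rtl_data *crtl,
                        bool reload_completed)
{
  assert (crtl != 0);
  return crtl->has_bb_partition && reload_completed;
}
```

For a completely hot function compiled with the option enabled, the old check
skips the optimization and the new one does not.
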
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> benchmarks and internal Google benchmarks using profile feedback and
> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>
> Thanks,
> Teresa
>
> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>             Steven Bosscher  <steven@gcc.gnu.org>
>
>         * cfghooks.h (cfg_layout_finalize): New parameter.
>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>         parameter.
>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>         as this is now done by redirect_edge_and_branch_force.
>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>         barriers, new cfg_layout_finalize parameter, and don't store exit
>         predecessor BB until after it is potentially split.
>         * function.h (struct rtl_data): New flag has_bb_partition.
>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>         any blocks in function actually partitioned.
>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>         up partitioning.
>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>         block copying if any blocks in function actually partitioned.
>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>         that no cold blocks dominate a hot block.
>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>         as this is now done by force_nonfallthru_and_redirect.
>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>         already be marked with region crossing note.
>         (reorder_basic_blocks): Only need to verify partitions if any
>         blocks in function actually partitioned.
>         (insert_section_boundary_note): Only need to insert note if any
>         blocks in function actually partitioned.
>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>         parameter, and remove call to insert_section_boundary_note as this
>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>         (duplicate_computed_gotos): New cfg_layout_finalize
>         parameter.
>         (partition_hot_cold_basic_blocks): Set flag indicating function
>         has bb partitions.
>         * bb-reorder.h: Declare insert_section_boundary_note and
>         emit_barrier_after_bb, which are no longer static.
>         * basic-block.h: Declare new function fixup_partitions.
>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>         check for region crossing note.
>         (fixup_partition_crossing): New function.
>         (fixup_bb_partition): Ditto.
>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>         remove old code that tried to do this. Emit barrier correctly
>         when we are in cfglayout mode.
>         (rtl_split_edge): Correctly fixup partition boundaries.
>         (commit_one_edge_insertion): Remove old code that tried to
>         fixup region crossing edge since this is now handled in
>         split_block, and set up insertion point correctly since
>         block may now end in a jump.
>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>         boundaries after optimizations that modify cfg and before trying to
>         verify the flow info.
>         (fixup_partitions): New function.
>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>         hot bbs.
>         (record_effective_endpoints): Remove section boundary notes and set flag
>         indicating that they need to be reinserted on exit from cfglayout mode.
>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>         Remove old code that attempted to fixup region crossing note as
>         this is now handled in force_nonfallthru_and_redirect.
>         (duplicate_insn_chain): Don't duplicate switch section notes.
>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>         note.
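
For the find_rarely_executed_basic_blocks_and_crossing_edges entry above, the
dominator part of the adjustment can be sketched as follows. This is a
simplified illustration, not the patch's code: the cold[] and idom[] arrays
and toy_promote_cold_dominators are hypothetical, and the sketch omits the
patch's additional step of re-marking cold successors that no longer have any
cold predecessors.

```c
#include <assert.h>
#include <stdbool.h>

/* Re-mark as hot every cold block that dominates a hot block, and
   return how many blocks were promoted.  IDOM[i] is the immediate
   dominator of block i; the entry block has IDOM == -1.  */
int
toy_promote_cold_dominators (bool *cold, const int *idom, int n)
{
  int promoted = 0;

  assert (n >= 0);

  for (int i = 0; i < n; i++)
    {
      if (cold[i])
        continue;

      /* Walk up from this hot block, promoting cold dominators.  The
         walk stops at the first hot dominator, whose own chain is
         handled by the walk that visits or promoted it.  */
      for (int d = idom[i]; d != -1 && cold[d]; d = idom[d])
        {
          cold[d] = false;
          promoted++;
        }
    }

  return promoted;
}
```

Cold blocks that dominate no hot block are left alone, so a function can
still end up with a nonempty cold partition.
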
>
> Index: cfghooks.h
> ===================================================================
> --- cfghooks.h  (revision 193376)
> +++ cfghooks.h  (working copy)
> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>  void account_profile_record (struct profile_record *, int);
>
>  extern void cfg_layout_initialize (unsigned int);
> -extern void cfg_layout_finalize (void);
> +extern void cfg_layout_finalize (bool);
>
>  /* Hooks containers.  */
>  extern struct cfg_hooks gimple_cfg_hooks;
> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>  extern void gimple_register_cfg_hooks (void);
>  extern struct cfg_hooks get_cfg_hooks (void);
>  extern void set_cfg_hooks (struct cfg_hooks);
> -
> Index: modulo-sched.c
> ===================================================================
> --- modulo-sched.c      (revision 193376)
> +++ modulo-sched.c      (working copy)
> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>      if (bb->next_bb != EXIT_BLOCK_PTR)
>        bb->aux = bb->next_bb;
>    free_dominance_info (CDI_DOMINATORS);
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>  #endif /* INSN_SCHEDULING */
>    return 0;
>  }
> Index: ifcvt.c
> ===================================================================
> --- ifcvt.c     (revision 193376)
> +++ ifcvt.c     (working copy)
> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>    if (new_bb)
>      {
>        df_bb_replace (then_bb_index, new_bb);
> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
> -         we need to ensure that new_bb is in the same partition as
> -         test bb (you can not fall through across section boundaries).  */
> -      BB_COPY_PARTITION (new_bb, test_bb);
> +      /* This should have been done above via force_nonfallthru_and_redirect
> +         (possibly called from redirect_edge_and_branch_force).  */
> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>      }
>
>    num_true_changes++;
> Index: function.c
> ===================================================================
> --- function.c  (revision 193376)
> +++ function.c  (working copy)
> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>                     break;
>                 if (e)
>                   {
> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> -                                                 NULL_RTX, e->src);
> +                    /* Make sure we insert after any barriers.  */
> +                    rtx end = get_last_bb_insn (e->src);
> +                    copy_bb = create_basic_block (NEXT_INSN (end),
> +                                                  NULL_RTX, e->src);
>                     BB_COPY_PARTITION (copy_bb, e->src);
>                   }
>                 else
> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>           cur_bb->aux = cur_bb->next_bb;
> -      cfg_layout_finalize ();
> +      cfg_layout_finalize (false);
>      }
>
>  epilogue_done:
> @@ -6517,7 +6519,7 @@ epilogue_done:
>        basic_block simple_return_block_cold = NULL;
>        edge pending_edge_hot = NULL;
>        edge pending_edge_cold = NULL;
> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
> +      basic_block exit_pred;
>        int i;
>
>        gcc_assert (entry_edge != orig_entry_edge);
> @@ -6545,6 +6547,12 @@ epilogue_done:
>             else
>               pending_edge_cold = e;
>           }
> +
> +      /* Save a pointer to the exit's predecessor BB for use in
> +         inserting new BBs at the end of the function. Do this
> +         after the call to split_block above which may split
> +         the original exit pred.  */
> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>
>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>         {
> Index: function.h
> ===================================================================
> --- function.h  (revision 193376)
> +++ function.h  (working copy)
> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>    bool uses_only_leaf_regs;
>
> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
> +     (under flag_reorder_blocks_and_partition) and has at least one cold
> +     block.  */
> +  bool has_bb_partition;
> +
>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>       asm.  Unlike regs_ever_live, elements of this array corresponding
>       to eliminable regs (like the frame pointer) are set if an asm
> Index: hw-doloop.c
> ===================================================================
> --- hw-doloop.c (revision 193376)
> +++ hw-doloop.c (working copy)
> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>        else
>         bb->aux = NULL;
>      }
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>    clear_aux_for_blocks ();
>    df_analyze ();
>  }
> Index: cfgcleanup.c
> ===================================================================
> --- cfgcleanup.c        (revision 193376)
> +++ cfgcleanup.c        (working copy)
> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>       partition boundaries).  See the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (flag_reorder_blocks_and_partition && reload_completed)
> +  if (crtl->has_bb_partition && reload_completed)
>      return false;
>
>    /* Search backward through forwarder blocks.  We don't need to worry
> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>               df_analyze ();
>             }
>
> +         if (changed)
> +            {
> +              /* Edge forwarding in particular can cause hot blocks previously
> +                 reached by both hot and cold blocks to become dominated only
> +                 by cold blocks. This will cause the verification below to fail,
> +                 and lead to now cold code in the hot section. This is not easy
> +                 to detect and fix during edge forwarding, and in some cases
> +                 is only visible after newly unreachable blocks are deleted,
> +                 which will be done in fixup_partitions.  */
> +              fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
> -         if (changed)
> -           verify_flow_info ();
> +              verify_flow_info ();
>  #endif
> +            }
>
>           changed_overall |= changed;
>           first_pass = false;
> Index: bb-reorder.c
> ===================================================================
> --- bb-reorder.c        (revision 193376)
> +++ bb-reorder.c        (working copy)
> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>    current_partition = BB_PARTITION (traces[0].first);
>    two_passes = false;
>
> -  if (flag_reorder_blocks_and_partition)
> +  if (crtl->has_bb_partition)
>      for (i = 0; i < n_traces && !two_passes; i++)
>        if (BB_PARTITION (traces[0].first)
>           != BB_PARTITION (traces[i].first))
> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>                       }
>                   }
>
> -             if (flag_reorder_blocks_and_partition)
> +             if (crtl->has_bb_partition)
>                 try_copy = false;
>
>               /* Copy tiny blocks always; copy larger blocks only when the
> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>    return length;
>  }
>
> -/* Emit a barrier into the footer of BB.  */
> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>
> -static void
> +void
>  emit_barrier_after_bb (basic_block bb)
>  {
>    rtx barrier = emit_barrier_after (BB_END (bb));
> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>  }
>
>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>  {
>    VEC(edge, heap) *crossing_edges = NULL;
>    basic_block bb;
> -  edge e;
> -  edge_iterator ei;
> +  edge e, e2;
> +  edge_iterator ei, ei2;
> +  unsigned int cold_bb_count = 0;
> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>
>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>    FOR_EACH_BB (bb)
>      {
>        if (probably_never_executed_bb_p (cfun, bb))
> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +        {
> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +          cold_bb_count++;
> +        }
>        else
> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> +        {
> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
> +        }
>      }
>
> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
> +     several different possibilities. One is that there are edge weight insanities
> +     due to optimization phases that do not properly update basic block profile
> +     counts. The second is that the entry of the function may not be hot, because
> +     it is entered fewer times than the number of profile training runs, but there
> +     is a loop inside the function that causes blocks within the function to be
> +     above the threshold for hotness.  */
> +  if (cold_bb_count)
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +
> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      /* Keep examining hot bbs until we have either checked them all, or
> +         re-marked all cold bbs hot.  */
> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
> +             && cold_bb_count)
> +        {
> +          basic_block dom_bb;
> +
> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
> +
> +          /* If bb's immediate dominator is also hot then it is ok.  */
> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
> +            continue;
> +
> +          /* We have a hot bb with an immediate dominator that is cold.
> +             The dominator needs to be re-marked to hot.  */
> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
> +          cold_bb_count--;
> +
> +          /* Now we need to examine newly-hot dom_bb to see if it is also
> +             dominated by a cold bb.  */
> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
> +
> +          /* We should also adjust any cold blocks that the newly-hot bb
> +             feeds and see if it makes sense to re-mark those as hot as
> +             well.  */
> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
> +          while (! VEC_empty (basic_block, bbs_newly_hot))
> +            {
> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
> +              /* Examine all successors of this newly-hot bb to see if they
> +                 are cold and should be re-marked as hot.  */
> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
> +                {
> +                  bool any_cold_preds = false;
> +                  basic_block succ = e->dest;
> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
> +                    continue;
> +                  /* Does this block have any cold predecessors now?  */
> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
> +                  {
> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
> +                      {
> +                        any_cold_preds = true;
> +                        break;
> +                      }
> +                  }
> +                  if (any_cold_preds)
> +                    continue;
> +
> +                  /* Here we have a successor of newly-hot bb that is cold
> +                     but no longer has any cold predecessors. Since the original
> +                     assignment of our newly-hot bb was incorrect, this successor's
> +                     assignment as cold is also suspect. Go ahead and re-mark it
> +                     as hot now too. Better heuristics may be in order here.  */
> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
> +                  cold_bb_count--;
> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
> +                  /* Examine this successor as a newly-hot bb.  */
> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
> +                }
> +            }
> +        }
> +
> +      if (dom_calculated_here)
> +        free_dominance_info (CDI_DOMINATORS);
> +    }
> +
>    /* The format of .gcc_except_table does not allow landing pads to
>       be in a different partition as the throw.  Fix this by either
>       moving or duplicating the landing pads.  */
> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>                       new_bb->aux = cur_bb->aux;
>                       cur_bb->aux = new_bb;
>
> -                     /* Make sure new fall-through bb is in same
> -                        partition as bb it's falling through from.  */
> +                      /* This is done by force_nonfallthru_and_redirect.  */
> +                     gcc_assert (BB_PARTITION (new_bb)
> +                                  == BB_PARTITION (cur_bb));
>
> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>                     }
>                   else
> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>    FOR_EACH_BB (bb)
>      FOR_EACH_EDGE (e, ei, bb->succs)
>        if ((e->flags & EDGE_CROSSING)
> -         && JUMP_P (BB_END (e->src)))
> +         && JUMP_P (BB_END (e->src))
> +          /* Some notes were added during fix_up_fall_thru_edges, via
> +             force_nonfallthru_and_redirect.  */
> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>  }
>
> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>        dump_flow_info (dump_file, dump_flags);
>      }
>
> -  if (flag_reorder_blocks_and_partition)
> +  if (crtl->has_bb_partition)
>      verify_hot_cold_block_grouping ();
>  }
>
> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>     encountering this note will make the compiler switch between the
>     hot and cold text sections.  */
>
> -static void
> +void
>  insert_section_boundary_note (void)
>  {
>    basic_block bb;
>    rtx new_note;
>    int first_partition = 0;
>
> -  if (!flag_reorder_blocks_and_partition)
> +  if (!crtl->has_bb_partition)
>      return;
>
>    FOR_EACH_BB (bb)
> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>    FOR_EACH_BB (bb)
>      if (bb->next_bb != EXIT_BLOCK_PTR)
>        bb->aux = bb->next_bb;
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (true);
>
> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> -  insert_section_boundary_note ();
>    return 0;
>  }
>
> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>      }
>
>  done:
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>
>    BITMAP_FREE (candidates);
>    return 0;
> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>    if (crossing_edges == NULL)
>      return 0;
>
> +  crtl->has_bb_partition = true;
> +
>    /* Make sure the source of any crossing edge ends in a jump and the
>       destination of any crossing edge has a label.  */
>    add_labels_and_missing_jumps (crossing_edges);
> Index: bb-reorder.h
> ===================================================================
> --- bb-reorder.h        (revision 193376)
> +++ bb-reorder.h        (working copy)
> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>
>  extern int get_uncond_jump_length (void);
>
> +extern void insert_section_boundary_note (void);
> +
> +extern void emit_barrier_after_bb (basic_block bb);
> +
>  #endif
> Index: basic-block.h
> ===================================================================
> --- basic-block.h       (revision 193376)
> +++ basic-block.h       (working copy)
> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>  extern bool contains_no_active_insn_p (const_basic_block);
>  extern bool forwarder_block_p (const_basic_block);
>  extern bool can_fallthru (basic_block, basic_block);
> +extern void fixup_partitions (void);
>
>  /* In cfgbuild.c.  */
>  extern void find_many_sub_basic_blocks (sbitmap);
> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c    (revision 193376)
> +++ cfgrtl.c    (working copy)
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree.h"
>  #include "hard-reg-set.h"
>  #include "basic-block.h"
> +#include "bb-reorder.h"
>  #include "regs.h"
>  #include "flags.h"
>  #include "function.h"
> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>     Only applicable if the CFG is in cfglayout mode.  */
>  static GTY(()) rtx cfg_layout_function_footer;
>  static GTY(()) rtx cfg_layout_function_header;
> +static bool had_sec_boundary_notes;
>
>  static rtx skip_insns_after_block (basic_block);
>  static void record_effective_endpoints (void);
>  static rtx label_for_bb (basic_block);
> -static void fixup_reorder_chain (void);
> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>
>  void verify_insn_chain (void);
>  static void fixup_fallthru_exit_predecessor (void);
> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>       partition boundaries).  See  the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> -      || BB_PARTITION (src) != BB_PARTITION (target))
> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>      return NULL;
>
>    /* We can replace or remove a complex jump only when we have exactly
> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>    return e;
>  }
>
> +/* Called when edge E has been redirected to a new destination,
> +   in order to update the region crossing flag on the edge and
> +   jump.  */
> +
> +static void
> +fixup_partition_crossing (edge e, basic_block target)
> +{
> +  rtx note;
> +
> +  gcc_assert (e->dest == target);
> +
> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
> +    return;
> +  /* If we redirected an existing edge, it may already be marked
> +     crossing, even though the new src is missing a reg crossing note.
> +     But make sure reg crossing note doesn't already exist before
> +     inserting.  */
> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
> +    {
> +      e->flags |= EDGE_CROSSING;
> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +      if (JUMP_P (BB_END (e->src))
> +          && !note)
> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +    }
> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
> +    {
> +      e->flags &= ~EDGE_CROSSING;
> +      /* Remove the region crossing note from jump at end of
> +         e->src if it exists.  */
> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> +      if (note)
> +        remove_note (BB_END (e->src), note);
> +    }
> +}
> +
> +/* Called when block BB has been reassigned to a different partition,
> +   to ensure that the region crossing attributes are updated.  */
> +
> +static void
> +fixup_bb_partition (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Now need to make bb's pred edges non-region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      fixup_partition_crossing (e, e->dest);
> +    }
> +
> +  /* Possibly need to make bb's successor edges region crossing,
> +     or remove stale region crossing.  */
> +  FOR_EACH_EDGE (e, ei, bb->succs)
> +    {
> +      if ((e->flags & EDGE_FALLTHRU)
> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> +          && e->dest != EXIT_BLOCK_PTR)
> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
> +        force_nonfallthru (e);
> +      else
> +        fixup_partition_crossing (e, e->dest);
> +    }
> +}
> +
>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>     expense of adding new instructions or reordering basic blocks.
>
> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>  {
>    edge ret;
>    basic_block src = e->src;
> +  basic_block dest = e->dest;
>
>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>      return NULL;
>
> -  if (e->dest == target)
> +  if (dest == target)
>      return e;
>
>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>      {
>        df_set_bb_dirty (src);
> +      fixup_partition_crossing (ret, target);
>        return ret;
>      }
>
> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>      return NULL;
>
>    df_set_bb_dirty (src);
> +  fixup_partition_crossing (ret, target);
>    return ret;
>  }
>
> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        /* Make sure new block ends up in correct hot/cold section.  */
>
>        BB_COPY_PARTITION (jump_block, e->src);
> -      if (flag_reorder_blocks_and_partition
> -         && targetm_common.have_named_sections
> -         && JUMP_P (BB_END (jump_block))
> -         && !any_condjump_p (BB_END (jump_block))
> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>
>        /* Wire edge in.  */
>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>        new_edge->probability = probability;
>        new_edge->count = count;
>
> +      /* If e->src was previously region crossing, it no longer is
> +         and the reg crossing note should be removed.  */
> +      fixup_partition_crossing (new_edge, jump_block);
> +
>        /* Redirect old edge.  */
>        redirect_edge_pred (e, jump_block);
>        e->probability = REG_BR_PROB_BASE;
> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>        LABEL_NUSES (label)++;
>      }
>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> +     insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>    redirect_edge_succ_nodup (e, target);
>
>    if (abnormal_edge_flags)
>      make_edge (src, target, abnormal_edge_flags);
>
>    df_mark_solutions_dirty ();
> +  fixup_partition_crossing (e, target);
>    return new_bb;
>  }
>
> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>  static basic_block
>  rtl_split_edge (edge edge_in)
>  {
> -  basic_block bb;
> +  basic_block bb, new_bb;
>    rtx before;
>
>    /* Abnormal edges cannot be split.  */
> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>    else
>      {
>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
> -      /* ??? Why not edge_in->dest->prev_bb here?  */
> -      BB_COPY_PARTITION (bb, edge_in->dest);
> +      if (edge_in->src == ENTRY_BLOCK_PTR)
> +        BB_COPY_PARTITION (bb, edge_in->dest);
> +      else
> +        /* Put the split bb into the src partition, to avoid creating
> +           a situation where a cold bb dominates a hot bb, in the case
> +           where src is cold and dest is hot. The src will dominate
> +           the new bb (whereas it might not have dominated dest).  */
> +        BB_COPY_PARTITION (bb, edge_in->src);
>      }
>
>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>
> +  /* Can't allow a region crossing edge to be fallthrough.  */
> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
> +      && edge_in->dest != EXIT_BLOCK_PTR)
> +    {
> +      new_bb = force_nonfallthru (single_succ_edge (bb));
> +      gcc_assert (!new_bb);
> +    }
> +
>    /* For non-fallthru edges, we must adjust the predecessor's
>       jump instruction to target our new block.  */
>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>    else
>      {
>        bb = split_edge (e);
> -      after = BB_END (bb);
>
> -      if (flag_reorder_blocks_and_partition
> -         && targetm_common.have_named_sections
> -         && e->src != ENTRY_BLOCK_PTR
> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
> -         && !(e->flags & EDGE_CROSSING)
> -         && JUMP_P (after)
> -         && !any_condjump_p (after)
> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
> +      /* If e crossed a partition boundary, we needed to make bb end in
> +         a region-crossing jump, even though it was originally fallthru.  */
> +      if (JUMP_P (BB_END (bb)))
> +       before = BB_END (bb);
> +      else
> +        after = BB_END (bb);
>      }
>
>    /* Now that we've found the spot, do the insertion.  */
> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>  {
>    basic_block bb;
>
> +  /* Optimization passes that invoke this routine can cause hot blocks
> +     previously reached by both hot and cold blocks to become dominated only
> +     by cold blocks. This will cause the verification below to fail,
> +     and lead to now cold code in the hot section. In some cases this
> +     may only be visible after newly unreachable blocks are deleted,
> +     which will be done by fixup_partitions.  */
> +  fixup_partitions ();
> +
>  #ifdef ENABLE_CHECKING
>    verify_flow_info ();
>  #endif
> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>
>    return end;
>  }
> -
> +
> +/* Perform cleanup on the hot/cold bb partitioning after optimization
> +   passes that modify the cfg.  */
> +
> +void
> +fixup_partitions (void)
> +{
> +  basic_block bb;
> +
> +  if (!crtl->has_bb_partition)
> +    return;
> +
> +  /* Delete any blocks that became unreachable and weren't
> +     already cleaned up, for example during edge forwarding
> +     and convert_jumps_to_returns. This will expose more
> +     opportunities for fixing the partition boundaries here.
> +     Also, the calculation of the dominance graph during verification
> +     will assert if there are unreachable nodes.  */
> +  delete_unreachable_blocks ();
> +
> +  /* If there are partitions, do a sanity check on them: A basic block in
> +     a cold partition cannot dominate a basic block in a hot partition.
> +     Fixup any that now violate this requirement, as a result of edge
> +     forwarding and unreachable block deletion.  */
> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
> +  FOR_EACH_BB (bb)
> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +      basic_block son;
> +
> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> +        {
> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> +          /* If bb is not yet cold (because it was added below as
> +             a block dominated by a cold bb) then mark it cold here.  */
> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> +            {
> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
> +            }
> +          /* Any blocks dominated by a block in the cold section
> +             must also be cold.  */
> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> +               son;
> +               son = next_dom_son (CDI_DOMINATORS, son))
> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> +        }
> +
> +      if (dom_calculated_here)
> +        free_dominance_info (CDI_DOMINATORS);
> +    }
> +
> +  /* Do the partition fixup after all necessary blocks have been converted to
> +     cold, so that we only update the region crossings the minimum number of
> +     places, which can require forcing edges to be non fallthru.  */
> +  while (! VEC_empty (basic_block, bbs_to_fix))
> +    {
> +      bb = VEC_pop (basic_block, bbs_to_fix);
> +      fixup_bb_partition (bb);
> +    }
> +}
> +
>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>     cfglayout RTL.
>
> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>    rtx x;
>    int err = 0;
>    basic_block bb;
> +  bool have_partitions = false;
>
>    /* Check the general integrity of the basic blocks.  */
>    FOR_EACH_BB_REVERSE (bb)
> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>
>           if (e->flags & EDGE_ABNORMAL)
>             n_abnormal++;
> +
> +          have_partitions |= is_crossing;
>         }
>
>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>           }
>      }
>
> +  /* If there are partitions, do a sanity check on them: A basic block in
> +     a cold partition cannot dominate a basic block in a hot partition.  */
> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> +  if (have_partitions && !err)
> +    FOR_EACH_BB (bb)
> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +      basic_block son;
> +
> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> +        {
> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> +            {
> +              error ("non-cold basic block %d dominated "
> +                     "by a block in the cold partition", bb->index);
> +              err = 1;
> +            }
> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> +               son;
> +               son = next_dom_son (CDI_DOMINATORS, son))
> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> +        }
> +
> +      if (dom_calculated_here)
> +        free_dominance_info (CDI_DOMINATORS);
> +    }
> +
>    /* Clean up.  */
>    return err;
>  }
> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>    else
>      cfg_layout_function_header = NULL_RTX;
>
> +  had_sec_boundary_notes = false;
> +
>    next_insn = get_insns ();
>    FOR_EACH_BB (bb)
>      {
>        rtx end;
>
>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
> -                                             PREV_INSN (BB_HEAD (bb)));
> +        {
> +          /* Rather than try to keep section boundary notes incrementally
> +             up-to-date through cfg layout optimizations, simply remove them
> +             and flag that they should be re-inserted when exiting
> +             cfg layout mode.  */
> +          rtx check_insn = next_insn;
> +          while (check_insn)
> +            {
> +              if (NOTE_P (check_insn)
> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
> +              {
> +                had_sec_boundary_notes |= true;
> +                /* Remove note from chain. Grab new next_insn first.  */
> +                if (next_insn == check_insn)
> +                  next_insn = NEXT_INSN (check_insn);
> +                /* Delete note.  */
> +                delete_insn (check_insn);
> +                /* There will only be one.  */
> +                break;
> +              }
> +              check_insn = NEXT_INSN (check_insn);
> +            }
> +          /* If we still have header instructions left after above loop.  */
> +          if (next_insn != BB_HEAD (bb))
> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
> +                                                PREV_INSN (BB_HEAD (bb)));
> +        }
>        end = skip_insns_after_block (bb);
>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>      if (bb->next_bb != EXIT_BLOCK_PTR)
>        bb->aux = bb->next_bb;
>
> -  cfg_layout_finalize ();
> +  cfg_layout_finalize (false);
>
>    return 0;
>  }
> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>  }
>
>
> -/* Given a reorder chain, rearrange the code to match.  */
> +/* Given a reorder chain, rearrange the code to match. If
> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
> +   section boundary notes were removed on entry to cfg layout
> +   mode, insert section boundary notes here.  */
>
>  static void
> -fixup_reorder_chain (void)
> +fixup_reorder_chain (bool finalize_reorder_blocks)
>  {
>    basic_block bb;
>    rtx insn = NULL;
> @@ -3150,7 +3373,7 @@ static void
>           PREV_INSN (BB_HEADER (bb)) = insn;
>           insn = BB_HEADER (bb);
>           while (NEXT_INSN (insn))
> -           insn = NEXT_INSN (insn);
> +            insn = NEXT_INSN (insn);
>         }
>        if (insn)
>         NEXT_INSN (insn) = BB_HEAD (bb);
> @@ -3175,6 +3398,11 @@ static void
>      insn = NEXT_INSN (insn);
>
>    set_last_insn (insn);
> +
> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
> +    insert_section_boundary_note ();
> +
>  #ifdef ENABLE_CHECKING
>    verify_insn_chain ();
>  #endif
> @@ -3187,7 +3415,7 @@ static void
>        edge e_fall, e_taken, e;
>        rtx bb_end_insn;
>        rtx ret_label = NULL_RTX;
> -      basic_block nb, src_bb;
> +      basic_block nb;
>        edge_iterator ei;
>
>        if (EDGE_COUNT (bb->succs) == 0)
> @@ -3322,7 +3550,6 @@ static void
>        /* We got here if we need to add a new jump insn.
>          Note force_nonfallthru can delete E_FALL and thus we have to
>          save E_FALL->src prior to the call to force_nonfallthru.  */
> -      src_bb = e_fall->src;
>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>        if (nb)
>         {
> @@ -3330,17 +3557,6 @@ static void
>           bb->aux = nb;
>           /* Don't process this new block.  */
>           bb = nb;
> -
> -         /* Make sure new bb is tagged for correct section (same as
> -            fall-thru source, since you cannot fall-thru across
> -            section boundaries).  */
> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
> -         if (flag_reorder_blocks_and_partition
> -             && targetm_common.have_named_sections
> -             && JUMP_P (BB_END (bb))
> -             && !any_condjump_p (BB_END (bb))
> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>         }
>      }
>
> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>             case NOTE_INSN_FUNCTION_BEG:
>               /* There is always just single entry to function.  */
>             case NOTE_INSN_BASIC_BLOCK:
> +              /* We should only switch text sections once.  */
> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>               break;
>
>             case NOTE_INSN_EPILOGUE_BEG:
> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>               emit_note_copy (insn);
>               break;
>
> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>  }
>
>  /* Finalize the changes: reorder insn list according to the sequence specified
> -   by aux pointers, enter compensation code, rebuild scope forest.  */
> +   by aux pointers, enter compensation code, rebuild scope forest. If
> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> +   to fixup_reorder_chain so that it can insert the proper switch text
> +   section notes.  */
>
>  void
> -cfg_layout_finalize (void)
> +cfg_layout_finalize (bool finalize_reorder_blocks)
>  {
>  #ifdef ENABLE_CHECKING
>    verify_flow_info ();
> @@ -3775,7 +3995,7 @@ void
>  #endif
>        )
>      fixup_fallthru_exit_predecessor ();
> -  fixup_reorder_chain ();
> +  fixup_reorder_chain (finalize_reorder_blocks);
>
>    rebuild_jump_labels (get_insns ());
>    delete_dead_jumptables ();
> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>      return false;
>
> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> -      || BB_PARTITION (src) != BB_PARTITION (target))
> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>      return false;
>
>    if (!onlyjump_p (insn)
>
> --
> This patch is available for review at http://codereview.appspot.com/6823047
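
The propagation step in fixup_partitions above can be illustrated outside of
GCC. The following is only a sketch under simplifying assumptions: `struct
toy_cfg`, its `dom_parent` array, and `propagate_cold` are hypothetical
stand-ins, not GCC's dominance API (first_dom_son/next_dom_son) or its VEC
worklists. It shows the same idea: every block dominated by a cold block is
itself marked cold. Unlike the patch, it marks a block when it is pushed
rather than when it is popped.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 64

/* Toy dominator tree: dom_parent[i] is the immediate dominator of block i,
   or -1 for the entry block.  Hypothetical layout for illustration only.  */
struct toy_cfg
{
  int n_blocks;
  int dom_parent[MAX_BLOCKS];
  bool cold[MAX_BLOCKS];
};

/* Mark every block dominated by a cold block as cold.  Return the number
   of blocks that were newly converted to cold.  */
int
propagate_cold (struct toy_cfg *cfg)
{
  int stack[MAX_BLOCKS];
  int top = 0;
  int converted = 0;

  assert (cfg->n_blocks >= 0 && cfg->n_blocks <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < cfg->n_blocks; i++)
    if (cfg->cold[i])
      stack[top++] = i;

  /* Each block is pushed at most once, so the stack cannot overflow.  */
  while (top > 0)
    {
      int bb = stack[--top];

      /* Visit the dominator-tree sons of BB (linear scan for simplicity).  */
      for (int son = 0; son < cfg->n_blocks; son++)
        if (cfg->dom_parent[son] == bb && !cfg->cold[son])
          {
            cfg->cold[son] = true;
            converted++;
            stack[top++] = son;
          }
    }

  return converted;
}
```

In the patch itself the newly converted blocks are collected in bbs_to_fix and
their edges are repaired afterwards, so that region crossings are updated only
once all partition changes are known.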



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-26 15:55 ` Teresa Johnson
@ 2012-11-26 16:25   ` Christophe Lyon
  2012-11-26 20:20     ` Teresa Johnson
  0 siblings, 1 reply; 35+ messages in thread
From: Christophe Lyon @ 2012-11-26 16:25 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: reply, David Li, Steven Bosscher, Matthew Gretton-Dann, gcc-patches

Hi,

I tested your patch with SPEC2000 on ARM, and I still see several
failures caused by:
"error: fallthru edge crosses section boundary", including the case
described in PR55121.
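
For anyone following along, the invariant behind that error can be sketched
in a few lines of C. This is only an illustration under simplified
assumptions: `struct block`, `struct edge_s`, and `count_crossing_fallthru`
are hypothetical stand-ins, not GCC's basic_block/edge types or its actual
verifier code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for GCC's partition and edge flags; these are not
   the real definitions from basic-block.h.  */
enum partition { PART_HOT, PART_COLD };

struct block
{
  int index;
  enum partition part;
};

struct edge_s
{
  const struct block *src;
  const struct block *dest;
  bool fallthru;   /* Control reaches DEST by falling off the end of SRC.  */
};

/* Return the number of fallthru edges whose endpoints are in different
   partitions.  Such an edge cannot work, because the two blocks are placed
   in different text sections and are not adjacent in memory.  */
size_t
count_crossing_fallthru (const struct edge_s *edges, size_t n)
{
  size_t bad = 0;

  for (size_t i = 0; i < n; i++)
    {
      assert (edges[i].src != NULL && edges[i].dest != NULL);
      if (edges[i].fallthru && edges[i].src->part != edges[i].dest->part)
        bad++;
    }

  return bad;
}
```

A non-zero result corresponds to the situation reported by the error; the
usual remedy, as in the patch, is to turn the fallthru edge into an explicit
jump.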

On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Teresa
>
> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Revised patch that fixes failures encountered when enabling
>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>
>> This includes new verification code to ensure no cold blocks dominate hot
>> blocks contributed by Steven Bosscher.
>>
>> I attempted to make the handling of partition updates through the optimization
>> passes much more consistent, removing a number of partial fixes in the code
>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>> assignment, region crossing jump notes, and switch text section notes) is
>> now handled in a few centralized locations. For example, inside
>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>> don't need to attempt the fixup themselves.
>>
>> For optimization passes that make adjustments to the cfg while in cfg layout
>> mode that are not easy to fix up incrementally, the new routine
>> fixup_partitions handles the cleanup globally. This does require calculation
>> of the dominance relation, however, as far as I can tell the routines which
>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>> are invoked typically once (or a small number of times in the case of
>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>> -ftime-report output for some large fdo compilations and saw only minimal
>> increases in the dominance computation times, which were only a tiny percent
>> of the overall compile time.
>>
>> Additionally, I added a flag to the rtl_data structure to indicate whether
>> any partitioning was actually performed, so that optimizations which were
>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>> conservative for functions where no partitions were formed (e.g. they are
>> completely hot).
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>> benchmarks and internal google benchmarks using profile feedback and
>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>             Steven Bosscher  <steven@gcc.gnu.org>
>>
>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>         parameter.
>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>         as this is now done by redirect_edge_and_branch_force.
>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>         predecessor BB until after it is potentially split.
>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>         any blocks in function actually partitioned.
>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>         up partitioning.
>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>         block copying if any blocks in function actually partitioned.
>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>         that no cold blocks dominate a hot block.
>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>         as this is now done by force_nonfallthru_and_redirect.
>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>         already be marked with region crossing note.
>>         (reorder_basic_blocks): Only need to verify partitions if any
>>         blocks in function actually partitioned.
>>         (insert_section_boundary_note): Only need to insert note if any
>>         blocks in function actually partitioned.
>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>         parameter, and remove call to insert_section_boundary_note as this
>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>         parameter.
>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>         has bb partitions.
>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>         emit_barrier_after_bb, which are no longer static.
>>         * basic-block.h: Declare new function fixup_partitions.
>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>         check for region crossing note.
>>         (fixup_partition_crossing): New function.
>>         (fixup_bb_partition): Ditto.
>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>         remove old code that tried to do this. Emit barrier correctly
>>         when we are in cfglayout mode.
>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>         (commit_one_edge_insertion): Remove old code that tried to
>>         fixup region crossing edge since this is now handled in
>>         split_block, and set up insertion point correctly since
>>         block may now end in a jump.
>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>         boundaries after optimizations that modify cfg and before trying to
>>         verify the flow info.
>>         (fixup_partitions): New function.
>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>         hot bbs.
>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>         Remove old code that attempted to fixup region crossing note as
>>         this is now handled in force_nonfallthru_and_redirect.
>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>         note.
>>
>> Index: cfghooks.h
>> ===================================================================
>> --- cfghooks.h  (revision 193376)
>> +++ cfghooks.h  (working copy)
>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>  void account_profile_record (struct profile_record *, int);
>>
>>  extern void cfg_layout_initialize (unsigned int);
>> -extern void cfg_layout_finalize (void);
>> +extern void cfg_layout_finalize (bool);
>>
>>  /* Hooks containers.  */
>>  extern struct cfg_hooks gimple_cfg_hooks;
>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>  extern void gimple_register_cfg_hooks (void);
>>  extern struct cfg_hooks get_cfg_hooks (void);
>>  extern void set_cfg_hooks (struct cfg_hooks);
>> -
>> Index: modulo-sched.c
>> ===================================================================
>> --- modulo-sched.c      (revision 193376)
>> +++ modulo-sched.c      (working copy)
>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>        bb->aux = bb->next_bb;
>>    free_dominance_info (CDI_DOMINATORS);
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>  #endif /* INSN_SCHEDULING */
>>    return 0;
>>  }
>> Index: ifcvt.c
>> ===================================================================
>> --- ifcvt.c     (revision 193376)
>> +++ ifcvt.c     (working copy)
>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>    if (new_bb)
>>      {
>>        df_bb_replace (then_bb_index, new_bb);
>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>> -         we need to ensure that new_bb is in the same partition as
>> -         test bb (you can not fall through across section boundaries).  */
>> -      BB_COPY_PARTITION (new_bb, test_bb);
>> +      /* This should have been done above via force_nonfallthru_and_redirect
>> +         (possibly called from redirect_edge_and_branch_force).  */
>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>      }
>>
>>    num_true_changes++;
>> Index: function.c
>> ===================================================================
>> --- function.c  (revision 193376)
>> +++ function.c  (working copy)
>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>                     break;
>>                 if (e)
>>                   {
>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> -                                                 NULL_RTX, e->src);
>> +                    /* Make sure we insert after any barriers.  */
>> +                    rtx end = get_last_bb_insn (e->src);
>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>> +                                                  NULL_RTX, e->src);
>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>                   }
>>                 else
>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>           cur_bb->aux = cur_bb->next_bb;
>> -      cfg_layout_finalize ();
>> +      cfg_layout_finalize (false);
>>      }
>>
>>  epilogue_done:
>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>        basic_block simple_return_block_cold = NULL;
>>        edge pending_edge_hot = NULL;
>>        edge pending_edge_cold = NULL;
>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> +      basic_block exit_pred;
>>        int i;
>>
>>        gcc_assert (entry_edge != orig_entry_edge);
>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>             else
>>               pending_edge_cold = e;
>>           }
>> +
>> +      /* Save a pointer to the exit's predecessor BB for use in
>> +         inserting new BBs at the end of the function. Do this
>> +         after the call to split_block above which may split
>> +         the original exit pred.  */
>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>
>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>         {
>> Index: function.h
>> ===================================================================
>> --- function.h  (revision 193376)
>> +++ function.h  (working copy)
>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>    bool uses_only_leaf_regs;
>>
>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>> +     block.  */
>> +  bool has_bb_partition;
>> +
>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>       to eliminable regs (like the frame pointer) are set if an asm
>> Index: hw-doloop.c
>> ===================================================================
>> --- hw-doloop.c (revision 193376)
>> +++ hw-doloop.c (working copy)
>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>        else
>>         bb->aux = NULL;
>>      }
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>    clear_aux_for_blocks ();
>>    df_analyze ();
>>  }
>> Index: cfgcleanup.c
>> ===================================================================
>> --- cfgcleanup.c        (revision 193376)
>> +++ cfgcleanup.c        (working copy)
>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>       partition boundaries).  See the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>> +  if (crtl->has_bb_partition && reload_completed)
>>      return false;
>>
>>    /* Search backward through forwarder blocks.  We don't need to worry
>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>               df_analyze ();
>>             }
>>
>> +         if (changed)
>> +            {
>> +              /* Edge forwarding in particular can cause hot blocks previously
>> +                 reached by both hot and cold blocks to become dominated only
>> +                 by cold blocks. This will cause the verification below to fail,
>> +                 and lead to now cold code in the hot section. This is not easy
>> +                 to detect and fix during edge forwarding, and in some cases
>> +                 is only visible after newly unreachable blocks are deleted,
>> +                 which will be done in fixup_partitions.  */
>> +              fixup_partitions ();
>> +
>>  #ifdef ENABLE_CHECKING
>> -         if (changed)
>> -           verify_flow_info ();
>> +              verify_flow_info ();
>>  #endif
>> +            }
>>
>>           changed_overall |= changed;
>>           first_pass = false;
>> Index: bb-reorder.c
>> ===================================================================
>> --- bb-reorder.c        (revision 193376)
>> +++ bb-reorder.c        (working copy)
>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>    current_partition = BB_PARTITION (traces[0].first);
>>    two_passes = false;
>>
>> -  if (flag_reorder_blocks_and_partition)
>> +  if (crtl->has_bb_partition)
>>      for (i = 0; i < n_traces && !two_passes; i++)
>>        if (BB_PARTITION (traces[0].first)
>>           != BB_PARTITION (traces[i].first))
>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>                       }
>>                   }
>>
>> -             if (flag_reorder_blocks_and_partition)
>> +             if (crtl->has_bb_partition)
>>                 try_copy = false;
>>
>>               /* Copy tiny blocks always; copy larger blocks only when the
>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>    return length;
>>  }
>>
>> -/* Emit a barrier into the footer of BB.  */
>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>
>> -static void
>> +void
>>  emit_barrier_after_bb (basic_block bb)
>>  {
>>    rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>  }
>>
>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>  {
>>    VEC(edge, heap) *crossing_edges = NULL;
>>    basic_block bb;
>> -  edge e;
>> -  edge_iterator ei;
>> +  edge e, e2;
>> +  edge_iterator ei, ei2;
>> +  unsigned int cold_bb_count = 0;
>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>
>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>    FOR_EACH_BB (bb)
>>      {
>>        if (probably_never_executed_bb_p (cfun, bb))
>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> +        {
>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> +          cold_bb_count++;
>> +        }
>>        else
>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> +        {
>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>> +        }
>>      }
>>
>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>> +     several different possibilities. One is that there are edge weight insanities
>> +     due to optimization phases that do not properly update basic block profile
>> +     counts. The second is that the entry of the function may not be hot, because
>> +     it is entered fewer times than the number of profile training runs, but there
>> +     is a loop inside the function that causes blocks within the function to be
>> +     above the threshold for hotness.  */
>> +  if (cold_bb_count)
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      /* Keep examining hot bbs until we have either checked them all, or
>> +         re-marked all cold bbs hot.  */
>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>> +             && cold_bb_count)
>> +        {
>> +          basic_block dom_bb;
>> +
>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> +
>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> +            continue;
>> +
>> +          /* We have a hot bb with an immediate dominator that is cold.
>> +             The dominator needs to be re-marked to hot.  */
>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>> +          cold_bb_count--;
>> +
>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>> +             dominated by a cold bb.  */
>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>> +
>> +          /* We should also adjust any cold blocks that the newly-hot bb
>> +             feeds and see if it makes sense to re-mark those as hot as
>> +             well.  */
>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>> +            {
>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>> +              /* Examine all successors of this newly-hot bb to see if they
>> +                 are cold and should be re-marked as hot.  */
>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>> +                {
>> +                  bool any_cold_preds = false;
>> +                  basic_block succ = e->dest;
>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>> +                    continue;
>> +                  /* Does this block have any cold predecessors now?  */
>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>> +                  {
>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>> +                      {
>> +                        any_cold_preds = true;
>> +                        break;
>> +                      }
>> +                  }
>> +                  if (any_cold_preds)
>> +                    continue;
>> +
>> +                  /* Here we have a successor of newly-hot bb that is cold
>> +                     but no longer has any cold predecessors. Since the original
>> +                     assignment of our newly-hot bb was incorrect, this successor's
>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>> +                     as hot now too. Better heuristics may be in order here.  */
>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>> +                  cold_bb_count--;
>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>> +                  /* Examine this successor as a newly-hot bb.  */
>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>> +                }
>> +            }
>> +        }
>> +
>> +      if (dom_calculated_here)
>> +        free_dominance_info (CDI_DOMINATORS);
>> +    }
>> +
>>    /* The format of .gcc_except_table does not allow landing pads to
>>       be in a different partition as the throw.  Fix this by either
>>       moving or duplicating the landing pads.  */
>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>                       new_bb->aux = cur_bb->aux;
>>                       cur_bb->aux = new_bb;
>>
>> -                     /* Make sure new fall-through bb is in same
>> -                        partition as bb it's falling through from.  */
>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>> +                     gcc_assert (BB_PARTITION (new_bb)
>> +                                  == BB_PARTITION (cur_bb));
>>
>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>                     }
>>                   else
>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>    FOR_EACH_BB (bb)
>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>        if ((e->flags & EDGE_CROSSING)
>> -         && JUMP_P (BB_END (e->src)))
>> +         && JUMP_P (BB_END (e->src))
>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>> +             force_nonfallthru_and_redirect.  */
>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>  }
>>
>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>        dump_flow_info (dump_file, dump_flags);
>>      }
>>
>> -  if (flag_reorder_blocks_and_partition)
>> +  if (crtl->has_bb_partition)
>>      verify_hot_cold_block_grouping ();
>>  }
>>
>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>     encountering this note will make the compiler switch between the
>>     hot and cold text sections.  */
>>
>> -static void
>> +void
>>  insert_section_boundary_note (void)
>>  {
>>    basic_block bb;
>>    rtx new_note;
>>    int first_partition = 0;
>>
>> -  if (!flag_reorder_blocks_and_partition)
>> +  if (!crtl->has_bb_partition)
>>      return;
>>
>>    FOR_EACH_BB (bb)
>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>    FOR_EACH_BB (bb)
>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>        bb->aux = bb->next_bb;
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (true);
>>
>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> -  insert_section_boundary_note ();
>>    return 0;
>>  }
>>
>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>      }
>>
>>  done:
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>
>>    BITMAP_FREE (candidates);
>>    return 0;
>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>    if (crossing_edges == NULL)
>>      return 0;
>>
>> +  crtl->has_bb_partition = true;
>> +
>>    /* Make sure the source of any crossing edge ends in a jump and the
>>       destination of any crossing edge has a label.  */
>>    add_labels_and_missing_jumps (crossing_edges);
>> Index: bb-reorder.h
>> ===================================================================
>> --- bb-reorder.h        (revision 193376)
>> +++ bb-reorder.h        (working copy)
>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>
>>  extern int get_uncond_jump_length (void);
>>
>> +extern void insert_section_boundary_note (void);
>> +
>> +extern void emit_barrier_after_bb (basic_block bb);
>> +
>>  #endif
>> Index: basic-block.h
>> ===================================================================
>> --- basic-block.h       (revision 193376)
>> +++ basic-block.h       (working copy)
>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>  extern bool contains_no_active_insn_p (const_basic_block);
>>  extern bool forwarder_block_p (const_basic_block);
>>  extern bool can_fallthru (basic_block, basic_block);
>> +extern void fixup_partitions (void);
>>
>>  /* In cfgbuild.c.  */
>>  extern void find_many_sub_basic_blocks (sbitmap);
>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c    (revision 193376)
>> +++ cfgrtl.c    (working copy)
>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>>  #include "regs.h"
>>  #include "flags.h"
>>  #include "function.h"
>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>     Only applicable if the CFG is in cfglayout mode.  */
>>  static GTY(()) rtx cfg_layout_function_footer;
>>  static GTY(()) rtx cfg_layout_function_header;
>> +static bool had_sec_boundary_notes;
>>
>>  static rtx skip_insns_after_block (basic_block);
>>  static void record_effective_endpoints (void);
>>  static rtx label_for_bb (basic_block);
>> -static void fixup_reorder_chain (void);
>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>
>>  void verify_insn_chain (void);
>>  static void fixup_fallthru_exit_predecessor (void);
>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>       partition boundaries).  See  the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>      return NULL;
>>
>>    /* We can replace or remove a complex jump only when we have exactly
>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>    return e;
>>  }
>>
>> +/* Called when edge E has been redirected to a new destination,
>> +   in order to update the region crossing flag on the edge and
>> +   jump.  */
>> +
>> +static void
>> +fixup_partition_crossing (edge e, basic_block target)
>> +{
>> +  rtx note;
>> +
>> +  gcc_assert (e->dest == target);
>> +
>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>> +    return;
>> +  /* If we redirected an existing edge, it may already be marked
>> +     crossing, even though the new src is missing a reg crossing note.
>> +     But make sure reg crossing note doesn't already exist before
>> +     inserting.  */
>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>> +    {
>> +      e->flags |= EDGE_CROSSING;
>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +      if (JUMP_P (BB_END (e->src))
>> +          && !note)
>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +    }
>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>> +    {
>> +      e->flags &= ~EDGE_CROSSING;
>> +      /* Remove the region crossing note from jump at end of
>> +         e->src if it exists.  */
>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> +      if (note)
>> +        remove_note (BB_END (e->src), note);
>> +    }
>> +}
>> +
>> +/* Called when block BB has been reassigned to a different partition,
>> +   to ensure that the region crossing attributes are updated.  */
>> +
>> +static void
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Update the region crossing state of bb's pred edges.  */
>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> +    {
>> +      fixup_partition_crossing (e, e->dest);
>> +    }
>> +
>> +  /* Possibly need to make bb's successor edges region crossing,
>> +     or remove stale region crossing.  */
>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> +    {
>> +      if ((e->flags & EDGE_FALLTHRU)
>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> +          && e->dest != EXIT_BLOCK_PTR)
>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>> +        force_nonfallthru (e);
>> +      else
>> +        fixup_partition_crossing (e, e->dest);
>> +    }
>> +}
>> +
>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>     expense of adding new instructions or reordering basic blocks.
>>
>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>  {
>>    edge ret;
>>    basic_block src = e->src;
>> +  basic_block dest = e->dest;
>>
>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>      return NULL;
>>
>> -  if (e->dest == target)
>> +  if (dest == target)
>>      return e;
>>
>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>      {
>>        df_set_bb_dirty (src);
>> +      fixup_partition_crossing (ret, target);
>>        return ret;
>>      }
>>
>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>      return NULL;
>>
>>    df_set_bb_dirty (src);
>> +  fixup_partition_crossing (ret, target);
>>    return ret;
>>  }
>>
>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        /* Make sure new block ends up in correct hot/cold section.  */
>>
>>        BB_COPY_PARTITION (jump_block, e->src);
>> -      if (flag_reorder_blocks_and_partition
>> -         && targetm_common.have_named_sections
>> -         && JUMP_P (BB_END (jump_block))
>> -         && !any_condjump_p (BB_END (jump_block))
>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>
>>        /* Wire edge in.  */
>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>        new_edge->probability = probability;
>>        new_edge->count = count;
>>
>> +      /* If e->src was previously region crossing, it no longer is
>> +         and the reg crossing note should be removed.  */
>> +      fixup_partition_crossing (new_edge, jump_block);
>> +
>>        /* Redirect old edge.  */
>>        redirect_edge_pred (e, jump_block);
>>        e->probability = REG_BR_PROB_BASE;
>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>        LABEL_NUSES (label)++;
>>      }
>>
>> -  emit_barrier_after (BB_END (jump_block));
>> +  /* We might be in cfg layout mode, and if so, the following routine will
>> +     insert the barrier correctly.  */
>> +  emit_barrier_after_bb (jump_block);
>>    redirect_edge_succ_nodup (e, target);
>>
>>    if (abnormal_edge_flags)
>>      make_edge (src, target, abnormal_edge_flags);
>>
>>    df_mark_solutions_dirty ();
>> +  fixup_partition_crossing (e, target);
>>    return new_bb;
>>  }
>>
>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>  static basic_block
>>  rtl_split_edge (edge edge_in)
>>  {
>> -  basic_block bb;
>> +  basic_block bb, new_bb;
>>    rtx before;
>>
>>    /* Abnormal edges cannot be split.  */
>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>    else
>>      {
>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>> +      else
>> +        /* Put the split bb into the src partition, to avoid creating
>> +           a situation where a cold bb dominates a hot bb, in the case
>> +           where src is cold and dest is hot. The src will dominate
>> +           the new bb (whereas it might not have dominated dest).  */
>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>      }
>>
>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>
>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>> +    {
>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>> +      gcc_assert (!new_bb);
>> +    }
>> +
>>    /* For non-fallthru edges, we must adjust the predecessor's
>>       jump instruction to target our new block.  */
>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>    else
>>      {
>>        bb = split_edge (e);
>> -      after = BB_END (bb);
>>
>> -      if (flag_reorder_blocks_and_partition
>> -         && targetm_common.have_named_sections
>> -         && e->src != ENTRY_BLOCK_PTR
>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>> -         && !(e->flags & EDGE_CROSSING)
>> -         && JUMP_P (after)
>> -         && !any_condjump_p (after)
>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>> +      /* If e crossed a partition boundary, we needed to make bb end in
>> +         a region-crossing jump, even though it was originally fallthru.  */
>> +      if (JUMP_P (BB_END (bb)))
>> +       before = BB_END (bb);
>> +      else
>> +        after = BB_END (bb);
>>      }
>>
>>    /* Now that we've found the spot, do the insertion.  */
>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>  {
>>    basic_block bb;
>>
>> +  /* Optimization passes that invoke this routine can cause hot blocks
>> +     previously reached by both hot and cold blocks to become dominated only
>> +     by cold blocks. This will cause the verification below to fail,
>> +     and lead to now cold code in the hot section. In some cases this
>> +     may only be visible after newly unreachable blocks are deleted,
>> +     which will be done by fixup_partitions.  */
>> +  fixup_partitions ();
>> +
>>  #ifdef ENABLE_CHECKING
>>    verify_flow_info ();
>>  #endif
>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>
>>    return end;
>>  }
>> -
>> +
>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>> +   passes that modify the cfg.  */
>> +
>> +void
>> +fixup_partitions (void)
>> +{
>> +  basic_block bb;
>> +
>> +  if (!crtl->has_bb_partition)
>> +    return;
>> +
>> +  /* Delete any blocks that became unreachable and weren't
>> +     already cleaned up, for example during edge forwarding
>> +     and convert_jumps_to_returns. This will expose more
>> +     opportunities for fixing the partition boundaries here.
>> +     Also, the calculation of the dominance graph during verification
>> +     will assert if there are unreachable nodes.  */
>> +  delete_unreachable_blocks ();
>> +
>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> +     a cold partition cannot dominate a basic block in a hot partition.
>> +     Fixup any that now violate this requirement, as a result of edge
>> +     forwarding and unreachable block deletion.  */
>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>> +  FOR_EACH_BB (bb)
>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +      basic_block son;
>> +
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +        {
>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> +          /* If bb is not yet cold (because it was added below as
>> +             a block dominated by a cold bb) then mark it cold here.  */
>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> +            {
>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>> +            }
>> +          /* Any blocks dominated by a block in the cold section
>> +             must also be cold.  */
>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> +               son;
>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> +        }
>> +
>> +      if (dom_calculated_here)
>> +        free_dominance_info (CDI_DOMINATORS);
>> +    }
>> +
>> +  /* Do the partition fixup after all necessary blocks have been converted to
>> +     cold, so that we update the region crossings in the minimum number of
>> +     places, which can require forcing edges to be non-fallthru.  */
>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>> +    {
>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>> +      fixup_bb_partition (bb);
>> +    }
>> +}
>> +
>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>     cfglayout RTL.
>>
>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>    rtx x;
>>    int err = 0;
>>    basic_block bb;
>> +  bool have_partitions = false;
>>
>>    /* Check the general integrity of the basic blocks.  */
>>    FOR_EACH_BB_REVERSE (bb)
>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>
>>           if (e->flags & EDGE_ABNORMAL)
>>             n_abnormal++;
>> +
>> +          have_partitions |= is_crossing;
>>         }
>>
>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>           }
>>      }
>>
>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> +  if (have_partitions && !err)
>> +    FOR_EACH_BB (bb)
>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +      basic_block son;
>> +
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> +        {
>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> +            {
>> +              error ("non-cold basic block %d dominated "
>> +                     "by a block in the cold partition", bb->index);
>> +              err = 1;
>> +            }
>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> +               son;
>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> +        }
>> +
>> +      if (dom_calculated_here)
>> +        free_dominance_info (CDI_DOMINATORS);
>> +    }
>> +
>>    /* Clean up.  */
>>    return err;
>>  }
>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>    else
>>      cfg_layout_function_header = NULL_RTX;
>>
>> +  had_sec_boundary_notes = false;
>> +
>>    next_insn = get_insns ();
>>    FOR_EACH_BB (bb)
>>      {
>>        rtx end;
>>
>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> -                                             PREV_INSN (BB_HEAD (bb)));
>> +        {
>> +          /* Rather than try to keep section boundary notes incrementally
>> +             up-to-date through cfg layout optimizations, simply remove them
>> +             and flag that they should be re-inserted when exiting
>> +             cfg layout mode.  */
>> +          rtx check_insn = next_insn;
>> +          while (check_insn)
>> +            {
>> +              if (NOTE_P (check_insn)
>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>> +              {
>> +                had_sec_boundary_notes |= true;
>> +                /* Remove note from chain. Grab new next_insn first.  */
>> +                if (next_insn == check_insn)
>> +                  next_insn = NEXT_INSN (check_insn);
>> +                /* Delete note.  */
>> +                delete_insn (check_insn);
>> +                /* There will only be one.  */
>> +                break;
>> +              }
>> +              check_insn = NEXT_INSN (check_insn);
>> +            }
>> +          /* Unlink any header insns left after the loop above.  */
>> +          if (next_insn != BB_HEAD (bb))
>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> +                                                PREV_INSN (BB_HEAD (bb)));
>> +        }
>>        end = skip_insns_after_block (bb);
>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>        bb->aux = bb->next_bb;
>>
>> -  cfg_layout_finalize ();
>> +  cfg_layout_finalize (false);
>>
>>    return 0;
>>  }
>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>  }
>>
>>
>> -/* Given a reorder chain, rearrange the code to match.  */
>> +/* Given a reorder chain, rearrange the code to match. If
>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>> +   section boundary notes were removed on entry to cfg layout
>> +   mode, insert section boundary notes here.  */
>>
>>  static void
>> -fixup_reorder_chain (void)
>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>  {
>>    basic_block bb;
>>    rtx insn = NULL;
>> @@ -3150,7 +3373,7 @@ static void
>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>           insn = BB_HEADER (bb);
>>           while (NEXT_INSN (insn))
>> -           insn = NEXT_INSN (insn);
>> +            insn = NEXT_INSN (insn);
>>         }
>>        if (insn)
>>         NEXT_INSN (insn) = BB_HEAD (bb);
>> @@ -3175,6 +3398,11 @@ static void
>>      insn = NEXT_INSN (insn);
>>
>>    set_last_insn (insn);
>> +
>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>> +    insert_section_boundary_note ();
>> +
>>  #ifdef ENABLE_CHECKING
>>    verify_insn_chain ();
>>  #endif
>> @@ -3187,7 +3415,7 @@ static void
>>        edge e_fall, e_taken, e;
>>        rtx bb_end_insn;
>>        rtx ret_label = NULL_RTX;
>> -      basic_block nb, src_bb;
>> +      basic_block nb;
>>        edge_iterator ei;
>>
>>        if (EDGE_COUNT (bb->succs) == 0)
>> @@ -3322,7 +3550,6 @@ static void
>>        /* We got here if we need to add a new jump insn.
>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>> -      src_bb = e_fall->src;
>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>        if (nb)
>>         {
>> @@ -3330,17 +3557,6 @@ static void
>>           bb->aux = nb;
>>           /* Don't process this new block.  */
>>           bb = nb;
>> -
>> -         /* Make sure new bb is tagged for correct section (same as
>> -            fall-thru source, since you cannot fall-thru across
>> -            section boundaries).  */
>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>> -         if (flag_reorder_blocks_and_partition
>> -             && targetm_common.have_named_sections
>> -             && JUMP_P (BB_END (bb))
>> -             && !any_condjump_p (BB_END (bb))
>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>         }
>>      }
>>
>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>             case NOTE_INSN_FUNCTION_BEG:
>>               /* There is always just single entry to function.  */
>>             case NOTE_INSN_BASIC_BLOCK:
>> +              /* We should only switch text sections once.  */
>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>               break;
>>
>>             case NOTE_INSN_EPILOGUE_BEG:
>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>               emit_note_copy (insn);
>>               break;
>>
>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>  }
>>
>>  /* Finalize the changes: reorder insn list according to the sequence specified
>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>> +   to fixup_reorder_chain so that it can insert the proper switch text
>> +   section notes.  */
>>
>>  void
>> -cfg_layout_finalize (void)
>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>  {
>>  #ifdef ENABLE_CHECKING
>>    verify_flow_info ();
>> @@ -3775,7 +3995,7 @@ void
>>  #endif
>>        )
>>      fixup_fallthru_exit_predecessor ();
>> -  fixup_reorder_chain ();
>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>
>>    rebuild_jump_labels (get_insns ());
>>    delete_dead_jumptables ();
>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>      return false;
>>
>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>      return false;
>>
>>    if (!onlyjump_p (insn)
>>
>> --
>> This patch is available for review at http://codereview.appspot.com/6823047
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-26 16:25   ` Christophe Lyon
@ 2012-11-26 20:20     ` Teresa Johnson
  2012-11-26 20:29       ` Teresa Johnson
  2012-11-26 20:43       ` Jack Howarth
  0 siblings, 2 replies; 35+ messages in thread
From: Teresa Johnson @ 2012-11-26 20:20 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: reply, David Li, Steven Bosscher, Matthew Gretton-Dann, gcc-patches

Are you sure you have all my changes applied? I applied the 4 patches
attached to PR55121 to my trunk checkout that has my fixes, and to a
pristine trunk checkout. I configured and built both for
--target=arm-none-linux-gnueabi, and compiled using your options, .i file
and gcda file. I can reproduce the failure using the pristine trunk
with your patches, but not with my fixed trunk + your patches. (I just
updated to head to pick up recent changes and get the same result. The
vec changes required some manual changes to the patch, which I will
resend shortly.)

Without my fixes:

$ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed
eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
-mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
-mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
-fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
eval.c: In function ‘Ge’:
eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
 }
 ^
0x622f71 df_compact_blocks()
../../gcc_trunk_3/gcc/df-core.c:1560
0x5cfcb5 compact_blocks()
../../gcc_trunk_3/gcc/cfg.c:162
0xc9dce0 reorder_basic_blocks
../../gcc_trunk_3/gcc/bb-reorder.c:2154
0xc9dce0 rest_of_handle_reorder_blocks
../../gcc_trunk_3/gcc/bb-reorder.c:2219
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.


With my fixes:

$ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed
eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
-mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
-mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
-fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3


Thanks,
Teresa

On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> Hi,
>
> I have tested your patch on Spec2000 on ARM, and I can still see
> several failures caused by:
> "error: fallthru edge crosses section boundary", including the case
> described in PR55121.
>
> On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>> Ping.
>> Teresa
>>
>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Revised patch that fixes failures encountered when enabling
>>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>
>>> This includes new verification code to ensure no cold blocks dominate hot
>>> blocks contributed by Steven Bosscher.
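
A minimal sketch of the invariant behind that check, assuming a toy CFG
where idom[i] holds the immediate dominator of block i and the entry block
is its own dominator. The names fix_cold_dominators, idom, and part are
illustrative and do not appear in the patch. The patch itself also
reconsiders cold successors of newly hot blocks, which this sketch leaves
out.

```c
#include <assert.h>

enum partition { HOT, COLD };

/* Hypothetical helper, not part of the patch: re-mark as hot every cold
   block that dominates a hot block.  IDOM[i] is the immediate dominator
   of block i; the entry block is its own immediate dominator.  Returns
   the number of blocks re-marked.  */
static int
fix_cold_dominators (const int idom[], enum partition part[], int n)
{
  int remarked = 0;

  assert (n > 0);
  for (int bb = 0; bb < n; bb++)
    {
      if (part[bb] != HOT)
        continue;

      /* Walk up the dominator tree from this hot block, turning each
         cold dominator hot, until a hot dominator is reached.  The walk
         stops at the entry block because it dominates itself.  */
      int d = idom[bb];
      while (part[d] == COLD)
        {
          part[d] = HOT;
          remarked++;
          d = idom[d];
        }
    }
  return remarked;
}
```

For example, with idom = {0, 0, 1, 1, 3} and only block 2 marked hot,
blocks 0 and 1 are re-marked hot, while blocks 3 and 4 stay cold.
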
>>>
>>> I attempted to make the handling of partition updates through the optimization
>>> passes much more consistent, removing a number of partial fixes in the code
>>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>> assignment, region crossing jump notes, and switch text section notes) is
>>> now handled in a few centralized locations. For example, inside
>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> don't need to attempt the fixup themselves.
>>>
>>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> mode that are not easy to fix up incrementally, the new routine
>>> fixup_partitions handles the cleanup globally. This does require calculation
>>> of the dominance relation, however, as far as I can tell the routines which
>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> are invoked typically once (or a small number of times in the case of
>>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> -ftime-report output for some large fdo compilations and saw only minimal
>>> increases in the dominance computation times, which were only a tiny percent
>>> of the overall compile time.
>>>
>>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> any partitioning was actually performed, so that optimizations which were
>>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> conservative for functions where no partitions were formed (e.g. they are
>>> completely hot).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> benchmarks and internal google benchmarks using profile feedback and
>>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>
>>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>         parameter.
>>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by redirect_edge_and_branch_force.
>>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>         predecessor BB until after it is potentially split.
>>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>         any blocks in function actually partitioned.
>>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>         up partitioning.
>>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>         block copying if any blocks in function actually partitioned.
>>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>         that no cold blocks dominate a hot block.
>>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by force_nonfallthru_and_redirect.
>>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>         already be marked with region crossing note.
>>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>         blocks in function actually partitioned.
>>>         (insert_section_boundary_note): Only need to insert note if any
>>>         blocks in function actually partitioned.
>>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>         parameter, and remove call to insert_section_boundary_note as this
>>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>         parameter.
>>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>         has bb partitions.
>>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>         emit_barrier_after_bb, which are no longer static.
>>>         * basic-block.h: Declare new function fixup_partitions.
>>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>         check for region crossing note.
>>>         (fixup_partition_crossing): New function.
>>>         (fixup_bb_partition): Ditto.
>>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>         remove old code that tried to do this. Emit barrier correctly
>>>         when we are in cfglayout mode.
>>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>         (commit_one_edge_insertion): Remove old code that tried to
>>>         fixup region crossing edge since this is now handled in
>>>         split_block, and set up insertion point correctly since
>>>         block may now end in a jump.
>>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>         boundaries after optimizations that modify cfg and before trying to
>>>         verify the flow info.
>>>         (fixup_partitions): New function.
>>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>         hot bbs.
>>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>         Remove old code that attempted to fixup region crossing note as
>>>         this is now handled in force_nonfallthru_and_redirect.
>>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>         note.
>>>
>>> Index: cfghooks.h
>>> ===================================================================
>>> --- cfghooks.h  (revision 193376)
>>> +++ cfghooks.h  (working copy)
>>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>  void account_profile_record (struct profile_record *, int);
>>>
>>>  extern void cfg_layout_initialize (unsigned int);
>>> -extern void cfg_layout_finalize (void);
>>> +extern void cfg_layout_finalize (bool);
>>>
>>>  /* Hooks containers.  */
>>>  extern struct cfg_hooks gimple_cfg_hooks;
>>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>  extern void gimple_register_cfg_hooks (void);
>>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>  extern void set_cfg_hooks (struct cfg_hooks);
>>> -
>>> Index: modulo-sched.c
>>> ===================================================================
>>> --- modulo-sched.c      (revision 193376)
>>> +++ modulo-sched.c      (working copy)
>>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>>    free_dominance_info (CDI_DOMINATORS);
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>  #endif /* INSN_SCHEDULING */
>>>    return 0;
>>>  }
>>> Index: ifcvt.c
>>> ===================================================================
>>> --- ifcvt.c     (revision 193376)
>>> +++ ifcvt.c     (working copy)
>>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>    if (new_bb)
>>>      {
>>>        df_bb_replace (then_bb_index, new_bb);
>>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>> -         we need to ensure that new_bb is in the same partition as
>>> -         test bb (you can not fall through across section boundaries).  */
>>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>> +         (possibly called from redirect_edge_and_branch_force).  */
>>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>      }
>>>
>>>    num_true_changes++;
>>> Index: function.c
>>> ===================================================================
>>> --- function.c  (revision 193376)
>>> +++ function.c  (working copy)
>>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>                     break;
>>>                 if (e)
>>>                   {
>>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>> -                                                 NULL_RTX, e->src);
>>> +                    /* Make sure we insert after any barriers.  */
>>> +                    rtx end = get_last_bb_insn (e->src);
>>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>> +                                                  NULL_RTX, e->src);
>>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>                   }
>>>                 else
>>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>           cur_bb->aux = cur_bb->next_bb;
>>> -      cfg_layout_finalize ();
>>> +      cfg_layout_finalize (false);
>>>      }
>>>
>>>  epilogue_done:
>>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>        basic_block simple_return_block_cold = NULL;
>>>        edge pending_edge_hot = NULL;
>>>        edge pending_edge_cold = NULL;
>>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> +      basic_block exit_pred;
>>>        int i;
>>>
>>>        gcc_assert (entry_edge != orig_entry_edge);
>>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>             else
>>>               pending_edge_cold = e;
>>>           }
>>> +
>>> +      /* Save a pointer to the exit's predecessor BB for use in
>>> +         inserting new BBs at the end of the function. Do this
>>> +         after the call to split_block above which may split
>>> +         the original exit pred.  */
>>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>
>>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>         {
>>> Index: function.h
>>> ===================================================================
>>> --- function.h  (revision 193376)
>>> +++ function.h  (working copy)
>>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>    bool uses_only_leaf_regs;
>>>
>>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>> +     block.  */
>>> +  bool has_bb_partition;
>>> +
>>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>       to eliminable regs (like the frame pointer) are set if an asm
>>> Index: hw-doloop.c
>>> ===================================================================
>>> --- hw-doloop.c (revision 193376)
>>> +++ hw-doloop.c (working copy)
>>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>        else
>>>         bb->aux = NULL;
>>>      }
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>    clear_aux_for_blocks ();
>>>    df_analyze ();
>>>  }
>>> Index: cfgcleanup.c
>>> ===================================================================
>>> --- cfgcleanup.c        (revision 193376)
>>> +++ cfgcleanup.c        (working copy)
>>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>       partition boundaries).  See the comments at the top of
>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>
>>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>> +  if (crtl->has_bb_partition && reload_completed)
>>>      return false;
>>>
>>>    /* Search backward through forwarder blocks.  We don't need to worry
>>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>               df_analyze ();
>>>             }
>>>
>>> +         if (changed)
>>> +            {
>>> +              /* Edge forwarding in particular can cause hot blocks previously
>>> +                 reached by both hot and cold blocks to become dominated only
>>> +                 by cold blocks. This will cause the verification below to fail,
>>> +                 and lead to now cold code in the hot section. This is not easy
>>> +                 to detect and fix during edge forwarding, and in some cases
>>> +                 is only visible after newly unreachable blocks are deleted,
>>> +                 which will be done in fixup_partitions.  */
>>> +              fixup_partitions ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>> -         if (changed)
>>> -           verify_flow_info ();
>>> +              verify_flow_info ();
>>>  #endif
>>> +            }
>>>
>>>           changed_overall |= changed;
>>>           first_pass = false;
>>> Index: bb-reorder.c
>>> ===================================================================
>>> --- bb-reorder.c        (revision 193376)
>>> +++ bb-reorder.c        (working copy)
>>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>    current_partition = BB_PARTITION (traces[0].first);
>>>    two_passes = false;
>>>
>>> -  if (flag_reorder_blocks_and_partition)
>>> +  if (crtl->has_bb_partition)
>>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>        if (BB_PARTITION (traces[0].first)
>>>           != BB_PARTITION (traces[i].first))
>>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>                       }
>>>                   }
>>>
>>> -             if (flag_reorder_blocks_and_partition)
>>> +             if (crtl->has_bb_partition)
>>>                 try_copy = false;
>>>
>>>               /* Copy tiny blocks always; copy larger blocks only when the
>>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>    return length;
>>>  }
>>>
>>> -/* Emit a barrier into the footer of BB.  */
>>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>  }
>>>
>>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>  {
>>>    VEC(edge, heap) *crossing_edges = NULL;
>>>    basic_block bb;
>>> -  edge e;
>>> -  edge_iterator ei;
>>> +  edge e, e2;
>>> +  edge_iterator ei, ei2;
>>> +  unsigned int cold_bb_count = 0;
>>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>
>>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>    FOR_EACH_BB (bb)
>>>      {
>>>        if (probably_never_executed_bb_p (cfun, bb))
>>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> +        {
>>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> +          cold_bb_count++;
>>> +        }
>>>        else
>>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> +        {
>>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>> +        }
>>>      }
>>>
>>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>> +     several different possibilities. One is that there are edge weight insanities
>>> +     due to optimization phases that do not properly update basic block profile
>>> +     counts. The second is that the entry of the function may not be hot, because
>>> +     it is entered fewer times than the number of profile training runs, but there
>>> +     is a loop inside the function that causes blocks within the function to be
>>> +     above the threshold for hotness.  */
>>> +  if (cold_bb_count)
>>> +    {
>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> +
>>> +      if (dom_calculated_here)
>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      /* Keep examining hot bbs until we have either checked them all, or
>>> +         re-marked all cold bbs hot.  */
>>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>> +             && cold_bb_count)
>>> +        {
>>> +          basic_block dom_bb;
>>> +
>>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>> +
>>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>> +            continue;
>>> +
>>> +          /* We have a hot bb with an immediate dominator that is cold.
>>> +             The dominator needs to be re-marked to hot.  */
>>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>> +          cold_bb_count--;
>>> +
>>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>> +             dominated by a cold bb.  */
>>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>> +
>>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>> +             feeds and see if it makes sense to re-mark those as hot as
>>> +             well.  */
>>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>> +            {
>>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>> +              /* Examine all successors of this newly-hot bb to see if they
>>> +                 are cold and should be re-marked as hot.  */
>>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>> +                {
>>> +                  bool any_cold_preds = false;
>>> +                  basic_block succ = e->dest;
>>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>> +                    continue;
>>> +                  /* Does this block have any cold predecessors now?  */
>>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>> +                  {
>>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>> +                      {
>>> +                        any_cold_preds = true;
>>> +                        break;
>>> +                      }
>>> +                  }
>>> +                  if (any_cold_preds)
>>> +                    continue;
>>> +
>>> +                  /* Here we have a successor of newly-hot bb that is cold
>>> +                     but no longer has any cold predecessors. Since the original
>>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>> +                     as hot now too. Better heuristics may be in order here.  */
>>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>> +                  cold_bb_count--;
>>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>> +                  /* Examine this successor as a newly-hot bb.  */
>>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>> +                }
>>> +            }
>>> +        }
>>> +
>>> +      if (dom_calculated_here)
>>> +        free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>> +
>>>    /* The format of .gcc_except_table does not allow landing pads to
>>>       be in a different partition as the throw.  Fix this by either
>>>       moving or duplicating the landing pads.  */
>>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>                       new_bb->aux = cur_bb->aux;
>>>                       cur_bb->aux = new_bb;
>>>
>>> -                     /* Make sure new fall-through bb is in same
>>> -                        partition as bb it's falling through from.  */
>>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>> +                     gcc_assert (BB_PARTITION (new_bb)
>>> +                                  == BB_PARTITION (cur_bb));
>>>
>>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>                     }
>>>                   else
>>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>    FOR_EACH_BB (bb)
>>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>        if ((e->flags & EDGE_CROSSING)
>>> -         && JUMP_P (BB_END (e->src)))
>>> +         && JUMP_P (BB_END (e->src))
>>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>> +             force_nonfallthru_and_redirect.  */
>>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>  }
>>>
>>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>        dump_flow_info (dump_file, dump_flags);
>>>      }
>>>
>>> -  if (flag_reorder_blocks_and_partition)
>>> +  if (crtl->has_bb_partition)
>>>      verify_hot_cold_block_grouping ();
>>>  }
>>>
>>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>     encountering this note will make the compiler switch between the
>>>     hot and cold text sections.  */
>>>
>>> -static void
>>> +void
>>>  insert_section_boundary_note (void)
>>>  {
>>>    basic_block bb;
>>>    rtx new_note;
>>>    int first_partition = 0;
>>>
>>> -  if (!flag_reorder_blocks_and_partition)
>>> +  if (!crtl->has_bb_partition)
>>>      return;
>>>
>>>    FOR_EACH_BB (bb)
>>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>    FOR_EACH_BB (bb)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (true);
>>>
>>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> -  insert_section_boundary_note ();
>>>    return 0;
>>>  }
>>>
>>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>      }
>>>
>>>  done:
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>
>>>    BITMAP_FREE (candidates);
>>>    return 0;
>>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>    if (crossing_edges == NULL)
>>>      return 0;
>>>
>>> +  crtl->has_bb_partition = true;
>>> +
>>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>       destination of any crossing edge has a label.  */
>>>    add_labels_and_missing_jumps (crossing_edges);
>>> Index: bb-reorder.h
>>> ===================================================================
>>> --- bb-reorder.h        (revision 193376)
>>> +++ bb-reorder.h        (working copy)
>>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>
>>>  extern int get_uncond_jump_length (void);
>>>
>>> +extern void insert_section_boundary_note (void);
>>> +
>>> +extern void emit_barrier_after_bb (basic_block bb);
>>> +
>>>  #endif
>>> Index: basic-block.h
>>> ===================================================================
>>> --- basic-block.h       (revision 193376)
>>> +++ basic-block.h       (working copy)
>>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>  extern bool forwarder_block_p (const_basic_block);
>>>  extern bool can_fallthru (basic_block, basic_block);
>>> +extern void fixup_partitions (void);
>>>
>>>  /* In cfgbuild.c.  */
>>>  extern void find_many_sub_basic_blocks (sbitmap);
>>> Index: cfgrtl.c
>>> ===================================================================
>>> --- cfgrtl.c    (revision 193376)
>>> +++ cfgrtl.c    (working copy)
>>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree.h"
>>>  #include "hard-reg-set.h"
>>>  #include "basic-block.h"
>>> +#include "bb-reorder.h"
>>>  #include "regs.h"
>>>  #include "flags.h"
>>>  #include "function.h"
>>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>     Only applicable if the CFG is in cfglayout mode.  */
>>>  static GTY(()) rtx cfg_layout_function_footer;
>>>  static GTY(()) rtx cfg_layout_function_header;
>>> +static bool had_sec_boundary_notes;
>>>
>>>  static rtx skip_insns_after_block (basic_block);
>>>  static void record_effective_endpoints (void);
>>>  static rtx label_for_bb (basic_block);
>>> -static void fixup_reorder_chain (void);
>>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>
>>>  void verify_insn_chain (void);
>>>  static void fixup_fallthru_exit_predecessor (void);
>>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>       partition boundaries).  See  the comments at the top of
>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>
>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>      return NULL;
>>>
>>>    /* We can replace or remove a complex jump only when we have exactly
>>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>    return e;
>>>  }
>>>
>>> +/* Called when edge E has been redirected to a new destination,
>>> +   in order to update the region crossing flag on the edge and
>>> +   jump.  */
>>> +
>>> +static void
>>> +fixup_partition_crossing (edge e, basic_block target)
>>> +{
>>> +  rtx note;
>>> +
>>> +  gcc_assert (e->dest == target);
>>> +
>>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>> +    return;
>>> +  /* If we redirected an existing edge, it may already be marked
>>> +     crossing, even though the new src is missing a reg crossing note.
>>> +     But make sure reg crossing note doesn't already exist before
>>> +     inserting.  */
>>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>> +    {
>>> +      e->flags |= EDGE_CROSSING;
>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +      if (JUMP_P (BB_END (e->src))
>>> +          && !note)
>>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +    }
>>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>> +    {
>>> +      e->flags &= ~EDGE_CROSSING;
>>> +      /* Remove the region crossing note from jump at end of
>>> +         e->src if it exists.  */
>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> +      if (note)
>>> +        remove_note (BB_END (e->src), note);
>>> +    }
>>> +}
>>> +
>>> +/* Called when block BB has been reassigned to a different partition,
>>> +   to ensure that the region crossing attributes are updated.  */
>>> +
>>> +static void
>>> +fixup_bb_partition (basic_block bb)
>>> +{
>>> +  edge e;
>>> +  edge_iterator ei;
>>> +
>>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>> +    {
>>> +      fixup_partition_crossing (e, e->dest);
>>> +    }
>>> +
>>> +  /* Possibly need to make bb's successor edges region crossing,
>>> +     or remove stale region crossing.  */
>>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>> +    {
>>> +      if ((e->flags & EDGE_FALLTHRU)
>>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>> +          && e->dest != EXIT_BLOCK_PTR)
>>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>> +        force_nonfallthru (e);
>>> +      else
>>> +        fixup_partition_crossing (e, e->dest);
>>> +    }
>>> +}
>>> +
>>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>     expense of adding new instructions or reordering basic blocks.
>>>
>>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>  {
>>>    edge ret;
>>>    basic_block src = e->src;
>>> +  basic_block dest = e->dest;
>>>
>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>      return NULL;
>>>
>>> -  if (e->dest == target)
>>> +  if (dest == target)
>>>      return e;
>>>
>>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>      {
>>>        df_set_bb_dirty (src);
>>> +      fixup_partition_crossing (ret, target);
>>>        return ret;
>>>      }
>>>
>>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>      return NULL;
>>>
>>>    df_set_bb_dirty (src);
>>> +  fixup_partition_crossing (ret, target);
>>>    return ret;
>>>  }
>>>
>>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>
>>>        BB_COPY_PARTITION (jump_block, e->src);
>>> -      if (flag_reorder_blocks_and_partition
>>> -         && targetm_common.have_named_sections
>>> -         && JUMP_P (BB_END (jump_block))
>>> -         && !any_condjump_p (BB_END (jump_block))
>>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>
>>>        /* Wire edge in.  */
>>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>        new_edge->probability = probability;
>>>        new_edge->count = count;
>>>
>>> +      /* If e->src was previously region crossing, it no longer is
>>> +         and the reg crossing note should be removed.  */
>>> +      fixup_partition_crossing (new_edge, jump_block);
>>> +
>>>        /* Redirect old edge.  */
>>>        redirect_edge_pred (e, jump_block);
>>>        e->probability = REG_BR_PROB_BASE;
>>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>        LABEL_NUSES (label)++;
>>>      }
>>>
>>> -  emit_barrier_after (BB_END (jump_block));
>>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>> +     insert the barrier correctly.  */
>>> +  emit_barrier_after_bb (jump_block);
>>>    redirect_edge_succ_nodup (e, target);
>>>
>>>    if (abnormal_edge_flags)
>>>      make_edge (src, target, abnormal_edge_flags);
>>>
>>>    df_mark_solutions_dirty ();
>>> +  fixup_partition_crossing (e, target);
>>>    return new_bb;
>>>  }
>>>
>>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>  static basic_block
>>>  rtl_split_edge (edge edge_in)
>>>  {
>>> -  basic_block bb;
>>> +  basic_block bb, new_bb;
>>>    rtx before;
>>>
>>>    /* Abnormal edges cannot be split.  */
>>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>    else
>>>      {
>>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>> +      else
>>> +        /* Put the split bb into the src partition, to avoid creating
>>> +           a situation where a cold bb dominates a hot bb, in the case
>>> +           where src is cold and dest is hot. The src will dominate
>>> +           the new bb (whereas it might not have dominated dest).  */
>>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>      }
>>>
>>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>
>>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>> +    {
>>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>> +      gcc_assert (!new_bb);
>>> +    }
>>> +
>>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>       jump instruction to target our new block.  */
>>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>    else
>>>      {
>>>        bb = split_edge (e);
>>> -      after = BB_END (bb);
>>>
>>> -      if (flag_reorder_blocks_and_partition
>>> -         && targetm_common.have_named_sections
>>> -         && e->src != ENTRY_BLOCK_PTR
>>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>> -         && !(e->flags & EDGE_CROSSING)
>>> -         && JUMP_P (after)
>>> -         && !any_condjump_p (after)
>>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>> +         a region-crossing jump, even though it was originally fallthru.  */
>>> +      if (JUMP_P (BB_END (bb)))
>>> +       before = BB_END (bb);
>>> +      else
>>> +        after = BB_END (bb);
>>>      }
>>>
>>>    /* Now that we've found the spot, do the insertion.  */
>>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>  {
>>>    basic_block bb;
>>>
>>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>> +     previously reached by both hot and cold blocks to become dominated only
>>> +     by cold blocks. This will cause the verification below to fail,
>>> +     and lead to now cold code in the hot section. In some cases this
>>> +     may only be visible after newly unreachable blocks are deleted,
>>> +     which will be done by fixup_partitions.  */
>>> +  fixup_partitions ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>>    verify_flow_info ();
>>>  #endif
>>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>
>>>    return end;
>>>  }
>>> -
>>> +
>>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>> +   passes that modify the cfg.  */
>>> +
>>> +void
>>> +fixup_partitions (void)
>>> +{
>>> +  basic_block bb;
>>> +
>>> +  if (!crtl->has_bb_partition)
>>> +    return;
>>> +
>>> +  /* Delete any blocks that became unreachable and weren't
>>> +     already cleaned up, for example during edge forwarding
>>> +     and convert_jumps_to_returns. This will expose more
>>> +     opportunities for fixing the partition boundaries here.
>>> +     Also, the calculation of the dominance graph during verification
>>> +     will assert if there are unreachable nodes.  */
>>> +  delete_unreachable_blocks ();
>>> +
>>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> +     a cold partition cannot dominate a basic block in a hot partition.
>>> +     Fixup any that now violate this requirement, as a result of edge
>>> +     forwarding and unreachable block deletion.  */
>>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>> +  FOR_EACH_BB (bb)
>>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +    {
>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> +      basic_block son;
>>> +
>>> +      if (dom_calculated_here)
>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +        {
>>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> +          /* If bb is not yet cold (because it was added below as
>>> +             a block dominated by a cold bb) then mark it cold here.  */
>>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> +            {
>>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>> +            }
>>> +          /* Any blocks dominated by a block in the cold section
>>> +             must also be cold.  */
>>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> +               son;
>>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> +        }
>>> +
>>> +      if (dom_calculated_here)
>>> +        free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>> +
>>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>> +     cold, so that we only update the region crossings the minimum number of
>>> +     places, which can require forcing edges to be non fallthru.  */
>>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>> +    {
>>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>> +      fixup_bb_partition (bb);
>>> +    }
>>> +}
>>> +
>>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>     cfglayout RTL.
>>>
>>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>    rtx x;
>>>    int err = 0;
>>>    basic_block bb;
>>> +  bool have_partitions = false;
>>>
>>>    /* Check the general integrity of the basic blocks.  */
>>>    FOR_EACH_BB_REVERSE (bb)
>>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>
>>>           if (e->flags & EDGE_ABNORMAL)
>>>             n_abnormal++;
>>> +
>>> +          have_partitions |= is_crossing;
>>>         }
>>>
>>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>           }
>>>      }
>>>
>>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> +  if (have_partitions && !err)
>>> +    FOR_EACH_BB (bb)
>>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +    {
>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> +      basic_block son;
>>> +
>>> +      if (dom_calculated_here)
>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> +
>>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> +        {
>>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> +            {
>>> +              error ("non-cold basic block %d dominated "
>>> +                     "by a block in the cold partition", bb->index);
>>> +              err = 1;
>>> +            }
>>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> +               son;
>>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> +        }
>>> +
>>> +      if (dom_calculated_here)
>>> +        free_dominance_info (CDI_DOMINATORS);
>>> +    }
>>> +
>>>    /* Clean up.  */
>>>    return err;
>>>  }
>>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>    else
>>>      cfg_layout_function_header = NULL_RTX;
>>>
>>> +  had_sec_boundary_notes = false;
>>> +
>>>    next_insn = get_insns ();
>>>    FOR_EACH_BB (bb)
>>>      {
>>>        rtx end;
>>>
>>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> -                                             PREV_INSN (BB_HEAD (bb)));
>>> +        {
>>> +          /* Rather than try to keep section boundary notes incrementally
>>> +             up-to-date through cfg layout optimizations, simply remove them
>>> +             and flag that they should be re-inserted when exiting
>>> +             cfg layout mode.  */
>>> +          rtx check_insn = next_insn;
>>> +          while (check_insn)
>>> +            {
>>> +              if (NOTE_P (check_insn)
>>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>> +              {
>>> +                had_sec_boundary_notes |= true;
>>> +                /* Remove note from chain. Grab new next_insn first.  */
>>> +                if (next_insn == check_insn)
>>> +                  next_insn = NEXT_INSN (check_insn);
>>> +                /* Delete note.  */
>>> +                delete_insn (check_insn);
>>> +                /* There will only be one.  */
>>> +                break;
>>> +              }
>>> +              check_insn = NEXT_INSN (check_insn);
>>> +            }
>>> +          /* If we still have header instructions left after above loop.  */
>>> +          if (next_insn != BB_HEAD (bb))
>>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> +                                                PREV_INSN (BB_HEAD (bb)));
>>> +        }
>>>        end = skip_insns_after_block (bb);
>>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>>
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>
>>>    return 0;
>>>  }
>>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>  }
>>>
>>>
>>> -/* Given a reorder chain, rearrange the code to match.  */
>>> +/* Given a reorder chain, rearrange the code to match. If
>>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>> +   section boundary notes were removed on entry to cfg layout
>>> +   mode, insert section boundary notes here.  */
>>>
>>>  static void
>>> -fixup_reorder_chain (void)
>>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>  {
>>>    basic_block bb;
>>>    rtx insn = NULL;
>>> @@ -3150,7 +3373,7 @@ static void
>>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>           insn = BB_HEADER (bb);
>>>           while (NEXT_INSN (insn))
>>> -           insn = NEXT_INSN (insn);
>>> +            insn = NEXT_INSN (insn);
>>>         }
>>>        if (insn)
>>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>> @@ -3175,6 +3398,11 @@ static void
>>>      insn = NEXT_INSN (insn);
>>>
>>>    set_last_insn (insn);
>>> +
>>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>> +    insert_section_boundary_note ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>>    verify_insn_chain ();
>>>  #endif
>>> @@ -3187,7 +3415,7 @@ static void
>>>        edge e_fall, e_taken, e;
>>>        rtx bb_end_insn;
>>>        rtx ret_label = NULL_RTX;
>>> -      basic_block nb, src_bb;
>>> +      basic_block nb;
>>>        edge_iterator ei;
>>>
>>>        if (EDGE_COUNT (bb->succs) == 0)
>>> @@ -3322,7 +3550,6 @@ static void
>>>        /* We got here if we need to add a new jump insn.
>>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>> -      src_bb = e_fall->src;
>>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>        if (nb)
>>>         {
>>> @@ -3330,17 +3557,6 @@ static void
>>>           bb->aux = nb;
>>>           /* Don't process this new block.  */
>>>           bb = nb;
>>> -
>>> -         /* Make sure new bb is tagged for correct section (same as
>>> -            fall-thru source, since you cannot fall-thru across
>>> -            section boundaries).  */
>>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>> -         if (flag_reorder_blocks_and_partition
>>> -             && targetm_common.have_named_sections
>>> -             && JUMP_P (BB_END (bb))
>>> -             && !any_condjump_p (BB_END (bb))
>>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>         }
>>>      }
>>>
>>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>             case NOTE_INSN_FUNCTION_BEG:
>>>               /* There is always just single entry to function.  */
>>>             case NOTE_INSN_BASIC_BLOCK:
>>> +              /* We should only switch text sections once.  */
>>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>               break;
>>>
>>>             case NOTE_INSN_EPILOGUE_BEG:
>>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>               emit_note_copy (insn);
>>>               break;
>>>
>>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>  }
>>>
>>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>> +   section notes.  */
>>>
>>>  void
>>> -cfg_layout_finalize (void)
>>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>  {
>>>  #ifdef ENABLE_CHECKING
>>>    verify_flow_info ();
>>> @@ -3775,7 +3995,7 @@ void
>>>  #endif
>>>        )
>>>      fixup_fallthru_exit_predecessor ();
>>> -  fixup_reorder_chain ();
>>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>
>>>    rebuild_jump_labels (get_insns ());
>>>    delete_dead_jumptables ();
>>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>      return false;
>>>
>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>      return false;
>>>
>>>    if (!onlyjump_p (insn)
>>>
>>> --
>>> This patch is available for review at http://codereview.appspot.com/6823047



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-26 20:20     ` Teresa Johnson
@ 2012-11-26 20:29       ` Teresa Johnson
  2012-11-26 20:43       ` Jack Howarth
  1 sibling, 0 replies; 35+ messages in thread
From: Teresa Johnson @ 2012-11-26 20:29 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: reply, David Li, Steven Bosscher, Matthew Gretton-Dann, gcc-patches

Here is the patch again, updated to use the new vec implementation.

Thanks,
Teresa

Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code, contributed by Steven Bosscher, to ensure
that no cold block dominates a hot block.

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The code to fix up partitions (including the BB_PARTITION
assignment, region crossing jump notes, and switch text section notes) is
now handled in a few centralized locations, for example inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.
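
To illustrate the centralized fixup rule, here is a minimal standalone sketch
of what fixup_partition_crossing does after an edge is redirected. It is not
the GCC code: the toy_* types and fields are hypothetical stand-ins for
basic_block, edge, EDGE_CROSSING and the REG_CROSSING_JUMP note, and the
ENTRY/EXIT block special cases are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's basic_block and edge.  All toy_* names are
   hypothetical and exist only for this sketch.  */
struct toy_bb
{
  int partition;            /* 0 = hot, 1 = cold.  */
  bool ends_in_jump;        /* Models JUMP_P (BB_END (bb)).  */
  bool has_crossing_note;   /* Models a REG_CROSSING_JUMP note.  */
};

struct toy_edge
{
  struct toy_bb *src;
  struct toy_bb *dest;
  bool crossing;            /* Models EDGE_CROSSING.  */
};

/* After E has been redirected to TARGET, bring the crossing flag on the
   edge and the note on the source's jump back in sync with the
   partitions of the two endpoints.  */
void
toy_fixup_partition_crossing (struct toy_edge *e, struct toy_bb *target)
{
  assert (e->dest == target);

  if (e->src->partition != target->partition)
    {
      /* Endpoints differ: mark the edge, and add the note only if the
         source really ends in a jump.  */
      e->crossing = true;
      if (e->src->ends_in_jump)
        e->src->has_crossing_note = true;
    }
  else
    {
      /* Same partition: clear any stale flag and note.  */
      e->crossing = false;
      e->src->has_crossing_note = false;
    }
}
```

Because the redirect routines call this themselves, a caller in this model
only has to change e->dest and never touches the flag or the note directly.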

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculation
of the dominance relation. However, as far as I can tell, the routines which
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are typically invoked once (or a small number of times in the case of
try_optimize_cfg) per optimization pass. Additionally, I compared the
-ftime-report output for some large fdo compilations and saw only minimal
increases in the dominance computation times, which were only a tiny fraction
of the overall compile time.
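
The global cleanup can be pictured as a worklist walk over the dominator
tree: every block dominated by a cold block is made cold. The sketch below is
a simplified standalone model, not the patch itself. It assumes the dominator
tree is given as an idom[] array (a hypothetical representation, with -1 for
the entry block), and toy_propagate_cold and TOY_MAX_BLOCKS are invented
names.

```c
#include <assert.h>

enum { TOY_HOT = 0, TOY_COLD = 1 };
#define TOY_MAX_BLOCKS 64

/* IDOM[i] is the immediate dominator of block i, or -1 for the entry
   block.  Mark every block dominated by a cold block as cold and return
   how many blocks changed partition.  */
int
toy_propagate_cold (int n, const int *idom, int *partition)
{
  int worklist[TOY_MAX_BLOCKS];
  int top = 0;
  int changed = 0;

  assert (n <= TOY_MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (partition[i] == TOY_COLD)
      worklist[top++] = i;

  while (top > 0)
    {
      int bb = worklist[--top];

      /* Any dominator-tree son of a cold block must also be cold.
         Each block is pushed at most once, so the worklist cannot
         overflow.  */
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && partition[son] != TOY_COLD)
          {
            partition[son] = TOY_COLD;
            worklist[top++] = son;
            changed++;
          }
    }

  return changed;
}
```

In the patch, the blocks that change partition are collected first and their
edges are fixed up only afterwards, so region crossings are updated once per
block rather than once per intermediate state.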

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).
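
As a rough illustration of the difference, the sketch below compares a guard
keyed on the command-line flag with one keyed on a per-function flag. The
toy_* names are hypothetical, and the exact form of the new condition in
try_crossjump_to_edge is an assumption based on the ChangeLog wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the relevant part of struct rtl_data.  */
struct toy_rtl_data
{
  bool has_bb_partition;   /* Set once a function actually gets a
                              hot/cold split.  */
};

/* Old behaviour: skip the optimization for every function compiled
   with the option, whether or not it was partitioned.  */
bool
toy_old_skip_crossjump (bool flag_reorder_blocks_and_partition,
                        bool reload_completed)
{
  return flag_reorder_blocks_and_partition && reload_completed;
}

/* Assumed new behaviour: skip only when this function really has
   partitions.  */
bool
toy_new_skip_crossjump (const struct toy_rtl_data *crtl,
                        bool reload_completed)
{
  assert (crtl != NULL);
  return crtl->has_bb_partition && reload_completed;
}
```

For a completely hot function compiled with the option, the old guard skips
the optimization after reload while the new guard lets it run.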

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with
SPEC2006 int benchmarks and internal Google benchmarks using profile feedback
and -freorder-blocks-and-partition to get more coverage. Ok for trunk?

Thanks,
Teresa

2012-11-26  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

* cfghooks.h (cfg_layout_finalize): New parameter.
* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
        parameter.
* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
        as this is now done by redirect_edge_and_branch_force.
* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
        barriers, new cfg_layout_finalize parameter, and don't store exit
        predecessor BB until after it is potentially split.
* function.h (struct rtl_data): New flag has_bb_partition.
* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
* cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
        any blocks in function actually partitioned.
(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
        up partitioning.
* bb-reorder.c (connect_traces): Only look for partitions and skip
        block copying if any blocks in function actually partitioned.
(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
(find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
        that no cold blocks dominate a hot block.
(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
        as this is now done by force_nonfallthru_and_redirect.
(add_reg_crossing_jump_notes): Handle the fact that some jumps may
        already be marked with region crossing note.
(reorder_basic_blocks): Only need to verify partitions if any
        blocks in function actually partitioned.
(insert_section_boundary_note): Only need to insert note if any
        blocks in function actually partitioned.
(rest_of_handle_reorder_blocks): New cfg_layout_finalize
        parameter, and remove call to insert_section_boundary_note as this
        is now called via cfg_layout_finalize/fixup_reorder_chain.
(duplicate_computed_gotos): New cfg_layout_finalize
        parameter.
(partition_hot_cold_basic_blocks): Set flag indicating function
        has bb partitions.
* bb-reorder.h: Declare insert_section_boundary_note and
        emit_barrier_after_bb, which are no longer static.
* basic-block.h: Declare new function fixup_partitions.
* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
(fixup_partition_crossing): New function.
(fixup_bb_partition): Ditto.
(rtl_redirect_edge_and_branch): Fixup partition boundaries.
(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
(rtl_split_edge): Correctly fixup partition boundaries.
(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_edge, and set up insertion point correctly since
        block may now end in a jump.
(commit_edge_insertions): Invoke fixup_partitions to sanitize partition
        boundaries after optimizations that modify cfg and before trying to
        verify the flow info.
(fixup_partitions): New function.
(rtl_verify_flow_info_1): Add verification that no cold bbs dominate
        hot bbs.
(record_effective_endpoints): Remove section boundary notes and set flag
        indicating that they need to be reinserted on exit from cfglayout mode.
(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
(fixup_reorder_chain): Call insert_section_boundary_note if necessary.
        Remove old code that attempted to fixup region crossing note as
        this is now handled in force_nonfallthru_and_redirect.
(duplicate_insn_chain): Don't duplicate switch section notes.
(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.

Index: cfghooks.h
===================================================================
--- cfghooks.h (revision 193827)
+++ cfghooks.h (working copy)
@@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
 void account_profile_record (struct profile_record *, int);

 extern void cfg_layout_initialize (unsigned int);
-extern void cfg_layout_finalize (void);
+extern void cfg_layout_finalize (bool);

 /* Hooks containers.  */
 extern struct cfg_hooks gimple_cfg_hooks;
@@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
 extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
-
Index: modulo-sched.c
===================================================================
--- modulo-sched.c (revision 193827)
+++ modulo-sched.c (working copy)
@@ -3347,7 +3347,7 @@ rest_of_handle_sms (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
   free_dominance_info (CDI_DOMINATORS);
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 #endif /* INSN_SCHEDULING */
   return 0;
 }
Index: ifcvt.c
===================================================================
--- ifcvt.c (revision 193827)
+++ ifcvt.c (working copy)
@@ -3899,10 +3899,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }

   num_true_changes++;
Index: function.c
===================================================================
--- function.c (revision 193827)
+++ function.c (working copy)
@@ -6246,8 +6246,10 @@ thread_prologue_and_epilogue_insns (void)
     break;
  if (e)
   {
-    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
     BB_COPY_PARTITION (copy_bb, e->src);
   }
  else
@@ -6472,7 +6474,7 @@ thread_prologue_and_epilogue_insns (void)
  if (cur_bb->index >= NUM_FIXED_BLOCKS
     && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
   cur_bb->aux = cur_bb->next_bb;
-      cfg_layout_finalize ();
+      cfg_layout_finalize (false);
     }

 epilogue_done:
@@ -6514,7 +6516,7 @@ epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;

       gcc_assert (entry_edge != orig_entry_edge);
@@ -6542,6 +6544,12 @@ epilogue_done:
     else
       pending_edge_cold = e;
   }
+
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;

       FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
  {
Index: function.h
===================================================================
--- function.h (revision 193827)
+++ function.h (working copy)
@@ -451,6 +451,11 @@ struct GTY(()) rtl_data {
      sched2) and is useful only if the port defines LEAF_REGISTERS.  */
   bool uses_only_leaf_regs;

+  /* Nonzero if the function being compiled has undergone hot/cold partitioning
+     (under flag_reorder_blocks_and_partition) and has at least one cold
+     block.  */
+  bool has_bb_partition;
+
   /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
      asm.  Unlike regs_ever_live, elements of this array corresponding
      to eliminable regs (like the frame pointer) are set if an asm
Index: hw-doloop.c
===================================================================
--- hw-doloop.c (revision 193827)
+++ hw-doloop.c (working copy)
@@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
       else
  bb->aux = NULL;
     }
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
   clear_aux_for_blocks ();
   df_analyze ();
 }
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c (revision 193827)
+++ cfgcleanup.c (working copy)
@@ -1846,7 +1846,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

-  if (flag_reorder_blocks_and_partition && reload_completed)
+  if (crtl->has_bb_partition && reload_completed)
     return false;

   /* Search backward through forwarder blocks.  We don't need to worry
@@ -2789,10 +2789,21 @@ try_optimize_cfg (int mode)
       df_analyze ();
     }

+  if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-  if (changed)
-    verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }

   changed_overall |= changed;
   first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c (revision 193827)
+++ bb-reorder.c (working copy)
@@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
   current_partition = BB_PARTITION (traces[0].first);
   two_passes = false;

-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     for (i = 0; i < n_traces && !two_passes; i++)
       if (BB_PARTITION (traces[0].first)
   != BB_PARTITION (traces[i].first))
@@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
       }
   }

-      if (flag_reorder_blocks_and_partition)
+      if (crtl->has_bb_partition)
  try_copy = false;

       /* Copy tiny blocks always; copy larger blocks only when the
@@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
   return length;
 }

-/* Emit a barrier into the footer of BB.  */
+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */

-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
 }

 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
@@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
-  edge e;
-  edge_iterator ei;
+  edge e, e2;
+  edge_iterator ei, ei2;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;
+  vec<basic_block> bbs_newly_hot = vNULL;

   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
- BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
- BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          bbs_in_hot_partition.safe_push (bb);
+        }
     }

+  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
+     several different possibilities. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness.  */
+  if (cold_bb_count)
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      /* Keep examining hot bbs until we have either checked them all, or
+         re-marked all cold bbs hot.  */
+      while (! bbs_in_hot_partition.is_empty ()
+             && cold_bb_count)
+        {
+          basic_block dom_bb;
+
+          bb = bbs_in_hot_partition.pop ();
+          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+          /* If bb's immediate dominator is also hot then it is ok.  */
+          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked to hot.  */
+          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot dom_bb to see if it is also
+             dominated by a cold bb.  */
+          bbs_in_hot_partition.safe_push (dom_bb);
+
+          /* We should also adjust any cold blocks that the newly-hot bb
+             feeds and see if it makes sense to re-mark those as hot as
+             well.  */
+          bbs_newly_hot.safe_push (dom_bb);
+          while (! bbs_newly_hot.is_empty ())
+            {
+              basic_block new_hot_bb = bbs_newly_hot.pop ();
+              /* Examine all successors of this newly-hot bb to see if they
+                 are cold and should be re-marked as hot.  */
+              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
+                {
+                  bool any_cold_preds = false;
+                  basic_block succ = e->dest;
+                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
+                    continue;
+                  /* Does this block have any cold predecessors now?  */
+                  FOR_EACH_EDGE (e2, ei2, succ->preds)
+                  {
+                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
+                      {
+                        any_cold_preds = true;
+                        break;
+                      }
+                  }
+                  if (any_cold_preds)
+                    continue;
+
+                  /* Here we have a successor of newly-hot bb that is cold
+                     but no longer has any cold predecessors. Since the original
+                     assignment of our newly-hot bb was incorrect, this successor's
+                     assignment as cold is also suspect. Go ahead and re-mark it
+                     as hot now too. Better heuristics may be in order here.  */
+                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
+                  cold_bb_count--;
+                  bbs_in_hot_partition.safe_push (succ);
+                  /* Examine this successor as a newly-hot bb.  */
+                  bbs_newly_hot.safe_push (succ);
+                }
+            }
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
@@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
       new_bb->aux = cur_bb->aux;
       cur_bb->aux = new_bb;

-      /* Make sure new fall-through bb is in same
- partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));

-      BB_COPY_PARTITION (new_bb, cur_bb);
       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
     }
   else
@@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-  && JUMP_P (BB_END (e->src)))
+  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
  add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }

@@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
       dump_flow_info (dump_file, dump_flags);
     }

-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     verify_hot_cold_block_grouping ();
 }

@@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */

-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
   rtx new_note;
   int first_partition = 0;

-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;

   FOR_EACH_BB (bb)
@@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
   FOR_EACH_BB (bb)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
-  cfg_layout_finalize ();
+  cfg_layout_finalize (true);

-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
   return 0;
 }

@@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
     }

 done:
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);

   BITMAP_FREE (candidates);
   return 0;
@@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
   if (!crossing_edges.exists ())
     return 0;

+  crtl->has_bb_partition = true;
+
   /* Make sure the source of any crossing edge ends in a jump and the
      destination of any crossing edge has a label.  */
   add_labels_and_missing_jumps (crossing_edges);
Index: bb-reorder.h
===================================================================
--- bb-reorder.h (revision 193827)
+++ bb-reorder.h (working copy)
@@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re

 extern int get_uncond_jump_length (void);

+extern void insert_section_boundary_note (void);
+
+extern void emit_barrier_after_bb (basic_block bb);
+
 #endif
Index: basic-block.h
===================================================================
--- basic-block.h (revision 193827)
+++ basic-block.h (working copy)
@@ -800,6 +800,7 @@ extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void fixup_partitions (void);

 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgrtl.c
===================================================================
--- cfgrtl.c (revision 193827)
+++ cfgrtl.c (working copy)
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
    Only applicable if the CFG is in cfglayout mode.  */
 static GTY(()) rtx cfg_layout_function_footer;
 static GTY(()) rtx cfg_layout_function_header;
+static bool had_sec_boundary_notes;

 static rtx skip_insns_after_block (basic_block);
 static void record_effective_endpoints (void);
 static rtx label_for_bb (basic_block);
-static void fixup_reorder_chain (void);
+static void fixup_reorder_chain (bool finalize_reorder_blocks);

 void verify_insn_chain (void);
 static void fixup_fallthru_exit_predecessor (void);
@@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;

   /* We can replace or remove a complex jump only when we have exactly
@@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
   return e;
 }

+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);
+
+  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (target))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the region crossing note from jump at end of
+         e->src if it exists.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        remove_note (BB_END (e->src), note);
+    }
+}
+
+/* Called when block BB has been reassigned to a different partition,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e, e->dest);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e, e->dest);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.

@@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;

   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;

-  if (e->dest == target)
+  if (dest == target)
     return e;

   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret, target);
       return ret;
     }

@@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;

   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret, target);
   return ret;
 }

@@ -1486,18 +1555,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */

       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-  && targetm_common.have_named_sections
-  && JUMP_P (BB_END (jump_block))
-  && !any_condjump_p (BB_END (jump_block))
-  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
- add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);

       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
       new_edge->probability = probability;
       new_edge->count = count;

+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge, jump_block);
+
       /* Redirect old edge.  */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1553,13 +1620,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }

-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);

   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }

@@ -1658,7 +1734,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;

   /* Abnormal edges cannot be split.  */
@@ -1691,12 +1767,26 @@ rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest).  */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }

   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);

+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1809,17 +1899,13 @@ commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);

-      if (flag_reorder_blocks_and_partition
-  && targetm_common.have_named_sections
-  && e->src != ENTRY_BLOCK_PTR
-  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-  && !(e->flags & EDGE_CROSSING)
-  && JUMP_P (after)
-  && !any_condjump_p (after)
-  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
- add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+ before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }

   /* Now that we've found the spot, do the insertion.  */
@@ -1859,6 +1945,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;

+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2060,7 +2154,75 @@ get_last_bb_insn (basic_block bb)

   return end;
 }
-

+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty  ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty  ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here.  */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              bbs_to_fix.safe_push (bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold.  */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru.  */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.

@@ -2084,6 +2246,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;

   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2201,6 +2364,8 @@ rtl_verify_flow_info_1 (void)

   if (e->flags & EDGE_ABNORMAL)
     n_abnormal++;
+
+          have_partitions |= is_crossing;
  }

       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2325,6 +2490,40 @@ rtl_verify_flow_info_1 (void)
   }
     }

+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2997,14 +3196,41 @@ record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;

+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;

       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
- BB_HEADER (bb) = unlink_insn_chain (next_insn,
-      PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode.  */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+              {
+                had_sec_boundary_notes |= true;
+                /* Remove note from chain. Grab new next_insn first.  */
+                if (next_insn == check_insn)
+                  next_insn = NEXT_INSN (check_insn);
+                /* Delete note.  */
+                delete_insn (check_insn);
+                /* There will only be one.  */
+                break;
+              }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* If we still have header instructions left after above loop.  */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
  BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3032,7 +3258,7 @@ outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;

-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);

   return 0;
 }
@@ -3152,10 +3378,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
 }



-/* Given a reorder chain, rearrange the code to match.  */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.  */

 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3182,7 +3411,7 @@ static void
   PREV_INSN (BB_HEADER (bb)) = insn;
   insn = BB_HEADER (bb);
   while (NEXT_INSN (insn))
-    insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
  }
       if (insn)
  NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3207,6 +3436,11 @@ static void
     insn = NEXT_INSN (insn);

   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3219,7 +3453,7 @@ static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;

       if (EDGE_COUNT (bb->succs) == 0)
@@ -3354,7 +3588,6 @@ static void
       /* We got here if we need to add a new jump insn.
  Note force_nonfallthru can delete E_FALL and thus we have to
  save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
  {
@@ -3362,17 +3595,6 @@ static void
   bb->aux = nb;
   /* Don't process this new block.  */
   bb = nb;
-
-  /* Make sure new bb is tagged for correct section (same as
-     fall-thru source, since you cannot fall-thru across
-     section boundaries).  */
-  BB_COPY_PARTITION (src_bb, single_pred (bb));
-  if (flag_reorder_blocks_and_partition
-      && targetm_common.have_named_sections
-      && JUMP_P (BB_END (bb))
-      && !any_condjump_p (BB_END (bb))
-      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
  }
     }

@@ -3676,10 +3898,11 @@ duplicate_insn_chain (rtx from, rtx to)
     case NOTE_INSN_FUNCTION_BEG:
       /* There is always just single entry to function.  */
     case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
       break;

     case NOTE_INSN_EPILOGUE_BEG:
-    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
       emit_note_copy (insn);
       break;

@@ -3791,10 +4014,13 @@ break_superblocks (void)
 }

 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest.  */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes.  */

 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3807,7 +4033,7 @@ void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);

   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4486,8 +4712,7 @@ rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;

-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;

   if (!onlyjump_p (insn)

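For readers following the patch above, the invariant that fixup_partitions
and the new check in rtl_verify_flow_info_1 rely on (a block in the cold
partition must not dominate a block in the hot partition) can be sketched
outside of GCC roughly as follows. This is only an illustration under stated
assumptions, not the patch's code: the immediate-dominator array, the
partition enum, and the function name are all hypothetical, and the real
implementation walks GCC's dominator tree with first_dom_son/next_dom_son
rather than climbing parent links.

```c
#include <assert.h>

/* Hypothetical stand-in for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum partition { HOT, COLD };

/* IDOM[i] is the index of block i's immediate dominator; the entry block
   is index 0 and has IDOM[0] == -1.  PART[i] is block i's partition.
   Re-mark as cold every hot block that has a cold dominator, and return
   how many blocks were re-marked.  A block that turns cold here always
   has an originally cold ancestor, so updating PART in place is safe.  */
int
mark_dominated_blocks_cold (const int *idom, enum partition *part, int n)
{
  int fixed = 0;

  assert (n == 0 || idom[0] == -1);

  for (int bb = 0; bb < n; bb++)
    {
      if (part[bb] == COLD)
        continue;

      /* Climb the dominator tree towards the entry block.  */
      for (int dom = idom[bb]; dom != -1; dom = idom[dom])
        if (part[dom] == COLD)
          {
            part[bb] = COLD;
            fixed++;
            break;
          }
    }

  return fixed;
}
```

In the patch itself, the blocks re-marked this way are additionally passed to
fixup_bb_partition so that EDGE_CROSSING flags and REG_CROSSING_JUMP notes on
their incoming and outgoing edges are brought back in sync; the sketch omits
that step.
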
On Mon, Nov 26, 2012 at 12:19 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Are you sure you have all my changes applied? I applied the 4 patches
> attached to PR55121 into my trunk checkout that has my fixes, and to a
> pristine trunk checkout. I configured and built both for
> --target=arm-none-linux-gnueabi, and built using your options, .i file
> and gcda file. I can reproduce the failure using the pristine trunk
> with your patches but not with my fixed trunk + your patches. (I just
> updated to head to pickup recent changes and get the same result. The
> vec changes required some manual changes to the patch, which I will
> resend shortly.)
>
> Without my fixes:
>
> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed
> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
> eval.c: In function ‘Ge’:
> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>  }
>  ^
> 0x622f71 df_compact_blocks()
> ../../gcc_trunk_3/gcc/df-core.c:1560
> 0x5cfcb5 compact_blocks()
> ../../gcc_trunk_3/gcc/cfg.c:162
> 0xc9dce0 reorder_basic_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
> 0xc9dce0 rest_of_handle_reorder_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.
>
>
> With my fixes:
>
> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed
> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>
>
> Thanks,
> Teresa
>
> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> Hi,
>>
>> I have tested your patch on Spec2000 on ARM, and I can still see
>> several failures caused by:
>> "error: fallthru edge crosses section boundary", including the case
>> described in PR55121.
>>
>> On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>> Ping.
>>> Teresa
>>>
>>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> Revised patch that fixes failures encountered when enabling
>>>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>>
>>>> This includes new verification code to ensure no cold blocks dominate hot
>>>> blocks contributed by Steven Bosscher.
>>>>
>>>> I attempted to make the handling of partition updates through the optimization
>>>> passes much more consistent, removing a number of partial fixes in the code
>>>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>>> assignment, region crossing jump notes, and switch text section notes) is
>>>> now handled in a few centralized locations. For example, inside
>>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>>> don't need to attempt the fixup themselves.
>>>>
>>>> For optimization passes that make adjustments to the cfg while in cfg layout
>>>> mode that are not easy to fix up incrementally, the new routine
>>>> fixup_partitions handles the cleanup globally. This does require calculation
>>>> of the dominance relation; however, as far as I can tell, the routines which
>>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>>> are invoked typically once (or a small number of times in the case of
>>>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>>> -ftime-report output for some large fdo compilations and saw only minimal
>>>> increases in the dominance computation times, which were only a tiny percentage
>>>> of the overall compile time.
>>>>
>>>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>>> any partitioning was actually performed, so that optimizations which were
>>>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>>> conservative for functions where no partitions were formed (e.g. they are
>>>> completely hot).
>>>>
>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>>> benchmarks and internal google benchmarks using profile feedback and
>>>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>>
>>>> Thanks,
>>>> Teresa
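
The cover letter above says that no cold block may dominate a hot block.
The following is a minimal standalone sketch of one way to enforce that,
assuming the immediate dominators are already known. It is not the patch's
code: struct toy_fn and promote_cold_dominators are hypothetical names, and
only the walk up the dominator tree is modelled (the patch additionally
re-marks some cold successors of newly hot blocks, which is omitted here).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 64

/* Hypothetical toy summary of a function's CFG: idom[i] is the index of
   block i's immediate dominator, and the entry block (index 0) is its own
   immediate dominator.  */
struct toy_fn {
  int n_blocks;
  int idom[MAX_BLOCKS];
  bool cold[MAX_BLOCKS];
};

/* Re-mark as hot every cold block that dominates a hot block.
   Returns the number of blocks that were promoted.  */
static int
promote_cold_dominators (struct toy_fn *fn)
{
  int worklist[MAX_BLOCKS];
  int top = 0, promoted = 0;

  assert (fn->n_blocks <= MAX_BLOCKS);

  /* Seed the worklist with every block that starts out hot.  */
  for (int i = 0; i < fn->n_blocks; i++)
    if (!fn->cold[i])
      worklist[top++] = i;

  while (top > 0)
    {
      int bb = worklist[--top];
      int dom = fn->idom[bb];

      /* A hot immediate dominator needs no change.  */
      if (!fn->cold[dom])
        continue;

      /* A cold block dominates a hot one: promote it, then examine its
         own dominator in turn.  Each block is pushed at most once, so
         the worklist cannot overflow.  */
      fn->cold[dom] = false;
      promoted++;
      worklist[top++] = dom;
    }
  return promoted;
}
```

Checking only the immediate dominator is enough in this model, because a
promoted block is pushed back on the worklist and its own dominator is
examined in a later iteration.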
>>>>
>>>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>>
>>>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>>         parameter.
>>>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>>         as this is now done by redirect_edge_and_branch_force.
>>>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>>         predecessor BB until after it is potentially split.
>>>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>>         any blocks in function actually partitioned.
>>>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>>         up partitioning.
>>>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>>         block copying if any blocks in function actually partitioned.
>>>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>>         that no cold blocks dominate a hot block.
>>>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>>         as this is now done by force_nonfallthru_and_redirect.
>>>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>>         already be marked with region crossing note.
>>>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>>         blocks in function actually partitioned.
>>>>         (insert_section_boundary_note): Only need to insert note if any
>>>>         blocks in function actually partitioned.
>>>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>>         parameter, and remove call to insert_section_boundary_note as this
>>>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>>         parameter.
>>>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>>         has bb partitions.
>>>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>>         emit_barrier_after_bb, which are no longer static.
>>>>         * basic-block.h: Declare new function fixup_partitions.
>>>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>>         check for region crossing note.
>>>>         (fixup_partition_crossing): New function.
>>>>         (fixup_bb_partition): Ditto.
>>>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>>         remove old code that tried to do this. Emit barrier correctly
>>>>         when we are in cfglayout mode.
>>>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>>         (commit_one_edge_insertion): Remove old code that tried to
>>>>         fixup region crossing edge since this is now handled in
>>>>         split_block, and set up insertion point correctly since
>>>>         block may now end in a jump.
>>>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>>         boundaries after optimizations that modify cfg and before trying to
>>>>         verify the flow info.
>>>>         (fixup_partitions): New function.
>>>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>>         hot bbs.
>>>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>>         Remove old code that attempted to fixup region crossing note as
>>>>         this is now handled in force_nonfallthru_and_redirect.
>>>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>>         note.
>>>>
>>>> Index: cfghooks.h
>>>> ===================================================================
>>>> --- cfghooks.h  (revision 193376)
>>>> +++ cfghooks.h  (working copy)
>>>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>>  void account_profile_record (struct profile_record *, int);
>>>>
>>>>  extern void cfg_layout_initialize (unsigned int);
>>>> -extern void cfg_layout_finalize (void);
>>>> +extern void cfg_layout_finalize (bool);
>>>>
>>>>  /* Hooks containers.  */
>>>>  extern struct cfg_hooks gimple_cfg_hooks;
>>>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>>  extern void gimple_register_cfg_hooks (void);
>>>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>>  extern void set_cfg_hooks (struct cfg_hooks);
>>>> -
>>>> Index: modulo-sched.c
>>>> ===================================================================
>>>> --- modulo-sched.c      (revision 193376)
>>>> +++ modulo-sched.c      (working copy)
>>>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>        bb->aux = bb->next_bb;
>>>>    free_dominance_info (CDI_DOMINATORS);
>>>> -  cfg_layout_finalize ();
>>>> +  cfg_layout_finalize (false);
>>>>  #endif /* INSN_SCHEDULING */
>>>>    return 0;
>>>>  }
>>>> Index: ifcvt.c
>>>> ===================================================================
>>>> --- ifcvt.c     (revision 193376)
>>>> +++ ifcvt.c     (working copy)
>>>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>>    if (new_bb)
>>>>      {
>>>>        df_bb_replace (then_bb_index, new_bb);
>>>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>>> -         we need to ensure that new_bb is in the same partition as
>>>> -         test bb (you can not fall through across section boundaries).  */
>>>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>>> +         (possibly called from redirect_edge_and_branch_force).  */
>>>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>>      }
>>>>
>>>>    num_true_changes++;
>>>> Index: function.c
>>>> ===================================================================
>>>> --- function.c  (revision 193376)
>>>> +++ function.c  (working copy)
>>>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>>                     break;
>>>>                 if (e)
>>>>                   {
>>>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>>> -                                                 NULL_RTX, e->src);
>>>> +                    /* Make sure we insert after any barriers.  */
>>>> +                    rtx end = get_last_bb_insn (e->src);
>>>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>>> +                                                  NULL_RTX, e->src);
>>>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>>                   }
>>>>                 else
>>>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>>           cur_bb->aux = cur_bb->next_bb;
>>>> -      cfg_layout_finalize ();
>>>> +      cfg_layout_finalize (false);
>>>>      }
>>>>
>>>>  epilogue_done:
>>>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>>        basic_block simple_return_block_cold = NULL;
>>>>        edge pending_edge_hot = NULL;
>>>>        edge pending_edge_cold = NULL;
>>>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>> +      basic_block exit_pred;
>>>>        int i;
>>>>
>>>>        gcc_assert (entry_edge != orig_entry_edge);
>>>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>>             else
>>>>               pending_edge_cold = e;
>>>>           }
>>>> +
>>>> +      /* Save a pointer to the exit's predecessor BB for use in
>>>> +         inserting new BBs at the end of the function. Do this
>>>> +         after the call to split_block above which may split
>>>> +         the original exit pred.  */
>>>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>>
>>>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>>         {
>>>> Index: function.h
>>>> ===================================================================
>>>> --- function.h  (revision 193376)
>>>> +++ function.h  (working copy)
>>>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>>    bool uses_only_leaf_regs;
>>>>
>>>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>>> +     block.  */
>>>> +  bool has_bb_partition;
>>>> +
>>>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>>       to eliminable regs (like the frame pointer) are set if an asm
>>>> Index: hw-doloop.c
>>>> ===================================================================
>>>> --- hw-doloop.c (revision 193376)
>>>> +++ hw-doloop.c (working copy)
>>>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>>        else
>>>>         bb->aux = NULL;
>>>>      }
>>>> -  cfg_layout_finalize ();
>>>> +  cfg_layout_finalize (false);
>>>>    clear_aux_for_blocks ();
>>>>    df_analyze ();
>>>>  }
>>>> Index: cfgcleanup.c
>>>> ===================================================================
>>>> --- cfgcleanup.c        (revision 193376)
>>>> +++ cfgcleanup.c        (working copy)
>>>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>>       partition boundaries).  See the comments at the top of
>>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>>
>>>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>>> +  if (crtl->has_bb_partition && reload_completed)
>>>>      return false;
>>>>
>>>>    /* Search backward through forwarder blocks.  We don't need to worry
>>>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>>               df_analyze ();
>>>>             }
>>>>
>>>> +         if (changed)
>>>> +            {
>>>> +              /* Edge forwarding in particular can cause hot blocks previously
>>>> +                 reached by both hot and cold blocks to become dominated only
>>>> +                 by cold blocks. This will cause the verification below to fail,
>>>> +                 and lead to now cold code in the hot section. This is not easy
>>>> +                 to detect and fix during edge forwarding, and in some cases
>>>> +                 is only visible after newly unreachable blocks are deleted,
>>>> +                 which will be done in fixup_partitions.  */
>>>> +              fixup_partitions ();
>>>> +
>>>>  #ifdef ENABLE_CHECKING
>>>> -         if (changed)
>>>> -           verify_flow_info ();
>>>> +              verify_flow_info ();
>>>>  #endif
>>>> +            }
>>>>
>>>>           changed_overall |= changed;
>>>>           first_pass = false;
>>>> Index: bb-reorder.c
>>>> ===================================================================
>>>> --- bb-reorder.c        (revision 193376)
>>>> +++ bb-reorder.c        (working copy)
>>>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>>    current_partition = BB_PARTITION (traces[0].first);
>>>>    two_passes = false;
>>>>
>>>> -  if (flag_reorder_blocks_and_partition)
>>>> +  if (crtl->has_bb_partition)
>>>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>>        if (BB_PARTITION (traces[0].first)
>>>>           != BB_PARTITION (traces[i].first))
>>>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>>                       }
>>>>                   }
>>>>
>>>> -             if (flag_reorder_blocks_and_partition)
>>>> +             if (crtl->has_bb_partition)
>>>>                 try_copy = false;
>>>>
>>>>               /* Copy tiny blocks always; copy larger blocks only when the
>>>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>>    return length;
>>>>  }
>>>>
>>>> -/* Emit a barrier into the footer of BB.  */
>>>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>>
>>>> -static void
>>>> +void
>>>>  emit_barrier_after_bb (basic_block bb)
>>>>  {
>>>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>>  }
>>>>
>>>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>>  {
>>>>    VEC(edge, heap) *crossing_edges = NULL;
>>>>    basic_block bb;
>>>> -  edge e;
>>>> -  edge_iterator ei;
>>>> +  edge e, e2;
>>>> +  edge_iterator ei, ei2;
>>>> +  unsigned int cold_bb_count = 0;
>>>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>>
>>>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>>    FOR_EACH_BB (bb)
>>>>      {
>>>>        if (probably_never_executed_bb_p (cfun, bb))
>>>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> +        {
>>>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> +          cold_bb_count++;
>>>> +        }
>>>>        else
>>>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>> +        {
>>>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>>> +        }
>>>>      }
>>>>
>>>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>>> +     several different possibilities. One is that there are edge weight insanities
>>>> +     due to optimization phases that do not properly update basic block profile
>>>> +     counts. The second is that the entry of the function may not be hot, because
>>>> +     it is entered fewer times than the number of profile training runs, but there
>>>> +     is a loop inside the function that causes blocks within the function to be
>>>> +     above the threshold for hotness.  */
>>>> +  if (cold_bb_count)
>>>> +    {
>>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> +
>>>> +      if (dom_calculated_here)
>>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> +
>>>> +      /* Keep examining hot bbs until we have either checked them all, or
>>>> +         re-marked all cold bbs hot.  */
>>>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>>> +             && cold_bb_count)
>>>> +        {
>>>> +          basic_block dom_bb;
>>>> +
>>>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>>> +
>>>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>>> +            continue;
>>>> +
>>>> +          /* We have a hot bb with an immediate dominator that is cold.
>>>> +             The dominator needs to be re-marked to hot.  */
>>>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>>> +          cold_bb_count--;
>>>> +
>>>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>>> +             dominated by a cold bb.  */
>>>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>>> +
>>>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>>> +             feeds and see if it makes sense to re-mark those as hot as
>>>> +             well.  */
>>>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>>> +            {
>>>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>>> +              /* Examine all successors of this newly-hot bb to see if they
>>>> +                 are cold and should be re-marked as hot.  */
>>>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>>> +                {
>>>> +                  bool any_cold_preds = false;
>>>> +                  basic_block succ = e->dest;
>>>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>>> +                    continue;
>>>> +                  /* Does this block have any cold predecessors now?  */
>>>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>>> +                  {
>>>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>>> +                      {
>>>> +                        any_cold_preds = true;
>>>> +                        break;
>>>> +                      }
>>>> +                  }
>>>> +                  if (any_cold_preds)
>>>> +                    continue;
>>>> +
>>>> +                  /* Here we have a successor of newly-hot bb that is cold
>>>> +                     but no longer has any cold predecessors. Since the original
>>>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>>> +                     as hot now too. Better heuristics may be in order here.  */
>>>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>>> +                  cold_bb_count--;
>>>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>>> +                  /* Examine this successor as a newly-hot bb.  */
>>>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>>> +                }
>>>> +            }
>>>> +        }
>>>> +
>>>> +      if (dom_calculated_here)
>>>> +        free_dominance_info (CDI_DOMINATORS);
>>>> +    }
>>>> +
>>>>    /* The format of .gcc_except_table does not allow landing pads to
>>>>       be in a different partition as the throw.  Fix this by either
>>>>       moving or duplicating the landing pads.  */
>>>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>>                       new_bb->aux = cur_bb->aux;
>>>>                       cur_bb->aux = new_bb;
>>>>
>>>> -                     /* Make sure new fall-through bb is in same
>>>> -                        partition as bb it's falling through from.  */
>>>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>>> +                     gcc_assert (BB_PARTITION (new_bb)
>>>> +                                  == BB_PARTITION (cur_bb));
>>>>
>>>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>>                     }
>>>>                   else
>>>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>>    FOR_EACH_BB (bb)
>>>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>        if ((e->flags & EDGE_CROSSING)
>>>> -         && JUMP_P (BB_END (e->src)))
>>>> +         && JUMP_P (BB_END (e->src))
>>>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>>> +             force_nonfallthru_and_redirect.  */
>>>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>  }
>>>>
>>>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>>        dump_flow_info (dump_file, dump_flags);
>>>>      }
>>>>
>>>> -  if (flag_reorder_blocks_and_partition)
>>>> +  if (crtl->has_bb_partition)
>>>>      verify_hot_cold_block_grouping ();
>>>>  }
>>>>
>>>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>>     encountering this note will make the compiler switch between the
>>>>     hot and cold text sections.  */
>>>>
>>>> -static void
>>>> +void
>>>>  insert_section_boundary_note (void)
>>>>  {
>>>>    basic_block bb;
>>>>    rtx new_note;
>>>>    int first_partition = 0;
>>>>
>>>> -  if (!flag_reorder_blocks_and_partition)
>>>> +  if (!crtl->has_bb_partition)
>>>>      return;
>>>>
>>>>    FOR_EACH_BB (bb)
>>>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>>    FOR_EACH_BB (bb)
>>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>        bb->aux = bb->next_bb;
>>>> -  cfg_layout_finalize ();
>>>> +  cfg_layout_finalize (true);
>>>>
>>>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>> -  insert_section_boundary_note ();
>>>>    return 0;
>>>>  }
>>>>
>>>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>>      }
>>>>
>>>>  done:
>>>> -  cfg_layout_finalize ();
>>>> +  cfg_layout_finalize (false);
>>>>
>>>>    BITMAP_FREE (candidates);
>>>>    return 0;
>>>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>>    if (crossing_edges == NULL)
>>>>      return 0;
>>>>
>>>> +  crtl->has_bb_partition = true;
>>>> +
>>>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>>       destination of any crossing edge has a label.  */
>>>>    add_labels_and_missing_jumps (crossing_edges);
>>>> Index: bb-reorder.h
>>>> ===================================================================
>>>> --- bb-reorder.h        (revision 193376)
>>>> +++ bb-reorder.h        (working copy)
>>>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>>
>>>>  extern int get_uncond_jump_length (void);
>>>>
>>>> +extern void insert_section_boundary_note (void);
>>>> +
>>>> +extern void emit_barrier_after_bb (basic_block bb);
>>>> +
>>>>  #endif
>>>> Index: basic-block.h
>>>> ===================================================================
>>>> --- basic-block.h       (revision 193376)
>>>> +++ basic-block.h       (working copy)
>>>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>>  extern bool forwarder_block_p (const_basic_block);
>>>>  extern bool can_fallthru (basic_block, basic_block);
>>>> +extern void fixup_partitions (void);
>>>>
>>>>  /* In cfgbuild.c.  */
>>>>  extern void find_many_sub_basic_blocks (sbitmap);
>>>> Index: cfgrtl.c
>>>> ===================================================================
>>>> --- cfgrtl.c    (revision 193376)
>>>> +++ cfgrtl.c    (working copy)
>>>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "tree.h"
>>>>  #include "hard-reg-set.h"
>>>>  #include "basic-block.h"
>>>> +#include "bb-reorder.h"
>>>>  #include "regs.h"
>>>>  #include "flags.h"
>>>>  #include "function.h"
>>>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>>     Only applicable if the CFG is in cfglayout mode.  */
>>>>  static GTY(()) rtx cfg_layout_function_footer;
>>>>  static GTY(()) rtx cfg_layout_function_header;
>>>> +static bool had_sec_boundary_notes;
>>>>
>>>>  static rtx skip_insns_after_block (basic_block);
>>>>  static void record_effective_endpoints (void);
>>>>  static rtx label_for_bb (basic_block);
>>>> -static void fixup_reorder_chain (void);
>>>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>>
>>>>  void verify_insn_chain (void);
>>>>  static void fixup_fallthru_exit_predecessor (void);
>>>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>>       partition boundaries).  See  the comments at the top of
>>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>>
>>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>>      return NULL;
>>>>
>>>>    /* We can replace or remove a complex jump only when we have exactly
>>>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>>    return e;
>>>>  }
>>>>
>>>> +/* Called when edge E has been redirected to a new destination,
>>>> +   in order to update the region crossing flag on the edge and
>>>> +   jump.  */
>>>> +
>>>> +static void
>>>> +fixup_partition_crossing (edge e, basic_block target)
>>>> +{
>>>> +  rtx note;
>>>> +
>>>> +  gcc_assert (e->dest == target);
>>>> +
>>>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>>> +    return;
>>>> +  /* If we redirected an existing edge, it may already be marked
>>>> +     crossing, even though the new src is missing a reg crossing note.
>>>> +     But make sure reg crossing note doesn't already exist before
>>>> +     inserting.  */
>>>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>>> +    {
>>>> +      e->flags |= EDGE_CROSSING;
>>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> +      if (JUMP_P (BB_END (e->src))
>>>> +          && !note)
>>>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> +    }
>>>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>>> +    {
>>>> +      e->flags &= ~EDGE_CROSSING;
>>>> +      /* Remove the region crossing note from jump at end of
>>>> +         e->src if it exists.  */
>>>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> +      if (note)
>>>> +        remove_note (BB_END (e->src), note);
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Called when block BB has been reassigned to a different partition,
>>>> +   to ensure that the region crossing attributes are updated.  */
>>>> +
>>>> +static void
>>>> +fixup_bb_partition (basic_block bb)
>>>> +{
>>>> +  edge e;
>>>> +  edge_iterator ei;
>>>> +
>>>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>>> +    {
>>>> +      fixup_partition_crossing (e, e->dest);
>>>> +    }
>>>> +
>>>> +  /* Possibly need to make bb's successor edges region crossing,
>>>> +     or remove stale region crossing.  */
>>>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>>> +    {
>>>> +      if ((e->flags & EDGE_FALLTHRU)
>>>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>>> +          && e->dest != EXIT_BLOCK_PTR)
>>>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>>> +        force_nonfallthru (e);
>>>> +      else
>>>> +        fixup_partition_crossing (e, e->dest);
>>>> +    }
>>>> +}
>>>> +
>>>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>>     expense of adding new instructions or reordering basic blocks.
>>>>
>>>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>>  {
>>>>    edge ret;
>>>>    basic_block src = e->src;
>>>> +  basic_block dest = e->dest;
>>>>
>>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>>      return NULL;
>>>>
>>>> -  if (e->dest == target)
>>>> +  if (dest == target)
>>>>      return e;
>>>>
>>>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>>      {
>>>>        df_set_bb_dirty (src);
>>>> +      fixup_partition_crossing (ret, target);
>>>>        return ret;
>>>>      }
>>>>
>>>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>>      return NULL;
>>>>
>>>>    df_set_bb_dirty (src);
>>>> +  fixup_partition_crossing (ret, target);
>>>>    return ret;
>>>>  }
>>>>
>>>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>>
>>>>        BB_COPY_PARTITION (jump_block, e->src);
>>>> -      if (flag_reorder_blocks_and_partition
>>>> -         && targetm_common.have_named_sections
>>>> -         && JUMP_P (BB_END (jump_block))
>>>> -         && !any_condjump_p (BB_END (jump_block))
>>>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>>
>>>>        /* Wire edge in.  */
>>>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>>        new_edge->probability = probability;
>>>>        new_edge->count = count;
>>>>
>>>> +      /* If e->src was previously region crossing, it no longer is
>>>> +         and the reg crossing note should be removed.  */
>>>> +      fixup_partition_crossing (new_edge, jump_block);
>>>> +
>>>>        /* Redirect old edge.  */
>>>>        redirect_edge_pred (e, jump_block);
>>>>        e->probability = REG_BR_PROB_BASE;
>>>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>>        LABEL_NUSES (label)++;
>>>>      }
>>>>
>>>> -  emit_barrier_after (BB_END (jump_block));
>>>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>>> +     insert the barrier correctly.  */
>>>> +  emit_barrier_after_bb (jump_block);
>>>>    redirect_edge_succ_nodup (e, target);
>>>>
>>>>    if (abnormal_edge_flags)
>>>>      make_edge (src, target, abnormal_edge_flags);
>>>>
>>>>    df_mark_solutions_dirty ();
>>>> +  fixup_partition_crossing (e, target);
>>>>    return new_bb;
>>>>  }
>>>>
>>>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>>  static basic_block
>>>>  rtl_split_edge (edge edge_in)
>>>>  {
>>>> -  basic_block bb;
>>>> +  basic_block bb, new_bb;
>>>>    rtx before;
>>>>
>>>>    /* Abnormal edges cannot be split.  */
>>>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>>    else
>>>>      {
>>>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>>> +      else
>>>> +        /* Put the split bb into the src partition, to avoid creating
>>>> +           a situation where a cold bb dominates a hot bb, in the case
>>>> +           where src is cold and dest is hot. The src will dominate
>>>> +           the new bb (whereas it might not have dominated dest).  */
>>>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>>      }
>>>>
>>>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>>
>>>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>>> +    {
>>>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>>> +      gcc_assert (!new_bb);
>>>> +    }
>>>> +
>>>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>>       jump instruction to target our new block.  */
>>>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>>    else
>>>>      {
>>>>        bb = split_edge (e);
>>>> -      after = BB_END (bb);
>>>>
>>>> -      if (flag_reorder_blocks_and_partition
>>>> -         && targetm_common.have_named_sections
>>>> -         && e->src != ENTRY_BLOCK_PTR
>>>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>>> -         && !(e->flags & EDGE_CROSSING)
>>>> -         && JUMP_P (after)
>>>> -         && !any_condjump_p (after)
>>>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>>> +         a region-crossing jump, even though it was originally fallthru.  */
>>>> +      if (JUMP_P (BB_END (bb)))
>>>> +       before = BB_END (bb);
>>>> +      else
>>>> +        after = BB_END (bb);
>>>>      }
>>>>
>>>>    /* Now that we've found the spot, do the insertion.  */
>>>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>>  {
>>>>    basic_block bb;
>>>>
>>>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>>> +     previously reached by both hot and cold blocks to become dominated only
>>>> +     by cold blocks. This will cause the verification below to fail,
>>>> +     and lead to now cold code in the hot section. In some cases this
>>>> +     may only be visible after newly unreachable blocks are deleted,
>>>> +     which will be done by fixup_partitions.  */
>>>> +  fixup_partitions ();
>>>> +
>>>>  #ifdef ENABLE_CHECKING
>>>>    verify_flow_info ();
>>>>  #endif
>>>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>>
>>>>    return end;
>>>>  }
>>>> -
>>>> +
>>>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>>> +   passes that modify the cfg.  */
>>>> +
>>>> +void
>>>> +fixup_partitions (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +
>>>> +  if (!crtl->has_bb_partition)
>>>> +    return;
>>>> +
>>>> +  /* Delete any blocks that became unreachable and weren't
>>>> +     already cleaned up, for example during edge forwarding
>>>> +     and convert_jumps_to_returns. This will expose more
>>>> +     opportunities for fixing the partition boundaries here.
>>>> +     Also, the calculation of the dominance graph during verification
>>>> +     will assert if there are unreachable nodes.  */
>>>> +  delete_unreachable_blocks ();
>>>> +
>>>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>> +     a cold partition cannot dominate a basic block in a hot partition.
>>>> +     Fixup any that now violate this requirement, as a result of edge
>>>> +     forwarding and unreachable block deletion.  */
>>>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>>> +  FOR_EACH_BB (bb)
>>>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> +    {
>>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> +      basic_block son;
>>>> +
>>>> +      if (dom_calculated_here)
>>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> +
>>>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> +        {
>>>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>> +          /* If bb is not yet cold (because it was added below as
>>>> +             a block dominated by a cold bb) then mark it cold here.  */
>>>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>> +            {
>>>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>>> +            }
>>>> +          /* Any blocks dominated by a block in the cold section
>>>> +             must also be cold.  */
>>>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>> +               son;
>>>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>> +        }
>>>> +
>>>> +      if (dom_calculated_here)
>>>> +        free_dominance_info (CDI_DOMINATORS);
>>>> +    }
>>>> +
>>>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>>> +     cold, so that we only update the region crossings the minimum number of
>>>> +     places, which can require forcing edges to be non fallthru.  */
>>>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>>> +    {
>>>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>>> +      fixup_bb_partition (bb);
>>>> +    }
>>>> +}
>>>> +
>>>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>>     cfglayout RTL.
>>>>
>>>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>>    rtx x;
>>>>    int err = 0;
>>>>    basic_block bb;
>>>> +  bool have_partitions = false;
>>>>
>>>>    /* Check the general integrity of the basic blocks.  */
>>>>    FOR_EACH_BB_REVERSE (bb)
>>>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>>
>>>>           if (e->flags & EDGE_ABNORMAL)
>>>>             n_abnormal++;
>>>> +
>>>> +          have_partitions |= is_crossing;
>>>>         }
>>>>
>>>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>>           }
>>>>      }
>>>>
>>>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>> +  if (have_partitions && !err)
>>>> +    FOR_EACH_BB (bb)
>>>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> +    {
>>>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> +      basic_block son;
>>>> +
>>>> +      if (dom_calculated_here)
>>>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> +
>>>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> +        {
>>>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>> +            {
>>>> +              error ("non-cold basic block %d dominated "
>>>> +                     "by a block in the cold partition", bb->index);
>>>> +              err = 1;
>>>> +            }
>>>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>> +               son;
>>>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>> +        }
>>>> +
>>>> +      if (dom_calculated_here)
>>>> +        free_dominance_info (CDI_DOMINATORS);
>>>> +    }
>>>> +
>>>>    /* Clean up.  */
>>>>    return err;
>>>>  }
>>>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>>    else
>>>>      cfg_layout_function_header = NULL_RTX;
>>>>
>>>> +  had_sec_boundary_notes = false;
>>>> +
>>>>    next_insn = get_insns ();
>>>>    FOR_EACH_BB (bb)
>>>>      {
>>>>        rtx end;
>>>>
>>>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>> -                                             PREV_INSN (BB_HEAD (bb)));
>>>> +        {
>>>> +          /* Rather than try to keep section boundary notes incrementally
>>>> +             up-to-date through cfg layout optimizations, simply remove them
>>>> +             and flag that they should be re-inserted when exiting
>>>> +             cfg layout mode.  */
>>>> +          rtx check_insn = next_insn;
>>>> +          while (check_insn)
>>>> +            {
>>>> +              if (NOTE_P (check_insn)
>>>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>>> +              {
>>>> +                had_sec_boundary_notes |= true;
>>>> +                /* Remove note from chain. Grab new next_insn first.  */
>>>> +                if (next_insn == check_insn)
>>>> +                  next_insn = NEXT_INSN (check_insn);
>>>> +                /* Delete note.  */
>>>> +                delete_insn (check_insn);
>>>> +                /* There will only be one.  */
>>>> +                break;
>>>> +              }
>>>> +              check_insn = NEXT_INSN (check_insn);
>>>> +            }
>>>> +          /* If we still have header instructions left after above loop.  */
>>>> +          if (next_insn != BB_HEAD (bb))
>>>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>> +                                                PREV_INSN (BB_HEAD (bb)));
>>>> +        }
>>>>        end = skip_insns_after_block (bb);
>>>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>        bb->aux = bb->next_bb;
>>>>
>>>> -  cfg_layout_finalize ();
>>>> +  cfg_layout_finalize (false);
>>>>
>>>>    return 0;
>>>>  }
>>>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>>  }
>>>>
>>>>
>>>> -/* Given a reorder chain, rearrange the code to match.  */
>>>> +/* Given a reorder chain, rearrange the code to match. If
>>>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>>> +   section boundary notes were removed on entry to cfg layout
>>>> +   mode, insert section boundary notes here.  */
>>>>
>>>>  static void
>>>> -fixup_reorder_chain (void)
>>>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>>  {
>>>>    basic_block bb;
>>>>    rtx insn = NULL;
>>>> @@ -3150,7 +3373,7 @@ static void
>>>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>>           insn = BB_HEADER (bb);
>>>>           while (NEXT_INSN (insn))
>>>> -           insn = NEXT_INSN (insn);
>>>> +            insn = NEXT_INSN (insn);
>>>>         }
>>>>        if (insn)
>>>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>>> @@ -3175,6 +3398,11 @@ static void
>>>>      insn = NEXT_INSN (insn);
>>>>
>>>>    set_last_insn (insn);
>>>> +
>>>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>>> +    insert_section_boundary_note ();
>>>> +
>>>>  #ifdef ENABLE_CHECKING
>>>>    verify_insn_chain ();
>>>>  #endif
>>>> @@ -3187,7 +3415,7 @@ static void
>>>>        edge e_fall, e_taken, e;
>>>>        rtx bb_end_insn;
>>>>        rtx ret_label = NULL_RTX;
>>>> -      basic_block nb, src_bb;
>>>> +      basic_block nb;
>>>>        edge_iterator ei;
>>>>
>>>>        if (EDGE_COUNT (bb->succs) == 0)
>>>> @@ -3322,7 +3550,6 @@ static void
>>>>        /* We got here if we need to add a new jump insn.
>>>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>>> -      src_bb = e_fall->src;
>>>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>>        if (nb)
>>>>         {
>>>> @@ -3330,17 +3557,6 @@ static void
>>>>           bb->aux = nb;
>>>>           /* Don't process this new block.  */
>>>>           bb = nb;
>>>> -
>>>> -         /* Make sure new bb is tagged for correct section (same as
>>>> -            fall-thru source, since you cannot fall-thru across
>>>> -            section boundaries).  */
>>>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>>> -         if (flag_reorder_blocks_and_partition
>>>> -             && targetm_common.have_named_sections
>>>> -             && JUMP_P (BB_END (bb))
>>>> -             && !any_condjump_p (BB_END (bb))
>>>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>>         }
>>>>      }
>>>>
>>>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>>             case NOTE_INSN_FUNCTION_BEG:
>>>>               /* There is always just single entry to function.  */
>>>>             case NOTE_INSN_BASIC_BLOCK:
>>>> +              /* We should only switch text sections once.  */
>>>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>>               break;
>>>>
>>>>             case NOTE_INSN_EPILOGUE_BEG:
>>>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>>               emit_note_copy (insn);
>>>>               break;
>>>>
>>>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>>  }
>>>>
>>>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>>> +   section notes.  */
>>>>
>>>>  void
>>>> -cfg_layout_finalize (void)
>>>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>>  {
>>>>  #ifdef ENABLE_CHECKING
>>>>    verify_flow_info ();
>>>> @@ -3775,7 +3995,7 @@ void
>>>>  #endif
>>>>        )
>>>>      fixup_fallthru_exit_predecessor ();
>>>> -  fixup_reorder_chain ();
>>>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>>
>>>>    rebuild_jump_labels (get_insns ());
>>>>    delete_dead_jumptables ();
>>>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>>      return false;
>>>>
>>>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>>      return false;
>>>>
>>>>    if (!onlyjump_p (insn)
>>>>
>>>> --
>>>> This patch is available for review at http://codereview.appspot.com/6823047

-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
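
The cold-propagation step of fixup_partitions in the patch quoted above can be
sketched in isolation. This is a minimal model, not GCC code: the dominator
tree is a hypothetical fixed array (idom), and the names N_BLOCKS, partition
and propagate_cold are made up for illustration. It only shows the worklist
idea: start from every cold block, walk down the dominator tree, and mark any
hot block found there as cold.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BLOCKS 6

enum partition { HOT, COLD };

/* Hypothetical dominator tree: idom[i] is the immediate dominator of
   block i, or -1 for the entry block.  */
static const int idom[N_BLOCKS] = { -1, 0, 0, 2, 3, 2 };

/* Mark every block dominated by a cold block as cold.  changed[] records
   the blocks that were converted, which is the set a later pass would
   need to revisit to update region-crossing edges.  Returns the number
   of converted blocks.  */
static int
propagate_cold (enum partition part[], bool changed[])
{
  /* Each block is pushed at most once initially and once as a dominator
     son, so 2 * N_BLOCKS entries are enough.  */
  int stack[2 * N_BLOCKS];
  int top = 0;
  int n_changed = 0;

  for (int b = 0; b < N_BLOCKS; b++)
    {
      changed[b] = false;
      if (part[b] == COLD)
        stack[top++] = b;
    }

  while (top > 0)
    {
      int b = stack[--top];

      /* A block reached here is either already cold or is dominated
         by a cold block, so it must become cold.  */
      if (part[b] != COLD)
        {
          part[b] = COLD;
          changed[b] = true;
          n_changed++;
        }

      /* Queue the immediate dominator sons of b.  */
      for (int son = 0; son < N_BLOCKS; son++)
        if (idom[son] == b)
          {
            assert (top < 2 * N_BLOCKS);
            stack[top++] = son;
          }
    }

  return n_changed;
}
```

With blocks 2 and 5 initially cold and the tree above, blocks 3 and 4 are
dominated by block 2 and get converted, while blocks 0 and 1 stay hot.
Collecting the converted blocks first and fixing their edges afterwards
mirrors the two-phase structure of the patch.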


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-26 20:20     ` Teresa Johnson
  2012-11-26 20:29       ` Teresa Johnson
@ 2012-11-26 20:43       ` Jack Howarth
  2012-11-26 20:52         ` Teresa Johnson
  1 sibling, 1 reply; 35+ messages in thread
From: Jack Howarth @ 2012-11-26 20:43 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Christophe Lyon, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
> Are you sure you have all my changes applied? I applied the 4 patches
> attached to PR55121 into my trunk checkout that has my fixes, and to a
> pristine trunk checkout. I configured and built both for
> --target=arm-none-linux-gnueabi, and built using your options, .i file
> and gcda file. I can reproduce the failure using the pristine trunk
> with your patches but not with my fixed trunk + your patches. (I just
> updated to head to pickup recent changes and get the same result. The
> vec changes required some manual changes to the patch, which I will
> resend shortly.)

Teresa,
    Your mailer seems to have corrupted the posted patch with stray
=3D characters and line breaks. Can you repost a copy as an attachment
to the list?
             Jack

> 
> Without my fixes:
> 
> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet
> -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
> eval.c: In function ‘Ge’:
> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>  }
>  ^
> 0x622f71 df_compact_blocks()
> ../../gcc_trunk_3/gcc/df-core.c:1560
> 0x5cfcb5 compact_blocks()
> ../../gcc_trunk_3/gcc/cfg.c:162
> 0xc9dce0 reorder_basic_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
> 0xc9dce0 rest_of_handle_reorder_blocks
> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> 
> 
> With my fixes:
> 
> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet
> -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
> -fno-common -o eval.s -freorder-blocks-and-partition
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
> 2.4.2-p1, MPC version 0.8.1
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
> 
> 
> Thanks,
> Teresa
> 
> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> > Hi,
> >
> > I have tested your patch on Spec2000 on ARM, and I can still see
> > several failures caused by:
> > "error: fallthru edge crosses section boundary", including the case
> > described in PR55121.
> >
> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
> >> Ping.
> >> Teresa
> >>
> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> >>> Revised patch that fixes failures encountered when enabling
> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
> >>>
> >>> This includes new verification code to ensure no cold blocks dominate hot
> >>> blocks contributed by Steven Bosscher.
> >>>
> >>> I attempted to make the handling of partition updates through the optimization
> >>> passes much more consistent, removing a number of partial fixes in the code
> >>> stream in the process. The code to fix up partitions (including the BB_PARTITION
> >>> assignment, region crossing jump notes, and switch text section notes) is
> >>> now handled in a few centralized locations, for example inside
> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> >>> don't need to attempt the fixup themselves.
> >>>
> >>> For optimization passes that make adjustments to the cfg while in cfg layout
> >>> mode that are not easy to fix up incrementally, the new routine
> >>> fixup_partitions handles the cleanup globally. This does require calculation
> >>> of the dominance relation. However, as far as I can tell, the routines which
> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> >>> are typically invoked once (or a small number of times in the case of
> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
> >>> -ftime-report output for some large FDO compilations and saw only minimal
> >>> increases in the dominance computation times, which were only a tiny percentage
> >>> of the overall compile time.
> >>>
> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
> >>> any partitioning was actually performed, so that optimizations which were
> >>> conservatively disabled whenever flag_reorder_blocks_and_partition
> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> >>> conservative for functions where no partitions were formed (e.g. they are
> >>> completely hot).
> >>>
> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> >>> benchmarks and internal Google benchmarks using profile feedback and
> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
> >>>
> >>> Thanks,
> >>> Teresa
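
The per-edge fixup that the description above centralizes (set or clear the
crossing flag and the jump note whenever an edge's endpoints change partition)
can be illustrated with a toy model. This is a sketch under stated
assumptions, not the GCC implementation: toy_block, toy_edge and
fixup_crossing are hypothetical names, and the boolean fields stand in for
BB_PARTITION, EDGE_CROSSING and the REG_CROSSING_JUMP note.

```c
#include <assert.h>
#include <stdbool.h>

enum partition { HOT, COLD };

/* Hypothetical stand-in for a basic block.  */
struct toy_block
{
  enum partition part;     /* Which section the block lives in.  */
  bool ends_in_jump;       /* Whether the last insn is a jump.  */
  bool has_crossing_note;  /* Models the region-crossing jump note.  */
};

/* Hypothetical stand-in for a CFG edge leaving SRC.  */
struct toy_edge
{
  struct toy_block *src;
  bool crossing;           /* Models the EDGE_CROSSING flag.  */
};

/* Bring edge E, which now targets TARGET, into a consistent state:
   a crossing edge carries the flag (and a note if the source ends in a
   jump); a non-crossing edge carries neither.  */
static void
fixup_crossing (struct toy_edge *e, const struct toy_block *target)
{
  assert (e && e->src && target);

  if (e->src->part != target->part)
    {
      e->crossing = true;
      /* Only a jump can carry the note; setting it is idempotent.  */
      if (e->src->ends_in_jump)
        e->src->has_crossing_note = true;
    }
  else
    {
      e->crossing = false;
      /* Drop a stale note if one was left behind.  */
      e->src->has_crossing_note = false;
    }
}
```

Calling the helper from the one place where an edge is redirected, rather
than from every caller, is the design point of the description: callers
cannot forget the update, and repeated calls leave the state unchanged.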
> >>>
> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
> >>>             Steven Bosscher  <steven@gcc.gnu.org>
> >>>
> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
> >>>         parameter.
> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
> >>>         as this is now done by redirect_edge_and_branch_force.
> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
> >>>         predecessor BB until after it is potentially split.
> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
> >>>         any blocks in function actually partitioned.
> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
> >>>         up partitioning.
> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
> >>>         block copying if any blocks in function actually partitioned.
> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
> >>>         that no cold blocks dominate a hot block.
> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
> >>>         as this is now done by force_nonfallthru_and_redirect.
> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
> >>>         already be marked with region crossing note.
> >>>         (reorder_basic_blocks): Only need to verify partitions if any
> >>>         blocks in function actually partitioned.
> >>>         (insert_section_boundary_note): Only need to insert note if any
> >>>         blocks in function actually partitioned.
> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
> >>>         parameter, and remove call to insert_section_boundary_note as this
> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
> >>>         parameter.
> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
> >>>         has bb partitions.
> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
> >>>         emit_barrier_after_bb, which are no longer static.
> >>>         * basic-block.h: Declare new function fixup_partitions.
> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
> >>>         check for region crossing note.
> >>>         (fixup_partition_crossing): New function.
> >>>         (fixup_bb_partition): Ditto.
> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
> >>>         remove old code that tried to do this. Emit barrier correctly
> >>>         when we are in cfglayout mode.
> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
> >>>         (commit_one_edge_insertion): Remove old code that tried to
> >>>         fixup region crossing edge since this is now handled in
> >>>         split_edge, and set up insertion point correctly since
> >>>         block may now end in a jump.
> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
> >>>         boundaries after optimizations that modify cfg and before trying to
> >>>         verify the flow info.
> >>>         (fixup_partitions): New function.
> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
> >>>         hot bbs.
> >>>         (record_effective_endpoints): Remove section boundary notes and set flag
> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
> >>>         Remove old code that attempted to fixup region crossing note as
> >>>         this is now handled in force_nonfallthru_and_redirect.
> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
> >>>         note.
> >>>
> >>> Index: cfghooks.h
> >>> ===================================================================
> >>> --- cfghooks.h  (revision 193376)
> >>> +++ cfghooks.h  (working copy)
> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
> >>>  void account_profile_record (struct profile_record *, int);
> >>>
> >>>  extern void cfg_layout_initialize (unsigned int);
> >>> -extern void cfg_layout_finalize (void);
> >>> +extern void cfg_layout_finalize (bool);
> >>>
> >>>  /* Hooks containers.  */
> >>>  extern struct cfg_hooks gimple_cfg_hooks;
> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
> >>>  extern void gimple_register_cfg_hooks (void);
> >>>  extern struct cfg_hooks get_cfg_hooks (void);
> >>>  extern void set_cfg_hooks (struct cfg_hooks);
> >>> -
> >>> Index: modulo-sched.c
> >>> ===================================================================
> >>> --- modulo-sched.c      (revision 193376)
> >>> +++ modulo-sched.c      (working copy)
> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
> >>>        bb->aux = bb->next_bb;
> >>>    free_dominance_info (CDI_DOMINATORS);
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>  #endif /* INSN_SCHEDULING */
> >>>    return 0;
> >>>  }
> >>> Index: ifcvt.c
> >>> ===================================================================
> >>> --- ifcvt.c     (revision 193376)
> >>> +++ ifcvt.c     (working copy)
> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
> >>>    if (new_bb)
> >>>      {
> >>>        df_bb_replace (then_bb_index, new_bb);
> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
> >>> -         we need to ensure that new_bb is in the same partition as
> >>> -         test bb (you can not fall through across section boundaries).  */
> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
> >>> +         (possibly called from redirect_edge_and_branch_force).  */
> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
> >>>      }
> >>>
> >>>    num_true_changes++;
> >>> Index: function.c
> >>> ===================================================================
> >>> --- function.c  (revision 193376)
> >>> +++ function.c  (working copy)
> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
> >>>                     break;
> >>>                 if (e)
> >>>                   {
> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> >>> -                                                 NULL_RTX, e->src);
> >>> +                    /* Make sure we insert after any barriers.  */
> >>> +                    rtx end = get_last_bb_insn (e->src);
> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
> >>> +                                                  NULL_RTX, e->src);
> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
> >>>                   }
> >>>                 else
> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
> >>>           cur_bb->aux = cur_bb->next_bb;
> >>> -      cfg_layout_finalize ();
> >>> +      cfg_layout_finalize (false);
> >>>      }
> >>>
> >>>  epilogue_done:
> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
> >>>        basic_block simple_return_block_cold = NULL;
> >>>        edge pending_edge_hot = NULL;
> >>>        edge pending_edge_cold = NULL;
> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
> >>> +      basic_block exit_pred;
> >>>        int i;
> >>>
> >>>        gcc_assert (entry_edge != orig_entry_edge);
> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
> >>>             else
> >>>               pending_edge_cold = e;
> >>>           }
> >>> +
> >>> +      /* Save a pointer to the exit's predecessor BB for use in
> >>> +         inserting new BBs at the end of the function. Do this
> >>> +         after the call to split_block above which may split
> >>> +         the original exit pred.  */
> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
> >>>
> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
> >>>         {
> >>> Index: function.h
> >>> ===================================================================
> >>> --- function.h  (revision 193376)
> >>> +++ function.h  (working copy)
> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
> >>>    bool uses_only_leaf_regs;
> >>>
> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
> >>> +     block.  */
> >>> +  bool has_bb_partition;
> >>> +
> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
> >>>       to eliminable regs (like the frame pointer) are set if an asm
> >>> Index: hw-doloop.c
> >>> ===================================================================
> >>> --- hw-doloop.c (revision 193376)
> >>> +++ hw-doloop.c (working copy)
> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
> >>>        else
> >>>         bb->aux = NULL;
> >>>      }
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>    clear_aux_for_blocks ();
> >>>    df_analyze ();
> >>>  }
> >>> Index: cfgcleanup.c
> >>> ===================================================================
> >>> --- cfgcleanup.c        (revision 193376)
> >>> +++ cfgcleanup.c        (working copy)
> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
> >>>       partition boundaries).  See the comments at the top of
> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
> >>>
> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
> >>> +  if (crtl->has_bb_partition && reload_completed)
> >>>      return false;
> >>>
> >>>    /* Search backward through forwarder blocks.  We don't need to worry
> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
> >>>               df_analyze ();
> >>>             }
> >>>
> >>> +         if (changed)
> >>> +            {
> >>> +              /* Edge forwarding in particular can cause hot blocks previously
> >>> +                 reached by both hot and cold blocks to become dominated only
> >>> +                 by cold blocks. This will cause the verification below to fail,
> >>> +                 and lead to now cold code in the hot section. This is not easy
> >>> +                 to detect and fix during edge forwarding, and in some cases
> >>> +                 is only visible after newly unreachable blocks are deleted,
> >>> +                 which will be done in fixup_partitions.  */
> >>> +              fixup_partitions ();
> >>> +
> >>>  #ifdef ENABLE_CHECKING
> >>> -         if (changed)
> >>> -           verify_flow_info ();
> >>> +              verify_flow_info ();
> >>>  #endif
> >>> +            }
> >>>
> >>>           changed_overall |= changed;
> >>>           first_pass = false;
> >>> Index: bb-reorder.c
> >>> ===================================================================
> >>> --- bb-reorder.c        (revision 193376)
> >>> +++ bb-reorder.c        (working copy)
> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
> >>>    current_partition = BB_PARTITION (traces[0].first);
> >>>    two_passes = false;
> >>>
> >>> -  if (flag_reorder_blocks_and_partition)
> >>> +  if (crtl->has_bb_partition)
> >>>      for (i = 0; i < n_traces && !two_passes; i++)
> >>>        if (BB_PARTITION (traces[0].first)
> >>>           != BB_PARTITION (traces[i].first))
> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
> >>>                       }
> >>>                   }
> >>>
> >>> -             if (flag_reorder_blocks_and_partition)
> >>> +             if (crtl->has_bb_partition)
> >>>                 try_copy = false;
> >>>
> >>>               /* Copy tiny blocks always; copy larger blocks only when the
> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
> >>>    return length;
> >>>  }
> >>>
> >>> -/* Emit a barrier into the footer of BB.  */
> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
> >>>
> >>> -static void
> >>> +void
> >>>  emit_barrier_after_bb (basic_block bb)
> >>>  {
> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> >>>  }
> >>>
> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
> >>>  {
> >>>    VEC(edge, heap) *crossing_edges = NULL;
> >>>    basic_block bb;
> >>> -  edge e;
> >>> -  edge_iterator ei;
> >>> +  edge e, e2;
> >>> +  edge_iterator ei, ei2;
> >>> +  unsigned int cold_bb_count = 0;
> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
> >>>
> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
> >>>    FOR_EACH_BB (bb)
> >>>      {
> >>>        if (probably_never_executed_bb_p (cfun, bb))
> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> >>> +        {
> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> >>> +          cold_bb_count++;
> >>> +        }
> >>>        else
> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> >>> +        {
> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
> >>> +        }
> >>>      }
> >>>
> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
> >>> +     several different possibilities. One is that there are edge weight insanities
> >>> +     due to optimization phases that do not properly update basic block profile
> >>> +     counts. The second is that the entry of the function may not be hot, because
> >>> +     it is entered fewer times than the number of profile training runs, but there
> >>> +     is a loop inside the function that causes blocks within the function to be
> >>> +     above the threshold for hotness.  */
> >>> +  if (cold_bb_count)
> >>> +    {
> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        calculate_dominance_info (CDI_DOMINATORS);
> >>> +
> >>> +      /* Keep examining hot bbs until we have either checked them all, or
> >>> +         re-marked all cold bbs hot.  */
> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
> >>> +             && cold_bb_count)
> >>> +        {
> >>> +          basic_block dom_bb;
> >>> +
> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
> >>> +
> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
> >>> +            continue;
> >>> +
> >>> +          /* We have a hot bb with an immediate dominator that is cold.
> >>> +             The dominator needs to be re-marked to hot.  */
> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
> >>> +          cold_bb_count--;
> >>> +
> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
> >>> +             dominated by a cold bb.  */
> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
> >>> +
> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
> >>> +             feeds and see if it makes sense to re-mark those as hot as
> >>> +             well.  */
> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
> >>> +            {
> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
> >>> +              /* Examine all successors of this newly-hot bb to see if they
> >>> +                 are cold and should be re-marked as hot.  */
> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
> >>> +                {
> >>> +                  bool any_cold_preds = false;
> >>> +                  basic_block succ = e->dest;
> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
> >>> +                    continue;
> >>> +                  /* Does this block have any cold predecessors now?  */
> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
> >>> +                  {
> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
> >>> +                      {
> >>> +                        any_cold_preds = true;
> >>> +                        break;
> >>> +                      }
> >>> +                  }
> >>> +                  if (any_cold_preds)
> >>> +                    continue;
> >>> +
> >>> +                  /* Here we have a successor of newly-hot bb that is cold
> >>> +                     but no longer has any cold predecessors. Since the original
> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
> >>> +                     as hot now too. Better heuristics may be in order here.  */
> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
> >>> +                  cold_bb_count--;
> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
> >>> +                  /* Examine this successor as a newly-hot bb.  */
> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
> >>> +                }
> >>> +            }
> >>> +        }
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        free_dominance_info (CDI_DOMINATORS);
> >>> +    }
> >>> +
> >>>    /* The format of .gcc_except_table does not allow landing pads to
> >>>       be in a different partition as the throw.  Fix this by either
> >>>       moving or duplicating the landing pads.  */
> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
> >>>                       new_bb->aux = cur_bb->aux;
> >>>                       cur_bb->aux = new_bb;
> >>>
> >>> -                     /* Make sure new fall-through bb is in same
> >>> -                        partition as bb it's falling through from.  */
> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
> >>> +                     gcc_assert (BB_PARTITION (new_bb)
> >>> +                                  == BB_PARTITION (cur_bb));
> >>>
> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
> >>>                     }
> >>>                   else
> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
> >>>    FOR_EACH_BB (bb)
> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
> >>>        if ((e->flags & EDGE_CROSSING)
> >>> -         && JUMP_P (BB_END (e->src)))
> >>> +         && JUMP_P (BB_END (e->src))
> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
> >>> +             force_nonfallthru_and_redirect.  */
> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>>  }
> >>>
> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
> >>>        dump_flow_info (dump_file, dump_flags);
> >>>      }
> >>>
> >>> -  if (flag_reorder_blocks_and_partition)
> >>> +  if (crtl->has_bb_partition)
> >>>      verify_hot_cold_block_grouping ();
> >>>  }
> >>>
> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
> >>>     encountering this note will make the compiler switch between the
> >>>     hot and cold text sections.  */
> >>>
> >>> -static void
> >>> +void
> >>>  insert_section_boundary_note (void)
> >>>  {
> >>>    basic_block bb;
> >>>    rtx new_note;
> >>>    int first_partition = 0;
> >>>
> >>> -  if (!flag_reorder_blocks_and_partition)
> >>> +  if (!crtl->has_bb_partition)
> >>>      return;
> >>>
> >>>    FOR_EACH_BB (bb)
> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
> >>>    FOR_EACH_BB (bb)
> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
> >>>        bb->aux = bb->next_bb;
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (true);
> >>>
> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> >>> -  insert_section_boundary_note ();
> >>>    return 0;
> >>>  }
> >>>
> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
> >>>      }
> >>>
> >>>  done:
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>
> >>>    BITMAP_FREE (candidates);
> >>>    return 0;
> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
> >>>    if (crossing_edges == NULL)
> >>>      return 0;
> >>>
> >>> +  crtl->has_bb_partition = true;
> >>> +
> >>>    /* Make sure the source of any crossing edge ends in a jump and the
> >>>       destination of any crossing edge has a label.  */
> >>>    add_labels_and_missing_jumps (crossing_edges);
> >>> Index: bb-reorder.h
> >>> ===================================================================
> >>> --- bb-reorder.h        (revision 193376)
> >>> +++ bb-reorder.h        (working copy)
> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
> >>>
> >>>  extern int get_uncond_jump_length (void);
> >>>
> >>> +extern void insert_section_boundary_note (void);
> >>> +
> >>> +extern void emit_barrier_after_bb (basic_block bb);
> >>> +
> >>>  #endif
> >>> Index: basic-block.h
> >>> ===================================================================
> >>> --- basic-block.h       (revision 193376)
> >>> +++ basic-block.h       (working copy)
> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
> >>>  extern bool contains_no_active_insn_p (const_basic_block);
> >>>  extern bool forwarder_block_p (const_basic_block);
> >>>  extern bool can_fallthru (basic_block, basic_block);
> >>> +extern void fixup_partitions (void);
> >>>
> >>>  /* In cfgbuild.c.  */
> >>>  extern void find_many_sub_basic_blocks (sbitmap);
> >>> Index: cfgrtl.c
> >>> ===================================================================
> >>> --- cfgrtl.c    (revision 193376)
> >>> +++ cfgrtl.c    (working copy)
> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
> >>>  #include "tree.h"
> >>>  #include "hard-reg-set.h"
> >>>  #include "basic-block.h"
> >>> +#include "bb-reorder.h"
> >>>  #include "regs.h"
> >>>  #include "flags.h"
> >>>  #include "function.h"
> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
> >>>     Only applicable if the CFG is in cfglayout mode.  */
> >>>  static GTY(()) rtx cfg_layout_function_footer;
> >>>  static GTY(()) rtx cfg_layout_function_header;
> >>> +static bool had_sec_boundary_notes;
> >>>
> >>>  static rtx skip_insns_after_block (basic_block);
> >>>  static void record_effective_endpoints (void);
> >>>  static rtx label_for_bb (basic_block);
> >>> -static void fixup_reorder_chain (void);
> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
> >>>
> >>>  void verify_insn_chain (void);
> >>>  static void fixup_fallthru_exit_predecessor (void);
> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
> >>>       partition boundaries).  See  the comments at the top of
> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
> >>>
> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
> >>>      return NULL;
> >>>
> >>>    /* We can replace or remove a complex jump only when we have exactly
> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
> >>>    return e;
> >>>  }
> >>>
> >>> +/* Called when edge E has been redirected to a new destination,
> >>> +   in order to update the region crossing flag on the edge and
> >>> +   jump.  */
> >>> +
> >>> +static void
> >>> +fixup_partition_crossing (edge e, basic_block target)
> >>> +{
> >>> +  rtx note;
> >>> +
> >>> +  gcc_assert (e->dest == target);
> >>> +
> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
> >>> +    return;
> >>> +  /* If we redirected an existing edge, it may already be marked
> >>> +     crossing, even though the new src is missing a reg crossing note.
> >>> +     But make sure reg crossing note doesn't already exist before
> >>> +     inserting.  */
> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
> >>> +    {
> >>> +      e->flags |= EDGE_CROSSING;
> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>> +      if (JUMP_P (BB_END (e->src))
> >>> +          && !note)
> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>> +    }
> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
> >>> +    {
> >>> +      e->flags &= ~EDGE_CROSSING;
> >>> +      /* Remove the region crossing note from jump at end of
> >>> +         e->src if it exists.  */
> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
> >>> +      if (note)
> >>> +        remove_note (BB_END (e->src), note);
> >>> +    }
> >>> +}
> >>> +
> >>> +/* Called when block BB has been reassigned to a different partition,
> >>> +   to ensure that the region crossing attributes are updated.  */
> >>> +
> >>> +static void
> >>> +fixup_bb_partition (basic_block bb)
> >>> +{
> >>> +  edge e;
> >>> +  edge_iterator ei;
> >>> +
> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
> >>> +    {
> >>> +      fixup_partition_crossing (e, e->dest);
> >>> +    }
> >>> +
> >>> +  /* Possibly need to make bb's successor edges region crossing,
> >>> +     or remove stale region crossing.  */
> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
> >>> +    {
> >>> +      if ((e->flags & EDGE_FALLTHRU)
> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
> >>> +          && e->dest != EXIT_BLOCK_PTR)
> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
> >>> +        force_nonfallthru (e);
> >>> +      else
> >>> +        fixup_partition_crossing (e, e->dest);
> >>> +    }
> >>> +}
> >>> +
> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
> >>>     expense of adding new instructions or reordering basic blocks.
> >>>
> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
> >>>  {
> >>>    edge ret;
> >>>    basic_block src = e->src;
> >>> +  basic_block dest = e->dest;
> >>>
> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
> >>>      return NULL;
> >>>
> >>> -  if (e->dest == target)
> >>> +  if (dest == target)
> >>>      return e;
> >>>
> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
> >>>      {
> >>>        df_set_bb_dirty (src);
> >>> +      fixup_partition_crossing (ret, target);
> >>>        return ret;
> >>>      }
> >>>
> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
> >>>      return NULL;
> >>>
> >>>    df_set_bb_dirty (src);
> >>> +  fixup_partition_crossing (ret, target);
> >>>    return ret;
> >>>  }
> >>>
> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
> >>>        /* Make sure new block ends up in correct hot/cold section.  */
> >>>
> >>>        BB_COPY_PARTITION (jump_block, e->src);
> >>> -      if (flag_reorder_blocks_and_partition
> >>> -         && targetm_common.have_named_sections
> >>> -         && JUMP_P (BB_END (jump_block))
> >>> -         && !any_condjump_p (BB_END (jump_block))
> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
> >>>
> >>>        /* Wire edge in.  */
> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
> >>>        new_edge->probability = probability;
> >>>        new_edge->count = count;
> >>>
> >>> +      /* If e->src was previously region crossing, it no longer is
> >>> +         and the reg crossing note should be removed.  */
> >>> +      fixup_partition_crossing (new_edge, jump_block);
> >>> +
> >>>        /* Redirect old edge.  */
> >>>        redirect_edge_pred (e, jump_block);
> >>>        e->probability = REG_BR_PROB_BASE;
> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
> >>>        LABEL_NUSES (label)++;
> >>>      }
> >>>
> >>> -  emit_barrier_after (BB_END (jump_block));
> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
> >>> +     insert the barrier correctly.  */
> >>> +  emit_barrier_after_bb (jump_block);
> >>>    redirect_edge_succ_nodup (e, target);
> >>>
> >>>    if (abnormal_edge_flags)
> >>>      make_edge (src, target, abnormal_edge_flags);
> >>>
> >>>    df_mark_solutions_dirty ();
> >>> +  fixup_partition_crossing (e, target);
> >>>    return new_bb;
> >>>  }
> >>>
> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
> >>>  static basic_block
> >>>  rtl_split_edge (edge edge_in)
> >>>  {
> >>> -  basic_block bb;
> >>> +  basic_block bb, new_bb;
> >>>    rtx before;
> >>>
> >>>    /* Abnormal edges cannot be split.  */
> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
> >>>    else
> >>>      {
> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
> >>> +      else
> >>> +        /* Put the split bb into the src partition, to avoid creating
> >>> +           a situation where a cold bb dominates a hot bb, in the case
> >>> +           where src is cold and dest is hot. The src will dominate
> >>> +           the new bb (whereas it might not have dominated dest).  */
> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
> >>>      }
> >>>
> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
> >>>
> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
> >>> +    {
> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
> >>> +      gcc_assert (!new_bb);
> >>> +    }
> >>> +
> >>>    /* For non-fallthru edges, we must adjust the predecessor's
> >>>       jump instruction to target our new block.  */
> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
> >>>    else
> >>>      {
> >>>        bb = split_edge (e);
> >>> -      after = BB_END (bb);
> >>>
> >>> -      if (flag_reorder_blocks_and_partition
> >>> -         && targetm_common.have_named_sections
> >>> -         && e->src != ENTRY_BLOCK_PTR
> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
> >>> -         && !(e->flags & EDGE_CROSSING)
> >>> -         && JUMP_P (after)
> >>> -         && !any_condjump_p (after)
> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
> >>> +         a region-crossing jump, even though it was originally fallthru.  */
> >>> +      if (JUMP_P (BB_END (bb)))
> >>> +       before = BB_END (bb);
> >>> +      else
> >>> +        after = BB_END (bb);
> >>>      }
> >>>
> >>>    /* Now that we've found the spot, do the insertion.  */
> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
> >>>  {
> >>>    basic_block bb;
> >>>
> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
> >>> +     previously reached by both hot and cold blocks to become dominated only
> >>> +     by cold blocks. This will cause the verification below to fail,
> >>> +     and lead to now cold code in the hot section. In some cases this
> >>> +     may only be visible after newly unreachable blocks are deleted,
> >>> +     which will be done by fixup_partitions.  */
> >>> +  fixup_partitions ();
> >>> +
> >>>  #ifdef ENABLE_CHECKING
> >>>    verify_flow_info ();
> >>>  #endif
> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
> >>>
> >>>    return end;
> >>>  }
> >>> -
> >>> +
> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
> >>> +   passes that modify the cfg.  */
> >>> +
> >>> +void
> >>> +fixup_partitions (void)
> >>> +{
> >>> +  basic_block bb;
> >>> +
> >>> +  if (!crtl->has_bb_partition)
> >>> +    return;
> >>> +
> >>> +  /* Delete any blocks that became unreachable and weren't
> >>> +     already cleaned up, for example during edge forwarding
> >>> +     and convert_jumps_to_returns. This will expose more
> >>> +     opportunities for fixing the partition boundaries here.
> >>> +     Also, the calculation of the dominance graph during verification
> >>> +     will assert if there are unreachable nodes.  */
> >>> +  delete_unreachable_blocks ();
> >>> +
> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
> >>> +     a cold partition cannot dominate a basic block in a hot partition.
> >>> +     Fixup any that now violate this requirement, as a result of edge
> >>> +     forwarding and unreachable block deletion.  */
> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
> >>> +  FOR_EACH_BB (bb)
> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +    {
> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> >>> +      basic_block son;
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        calculate_dominance_info (CDI_DOMINATORS);
> >>> +
> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +        {
> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> >>> +          /* If bb is not yet cold (because it was added below as
> >>> +             a block dominated by a cold bb) then mark it cold here.  */
> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> >>> +            {
> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
> >>> +            }
> >>> +          /* Any blocks dominated by a block in the cold section
> >>> +             must also be cold.  */
> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> >>> +               son;
> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> >>> +        }
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        free_dominance_info (CDI_DOMINATORS);
> >>> +    }
> >>> +
> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
> >>> +     cold, so that we only update the region crossings in the minimum number of
> >>> +     places, which can require forcing edges to be non fallthru.  */
> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
> >>> +    {
> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
> >>> +      fixup_bb_partition (bb);
> >>> +    }
> >>> +}
> >>> +
> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
> >>>     cfglayout RTL.
> >>>
> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
> >>>    rtx x;
> >>>    int err = 0;
> >>>    basic_block bb;
> >>> +  bool have_partitions = false;
> >>>
> >>>    /* Check the general integrity of the basic blocks.  */
> >>>    FOR_EACH_BB_REVERSE (bb)
> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
> >>>
> >>>           if (e->flags & EDGE_ABNORMAL)
> >>>             n_abnormal++;
> >>> +
> >>> +          have_partitions |= is_crossing;
> >>>         }
> >>>
> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
> >>>           }
> >>>      }
> >>>
> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
> >>> +  if (have_partitions && !err)
> >>> +    FOR_EACH_BB (bb)
> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +    {
> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> >>> +      basic_block son;
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        calculate_dominance_info (CDI_DOMINATORS);
> >>> +
> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
> >>> +        {
> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
> >>> +            {
> >>> +              error ("non-cold basic block %d dominated "
> >>> +                     "by a block in the cold partition", bb->index);
> >>> +              err = 1;
> >>> +            }
> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
> >>> +               son;
> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
> >>> +        }
> >>> +
> >>> +      if (dom_calculated_here)
> >>> +        free_dominance_info (CDI_DOMINATORS);
> >>> +    }
> >>> +
> >>>    /* Clean up.  */
> >>>    return err;
> >>>  }
> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
> >>>    else
> >>>      cfg_layout_function_header = NULL_RTX;
> >>>
> >>> +  had_sec_boundary_notes = false;
> >>> +
> >>>    next_insn = get_insns ();
> >>>    FOR_EACH_BB (bb)
> >>>      {
> >>>        rtx end;
> >>>
> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
> >>> -                                             PREV_INSN (BB_HEAD (bb)));
> >>> +        {
> >>> +          /* Rather than try to keep section boundary notes incrementally
> >>> +             up-to-date through cfg layout optimizations, simply remove them
> >>> +             and flag that they should be re-inserted when exiting
> >>> +             cfg layout mode.  */
> >>> +          rtx check_insn = next_insn;
> >>> +          while (check_insn)
> >>> +            {
> >>> +              if (NOTE_P (check_insn)
> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
> >>> +              {
> >>> +                had_sec_boundary_notes |= true;
> >>> +                /* Remove note from chain. Grab new next_insn first.  */
> >>> +                if (next_insn == check_insn)
> >>> +                  next_insn = NEXT_INSN (check_insn);
> >>> +                /* Delete note.  */
> >>> +                delete_insn (check_insn);
> >>> +                /* There will only be one.  */
> >>> +                break;
> >>> +              }
> >>> +              check_insn = NEXT_INSN (check_insn);
> >>> +            }
> >>> +          /* If we still have header instructions left after above loop.  */
> >>> +          if (next_insn != BB_HEAD (bb))
> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
> >>> +                                                PREV_INSN (BB_HEAD (bb)));
> >>> +        }
> >>>        end = skip_insns_after_block (bb);
> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
> >>>        bb->aux = bb->next_bb;
> >>>
> >>> -  cfg_layout_finalize ();
> >>> +  cfg_layout_finalize (false);
> >>>
> >>>    return 0;
> >>>  }
> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
> >>>  }
> >>>
> >>>
> >>> -/* Given a reorder chain, rearrange the code to match.  */
> >>> +/* Given a reorder chain, rearrange the code to match. If
> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
> >>> +   section boundary notes were removed on entry to cfg layout
> >>> +   mode, insert section boundary notes here.  */
> >>>
> >>>  static void
> >>> -fixup_reorder_chain (void)
> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
> >>>  {
> >>>    basic_block bb;
> >>>    rtx insn = NULL;
> >>> @@ -3150,7 +3373,7 @@ static void
> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
> >>>           insn = BB_HEADER (bb);
> >>>           while (NEXT_INSN (insn))
> >>> -           insn = NEXT_INSN (insn);
> >>> +            insn = NEXT_INSN (insn);
> >>>         }
> >>>        if (insn)
> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
> >>> @@ -3175,6 +3398,11 @@ static void
> >>>      insn = NEXT_INSN (insn);
> >>>
> >>>    set_last_insn (insn);
> >>> +
> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
> >>> +    insert_section_boundary_note ();
> >>> +
> >>>  #ifdef ENABLE_CHECKING
> >>>    verify_insn_chain ();
> >>>  #endif
> >>> @@ -3187,7 +3415,7 @@ static void
> >>>        edge e_fall, e_taken, e;
> >>>        rtx bb_end_insn;
> >>>        rtx ret_label = NULL_RTX;
> >>> -      basic_block nb, src_bb;
> >>> +      basic_block nb;
> >>>        edge_iterator ei;
> >>>
> >>>        if (EDGE_COUNT (bb->succs) == 0)
> >>> @@ -3322,7 +3550,6 @@ static void
> >>>        /* We got here if we need to add a new jump insn.
> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
> >>> -      src_bb = e_fall->src;
> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
> >>>        if (nb)
> >>>         {
> >>> @@ -3330,17 +3557,6 @@ static void
> >>>           bb->aux = nb;
> >>>           /* Don't process this new block.  */
> >>>           bb = nb;
> >>> -
> >>> -         /* Make sure new bb is tagged for correct section (same as
> >>> -            fall-thru source, since you cannot fall-thru across
> >>> -            section boundaries).  */
> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
> >>> -         if (flag_reorder_blocks_and_partition
> >>> -             && targetm_common.have_named_sections
> >>> -             && JUMP_P (BB_END (bb))
> >>> -             && !any_condjump_p (BB_END (bb))
> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
> >>>         }
> >>>      }
> >>>
> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
> >>>             case NOTE_INSN_FUNCTION_BEG:
> >>>               /* There is always just single entry to function.  */
> >>>             case NOTE_INSN_BASIC_BLOCK:
> >>> +              /* We should only switch text sections once.  */
> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
> >>>               break;
> >>>
> >>>             case NOTE_INSN_EPILOGUE_BEG:
> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
> >>>               emit_note_copy (insn);
> >>>               break;
> >>>
> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
> >>>  }
> >>>
> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
> >>> +   section notes.  */
> >>>
> >>>  void
> >>> -cfg_layout_finalize (void)
> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
> >>>  {
> >>>  #ifdef ENABLE_CHECKING
> >>>    verify_flow_info ();
> >>> @@ -3775,7 +3995,7 @@ void
> >>>  #endif
> >>>        )
> >>>      fixup_fallthru_exit_predecessor ();
> >>> -  fixup_reorder_chain ();
> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
> >>>
> >>>    rebuild_jump_labels (get_insns ());
> >>>    delete_dead_jumptables ();
> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
> >>>      return false;
> >>>
> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
> >>>      return false;
> >>>
> >>>    if (!onlyjump_p (insn)
> >>>
> >>> --
> >>> This patch is available for review at http://codereview.appspot.com/6823047
> >>
> >>
> >>
> >> --
> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-26 20:43       ` Jack Howarth
@ 2012-11-26 20:52         ` Teresa Johnson
  2012-11-28 15:49           ` Christophe Lyon
  0 siblings, 1 reply; 35+ messages in thread
From: Teresa Johnson @ 2012-11-26 20:52 UTC (permalink / raw)
  To: Jack Howarth
  Cc: Christophe Lyon, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 55041 bytes --]

Sorry, I don't know what happened there. Patch is attached.
Thanks,
Teresa
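
For readers following the thread: the invariant that the new verification
code enforces (no block in the cold partition may dominate a block in the
hot partition) can be illustrated with a small standalone sketch. This is
not code from the patch. The toy_cfg structure and the toy_* function
names are hypothetical, every block is assumed reachable from block 0, and
dominators are computed with a simple iterative bitmask dataflow rather
than GCC's calculate_dominance_info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical toy CFG (not GCC's basic_block).  Block 0 is the entry;
   bit P of pred_mask[B] is set when there is an edge P -> B.  */
struct toy_cfg
{
  int n_blocks;
  uint32_t pred_mask[MAX_BLOCKS];
  bool cold[MAX_BLOCKS];
};

/* Fill DOM so that bit D of DOM[B] is set when block D dominates block B
   (every block dominates itself).  Plain iterative dataflow:
   dom(B) = {B} union (intersection of dom(P) over all predecessors P).  */
static void
toy_compute_dominators (const struct toy_cfg *cfg, uint32_t *dom)
{
  assert (cfg->n_blocks > 0 && cfg->n_blocks <= MAX_BLOCKS);

  uint32_t all = (cfg->n_blocks == 32
                  ? UINT32_MAX
                  : (UINT32_C (1) << cfg->n_blocks) - 1);
  bool changed = true;

  dom[0] = UINT32_C (1);
  for (int b = 1; b < cfg->n_blocks; b++)
    dom[b] = all;

  while (changed)
    {
      changed = false;
      for (int b = 1; b < cfg->n_blocks; b++)
        {
          uint32_t meet = all;
          for (int p = 0; p < cfg->n_blocks; p++)
            if (cfg->pred_mask[b] & (UINT32_C (1) << p))
              meet &= dom[p];

          uint32_t next = meet | (UINT32_C (1) << b);
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
}

/* Return the index of the first hot block that has a cold dominator,
   or -1 if the invariant holds for the whole function.  */
static int
toy_find_hot_block_under_cold_dominator (const struct toy_cfg *cfg)
{
  uint32_t dom[MAX_BLOCKS];
  uint32_t cold_mask = 0;

  toy_compute_dominators (cfg, dom);

  for (int b = 0; b < cfg->n_blocks; b++)
    if (cfg->cold[b])
      cold_mask |= UINT32_C (1) << b;

  for (int b = 0; b < cfg->n_blocks; b++)
    if (!cfg->cold[b] && (dom[b] & cold_mask))
      return b;

  return -1;
}
```

In a diamond (0->1, 0->2, 1->3, 2->3) with only block 1 cold, the check
passes, because block 3 can still be reached through hot block 2. In a
straight chain 0->1->2 with block 1 cold, hot block 2 is reported.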

On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>> Are you sure you have all my changes applied? I applied the 4 patches
>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>> pristine trunk checkout. I configured and built both for
>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>> and gcda file. I can reproduce the failure using the pristine trunk
>> with your patches but not with my fixed trunk + your patches. (I just
>> updated to head to pickup recent changes and get the same result. The
>> vec changes required some manual changes to the patch, which I will
>> resend shortly.)
>
> Teresa,
>     Your mailer seems to have corrupted the posted patch with stray
> =3D characters and line breaks. Can you repost a copy as an attachment
> to the list?
>              Jack
>
>>
>> Without my fixes:
>>
>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed
>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>> -fno-common -o eval.s -freorder-blocks-and-partition
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>> eval.c: In function ‘Ge’:
>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>  }
>>  ^
>> 0x622f71 df_compact_blocks()
>> ../../gcc_trunk_3/gcc/df-core.c:1560
>> 0x5cfcb5 compact_blocks()
>> ../../gcc_trunk_3/gcc/cfg.c:162
>> 0xc9dce0 reorder_basic_blocks
>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>> 0xc9dce0 rest_of_handle_reorder_blocks
>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>
>>
>> With my fixes:
>>
>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed
>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>> -fno-common -o eval.s -freorder-blocks-and-partition
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>> 2.4.2-p1, MPC version 0.8.1
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>
>>
>> Thanks,
>> Teresa
>>
>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>> > Hi,
>> >
>> > I have tested your patch on Spec2000 on ARM, and I can still see
>> > several failures caused by:
>> > "error: fallthru edge crosses section boundary", including the case
>> > described in PR55121.
>> >
>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>> >> Ping.
>> >> Teresa
>> >>
>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> >>> Revised patch that fixes failures encountered when enabling
>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>> >>>
>> >>> This includes new verification code to ensure no cold blocks dominate hot
>> >>> blocks contributed by Steven Bosscher.
>> >>>
>> >>> I attempted to make the handling of partition updates through the optimization
>> >>> passes much more consistent, removing a number of partial fixes in the code
>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>> >>> assignment, region crossing jump notes, and switch text section notes) is
>> >>> now handled in a few centralized locations. For example, inside
>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>> >>> don't need to attempt the fixup themselves.
>> >>>
>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>> >>> mode that are not easy to fix up incrementally, the new routine
>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>> >>> of the dominance relation; however, as far as I can tell the routines which
>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>> >>> are invoked typically once (or a small number of times in the case of
>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>> >>> increases in the dominance computation times, which were only a tiny percent
>> >>> of the overall compile time.
>> >>>
>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>> >>> any partitioning was actually performed, so that optimizations which were
>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>> >>> conservative for functions where no partitions were formed (e.g. they are
>> >>> completely hot).
>> >>>
>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>> >>> benchmarks and internal google benchmarks using profile feedback and
>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>> >>>
>> >>> Thanks,
>> >>> Teresa
>> >>>
>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>> >>>
>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>> >>>         parameter.
>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>> >>>         as this is now done by redirect_edge_and_branch_force.
>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>> >>>         predecessor BB until after it is potentially split.
>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>> >>>         any blocks in function actually partitioned.
>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>> >>>         up partitioning.
>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>> >>>         block copying if any blocks in function actually partitioned.
>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>> >>>         that no cold blocks dominate a hot block.
>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>> >>>         as this is now done by force_nonfallthru_and_redirect.
>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>> >>>         already be marked with region crossing note.
>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>> >>>         blocks in function actually partitioned.
>> >>>         (insert_section_boundary_note): Only need to insert note if any
>> >>>         blocks in function actually partitioned.
>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>> >>>         parameter, and remove call to insert_section_boundary_note as this
>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>> >>>         parameter.
>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>> >>>         has bb partitions.
>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>> >>>         emit_barrier_after_bb, which are no longer static.
>> >>>         * basic-block.h: Declare new function fixup_partitions.
>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>> >>>         check for region crossing note.
>> >>>         (fixup_partition_crossing): New function.
>> >>>         (fixup_bb_partition): Ditto.
>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>> >>>         remove old code that tried to do this. Emit barrier correctly
>> >>>         when we are in cfglayout mode.
>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>> >>>         fixup region crossing edge since this is now handled in
>> >>>         split_block, and set up insertion point correctly since
>> >>>         block may now end in a jump.
>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>> >>>         boundaries after optimizations that modify cfg and before trying to
>> >>>         verify the flow info.
>> >>>         (fixup_partitions): New function.
>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>> >>>         hot bbs.
>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>> >>>         Remove old code that attempted to fixup region crossing note as
>> >>>         this is now handled in force_nonfallthru_and_redirect.
>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>> >>>         note.
>> >>>
>> >>> Index: cfghooks.h
>> >>> ===================================================================
>> >>> --- cfghooks.h  (revision 193376)
>> >>> +++ cfghooks.h  (working copy)
>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>> >>>  void account_profile_record (struct profile_record *, int);
>> >>>
>> >>>  extern void cfg_layout_initialize (unsigned int);
>> >>> -extern void cfg_layout_finalize (void);
>> >>> +extern void cfg_layout_finalize (bool);
>> >>>
>> >>>  /* Hooks containers.  */
>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>> >>>  extern void gimple_register_cfg_hooks (void);
>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>> >>> -
>> >>> Index: modulo-sched.c
>> >>> ===================================================================
>> >>> --- modulo-sched.c      (revision 193376)
>> >>> +++ modulo-sched.c      (working copy)
>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>> >>>        bb->aux = bb->next_bb;
>> >>>    free_dominance_info (CDI_DOMINATORS);
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>  #endif /* INSN_SCHEDULING */
>> >>>    return 0;
>> >>>  }
>> >>> Index: ifcvt.c
>> >>> ===================================================================
>> >>> --- ifcvt.c     (revision 193376)
>> >>> +++ ifcvt.c     (working copy)
>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>> >>>    if (new_bb)
>> >>>      {
>> >>>        df_bb_replace (then_bb_index, new_bb);
>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>> >>> -         we need to ensure that new_bb is in the same partition as
>> >>> -         test bb (you can not fall through across section boundaries).  */
>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>> >>>      }
>> >>>
>> >>>    num_true_changes++;
>> >>> Index: function.c
>> >>> ===================================================================
>> >>> --- function.c  (revision 193376)
>> >>> +++ function.c  (working copy)
>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>> >>>                     break;
>> >>>                 if (e)
>> >>>                   {
>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> >>> -                                                 NULL_RTX, e->src);
>> >>> +                    /* Make sure we insert after any barriers.  */
>> >>> +                    rtx end = get_last_bb_insn (e->src);
>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>> >>> +                                                  NULL_RTX, e->src);
>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>> >>>                   }
>> >>>                 else
>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>> >>>           cur_bb->aux = cur_bb->next_bb;
>> >>> -      cfg_layout_finalize ();
>> >>> +      cfg_layout_finalize (false);
>> >>>      }
>> >>>
>> >>>  epilogue_done:
>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>> >>>        basic_block simple_return_block_cold = NULL;
>> >>>        edge pending_edge_hot = NULL;
>> >>>        edge pending_edge_cold = NULL;
>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> >>> +      basic_block exit_pred;
>> >>>        int i;
>> >>>
>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>> >>>             else
>> >>>               pending_edge_cold = e;
>> >>>           }
>> >>> +
>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>> >>> +         inserting new BBs at the end of the function. Do this
>> >>> +         after the call to split_block above which may split
>> >>> +         the original exit pred.  */
>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>> >>>
>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>> >>>         {
>> >>> Index: function.h
>> >>> ===================================================================
>> >>> --- function.h  (revision 193376)
>> >>> +++ function.h  (working copy)
>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>> >>>    bool uses_only_leaf_regs;
>> >>>
>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>> >>> +     block.  */
>> >>> +  bool has_bb_partition;
>> >>> +
>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>> >>> Index: hw-doloop.c
>> >>> ===================================================================
>> >>> --- hw-doloop.c (revision 193376)
>> >>> +++ hw-doloop.c (working copy)
>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>> >>>        else
>> >>>         bb->aux = NULL;
>> >>>      }
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>    clear_aux_for_blocks ();
>> >>>    df_analyze ();
>> >>>  }
>> >>> Index: cfgcleanup.c
>> >>> ===================================================================
>> >>> --- cfgcleanup.c        (revision 193376)
>> >>> +++ cfgcleanup.c        (working copy)
>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>> >>>       partition boundaries).  See the comments at the top of
>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>> >>>
>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>> >>> +  if (crtl->has_bb_partition && reload_completed)
>> >>>      return false;
>> >>>
>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>> >>>               df_analyze ();
>> >>>             }
>> >>>
>> >>> +         if (changed)
>> >>> +            {
>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>> >>> +                 reached by both hot and cold blocks to become dominated only
>> >>> +                 by cold blocks. This will cause the verification below to fail,
>> >>> +                 and lead to now cold code in the hot section. This is not easy
>> >>> +                 to detect and fix during edge forwarding, and in some cases
>> >>> +                 is only visible after newly unreachable blocks are deleted,
>> >>> +                 which will be done in fixup_partitions.  */
>> >>> +              fixup_partitions ();
>> >>> +
>> >>>  #ifdef ENABLE_CHECKING
>> >>> -         if (changed)
>> >>> -           verify_flow_info ();
>> >>> +              verify_flow_info ();
>> >>>  #endif
>> >>> +            }
>> >>>
>> >>>           changed_overall |= changed;
>> >>>           first_pass = false;
>> >>> Index: bb-reorder.c
>> >>> ===================================================================
>> >>> --- bb-reorder.c        (revision 193376)
>> >>> +++ bb-reorder.c        (working copy)
>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>> >>>    current_partition = BB_PARTITION (traces[0].first);
>> >>>    two_passes = false;
>> >>>
>> >>> -  if (flag_reorder_blocks_and_partition)
>> >>> +  if (crtl->has_bb_partition)
>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>> >>>        if (BB_PARTITION (traces[0].first)
>> >>>           != BB_PARTITION (traces[i].first))
>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>> >>>                       }
>> >>>                   }
>> >>>
>> >>> -             if (flag_reorder_blocks_and_partition)
>> >>> +             if (crtl->has_bb_partition)
>> >>>                 try_copy = false;
>> >>>
>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>> >>>    return length;
>> >>>  }
>> >>>
>> >>> -/* Emit a barrier into the footer of BB.  */
>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>> >>>
>> >>> -static void
>> >>> +void
>> >>>  emit_barrier_after_bb (basic_block bb)
>> >>>  {
>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> >>>  }
>> >>>
>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>> >>>  {
>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>> >>>    basic_block bb;
>> >>> -  edge e;
>> >>> -  edge_iterator ei;
>> >>> +  edge e, e2;
>> >>> +  edge_iterator ei, ei2;
>> >>> +  unsigned int cold_bb_count = 0;
>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>> >>>
>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>> >>>    FOR_EACH_BB (bb)
>> >>>      {
>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> >>> +        {
>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> >>> +          cold_bb_count++;
>> >>> +        }
>> >>>        else
>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> >>> +        {
>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>> >>> +        }
>> >>>      }
>> >>>
>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>> >>> +     several different possibilities. One is that there are edge weight insanities
>> >>> +     due to optimization phases that do not properly update basic block profile
>> >>> +     counts. The second is that the entry of the function may not be hot, because
>> >>> +     it is entered fewer times than the number of profile training runs, but there
>> >>> +     is a loop inside the function that causes blocks within the function to be
>> >>> +     above the threshold for hotness.  */
>> >>> +  if (cold_bb_count)
>> >>> +    {
>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>> >>> +         re-marked all cold bbs hot.  */
>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>> >>> +             && cold_bb_count)
>> >>> +        {
>> >>> +          basic_block dom_bb;
>> >>> +
>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> >>> +
>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> >>> +            continue;
>> >>> +
>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>> >>> +             The dominator needs to be re-marked to hot.  */
>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>> >>> +          cold_bb_count--;
>> >>> +
>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>> >>> +             dominated by a cold bb.  */
>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>> >>> +
>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>> >>> +             well.  */
>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>> >>> +            {
>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>> >>> +                 are cold and should be re-marked as hot.  */
>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>> >>> +                {
>> >>> +                  bool any_cold_preds = false;
>> >>> +                  basic_block succ = e->dest;
>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>> >>> +                    continue;
>> >>> +                  /* Does this block have any cold predecessors now?  */
>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>> >>> +                  {
>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>> >>> +                      {
>> >>> +                        any_cold_preds = true;
>> >>> +                        break;
>> >>> +                      }
>> >>> +                  }
>> >>> +                  if (any_cold_preds)
>> >>> +                    continue;
>> >>> +
>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>> >>> +                     but no longer has any cold predecessors. Since the original
>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>> >>> +                  cold_bb_count--;
>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>> >>> +                }
>> >>> +            }
>> >>> +        }
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        free_dominance_info (CDI_DOMINATORS);
>> >>> +    }
>> >>> +
>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>> >>>       be in a different partition as the throw.  Fix this by either
>> >>>       moving or duplicating the landing pads.  */
>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>> >>>                       new_bb->aux = cur_bb->aux;
>> >>>                       cur_bb->aux = new_bb;
>> >>>
>> >>> -                     /* Make sure new fall-through bb is in same
>> >>> -                        partition as bb it's falling through from.  */
>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>> >>> +                                  == BB_PARTITION (cur_bb));
>> >>>
>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>> >>>                     }
>> >>>                   else
>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>> >>>    FOR_EACH_BB (bb)
>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>> >>>        if ((e->flags & EDGE_CROSSING)
>> >>> -         && JUMP_P (BB_END (e->src)))
>> >>> +         && JUMP_P (BB_END (e->src))
>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>> >>> +             force_nonfallthru_and_redirect.  */
>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>>  }
>> >>>
>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>> >>>        dump_flow_info (dump_file, dump_flags);
>> >>>      }
>> >>>
>> >>> -  if (flag_reorder_blocks_and_partition)
>> >>> +  if (crtl->has_bb_partition)
>> >>>      verify_hot_cold_block_grouping ();
>> >>>  }
>> >>>
>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>> >>>     encountering this note will make the compiler switch between the
>> >>>     hot and cold text sections.  */
>> >>>
>> >>> -static void
>> >>> +void
>> >>>  insert_section_boundary_note (void)
>> >>>  {
>> >>>    basic_block bb;
>> >>>    rtx new_note;
>> >>>    int first_partition = 0;
>> >>>
>> >>> -  if (!flag_reorder_blocks_and_partition)
>> >>> +  if (!crtl->has_bb_partition)
>> >>>      return;
>> >>>
>> >>>    FOR_EACH_BB (bb)
>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>> >>>    FOR_EACH_BB (bb)
>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>> >>>        bb->aux = bb->next_bb;
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (true);
>> >>>
>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> >>> -  insert_section_boundary_note ();
>> >>>    return 0;
>> >>>  }
>> >>>
>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>> >>>      }
>> >>>
>> >>>  done:
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>
>> >>>    BITMAP_FREE (candidates);
>> >>>    return 0;
>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>> >>>    if (crossing_edges == NULL)
>> >>>      return 0;
>> >>>
>> >>> +  crtl->has_bb_partition = true;
>> >>> +
>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>> >>>       destination of any crossing edge has a label.  */
>> >>>    add_labels_and_missing_jumps (crossing_edges);
>> >>> Index: bb-reorder.h
>> >>> ===================================================================
>> >>> --- bb-reorder.h        (revision 193376)
>> >>> +++ bb-reorder.h        (working copy)
>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>> >>>
>> >>>  extern int get_uncond_jump_length (void);
>> >>>
>> >>> +extern void insert_section_boundary_note (void);
>> >>> +
>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>> >>> +
>> >>>  #endif
>> >>> Index: basic-block.h
>> >>> ===================================================================
>> >>> --- basic-block.h       (revision 193376)
>> >>> +++ basic-block.h       (working copy)
>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>> >>>  extern bool forwarder_block_p (const_basic_block);
>> >>>  extern bool can_fallthru (basic_block, basic_block);
>> >>> +extern void fixup_partitions (void);
>> >>>
>> >>>  /* In cfgbuild.c.  */
>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>> >>> Index: cfgrtl.c
>> >>> ===================================================================
>> >>> --- cfgrtl.c    (revision 193376)
>> >>> +++ cfgrtl.c    (working copy)
>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>> >>>  #include "tree.h"
>> >>>  #include "hard-reg-set.h"
>> >>>  #include "basic-block.h"
>> >>> +#include "bb-reorder.h"
>> >>>  #include "regs.h"
>> >>>  #include "flags.h"
>> >>>  #include "function.h"
>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>> >>>  static GTY(()) rtx cfg_layout_function_header;
>> >>> +static bool had_sec_boundary_notes;
>> >>>
>> >>>  static rtx skip_insns_after_block (basic_block);
>> >>>  static void record_effective_endpoints (void);
>> >>>  static rtx label_for_bb (basic_block);
>> >>> -static void fixup_reorder_chain (void);
>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>> >>>
>> >>>  void verify_insn_chain (void);
>> >>>  static void fixup_fallthru_exit_predecessor (void);
>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>> >>>       partition boundaries).  See  the comments at the top of
>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>> >>>
>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>> >>>      return NULL;
>> >>>
>> >>>    /* We can replace or remove a complex jump only when we have exactly
>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>> >>>    return e;
>> >>>  }
>> >>>
>> >>> +/* Called when edge E has been redirected to a new destination,
>> >>> +   in order to update the region crossing flag on the edge and
>> >>> +   jump.  */
>> >>> +
>> >>> +static void
>> >>> +fixup_partition_crossing (edge e, basic_block target)
>> >>> +{
>> >>> +  rtx note;
>> >>> +
>> >>> +  gcc_assert (e->dest == target);
>> >>> +
>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>> >>> +    return;
>> >>> +  /* If we redirected an existing edge, it may already be marked
>> >>> +     crossing, even though the new src is missing a reg crossing note.
>> >>> +     But make sure reg crossing note doesn't already exist before
>> >>> +     inserting.  */
>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>> >>> +    {
>> >>> +      e->flags |= EDGE_CROSSING;
>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>> +      if (JUMP_P (BB_END (e->src))
>> >>> +          && !note)
>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>> +    }
>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>> >>> +    {
>> >>> +      e->flags &= ~EDGE_CROSSING;
>> >>> +      /* Remove the region crossing note from jump at end of
>> >>> +         e->src if it exists.  */
>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>> >>> +      if (note)
>> >>> +        remove_note (BB_END (e->src), note);
>> >>> +    }
>> >>> +}
>> >>> +
>> >>> +/* Called when block BB has been reassigned to a different partition,
>> >>> +   to ensure that the region crossing attributes are updated.  */
>> >>> +
>> >>> +static void
>> >>> +fixup_bb_partition (basic_block bb)
>> >>> +{
>> >>> +  edge e;
>> >>> +  edge_iterator ei;
>> >>> +
>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>> >>> +    {
>> >>> +      fixup_partition_crossing (e, e->dest);
>> >>> +    }
>> >>> +
>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>> >>> +     or remove stale region crossing.  */
>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>> >>> +    {
>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>> >>> +        force_nonfallthru (e);
>> >>> +      else
>> >>> +        fixup_partition_crossing (e, e->dest);
>> >>> +    }
>> >>> +}
>> >>> +
>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>> >>>     expense of adding new instructions or reordering basic blocks.
>> >>>
>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>> >>>  {
>> >>>    edge ret;
>> >>>    basic_block src = e->src;
>> >>> +  basic_block dest = e->dest;
>> >>>
>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>> >>>      return NULL;
>> >>>
>> >>> -  if (e->dest == target)
>> >>> +  if (dest == target)
>> >>>      return e;
>> >>>
>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>> >>>      {
>> >>>        df_set_bb_dirty (src);
>> >>> +      fixup_partition_crossing (ret, target);
>> >>>        return ret;
>> >>>      }
>> >>>
>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>> >>>      return NULL;
>> >>>
>> >>>    df_set_bb_dirty (src);
>> >>> +  fixup_partition_crossing (ret, target);
>> >>>    return ret;
>> >>>  }
>> >>>
>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>> >>>
>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>> >>> -      if (flag_reorder_blocks_and_partition
>> >>> -         && targetm_common.have_named_sections
>> >>> -         && JUMP_P (BB_END (jump_block))
>> >>> -         && !any_condjump_p (BB_END (jump_block))
>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>> >>>
>> >>>        /* Wire edge in.  */
>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>> >>>        new_edge->probability = probability;
>> >>>        new_edge->count = count;
>> >>>
>> >>> +      /* If e->src was previously region crossing, it no longer is
>> >>> +         and the reg crossing note should be removed.  */
>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>> >>> +
>> >>>        /* Redirect old edge.  */
>> >>>        redirect_edge_pred (e, jump_block);
>> >>>        e->probability = REG_BR_PROB_BASE;
>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>> >>>        LABEL_NUSES (label)++;
>> >>>      }
>> >>>
>> >>> -  emit_barrier_after (BB_END (jump_block));
>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>> >>> +     insert the barrier correctly.  */
>> >>> +  emit_barrier_after_bb (jump_block);
>> >>>    redirect_edge_succ_nodup (e, target);
>> >>>
>> >>>    if (abnormal_edge_flags)
>> >>>      make_edge (src, target, abnormal_edge_flags);
>> >>>
>> >>>    df_mark_solutions_dirty ();
>> >>> +  fixup_partition_crossing (e, target);
>> >>>    return new_bb;
>> >>>  }
>> >>>
>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>> >>>  static basic_block
>> >>>  rtl_split_edge (edge edge_in)
>> >>>  {
>> >>> -  basic_block bb;
>> >>> +  basic_block bb, new_bb;
>> >>>    rtx before;
>> >>>
>> >>>    /* Abnormal edges cannot be split.  */
>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>> >>>    else
>> >>>      {
>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>> >>> +      else
>> >>> +        /* Put the split bb into the src partition, to avoid creating
>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>> >>> +           where src is cold and dest is hot. The src will dominate
>> >>> +           the new bb (whereas it might not have dominated dest).  */
>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>> >>>      }
>> >>>
>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>> >>>
>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>> >>> +    {
>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>> >>> +      gcc_assert (!new_bb);
>> >>> +    }
>> >>> +
>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>> >>>       jump instruction to target our new block.  */
>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>> >>>    else
>> >>>      {
>> >>>        bb = split_edge (e);
>> >>> -      after = BB_END (bb);
>> >>>
>> >>> -      if (flag_reorder_blocks_and_partition
>> >>> -         && targetm_common.have_named_sections
>> >>> -         && e->src != ENTRY_BLOCK_PTR
>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>> >>> -         && !(e->flags & EDGE_CROSSING)
>> >>> -         && JUMP_P (after)
>> >>> -         && !any_condjump_p (after)
>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>> >>> +      if (JUMP_P (BB_END (bb)))
>> >>> +       before = BB_END (bb);
>> >>> +      else
>> >>> +        after = BB_END (bb);
>> >>>      }
>> >>>
>> >>>    /* Now that we've found the spot, do the insertion.  */
>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>> >>>  {
>> >>>    basic_block bb;
>> >>>
>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>> >>> +     previously reached by both hot and cold blocks to become dominated only
>> >>> +     by cold blocks. This will cause the verification below to fail,
>> >>> +     and lead to now cold code in the hot section. In some cases this
>> >>> +     may only be visible after newly unreachable blocks are deleted,
>> >>> +     which will be done by fixup_partitions.  */
>> >>> +  fixup_partitions ();
>> >>> +
>> >>>  #ifdef ENABLE_CHECKING
>> >>>    verify_flow_info ();
>> >>>  #endif
>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>> >>>
>> >>>    return end;
>> >>>  }
>> >>> -
>> >>> +
>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>> >>> +   passes that modify the cfg.  */
>> >>> +
>> >>> +void
>> >>> +fixup_partitions (void)
>> >>> +{
>> >>> +  basic_block bb;
>> >>> +
>> >>> +  if (!crtl->has_bb_partition)
>> >>> +    return;
>> >>> +
>> >>> +  /* Delete any blocks that became unreachable and weren't
>> >>> +     already cleaned up, for example during edge forwarding
>> >>> +     and convert_jumps_to_returns. This will expose more
>> >>> +     opportunities for fixing the partition boundaries here.
>> >>> +     Also, the calculation of the dominance graph during verification
>> >>> +     will assert if there are unreachable nodes.  */
>> >>> +  delete_unreachable_blocks ();
>> >>> +
>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>> >>> +     Fixup any that now violate this requirement, as a result of edge
>> >>> +     forwarding and unreachable block deletion.  */
>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>> >>> +  FOR_EACH_BB (bb)
>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +    {
>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> >>> +      basic_block son;
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +        {
>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> >>> +          /* If bb is not yet cold (because it was added below as
>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> >>> +            {
>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>> >>> +            }
>> >>> +          /* Any blocks dominated by a block in the cold section
>> >>> +             must also be cold.  */
>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> >>> +               son;
>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> >>> +        }
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        free_dominance_info (CDI_DOMINATORS);
>> >>> +    }
>> >>> +
>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>> >>> +     cold, so that we only update the region crossings the minimum number of
>> >>> +     places, which can require forcing edges to be non fallthru.  */
>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>> >>> +    {
>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>> >>> +      fixup_bb_partition (bb);
>> >>> +    }
>> >>> +}
>> >>> +
>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>> >>>     cfglayout RTL.
>> >>>
>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>> >>>    rtx x;
>> >>>    int err = 0;
>> >>>    basic_block bb;
>> >>> +  bool have_partitions = false;
>> >>>
>> >>>    /* Check the general integrity of the basic blocks.  */
>> >>>    FOR_EACH_BB_REVERSE (bb)
>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>> >>>
>> >>>           if (e->flags & EDGE_ABNORMAL)
>> >>>             n_abnormal++;
>> >>> +
>> >>> +          have_partitions |= is_crossing;
>> >>>         }
>> >>>
>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>> >>>           }
>> >>>      }
>> >>>
>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>> >>> +  if (have_partitions && !err)
>> >>> +    FOR_EACH_BB (bb)
>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +    {
>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> >>> +      basic_block son;
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>> >>> +
>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>> >>> +        {
>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>> >>> +            {
>> >>> +              error ("non-cold basic block %d dominated "
>> >>> +                     "by a block in the cold partition", bb->index);
>> >>> +              err = 1;
>> >>> +            }
>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>> >>> +               son;
>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>> >>> +        }
>> >>> +
>> >>> +      if (dom_calculated_here)
>> >>> +        free_dominance_info (CDI_DOMINATORS);
>> >>> +    }
>> >>> +
>> >>>    /* Clean up.  */
>> >>>    return err;
>> >>>  }
>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>> >>>    else
>> >>>      cfg_layout_function_header = NULL_RTX;
>> >>>
>> >>> +  had_sec_boundary_notes = false;
>> >>> +
>> >>>    next_insn = get_insns ();
>> >>>    FOR_EACH_BB (bb)
>> >>>      {
>> >>>        rtx end;
>> >>>
>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>> >>> +        {
>> >>> +          /* Rather than try to keep section boundary notes incrementally
>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>> >>> +             and flag that they should be re-inserted when exiting
>> >>> +             cfg layout mode.  */
>> >>> +          rtx check_insn = next_insn;
>> >>> +          while (check_insn)
>> >>> +            {
>> >>> +              if (NOTE_P (check_insn)
>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>> >>> +              {
>> >>> +                had_sec_boundary_notes |= true;
>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>> >>> +                if (next_insn == check_insn)
>> >>> +                  next_insn = NEXT_INSN (check_insn);
>> >>> +                /* Delete note.  */
>> >>> +                delete_insn (check_insn);
>> >>> +                /* There will only be one.  */
>> >>> +                break;
>> >>> +              }
>> >>> +              check_insn = NEXT_INSN (check_insn);
>> >>> +            }
>> >>> +          /* If we still have header instructions left after above loop.  */
>> >>> +          if (next_insn != BB_HEAD (bb))
>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>> >>> +        }
>> >>>        end = skip_insns_after_block (bb);
>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>> >>>        bb->aux = bb->next_bb;
>> >>>
>> >>> -  cfg_layout_finalize ();
>> >>> +  cfg_layout_finalize (false);
>> >>>
>> >>>    return 0;
>> >>>  }
>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>> >>>  }
>> >>>
>> >>>
>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>> >>> +/* Given a reorder chain, rearrange the code to match. If
>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>> >>> +   section boundary notes were removed on entry to cfg layout
>> >>> +   mode, insert section boundary notes here.  */
>> >>>
>> >>>  static void
>> >>> -fixup_reorder_chain (void)
>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>> >>>  {
>> >>>    basic_block bb;
>> >>>    rtx insn = NULL;
>> >>> @@ -3150,7 +3373,7 @@ static void
>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>> >>>           insn = BB_HEADER (bb);
>> >>>           while (NEXT_INSN (insn))
>> >>> -           insn = NEXT_INSN (insn);
>> >>> +            insn = NEXT_INSN (insn);
>> >>>         }
>> >>>        if (insn)
>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>> >>> @@ -3175,6 +3398,11 @@ static void
>> >>>      insn = NEXT_INSN (insn);
>> >>>
>> >>>    set_last_insn (insn);
>> >>> +
>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>> >>> +    insert_section_boundary_note ();
>> >>> +
>> >>>  #ifdef ENABLE_CHECKING
>> >>>    verify_insn_chain ();
>> >>>  #endif
>> >>> @@ -3187,7 +3415,7 @@ static void
>> >>>        edge e_fall, e_taken, e;
>> >>>        rtx bb_end_insn;
>> >>>        rtx ret_label = NULL_RTX;
>> >>> -      basic_block nb, src_bb;
>> >>> +      basic_block nb;
>> >>>        edge_iterator ei;
>> >>>
>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>> >>> @@ -3322,7 +3550,6 @@ static void
>> >>>        /* We got here if we need to add a new jump insn.
>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>> >>> -      src_bb = e_fall->src;
>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>> >>>        if (nb)
>> >>>         {
>> >>> @@ -3330,17 +3557,6 @@ static void
>> >>>           bb->aux = nb;
>> >>>           /* Don't process this new block.  */
>> >>>           bb = nb;
>> >>> -
>> >>> -         /* Make sure new bb is tagged for correct section (same as
>> >>> -            fall-thru source, since you cannot fall-thru across
>> >>> -            section boundaries).  */
>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>> >>> -         if (flag_reorder_blocks_and_partition
>> >>> -             && targetm_common.have_named_sections
>> >>> -             && JUMP_P (BB_END (bb))
>> >>> -             && !any_condjump_p (BB_END (bb))
>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>> >>>         }
>> >>>      }
>> >>>
>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>> >>>             case NOTE_INSN_FUNCTION_BEG:
>> >>>               /* There is always just single entry to function.  */
>> >>>             case NOTE_INSN_BASIC_BLOCK:
>> >>> +              /* We should only switch text sections once.  */
>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>> >>>               break;
>> >>>
>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>> >>>               emit_note_copy (insn);
>> >>>               break;
>> >>>
>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>> >>>  }
>> >>>
>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>> >>> +   section notes.  */
>> >>>
>> >>>  void
>> >>> -cfg_layout_finalize (void)
>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>> >>>  {
>> >>>  #ifdef ENABLE_CHECKING
>> >>>    verify_flow_info ();
>> >>> @@ -3775,7 +3995,7 @@ void
>> >>>  #endif
>> >>>        )
>> >>>      fixup_fallthru_exit_predecessor ();
>> >>> -  fixup_reorder_chain ();
>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>> >>>
>> >>>    rebuild_jump_labels (get_insns ());
>> >>>    delete_dead_jumptables ();
>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>> >>>      return false;
>> >>>
>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>> >>>      return false;
>> >>>
>> >>>    if (!onlyjump_p (insn)
>> >>>
>> >>> --
>> >>> This patch is available for review at http://codereview.appspot.com/6823047



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 40421 bytes --]

Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code, contributed by Steven Bosscher, to
ensure that no cold block dominates a hot block.

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The code to fix up partitions (including the BB_PARTITION
assignment, region crossing jump notes, and switch text section notes) is
now centralized in a few locations, for example inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.
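
As a rough illustration of the centralization described above (a toy model,
not code from the patch), the sketch below keeps a crossing flag and a jump
note in sync inside the redirect routine itself. All toy_* names and struct
fields are hypothetical stand-ins for basic_block, edge, EDGE_CROSSING and
REG_CROSSING_JUMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's basic_block and edge; all names here are
   hypothetical and only model the fields this sketch needs.  */
enum toy_partition { TOY_HOT, TOY_COLD };

struct toy_block
{
  enum toy_partition partition;
  bool ends_in_jump;        /* Models JUMP_P (BB_END (bb)).  */
  bool has_crossing_note;   /* Models a REG_CROSSING_JUMP note.  */
};

struct toy_edge
{
  struct toy_block *src;
  struct toy_block *dest;
  bool crossing;            /* Models the EDGE_CROSSING flag.  */
};

/* Bring the crossing flag and the jump note back in sync with the
   partitions of the edge's endpoints.  */
static void
toy_fixup_partition_crossing (struct toy_edge *e)
{
  if (e->src->partition != e->dest->partition)
    {
      e->crossing = true;
      if (e->src->ends_in_jump)
        e->src->has_crossing_note = true;
    }
  else
    {
      e->crossing = false;
      e->src->has_crossing_note = false;
    }
}

/* Redirect E to TARGET.  The fixup happens here, once, so no caller
   has to repeat it.  */
static void
toy_redirect_edge (struct toy_edge *e, struct toy_block *target)
{
  assert (e != NULL && target != NULL);
  e->dest = target;
  toy_fixup_partition_crossing (e);
}
```

A caller only invokes toy_redirect_edge; the flag and the note are updated as
a side effect, which is the property the patch aims for in the real routines.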

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculation
of the dominance relation. However, as far as I can tell, the routines that
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are typically invoked once per optimization pass (or a small number of times
in the case of try_optimize_cfg). Additionally, I compared the
-ftime-report output for some large FDO compilations and saw only minimal
increases in the dominance computation times, which were only a tiny percentage
of the overall compile time.
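
The global cleanup can be pictured with a small standalone sketch (an
illustration under simplifying assumptions, not the patch's code): given an
immediate-dominator array, every block dominated by a cold block is pushed
into the cold partition. The idom and is_cold arrays and the toy_* names are
hypothetical; the real routine walks first_dom_son/next_dom_son and then
calls fixup_bb_partition on each re-marked block.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 16

/* idom[b] is the immediate dominator of block b, or -1 for the entry
   block.  is_cold[b] is the current partition assignment.  Both are
   hypothetical stand-ins for GCC's dominance info and BB_PARTITION.
   Returns the number of blocks that had to be re-marked cold.  */
static int
toy_fixup_partitions (int n, const int *idom, bool *is_cold)
{
  int worklist[TOY_MAX_BLOCKS];
  int top = 0;
  int fixed = 0;

  assert (n <= TOY_MAX_BLOCKS);

  /* Seed the worklist with every block that is already cold.  */
  for (int b = 0; b < n; b++)
    if (is_cold[b])
      worklist[top++] = b;

  while (top > 0)
    {
      int bb = worklist[--top];

      /* Every dominator-tree child of a cold block must be cold too.
         Each block is pushed at most once, so the worklist cannot
         overflow.  */
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && !is_cold[son])
          {
            is_cold[son] = true;
            worklist[top++] = son;
            fixed++;
          }
    }
  return fixed;
}
```

The linear scan for children keeps the sketch short; a real dominator tree
stores child links, so the walk is proportional to the number of blocks.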

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).
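
The difference between gating on the command-line option and gating on the
per-function flag might look like the following sketch. The struct and
function names are hypothetical, and the exact conditions used in
try_crossjump_to_edge and connect_traces may include further checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one global option and one per-function flag.  */
struct toy_function
{
  bool has_bb_partition;   /* Set only if blocks were actually split.  */
};

static bool toy_flag_reorder_blocks_and_partition = true;

/* Old style: the option alone disables the optimization, even for
   functions that ended up entirely hot.  */
static bool
toy_may_optimize_old (const struct toy_function *fn)
{
  (void) fn;
  return !toy_flag_reorder_blocks_and_partition;
}

/* New style: only functions that really have partitions are skipped.  */
static bool
toy_may_optimize_new (const struct toy_function *fn)
{
  assert (fn != NULL);
  return !fn->has_bb_partition;
}
```

With the option enabled, the old check rejects every function, while the new
check still allows the optimization for a function whose blocks all stayed hot.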

Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with the
SPEC2006 int benchmarks and internal Google benchmarks, using profile feedback
and -freorder-blocks-and-partition, to get more coverage. Ok for trunk?

Thanks,
Teresa

2012-11-26  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

	* cfghooks.h (cfg_layout_finalize): New parameter.
	* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
	parameter.
	* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
	as this is now done by redirect_edge_and_branch_force.
	* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
	barriers, new cfg_layout_finalize parameter, and don't store exit
	predecessor BB until after it is potentially split.
	* function.h (struct rtl_data): New flag has_bb_partition.
	* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
	* cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
	any blocks in function actually partitioned.
	(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
	up partitioning.
	* bb-reorder.c (connect_traces): Only look for partitions and skip
	block copying if any blocks in function actually partitioned.
	(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
	(find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
	that no cold blocks dominate a hot block.
	(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
	as this is now done by force_nonfallthru_and_redirect.
	(add_reg_crossing_jump_notes): Handle the fact that some jumps may
	already be marked with region crossing note.
	(reorder_basic_blocks): Only need to verify partitions if any
	blocks in function actually partitioned.
	(insert_section_boundary_note): Only need to insert note if any
	blocks in function actually partitioned.
	(rest_of_handle_reorder_blocks): New cfg_layout_finalize
	parameter, and remove call to insert_section_boundary_note as this
	is now called via cfg_layout_finalize/fixup_reorder_chain.
	(duplicate_computed_gotos): New cfg_layout_finalize
	parameter.
	(partition_hot_cold_basic_blocks): Set flag indicating function
	has bb partitions.
	* bb-reorder.h: Declare insert_section_boundary_note and
	emit_barrier_after_bb, which are no longer static.
	* basic-block.h: Declare new function fixup_partitions.
	* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
	(fixup_partition_crossing): New function.
	(fixup_bb_partition): Ditto.
	(rtl_redirect_edge_and_branch): Fixup partition boundaries.
	(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
	(rtl_split_edge): Correctly fixup partition boundaries.
	(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_block, and set up insertion point correctly since
        block may now end in a jump.
	(commit_edge_insertions): Invoke fixup_partitions to sanitize partition
        boundaries after optimizations that modify cfg and before trying to
        verify the flow info.
	(fixup_partitions): New function.
	(rtl_verify_flow_info_1): Add verification that no cold bbs dominate
        hot bbs.
	(record_effective_endpoints): Remove region-crossing notes and set flag
        indicating that they need to be reinserted on exit from cfglayout mode.
	(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
	(fixup_reorder_chain): Call insert_section_boundary_note if necessary.
        Remove old code that attempted to fixup region crossing note as
        this is now handled in force_nonfallthru_and_redirect.
	(duplicate_insn_chain): Don't duplicate switch section notes.
	(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
	(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.

Index: cfghooks.h
===================================================================
--- cfghooks.h	(revision 193827)
+++ cfghooks.h	(working copy)
@@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
 void account_profile_record (struct profile_record *, int);
 
 extern void cfg_layout_initialize (unsigned int);
-extern void cfg_layout_finalize (void);
+extern void cfg_layout_finalize (bool);
 
 /* Hooks containers.  */
 extern struct cfg_hooks gimple_cfg_hooks;
@@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
 extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
-
Index: modulo-sched.c
===================================================================
--- modulo-sched.c	(revision 193827)
+++ modulo-sched.c	(working copy)
@@ -3347,7 +3347,7 @@ rest_of_handle_sms (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
   free_dominance_info (CDI_DOMINATORS);
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 #endif /* INSN_SCHEDULING */
   return 0;
 }
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 193827)
+++ ifcvt.c	(working copy)
@@ -3899,10 +3899,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }
 
   num_true_changes++;
Index: function.c
===================================================================
--- function.c	(revision 193827)
+++ function.c	(working copy)
@@ -6246,8 +6246,10 @@ thread_prologue_and_epilogue_insns (void)
 		    break;
 		if (e)
 		  {
-		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-						  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
 		    BB_COPY_PARTITION (copy_bb, e->src);
 		  }
 		else
@@ -6472,7 +6474,7 @@ thread_prologue_and_epilogue_insns (void)
 	if (cur_bb->index >= NUM_FIXED_BLOCKS
 	    && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
 	  cur_bb->aux = cur_bb->next_bb;
-      cfg_layout_finalize ();
+      cfg_layout_finalize (false);
     }
 
 epilogue_done:
@@ -6514,7 +6516,7 @@ epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;
 
       gcc_assert (entry_edge != orig_entry_edge);
@@ -6542,6 +6544,12 @@ epilogue_done:
 	    else
 	      pending_edge_cold = e;
 	  }
+      
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;
 
       FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
 	{
Index: function.h
===================================================================
--- function.h	(revision 193827)
+++ function.h	(working copy)
@@ -451,6 +451,11 @@ struct GTY(()) rtl_data {
      sched2) and is useful only if the port defines LEAF_REGISTERS.  */
   bool uses_only_leaf_regs;
 
+  /* Nonzero if the function being compiled has undergone hot/cold partitioning
+     (under flag_reorder_blocks_and_partition) and has at least one cold
+     block.  */
+  bool has_bb_partition;
+
   /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
      asm.  Unlike regs_ever_live, elements of this array corresponding
      to eliminable regs (like the frame pointer) are set if an asm
Index: hw-doloop.c
===================================================================
--- hw-doloop.c	(revision 193827)
+++ hw-doloop.c	(working copy)
@@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
       else
 	bb->aux = NULL;
     }
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
   clear_aux_for_blocks ();
   df_analyze ();
 }
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 193827)
+++ cfgcleanup.c	(working copy)
@@ -1846,7 +1846,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (flag_reorder_blocks_and_partition && reload_completed)
+  if (crtl->has_bb_partition && reload_completed)
     return false;
 
   /* Search backward through forwarder blocks.  We don't need to worry
@@ -2789,10 +2789,21 @@ try_optimize_cfg (int mode)
 	      df_analyze ();
 	    }
 
+	  if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-	  if (changed)
-	    verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }
 
 	  changed_overall |= changed;
 	  first_pass = false;
Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 193827)
+++ bb-reorder.c	(working copy)
@@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
   current_partition = BB_PARTITION (traces[0].first);
   two_passes = false;
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     for (i = 0; i < n_traces && !two_passes; i++)
       if (BB_PARTITION (traces[0].first)
 	  != BB_PARTITION (traces[i].first))
@@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
 		      }
 		  }
 
-	      if (flag_reorder_blocks_and_partition)
+	      if (crtl->has_bb_partition)
 		try_copy = false;
 
 	      /* Copy tiny blocks always; copy larger blocks only when the
@@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
   return length;
 }
 
-/* Emit a barrier into the footer of BB.  */
+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
 
-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
 }
 
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
@@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
-  edge e;
-  edge_iterator ei;
+  edge e, e2;
+  edge_iterator ei, ei2;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;
+  vec<basic_block> bbs_newly_hot = vNULL;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          bbs_in_hot_partition.safe_push (bb);
+        }
     }
 
+  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
+     several different possibilities. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness.  */
+  if (cold_bb_count)
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      /* Keep examining hot bbs until we have either checked them all, or
+         re-marked all cold bbs hot.  */
+      while (! bbs_in_hot_partition.is_empty ()
+             && cold_bb_count)
+        {
+          basic_block dom_bb;
+
+          bb = bbs_in_hot_partition.pop ();
+          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+          /* If bb's immediate dominator is also hot then it is ok.  */
+          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked to hot.  */
+          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot dom_bb to see if it is also
+             dominated by a cold bb.  */
+          bbs_in_hot_partition.safe_push (dom_bb);
+
+          /* We should also adjust any cold blocks that the newly-hot bb
+             feeds and see if it makes sense to re-mark those as hot as
+             well.  */
+          bbs_newly_hot.safe_push (dom_bb);
+          while (! bbs_newly_hot.is_empty ())
+            {
+              basic_block new_hot_bb = bbs_newly_hot.pop ();
+              /* Examine all successors of this newly-hot bb to see if they
+                 are cold and should be re-marked as hot.  */
+              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
+                {
+                  bool any_cold_preds = false;
+                  basic_block succ = e->dest;
+                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
+                    continue;
+                  /* Does this block have any cold predecessors now?  */
+                  FOR_EACH_EDGE (e2, ei2, succ->preds)
+                  {
+                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
+                      {
+                        any_cold_preds = true;
+                        break;
+                      }
+                  }
+                  if (any_cold_preds)
+                    continue;
+
+                  /* Here we have a successor of newly-hot bb that is cold
+                     but no longer has any cold predecessors. Since the original
+                     assignment of our newly-hot bb was incorrect, this successor's
+                     assignment as cold is also suspect. Go ahead and re-mark it
+                     as hot now too. Better heuristics may be in order here.  */
+                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
+                  cold_bb_count--;
+                  bbs_in_hot_partition.safe_push (succ);
+                  /* Examine this successor as a newly-hot bb.  */
+                  bbs_newly_hot.safe_push (succ);
+                }
+            }
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
@@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
 		      new_bb->aux = cur_bb->aux;
 		      cur_bb->aux = new_bb;
 
-		      /* Make sure new fall-through bb is in same
-			 partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+		      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));
 
-		      BB_COPY_PARTITION (new_bb, cur_bb);
 		      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
 		    }
 		  else
@@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-	  && JUMP_P (BB_END (e->src)))
+	  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
 	add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }
 
@@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     verify_hot_cold_block_grouping ();
 }
 
@@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */
 
-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
   rtx new_note;
   int first_partition = 0;
 
-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;
 
   FOR_EACH_BB (bb)
@@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
   FOR_EACH_BB (bb)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
-  cfg_layout_finalize ();
+  cfg_layout_finalize (true);
 
-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
   return 0;
 }
 
@@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
     }
 
 done:
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   BITMAP_FREE (candidates);
   return 0;
@@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
   if (!crossing_edges.exists ())
     return 0;
 
+  crtl->has_bb_partition = true;
+
   /* Make sure the source of any crossing edge ends in a jump and the
      destination of any crossing edge has a label.  */
   add_labels_and_missing_jumps (crossing_edges);
Index: bb-reorder.h
===================================================================
--- bb-reorder.h	(revision 193827)
+++ bb-reorder.h	(working copy)
@@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
 
 extern int get_uncond_jump_length (void);
 
+extern void insert_section_boundary_note (void);
+
+extern void emit_barrier_after_bb (basic_block bb);
+
 #endif
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 193827)
+++ basic-block.h	(working copy)
@@ -800,6 +800,7 @@ extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void fixup_partitions (void);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 193827)
+++ cfgrtl.c	(working copy)
@@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
    Only applicable if the CFG is in cfglayout mode.  */
 static GTY(()) rtx cfg_layout_function_footer;
 static GTY(()) rtx cfg_layout_function_header;
+static bool had_sec_boundary_notes;
 
 static rtx skip_insns_after_block (basic_block);
 static void record_effective_endpoints (void);
 static rtx label_for_bb (basic_block);
-static void fixup_reorder_chain (void);
+static void fixup_reorder_chain (bool finalize_reorder_blocks);
 
 void verify_insn_chain (void);
 static void fixup_fallthru_exit_predecessor (void);
@@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;
 
   /* We can replace or remove a complex jump only when we have exactly
@@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
   return e;
 }
 
+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);
+
+  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (target))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the region crossing note from jump at end of
+         e->src if it exists.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        remove_note (BB_END (e->src), note);
+    }
+}
+
+/* Called when block BB has been reassigned to a different partition,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e, e->dest);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e, e->dest);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.
 
@@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;
 
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;
 
-  if (e->dest == target)
+  if (dest == target)
     return e;
 
   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret, target);
       return ret;
     }
 
@@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;
 
   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret, target);
   return ret;
 }
 
@@ -1486,18 +1555,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */
 
       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && JUMP_P (BB_END (jump_block))
-	  && !any_condjump_p (BB_END (jump_block))
-	  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
-	add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
 
       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
       new_edge->probability = probability;
       new_edge->count = count;
 
+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge, jump_block);
+
       /* Redirect old edge.  */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1553,13 +1620,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }
 
-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);
 
   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);
 
   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }
 
@@ -1658,7 +1734,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;
 
   /* Abnormal edges cannot be split.  */
@@ -1691,12 +1767,26 @@ rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest).  */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }
 
   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
 
+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1809,17 +1899,13 @@ commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);
 
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && e->src != ENTRY_BLOCK_PTR
-	  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-	  && !(e->flags & EDGE_CROSSING)
-	  && JUMP_P (after)
-	  && !any_condjump_p (after)
-	  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
-	add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+	before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }
 
   /* Now that we've found the spot, do the insertion.  */
@@ -1859,6 +1945,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;
 
+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2060,7 +2154,75 @@ get_last_bb_insn (basic_block bb)
 
   return end;
 }
-\f
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty  ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty  ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here.  */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              bbs_to_fix.safe_push (bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold.  */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we only update the region crossings the minimum number of
+     places, which can require forcing edges to be non fallthru.  */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.
 
@@ -2084,6 +2246,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;
 
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2201,6 +2364,8 @@ rtl_verify_flow_info_1 (void)
 
 	  if (e->flags & EDGE_ABNORMAL)
 	    n_abnormal++;
+
+          have_partitions |= is_crossing;
 	}
 
       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2325,6 +2490,40 @@ rtl_verify_flow_info_1 (void)
 	  }
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2997,14 +3196,41 @@ record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;
 
+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;
 
       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
-	BB_HEADER (bb) = unlink_insn_chain (next_insn,
-					      PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode.  */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+              {
+                had_sec_boundary_notes |= true;
+                /* Remove note from chain. Grab new next_insn first.  */
+                if (next_insn == check_insn)
+                  next_insn = NEXT_INSN (check_insn);
+                /* Delete note.  */
+                delete_insn (check_insn);
+                /* There will only be one.  */
+                break;
+              }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* Unlink any header instructions left after the above loop.  */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
 	BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3032,7 +3258,7 @@ outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
 
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   return 0;
 }
@@ -3152,10 +3378,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
 }
 \f
 
-/* Given a reorder chain, rearrange the code to match.  */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.  */
 
 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3182,7 +3411,7 @@ static void
 	  PREV_INSN (BB_HEADER (bb)) = insn;
 	  insn = BB_HEADER (bb);
 	  while (NEXT_INSN (insn))
-	    insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
 	}
       if (insn)
 	NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3207,6 +3436,11 @@ static void
     insn = NEXT_INSN (insn);
 
   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3219,7 +3453,7 @@ static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;
 
       if (EDGE_COUNT (bb->succs) == 0)
@@ -3354,7 +3588,6 @@ static void
       /* We got here if we need to add a new jump insn. 
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
@@ -3362,17 +3595,6 @@ static void
 	  bb->aux = nb;
 	  /* Don't process this new block.  */
 	  bb = nb;
-
-	  /* Make sure new bb is tagged for correct section (same as
-	     fall-thru source, since you cannot fall-thru across
-	     section boundaries).  */
-	  BB_COPY_PARTITION (src_bb, single_pred (bb));
-	  if (flag_reorder_blocks_and_partition
-	      && targetm_common.have_named_sections
-	      && JUMP_P (BB_END (bb))
-	      && !any_condjump_p (BB_END (bb))
-	      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-	    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
 	}
     }
 
@@ -3676,10 +3898,11 @@ duplicate_insn_chain (rtx from, rtx to)
 	    case NOTE_INSN_FUNCTION_BEG:
 	      /* There is always just single entry to function.  */
 	    case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      break;
 
 	    case NOTE_INSN_EPILOGUE_BEG:
-	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      emit_note_copy (insn);
 	      break;
 
@@ -3791,10 +4014,13 @@ break_superblocks (void)
 }
 
 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest.  */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes.  */
 
 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3807,7 +4033,7 @@ void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);
 
   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4486,8 +4712,7 @@ rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;
 
   if (!onlyjump_p (insn)
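
Note for readers of the verification hunk in rtl_verify_flow_info_1 above:
the invariant it enforces is that no block in the cold partition dominates a
block in the hot partition. The following is a minimal standalone sketch of
that invariant on a toy CFG, not GCC code. The names struct toy_cfg,
PART_HOT, PART_COLD, MAX_BLOCKS and find_hot_block_dominated_by_cold are
hypothetical and exist only for illustration, and the immediate-dominator
array is assumed to be precomputed. Unlike the patch, which walks down the
dominator tree from each cold block, this sketch walks up the dominator
chain from each hot block; both directions test the same property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum partition { PART_HOT, PART_COLD };

#define MAX_BLOCKS 64

/* Toy CFG summary: idom[i] is the index of block i's immediate
   dominator, or -1 for the entry block.  The dominator information
   is assumed to be computed elsewhere.  */
struct toy_cfg
{
  int n_blocks;
  int idom[MAX_BLOCKS];
  enum partition part[MAX_BLOCKS];
};

/* Return the index of the first hot block that has a cold block
   somewhere on its dominator chain, or -1 if the partitioning
   satisfies the invariant.  */
int
find_hot_block_dominated_by_cold (const struct toy_cfg *cfg)
{
  assert (cfg->n_blocks >= 0 && cfg->n_blocks <= MAX_BLOCKS);

  for (int bb = 0; bb < cfg->n_blocks; bb++)
    {
      if (cfg->part[bb] != PART_HOT)
        continue;

      /* Walk from the immediate dominator up to the entry block.  */
      for (int d = cfg->idom[bb]; d != -1; d = cfg->idom[d])
        if (cfg->part[d] == PART_COLD)
          return bb;
    }

  return -1;
}
```

In a three-block chain where block 0 (hot) dominates block 1 (cold), which
in turn dominates block 2 (hot), the function reports block 2. Re-marking
block 1 as hot, which is what the re-marking loop in
find_rarely_executed_basic_blocks_and_crossing_edges does to cold
dominators, makes the check pass.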


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-26 20:52         ` Teresa Johnson
@ 2012-11-28 15:49           ` Christophe Lyon
  2012-11-28 15:57             ` Teresa Johnson
       [not found]             ` <CAAe5K+UOyQrDyg=pY7za9YRK=8-3dVVsfcMuJdsJp4w2X6BaJg@mail.gmail.com>
  0 siblings, 2 replies; 35+ messages in thread
From: Christophe Lyon @ 2012-11-28 15:49 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Jack Howarth, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

I have updated my trunk checkout, and I can confirm that eval.c now
compiles with your patch (and the other 4 patches I added to PR55121).

Now, when looking at the whole Spec2k results:
- vpr passes now (used to fail)
- gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail
with the same error from gas:
can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
{.text section}
- gap still does not build (same error as above)

I haven't looked in detail, so I may be missing an obvious patch here.

And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.

Thanks
Christophe.


On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
> Sorry, I don't know what happened there. Patch is attached.
> Thanks,
> Teresa
>
> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>> Are you sure you have all my changes applied? I applied the 4 patches
>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>> pristine trunk checkout. I configured and built both for
>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>> and gcda file. I can reproduce the failure using the pristine trunk
>>> with your patches but not with my fixed trunk + your patches. (I just
>>> updated to head to pickup recent changes and get the same result. The
>>> vec changes required some manual changes to the patch, which I will
>>> resend shortly.)
>>
>> Teresa,
>>     Your mailer seems to have corrupted the posted patch with stray
>> =3D characters and line breaks. Can you repost a copy as an attachment
>> to the list?
>>              Jack
>>
>>>
>>> Without my fixes:
>>>
>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed
>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>> eval.c: In function ‘Ge’:
>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>  }
>>>  ^
>>> 0x622f71 df_compact_blocks()
>>> ../../gcc_trunk_3/gcc/df-core.c:1560
>>> 0x5cfcb5 compact_blocks()
>>> ../../gcc_trunk_3/gcc/cfg.c:162
>>> 0xc9dce0 reorder_basic_blocks
>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>> Please submit a full bug report,
>>> with preprocessed source if appropriate.
>>> Please include the complete backtrace with any bug report.
>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>
>>>
>>> With my fixes:
>>>
>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed
>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>> 2.4.2-p1, MPC version 0.8.1
>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>
>>>
>>> Thanks,
>>> Teresa
>>>
>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>> <christophe.lyon@linaro.org> wrote:
>>> > Hi,
>>> >
>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>> > several failures caused by:
>>> > "error: fallthru edge crosses section boundary", including the case
>>> > described in PR55121.
>>> >
>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>> >> Ping.
>>> >> Teresa
>>> >>
>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> >>> Revised patch that fixes failures encountered when enabling
>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>> >>>
>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>> >>> blocks contributed by Steven Bosscher.
>>> >>>
>>> >>> I attempted to make the handling of partition updates through the optimization
>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>> >>> assignment, region crossing jump notes, and switch text section notes) is
>>> >>> now handled in a few centralized locations. For example, inside
>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> >>> don't need to attempt the fixup themselves.
>>> >>>
>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> >>> mode that are not easy to fix up incrementally, the new routine
>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>> >>> of the dominance relation; however, as far as I can tell, the routines which
>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> >>> are invoked typically once (or a small number of times in the case of
>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>> >>> increases in the dominance computation times, which were only a tiny percent
>>> >>> of the overall compile time.
>>> >>>
>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> >>> any partitioning was actually performed, so that optimizations which were
>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>> >>> completely hot).
>>> >>>
>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>> >>>
>>> >>> Thanks,
>>> >>> Teresa
>>> >>>
>>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>>> >>>
>>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>> >>>         parameter.
>>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>> >>>         as this is now done by redirect_edge_and_branch_force.
>>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>> >>>         predecessor BB until after it is potentially split.
>>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>> >>>         any blocks in function actually partitioned.
>>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>> >>>         up partitioning.
>>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>> >>>         block copying if any blocks in function actually partitioned.
>>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>> >>>         that no cold blocks dominate a hot block.
>>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>> >>>         as this is now done by force_nonfallthru_and_redirect.
>>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>> >>>         already be marked with region crossing note.
>>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>>> >>>         blocks in function actually partitioned.
>>> >>>         (insert_section_boundary_note): Only need to insert note if any
>>> >>>         blocks in function actually partitioned.
>>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>> >>>         parameter, and remove call to insert_section_boundary_note as this
>>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>> >>>         parameter.
>>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>> >>>         has bb partitions.
>>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>> >>>         emit_barrier_after_bb, which are no longer static.
>>> >>>         * basic-block.h: Declare new function fixup_partitions.
>>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>> >>>         check for region crossing note.
>>> >>>         (fixup_partition_crossing): New function.
>>> >>>         (fixup_bb_partition): Ditto.
>>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>> >>>         remove old code that tried to do this. Emit barrier correctly
>>> >>>         when we are in cfglayout mode.
>>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>>> >>>         fixup region crossing edge since this is now handled in
>>> >>>         split_block, and set up insertion point correctly since
>>> >>>         block may now end in a jump.
>>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>> >>>         boundaries after optimizations that modify cfg and before trying to
>>> >>>         verify the flow info.
>>> >>>         (fixup_partitions): New function.
>>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>> >>>         hot bbs.
>>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>> >>>         Remove old code that attempted to fixup region crossing note as
>>> >>>         this is now handled in force_nonfallthru_and_redirect.
>>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>> >>>         note.
>>> >>>
>>> >>> Index: cfghooks.h
>>> >>> ===================================================================
>>> >>> --- cfghooks.h  (revision 193376)
>>> >>> +++ cfghooks.h  (working copy)
>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>> >>>  void account_profile_record (struct profile_record *, int);
>>> >>>
>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>> >>> -extern void cfg_layout_finalize (void);
>>> >>> +extern void cfg_layout_finalize (bool);
>>> >>>
>>> >>>  /* Hooks containers.  */
>>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>> >>>  extern void gimple_register_cfg_hooks (void);
>>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>>> >>> -
>>> >>> Index: modulo-sched.c
>>> >>> ===================================================================
>>> >>> --- modulo-sched.c      (revision 193376)
>>> >>> +++ modulo-sched.c      (working copy)
>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>> >>>        bb->aux = bb->next_bb;
>>> >>>    free_dominance_info (CDI_DOMINATORS);
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>  #endif /* INSN_SCHEDULING */
>>> >>>    return 0;
>>> >>>  }
>>> >>> Index: ifcvt.c
>>> >>> ===================================================================
>>> >>> --- ifcvt.c     (revision 193376)
>>> >>> +++ ifcvt.c     (working copy)
>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>> >>>    if (new_bb)
>>> >>>      {
>>> >>>        df_bb_replace (then_bb_index, new_bb);
>>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>> >>> -         we need to ensure that new_bb is in the same partition as
>>> >>> -         test bb (you can not fall through across section boundaries).  */
>>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>> >>>      }
>>> >>>
>>> >>>    num_true_changes++;
>>> >>> Index: function.c
>>> >>> ===================================================================
>>> >>> --- function.c  (revision 193376)
>>> >>> +++ function.c  (working copy)
>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>> >>>                     break;
>>> >>>                 if (e)
>>> >>>                   {
>>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>> >>> -                                                 NULL_RTX, e->src);
>>> >>> +                    /* Make sure we insert after any barriers.  */
>>> >>> +                    rtx end = get_last_bb_insn (e->src);
>>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>> >>> +                                                  NULL_RTX, e->src);
>>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>> >>>                   }
>>> >>>                 else
>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>> >>>           cur_bb->aux = cur_bb->next_bb;
>>> >>> -      cfg_layout_finalize ();
>>> >>> +      cfg_layout_finalize (false);
>>> >>>      }
>>> >>>
>>> >>>  epilogue_done:
>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>> >>>        basic_block simple_return_block_cold = NULL;
>>> >>>        edge pending_edge_hot = NULL;
>>> >>>        edge pending_edge_cold = NULL;
>>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> >>> +      basic_block exit_pred;
>>> >>>        int i;
>>> >>>
>>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>> >>>             else
>>> >>>               pending_edge_cold = e;
>>> >>>           }
>>> >>> +
>>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>>> >>> +         inserting new BBs at the end of the function. Do this
>>> >>> +         after the call to split_block above which may split
>>> >>> +         the original exit pred.  */
>>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> >>>
>>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>> >>>         {
>>> >>> Index: function.h
>>> >>> ===================================================================
>>> >>> --- function.h  (revision 193376)
>>> >>> +++ function.h  (working copy)
>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>> >>>    bool uses_only_leaf_regs;
>>> >>>
>>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>> >>> +     block.  */
>>> >>> +  bool has_bb_partition;
>>> >>> +
>>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>>> >>> Index: hw-doloop.c
>>> >>> ===================================================================
>>> >>> --- hw-doloop.c (revision 193376)
>>> >>> +++ hw-doloop.c (working copy)
>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>> >>>        else
>>> >>>         bb->aux = NULL;
>>> >>>      }
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>    clear_aux_for_blocks ();
>>> >>>    df_analyze ();
>>> >>>  }
>>> >>> Index: cfgcleanup.c
>>> >>> ===================================================================
>>> >>> --- cfgcleanup.c        (revision 193376)
>>> >>> +++ cfgcleanup.c        (working copy)
>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>> >>>       partition boundaries).  See the comments at the top of
>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>> >>>
>>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>> >>> +  if (crtl->has_bb_partition && reload_completed)
>>> >>>      return false;
>>> >>>
>>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>> >>>               df_analyze ();
>>> >>>             }
>>> >>>
>>> >>> +         if (changed)
>>> >>> +            {
>>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>>> >>> +                 reached by both hot and cold blocks to become dominated only
>>> >>> +                 by cold blocks. This will cause the verification below to fail,
>>> >>> +                 and lead to now cold code in the hot section. This is not easy
>>> >>> +                 to detect and fix during edge forwarding, and in some cases
>>> >>> +                 is only visible after newly unreachable blocks are deleted,
>>> >>> +                 which will be done in fixup_partitions.  */
>>> >>> +              fixup_partitions ();
>>> >>> +
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>> -         if (changed)
>>> >>> -           verify_flow_info ();
>>> >>> +              verify_flow_info ();
>>> >>>  #endif
>>> >>> +            }
>>> >>>
>>> >>>           changed_overall |= changed;
>>> >>>           first_pass = false;
>>> >>> Index: bb-reorder.c
>>> >>> ===================================================================
>>> >>> --- bb-reorder.c        (revision 193376)
>>> >>> +++ bb-reorder.c        (working copy)
>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>> >>>    current_partition = BB_PARTITION (traces[0].first);
>>> >>>    two_passes = false;
>>> >>>
>>> >>> -  if (flag_reorder_blocks_and_partition)
>>> >>> +  if (crtl->has_bb_partition)
>>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>>> >>>        if (BB_PARTITION (traces[0].first)
>>> >>>           != BB_PARTITION (traces[i].first))
>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>> >>>                       }
>>> >>>                   }
>>> >>>
>>> >>> -             if (flag_reorder_blocks_and_partition)
>>> >>> +             if (crtl->has_bb_partition)
>>> >>>                 try_copy = false;
>>> >>>
>>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>> >>>    return length;
>>> >>>  }
>>> >>>
>>> >>> -/* Emit a barrier into the footer of BB.  */
>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>> >>>
>>> >>> -static void
>>> >>> +void
>>> >>>  emit_barrier_after_bb (basic_block bb)
>>> >>>  {
>>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> >>>  }
>>> >>>
>>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>> >>>  {
>>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>>> >>>    basic_block bb;
>>> >>> -  edge e;
>>> >>> -  edge_iterator ei;
>>> >>> +  edge e, e2;
>>> >>> +  edge_iterator ei, ei2;
>>> >>> +  unsigned int cold_bb_count = 0;
>>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>> >>>
>>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      {
>>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> >>> +        {
>>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> >>> +          cold_bb_count++;
>>> >>> +        }
>>> >>>        else
>>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> >>> +        {
>>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>> >>> +        }
>>> >>>      }
>>> >>>
>>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>> >>> +     several different possibilities. One is that there are edge weight insanities
>>> >>> +     due to optimization phases that do not properly update basic block profile
>>> >>> +     counts. The second is that the entry of the function may not be hot, because
>>> >>> +     it is entered fewer times than the number of profile training runs, but there
>>> >>> +     is a loop inside the function that causes blocks within the function to be
>>> >>> +     above the threshold for hotness.  */
>>> >>> +  if (cold_bb_count)
>>> >>> +    {
>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>>> >>> +         re-marked all cold bbs hot.  */
>>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>> >>> +             && cold_bb_count)
>>> >>> +        {
>>> >>> +          basic_block dom_bb;
>>> >>> +
>>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>> >>> +
>>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>> >>> +            continue;
>>> >>> +
>>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>>> >>> +             The dominator needs to be re-marked to hot.  */
>>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>> >>> +          cold_bb_count--;
>>> >>> +
>>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>> >>> +             dominated by a cold bb.  */
>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>> >>> +
>>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>>> >>> +             well.  */
>>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>> >>> +            {
>>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>>> >>> +                 are cold and should be re-marked as hot.  */
>>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>> >>> +                {
>>> >>> +                  bool any_cold_preds = false;
>>> >>> +                  basic_block succ = e->dest;
>>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>> >>> +                    continue;
>>> >>> +                  /* Does this block have any cold predecessors now?  */
>>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>> >>> +                  {
>>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>> >>> +                      {
>>> >>> +                        any_cold_preds = true;
>>> >>> +                        break;
>>> >>> +                      }
>>> >>> +                  }
>>> >>> +                  if (any_cold_preds)
>>> >>> +                    continue;
>>> >>> +
>>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>>> >>> +                     but no longer has any cold predecessors. Since the original
>>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>> >>> +                  cold_bb_count--;
>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>> >>> +                }
>>> >>> +            }
>>> >>> +        }
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>> >>> +    }
>>> >>> +
>>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>>> >>>       be in a different partition as the throw.  Fix this by either
>>> >>>       moving or duplicating the landing pads.  */
>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>> >>>                       new_bb->aux = cur_bb->aux;
>>> >>>                       cur_bb->aux = new_bb;
>>> >>>
>>> >>> -                     /* Make sure new fall-through bb is in same
>>> >>> -                        partition as bb it's falling through from.  */
>>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>>> >>> +                                  == BB_PARTITION (cur_bb));
>>> >>>
>>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>> >>>                     }
>>> >>>                   else
>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>> >>>        if ((e->flags & EDGE_CROSSING)
>>> >>> -         && JUMP_P (BB_END (e->src)))
>>> >>> +         && JUMP_P (BB_END (e->src))
>>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>> >>> +             force_nonfallthru_and_redirect.  */
>>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>>  }
>>> >>>
>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>> >>>        dump_flow_info (dump_file, dump_flags);
>>> >>>      }
>>> >>>
>>> >>> -  if (flag_reorder_blocks_and_partition)
>>> >>> +  if (crtl->has_bb_partition)
>>> >>>      verify_hot_cold_block_grouping ();
>>> >>>  }
>>> >>>
>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>> >>>     encountering this note will make the compiler switch between the
>>> >>>     hot and cold text sections.  */
>>> >>>
>>> >>> -static void
>>> >>> +void
>>> >>>  insert_section_boundary_note (void)
>>> >>>  {
>>> >>>    basic_block bb;
>>> >>>    rtx new_note;
>>> >>>    int first_partition = 0;
>>> >>>
>>> >>> -  if (!flag_reorder_blocks_and_partition)
>>> >>> +  if (!crtl->has_bb_partition)
>>> >>>      return;
>>> >>>
>>> >>>    FOR_EACH_BB (bb)
>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>> >>>        bb->aux = bb->next_bb;
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (true);
>>> >>>
>>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> >>> -  insert_section_boundary_note ();
>>> >>>    return 0;
>>> >>>  }
>>> >>>
>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>> >>>      }
>>> >>>
>>> >>>  done:
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>
>>> >>>    BITMAP_FREE (candidates);
>>> >>>    return 0;
>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>> >>>    if (crossing_edges == NULL)
>>> >>>      return 0;
>>> >>>
>>> >>> +  crtl->has_bb_partition = true;
>>> >>> +
>>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>>> >>>       destination of any crossing edge has a label.  */
>>> >>>    add_labels_and_missing_jumps (crossing_edges);
>>> >>> Index: bb-reorder.h
>>> >>> ===================================================================
>>> >>> --- bb-reorder.h        (revision 193376)
>>> >>> +++ bb-reorder.h        (working copy)
>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>> >>>
>>> >>>  extern int get_uncond_jump_length (void);
>>> >>>
>>> >>> +extern void insert_section_boundary_note (void);
>>> >>> +
>>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>>> >>> +
>>> >>>  #endif
>>> >>> Index: basic-block.h
>>> >>> ===================================================================
>>> >>> --- basic-block.h       (revision 193376)
>>> >>> +++ basic-block.h       (working copy)
>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>>> >>>  extern bool forwarder_block_p (const_basic_block);
>>> >>>  extern bool can_fallthru (basic_block, basic_block);
>>> >>> +extern void fixup_partitions (void);
>>> >>>
>>> >>>  /* In cfgbuild.c.  */
>>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>>> >>> Index: cfgrtl.c
>>> >>> ===================================================================
>>> >>> --- cfgrtl.c    (revision 193376)
>>> >>> +++ cfgrtl.c    (working copy)
>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>> >>>  #include "tree.h"
>>> >>>  #include "hard-reg-set.h"
>>> >>>  #include "basic-block.h"
>>> >>> +#include "bb-reorder.h"
>>> >>>  #include "regs.h"
>>> >>>  #include "flags.h"
>>> >>>  #include "function.h"
>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>>> >>>  static GTY(()) rtx cfg_layout_function_header;
>>> >>> +static bool had_sec_boundary_notes;
>>> >>>
>>> >>>  static rtx skip_insns_after_block (basic_block);
>>> >>>  static void record_effective_endpoints (void);
>>> >>>  static rtx label_for_bb (basic_block);
>>> >>> -static void fixup_reorder_chain (void);
>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>> >>>
>>> >>>  void verify_insn_chain (void);
>>> >>>  static void fixup_fallthru_exit_predecessor (void);
>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>> >>>       partition boundaries).  See  the comments at the top of
>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>> >>>
>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>> >>>      return NULL;
>>> >>>
>>> >>>    /* We can replace or remove a complex jump only when we have exactly
>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>> >>>    return e;
>>> >>>  }
>>> >>>
>>> >>> +/* Called when edge E has been redirected to a new destination,
>>> >>> +   in order to update the region crossing flag on the edge and
>>> >>> +   jump.  */
>>> >>> +
>>> >>> +static void
>>> >>> +fixup_partition_crossing (edge e, basic_block target)
>>> >>> +{
>>> >>> +  rtx note;
>>> >>> +
>>> >>> +  gcc_assert (e->dest == target);
>>> >>> +
>>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>> >>> +    return;
>>> >>> +  /* If we redirected an existing edge, it may already be marked
>>> >>> +     crossing, even though the new src is missing a reg crossing note.
>>> >>> +     But make sure reg crossing note doesn't already exist before
>>> >>> +     inserting.  */
>>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>> >>> +    {
>>> >>> +      e->flags |= EDGE_CROSSING;
>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +      if (JUMP_P (BB_END (e->src))
>>> >>> +          && !note)
>>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +    }
>>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>> >>> +    {
>>> >>> +      e->flags &= ~EDGE_CROSSING;
>>> >>> +      /* Remove the region crossing note from jump at end of
>>> >>> +         e->src if it exists.  */
>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +      if (note)
>>> >>> +        remove_note (BB_END (e->src), note);
>>> >>> +    }
>>> >>> +}
>>> >>> +
>>> >>> +/* Called when block BB has been reassigned to a different partition,
>>> >>> +   to ensure that the region crossing attributes are updated.  */
>>> >>> +
>>> >>> +static void
>>> >>> +fixup_bb_partition (basic_block bb)
>>> >>> +{
>>> >>> +  edge e;
>>> >>> +  edge_iterator ei;
>>> >>> +
>>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>> >>> +    {
>>> >>> +      fixup_partition_crossing (e, e->dest);
>>> >>> +    }
>>> >>> +
>>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>>> >>> +     or remove stale region crossing.  */
>>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>> >>> +    {
>>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>> >>> +        force_nonfallthru (e);
>>> >>> +      else
>>> >>> +        fixup_partition_crossing (e, e->dest);
>>> >>> +    }
>>> >>> +}
>>> >>> +
>>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>> >>>     expense of adding new instructions or reordering basic blocks.
>>> >>>
>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>> >>>  {
>>> >>>    edge ret;
>>> >>>    basic_block src = e->src;
>>> >>> +  basic_block dest = e->dest;
>>> >>>
>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>> >>>      return NULL;
>>> >>>
>>> >>> -  if (e->dest == target)
>>> >>> +  if (dest == target)
>>> >>>      return e;
>>> >>>
>>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>> >>>      {
>>> >>>        df_set_bb_dirty (src);
>>> >>> +      fixup_partition_crossing (ret, target);
>>> >>>        return ret;
>>> >>>      }
>>> >>>
>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>> >>>      return NULL;
>>> >>>
>>> >>>    df_set_bb_dirty (src);
>>> >>> +  fixup_partition_crossing (ret, target);
>>> >>>    return ret;
>>> >>>  }
>>> >>>
>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>>> >>>
>>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>>> >>> -      if (flag_reorder_blocks_and_partition
>>> >>> -         && targetm_common.have_named_sections
>>> >>> -         && JUMP_P (BB_END (jump_block))
>>> >>> -         && !any_condjump_p (BB_END (jump_block))
>>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>> >>>
>>> >>>        /* Wire edge in.  */
>>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>> >>>        new_edge->probability = probability;
>>> >>>        new_edge->count = count;
>>> >>>
>>> >>> +      /* If e->src was previously region crossing, it no longer is
>>> >>> +         and the reg crossing note should be removed.  */
>>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>>> >>> +
>>> >>>        /* Redirect old edge.  */
>>> >>>        redirect_edge_pred (e, jump_block);
>>> >>>        e->probability = REG_BR_PROB_BASE;
>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>> >>>        LABEL_NUSES (label)++;
>>> >>>      }
>>> >>>
>>> >>> -  emit_barrier_after (BB_END (jump_block));
>>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>> >>> +     insert the barrier correctly.  */
>>> >>> +  emit_barrier_after_bb (jump_block);
>>> >>>    redirect_edge_succ_nodup (e, target);
>>> >>>
>>> >>>    if (abnormal_edge_flags)
>>> >>>      make_edge (src, target, abnormal_edge_flags);
>>> >>>
>>> >>>    df_mark_solutions_dirty ();
>>> >>> +  fixup_partition_crossing (e, target);
>>> >>>    return new_bb;
>>> >>>  }
>>> >>>
>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>> >>>  static basic_block
>>> >>>  rtl_split_edge (edge edge_in)
>>> >>>  {
>>> >>> -  basic_block bb;
>>> >>> +  basic_block bb, new_bb;
>>> >>>    rtx before;
>>> >>>
>>> >>>    /* Abnormal edges cannot be split.  */
>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>> >>>    else
>>> >>>      {
>>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>> >>> +      else
>>> >>> +        /* Put the split bb into the src partition, to avoid creating
>>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>>> >>> +           where src is cold and dest is hot. The src will dominate
>>> >>> +           the new bb (whereas it might not have dominated dest).  */
>>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>> >>>      }
>>> >>>
>>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>> >>>
>>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>> >>> +    {
>>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>> >>> +      gcc_assert (!new_bb);
>>> >>> +    }
>>> >>> +
>>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>>> >>>       jump instruction to target our new block.  */
>>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>> >>>    else
>>> >>>      {
>>> >>>        bb = split_edge (e);
>>> >>> -      after = BB_END (bb);
>>> >>>
>>> >>> -      if (flag_reorder_blocks_and_partition
>>> >>> -         && targetm_common.have_named_sections
>>> >>> -         && e->src != ENTRY_BLOCK_PTR
>>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>> >>> -         && !(e->flags & EDGE_CROSSING)
>>> >>> -         && JUMP_P (after)
>>> >>> -         && !any_condjump_p (after)
>>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>>> >>> +      if (JUMP_P (BB_END (bb)))
>>> >>> +       before = BB_END (bb);
>>> >>> +      else
>>> >>> +        after = BB_END (bb);
>>> >>>      }
>>> >>>
>>> >>>    /* Now that we've found the spot, do the insertion.  */
>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>> >>>  {
>>> >>>    basic_block bb;
>>> >>>
>>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>> >>> +     previously reached by both hot and cold blocks to become dominated only
>>> >>> +     by cold blocks. This will cause the verification below to fail,
>>> >>> +     and lead to now cold code in the hot section. In some cases this
>>> >>> +     may only be visible after newly unreachable blocks are deleted,
>>> >>> +     which will be done by fixup_partitions.  */
>>> >>> +  fixup_partitions ();
>>> >>> +
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>>    verify_flow_info ();
>>> >>>  #endif
>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>> >>>
>>> >>>    return end;
>>> >>>  }
>>> >>> -
>>> >>> +
>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>> >>> +   passes that modify the cfg.  */
>>> >>> +
>>> >>> +void
>>> >>> +fixup_partitions (void)
>>> >>> +{
>>> >>> +  basic_block bb;
>>> >>> +
>>> >>> +  if (!crtl->has_bb_partition)
>>> >>> +    return;
>>> >>> +
>>> >>> +  /* Delete any blocks that became unreachable and weren't
>>> >>> +     already cleaned up, for example during edge forwarding
>>> >>> +     and convert_jumps_to_returns. This will expose more
>>> >>> +     opportunities for fixing the partition boundaries here.
>>> >>> +     Also, the calculation of the dominance graph during verification
>>> >>> +     will assert if there are unreachable nodes.  */
>>> >>> +  delete_unreachable_blocks ();
>>> >>> +
>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>>> >>> +     Fixup any that now violate this requirement, as a result of edge
>>> >>> +     forwarding and unreachable block deletion.  */
>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>> >>> +  FOR_EACH_BB (bb)
>>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +    {
>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> >>> +      basic_block son;
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +        {
>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> >>> +          /* If bb is not yet cold (because it was added below as
>>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> >>> +            {
>>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>> >>> +            }
>>> >>> +          /* Any blocks dominated by a block in the cold section
>>> >>> +             must also be cold.  */
>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> >>> +               son;
>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> >>> +        }
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>> >>> +    }
>>> >>> +
>>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>> >>> +     cold, so that we only update the region crossings the minimum number of
>>> >>> +     places, which can require forcing edges to be non fallthru.  */
>>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>> >>> +    {
>>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>> >>> +      fixup_bb_partition (bb);
>>> >>> +    }
>>> >>> +}
>>> >>> +
>>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>> >>>     cfglayout RTL.
>>> >>>
>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>> >>>    rtx x;
>>> >>>    int err = 0;
>>> >>>    basic_block bb;
>>> >>> +  bool have_partitions = false;
>>> >>>
>>> >>>    /* Check the general integrity of the basic blocks.  */
>>> >>>    FOR_EACH_BB_REVERSE (bb)
>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>> >>>
>>> >>>           if (e->flags & EDGE_ABNORMAL)
>>> >>>             n_abnormal++;
>>> >>> +
>>> >>> +          have_partitions |= is_crossing;
>>> >>>         }
>>> >>>
>>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>> >>>           }
>>> >>>      }
>>> >>>
>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>> >>> +  if (have_partitions && !err)
>>> >>> +    FOR_EACH_BB (bb)
>>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +    {
>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>> >>> +      basic_block son;
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>> >>> +
>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>> >>> +        {
>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>> >>> +            {
>>> >>> +              error ("non-cold basic block %d dominated "
>>> >>> +                     "by a block in the cold partition", bb->index);
>>> >>> +              err = 1;
>>> >>> +            }
>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>> >>> +               son;
>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>> >>> +        }
>>> >>> +
>>> >>> +      if (dom_calculated_here)
>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>> >>> +    }
>>> >>> +
>>> >>>    /* Clean up.  */
>>> >>>    return err;
>>> >>>  }
>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>> >>>    else
>>> >>>      cfg_layout_function_header = NULL_RTX;
>>> >>>
>>> >>> +  had_sec_boundary_notes = false;
>>> >>> +
>>> >>>    next_insn = get_insns ();
>>> >>>    FOR_EACH_BB (bb)
>>> >>>      {
>>> >>>        rtx end;
>>> >>>
>>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>>> >>> +        {
>>> >>> +          /* Rather than try to keep section boundary notes incrementally
>>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>>> >>> +             and flag that they should be re-inserted when exiting
>>> >>> +             cfg layout mode.  */
>>> >>> +          rtx check_insn = next_insn;
>>> >>> +          while (check_insn)
>>> >>> +            {
>>> >>> +              if (NOTE_P (check_insn)
>>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>> >>> +              {
>>> >>> +                had_sec_boundary_notes |= true;
>>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>>> >>> +                if (next_insn == check_insn)
>>> >>> +                  next_insn = NEXT_INSN (check_insn);
>>> >>> +                /* Delete note.  */
>>> >>> +                delete_insn (check_insn);
>>> >>> +                /* There will only be one.  */
>>> >>> +                break;
>>> >>> +              }
>>> >>> +              check_insn = NEXT_INSN (check_insn);
>>> >>> +            }
>>> >>> +          /* If we still have header instructions left after above loop.  */
>>> >>> +          if (next_insn != BB_HEAD (bb))
>>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>>> >>> +        }
>>> >>>        end = skip_insns_after_block (bb);
>>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>> >>>        bb->aux = bb->next_bb;
>>> >>>
>>> >>> -  cfg_layout_finalize ();
>>> >>> +  cfg_layout_finalize (false);
>>> >>>
>>> >>>    return 0;
>>> >>>  }
>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>> >>>  }
>>> >>>
>>> >>>
>>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>>> >>> +/* Given a reorder chain, rearrange the code to match. If
>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>> >>> +   section boundary notes were removed on entry to cfg layout
>>> >>> +   mode, insert section boundary notes here.  */
>>> >>>
>>> >>>  static void
>>> >>> -fixup_reorder_chain (void)
>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>> >>>  {
>>> >>>    basic_block bb;
>>> >>>    rtx insn = NULL;
>>> >>> @@ -3150,7 +3373,7 @@ static void
>>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>> >>>           insn = BB_HEADER (bb);
>>> >>>           while (NEXT_INSN (insn))
>>> >>> -           insn = NEXT_INSN (insn);
>>> >>> +            insn = NEXT_INSN (insn);
>>> >>>         }
>>> >>>        if (insn)
>>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>> >>> @@ -3175,6 +3398,11 @@ static void
>>> >>>      insn = NEXT_INSN (insn);
>>> >>>
>>> >>>    set_last_insn (insn);
>>> >>> +
>>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>> >>> +    insert_section_boundary_note ();
>>> >>> +
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>>    verify_insn_chain ();
>>> >>>  #endif
>>> >>> @@ -3187,7 +3415,7 @@ static void
>>> >>>        edge e_fall, e_taken, e;
>>> >>>        rtx bb_end_insn;
>>> >>>        rtx ret_label = NULL_RTX;
>>> >>> -      basic_block nb, src_bb;
>>> >>> +      basic_block nb;
>>> >>>        edge_iterator ei;
>>> >>>
>>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>>> >>> @@ -3322,7 +3550,6 @@ static void
>>> >>>        /* We got here if we need to add a new jump insn.
>>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>> >>> -      src_bb = e_fall->src;
>>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>> >>>        if (nb)
>>> >>>         {
>>> >>> @@ -3330,17 +3557,6 @@ static void
>>> >>>           bb->aux = nb;
>>> >>>           /* Don't process this new block.  */
>>> >>>           bb = nb;
>>> >>> -
>>> >>> -         /* Make sure new bb is tagged for correct section (same as
>>> >>> -            fall-thru source, since you cannot fall-thru across
>>> >>> -            section boundaries).  */
>>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>> >>> -         if (flag_reorder_blocks_and_partition
>>> >>> -             && targetm_common.have_named_sections
>>> >>> -             && JUMP_P (BB_END (bb))
>>> >>> -             && !any_condjump_p (BB_END (bb))
>>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>> >>>         }
>>> >>>      }
>>> >>>
>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>> >>>             case NOTE_INSN_FUNCTION_BEG:
>>> >>>               /* There is always just single entry to function.  */
>>> >>>             case NOTE_INSN_BASIC_BLOCK:
>>> >>> +              /* We should only switch text sections once.  */
>>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>> >>>               break;
>>> >>>
>>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>> >>>               emit_note_copy (insn);
>>> >>>               break;
>>> >>>
>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>> >>>  }
>>> >>>
>>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>> >>> +   section notes.  */
>>> >>>
>>> >>>  void
>>> >>> -cfg_layout_finalize (void)
>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>> >>>  {
>>> >>>  #ifdef ENABLE_CHECKING
>>> >>>    verify_flow_info ();
>>> >>> @@ -3775,7 +3995,7 @@ void
>>> >>>  #endif
>>> >>>        )
>>> >>>      fixup_fallthru_exit_predecessor ();
>>> >>> -  fixup_reorder_chain ();
>>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>> >>>
>>> >>>    rebuild_jump_labels (get_insns ());
>>> >>>    delete_dead_jumptables ();
>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>> >>>      return false;
>>> >>>
>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>> >>>      return false;
>>> >>>
>>> >>>    if (!onlyjump_p (insn)
>>> >>>
>>> >>> --
>>> >>> This patch is available for review at http://codereview.appspot.com/6823047
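
[Editorial sketch, not part of the patch] The invariant that fixup_partitions
restores in the diff above (a block in the cold partition must not dominate a
block in the hot partition) can be illustrated on a toy dominator tree. The
sketch below is a simplified stand-in, not GCC code: struct toy_cfg,
toy_cold_dominates_hot and toy_fixup_partitions are hypothetical names, the
dominator tree is assumed to be given as an array of immediate dominators, and
blocks are re-marked when pushed rather than when popped as in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum toy_partition { TOY_HOT, TOY_COLD };

#define TOY_MAX_BLOCKS 16

/* Toy CFG summary: idom[i] is the immediate dominator of block i,
   or -1 for the entry block.  This layout is an assumption of the
   sketch; GCC stores dominance information differently.  */
struct toy_cfg
{
  int n_blocks;
  int idom[TOY_MAX_BLOCKS];
  enum toy_partition part[TOY_MAX_BLOCKS];
};

/* Return true if some cold block dominates a hot block.  Checking
   immediate dominators is enough: along any dominator chain from a
   hot block up to a cold block, some hot block has a cold immediate
   dominator.  */
bool
toy_cold_dominates_hot (const struct toy_cfg *cfg)
{
  for (int bb = 0; bb < cfg->n_blocks; bb++)
    if (cfg->idom[bb] >= 0
        && cfg->part[bb] == TOY_HOT
        && cfg->part[cfg->idom[bb]] == TOY_COLD)
      return true;
  return false;
}

/* Propagate coldness down the dominator tree with a worklist, in the
   spirit of fixup_partitions.  Returns the number of blocks that were
   re-marked from hot to cold.  */
int
toy_fixup_partitions (struct toy_cfg *cfg)
{
  int worklist[TOY_MAX_BLOCKS];
  int top = 0;
  int changed = 0;

  assert (cfg->n_blocks <= TOY_MAX_BLOCKS);

  /* Seed the worklist with every block that is already cold.  */
  for (int bb = 0; bb < cfg->n_blocks; bb++)
    if (cfg->part[bb] == TOY_COLD)
      worklist[top++] = bb;

  /* Each block is pushed at most once, so the worklist cannot
     overflow: either it was cold initially, or it is pushed at the
     moment it is re-marked cold.  */
  while (top > 0)
    {
      int bb = worklist[--top];
      for (int son = 0; son < cfg->n_blocks; son++)
        if (cfg->idom[son] == bb && cfg->part[son] != TOY_COLD)
          {
            cfg->part[son] = TOY_COLD;
            changed++;
            worklist[top++] = son;
          }
    }
  return changed;
}
```

In the real patch, the blocks collected this way are afterwards passed to
fixup_bb_partition so that the EDGE_CROSSING flags and REG_CROSSING_JUMP notes
on their incoming and outgoing edges are updated; the toy version omits that
step.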
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-28 15:49           ` Christophe Lyon
@ 2012-11-28 15:57             ` Teresa Johnson
  2012-11-28 17:03               ` Christophe Lyon
       [not found]             ` <CAAe5K+UOyQrDyg=pY7za9YRK=8-3dVVsfcMuJdsJp4w2X6BaJg@mail.gmail.com>
  1 sibling, 1 reply; 35+ messages in thread
From: Teresa Johnson @ 2012-11-28 15:57 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: Jack Howarth, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

Is this with the same target compiler and options used in PR55121? If
so, I will try to reproduce the compile-time failures on ARM with those
options. I haven't seen them with SPEC2006 on Linux x86_64. I'm not
sure how to test the runtime behavior, though.

Thanks,
Teresa

On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> I have updated my trunk checkout, and I can confirm that eval.c now
> compiles with your patch (and the other 4 patches I added to PR55121).
>
> Now, when looking at the whole Spec2k results:
> - vpr passes now (used to fail)
> - gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail
> with the same error from gas:
> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
> {.text section}
> - gap still does not build (same error as above)
>
> I haven't looked in detail, so I may be missing an obvious patch here.
>
> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.
>
> Thanks
> Christophe.
>
>
> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
>> Sorry, I don't know what happened there. Patch is attached.
>> Thanks,
>> Teresa
>>
>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>>> Are you sure you have all my changes applied? I applied the 4 patches
>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>>> pristine trunk checkout. I configured and built both for
>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>>> and gcda file. I can reproduce the failure using the pristine trunk
>>>> with your patches but not with my fixed trunk + your patches. (I just
>>>> updated to head to pick up recent changes and get the same result. The
>>>> vec changes required some manual changes to the patch, which I will
>>>> resend shortly.)
>>>
>>> Teresa,
>>>     Your mailer seems to have corrupted the posted patch with stray
>>> =3D characters and line breaks. Can you repost a copy as an attachment
>>> to the list?
>>>              Jack
>>>
>>>>
>>>> Without my fixes:
>>>>
>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet
>>>> -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>>> eval.c: In function ‘Ge’:
>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>>  }
>>>>  ^
>>>> 0x622f71 df_compact_blocks()
>>>> ../../gcc_trunk_3/gcc/df-core.c:1560
>>>> 0x5cfcb5 compact_blocks()
>>>> ../../gcc_trunk_3/gcc/cfg.c:162
>>>> 0xc9dce0 reorder_basic_blocks
>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> Please include the complete backtrace with any bug report.
>>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>>
>>>>
>>>> With my fixes:
>>>>
>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet
>>>> -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>> 2.4.2-p1, MPC version 0.8.1
>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>>
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>>> <christophe.lyon@linaro.org> wrote:
>>>> > Hi,
>>>> >
>>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>>> > several failures caused by:
>>>> > "error: fallthru edge crosses section boundary", including the case
>>>> > described in PR55121.
>>>> >
>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>>> >> Ping.
>>>> >> Teresa
>>>> >>
>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>> >>> Revised patch that fixes failures encountered when enabling
>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>> >>>
>>>> >>> This includes new verification code, contributed by Steven Bosscher, to
>>>> >>> ensure that no cold blocks dominate hot blocks.
>>>> >>>
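To illustrate the cold-dominator check described above, here is a minimal,
self-contained C sketch of the re-marking worklist on a toy CFG. It is not the
patch's code: the block numbering, the succ and idom tables, and the names
has_cold_pred and fix_cold_dominators are all hypothetical, and the immediate
dominators are written out by hand rather than computed by the compiler.

```c
#include <assert.h>
#include <stdbool.h>

#define N_BLOCKS 5

enum partition { HOT, COLD };

/* Hypothetical toy CFG: 0->1, 1->2, 1->3, 2->4, 3->4.
   succ[a][b] is true when there is an edge a->b.  */
static const bool succ[N_BLOCKS][N_BLOCKS] = {
  { false, true,  false, false, false },
  { false, false, true,  true,  false },
  { false, false, false, false, true  },
  { false, false, false, false, true  },
  { false, false, false, false, false },
};

/* Immediate dominators for the CFG above, worked out by hand
   (assumption: a real pass would compute these).  -1 marks the entry.  */
static const int idom[N_BLOCKS] = { -1, 0, 1, 1, 1 };

/* Return true if block B still has a predecessor in the cold partition.  */
bool
has_cold_pred (const enum partition *part, int b)
{
  for (int p = 0; p < N_BLOCKS; p++)
    if (succ[p][b] && part[p] == COLD)
      return true;
  return false;
}

/* Re-mark blocks as hot until no hot block has a cold immediate
   dominator.  Returns the number of blocks left cold.  */
int
fix_cold_dominators (enum partition *part)
{
  int hot[N_BLOCKS], n_hot = 0, cold_count = 0;

  for (int b = 0; b < N_BLOCKS; b++)
    if (part[b] == HOT)
      hot[n_hot++] = b;
    else
      cold_count++;

  while (n_hot > 0 && cold_count > 0)
    {
      int d = idom[hot[--n_hot]];
      if (d < 0 || part[d] != COLD)
        continue;

      /* A hot block has a cold immediate dominator: make it hot and
         queue it so its own dominator is checked too.  */
      part[d] = HOT;
      cold_count--;
      hot[n_hot++] = d;

      /* Cold successors with no remaining cold predecessors are
         re-marked as well, transitively.  */
      int newly[N_BLOCKS], n_new = 0;
      newly[n_new++] = d;
      while (n_new > 0)
        {
          int h = newly[--n_new];
          for (int s = 0; s < N_BLOCKS; s++)
            {
              if (!succ[h][s] || part[s] != COLD || has_cold_pred (part, s))
                continue;
              part[s] = HOT;
              cold_count--;
              assert (n_hot < N_BLOCKS && n_new < N_BLOCKS);
              hot[n_hot++] = s;
              newly[n_new++] = s;
            }
        }
    }
  return cold_count;
}
```

On this toy CFG, starting with blocks 0, 1 and 3 cold and blocks 2 and 4 hot,
every block ends up hot. Starting with only block 3 cold, nothing changes,
since no hot block has a cold immediate dominator.
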
>>>> >>> I attempted to make the handling of partition updates through the optimization
>>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>>> >>> stream in the process. The code to fix up partitions (including the BB_PARTITION
>>>> >>> assignment, region crossing jump notes, and switch text section notes) is
>>>> >>> now handled in a few centralized locations. For example, inside
>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>>> >>> don't need to attempt the fixup themselves.
>>>> >>>
>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>>> >>> mode that are not easy to fix up incrementally, the new routine
>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>>> >>> of the dominance relation; however, as far as I can tell, the routines that
>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>>> >>> are typically invoked once (or a small number of times in the case of
>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>>> >>> increases in the dominance computation times, which were only a tiny
>>>> >>> percentage of the overall compile time.
>>>> >>>
>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>>> >>> any partitioning was actually performed, so that optimizations which were
>>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>>> >>> completely hot).
>>>> >>>
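The effect of the new flag can be sketched as follows. This is only an
illustration under stated assumptions: struct toy_rtl_data and the two
toy_crossjump_allowed_* helpers are hypothetical stand-ins, not GCC functions;
only the has_bb_partition field name and the two conditions being compared come
from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the has_bb_partition flag in rtl_data.  */
struct toy_rtl_data
{
  bool has_bb_partition;   /* set only if the function was actually split */
};

/* Old-style gate: refuse whenever the option is enabled after reload.  */
bool
toy_crossjump_allowed_old (bool flag_reorder_blocks_and_partition,
                           bool reload_completed)
{
  return !(flag_reorder_blocks_and_partition && reload_completed);
}

/* New-style gate: refuse only if this function really has partitions.  */
bool
toy_crossjump_allowed_new (const struct toy_rtl_data *crtl,
                           bool reload_completed)
{
  return !(crtl->has_bb_partition && reload_completed);
}
```

For a completely hot function compiled with the option enabled, the old gate
refuses the optimization after reload while the new gate allows it.
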
>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>>> >>> benchmarks and internal google benchmarks using profile feedback and
>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Teresa
>>>> >>>
>>>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>> >>>
>>>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>> >>>         parameter.
>>>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>> >>>         as this is now done by redirect_edge_and_branch_force.
>>>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>> >>>         predecessor BB until after it is potentially split.
>>>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>> >>>         any blocks in function actually partitioned.
>>>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>> >>>         up partitioning.
>>>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>> >>>         block copying if any blocks in function actually partitioned.
>>>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>> >>>         that no cold blocks dominate a hot block.
>>>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>> >>>         as this is now done by force_nonfallthru_and_redirect.
>>>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>> >>>         already be marked with region crossing note.
>>>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>> >>>         blocks in function actually partitioned.
>>>> >>>         (insert_section_boundary_note): Only need to insert note if any
>>>> >>>         blocks in function actually partitioned.
>>>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>> >>>         parameter, and remove call to insert_section_boundary_note as this
>>>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>> >>>         parameter.
>>>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>> >>>         has bb partitions.
>>>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>> >>>         emit_barrier_after_bb, which are no longer static.
>>>> >>>         * basic-block.h: Declare new function fixup_partitions.
>>>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>> >>>         check for region crossing note.
>>>> >>>         (fixup_partition_crossing): New function.
>>>> >>>         (fixup_bb_partition): Ditto.
>>>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>> >>>         remove old code that tried to do this. Emit barrier correctly
>>>> >>>         when we are in cfglayout mode.
>>>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>>>> >>>         fixup region crossing edge since this is now handled in
>>>> >>>         split_block, and set up insertion point correctly since
>>>> >>>         block may now end in a jump.
>>>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>> >>>         boundaries after optimizations that modify cfg and before trying to
>>>> >>>         verify the flow info.
>>>> >>>         (fixup_partitions): New function.
>>>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>> >>>         hot bbs.
>>>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>> >>>         Remove old code that attempted to fixup region crossing note as
>>>> >>>         this is now handled in force_nonfallthru_and_redirect.
>>>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>> >>>         note.
>>>> >>>
>>>> >>> Index: cfghooks.h
>>>> >>> ===================================================================
>>>> >>> --- cfghooks.h  (revision 193376)
>>>> >>> +++ cfghooks.h  (working copy)
>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>> >>>  void account_profile_record (struct profile_record *, int);
>>>> >>>
>>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>>> >>> -extern void cfg_layout_finalize (void);
>>>> >>> +extern void cfg_layout_finalize (bool);
>>>> >>>
>>>> >>>  /* Hooks containers.  */
>>>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>> >>>  extern void gimple_register_cfg_hooks (void);
>>>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>>>> >>> -
>>>> >>> Index: modulo-sched.c
>>>> >>> ===================================================================
>>>> >>> --- modulo-sched.c      (revision 193376)
>>>> >>> +++ modulo-sched.c      (working copy)
>>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>> >>>        bb->aux = bb->next_bb;
>>>> >>>    free_dominance_info (CDI_DOMINATORS);
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>  #endif /* INSN_SCHEDULING */
>>>> >>>    return 0;
>>>> >>>  }
>>>> >>> Index: ifcvt.c
>>>> >>> ===================================================================
>>>> >>> --- ifcvt.c     (revision 193376)
>>>> >>> +++ ifcvt.c     (working copy)
>>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>> >>>    if (new_bb)
>>>> >>>      {
>>>> >>>        df_bb_replace (then_bb_index, new_bb);
>>>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>>> >>> -         we need to ensure that new_bb is in the same partition as
>>>> >>> -         test bb (you can not fall through across section boundaries).  */
>>>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>>>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>> >>>      }
>>>> >>>
>>>> >>>    num_true_changes++;
>>>> >>> Index: function.c
>>>> >>> ===================================================================
>>>> >>> --- function.c  (revision 193376)
>>>> >>> +++ function.c  (working copy)
>>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>> >>>                     break;
>>>> >>>                 if (e)
>>>> >>>                   {
>>>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>>> >>> -                                                 NULL_RTX, e->src);
>>>> >>> +                    /* Make sure we insert after any barriers.  */
>>>> >>> +                    rtx end = get_last_bb_insn (e->src);
>>>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>>> >>> +                                                  NULL_RTX, e->src);
>>>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>> >>>                   }
>>>> >>>                 else
>>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>> >>>           cur_bb->aux = cur_bb->next_bb;
>>>> >>> -      cfg_layout_finalize ();
>>>> >>> +      cfg_layout_finalize (false);
>>>> >>>      }
>>>> >>>
>>>> >>>  epilogue_done:
>>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>> >>>        basic_block simple_return_block_cold = NULL;
>>>> >>>        edge pending_edge_hot = NULL;
>>>> >>>        edge pending_edge_cold = NULL;
>>>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>> >>> +      basic_block exit_pred;
>>>> >>>        int i;
>>>> >>>
>>>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>> >>>             else
>>>> >>>               pending_edge_cold = e;
>>>> >>>           }
>>>> >>> +
>>>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>>>> >>> +         inserting new BBs at the end of the function. Do this
>>>> >>> +         after the call to split_block above which may split
>>>> >>> +         the original exit pred.  */
>>>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>> >>>
>>>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>> >>>         {
>>>> >>> Index: function.h
>>>> >>> ===================================================================
>>>> >>> --- function.h  (revision 193376)
>>>> >>> +++ function.h  (working copy)
>>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>> >>>    bool uses_only_leaf_regs;
>>>> >>>
>>>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>>> >>> +     block.  */
>>>> >>> +  bool has_bb_partition;
>>>> >>> +
>>>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>>>> >>> Index: hw-doloop.c
>>>> >>> ===================================================================
>>>> >>> --- hw-doloop.c (revision 193376)
>>>> >>> +++ hw-doloop.c (working copy)
>>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>> >>>        else
>>>> >>>         bb->aux = NULL;
>>>> >>>      }
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>    clear_aux_for_blocks ();
>>>> >>>    df_analyze ();
>>>> >>>  }
>>>> >>> Index: cfgcleanup.c
>>>> >>> ===================================================================
>>>> >>> --- cfgcleanup.c        (revision 193376)
>>>> >>> +++ cfgcleanup.c        (working copy)
>>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>> >>>       partition boundaries).  See the comments at the top of
>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>> >>>
>>>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>>> >>> +  if (crtl->has_bb_partition && reload_completed)
>>>> >>>      return false;
>>>> >>>
>>>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>> >>>               df_analyze ();
>>>> >>>             }
>>>> >>>
>>>> >>> +         if (changed)
>>>> >>> +            {
>>>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>>>> >>> +                 reached by both hot and cold blocks to become dominated only
>>>> >>> +                 by cold blocks. This will cause the verification below to fail,
>>>> >>> +                 and lead to now cold code in the hot section. This is not easy
>>>> >>> +                 to detect and fix during edge forwarding, and in some cases
>>>> >>> +                 is only visible after newly unreachable blocks are deleted,
>>>> >>> +                 which will be done in fixup_partitions.  */
>>>> >>> +              fixup_partitions ();
>>>> >>> +
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>> -         if (changed)
>>>> >>> -           verify_flow_info ();
>>>> >>> +              verify_flow_info ();
>>>> >>>  #endif
>>>> >>> +            }
>>>> >>>
>>>> >>>           changed_overall |= changed;
>>>> >>>           first_pass = false;
>>>> >>> Index: bb-reorder.c
>>>> >>> ===================================================================
>>>> >>> --- bb-reorder.c        (revision 193376)
>>>> >>> +++ bb-reorder.c        (working copy)
>>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>> >>>    current_partition = BB_PARTITION (traces[0].first);
>>>> >>>    two_passes = false;
>>>> >>>
>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>> >>> +  if (crtl->has_bb_partition)
>>>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>> >>>        if (BB_PARTITION (traces[0].first)
>>>> >>>           != BB_PARTITION (traces[i].first))
>>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>> >>>                       }
>>>> >>>                   }
>>>> >>>
>>>> >>> -             if (flag_reorder_blocks_and_partition)
>>>> >>> +             if (crtl->has_bb_partition)
>>>> >>>                 try_copy = false;
>>>> >>>
>>>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>> >>>    return length;
>>>> >>>  }
>>>> >>>
>>>> >>> -/* Emit a barrier into the footer of BB.  */
>>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>> >>>
>>>> >>> -static void
>>>> >>> +void
>>>> >>>  emit_barrier_after_bb (basic_block bb)
>>>> >>>  {
>>>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>> >>>  }
>>>> >>>
>>>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>> >>>  {
>>>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>>>> >>>    basic_block bb;
>>>> >>> -  edge e;
>>>> >>> -  edge_iterator ei;
>>>> >>> +  edge e, e2;
>>>> >>> +  edge_iterator ei, ei2;
>>>> >>> +  unsigned int cold_bb_count = 0;
>>>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>> >>>
>>>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      {
>>>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>>>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> >>> +        {
>>>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> >>> +          cold_bb_count++;
>>>> >>> +        }
>>>> >>>        else
>>>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>> >>> +        {
>>>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>>> >>> +        }
>>>> >>>      }
>>>> >>>
>>>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>>> >>> +     several different possibilities. One is that there are edge weight insanities
>>>> >>> +     due to optimization phases that do not properly update basic block profile
>>>> >>> +     counts. The second is that the entry of the function may not be hot, because
>>>> >>> +     it is entered fewer times than the number of profile training runs, but there
>>>> >>> +     is a loop inside the function that causes blocks within the function to be
>>>> >>> +     above the threshold for hotness.  */
>>>> >>> +  if (cold_bb_count)
>>>> >>> +    {
>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>>>> >>> +         re-marked all cold bbs hot.  */
>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>>> >>> +             && cold_bb_count)
>>>> >>> +        {
>>>> >>> +          basic_block dom_bb;
>>>> >>> +
>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>>> >>> +
>>>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>>> >>> +            continue;
>>>> >>> +
>>>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>>>> >>> +             The dominator needs to be re-marked to hot.  */
>>>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>>> >>> +          cold_bb_count--;
>>>> >>> +
>>>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>>> >>> +             dominated by a cold bb.  */
>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>>> >>> +
>>>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>>>> >>> +             well.  */
>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>>> >>> +            {
>>>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>>>> >>> +                 are cold and should be re-marked as hot.  */
>>>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>>> >>> +                {
>>>> >>> +                  bool any_cold_preds = false;
>>>> >>> +                  basic_block succ = e->dest;
>>>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>>> >>> +                    continue;
>>>> >>> +                  /* Does this block have any cold predecessors now?  */
>>>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>>> >>> +                  {
>>>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>>> >>> +                      {
>>>> >>> +                        any_cold_preds = true;
>>>> >>> +                        break;
>>>> >>> +                      }
>>>> >>> +                  }
>>>> >>> +                  if (any_cold_preds)
>>>> >>> +                    continue;
>>>> >>> +
>>>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>>>> >>> +                     but no longer has any cold predecessors. Since the original
>>>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>>>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>>> >>> +                  cold_bb_count--;
>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>>> >>> +                }
>>>> >>> +            }
>>>> >>> +        }
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>> >>> +    }
>>>> >>> +
>>>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>>>> >>>       be in a different partition as the throw.  Fix this by either
>>>> >>>       moving or duplicating the landing pads.  */
>>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>> >>>                       new_bb->aux = cur_bb->aux;
>>>> >>>                       cur_bb->aux = new_bb;
>>>> >>>
>>>> >>> -                     /* Make sure new fall-through bb is in same
>>>> >>> -                        partition as bb it's falling through from.  */
>>>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>>>> >>> +                                  == BB_PARTITION (cur_bb));
>>>> >>>
>>>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>> >>>                     }
>>>> >>>                   else
>>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>> >>>        if ((e->flags & EDGE_CROSSING)
>>>> >>> -         && JUMP_P (BB_END (e->src)))
>>>> >>> +         && JUMP_P (BB_END (e->src))
>>>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>>> >>> +             force_nonfallthru_and_redirect.  */
>>>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>> >>>        dump_flow_info (dump_file, dump_flags);
>>>> >>>      }
>>>> >>>
>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>> >>> +  if (crtl->has_bb_partition)
>>>> >>>      verify_hot_cold_block_grouping ();
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>> >>>     encountering this note will make the compiler switch between the
>>>> >>>     hot and cold text sections.  */
>>>> >>>
>>>> >>> -static void
>>>> >>> +void
>>>> >>>  insert_section_boundary_note (void)
>>>> >>>  {
>>>> >>>    basic_block bb;
>>>> >>>    rtx new_note;
>>>> >>>    int first_partition = 0;
>>>> >>>
>>>> >>> -  if (!flag_reorder_blocks_and_partition)
>>>> >>> +  if (!crtl->has_bb_partition)
>>>> >>>      return;
>>>> >>>
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>> >>>        bb->aux = bb->next_bb;
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (true);
>>>> >>>
>>>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>> >>> -  insert_section_boundary_note ();
>>>> >>>    return 0;
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>> >>>      }
>>>> >>>
>>>> >>>  done:
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>
>>>> >>>    BITMAP_FREE (candidates);
>>>> >>>    return 0;
>>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>> >>>    if (crossing_edges == NULL)
>>>> >>>      return 0;
>>>> >>>
>>>> >>> +  crtl->has_bb_partition = true;
>>>> >>> +
>>>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>> >>>       destination of any crossing edge has a label.  */
>>>> >>>    add_labels_and_missing_jumps (crossing_edges);
>>>> >>> Index: bb-reorder.h
>>>> >>> ===================================================================
>>>> >>> --- bb-reorder.h        (revision 193376)
>>>> >>> +++ bb-reorder.h        (working copy)
>>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>> >>>
>>>> >>>  extern int get_uncond_jump_length (void);
>>>> >>>
>>>> >>> +extern void insert_section_boundary_note (void);
>>>> >>> +
>>>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>>>> >>> +
>>>> >>>  #endif
>>>> >>> Index: basic-block.h
>>>> >>> ===================================================================
>>>> >>> --- basic-block.h       (revision 193376)
>>>> >>> +++ basic-block.h       (working copy)
>>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>> >>>  extern bool forwarder_block_p (const_basic_block);
>>>> >>>  extern bool can_fallthru (basic_block, basic_block);
>>>> >>> +extern void fixup_partitions (void);
>>>> >>>
>>>> >>>  /* In cfgbuild.c.  */
>>>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>>>> >>> Index: cfgrtl.c
>>>> >>> ===================================================================
>>>> >>> --- cfgrtl.c    (revision 193376)
>>>> >>> +++ cfgrtl.c    (working copy)
>>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>> >>>  #include "tree.h"
>>>> >>>  #include "hard-reg-set.h"
>>>> >>>  #include "basic-block.h"
>>>> >>> +#include "bb-reorder.h"
>>>> >>>  #include "regs.h"
>>>> >>>  #include "flags.h"
>>>> >>>  #include "function.h"
>>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>>>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>>>> >>>  static GTY(()) rtx cfg_layout_function_header;
>>>> >>> +static bool had_sec_boundary_notes;
>>>> >>>
>>>> >>>  static rtx skip_insns_after_block (basic_block);
>>>> >>>  static void record_effective_endpoints (void);
>>>> >>>  static rtx label_for_bb (basic_block);
>>>> >>> -static void fixup_reorder_chain (void);
>>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>> >>>
>>>> >>>  void verify_insn_chain (void);
>>>> >>>  static void fixup_fallthru_exit_predecessor (void);
>>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>> >>>       partition boundaries).  See  the comments at the top of
>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>> >>>
>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>>      return NULL;
>>>> >>>
>>>> >>>    /* We can replace or remove a complex jump only when we have exactly
>>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>> >>>    return e;
>>>> >>>  }
>>>> >>>
>>>> >>> +/* Called when edge E has been redirected to a new destination,
>>>> >>> +   in order to update the region crossing flag on the edge and
>>>> >>> +   jump.  */
>>>> >>> +
>>>> >>> +static void
>>>> >>> +fixup_partition_crossing (edge e, basic_block target)
>>>> >>> +{
>>>> >>> +  rtx note;
>>>> >>> +
>>>> >>> +  gcc_assert (e->dest == target);
>>>> >>> +
>>>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>>> >>> +    return;
>>>> >>> +  /* If we redirected an existing edge, it may already be marked
>>>> >>> +     crossing, even though the new src is missing a reg crossing note.
>>>> >>> +     But make sure reg crossing note doesn't already exist before
>>>> >>> +     inserting.  */
>>>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>>> >>> +    {
>>>> >>> +      e->flags |= EDGE_CROSSING;
>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +      if (JUMP_P (BB_END (e->src))
>>>> >>> +          && !note)
>>>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +    }
>>>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>>> >>> +    {
>>>> >>> +      e->flags &= ~EDGE_CROSSING;
>>>> >>> +      /* Remove the region crossing note from jump at end of
>>>> >>> +         e->src if it exists.  */
>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +      if (note)
>>>> >>> +        remove_note (BB_END (e->src), note);
>>>> >>> +    }
>>>> >>> +}
>>>> >>> +
>>>> >>> +/* Called when block BB has been reassigned to a different partition,
>>>> >>> +   to ensure that the region crossing attributes are updated.  */
>>>> >>> +
>>>> >>> +static void
>>>> >>> +fixup_bb_partition (basic_block bb)
>>>> >>> +{
>>>> >>> +  edge e;
>>>> >>> +  edge_iterator ei;
>>>> >>> +
>>>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>>> >>> +    {
>>>> >>> +      fixup_partition_crossing (e, e->dest);
>>>> >>> +    }
>>>> >>> +
>>>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>>>> >>> +     or remove stale region crossing.  */
>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>>> >>> +    {
>>>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>>>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>>>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>>> >>> +        force_nonfallthru (e);
>>>> >>> +      else
>>>> >>> +        fixup_partition_crossing (e, e->dest);
>>>> >>> +    }
>>>> >>> +}
>>>> >>> +
>>>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>> >>>     expense of adding new instructions or reordering basic blocks.
>>>> >>>
>>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>> >>>  {
>>>> >>>    edge ret;
>>>> >>>    basic_block src = e->src;
>>>> >>> +  basic_block dest = e->dest;
>>>> >>>
>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>> >>>      return NULL;
>>>> >>>
>>>> >>> -  if (e->dest == target)
>>>> >>> +  if (dest == target)
>>>> >>>      return e;
>>>> >>>
>>>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>> >>>      {
>>>> >>>        df_set_bb_dirty (src);
>>>> >>> +      fixup_partition_crossing (ret, target);
>>>> >>>        return ret;
>>>> >>>      }
>>>> >>>
>>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>> >>>      return NULL;
>>>> >>>
>>>> >>>    df_set_bb_dirty (src);
>>>> >>> +  fixup_partition_crossing (ret, target);
>>>> >>>    return ret;
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>> >>>
>>>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>> >>> -         && targetm_common.have_named_sections
>>>> >>> -         && JUMP_P (BB_END (jump_block))
>>>> >>> -         && !any_condjump_p (BB_END (jump_block))
>>>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>>
>>>> >>>        /* Wire edge in.  */
>>>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>> >>>        new_edge->probability = probability;
>>>> >>>        new_edge->count = count;
>>>> >>>
>>>> >>> +      /* If e->src was previously region crossing, it no longer is
>>>> >>> +         and the reg crossing note should be removed.  */
>>>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>>>> >>> +
>>>> >>>        /* Redirect old edge.  */
>>>> >>>        redirect_edge_pred (e, jump_block);
>>>> >>>        e->probability = REG_BR_PROB_BASE;
>>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>> >>>        LABEL_NUSES (label)++;
>>>> >>>      }
>>>> >>>
>>>> >>> -  emit_barrier_after (BB_END (jump_block));
>>>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>>> >>> +     insert the barrier correctly.  */
>>>> >>> +  emit_barrier_after_bb (jump_block);
>>>> >>>    redirect_edge_succ_nodup (e, target);
>>>> >>>
>>>> >>>    if (abnormal_edge_flags)
>>>> >>>      make_edge (src, target, abnormal_edge_flags);
>>>> >>>
>>>> >>>    df_mark_solutions_dirty ();
>>>> >>> +  fixup_partition_crossing (e, target);
>>>> >>>    return new_bb;
>>>> >>>  }
>>>> >>>
>>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>> >>>  static basic_block
>>>> >>>  rtl_split_edge (edge edge_in)
>>>> >>>  {
>>>> >>> -  basic_block bb;
>>>> >>> +  basic_block bb, new_bb;
>>>> >>>    rtx before;
>>>> >>>
>>>> >>>    /* Abnormal edges cannot be split.  */
>>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>> >>>    else
>>>> >>>      {
>>>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>>> >>> +      else
>>>> >>> +        /* Put the split bb into the src partition, to avoid creating
>>>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>>>> >>> +           where src is cold and dest is hot. The src will dominate
>>>> >>> +           the new bb (whereas it might not have dominated dest).  */
>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>> >>>      }
>>>> >>>
>>>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>> >>>
>>>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>>> >>> +    {
>>>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>>> >>> +      gcc_assert (!new_bb);
>>>> >>> +    }
>>>> >>> +
>>>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>> >>>       jump instruction to target our new block.  */
>>>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>> >>>    else
>>>> >>>      {
>>>> >>>        bb = split_edge (e);
>>>> >>> -      after = BB_END (bb);
>>>> >>>
>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>> >>> -         && targetm_common.have_named_sections
>>>> >>> -         && e->src != ENTRY_BLOCK_PTR
>>>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>>> >>> -         && !(e->flags & EDGE_CROSSING)
>>>> >>> -         && JUMP_P (after)
>>>> >>> -         && !any_condjump_p (after)
>>>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>>>> >>> +      if (JUMP_P (BB_END (bb)))
>>>> >>> +       before = BB_END (bb);
>>>> >>> +      else
>>>> >>> +        after = BB_END (bb);
>>>> >>>      }
>>>> >>>
>>>> >>>    /* Now that we've found the spot, do the insertion.  */
>>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>> >>>  {
>>>> >>>    basic_block bb;
>>>> >>>
>>>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>>> >>> +     previously reached by both hot and cold blocks to become dominated only
>>>> >>> +     by cold blocks. This will cause the verification below to fail,
>>>> >>> +     and lead to now cold code in the hot section. In some cases this
>>>> >>> +     may only be visible after newly unreachable blocks are deleted,
>>>> >>> +     which will be done by fixup_partitions.  */
>>>> >>> +  fixup_partitions ();
>>>> >>> +
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>>    verify_flow_info ();
>>>> >>>  #endif
>>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>> >>>
>>>> >>>    return end;
>>>> >>>  }
>>>> >>> -
>>>> >>> +
>>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>>> >>> +   passes that modify the cfg.  */
>>>> >>> +
>>>> >>> +void
>>>> >>> +fixup_partitions (void)
>>>> >>> +{
>>>> >>> +  basic_block bb;
>>>> >>> +
>>>> >>> +  if (!crtl->has_bb_partition)
>>>> >>> +    return;
>>>> >>> +
>>>> >>> +  /* Delete any blocks that became unreachable and weren't
>>>> >>> +     already cleaned up, for example during edge forwarding
>>>> >>> +     and convert_jumps_to_returns. This will expose more
>>>> >>> +     opportunities for fixing the partition boundaries here.
>>>> >>> +     Also, the calculation of the dominance graph during verification
>>>> >>> +     will assert if there are unreachable nodes.  */
>>>> >>> +  delete_unreachable_blocks ();
>>>> >>> +
>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>>>> >>> +     Fixup any that now violate this requirement, as a result of edge
>>>> >>> +     forwarding and unreachable block deletion.  */
>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>>> >>> +  FOR_EACH_BB (bb)
>>>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +    {
>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> >>> +      basic_block son;
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +        {
>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>> >>> +          /* If bb is not yet cold (because it was added below as
>>>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>> >>> +            {
>>>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>>> >>> +            }
>>>> >>> +          /* Any blocks dominated by a block in the cold section
>>>> >>> +             must also be cold.  */
>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>> >>> +               son;
>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>> >>> +        }
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>> >>> +    }
>>>> >>> +
>>>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>>> >>> +     cold, so that we only update the region crossings the minimum number of
>>>> >>> +     places, which can require forcing edges to be non fallthru.  */
>>>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>>> >>> +    {
>>>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>>> >>> +      fixup_bb_partition (bb);
>>>> >>> +    }
>>>> >>> +}
>>>> >>> +
>>>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>> >>>     cfglayout RTL.
>>>> >>>
>>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>> >>>    rtx x;
>>>> >>>    int err = 0;
>>>> >>>    basic_block bb;
>>>> >>> +  bool have_partitions = false;
>>>> >>>
>>>> >>>    /* Check the general integrity of the basic blocks.  */
>>>> >>>    FOR_EACH_BB_REVERSE (bb)
>>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>> >>>
>>>> >>>           if (e->flags & EDGE_ABNORMAL)
>>>> >>>             n_abnormal++;
>>>> >>> +
>>>> >>> +          have_partitions |= is_crossing;
>>>> >>>         }
>>>> >>>
>>>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>> >>>           }
>>>> >>>      }
>>>> >>>
>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>> >>> +  if (have_partitions && !err)
>>>> >>> +    FOR_EACH_BB (bb)
>>>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +    {
>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>> >>> +      basic_block son;
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>> >>> +
>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>> >>> +        {
>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>> >>> +            {
>>>> >>> +              error ("non-cold basic block %d dominated "
>>>> >>> +                     "by a block in the cold partition", bb->index);
>>>> >>> +              err = 1;
>>>> >>> +            }
>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>> >>> +               son;
>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>> >>> +        }
>>>> >>> +
>>>> >>> +      if (dom_calculated_here)
>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>> >>> +    }
>>>> >>> +
>>>> >>>    /* Clean up.  */
>>>> >>>    return err;
>>>> >>>  }
>>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>> >>>    else
>>>> >>>      cfg_layout_function_header = NULL_RTX;
>>>> >>>
>>>> >>> +  had_sec_boundary_notes = false;
>>>> >>> +
>>>> >>>    next_insn = get_insns ();
>>>> >>>    FOR_EACH_BB (bb)
>>>> >>>      {
>>>> >>>        rtx end;
>>>> >>>
>>>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>>>> >>> +        {
>>>> >>> +          /* Rather than try to keep section boundary notes incrementally
>>>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>>>> >>> +             and flag that they should be re-inserted when exiting
>>>> >>> +             cfg layout mode.  */
>>>> >>> +          rtx check_insn = next_insn;
>>>> >>> +          while (check_insn)
>>>> >>> +            {
>>>> >>> +              if (NOTE_P (check_insn)
>>>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>>> >>> +              {
>>>> >>> +                had_sec_boundary_notes |= true;
>>>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>>>> >>> +                if (next_insn == check_insn)
>>>> >>> +                  next_insn = NEXT_INSN (check_insn);
>>>> >>> +                /* Delete note.  */
>>>> >>> +                delete_insn (check_insn);
>>>> >>> +                /* There will only be one.  */
>>>> >>> +                break;
>>>> >>> +              }
>>>> >>> +              check_insn = NEXT_INSN (check_insn);
>>>> >>> +            }
>>>> >>> +          /* If we still have header instructions left after above loop.  */
>>>> >>> +          if (next_insn != BB_HEAD (bb))
>>>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>>>> >>> +        }
>>>> >>>        end = skip_insns_after_block (bb);
>>>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>> >>>        bb->aux = bb->next_bb;
>>>> >>>
>>>> >>> -  cfg_layout_finalize ();
>>>> >>> +  cfg_layout_finalize (false);
>>>> >>>
>>>> >>>    return 0;
>>>> >>>  }
>>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>> >>>  }
>>>> >>>
>>>> >>>
>>>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>>>> >>> +/* Given a reorder chain, rearrange the code to match. If
>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>>> >>> +   section boundary notes were removed on entry to cfg layout
>>>> >>> +   mode, insert section boundary notes here.  */
>>>> >>>
>>>> >>>  static void
>>>> >>> -fixup_reorder_chain (void)
>>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>> >>>  {
>>>> >>>    basic_block bb;
>>>> >>>    rtx insn = NULL;
>>>> >>> @@ -3150,7 +3373,7 @@ static void
>>>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>> >>>           insn = BB_HEADER (bb);
>>>> >>>           while (NEXT_INSN (insn))
>>>> >>> -           insn = NEXT_INSN (insn);
>>>> >>> +            insn = NEXT_INSN (insn);
>>>> >>>         }
>>>> >>>        if (insn)
>>>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>>> >>> @@ -3175,6 +3398,11 @@ static void
>>>> >>>      insn = NEXT_INSN (insn);
>>>> >>>
>>>> >>>    set_last_insn (insn);
>>>> >>> +
>>>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>>> >>> +    insert_section_boundary_note ();
>>>> >>> +
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>>    verify_insn_chain ();
>>>> >>>  #endif
>>>> >>> @@ -3187,7 +3415,7 @@ static void
>>>> >>>        edge e_fall, e_taken, e;
>>>> >>>        rtx bb_end_insn;
>>>> >>>        rtx ret_label = NULL_RTX;
>>>> >>> -      basic_block nb, src_bb;
>>>> >>> +      basic_block nb;
>>>> >>>        edge_iterator ei;
>>>> >>>
>>>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>>>> >>> @@ -3322,7 +3550,6 @@ static void
>>>> >>>        /* We got here if we need to add a new jump insn.
>>>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>>> >>> -      src_bb = e_fall->src;
>>>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>> >>>        if (nb)
>>>> >>>         {
>>>> >>> @@ -3330,17 +3557,6 @@ static void
>>>> >>>           bb->aux = nb;
>>>> >>>           /* Don't process this new block.  */
>>>> >>>           bb = nb;
>>>> >>> -
>>>> >>> -         /* Make sure new bb is tagged for correct section (same as
>>>> >>> -            fall-thru source, since you cannot fall-thru across
>>>> >>> -            section boundaries).  */
>>>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>>> >>> -         if (flag_reorder_blocks_and_partition
>>>> >>> -             && targetm_common.have_named_sections
>>>> >>> -             && JUMP_P (BB_END (bb))
>>>> >>> -             && !any_condjump_p (BB_END (bb))
>>>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>> >>>         }
>>>> >>>      }
>>>> >>>
>>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>> >>>             case NOTE_INSN_FUNCTION_BEG:
>>>> >>>               /* There is always just single entry to function.  */
>>>> >>>             case NOTE_INSN_BASIC_BLOCK:
>>>> >>> +              /* We should only switch text sections once.  */
>>>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>> >>>               break;
>>>> >>>
>>>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>>>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>> >>>               emit_note_copy (insn);
>>>> >>>               break;
>>>> >>>
>>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>> >>>  }
>>>> >>>
>>>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>>> >>> +   section notes.  */
>>>> >>>
>>>> >>>  void
>>>> >>> -cfg_layout_finalize (void)
>>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>> >>>  {
>>>> >>>  #ifdef ENABLE_CHECKING
>>>> >>>    verify_flow_info ();
>>>> >>> @@ -3775,7 +3995,7 @@ void
>>>> >>>  #endif
>>>> >>>        )
>>>> >>>      fixup_fallthru_exit_predecessor ();
>>>> >>> -  fixup_reorder_chain ();
>>>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>> >>>
>>>> >>>    rebuild_jump_labels (get_insns ());
>>>> >>>    delete_dead_jumptables ();
>>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>> >>>      return false;
>>>> >>>
>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>> >>>      return false;
>>>> >>>
>>>> >>>    if (!onlyjump_p (insn)
>>>> >>>
>>>> >>> --
>>>> >>> This patch is available for review at http://codereview.appspot.com/6823047

-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2012-11-28 15:57             ` Teresa Johnson
@ 2012-11-28 17:03               ` Christophe Lyon
  0 siblings, 0 replies; 35+ messages in thread
From: Christophe Lyon @ 2012-11-28 17:03 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Jack Howarth, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

Yes, I have configured GCC with:
--target=arm-none-linux-gnueabi --with-cpu=cortex-a9 --with-fpu=neon
--with-float=softfp

Thanks,

Christophe.

On 28 November 2012 16:56, Teresa Johnson <tejohnson@google.com> wrote:
> Is this with the same target compiler and options used in PR55121? I
> will try to reproduce the compile-time failures with arm and those
> options if so. I haven't seen those with spec2006 linux x86_64. I'm
> not sure how to test the runtime behavior though.
>
> Thanks,
> Teresa
>
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>> - gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>>
>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.
>>
>> Thanks
>> Christophe.
>>
>>
>> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
>>> Sorry, I don't know what happened there. Patch is attached.
>>> Thanks,
>>> Teresa
>>>
>>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>>>> Are you sure you have all my changes applied? I applied the 4 patches
>>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>>>> pristine trunk checkout. I configured and built both for
>>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>>>> and gcda file. I can reproduce the failure using the pristine trunk
>>>>> with your patches but not with my fixed trunk + your patches. (I just
>>>>> updated to head to pickup recent changes and get the same result. The
>>>>> vec changes required some manual changes to the patch, which I will
>>>>> resend shortly.)
>>>>
>>>> Teresa,
>>>>     Your mailer seems to have corrupted the posted patch with stray
>>>> =3D characters and line breaks. Can you repost a copy as an attachment
>>>> to the list?
>>>>              Jack
>>>>
>>>>>
>>>>> Without my fixes:
>>>>>
>>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed
>>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
>>>>> eval.c: In function ‘Ge’:
>>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
>>>>>  }
>>>>>  ^
>>>>> 0x622f71 df_compact_blocks()
>>>>> ../../gcc_trunk_3/gcc/df-core.c:1560
>>>>> 0x5cfcb5 compact_blocks()
>>>>> ../../gcc_trunk_3/gcc/cfg.c:162
>>>>> 0xc9dce0 reorder_basic_blocks
>>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154
>>>>> 0xc9dce0 rest_of_handle_reorder_blocks
>>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219
>>>>> Please submit a full bug report,
>>>>> with preprocessed source if appropriate.
>>>>> Please include the complete backtrace with any bug report.
>>>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>>>
>>>>>
>>>>> With my fixes:
>>>>>
>>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed
>>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9
>>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp
>>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use
>>>>> -fno-common -o eval.s -freorder-blocks-and-partition
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
>>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version
>>>>> 2.4.2-p1, MPC version 0.8.1
>>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>>
>>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon
>>>>> <christophe.lyon@linaro.org> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I have tested your patch on Spec2000 on ARM, and I can still see
>>>>> > several failures caused by:
>>>>> > "error: fallthru edge crosses section boundary", including the case
>>>>> > described in PR55121.
>>>>> >
>>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> >> Ping.
>>>>> >> Teresa
>>>>> >>
>>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>>>> >>> Revised patch that fixes failures encountered when enabling
>>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>>> >>>
>>>>> >>> This includes new verification code to ensure no cold blocks dominate hot
>>>>> >>> blocks contributed by Steven Bosscher.
>>>>> >>>
>>>>> >>> I attempted to make the handling of partition updates through the optimization
>>>>> >>> passes much more consistent, removing a number of partial fixes in the code
>>>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION
>>>>> >>> assignment, region crossing jump notes, and switch text section notes) is
>>>>> >>> now handled in a few centralized locations, for example inside
>>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>>>> >>> don't need to attempt the fixup themselves.
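
The centralization described above can be pictured with a minimal standalone sketch. It assumes hypothetical stand-in types (struct block, struct cfg_edge) and helper names (update_crossing, redirect_edge) that are not GCC's; it also simplifies by modelling a single outgoing jump per block and does not check whether the block actually ends in a jump, as the patch's fixup_partition_crossing does. The point is only that the redirect routine itself recomputes the crossing state, so callers never touch it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's basic_block and edge; not the real types.  */
enum partition { PART_HOT, PART_COLD };

struct block
{
  int index;
  enum partition part;
  bool ends_in_crossing_jump;   /* Models the REG_CROSSING_JUMP note.  */
};

struct cfg_edge
{
  struct block *src;
  struct block *dest;
  bool crossing;                /* Models the EDGE_CROSSING flag.  */
};

/* Recompute the crossing state of edge E from the partitions of its
   endpoints.  Simplification: one outgoing jump per block is assumed.  */
static void
update_crossing (struct cfg_edge *e)
{
  bool crosses = e->src->part != e->dest->part;

  e->crossing = crosses;
  e->src->ends_in_crossing_jump = crosses;
}

/* Redirect E to TARGET and fix up the crossing state in the same place,
   so that callers do not have to do it themselves.  */
static void
redirect_edge (struct cfg_edge *e, struct block *target)
{
  e->dest = target;
  update_crossing (e);
}
```

Redirecting an edge from a hot block to a cold block sets both the edge flag and the modelled note; redirecting it back to a hot block clears both.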
>>>>> >>>
>>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>>>> >>> mode that are not easy to fix up incrementally, the new routine
>>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>>>> >>> of the dominance relation; however, as far as I can tell the routines which
>>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>>>> >>> are invoked typically once (or a small number of times in the case of
>>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>>>> >>> increases in the dominance computation times, which were only a tiny percent
>>>>> >>> of the overall compile time.
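
As a rough illustration of why the global fixup needs dominance
information, the sketch below re-marks cold blocks that dominate hot blocks
on a toy dominator tree. It is a simplification under stated assumptions:
toy_block, toy_sanitize_partitions and toy_cold_dominates_hot are
hypothetical names, the immediate dominators are given as array indices
rather than computed, and the successor re-marking heuristic from the patch
is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_part { TOY_HOT, TOY_COLD };

/* Toy block: a partition plus the index of its immediate dominator.  */
struct toy_block
{
  enum toy_part part;
  int idom;  /* Index of the immediate dominator; -1 for the entry block.  */
};

/* Return true if some hot block has a cold immediate dominator.  Checking
   immediate dominators is enough: any chain from a hot block up to a cold
   dominator contains a hot block whose immediate dominator is cold.  */
bool
toy_cold_dominates_hot (const struct toy_block *bbs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (bbs[i].part == TOY_HOT
        && bbs[i].idom >= 0
        && bbs[bbs[i].idom].part == TOY_COLD)
      return true;
  return false;
}

/* Re-mark as hot every cold block that dominates a hot block.
   Returns the number of blocks re-marked.  */
size_t
toy_sanitize_partitions (struct toy_block *bbs, size_t n)
{
  size_t remarked = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (bbs[i].part != TOY_HOT)
        continue;

      /* Walk up the dominator tree.  Stopping at the first hot dominator
         is safe because that block's own chain is handled when the outer
         loop visits it, or was handled when it was re-marked.  */
      for (int d = bbs[i].idom;
           d >= 0 && bbs[d].part == TOY_COLD;
           d = bbs[d].idom)
        {
          assert ((size_t) d < n);
          bbs[d].part = TOY_HOT;
          remarked++;
        }
    }
  return remarked;
}
```

In this model a cold block that dominates nothing hot is left cold, which
matches the intent of only moving blocks that would otherwise violate the
verification.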
>>>>> >>>
>>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>>>> >>> any partitioning was actually performed, so that optimizations which were
>>>>> >>> conservatively disabled whenever flag_reorder_blocks_and_partition
>>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>>>> >>> completely hot).
>>>>> >>>
>>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>>>> >>> benchmarks and internal Google benchmarks using profile feedback and
>>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Teresa
>>>>> >>>
>>>>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>>> >>>
>>>>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>>> >>>         parameter.
>>>>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>>> >>>         as this is now done by redirect_edge_and_branch_force.
>>>>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>>> >>>         predecessor BB until after it is potentially split.
>>>>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>>> >>>         any blocks in function actually partitioned.
>>>>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>>> >>>         up partitioning.
>>>>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>>> >>>         block copying if any blocks in function actually partitioned.
>>>>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>>> >>>         that no cold blocks dominate a hot block.
>>>>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>>> >>>         as this is now done by force_nonfallthru_and_redirect.
>>>>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>>> >>>         already be marked with region crossing note.
>>>>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>>> >>>         blocks in function actually partitioned.
>>>>> >>>         (insert_section_boundary_note): Only need to insert note if any
>>>>> >>>         blocks in function actually partitioned.
>>>>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>>> >>>         parameter, and remove call to insert_section_boundary_note as this
>>>>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>>> >>>         parameter.
>>>>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>>> >>>         has bb partitions.
>>>>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>>> >>>         emit_barrier_after_bb, which are no longer static.
>>>>> >>>         * basic-block.h: Declare new function fixup_partitions.
>>>>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>>> >>>         check for region crossing note.
>>>>> >>>         (fixup_partition_crossing): New function.
>>>>> >>>         (fixup_bb_partition): Ditto.
>>>>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>>> >>>         remove old code that tried to do this. Emit barrier correctly
>>>>> >>>         when we are in cfglayout mode.
>>>>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>>> >>>         (commit_one_edge_insertion): Remove old code that tried to
>>>>> >>>         fixup region crossing edge since this is now handled in
>>>>> >>>         split_block, and set up insertion point correctly since
>>>>> >>>         block may now end in a jump.
>>>>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>>> >>>         boundaries after optimizations that modify cfg and before trying to
>>>>> >>>         verify the flow info.
>>>>> >>>         (fixup_partitions): New function.
>>>>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>>> >>>         hot bbs.
>>>>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>>> >>>         Remove old code that attempted to fixup region crossing note as
>>>>> >>>         this is now handled in force_nonfallthru_and_redirect.
>>>>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>>> >>>         note.
>>>>> >>>
>>>>> >>> Index: cfghooks.h
>>>>> >>> ===================================================================
>>>>> >>> --- cfghooks.h  (revision 193376)
>>>>> >>> +++ cfghooks.h  (working copy)
>>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>>> >>>  void account_profile_record (struct profile_record *, int);
>>>>> >>>
>>>>> >>>  extern void cfg_layout_initialize (unsigned int);
>>>>> >>> -extern void cfg_layout_finalize (void);
>>>>> >>> +extern void cfg_layout_finalize (bool);
>>>>> >>>
>>>>> >>>  /* Hooks containers.  */
>>>>> >>>  extern struct cfg_hooks gimple_cfg_hooks;
>>>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>>> >>>  extern void gimple_register_cfg_hooks (void);
>>>>> >>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>>> >>>  extern void set_cfg_hooks (struct cfg_hooks);
>>>>> >>> -
>>>>> >>> Index: modulo-sched.c
>>>>> >>> ===================================================================
>>>>> >>> --- modulo-sched.c      (revision 193376)
>>>>> >>> +++ modulo-sched.c      (working copy)
>>>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>> >>>        bb->aux = bb->next_bb;
>>>>> >>>    free_dominance_info (CDI_DOMINATORS);
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>  #endif /* INSN_SCHEDULING */
>>>>> >>>    return 0;
>>>>> >>>  }
>>>>> >>> Index: ifcvt.c
>>>>> >>> ===================================================================
>>>>> >>> --- ifcvt.c     (revision 193376)
>>>>> >>> +++ ifcvt.c     (working copy)
>>>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>>> >>>    if (new_bb)
>>>>> >>>      {
>>>>> >>>        df_bb_replace (then_bb_index, new_bb);
>>>>> >>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>>>> >>> -         we need to ensure that new_bb is in the same partition as
>>>>> >>> -         test bb (you can not fall through across section boundaries).  */
>>>>> >>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>>>> >>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>>>> >>> +         (possibly called from redirect_edge_and_branch_force).  */
>>>>> >>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>>> >>>      }
>>>>> >>>
>>>>> >>>    num_true_changes++;
>>>>> >>> Index: function.c
>>>>> >>> ===================================================================
>>>>> >>> --- function.c  (revision 193376)
>>>>> >>> +++ function.c  (working copy)
>>>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>>> >>>                     break;
>>>>> >>>                 if (e)
>>>>> >>>                   {
>>>>> >>> -                   copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>>>> >>> -                                                 NULL_RTX, e->src);
>>>>> >>> +                    /* Make sure we insert after any barriers.  */
>>>>> >>> +                    rtx end = get_last_bb_insn (e->src);
>>>>> >>> +                    copy_bb = create_basic_block (NEXT_INSN (end),
>>>>> >>> +                                                  NULL_RTX, e->src);
>>>>> >>>                     BB_COPY_PARTITION (copy_bb, e->src);
>>>>> >>>                   }
>>>>> >>>                 else
>>>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>>> >>>         if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>>> >>>             && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>>> >>>           cur_bb->aux = cur_bb->next_bb;
>>>>> >>> -      cfg_layout_finalize ();
>>>>> >>> +      cfg_layout_finalize (false);
>>>>> >>>      }
>>>>> >>>
>>>>> >>>  epilogue_done:
>>>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>>> >>>        basic_block simple_return_block_cold = NULL;
>>>>> >>>        edge pending_edge_hot = NULL;
>>>>> >>>        edge pending_edge_cold = NULL;
>>>>> >>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>>> >>> +      basic_block exit_pred;
>>>>> >>>        int i;
>>>>> >>>
>>>>> >>>        gcc_assert (entry_edge != orig_entry_edge);
>>>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>>> >>>             else
>>>>> >>>               pending_edge_cold = e;
>>>>> >>>           }
>>>>> >>> +
>>>>> >>> +      /* Save a pointer to the exit's predecessor BB for use in
>>>>> >>> +         inserting new BBs at the end of the function. Do this
>>>>> >>> +         after the call to split_block above which may split
>>>>> >>> +         the original exit pred.  */
>>>>> >>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>>> >>>
>>>>> >>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>>> >>>         {
>>>>> >>> Index: function.h
>>>>> >>> ===================================================================
>>>>> >>> --- function.h  (revision 193376)
>>>>> >>> +++ function.h  (working copy)
>>>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>>> >>>       sched2) and is useful only if the port defines LEAF_REGISTERS.  */
>>>>> >>>    bool uses_only_leaf_regs;
>>>>> >>>
>>>>> >>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>>>> >>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>>>> >>> +     block.  */
>>>>> >>> +  bool has_bb_partition;
>>>>> >>> +
>>>>> >>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>>> >>>       asm.  Unlike regs_ever_live, elements of this array corresponding
>>>>> >>>       to eliminable regs (like the frame pointer) are set if an asm
>>>>> >>> Index: hw-doloop.c
>>>>> >>> ===================================================================
>>>>> >>> --- hw-doloop.c (revision 193376)
>>>>> >>> +++ hw-doloop.c (working copy)
>>>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>>> >>>        else
>>>>> >>>         bb->aux = NULL;
>>>>> >>>      }
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>    clear_aux_for_blocks ();
>>>>> >>>    df_analyze ();
>>>>> >>>  }
>>>>> >>> Index: cfgcleanup.c
>>>>> >>> ===================================================================
>>>>> >>> --- cfgcleanup.c        (revision 193376)
>>>>> >>> +++ cfgcleanup.c        (working copy)
>>>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>>> >>>       partition boundaries).  See the comments at the top of
>>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>>> >>>
>>>>> >>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>>>> >>> +  if (crtl->has_bb_partition && reload_completed)
>>>>> >>>      return false;
>>>>> >>>
>>>>> >>>    /* Search backward through forwarder blocks.  We don't need to worry
>>>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>>> >>>               df_analyze ();
>>>>> >>>             }
>>>>> >>>
>>>>> >>> +         if (changed)
>>>>> >>> +            {
>>>>> >>> +              /* Edge forwarding in particular can cause hot blocks previously
>>>>> >>> +                 reached by both hot and cold blocks to become dominated only
>>>>> >>> +                 by cold blocks. This will cause the verification below to fail,
>>>>> >>> +                 and lead to now cold code in the hot section. This is not easy
>>>>> >>> +                 to detect and fix during edge forwarding, and in some cases
>>>>> >>> +                 is only visible after newly unreachable blocks are deleted,
>>>>> >>> +                 which will be done in fixup_partitions.  */
>>>>> >>> +              fixup_partitions ();
>>>>> >>> +
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>> -         if (changed)
>>>>> >>> -           verify_flow_info ();
>>>>> >>> +              verify_flow_info ();
>>>>> >>>  #endif
>>>>> >>> +            }
>>>>> >>>
>>>>> >>>           changed_overall |= changed;
>>>>> >>>           first_pass = false;
>>>>> >>> Index: bb-reorder.c
>>>>> >>> ===================================================================
>>>>> >>> --- bb-reorder.c        (revision 193376)
>>>>> >>> +++ bb-reorder.c        (working copy)
>>>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>>> >>>    current_partition = BB_PARTITION (traces[0].first);
>>>>> >>>    two_passes = false;
>>>>> >>>
>>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>>> >>> +  if (crtl->has_bb_partition)
>>>>> >>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>>> >>>        if (BB_PARTITION (traces[0].first)
>>>>> >>>           != BB_PARTITION (traces[i].first))
>>>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>>> >>>                       }
>>>>> >>>                   }
>>>>> >>>
>>>>> >>> -             if (flag_reorder_blocks_and_partition)
>>>>> >>> +             if (crtl->has_bb_partition)
>>>>> >>>                 try_copy = false;
>>>>> >>>
>>>>> >>>               /* Copy tiny blocks always; copy larger blocks only when the
>>>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>>> >>>    return length;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> -/* Emit a barrier into the footer of BB.  */
>>>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
>>>>> >>>
>>>>> >>> -static void
>>>>> >>> +void
>>>>> >>>  emit_barrier_after_bb (basic_block bb)
>>>>> >>>  {
>>>>> >>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>>>> >>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>>> >>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>>>> >>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>>> >>>  }
>>>>> >>>
>>>>> >>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
>>>>> >>>  {
>>>>> >>>    VEC(edge, heap) *crossing_edges = NULL;
>>>>> >>>    basic_block bb;
>>>>> >>> -  edge e;
>>>>> >>> -  edge_iterator ei;
>>>>> >>> +  edge e, e2;
>>>>> >>> +  edge_iterator ei, ei2;
>>>>> >>> +  unsigned int cold_bb_count = 0;
>>>>> >>> +  VEC (basic_block, heap) *bbs_in_hot_partition = NULL;
>>>>> >>> +  VEC (basic_block, heap) *bbs_newly_hot = NULL;
>>>>> >>>
>>>>> >>>    /* Mark which partition (hot/cold) each basic block belongs in.  */
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      {
>>>>> >>>        if (probably_never_executed_bb_p (cfun, bb))
>>>>> >>> -       BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>>> >>> +        {
>>>>> >>> +          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>>> >>> +          cold_bb_count++;
>>>>> >>> +        }
>>>>> >>>        else
>>>>> >>> -       BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>>> >>> +        {
>>>>> >>> +          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
>>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb);
>>>>> >>> +        }
>>>>> >>>      }
>>>>> >>>
>>>>> >>> +  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
>>>>> >>> +     several different possibilities. One is that there are edge weight insanities
>>>>> >>> +     due to optimization phases that do not properly update basic block profile
>>>>> >>> +     counts. The second is that the entry of the function may not be hot, because
>>>>> >>> +     it is entered fewer times than the number of profile training runs, but there
>>>>> >>> +     is a loop inside the function that causes blocks within the function to be
>>>>> >>> +     above the threshold for hotness.  */
>>>>> >>> +  if (cold_bb_count)
>>>>> >>> +    {
>>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      /* Keep examining hot bbs until we have either checked them all, or
>>>>> >>> +         re-marked all cold bbs hot.  */
>>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_hot_partition)
>>>>> >>> +             && cold_bb_count)
>>>>> >>> +        {
>>>>> >>> +          basic_block dom_bb;
>>>>> >>> +
>>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_hot_partition);
>>>>> >>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>>>> >>> +
>>>>> >>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>>>>> >>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>>>>> >>> +            continue;
>>>>> >>> +
>>>>> >>> +          /* We have a hot bb with an immediate dominator that is cold.
>>>>> >>> +             The dominator needs to be re-marked to hot.  */
>>>>> >>> +          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
>>>>> >>> +          cold_bb_count--;
>>>>> >>> +
>>>>> >>> +          /* Now we need to examine newly-hot dom_bb to see if it is also
>>>>> >>> +             dominated by a cold bb.  */
>>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb);
>>>>> >>> +
>>>>> >>> +          /* We should also adjust any cold blocks that the newly-hot bb
>>>>> >>> +             feeds and see if it makes sense to re-mark those as hot as
>>>>> >>> +             well.  */
>>>>> >>> +          VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb);
>>>>> >>> +          while (! VEC_empty (basic_block, bbs_newly_hot))
>>>>> >>> +            {
>>>>> >>> +              basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot);
>>>>> >>> +              /* Examine all successors of this newly-hot bb to see if they
>>>>> >>> +                 are cold and should be re-marked as hot.  */
>>>>> >>> +              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
>>>>> >>> +                {
>>>>> >>> +                  bool any_cold_preds = false;
>>>>> >>> +                  basic_block succ = e->dest;
>>>>> >>> +                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
>>>>> >>> +                    continue;
>>>>> >>> +                  /* Does this block have any cold predecessors now?  */
>>>>> >>> +                  FOR_EACH_EDGE (e2, ei2, succ->preds)
>>>>> >>> +                  {
>>>>> >>> +                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>>>> >>> +                      {
>>>>> >>> +                        any_cold_preds = true;
>>>>> >>> +                        break;
>>>>> >>> +                      }
>>>>> >>> +                  }
>>>>> >>> +                  if (any_cold_preds)
>>>>> >>> +                    continue;
>>>>> >>> +
>>>>> >>> +                  /* Here we have a successor of newly-hot bb that is cold
>>>>> >>> +                     but no longer has any cold predecessors. Since the original
>>>>> >>> +                     assignment of our newly-hot bb was incorrect, this successor's
>>>>> >>> +                     assignment as cold is also suspect. Go ahead and re-mark it
>>>>> >>> +                     as hot now too. Better heuristics may be in order here.  */
>>>>> >>> +                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>>>> >>> +                  cold_bb_count--;
>>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>>>> >>> +                  /* Examine this successor as a newly-hot bb.  */
>>>>> >>> +                  VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>>>> >>> +                }
>>>>> >>> +            }
>>>>> >>> +        }
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>>    /* The format of .gcc_except_table does not allow landing pads to
>>>>> >>>       be in a different partition as the throw.  Fix this by either
>>>>> >>>       moving or duplicating the landing pads.  */
>>>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>>> >>>                       new_bb->aux = cur_bb->aux;
>>>>> >>>                       cur_bb->aux = new_bb;
>>>>> >>>
>>>>> >>> -                     /* Make sure new fall-through bb is in same
>>>>> >>> -                        partition as bb it's falling through from.  */
>>>>> >>> +                      /* This is done by force_nonfallthru_and_redirect.  */
>>>>> >>> +                     gcc_assert (BB_PARTITION (new_bb)
>>>>> >>> +                                  == BB_PARTITION (cur_bb));
>>>>> >>>
>>>>> >>> -                     BB_COPY_PARTITION (new_bb, cur_bb);
>>>>> >>>                       single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>>> >>>                     }
>>>>> >>>                   else
>>>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      FOR_EACH_EDGE (e, ei, bb->succs)
>>>>> >>>        if ((e->flags & EDGE_CROSSING)
>>>>> >>> -         && JUMP_P (BB_END (e->src)))
>>>>> >>> +         && JUMP_P (BB_END (e->src))
>>>>> >>> +          /* Some notes were added during fix_up_fall_thru_edges, via
>>>>> >>> +             force_nonfallthru_and_redirect.  */
>>>>> >>> +          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
>>>>> >>>         add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void)
>>>>> >>>        dump_flow_info (dump_file, dump_flags);
>>>>> >>>      }
>>>>> >>>
>>>>> >>> -  if (flag_reorder_blocks_and_partition)
>>>>> >>> +  if (crtl->has_bb_partition)
>>>>> >>>      verify_hot_cold_block_grouping ();
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void)
>>>>> >>>     encountering this note will make the compiler switch between the
>>>>> >>>     hot and cold text sections.  */
>>>>> >>>
>>>>> >>> -static void
>>>>> >>> +void
>>>>> >>>  insert_section_boundary_note (void)
>>>>> >>>  {
>>>>> >>>    basic_block bb;
>>>>> >>>    rtx new_note;
>>>>> >>>    int first_partition = 0;
>>>>> >>>
>>>>> >>> -  if (!flag_reorder_blocks_and_partition)
>>>>> >>> +  if (!crtl->has_bb_partition)
>>>>> >>>      return;
>>>>> >>>
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void)
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>> >>>        bb->aux = bb->next_bb;
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (true);
>>>>> >>>
>>>>> >>> -  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>>> >>> -  insert_section_boundary_note ();
>>>>> >>>    return 0;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void)
>>>>> >>>      }
>>>>> >>>
>>>>> >>>  done:
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>
>>>>> >>>    BITMAP_FREE (candidates);
>>>>> >>>    return 0;
>>>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
>>>>> >>>    if (crossing_edges == NULL)
>>>>> >>>      return 0;
>>>>> >>>
>>>>> >>> +  crtl->has_bb_partition = true;
>>>>> >>> +
>>>>> >>>    /* Make sure the source of any crossing edge ends in a jump and the
>>>>> >>>       destination of any crossing edge has a label.  */
>>>>> >>>    add_labels_and_missing_jumps (crossing_edges);
>>>>> >>> Index: bb-reorder.h
>>>>> >>> ===================================================================
>>>>> >>> --- bb-reorder.h        (revision 193376)
>>>>> >>> +++ bb-reorder.h        (working copy)
>>>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re
>>>>> >>>
>>>>> >>>  extern int get_uncond_jump_length (void);
>>>>> >>>
>>>>> >>> +extern void insert_section_boundary_note (void);
>>>>> >>> +
>>>>> >>> +extern void emit_barrier_after_bb (basic_block bb);
>>>>> >>> +
>>>>> >>>  #endif
>>>>> >>> Index: basic-block.h
>>>>> >>> ===================================================================
>>>>> >>> --- basic-block.h       (revision 193376)
>>>>> >>> +++ basic-block.h       (working copy)
>>>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect
>>>>> >>>  extern bool contains_no_active_insn_p (const_basic_block);
>>>>> >>>  extern bool forwarder_block_p (const_basic_block);
>>>>> >>>  extern bool can_fallthru (basic_block, basic_block);
>>>>> >>> +extern void fixup_partitions (void);
>>>>> >>>
>>>>> >>>  /* In cfgbuild.c.  */
>>>>> >>>  extern void find_many_sub_basic_blocks (sbitmap);
>>>>> >>> Index: cfgrtl.c
>>>>> >>> ===================================================================
>>>>> >>> --- cfgrtl.c    (revision 193376)
>>>>> >>> +++ cfgrtl.c    (working copy)
>>>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>> >>>  #include "tree.h"
>>>>> >>>  #include "hard-reg-set.h"
>>>>> >>>  #include "basic-block.h"
>>>>> >>> +#include "bb-reorder.h"
>>>>> >>>  #include "regs.h"
>>>>> >>>  #include "flags.h"
>>>>> >>>  #include "function.h"
>>>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3.  If not see
>>>>> >>>     Only applicable if the CFG is in cfglayout mode.  */
>>>>> >>>  static GTY(()) rtx cfg_layout_function_footer;
>>>>> >>>  static GTY(()) rtx cfg_layout_function_header;
>>>>> >>> +static bool had_sec_boundary_notes;
>>>>> >>>
>>>>> >>>  static rtx skip_insns_after_block (basic_block);
>>>>> >>>  static void record_effective_endpoints (void);
>>>>> >>>  static rtx label_for_bb (basic_block);
>>>>> >>> -static void fixup_reorder_chain (void);
>>>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks);
>>>>> >>>
>>>>> >>>  void verify_insn_chain (void);
>>>>> >>>  static void fixup_fallthru_exit_predecessor (void);
>>>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
>>>>> >>>       partition boundaries).  See  the comments at the top of
>>>>> >>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>>>> >>>
>>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>>      return NULL;
>>>>> >>>
>>>>> >>>    /* We can replace or remove a complex jump only when we have exactly
>>>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target)
>>>>> >>>    return e;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> +/* Called when edge E has been redirected to a new destination,
>>>>> >>> +   in order to update the region crossing flag on the edge and
>>>>> >>> +   jump.  */
>>>>> >>> +
>>>>> >>> +static void
>>>>> >>> +fixup_partition_crossing (edge e, basic_block target)
>>>>> >>> +{
>>>>> >>> +  rtx note;
>>>>> >>> +
>>>>> >>> +  gcc_assert (e->dest == target);
>>>>> >>> +
>>>>> >>> +  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
>>>>> >>> +    return;
>>>>> >>> +  /* If we redirected an existing edge, it may already be marked
>>>>> >>> +     crossing, even though the new src is missing a reg crossing note.
>>>>> >>> +     But make sure reg crossing note doesn't already exist before
>>>>> >>> +     inserting.  */
>>>>> >>> +  if (BB_PARTITION (e->src) != BB_PARTITION (target))
>>>>> >>> +    {
>>>>> >>> +      e->flags |= EDGE_CROSSING;
>>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +      if (JUMP_P (BB_END (e->src))
>>>>> >>> +          && !note)
>>>>> >>> +        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +    }
>>>>> >>> +  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
>>>>> >>> +    {
>>>>> >>> +      e->flags &= ~EDGE_CROSSING;
>>>>> >>> +      /* Remove the region crossing note from jump at end of
>>>>> >>> +         e->src if it exists.  */
>>>>> >>> +      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +      if (note)
>>>>> >>> +        remove_note (BB_END (e->src), note);
>>>>> >>> +    }
>>>>> >>> +}
>>>>> >>> +
>>>>> >>> +/* Called when block BB has been reassigned to a different partition,
>>>>> >>> +   to ensure that the region crossing attributes are updated.  */
>>>>> >>> +
>>>>> >>> +static void
>>>>> >>> +fixup_bb_partition (basic_block bb)
>>>>> >>> +{
>>>>> >>> +  edge e;
>>>>> >>> +  edge_iterator ei;
>>>>> >>> +
>>>>> >>> +  /* Now need to make bb's pred edges non-region crossing.  */
>>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->preds)
>>>>> >>> +    {
>>>>> >>> +      fixup_partition_crossing (e, e->dest);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>> +  /* Possibly need to make bb's successor edges region crossing,
>>>>> >>> +     or remove stale region crossing.  */
>>>>> >>> +  FOR_EACH_EDGE (e, ei, bb->succs)
>>>>> >>> +    {
>>>>> >>> +      if ((e->flags & EDGE_FALLTHRU)
>>>>> >>> +          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
>>>>> >>> +          && e->dest != EXIT_BLOCK_PTR)
>>>>> >>> +        /* force_nonfallthru_and_redirect calls fixup_partition_crossing.  */
>>>>> >>> +        force_nonfallthru (e);
>>>>> >>> +      else
>>>>> >>> +        fixup_partition_crossing (e, e->dest);
>>>>> >>> +    }
>>>>> >>> +}
>>>>> >>> +
>>>>> >>>  /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
>>>>> >>>     expense of adding new instructions or reordering basic blocks.
>>>>> >>>
>>>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>>> >>>  {
>>>>> >>>    edge ret;
>>>>> >>>    basic_block src = e->src;
>>>>> >>> +  basic_block dest = e->dest;
>>>>> >>>
>>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>>> >>>      return NULL;
>>>>> >>>
>>>>> >>> -  if (e->dest == target)
>>>>> >>> +  if (dest == target)
>>>>> >>>      return e;
>>>>> >>>
>>>>> >>>    if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
>>>>> >>>      {
>>>>> >>>        df_set_bb_dirty (src);
>>>>> >>> +      fixup_partition_crossing (ret, target);
>>>>> >>>        return ret;
>>>>> >>>      }
>>>>> >>>
>>>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
>>>>> >>>      return NULL;
>>>>> >>>
>>>>> >>>    df_set_bb_dirty (src);
>>>>> >>> +  fixup_partition_crossing (ret, target);
>>>>> >>>    return ret;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>>> >>>        /* Make sure new block ends up in correct hot/cold section.  */
>>>>> >>>
>>>>> >>>        BB_COPY_PARTITION (jump_block, e->src);
>>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>>> >>> -         && targetm_common.have_named_sections
>>>>> >>> -         && JUMP_P (BB_END (jump_block))
>>>>> >>> -         && !any_condjump_p (BB_END (jump_block))
>>>>> >>> -         && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
>>>>> >>> -       add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>>
>>>>> >>>        /* Wire edge in.  */
>>>>> >>>        new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
>>>>> >>>        new_edge->probability = probability;
>>>>> >>>        new_edge->count = count;
>>>>> >>>
>>>>> >>> +      /* If e->src was previously region crossing, it no longer is
>>>>> >>> +         and the reg crossing note should be removed.  */
>>>>> >>> +      fixup_partition_crossing (new_edge, jump_block);
>>>>> >>> +
>>>>> >>>        /* Redirect old edge.  */
>>>>> >>>        redirect_edge_pred (e, jump_block);
>>>>> >>>        e->probability = REG_BR_PROB_BASE;
>>>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
>>>>> >>>        LABEL_NUSES (label)++;
>>>>> >>>      }
>>>>> >>>
>>>>> >>> -  emit_barrier_after (BB_END (jump_block));
>>>>> >>> +  /* We might be in cfg layout mode, and if so, the following routine will
>>>>> >>> +     insert the barrier correctly.  */
>>>>> >>> +  emit_barrier_after_bb (jump_block);
>>>>> >>>    redirect_edge_succ_nodup (e, target);
>>>>> >>>
>>>>> >>>    if (abnormal_edge_flags)
>>>>> >>>      make_edge (src, target, abnormal_edge_flags);
>>>>> >>>
>>>>> >>>    df_mark_solutions_dirty ();
>>>>> >>> +  fixup_partition_crossing (e, target);
>>>>> >>>    return new_bb;
>>>>> >>>  }
>>>>> >>>
>>>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
>>>>> >>>  static basic_block
>>>>> >>>  rtl_split_edge (edge edge_in)
>>>>> >>>  {
>>>>> >>> -  basic_block bb;
>>>>> >>> +  basic_block bb, new_bb;
>>>>> >>>    rtx before;
>>>>> >>>
>>>>> >>>    /* Abnormal edges cannot be split.  */
>>>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in)
>>>>> >>>    else
>>>>> >>>      {
>>>>> >>>        bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
>>>>> >>> -      /* ??? Why not edge_in->dest->prev_bb here?  */
>>>>> >>> -      BB_COPY_PARTITION (bb, edge_in->dest);
>>>>> >>> +      if (edge_in->src == ENTRY_BLOCK_PTR)
>>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->dest);
>>>>> >>> +      else
>>>>> >>> +        /* Put the split bb into the src partition, to avoid creating
>>>>> >>> +           a situation where a cold bb dominates a hot bb, in the case
>>>>> >>> +           where src is cold and dest is hot. The src will dominate
>>>>> >>> +           the new bb (whereas it might not have dominated dest).  */
>>>>> >>> +        BB_COPY_PARTITION (bb, edge_in->src);
>>>>> >>>      }
>>>>> >>>
>>>>> >>>    make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
>>>>> >>>
>>>>> >>> +  /* Can't allow a region crossing edge to be fallthrough.  */
>>>>> >>> +  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
>>>>> >>> +      && edge_in->dest != EXIT_BLOCK_PTR)
>>>>> >>> +    {
>>>>> >>> +      new_bb = force_nonfallthru (single_succ_edge (bb));
>>>>> >>> +      gcc_assert (!new_bb);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>>    /* For non-fallthru edges, we must adjust the predecessor's
>>>>> >>>       jump instruction to target our new block.  */
>>>>> >>>    if ((edge_in->flags & EDGE_FALLTHRU) == 0)
>>>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e)
>>>>> >>>    else
>>>>> >>>      {
>>>>> >>>        bb = split_edge (e);
>>>>> >>> -      after = BB_END (bb);
>>>>> >>>
>>>>> >>> -      if (flag_reorder_blocks_and_partition
>>>>> >>> -         && targetm_common.have_named_sections
>>>>> >>> -         && e->src != ENTRY_BLOCK_PTR
>>>>> >>> -         && BB_PARTITION (e->src) == BB_COLD_PARTITION
>>>>> >>> -         && !(e->flags & EDGE_CROSSING)
>>>>> >>> -         && JUMP_P (after)
>>>>> >>> -         && !any_condjump_p (after)
>>>>> >>> -         && (single_succ_edge (bb)->flags & EDGE_CROSSING))
>>>>> >>> -       add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>> +      /* If e crossed a partition boundary, we needed to make bb end in
>>>>> >>> +         a region-crossing jump, even though it was originally fallthru.  */
>>>>> >>> +      if (JUMP_P (BB_END (bb)))
>>>>> >>> +       before = BB_END (bb);
>>>>> >>> +      else
>>>>> >>> +        after = BB_END (bb);
>>>>> >>>      }
>>>>> >>>
>>>>> >>>    /* Now that we've found the spot, do the insertion.  */
>>>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void)
>>>>> >>>  {
>>>>> >>>    basic_block bb;
>>>>> >>>
>>>>> >>> +  /* Optimization passes that invoke this routine can cause hot blocks
>>>>> >>> +     previously reached by both hot and cold blocks to become dominated only
>>>>> >>> +     by cold blocks. This will cause the verification below to fail,
>>>>> >>> +     and lead to now cold code in the hot section. In some cases this
>>>>> >>> +     may only be visible after newly unreachable blocks are deleted,
>>>>> >>> +     which will be done by fixup_partitions.  */
>>>>> >>> +  fixup_partitions ();
>>>>> >>> +
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>>    verify_flow_info ();
>>>>> >>>  #endif
>>>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb)
>>>>> >>>
>>>>> >>>    return end;
>>>>> >>>  }
>>>>> >>> -
>>>>> >>> +
>>>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization
>>>>> >>> +   passes that modify the cfg.  */
>>>>> >>> +
>>>>> >>> +void
>>>>> >>> +fixup_partitions (void)
>>>>> >>> +{
>>>>> >>> +  basic_block bb;
>>>>> >>> +
>>>>> >>> +  if (!crtl->has_bb_partition)
>>>>> >>> +    return;
>>>>> >>> +
>>>>> >>> +  /* Delete any blocks that became unreachable and weren't
>>>>> >>> +     already cleaned up, for example during edge forwarding
>>>>> >>> +     and convert_jumps_to_returns. This will expose more
>>>>> >>> +     opportunities for fixing the partition boundaries here.
>>>>> >>> +     Also, the calculation of the dominance graph during verification
>>>>> >>> +     will assert if there are unreachable nodes.  */
>>>>> >>> +  delete_unreachable_blocks ();
>>>>> >>> +
>>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.
>>>>> >>> +     Fixup any that now violate this requirement, as a result of edge
>>>>> >>> +     forwarding and unreachable block deletion.  */
>>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>>> >>> +  VEC (basic_block, heap) *bbs_to_fix = NULL;
>>>>> >>> +  FOR_EACH_BB (bb)
>>>>> >>> +    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>>> >>> +      VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +    {
>>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>>> >>> +      basic_block son;
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +        {
>>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>>> >>> +          /* If bb is not yet cold (because it was added below as
>>>>> >>> +             a block dominated by a cold bb) then mark it cold here.  */
>>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>>> >>> +            {
>>>>> >>> +              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
>>>>> >>> +              VEC_safe_push (basic_block, heap, bbs_to_fix, bb);
>>>>> >>> +            }
>>>>> >>> +          /* Any blocks dominated by a block in the cold section
>>>>> >>> +             must also be cold.  */
>>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>>> >>> +               son;
>>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>>> >>> +        }
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>> +  /* Do the partition fixup after all necessary blocks have been converted to
>>>>> >>> +     cold, so that we only update the region crossings the minimum number of
>>>>> >>> +     places, which can require forcing edges to be non fallthru.  */
>>>>> >>> +  while (! VEC_empty (basic_block, bbs_to_fix))
>>>>> >>> +    {
>>>>> >>> +      bb = VEC_pop (basic_block, bbs_to_fix);
>>>>> >>> +      fixup_bb_partition (bb);
>>>>> >>> +    }
>>>>> >>> +}
>>>>> >>> +
>>>>> >>>  /* Verify the CFG and RTL consistency common for both underlying RTL and
>>>>> >>>     cfglayout RTL.
>>>>> >>>
>>>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void)
>>>>> >>>    rtx x;
>>>>> >>>    int err = 0;
>>>>> >>>    basic_block bb;
>>>>> >>> +  bool have_partitions = false;
>>>>> >>>
>>>>> >>>    /* Check the general integrity of the basic blocks.  */
>>>>> >>>    FOR_EACH_BB_REVERSE (bb)
>>>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void)
>>>>> >>>
>>>>> >>>           if (e->flags & EDGE_ABNORMAL)
>>>>> >>>             n_abnormal++;
>>>>> >>> +
>>>>> >>> +          have_partitions |= is_crossing;
>>>>> >>>         }
>>>>> >>>
>>>>> >>>        if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
>>>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void)
>>>>> >>>           }
>>>>> >>>      }
>>>>> >>>
>>>>> >>> +  /* If there are partitions, do a sanity check on them: A basic block in
>>>>> >>> +     a cold partition cannot dominate a basic block in a hot partition.  */
>>>>> >>> +  VEC (basic_block, heap) *bbs_in_cold_partition = NULL;
>>>>> >>> +  if (have_partitions && !err)
>>>>> >>> +    FOR_EACH_BB (bb)
>>>>> >>> +      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
>>>>> >>> +        VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb);
>>>>> >>> +  if (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +    {
>>>>> >>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>>>>> >>> +      basic_block son;
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        calculate_dominance_info (CDI_DOMINATORS);
>>>>> >>> +
>>>>> >>> +      while (! VEC_empty (basic_block, bbs_in_cold_partition))
>>>>> >>> +        {
>>>>> >>> +          bb = VEC_pop (basic_block, bbs_in_cold_partition);
>>>>> >>> +          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
>>>>> >>> +            {
>>>>> >>> +              error ("non-cold basic block %d dominated "
>>>>> >>> +                     "by a block in the cold partition", bb->index);
>>>>> >>> +              err = 1;
>>>>> >>> +            }
>>>>> >>> +          for (son = first_dom_son (CDI_DOMINATORS, bb);
>>>>> >>> +               son;
>>>>> >>> +               son = next_dom_son (CDI_DOMINATORS, son))
>>>>> >>> +            VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son);
>>>>> >>> +        }
>>>>> >>> +
>>>>> >>> +      if (dom_calculated_here)
>>>>> >>> +        free_dominance_info (CDI_DOMINATORS);
>>>>> >>> +    }
>>>>> >>> +
>>>>> >>>    /* Clean up.  */
>>>>> >>>    return err;
>>>>> >>>  }
>>>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void)
>>>>> >>>    else
>>>>> >>>      cfg_layout_function_header = NULL_RTX;
>>>>> >>>
>>>>> >>> +  had_sec_boundary_notes = false;
>>>>> >>> +
>>>>> >>>    next_insn = get_insns ();
>>>>> >>>    FOR_EACH_BB (bb)
>>>>> >>>      {
>>>>> >>>        rtx end;
>>>>> >>>
>>>>> >>>        if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
>>>>> >>> -       BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>>> >>> -                                             PREV_INSN (BB_HEAD (bb)));
>>>>> >>> +        {
>>>>> >>> +          /* Rather than try to keep section boundary notes incrementally
>>>>> >>> +             up-to-date through cfg layout optimizations, simply remove them
>>>>> >>> +             and flag that they should be re-inserted when exiting
>>>>> >>> +             cfg layout mode.  */
>>>>> >>> +          rtx check_insn = next_insn;
>>>>> >>> +          while (check_insn)
>>>>> >>> +            {
>>>>> >>> +              if (NOTE_P (check_insn)
>>>>> >>> +                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
>>>>> >>> +              {
>>>>> >>> +                had_sec_boundary_notes |= true;
>>>>> >>> +                /* Remove note from chain. Grab new next_insn first.  */
>>>>> >>> +                if (next_insn == check_insn)
>>>>> >>> +                  next_insn = NEXT_INSN (check_insn);
>>>>> >>> +                /* Delete note.  */
>>>>> >>> +                delete_insn (check_insn);
>>>>> >>> +                /* There will only be one.  */
>>>>> >>> +                break;
>>>>> >>> +              }
>>>>> >>> +              check_insn = NEXT_INSN (check_insn);
>>>>> >>> +            }
>>>>> >>> +          /* If we still have header instructions left after above loop.  */
>>>>> >>> +          if (next_insn != BB_HEAD (bb))
>>>>> >>> +            BB_HEADER (bb) = unlink_insn_chain (next_insn,
>>>>> >>> +                                                PREV_INSN (BB_HEAD (bb)));
>>>>> >>> +        }
>>>>> >>>        end = skip_insns_after_block (bb);
>>>>> >>>        if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
>>>>> >>>         BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
>>>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void)
>>>>> >>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>>> >>>        bb->aux = bb->next_bb;
>>>>> >>>
>>>>> >>> -  cfg_layout_finalize ();
>>>>> >>> +  cfg_layout_finalize (false);
>>>>> >>>
>>>>> >>>    return 0;
>>>>> >>>  }
>>>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
>>>>> >>>  }
>>>>> >>>
>>>>> >>>
>>>>> >>> -/* Given a reorder chain, rearrange the code to match.  */
>>>>> >>> +/* Given a reorder chain, rearrange the code to match. If
>>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, or when
>>>>> >>> +   section boundary notes were removed on entry to cfg layout
>>>>> >>> +   mode, insert section boundary notes here.  */
>>>>> >>>
>>>>> >>>  static void
>>>>> >>> -fixup_reorder_chain (void)
>>>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks)
>>>>> >>>  {
>>>>> >>>    basic_block bb;
>>>>> >>>    rtx insn = NULL;
>>>>> >>> @@ -3150,7 +3373,7 @@ static void
>>>>> >>>           PREV_INSN (BB_HEADER (bb)) = insn;
>>>>> >>>           insn = BB_HEADER (bb);
>>>>> >>>           while (NEXT_INSN (insn))
>>>>> >>> -           insn = NEXT_INSN (insn);
>>>>> >>> +            insn = NEXT_INSN (insn);
>>>>> >>>         }
>>>>> >>>        if (insn)
>>>>> >>>         NEXT_INSN (insn) = BB_HEAD (bb);
>>>>> >>> @@ -3175,6 +3398,11 @@ static void
>>>>> >>>      insn = NEXT_INSN (insn);
>>>>> >>>
>>>>> >>>    set_last_insn (insn);
>>>>> >>> +
>>>>> >>> +  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
>>>>> >>> +  if (had_sec_boundary_notes || finalize_reorder_blocks)
>>>>> >>> +    insert_section_boundary_note ();
>>>>> >>> +
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>>    verify_insn_chain ();
>>>>> >>>  #endif
>>>>> >>> @@ -3187,7 +3415,7 @@ static void
>>>>> >>>        edge e_fall, e_taken, e;
>>>>> >>>        rtx bb_end_insn;
>>>>> >>>        rtx ret_label = NULL_RTX;
>>>>> >>> -      basic_block nb, src_bb;
>>>>> >>> +      basic_block nb;
>>>>> >>>        edge_iterator ei;
>>>>> >>>
>>>>> >>>        if (EDGE_COUNT (bb->succs) == 0)
>>>>> >>> @@ -3322,7 +3550,6 @@ static void
>>>>> >>>        /* We got here if we need to add a new jump insn.
>>>>> >>>          Note force_nonfallthru can delete E_FALL and thus we have to
>>>>> >>>          save E_FALL->src prior to the call to force_nonfallthru.  */
>>>>> >>> -      src_bb = e_fall->src;
>>>>> >>>        nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
>>>>> >>>        if (nb)
>>>>> >>>         {
>>>>> >>> @@ -3330,17 +3557,6 @@ static void
>>>>> >>>           bb->aux = nb;
>>>>> >>>           /* Don't process this new block.  */
>>>>> >>>           bb = nb;
>>>>> >>> -
>>>>> >>> -         /* Make sure new bb is tagged for correct section (same as
>>>>> >>> -            fall-thru source, since you cannot fall-thru across
>>>>> >>> -            section boundaries).  */
>>>>> >>> -         BB_COPY_PARTITION (src_bb, single_pred (bb));
>>>>> >>> -         if (flag_reorder_blocks_and_partition
>>>>> >>> -             && targetm_common.have_named_sections
>>>>> >>> -             && JUMP_P (BB_END (bb))
>>>>> >>> -             && !any_condjump_p (BB_END (bb))
>>>>> >>> -             && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
>>>>> >>> -           add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
>>>>> >>>         }
>>>>> >>>      }
>>>>> >>>
>>>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to)
>>>>> >>>             case NOTE_INSN_FUNCTION_BEG:
>>>>> >>>               /* There is always just single entry to function.  */
>>>>> >>>             case NOTE_INSN_BASIC_BLOCK:
>>>>> >>> +              /* We should only switch text sections once.  */
>>>>> >>> +           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>>> >>>               break;
>>>>> >>>
>>>>> >>>             case NOTE_INSN_EPILOGUE_BEG:
>>>>> >>> -           case NOTE_INSN_SWITCH_TEXT_SECTIONS:
>>>>> >>>               emit_note_copy (insn);
>>>>> >>>               break;
>>>>> >>>
>>>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void)
>>>>> >>>  }
>>>>> >>>
>>>>> >>>  /* Finalize the changes: reorder insn list according to the sequence specified
>>>>> >>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>>>>> >>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>>>>> >>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>>>>> >>> +   to fixup_reorder_chain so that it can insert the proper switch text
>>>>> >>> +   section notes.  */
>>>>> >>>
>>>>> >>>  void
>>>>> >>> -cfg_layout_finalize (void)
>>>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>>>>> >>>  {
>>>>> >>>  #ifdef ENABLE_CHECKING
>>>>> >>>    verify_flow_info ();
>>>>> >>> @@ -3775,7 +3995,7 @@ void
>>>>> >>>  #endif
>>>>> >>>        )
>>>>> >>>      fixup_fallthru_exit_predecessor ();
>>>>> >>> -  fixup_reorder_chain ();
>>>>> >>> +  fixup_reorder_chain (finalize_reorder_blocks);
>>>>> >>>
>>>>> >>>    rebuild_jump_labels (get_insns ());
>>>>> >>>    delete_dead_jumptables ();
>>>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e)
>>>>> >>>    if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
>>>>> >>>      return false;
>>>>> >>>
>>>>> >>> -  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
>>>>> >>> -      || BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>> +  if (BB_PARTITION (src) != BB_PARTITION (target))
>>>>> >>>      return false;
>>>>> >>>
>>>>> >>>    if (!onlyjump_p (insn)
>>>>> >>>
>>>>> >>> --
>>>>> >>> This patch is available for review at http://codereview.appspot.com/6823047
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
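
The propagation step of the fixup_partitions routine quoted above can be sketched outside of GCC as follows. This is only a toy model: `toy_cfg`, `toy_fixup_partitions`, and the array-based dominator tree are hypothetical stand-ins for GCC's basic_block and dominance API, and unlike the patch the sketch queues each block at most once. It shows the core idea that any block dominated by a cold block is made cold, and that the newly cold blocks are collected so their edges can be fixed up afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_BLOCKS = 16 };

/* Hypothetical stand-in for a function's CFG plus its dominator tree.  */
struct toy_cfg {
  size_t n_blocks;
  int idom[MAX_BLOCKS];   /* Immediate dominator index, -1 for the entry.  */
  bool cold[MAX_BLOCKS];  /* True if the block is in the cold partition.  */
};

/* Mark every block dominated by a cold block as cold.  Indices of the
   blocks that changed partition are stored in FIXED; the return value
   is how many there are.  */
size_t
toy_fixup_partitions (struct toy_cfg *cfg, size_t fixed[MAX_BLOCKS])
{
  size_t worklist[MAX_BLOCKS];
  bool queued[MAX_BLOCKS] = { false };
  size_t top = 0, n_fixed = 0;

  assert (cfg->n_blocks <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (size_t i = 0; i < cfg->n_blocks; i++)
    if (cfg->cold[i])
      {
        worklist[top++] = i;
        queued[i] = true;
      }

  while (top > 0)
    {
      size_t bb = worklist[--top];

      /* A block that got here via a cold dominator becomes cold.  */
      if (!cfg->cold[bb])
        {
          cfg->cold[bb] = true;
          fixed[n_fixed++] = bb;
        }

      /* Dominator-tree children of a cold block must be cold too.  */
      for (size_t son = 0; son < cfg->n_blocks; son++)
        if (cfg->idom[son] == (int) bb && !queued[son])
          {
            worklist[top++] = son;
            queued[son] = true;
          }
    }

  return n_fixed;
}
```

In the patch itself, the blocks collected in the second list are then passed to fixup_bb_partition, so region-crossing edges are updated only once all partition changes are known.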

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
       [not found]             ` <CAAe5K+UOyQrDyg=pY7za9YRK=8-3dVVsfcMuJdsJp4w2X6BaJg@mail.gmail.com>
@ 2013-01-31 14:51               ` Christophe Lyon
  2013-02-05 15:45                 ` Teresa Johnson
  0 siblings, 1 reply; 35+ messages in thread
From: Christophe Lyon @ 2013-01-31 14:51 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Jack Howarth, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

Hello,

Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)



On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
>
>
>
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>>
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>
>
> good
>
>>
>>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>
>
> good
>
>>
>> - gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>
>
> Finally had a chance to get back to this. I was able to reproduce the
> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
> However, I am also getting the same failure with a pristine copy of trunk.
> Can you confirm whether you were seeing any of these failures without my
> patches, because I believe they are probably a limitation with function
> splitting and debug info that is orthogonal to my patch.
>
Yes, I confirm that I see these failures without your patch too, and
both -freorder-blocks-and-partition and -g are present in my
command-line.
And now gap's integer.c fails to compile with a similar error message too.

>>
>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>> fma3d.
>
>
> I'm not seeing this on x86_64, unfortunately, so it might require some
> follow-on work to triage and fix.
>
> I'll look into the gas failure, but if someone could review this patch in
> the meantime given that it does improve things considerably (at least
> without -g), that would be great.
>
Indeed.

> Thanks,
> Teresa
>

Thanks
Christophe

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-01-31 14:51               ` Christophe Lyon
@ 2013-02-05 15:45                 ` Teresa Johnson
  0 siblings, 0 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-02-05 15:45 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: Jack Howarth, reply, David Li, Steven Bosscher,
	Matthew Gretton-Dann, gcc-patches

Thanks for the confirmation that the -g issue is orthogonal. I did
start to try to address it but got pulled away by some other things
for a while. I'll see if I can take another stab at it.

In the meantime, could one of the global maintainers take a look at
the patch? I don't want it to get too stale, and without these fixes I
am unable to get -freorder-blocks-and-partition to work at all.

Thanks!
Teresa

-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11 15:03                     ` Steven Bosscher
@ 2013-05-12 14:37                       ` Teresa Johnson
  0 siblings, 0 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-05-12 14:37 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 6951 bytes --]

On Sat, May 11, 2013 at 8:02 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sat, May 11, 2013 at 4:38 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>   /* If we are partitioning hot/cold basic blocks, we don't want to
>>      mess up unconditional or indirect jumps that cross between hot
>>      and cold sections.
>>
>>      Basic block partitioning may result in some jumps that appear to
>>      be optimizable (or blocks that appear to be mergeable), but which really
>>      must be left untouched (they are required to make it safely across
>>      partition boundaries).  See the comments at the top of
>>      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>>
>> And at least a bunch of these are when we are in cfglayout mode.
>>
>> But let me locate a reproducer so we can make sure it isn't due to
>> some issue with my patch.
>
> It sounds like the issue here is that the partitioning code insists on
> having an explicit jump for a section switch even for single_succ_p
> blocks i.e. unconditional jump, while normally in cfglayout mode there
> are no unconditional jumps.
>
> If that is the problem, then the proper solution is to not have the
> explicit jump. Just forward the jump, set the EDGE_CROSSING flag on
> the crossing edge, and add a REG_CROSSING_JUMP note on the branching
> insn. (FWIW, these REG_CROSSING_JUMP notes are also an aberration --
> we have the CFG edges with all the information in them! But oh
> well...) If the edge cannot fall through when going out of cfglayout
> mode after bbro (and crossing edges can't fall through) then an extra
> "forwarder jump" block will be inserted automatically. This may
> require some hacking to invert branch conditions for branching insns
> such that the branch goes to the other section and that the fallthru
> path stays in the same section, but that's something that should be
> relatively easy to do in cfgcleanup.

I dug into this yesterday and figured out why they are there in the
first place, and now have a better solution.

It turns out that these forwarder blocks are inserted by bbpart
itself. I've attached a dot graph dump after bbpart that illustrates
this (I verified that the same thing happens in current trunk, this
isn't due to my changes). See the forwarder blocks on paths between
the pink (hot partition) and blue (cold partition) blocks. These
forwarder blocks persist throughout the compile, which I was
addressing with the changes I described above to enable more
cfg_cleanup in rtl mode when we exit bbro.

The blocks are being inserted by fix_up_fall_thru_edges, called by the
main partitioning routine in bb-reorder.c. This routine looks for
edges that fall through across a partition boundary and attempts to
invert the jump; if that is not possible, it forces the edge to be
non-fallthru. The latter is done by force_nonfallthru, which maps to
rtl_force_nonfallthru even in cfglayout mode, and which essentially
always creates a new bb to forward the branch.

There are two issues here, both relating to the check for whether
fix_up_fall_thru_edges can call invert_jump:

First, we were basically never calling invert_jump when we had a
crossing fall-through edge. This is because the check guarding the
call is: "cur_bb->aux == cond_jump->dest". It turns out that bbpart
doesn't really use the aux pointer, so it is pretty much always NULL.
The routine does set up the aux pointer for new bbs inserted by fixup,
but these are cleared again after fixup by calling
clear_aux_for_blocks in caller partition_hot_cold_basic_blocks. I
confirmed by instrumenting to check for non-null aux pointers here and
compiling all of cpu2006int, and it is non-null exactly once, which
was presumably a bb inserted by fix_up_fall_thru_edges itself. It
looks like the correct check for whether the new fall-through is next
in bb order would be to check cur_bb->next_bb instead of cur_bb->aux.
However, the new fall-through (the other successor) is only next in bb
order some of the time, so fixing this check is not going to fully
address the problem.

However, the main issue is that I don't think we care about what is
next in bb order at all at this point in the compilation, which is in
cfglayout mode and before bbro! In fact, there is other code in
cfglayout mode, specifically in fixup_reorder_chain called at the end
of cfglayout mode, that calls invert_jump without regard to bb layout
order, and also indicates in the comments that it should always
succeed:

              /* Otherwise we can try to invert the jump.  This will
                 basically never fail, however, keep up the pretense.  */
              else if (invert_jump (bb_end_insn,
                                    (e_fall->dest == EXIT_BLOCK_PTR
                                     ? NULL_RTX
                                     : label_for_bb (e_fall->dest)), 0))

Note though that if it were to return false we would reach a call to
force_nonfallthru_and_redirect, which would go ahead and insert a
forwarding block.

Similar to the above code, I think fix_up_fall_thru_edges should
unconditionally call invert_jump without regard to bb layout
successors. I went ahead and tried this (removing the check for
"cur_bb->aux == cond_jump->dest") and this worked great - we
successfully inverted the jumps in the cases I looked at and never
inserted a forwarder block.
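
A minimal sketch of the difference between the two policies, using a toy model rather than GCC's real data structures (`toy_bb`, `toy_fix_fallthru`, and the `check_layout_order` flag are hypothetical names introduced for illustration). With the layout-order check enabled and aux left NULL, a crossing fall-through edge always ends up needing a forwarder; with the check removed, the jump is inverted so that the crossing edge becomes the taken branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block ending in a conditional jump.  */
struct toy_bb {
  bool cold;                /* Partition of this block.  */
  struct toy_bb *aux;       /* Layout hint; NULL throughout bbpart.  */
  struct toy_bb *taken;     /* Destination of the conditional jump.  */
  struct toy_bb *fallthru;  /* Fall-through destination.  */
};

enum toy_fix { TOY_NO_FIX, TOY_INVERTED, TOY_FORWARDER };

/* Swap the taken and fall-through destinations, modelling invert_jump.  */
static void
toy_invert_jump (struct toy_bb *bb)
{
  struct toy_bb *tmp = bb->taken;
  bb->taken = bb->fallthru;
  bb->fallthru = tmp;
}

/* Decide how to fix a fall-through edge that crosses partitions.
   CHECK_LAYOUT_ORDER models the "cur_bb->aux == cond_jump->dest" test.
   The forwarder case is only reported here, not actually built.  */
enum toy_fix
toy_fix_fallthru (struct toy_bb *bb, bool check_layout_order)
{
  assert (bb->taken != NULL && bb->fallthru != NULL);

  /* Nothing to do unless the fall-through edge crosses partitions.  */
  if (bb->fallthru->cold == bb->cold)
    return TOY_NO_FIX;

  /* If the taken edge crosses as well, inverting cannot help.  */
  if (bb->taken->cold != bb->cold)
    return TOY_FORWARDER;

  /* Old behaviour: give up unless the jump target is next in layout.  */
  if (check_layout_order && bb->aux != bb->taken)
    return TOY_FORWARDER;

  toy_invert_jump (bb);
  return TOY_INVERTED;
}
```

After a successful inversion the fall-through path stays within the block's own partition, so no extra block is required for that edge.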

I also instrumented the code to see how often we still couldn't call
invert_jump because the following guard failed:
                      if (old_jump && JUMP_P (old_jump) && fall_thru_label)
and how often we called invert_jump but it failed (returned false). I
built cpu2006int with the bb layout order check removed, and
everything built fine. In a couple hundred cases we were not able to
call invert_jump because the guard above failed, but when we did call
invert_jump (the vast majority of cases), it never failed. However,
similar to the code in fixup_reorder_chain, I think we want to keep
the handling of the failure case as a catch-all. It is still needed
when the guard above fails, and also when both successor edges are
crossing (when cond_jump_crosses is true as well).
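
The remaining cases can be summarized with a small decision function. This is a sketch under assumptions, not the code in bb-reorder.c: `toy_fixup_stats` and `toy_needs_forwarder` are hypothetical names, `guard_ok` stands for the old_jump/JUMP_P/fall_thru_label test, and `invert_ok` stands for the return value of invert_jump. The counters mirror the kind of instrumentation described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for the outcomes discussed above.  */
struct toy_fixup_stats {
  unsigned guard_failed;   /* invert_jump could not be attempted.  */
  unsigned invert_failed;  /* invert_jump was called but returned false.  */
  unsigned inverted;       /* The jump was inverted successfully.  */
  unsigned forwarders;     /* The catch-all path was taken.  */
};

/* Return true if the catch-all (force_nonfallthru, which creates a
   forwarder block) is still needed for a crossing fall-through edge.  */
bool
toy_needs_forwarder (bool cond_jump_crosses, bool guard_ok, bool invert_ok,
                     struct toy_fixup_stats *stats)
{
  bool invert_worked = false;

  assert (stats != 0);

  /* Inversion is only attempted when the other successor edge stays
     in the same partition.  */
  if (!cond_jump_crosses)
    {
      if (!guard_ok)
        stats->guard_failed++;
      else if (!invert_ok)
        stats->invert_failed++;
      else
        {
          invert_worked = true;
          stats->inverted++;
        }
    }

  if (cond_jump_crosses || !invert_worked)
    {
      stats->forwarders++;
      return true;
    }
  return false;
}
```

Only the combination "other edge does not cross, guard holds, inversion succeeds" avoids the forwarder; every other combination falls into the catch-all path.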

I am currently building/running all of cpu2006 with the test for the
bb layout order removed, and with my earlier fix for this problem
(which called cleanup_cfg again in rtl mode at the end of bbro) also
removed.

(BTW, the testing with the change to move the note insertion to
free_cfg all went fine, so I will clean that up and put it in the next
version of the patch instead of the changes to strip/reinsert the
switch notes when going in/out of cfglayout mode.)

Thanks for your questions and help - the patch is getting better. I'll
work on cleaning up the stuff I fixed here, and splitting it into 3
next, assuming the testing completes fine.

Teresa

>
> Ciao!
> Steven



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

[-- Attachment #2: spec.c.199r.bbpart.dot --]
[-- Type: application/octet-stream, Size: 108922 bytes --]

digraph "spec.c.199r.bbpart" {
overlap=false;
subgraph "spec_init" {
	color="black";
	label="spec_init";
	fn_31_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_31_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 31:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ 29:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ 33:\ flags:CCGC=cmp([`dbglvl'],0x3)\l\
|\ \ \ 34:\ pc=\{(flags:CCGC\<=0)?L38:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_31_basic_block_3 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 35:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 36:\ di:DI=`*.LC2'\l\
|\ \ \ 37:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_31_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 38:\ L38:\l\
|\ \ \ 39:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 41:\ debug\ i\ =\>\ 0\l\
|\ \ \ 42:\ r96:DI=`spec_fd'\l\
|\ \ \ 56:\ r124:SI=0\l\
}"];

	fn_31_basic_block_5 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ \ 48:\ L48:\l\
|\ \ \ 49:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 50:\ debug\ i\ =\>\ optimized\ away\l\
|\ \ \ 52:\ r90:SI=[r96:DI]\l\
|\ \ \ 53:\ debug\ limit\ =\>\ r90:SI\l\
|\ \ \ 54:\ r106:DI=r96:DI\l\
|\ \ \ 55:\ r107:SI=0x18\l\
|\ \ \ 57:\ NOTE_INSN_DELETED\l\
|\ \ \ 58:\ flags:CCZ=cmp(zero_extract(r96:DI,0x1,0),0)\l\
|\ \ \ 59:\ pc=\{(flags:CCZ==0)?L62:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2328\l\
}"];

	fn_31_basic_block_34 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:1 FREQ:0 |\ \ 454:\ NOTE_INSN_BASIC_BLOCK\ 34\l\
|\ \ 455:\ pc=L447\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_31_basic_block_6 [shape=record,style=filled,fillcolor=lightblue,label="{COUNT:1 FREQ:0 |\ \ 447:\ L447:\l\
|\ \ 142:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 60:\ \{[r96:DI]=r124:SI#0;r106:DI=r96:DI+0x1;unspec[0]\ 38;\}\l\
|\ \ \ 61:\ r107:SI=0x17\l\
|\ \ 448:\ pc=L62\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_31_basic_block_7 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ \ 62:\ L62:\l\
|\ \ 143:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ 63:\ NOTE_INSN_DELETED\l\
|\ \ \ 64:\ flags:CCZ=cmp(zero_extract(r106:DI,0x1,0x1),0)\l\
|\ \ \ 65:\ pc=\{(flags:CCZ==0)?L68:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2328\l\
}"];

	fn_31_basic_block_35 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:1 FREQ:0 |\ \ 458:\ NOTE_INSN_BASIC_BLOCK\ 35\l\
|\ \ 459:\ pc=L450\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_31_basic_block_8 [shape=record,style=filled,fillcolor=lightblue,label="{COUNT:1 FREQ:0 |\ \ 450:\ L450:\l\
|\ \ 144:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ \ 66:\ \{[r106:DI]=r124:SI#0;r106:DI=r106:DI+0x2;unspec[0]\ 38;\}\l\
|\ \ \ 67:\ \{r107:SI=r107:SI-0x2;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 451:\ pc=L68\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_31_basic_block_9 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ \ 68:\ L68:\l\
|\ \ 145:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ \ 69:\ \{r111:SI=r107:SI\ 0\>\>0x2;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 70:\ r112:DI=zero_extend(r111:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r111:SI\l\
|\ \ \ 71:\ \{r131:DI=0;r130:DI=r112:DI\<\<0x2+r106:DI;[r106:DI]=0;use\ r124:SI;use\ r112:DI;\}\l\
\ \ \ \ \ \ REG_DEAD\ r112:DI\l\
\ \ \ \ \ \ REG_DEAD\ r106:DI\l\
\ \ \ \ \ \ REG_UNUSED\ r131:DI\l\
|\ \ \ 72:\ NOTE_INSN_DELETED\l\
|\ \ \ 73:\ flags:CCZ=cmp(zero_extract(r107:SI#0,0x1,0x1),0)\l\
|\ \ \ 74:\ pc=\{(flags:CCZ==0)?L76:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_31_basic_block_10 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:4 FREQ:0 |\ \ 146:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ \ 75:\ \{[r130:DI]=r124:SI#0;r130:DI=r130:DI+0x2;unspec[0]\ 38;\}\l\
}"];

	fn_31_basic_block_11 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ \ 76:\ L76:\l\
|\ \ 147:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ \ 77:\ NOTE_INSN_DELETED\l\
|\ \ \ 78:\ flags:CCZ=cmp(zero_extract(r107:SI#0,0x1,0),0)\l\
\ \ \ \ \ \ REG_DEAD\ r107:SI\l\
|\ \ \ 79:\ pc=\{(flags:CCZ==0)?L81:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_31_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:4 FREQ:0 |\ \ 148:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ \ 80:\ \{[r130:DI]=r124:SI#0;r132:DI=r130:DI+0x1;unspec[0]\ 38;\}\l\
\ \ \ \ \ \ REG_DEAD\ r130:DI\l\
\ \ \ \ \ \ REG_UNUSED\ r132:DI\l\
}"];

	fn_31_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ \ 81:\ L81:\l\
|\ \ 149:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ \ 82:\ [r96:DI]=r90:SI\l\
|\ \ \ 83:\ \{r115:SI=r90:SI+0x100000;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 84:\ r116:DI=sign_extend(r115:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r115:SI\l\
|\ \ \ 85:\ di:DI=r116:DI\l\
\ \ \ \ \ \ REG_DEAD\ r116:DI\l\
|\ \ \ 86:\ ax:DI=call\ [`malloc']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 87:\ r117:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
\ \ \ \ \ \ REG_NOALIAS\ r117:DI\l\
|\ \ \ 89:\ [r96:DI+0x10]=r117:DI\l\
|\ \ \ 90:\ flags:CCZ=cmp(r117:DI,0)\l\
|\ \ \ 91:\ pc=\{(flags:CCZ!=0)?L98:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_31_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ \ 98:\ L98:\l\
|\ \ \ 99:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ 101:\ debug\ j\ =\>\ 0\l\
|\ \ 102:\ flags:CCNO=cmp(r90:SI,0)\l\
|\ \ 103:\ pc=\{(flags:CCNO\<=0)?L124:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCNO\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_31_basic_block_16 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 104:\ NOTE_INSN_BASIC_BLOCK\ 16\l\
|\ \ 105:\ \{r118:SI=r90:SI-0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r90:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 106:\ \{r119:SI=r118:SI\ 0\>\>0xa;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r118:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 107:\ r120:DI=zero_extend(r119:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r119:SI\l\
|\ \ 108:\ \{r121:DI=r120:DI+0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r120:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 109:\ \{r86:DI=r121:DI\<\<0xa;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r121:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 424:\ debug\ D#3\ =\>\ 0\l\
|\ \ 171:\ \{r127:DI=r86:DI-0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 172:\ \{r125:DI=r127:DI\ 0\>\>0xa;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r127:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 173:\ \{r128:DI=r125:DI&0x7;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r125:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 176:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 178:\ [r117:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r117:DI\l\
|\ \ 179:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 180:\ debug\ j\ =\>\ D#1\l\
|\ \ 181:\ r133:DI=0x400\l\
|\ \ 182:\ flags:CCZ=cmp(r133:DI,r86:DI)\l\
\ \ \ \ \ \ REG_EQUAL\ cmp(0x400,r86:DI)\l\
|\ \ 183:\ pc=\{(flags:CCZ==0)?L124:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_31_basic_block_20 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 306:\ NOTE_INSN_BASIC_BLOCK\ 20\l\
|\ \ 304:\ flags:CCZ=cmp(r128:DI,0)\l\
|\ \ 305:\ pc=\{(flags:CCZ==0)?L121:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x4e2\l\
}"];

	fn_31_basic_block_21 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 289:\ NOTE_INSN_BASIC_BLOCK\ 21\l\
|\ \ 287:\ flags:CCZ=cmp(r128:DI,0x1)\l\
|\ \ 288:\ pc=\{(flags:CCZ==0)?L418:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x594\l\
}"];

	fn_31_basic_block_22 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 272:\ NOTE_INSN_BASIC_BLOCK\ 22\l\
|\ \ 270:\ flags:CCZ=cmp(r128:DI,0x2)\l\
|\ \ 271:\ pc=\{(flags:CCZ==0)?L419:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x682\l\
}"];

	fn_31_basic_block_23 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 255:\ NOTE_INSN_BASIC_BLOCK\ 23\l\
|\ \ 253:\ flags:CCZ=cmp(r128:DI,0x3)\l\
|\ \ 254:\ pc=\{(flags:CCZ==0)?L420:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x7d0\l\
}"];

	fn_31_basic_block_24 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 238:\ NOTE_INSN_BASIC_BLOCK\ 24\l\
|\ \ 236:\ flags:CCZ=cmp(r128:DI,0x4)\l\
|\ \ 237:\ pc=\{(flags:CCZ==0)?L421:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x9c4\l\
}"];

	fn_31_basic_block_25 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 221:\ NOTE_INSN_BASIC_BLOCK\ 25\l\
|\ \ 219:\ flags:CCZ=cmp(r128:DI,0x5)\l\
|\ \ 220:\ pc=\{(flags:CCZ==0)?L422:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0xd05\l\
}"];

	fn_31_basic_block_26 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 204:\ NOTE_INSN_BASIC_BLOCK\ 26\l\
|\ \ 202:\ flags:CCZ=cmp(r128:DI,0x6)\l\
\ \ \ \ \ \ REG_DEAD\ r128:DI\l\
|\ \ 203:\ pc=\{(flags:CCZ==0)?L423:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_31_basic_block_27 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 197:\ NOTE_INSN_BASIC_BLOCK\ 27\l\
|\ \ 189:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 190:\ r149:DI=[r96:DI+0x10]\l\
|\ \ 191:\ [r149:DI+0x400]=0\l\
\ \ \ \ \ \ REG_DEAD\ r149:DI\l\
|\ \ 192:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 193:\ debug\ j\ =\>\ D#1\l\
|\ \ 194:\ r133:DI=0x800\l\
}"];

	fn_31_basic_block_28 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 423:\ L423:\l\
|\ \ 214:\ NOTE_INSN_BASIC_BLOCK\ 28\l\
|\ \ 206:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 207:\ r150:DI=[r96:DI+0x10]\l\
|\ \ 208:\ [r150:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r150:DI\l\
|\ \ 209:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 210:\ debug\ j\ =\>\ D#1\l\
|\ \ 211:\ \{r133:DI=r133:DI+0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_31_basic_block_29 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 422:\ L422:\l\
|\ \ 231:\ NOTE_INSN_BASIC_BLOCK\ 29\l\
|\ \ 223:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 224:\ r151:DI=[r96:DI+0x10]\l\
|\ \ 225:\ [r151:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r151:DI\l\
|\ \ 226:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 227:\ debug\ j\ =\>\ D#1\l\
|\ \ 228:\ \{r133:DI=r133:DI+0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_31_basic_block_30 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 421:\ L421:\l\
|\ \ 248:\ NOTE_INSN_BASIC_BLOCK\ 30\l\
|\ \ 240:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 241:\ r152:DI=[r96:DI+0x10]\l\
|\ \ 242:\ [r152:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r152:DI\l\
|\ \ 243:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 244:\ debug\ j\ =\>\ D#1\l\
|\ \ 245:\ \{r133:DI=r133:DI+0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_31_basic_block_31 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 420:\ L420:\l\
|\ \ 265:\ NOTE_INSN_BASIC_BLOCK\ 31\l\
|\ \ 257:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 258:\ r153:DI=[r96:DI+0x10]\l\
|\ \ 259:\ [r153:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r153:DI\l\
|\ \ 260:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 261:\ debug\ j\ =\>\ D#1\l\
|\ \ 262:\ \{r133:DI=r133:DI+0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_31_basic_block_32 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 419:\ L419:\l\
|\ \ 282:\ NOTE_INSN_BASIC_BLOCK\ 32\l\
|\ \ 274:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 275:\ r154:DI=[r96:DI+0x10]\l\
|\ \ 276:\ [r154:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r154:DI\l\
|\ \ 277:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 278:\ debug\ j\ =\>\ D#1\l\
|\ \ 279:\ \{r133:DI=r133:DI+0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_31_basic_block_33 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 418:\ L418:\l\
|\ \ 299:\ NOTE_INSN_BASIC_BLOCK\ 33\l\
|\ \ 291:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 292:\ r155:DI=[r96:DI+0x10]\l\
|\ \ 293:\ [r155:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r155:DI\l\
|\ \ 294:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 295:\ debug\ j\ =\>\ D#1\l\
|\ \ 296:\ \{r133:DI=r133:DI+0x400;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 297:\ flags:CCZ=cmp(r133:DI,r86:DI)\l\
|\ \ 298:\ pc=\{(flags:CCZ==0)?L124:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_31_basic_block_17 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:36480 FREQ:1250 |\ \ 121:\ L121:\l\
|\ \ 110:\ NOTE_INSN_BASIC_BLOCK\ 17\l\
|\ \ 111:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 112:\ r134:DI=[r96:DI+0x10]\l\
|\ \ 113:\ [r134:DI+r133:DI]=0\l\
\ \ \ \ \ \ REG_DEAD\ r134:DI\l\
|\ \ 114:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 308:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 309:\ r136:DI=[r96:DI+0x10]\l\
|\ \ 310:\ [r136:DI+r133:DI+0x400]=0\l\
\ \ \ \ \ \ REG_DEAD\ r136:DI\l\
|\ \ 311:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 320:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 321:\ r138:DI=[r96:DI+0x10]\l\
|\ \ 322:\ [r138:DI+r133:DI+0x800]=0\l\
\ \ \ \ \ \ REG_DEAD\ r138:DI\l\
|\ \ 323:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 332:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 333:\ r140:DI=[r96:DI+0x10]\l\
|\ \ 334:\ [r140:DI+r133:DI+0xc00]=0\l\
\ \ \ \ \ \ REG_DEAD\ r140:DI\l\
|\ \ 335:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 344:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 345:\ r142:DI=[r96:DI+0x10]\l\
|\ \ 346:\ [r142:DI+r133:DI+0x1000]=0\l\
\ \ \ \ \ \ REG_DEAD\ r142:DI\l\
|\ \ 347:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 356:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 357:\ r144:DI=[r96:DI+0x10]\l\
|\ \ 358:\ [r144:DI+r133:DI+0x1400]=0\l\
\ \ \ \ \ \ REG_DEAD\ r144:DI\l\
|\ \ 359:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 368:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 369:\ r146:DI=[r96:DI+0x10]\l\
|\ \ 370:\ [r146:DI+r133:DI+0x1800]=0\l\
\ \ \ \ \ \ REG_DEAD\ r146:DI\l\
|\ \ 371:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 380:\ debug\ j\ =\>\ D#3#0\l\
|\ \ 381:\ r148:DI=[r96:DI+0x10]\l\
|\ \ 382:\ [r148:DI+r133:DI+0x1c00]=0\l\
\ \ \ \ \ \ REG_DEAD\ r148:DI\l\
|\ \ 383:\ debug\ D#1\ =\>\ D#3#0+0x400\l\
|\ \ 384:\ debug\ j\ =\>\ D#1\l\
|\ \ 385:\ \{r133:DI=r133:DI+0x2000;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 386:\ flags:CCZ=cmp(r133:DI,r86:DI)\l\
|\ \ 387:\ pc=\{(flags:CCZ==0)?L124:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_31_basic_block_18 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 124:\ L124:\l\
|\ \ 125:\ NOTE_INSN_BASIC_BLOCK\ 18\l\
|\ \ 127:\ debug\ i\ =\>\ D#2\l\
|\ \ 128:\ \{r96:DI=r96:DI+0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 130:\ flags:CCZ=cmp(r96:DI,const(`spec_fd'+0x48))\l\
|\ \ 131:\ pc=\{(flags:CCZ!=0)?L48:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1d4c\l\
}"];

	fn_31_basic_block_19 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 132:\ NOTE_INSN_BASIC_BLOCK\ 19\l\
|\ \ 137:\ ax:SI=0\l\
|\ \ 140:\ use\ ax:SI\l\
}"];

	fn_31_basic_block_36 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 462:\ NOTE_INSN_BASIC_BLOCK\ 36\l\
|\ \ 463:\ pc=L453\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_31_basic_block_14 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 453:\ L453:\l\
|\ \ \ 92:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ \ 93:\ di:DI=`*.LC3'\l\
|\ \ \ 94:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 95:\ di:SI=0\l\
|\ \ \ 96:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_31_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_31_basic_block_0:s -> fn_31_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_2:s -> fn_31_basic_block_3:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_2:s -> fn_31_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_31_basic_block_3:s -> fn_31_basic_block_4:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_4:s -> fn_31_basic_block_5:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_5:s -> fn_31_basic_block_7:n [style="solid,bold",color=black,weight=10,constraint=true, label="[90%]"];
	fn_31_basic_block_5:s -> fn_31_basic_block_34:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[10%]"];
	fn_31_basic_block_34:s -> fn_31_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_31_basic_block_6:s -> fn_31_basic_block_7:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_31_basic_block_7:s -> fn_31_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[90%]"];
	fn_31_basic_block_7:s -> fn_31_basic_block_35:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[10%]"];
	fn_31_basic_block_35:s -> fn_31_basic_block_8:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_31_basic_block_8:s -> fn_31_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_31_basic_block_9:s -> fn_31_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_31_basic_block_9:s -> fn_31_basic_block_10:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_31_basic_block_10:s -> fn_31_basic_block_11:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_11:s -> fn_31_basic_block_13:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_31_basic_block_11:s -> fn_31_basic_block_12:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_31_basic_block_12:s -> fn_31_basic_block_13:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_13:s -> fn_31_basic_block_36:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_31_basic_block_13:s -> fn_31_basic_block_15:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_31_basic_block_36:s -> fn_31_basic_block_14:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_31_basic_block_15:s -> fn_31_basic_block_16:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_15:s -> fn_31_basic_block_18:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_31_basic_block_16:s -> fn_31_basic_block_20:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_16:s -> fn_31_basic_block_18:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_31_basic_block_17:s -> fn_31_basic_block_17:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[100%]"];
	fn_31_basic_block_17:s -> fn_31_basic_block_18:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_31_basic_block_18:s -> fn_31_basic_block_5:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[75%]"];
	fn_31_basic_block_18:s -> fn_31_basic_block_19:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[25%]"];
	fn_31_basic_block_19:s -> fn_31_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_20:s -> fn_31_basic_block_21:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_20:s -> fn_31_basic_block_17:n [style="solid,bold",color=black,weight=10,constraint=true, label="[12%]"];
	fn_31_basic_block_21:s -> fn_31_basic_block_22:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_21:s -> fn_31_basic_block_33:n [style="solid,bold",color=black,weight=10,constraint=true, label="[14%]"];
	fn_31_basic_block_22:s -> fn_31_basic_block_23:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_22:s -> fn_31_basic_block_32:n [style="solid,bold",color=black,weight=10,constraint=true, label="[16%]"];
	fn_31_basic_block_23:s -> fn_31_basic_block_24:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_23:s -> fn_31_basic_block_31:n [style="solid,bold",color=black,weight=10,constraint=true, label="[20%]"];
	fn_31_basic_block_24:s -> fn_31_basic_block_25:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_24:s -> fn_31_basic_block_30:n [style="solid,bold",color=black,weight=10,constraint=true, label="[25%]"];
	fn_31_basic_block_25:s -> fn_31_basic_block_26:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_25:s -> fn_31_basic_block_29:n [style="solid,bold",color=black,weight=10,constraint=true, label="[33%]"];
	fn_31_basic_block_26:s -> fn_31_basic_block_27:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_26:s -> fn_31_basic_block_28:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_31_basic_block_27:s -> fn_31_basic_block_28:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_28:s -> fn_31_basic_block_29:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_29:s -> fn_31_basic_block_30:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_30:s -> fn_31_basic_block_31:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_31:s -> fn_31_basic_block_32:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_32:s -> fn_31_basic_block_33:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_33:s -> fn_31_basic_block_17:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_31_basic_block_33:s -> fn_31_basic_block_18:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_31_basic_block_0:s -> fn_31_basic_block_1:n [style="invis",constraint=true];
}
subgraph "spec_load" {
	color="black";
	label="spec_load";
	fn_33_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_33_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:62 |\ \ \ \ 7:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ \ 2:\ r88:SI=di:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
|\ \ \ \ 3:\ r89:DI=si:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
|\ \ \ \ 4:\ r90:SI=dx:SI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
|\ \ \ \ 5:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ \ 9:\ si:SI=0\l\
|\ \ \ 10:\ di:DI=r89:DI\l\
|\ \ \ 11:\ ax:QI=0\l\
|\ \ \ 12:\ ax:SI=call\ [`open']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
|\ \ \ 13:\ r60:SI=ax:SI\l\
\ \ \ \ \ \ REG_DEAD\ ax:SI\l\
|\ \ \ 14:\ debug\ fd\ =\>\ r60:SI\l\
|\ \ \ 15:\ flags:CCGOC=cmp(r60:SI,0)\l\
|\ \ \ 16:\ pc=\{(flags:CCGOC\>=0)?L32:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGOC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_33_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:62 |\ \ \ 32:\ L32:\l\
|\ \ \ 33:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 34:\ r152:DI=sign_extend(r88:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r88:SI\l\
|\ \ \ 36:\ \{r93:DI=r152:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 37:\ \{r94:DI=r93:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 38:\ [r94:DI+0x4]=0\l\
|\ \ \ 43:\ [r94:DI+0x8]=0\l\
\ \ \ \ \ \ REG_DEAD\ r94:DI\l\
|\ \ \ 44:\ debug\ i\ =\>\ 0\l\
|\ \ \ \ 6:\ r59:SI=0\l\
|\ \ \ 50:\ r154:DI=r93:DI\l\
\ \ \ \ \ \ REG_DEAD\ r93:DI\l\
}"];

	fn_33_basic_block_10 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:482 FREQ:10000 |\ \ 106:\ L106:\l\
|\ \ 107:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ 108:\ debug\ i\ =\>\ r59:SI\l\
|\ \ 110:\ flags:CCGC=cmp(r59:SI,r90:SI)\l\
|\ \ 111:\ pc=\{(flags:CCGC\<0)?L109:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_33_basic_block_5 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:482 FREQ:10000 |\ \ 109:\ L109:\l\
|\ \ \ 47:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 52:\ r103:DI=sign_extend(r59:SI)\l\
|\ \ \ 53:\ NOTE_INSN_DELETED\l\
|\ \ \ 54:\ \{r104:DI=[r154:DI+const(`spec_fd'+0x10)]+r103:DI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
\ \ \ \ \ \ REG_DEAD\ r103:DI\l\
|\ \ \ 55:\ dx:DI=0x20000\l\
|\ \ \ 56:\ si:DI=r104:DI\l\
\ \ \ \ \ \ REG_DEAD\ r104:DI\l\
|\ \ \ 57:\ di:SI=r60:SI\l\
|\ \ \ 58:\ ax:DI=call\ [`read']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:DI\l\
|\ \ \ 59:\ r64:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 61:\ debug\ rc\ =\>\ r64:DI#0\l\
|\ \ \ 62:\ flags:CCGOC=cmp(r64:DI#0,0)\l\
|\ \ \ 63:\ pc=\{(flags:CCGOC!=0)?L69:pc\}\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x26d2\l\
}"];

	fn_33_basic_block_7 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:479 FREQ:9938 |\ \ \ 69:\ L69:\l\
|\ \ \ 70:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ 72:\ pc=\{(flags:CCGOC\>=0)?L88:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGOC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_33_basic_block_9 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:479 FREQ:9938 |\ \ \ 88:\ L88:\l\
|\ \ \ 89:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ \ 93:\ NOTE_INSN_DELETED\l\
|\ \ 101:\ NOTE_INSN_DELETED\l\
|\ \ 102:\ NOTE_INSN_DELETED\l\
|\ \ 103:\ \{[r154:DI+const(`spec_fd'+0x4)]=[r154:DI+const(`spec_fd'+0x4)]+r64:DI#0;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 104:\ \{r59:SI=r59:SI+r64:DI#0;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r64:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 105:\ debug\ i\ =\>\ r59:SI\l\
}"];

	fn_33_basic_block_17 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 201:\ NOTE_INSN_BASIC_BLOCK\ 17\l\
|\ \ 202:\ pc=L196\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_33_basic_block_8 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 196:\ L196:\l\
|\ \ \ 73:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ \ 74:\ ax:DI=call\ [`__errno_location']\ argc:0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 75:\ r68:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 76:\ di:SI=[r68:DI]\l\
\ \ \ \ \ \ REG_DEAD\ r68:DI\l\
|\ \ \ 77:\ ax:DI=call\ [`strerror']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 78:\ r70:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 79:\ cx:DI=r70:DI\l\
\ \ \ \ \ \ REG_DEAD\ r70:DI\l\
|\ \ \ 80:\ dx:DI=r89:DI\l\
\ \ \ \ \ \ REG_DEAD\ r89:DI\l\
|\ \ \ 81:\ si:DI=`*.LC12'\l\
|\ \ \ 82:\ di:DI=[`stderr']\l\
|\ \ \ 83:\ ax:QI=0\l\
|\ \ \ 84:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ cx:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 85:\ di:SI=0\l\
|\ \ \ 86:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_33_basic_block_6 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:62 |\ \ 112:\ L112:\l\
|\ \ \ 64:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 65:\ di:SI=r60:SI\l\
\ \ \ \ \ \ REG_DEAD\ r60:SI\l\
|\ \ \ 66:\ ax:SI=call\ [`close']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 171:\ \{r153:DI=r152:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r152:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_33_basic_block_14 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:8 FREQ:166 |\ \ 167:\ L167:\l\
|\ \ 168:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ 172:\ NOTE_INSN_DELETED\l\
|\ \ 173:\ r72:SI=[r153:DI+const(`spec_fd'+0x4)]\l\
|\ \ 175:\ flags:CCGC=cmp(r90:SI,r72:SI)\l\
|\ \ 176:\ pc=\{(flags:CCGC\>0)?L174:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x186a\l\
}"];

	fn_33_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:62 |\ \ 177:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ 182:\ ax:SI=0\l\
|\ \ 185:\ use\ ax:SI\l\
}"];

	fn_33_basic_block_11 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:5 FREQ:104 |\ \ 174:\ L174:\l\
|\ \ 116:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ 117:\ \{r73:SI=r90:SI-r72:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 118:\ debug\ tmp\ =\>\ r73:SI\l\
|\ \ 119:\ flags:CCGC=cmp(r73:SI,r72:SI)\l\
|\ \ 120:\ r86:SI=\{(flags:CCGC\<=0)?r73:SI:r72:SI\}\l\
\ \ \ \ \ \ REG_DEAD\ r73:SI\l\
\ \ \ \ \ \ REG_DEAD\ r72:SI\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
|\ \ 121:\ debug\ tmp\ =\>\ r86:SI\l\
|\ \ 122:\ flags:CCGC=cmp([`dbglvl'],0x3)\l\
|\ \ 123:\ pc=\{(flags:CCGC\<=0)?L129:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_33_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:5 FREQ:104 |\ \ 124:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ 125:\ si:SI=r86:SI\l\
|\ \ 126:\ di:DI=`*.LC13'\l\
|\ \ 127:\ ax:QI=0\l\
|\ \ 128:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_33_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:5 FREQ:104 |\ \ 129:\ L129:\l\
|\ \ 130:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ 135:\ r76:DI=[r153:DI+const(`spec_fd'+0x10)]\l\
|\ \ 139:\ \{r125:DI=r153:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 140:\ NOTE_INSN_DELETED\l\
|\ \ 141:\ r126:DI=sign_extend([r125:DI+0x4])\l\
|\ \ 142:\ \{r128:DI=r76:DI+r126:DI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r126:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 143:\ r129:DI=sign_extend(r86:SI)\l\
|\ \ 147:\ dx:DI=r129:DI\l\
\ \ \ \ \ \ REG_DEAD\ r129:DI\l\
|\ \ 148:\ si:DI=r76:DI\l\
\ \ \ \ \ \ REG_DEAD\ r76:DI\l\
|\ \ 149:\ di:DI=r128:DI\l\
\ \ \ \ \ \ REG_DEAD\ r128:DI\l\
|\ \ 150:\ ax:DI=call\ [`memcpy']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:DI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ 164:\ NOTE_INSN_DELETED\l\
|\ \ 165:\ NOTE_INSN_DELETED\l\
|\ \ 166:\ \{[r125:DI+0x4]=r86:SI+[r125:DI+0x4];clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r86:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
\ \ \ \ \ \ REG_DEAD\ r125:DI\l\
}"];

	fn_33_basic_block_16 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 197:\ NOTE_INSN_BASIC_BLOCK\ 16\l\
|\ \ 198:\ pc=L195\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_33_basic_block_3 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 195:\ L195:\l\
|\ \ \ 17:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 18:\ ax:DI=call\ [`__errno_location']\ argc:0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 19:\ r82:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 20:\ di:SI=[r82:DI]\l\
\ \ \ \ \ \ REG_DEAD\ r82:DI\l\
|\ \ \ 21:\ ax:DI=call\ [`strerror']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 22:\ r84:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 23:\ cx:DI=r84:DI\l\
\ \ \ \ \ \ REG_DEAD\ r84:DI\l\
|\ \ \ 24:\ dx:DI=r89:DI\l\
\ \ \ \ \ \ REG_DEAD\ r89:DI\l\
|\ \ \ 25:\ si:DI=`*.LC11'\l\
|\ \ \ 26:\ di:DI=[`stderr']\l\
|\ \ \ 27:\ ax:QI=0\l\
|\ \ \ 28:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ cx:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 29:\ di:SI=0\l\
|\ \ \ 30:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_33_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_33_basic_block_0:s -> fn_33_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_33_basic_block_2:s -> fn_33_basic_block_16:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_33_basic_block_2:s -> fn_33_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_33_basic_block_16:s -> fn_33_basic_block_3:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_33_basic_block_4:s -> fn_33_basic_block_10:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_33_basic_block_5:s -> fn_33_basic_block_6:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_33_basic_block_5:s -> fn_33_basic_block_7:n [style="solid,bold",color=black,weight=10,constraint=true, label="[99%]"];
	fn_33_basic_block_6:s -> fn_33_basic_block_14:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_33_basic_block_7:s -> fn_33_basic_block_17:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_33_basic_block_7:s -> fn_33_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_33_basic_block_17:s -> fn_33_basic_block_8:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_33_basic_block_9:s -> fn_33_basic_block_10:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[100%]"];
	fn_33_basic_block_10:s -> fn_33_basic_block_5:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_33_basic_block_10:s -> fn_33_basic_block_6:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_33_basic_block_11:s -> fn_33_basic_block_12:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_33_basic_block_11:s -> fn_33_basic_block_13:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_33_basic_block_12:s -> fn_33_basic_block_13:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_33_basic_block_13:s -> fn_33_basic_block_14:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[100%]"];
	fn_33_basic_block_14:s -> fn_33_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[62%]"];
	fn_33_basic_block_14:s -> fn_33_basic_block_15:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[37%]"];
	fn_33_basic_block_15:s -> fn_33_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_33_basic_block_0:s -> fn_33_basic_block_1:n [style="invis",constraint=true];
}
subgraph "spec_fread" {
	color="black";
	label="spec_fread";
	fn_35_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_35_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ \ 11:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ \ 2:\ r77:DI=di:DI\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
|\ \ \ \ 3:\ r78:SI=si:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
|\ \ \ \ 4:\ r79:SI=dx:SI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
|\ \ \ \ 5:\ r80:SI=cx:SI\l\
\ \ \ \ \ \ REG_DEAD\ cx:SI\l\
|\ \ \ \ 6:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ 13:\ debug\ rc\ =\>\ 0\l\
|\ \ \ 14:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 15:\ pc=\{(flags:CCGC\<=0)?L24:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_35_basic_block_24 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 208:\ NOTE_INSN_BASIC_BLOCK\ 24\l\
|\ \ 209:\ pc=L198\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_3 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 198:\ L198:\l\
|\ \ \ 16:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 17:\ r8:SI=r80:SI\l\
|\ \ \ 18:\ cx:SI=r79:SI\l\
|\ \ \ 19:\ dx:SI=r78:SI\l\
|\ \ \ 20:\ si:DI=r77:DI\l\
|\ \ \ 21:\ di:DI=`*.LC20'\l\
|\ \ \ 22:\ ax:QI=0\l\
|\ \ \ 23:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ r8:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ cx:SI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 199:\ pc=L24\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ \ 24:\ L24:\l\
|\ \ \ 25:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 26:\ flags:CCGC=cmp(r80:SI,0x3)\l\
|\ \ \ 27:\ pc=\{(flags:CCGC\<=0)?L37:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_35_basic_block_6 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ \ 37:\ L37:\l\
|\ \ \ 38:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 39:\ r129:DI=sign_extend(r80:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r80:SI\l\
|\ \ \ 41:\ \{r83:DI=r129:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 42:\ \{r84:DI=r83:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r83:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 43:\ r61:SI=[r84:DI+0x8]\l\
|\ \ \ 48:\ r62:SI=[r84:DI+0x4]\l\
\ \ \ \ \ \ REG_DEAD\ r84:DI\l\
|\ \ \ 49:\ flags:CCGC=cmp(r61:SI,r62:SI)\l\
|\ \ \ 50:\ pc=\{(flags:CCGC\<0)?L59:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_35_basic_block_9 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ \ 59:\ L59:\l\
|\ \ \ 60:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ \ 61:\ \{r89:SI=r78:SI*r79:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 62:\ \{r90:SI=r61:SI+r89:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r89:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 63:\ flags:CCGC=cmp(r62:SI,r90:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r90:SI\l\
|\ \ \ 64:\ pc=\{(flags:CCGC\>0)?L70:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x270e\l\
}"];

	fn_35_basic_block_10 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:18 FREQ:2 |\ \ \ 65:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ \ 66:\ \{r91:SI=r62:SI-r61:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r62:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 67:\ NOTE_INSN_DELETED\l\
|\ \ 170:\ debug\ D#6\ =\>\ r91:SI/r78:SI\l\
|\ \ \ 68:\ \{r79:SI=r91:SI/r78:SI;r93:SI=r91:SI%r78:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
\ \ \ \ \ \ REG_UNUSED\ r93:SI\l\
\ \ \ \ \ \ REG_DEAD\ r91:SI\l\
|\ \ \ 69:\ debug\ rc\ =\>\ D#6\l\
}"];

	fn_35_basic_block_11 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ \ 70:\ L70:\l\
|\ \ \ 71:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ \ 72:\ debug\ rc\ =\>\ r79:SI\l\
|\ \ \ 73:\ NOTE_INSN_DELETED\l\
|\ \ \ 76:\ \{r97:DI=r129:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 78:\ r99:DI=sign_extend(r61:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r61:SI\l\
|\ \ \ 79:\ NOTE_INSN_DELETED\l\
|\ \ \ 80:\ NOTE_INSN_DELETED\l\
|\ \ \ 81:\ r102:DI=sign_extend(r79:SI)\l\
|\ \ \ 82:\ r103:DI=r77:DI\l\
\ \ \ \ \ \ REG_DEAD\ r77:DI\l\
|\ \ \ 83:\ \{r104:DI=[r97:DI+const(`spec_fd'+0x10)]+r99:DI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
\ \ \ \ \ \ REG_DEAD\ r99:DI\l\
\ \ \ \ \ \ REG_DEAD\ r97:DI\l\
|\ \ \ 84:\ flags:CC=cmp(r102:DI,0x8)\l\
|\ \ \ 85:\ pc=\{(ltu(flags:CC,0))?L94:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x7d0\l\
}"];

	fn_35_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:60414 FREQ:8000 |\ \ 159:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ \ 86:\ NOTE_INSN_DELETED\l\
|\ \ \ 87:\ flags:CCZ=cmp(zero_extract(r103:DI,0x1,0x2),0)\l\
|\ \ \ 88:\ pc=\{(flags:CCZ==0)?L91:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2328\l\
}"];

	fn_35_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:6041 FREQ:800 |\ \ 160:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ \ 89:\ \{[r103:DI]=[r104:DI];r103:DI=r103:DI+0x4;r104:DI=r104:DI+0x4;\}\l\
|\ \ \ 90:\ \{r102:DI=r102:DI-0x4;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_35_basic_block_14 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:60414 FREQ:8000 |\ \ \ 91:\ L91:\l\
|\ \ 161:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ \ 92:\ \{r106:DI=r102:DI\ 0\>\>0x3;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 93:\ \{r130:DI=0;r103:DI=r106:DI\<\<0x3+r103:DI;r104:DI=r106:DI\<\<0x3+r104:DI;[r103:DI]=[r104:DI];use\ r106:DI;\}\l\
\ \ \ \ \ \ REG_DEAD\ r106:DI\l\
\ \ \ \ \ \ REG_UNUSED\ r130:DI\l\
}"];

	fn_35_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ \ 94:\ L94:\l\
|\ \ 162:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ \ 95:\ r107:DI=0\l\
|\ \ \ 96:\ NOTE_INSN_DELETED\l\
|\ \ \ 97:\ flags:CCZ=cmp(zero_extract(r102:DI,0x1,0x2),0)\l\
|\ \ \ 98:\ pc=\{(flags:CCZ==0)?L103:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_35_basic_block_16 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:37758 FREQ:5000 |\ \ 163:\ NOTE_INSN_BASIC_BLOCK\ 16\l\
|\ \ \ 99:\ r109:SI=[r104:DI]\l\
|\ \ 100:\ [r103:DI]=r109:SI\l\
\ \ \ \ \ \ REG_DEAD\ r109:SI\l\
|\ \ 102:\ r107:DI=0x4\l\
}"];

	fn_35_basic_block_17 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ 103:\ L103:\l\
|\ \ 164:\ NOTE_INSN_BASIC_BLOCK\ 17\l\
|\ \ 104:\ NOTE_INSN_DELETED\l\
|\ \ 105:\ flags:CCZ=cmp(zero_extract(r102:DI,0x1,0x1),0)\l\
|\ \ 106:\ pc=\{(flags:CCZ==0)?L111:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_35_basic_block_18 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:37758 FREQ:5000 |\ \ 165:\ NOTE_INSN_BASIC_BLOCK\ 18\l\
|\ \ 107:\ r112:HI=[r104:DI+r107:DI]\l\
|\ \ 108:\ [r103:DI+r107:DI]=r112:HI\l\
\ \ \ \ \ \ REG_DEAD\ r112:HI\l\
|\ \ 109:\ NOTE_INSN_DELETED\l\
|\ \ 110:\ \{r107:DI=r107:DI+0x2;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_35_basic_block_19 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ 111:\ L111:\l\
|\ \ 166:\ NOTE_INSN_BASIC_BLOCK\ 19\l\
|\ \ 112:\ NOTE_INSN_DELETED\l\
|\ \ 113:\ flags:CCZ=cmp(zero_extract(r102:DI,0x1,0),0)\l\
\ \ \ \ \ \ REG_DEAD\ r102:DI\l\
|\ \ 114:\ pc=\{(flags:CCZ==0)?L117:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_35_basic_block_20 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:37758 FREQ:5000 |\ \ 167:\ NOTE_INSN_BASIC_BLOCK\ 20\l\
|\ \ 115:\ r115:QI=[r104:DI+r107:DI]\l\
\ \ \ \ \ \ REG_DEAD\ r104:DI\l\
|\ \ 116:\ [r103:DI+r107:DI]=r115:QI\l\
\ \ \ \ \ \ REG_DEAD\ r115:QI\l\
\ \ \ \ \ \ REG_DEAD\ r107:DI\l\
\ \ \ \ \ \ REG_DEAD\ r103:DI\l\
}"];

	fn_35_basic_block_21 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ 117:\ L117:\l\
|\ \ 168:\ NOTE_INSN_BASIC_BLOCK\ 21\l\
|\ \ 118:\ \{r72:SI=r79:SI*r78:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r78:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 121:\ \{r118:DI=r129:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r129:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 122:\ NOTE_INSN_DELETED\l\
|\ \ 130:\ NOTE_INSN_DELETED\l\
|\ \ 131:\ NOTE_INSN_DELETED\l\
|\ \ 132:\ \{[r118:DI+const(`spec_fd'+0x8)]=r72:SI+[r118:DI+const(`spec_fd'+0x8)];clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r118:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 10:\ r59:SI=r79:SI\l\
\ \ \ \ \ \ REG_DEAD\ r79:SI\l\
|\ \ 133:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ 134:\ pc=\{(flags:CCGC\<=0)?L140:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_35_basic_block_27 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 220:\ NOTE_INSN_BASIC_BLOCK\ 27\l\
|\ \ 221:\ pc=L205\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_22 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 205:\ L205:\l\
|\ \ 135:\ NOTE_INSN_BASIC_BLOCK\ 22\l\
|\ \ 136:\ si:SI=r72:SI\l\
\ \ \ \ \ \ REG_DEAD\ r72:SI\l\
|\ \ 137:\ di:DI=`*.LC18'\l\
|\ \ 138:\ ax:QI=0\l\
|\ \ 139:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 206:\ pc=L140\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_26 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 216:\ NOTE_INSN_BASIC_BLOCK\ 26\l\
|\ \ 217:\ pc=L202\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_7 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 202:\ L202:\l\
|\ \ \ 51:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ \ 9:\ r59:SI=0xffffffffffffffff\l\
|\ \ \ 52:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 53:\ pc=\{(flags:CCGC\<=0)?L140:pc\}\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1bbd\l\
}"];

	fn_35_basic_block_8 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ \ 54:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ \ 55:\ di:DI=`*.LC17'\l\
|\ \ \ 56:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 203:\ pc=L140\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_23 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75517 FREQ:10000 |\ \ 140:\ L140:\l\
|\ \ 141:\ NOTE_INSN_BASIC_BLOCK\ 23\l\
|\ \ 146:\ ax:SI=r59:SI\l\
\ \ \ \ \ \ REG_DEAD\ r59:SI\l\
|\ \ 149:\ use\ ax:SI\l\
}"];

	fn_35_basic_block_25 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 212:\ NOTE_INSN_BASIC_BLOCK\ 25\l\
|\ \ 213:\ pc=L201\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_35_basic_block_5 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 201:\ L201:\l\
|\ \ \ 28:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 29:\ dx:SI=r80:SI\l\
\ \ \ \ \ \ REG_DEAD\ r80:SI\l\
|\ \ \ 30:\ si:DI=`*.LC21'\l\
|\ \ \ 31:\ di:DI=[`stderr']\l\
|\ \ \ 32:\ ax:QI=0\l\
|\ \ \ 33:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 34:\ di:SI=0\l\
|\ \ \ 35:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_35_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_35_basic_block_0:s -> fn_35_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_2:s -> fn_35_basic_block_24:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_35_basic_block_2:s -> fn_35_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_24:s -> fn_35_basic_block_3:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_3:s -> fn_35_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_4:s -> fn_35_basic_block_25:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_35_basic_block_4:s -> fn_35_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_25:s -> fn_35_basic_block_5:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_6:s -> fn_35_basic_block_26:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_35_basic_block_6:s -> fn_35_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_26:s -> fn_35_basic_block_7:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_7:s -> fn_35_basic_block_8:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[28%]"];
	fn_35_basic_block_7:s -> fn_35_basic_block_23:n [style="solid,bold",color=black,weight=10,constraint=true, label="[71%]"];
	fn_35_basic_block_8:s -> fn_35_basic_block_23:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_9:s -> fn_35_basic_block_10:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_35_basic_block_9:s -> fn_35_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[99%]"];
	fn_35_basic_block_10:s -> fn_35_basic_block_11:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_11:s -> fn_35_basic_block_15:n [style="solid,bold",color=black,weight=10,constraint=true, label="[20%]"];
	fn_35_basic_block_11:s -> fn_35_basic_block_12:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[80%]"];
	fn_35_basic_block_12:s -> fn_35_basic_block_14:n [style="solid,bold",color=black,weight=10,constraint=true, label="[90%]"];
	fn_35_basic_block_12:s -> fn_35_basic_block_13:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[10%]"];
	fn_35_basic_block_13:s -> fn_35_basic_block_14:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_14:s -> fn_35_basic_block_15:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_15:s -> fn_35_basic_block_17:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_35_basic_block_15:s -> fn_35_basic_block_16:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_35_basic_block_16:s -> fn_35_basic_block_17:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_17:s -> fn_35_basic_block_19:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_35_basic_block_17:s -> fn_35_basic_block_18:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_35_basic_block_18:s -> fn_35_basic_block_19:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_19:s -> fn_35_basic_block_21:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_35_basic_block_19:s -> fn_35_basic_block_20:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_35_basic_block_20:s -> fn_35_basic_block_21:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_21:s -> fn_35_basic_block_27:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_35_basic_block_21:s -> fn_35_basic_block_23:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_27:s -> fn_35_basic_block_22:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_22:s -> fn_35_basic_block_23:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_35_basic_block_23:s -> fn_35_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_35_basic_block_0:s -> fn_35_basic_block_1:n [style="invis",constraint=true];
}
subgraph "spec_getc" {
	color="black";
	label="spec_getc";
	fn_36_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_36_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:151034 FREQ:10000 |\ \ \ \ 8:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ \ 2:\ r73:SI=di:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
|\ \ \ \ 3:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ 10:\ debug\ rc\ =\>\ 0\l\
|\ \ \ 11:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 12:\ pc=\{(flags:CCGC\<=0)?L18:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_36_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 106:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ 107:\ pc=L96\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_3 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ \ 96:\ L96:\l\
|\ \ \ 13:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 14:\ si:SI=r73:SI\l\
|\ \ \ 15:\ di:DI=`*.LC23'\l\
|\ \ \ 16:\ ax:QI=0\l\
|\ \ \ 17:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 97:\ pc=L18\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:151034 FREQ:10000 |\ \ \ 18:\ L18:\l\
|\ \ \ 19:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 20:\ flags:CCGC=cmp(r73:SI,0x3)\l\
|\ \ \ 21:\ pc=\{(flags:CCGC\<=0)?L31:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_36_basic_block_6 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:151034 FREQ:10000 |\ \ \ 31:\ L31:\l\
|\ \ \ 32:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 33:\ r74:DI=sign_extend(r73:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r73:SI\l\
|\ \ \ 35:\ \{r76:DI=r74:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r74:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 36:\ \{r77:DI=r76:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 37:\ r61:SI=[r77:DI+0x8]\l\
|\ \ \ 42:\ flags:CCGC=cmp(r61:SI,[r77:DI+0x4])\l\
|\ \ \ 43:\ pc=\{(flags:CCGC\<0)?L52:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x26e7\l\
}"];

	fn_36_basic_block_9 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:9959 |\ \ \ 52:\ L52:\l\
|\ \ \ 53:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ \ 58:\ r64:DI=[r76:DI+const(`spec_fd'+0x10)]\l\
\ \ \ \ \ \ REG_DEAD\ r76:DI\l\
|\ \ \ 63:\ \{r90:SI=r61:SI+0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 64:\ [r77:DI+0x8]=r90:SI\l\
\ \ \ \ \ \ REG_DEAD\ r90:SI\l\
\ \ \ \ \ \ REG_DEAD\ r77:DI\l\
\ \ \ \ \ \ REG_EQUAL\ r61:SI+0x1\l\
|\ \ \ 65:\ r91:DI=sign_extend(r61:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r61:SI\l\
|\ \ \ 66:\ r59:SI=zero_extend([r64:DI+r91:DI])\l\
\ \ \ \ \ \ REG_DEAD\ r91:DI\l\
\ \ \ \ \ \ REG_DEAD\ r64:DI\l\
|\ \ \ 67:\ debug\ rc\ =\>\ D#7\l\
|\ \ \ 95:\ debug\ D#7\ =\>\ r59:SI\l\
|\ \ \ 68:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 69:\ pc=\{(flags:CCGC\<=0)?L75:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_36_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 118:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ 119:\ pc=L103\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_10 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 103:\ L103:\l\
|\ \ \ 70:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ \ 71:\ si:SI=r59:SI\l\
|\ \ \ 72:\ di:DI=`*.LC18'\l\
|\ \ \ 73:\ ax:QI=0\l\
|\ \ \ 74:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 104:\ pc=L75\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_7 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:626 FREQ:41 |\ \ \ 44:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ \ 6:\ r59:SI=0xffffffffffffffff\l\
|\ \ \ 45:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 46:\ pc=\{(flags:CCGC\<=0)?L75:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_36_basic_block_14 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 114:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ 115:\ pc=L100\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_8 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 100:\ L100:\l\
|\ \ \ 47:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ \ 48:\ di:DI=`*.LC17'\l\
|\ \ \ 49:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 101:\ pc=L75\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_11 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:151034 FREQ:10000 |\ \ \ 75:\ L75:\l\
|\ \ \ 76:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ \ 81:\ ax:SI=r59:SI\l\
\ \ \ \ \ \ REG_DEAD\ r59:SI\l\
|\ \ \ 84:\ use\ ax:SI\l\
}"];

	fn_36_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 110:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ 111:\ pc=L99\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_36_basic_block_5 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ \ 99:\ L99:\l\
|\ \ \ 22:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 23:\ dx:SI=r73:SI\l\
\ \ \ \ \ \ REG_DEAD\ r73:SI\l\
|\ \ \ 24:\ si:DI=`*.LC16'\l\
|\ \ \ 25:\ di:DI=[`stderr']\l\
|\ \ \ 26:\ ax:QI=0\l\
|\ \ \ 27:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 28:\ di:SI=0\l\
|\ \ \ 29:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_36_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_36_basic_block_0:s -> fn_36_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_36_basic_block_2:s -> fn_36_basic_block_12:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_36_basic_block_2:s -> fn_36_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_12:s -> fn_36_basic_block_3:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_3:s -> fn_36_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_4:s -> fn_36_basic_block_13:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_36_basic_block_4:s -> fn_36_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_13:s -> fn_36_basic_block_5:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_6:s -> fn_36_basic_block_7:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_36_basic_block_6:s -> fn_36_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[99%]"];
	fn_36_basic_block_7:s -> fn_36_basic_block_14:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_36_basic_block_7:s -> fn_36_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_14:s -> fn_36_basic_block_8:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_8:s -> fn_36_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_9:s -> fn_36_basic_block_15:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_36_basic_block_9:s -> fn_36_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_15:s -> fn_36_basic_block_10:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_10:s -> fn_36_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_36_basic_block_11:s -> fn_36_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_36_basic_block_0:s -> fn_36_basic_block_1:n [style="invis",constraint=true];
}
subgraph "spec_ungetc" {
	color="black";
	label="spec_ungetc";
	fn_37_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_37_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:10000 |\ \ \ \ 6:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ \ 2:\ r73:SI=di:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
|\ \ \ \ 4:\ r74:SI=si:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
|\ \ \ \ 5:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ \ 8:\ debug\ rc\ =\>\ 0\l\
|\ \ \ \ 9:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 10:\ pc=\{(flags:CCGC\<=0)?L16:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_37_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 104:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ 105:\ pc=L95\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_3 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ \ 95:\ L95:\l\
|\ \ \ 11:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 12:\ si:SI=r74:SI\l\
|\ \ \ 13:\ di:DI=`*.LC25'\l\
|\ \ \ 14:\ ax:QI=0\l\
|\ \ \ 15:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 96:\ pc=L16\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:10000 |\ \ \ 16:\ L16:\l\
|\ \ \ 17:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 18:\ flags:CCGC=cmp(r74:SI,0x3)\l\
|\ \ \ 19:\ pc=\{(flags:CCGC\<=0)?L29:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_37_basic_block_6 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:10000 |\ \ \ 29:\ L29:\l\
|\ \ \ 30:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 31:\ r75:DI=sign_extend(r74:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r74:SI\l\
|\ \ \ 33:\ \{r77:DI=r75:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r75:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 34:\ \{r78:DI=r77:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 35:\ r60:SI=[r78:DI+0x8]\l\
|\ \ \ 36:\ flags:CCNO=cmp(r60:SI,0)\l\
|\ \ \ 37:\ pc=\{(flags:CCNO\>0)?L47:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCNO\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_37_basic_block_8 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:10000 |\ \ \ 47:\ L47:\l\
|\ \ \ 48:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ \ 53:\ r61:DI=[r77:DI+const(`spec_fd'+0x10)]\l\
\ \ \ \ \ \ REG_DEAD\ r77:DI\l\
|\ \ \ 54:\ \{r62:SI=r60:SI-0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r60:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 59:\ [r78:DI+0x8]=r62:SI\l\
\ \ \ \ \ \ REG_DEAD\ r78:DI\l\
|\ \ \ 60:\ r87:DI=sign_extend(r62:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r62:SI\l\
|\ \ \ 61:\ r65:QI=[r61:DI+r87:DI]\l\
\ \ \ \ \ \ REG_DEAD\ r87:DI\l\
\ \ \ \ \ \ REG_DEAD\ r61:DI\l\
|\ \ \ 62:\ flags:CCZ=cmp(r65:QI,r73:SI#0)\l\
\ \ \ \ \ \ REG_DEAD\ r73:SI\l\
|\ \ \ 63:\ pc=\{(flags:CCZ==0)?L73:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_37_basic_block_10 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:10000 |\ \ \ 73:\ L73:\l\
|\ \ \ 74:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ \ 75:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 76:\ pc=\{(flags:CCGC\<=0)?L82:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_37_basic_block_17 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 120:\ NOTE_INSN_BASIC_BLOCK\ 17\l\
|\ \ 121:\ pc=L101\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_11 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 101:\ L101:\l\
|\ \ \ 77:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ \ 78:\ si:SI=0\l\
|\ \ \ 79:\ di:DI=`*.LC18'\l\
|\ \ \ 80:\ ax:QI=0\l\
|\ \ \ 81:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 102:\ pc=L82\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:150408 FREQ:10000 |\ \ \ 82:\ L82:\l\
|\ \ \ 83:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ \ 84:\ r88:SI=zero_extend(r65:QI)\l\
\ \ \ \ \ \ REG_DEAD\ r65:QI\l\
|\ \ \ 89:\ ax:SI=r88:SI\l\
\ \ \ \ \ \ REG_DEAD\ r88:SI\l\
|\ \ \ 92:\ use\ ax:SI\l\
}"];

	fn_37_basic_block_16 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 116:\ NOTE_INSN_BASIC_BLOCK\ 16\l\
|\ \ 117:\ pc=L100\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_9 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 100:\ L100:\l\
|\ \ \ 64:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ \ 65:\ cx:DI=[`stderr']\l\
|\ \ \ 66:\ dx:DI=0x47\l\
|\ \ \ 67:\ si:DI=0x1\l\
|\ \ \ 68:\ di:DI=`*.LC27'\l\
|\ \ \ 69:\ ax:DI=call\ [`fwrite']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ cx:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:DI\l\
|\ \ \ 70:\ di:SI=0\l\
|\ \ \ 71:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_37_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 112:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ 113:\ pc=L99\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_7 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ \ 99:\ L99:\l\
|\ \ \ 38:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ 39:\ dx:SI=r60:SI\l\
\ \ \ \ \ \ REG_DEAD\ r60:SI\l\
|\ \ \ 40:\ si:DI=`*.LC26'\l\
|\ \ \ 41:\ di:DI=[`stderr']\l\
|\ \ \ 42:\ ax:QI=0\l\
|\ \ \ 43:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 44:\ di:SI=0\l\
|\ \ \ 45:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_37_basic_block_14 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 108:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ 109:\ pc=L98\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_37_basic_block_5 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ \ 98:\ L98:\l\
|\ \ \ 20:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 21:\ dx:SI=r74:SI\l\
\ \ \ \ \ \ REG_DEAD\ r74:SI\l\
|\ \ \ 22:\ si:DI=`*.LC16'\l\
|\ \ \ 23:\ di:DI=[`stderr']\l\
|\ \ \ 24:\ ax:QI=0\l\
|\ \ \ 25:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 26:\ di:SI=0\l\
|\ \ \ 27:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_37_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_37_basic_block_0:s -> fn_37_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_37_basic_block_2:s -> fn_37_basic_block_13:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_37_basic_block_2:s -> fn_37_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_13:s -> fn_37_basic_block_3:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_3:s -> fn_37_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_4:s -> fn_37_basic_block_14:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_37_basic_block_4:s -> fn_37_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_14:s -> fn_37_basic_block_5:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_6:s -> fn_37_basic_block_15:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_37_basic_block_6:s -> fn_37_basic_block_8:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_15:s -> fn_37_basic_block_7:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_8:s -> fn_37_basic_block_16:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_37_basic_block_8:s -> fn_37_basic_block_10:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_16:s -> fn_37_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_10:s -> fn_37_basic_block_17:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_37_basic_block_10:s -> fn_37_basic_block_12:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_17:s -> fn_37_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_11:s -> fn_37_basic_block_12:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_37_basic_block_12:s -> fn_37_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_37_basic_block_0:s -> fn_37_basic_block_1:n [style="invis",constraint=true];
}
subgraph "spec_reset" {
	color="black";
	label="spec_reset";
	fn_39_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_39_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label="{COUNT:18 FREQ:10000 |\ \ \ \ 4:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ \ 2:\ r63:SI=di:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
|\ \ \ \ 3:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ \ 6:\ r64:DI=sign_extend(r63:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r63:SI\l\
|\ \ \ \ 8:\ \{r66:DI=r64:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r64:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ \ 9:\ \{r67:DI=r66:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 10:\ NOTE_INSN_DELETED\l\
|\ \ \ 11:\ r68:DI=sign_extend([r67:DI+0x4])\l\
|\ \ \ 16:\ r74:DI=[r66:DI+const(`spec_fd'+0x10)]\l\
\ \ \ \ \ \ REG_DEAD\ r66:DI\l\
|\ \ \ 19:\ dx:DI=r68:DI\l\
\ \ \ \ \ \ REG_DEAD\ r68:DI\l\
|\ \ \ 20:\ si:SI=0\l\
|\ \ \ 21:\ di:DI=r74:DI\l\
\ \ \ \ \ \ REG_DEAD\ r74:DI\l\
|\ \ \ 22:\ ax:DI=call\ [`memset']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_DEAD\ dx:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:DI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 29:\ [r67:DI+0x4]=0\l\
|\ \ \ 34:\ [r67:DI+0x8]=0\l\
\ \ \ \ \ \ REG_DEAD\ r67:DI\l\
|\ \ \ 39:\ ax:SI=0\l\
|\ \ \ 42:\ use\ ax:SI\l\
}"];

	fn_39_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_39_basic_block_0:s -> fn_39_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_39_basic_block_2:s -> fn_39_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_39_basic_block_0:s -> fn_39_basic_block_1:n [style="invis",constraint=true];
}
subgraph "spec_fwrite" {
	color="black";
	label="spec_fwrite";
	fn_41_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_41_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ \ 7:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ \ 2:\ r73:DI=di:DI\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
|\ \ \ \ 3:\ r74:SI=si:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
|\ \ \ \ 4:\ r75:SI=dx:SI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
|\ \ \ \ 5:\ r76:SI=cx:SI\l\
\ \ \ \ \ \ REG_DEAD\ cx:SI\l\
|\ \ \ \ 6:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ \ 9:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ \ 10:\ pc=\{(flags:CCGC\<=0)?L19:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_41_basic_block_23 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 203:\ NOTE_INSN_BASIC_BLOCK\ 23\l\
|\ \ 204:\ pc=L196\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_41_basic_block_3 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 196:\ L196:\l\
|\ \ \ 11:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 12:\ r8:SI=r76:SI\l\
|\ \ \ 13:\ cx:SI=r75:SI\l\
|\ \ \ 14:\ dx:SI=r74:SI\l\
|\ \ \ 15:\ si:DI=r73:DI\l\
|\ \ \ 16:\ di:DI=`*.LC34'\l\
|\ \ \ 17:\ ax:QI=0\l\
|\ \ \ 18:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ r8:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ cx:SI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 197:\ pc=L19\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_41_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ 19:\ L19:\l\
|\ \ \ 20:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 21:\ flags:CCGC=cmp(r76:SI,0x3)\l\
|\ \ \ 22:\ pc=\{(flags:CCGC\<=0)?L32:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_41_basic_block_6 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ 32:\ L32:\l\
|\ \ \ 33:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 34:\ \{r60:SI=r74:SI*r75:SI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r74:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 35:\ r131:DI=sign_extend(r76:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r76:SI\l\
|\ \ \ 37:\ \{r79:DI=r131:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 42:\ NOTE_INSN_DELETED\l\
|\ \ \ 43:\ NOTE_INSN_DELETED\l\
|\ \ \ 44:\ r85:DI=sign_extend([r79:DI+const(`spec_fd'+0x8)])\l\
|\ \ \ 45:\ NOTE_INSN_DELETED\l\
|\ \ \ 46:\ NOTE_INSN_DELETED\l\
|\ \ \ 47:\ r90:DI=sign_extend(r60:SI)\l\
|\ \ \ 49:\ \{r91:DI=[r79:DI+const(`spec_fd'+0x10)]+r85:DI;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
\ \ \ \ \ \ REG_DEAD\ r85:DI\l\
\ \ \ \ \ \ REG_DEAD\ r79:DI\l\
|\ \ \ 50:\ r92:DI=r73:DI\l\
\ \ \ \ \ \ REG_DEAD\ r73:DI\l\
|\ \ \ 51:\ flags:CC=cmp(r90:DI,0x8)\l\
|\ \ \ 52:\ pc=\{(ltu(flags:CC,0))?L73:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x7d0\l\
}"];

	fn_41_basic_block_7 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:60575 FREQ:8000 |\ \ 143:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ 53:\ NOTE_INSN_DELETED\l\
|\ \ \ 54:\ flags:CCZ=cmp(zero_extract(r91:DI,0x1,0),0)\l\
|\ \ \ 55:\ pc=\{(flags:CCZ==0)?L58:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2328\l\
}"];

	fn_41_basic_block_8 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:6057 FREQ:800 |\ \ 144:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ \ 56:\ \{[r91:DI]=[r92:DI];r91:DI=r91:DI+0x1;r92:DI=r92:DI+0x1;\}\l\
|\ \ \ 57:\ \{r90:DI=r90:DI-0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_41_basic_block_9 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:60575 FREQ:8000 |\ \ \ 58:\ L58:\l\
|\ \ 145:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ \ 59:\ NOTE_INSN_DELETED\l\
|\ \ \ 60:\ flags:CCZ=cmp(zero_extract(r91:DI,0x1,0x1),0)\l\
|\ \ \ 61:\ pc=\{(flags:CCZ==0)?L64:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2328\l\
}"];

	fn_41_basic_block_10 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:6057 FREQ:800 |\ \ 146:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ \ 62:\ \{[r91:DI]=[r92:DI];r91:DI=r91:DI+0x2;r92:DI=r92:DI+0x2;\}\l\
|\ \ \ 63:\ \{r90:DI=r90:DI-0x2;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_41_basic_block_11 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:60575 FREQ:8000 |\ \ \ 64:\ L64:\l\
|\ \ 147:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ \ 65:\ NOTE_INSN_DELETED\l\
|\ \ \ 66:\ flags:CCZ=cmp(zero_extract(r91:DI,0x1,0x2),0)\l\
|\ \ \ 67:\ pc=\{(flags:CCZ==0)?L70:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2328\l\
}"];

	fn_41_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:6057 FREQ:800 |\ \ 148:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ \ 68:\ \{[r91:DI]=[r92:DI];r91:DI=r91:DI+0x4;r92:DI=r92:DI+0x4;\}\l\
|\ \ \ 69:\ \{r90:DI=r90:DI-0x4;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_41_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:60575 FREQ:8000 |\ \ \ 70:\ L70:\l\
|\ \ 149:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ \ 71:\ \{r96:DI=r90:DI\ 0\>\>0x3;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 72:\ \{r132:DI=0;r91:DI=r96:DI\<\<0x3+r91:DI;r92:DI=r96:DI\<\<0x3+r92:DI;[r91:DI]=[r92:DI];use\ r96:DI;\}\l\
\ \ \ \ \ \ REG_DEAD\ r96:DI\l\
\ \ \ \ \ \ REG_UNUSED\ r132:DI\l\
}"];

	fn_41_basic_block_14 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ 73:\ L73:\l\
|\ \ 150:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ \ 74:\ r97:DI=0\l\
|\ \ \ 75:\ NOTE_INSN_DELETED\l\
|\ \ \ 76:\ flags:CCZ=cmp(zero_extract(r90:DI,0x1,0x2),0)\l\
|\ \ \ 77:\ pc=\{(flags:CCZ==0)?L82:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_41_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:37859 FREQ:5000 |\ \ 151:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ \ 78:\ r99:SI=[r92:DI]\l\
|\ \ \ 79:\ [r91:DI]=r99:SI\l\
\ \ \ \ \ \ REG_DEAD\ r99:SI\l\
|\ \ \ 81:\ r97:DI=0x4\l\
}"];

	fn_41_basic_block_16 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ 82:\ L82:\l\
|\ \ 152:\ NOTE_INSN_BASIC_BLOCK\ 16\l\
|\ \ \ 83:\ NOTE_INSN_DELETED\l\
|\ \ \ 84:\ flags:CCZ=cmp(zero_extract(r90:DI,0x1,0x1),0)\l\
|\ \ \ 85:\ pc=\{(flags:CCZ==0)?L90:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_41_basic_block_17 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:37859 FREQ:5000 |\ \ 153:\ NOTE_INSN_BASIC_BLOCK\ 17\l\
|\ \ \ 86:\ r102:HI=[r92:DI+r97:DI]\l\
|\ \ \ 87:\ [r91:DI+r97:DI]=r102:HI\l\
\ \ \ \ \ \ REG_DEAD\ r102:HI\l\
|\ \ \ 88:\ NOTE_INSN_DELETED\l\
|\ \ \ 89:\ \{r97:DI=r97:DI+0x2;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_41_basic_block_18 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ 90:\ L90:\l\
|\ \ 154:\ NOTE_INSN_BASIC_BLOCK\ 18\l\
|\ \ \ 91:\ NOTE_INSN_DELETED\l\
|\ \ \ 92:\ flags:CCZ=cmp(zero_extract(r90:DI,0x1,0),0)\l\
\ \ \ \ \ \ REG_DEAD\ r90:DI\l\
|\ \ \ 93:\ pc=\{(flags:CCZ==0)?L96:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1388\l\
}"];

	fn_41_basic_block_19 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:37859 FREQ:5000 |\ \ 155:\ NOTE_INSN_BASIC_BLOCK\ 19\l\
|\ \ \ 94:\ r105:QI=[r92:DI+r97:DI]\l\
\ \ \ \ \ \ REG_DEAD\ r92:DI\l\
|\ \ \ 95:\ [r91:DI+r97:DI]=r105:QI\l\
\ \ \ \ \ \ REG_DEAD\ r105:QI\l\
\ \ \ \ \ \ REG_DEAD\ r97:DI\l\
\ \ \ \ \ \ REG_DEAD\ r91:DI\l\
}"];

	fn_41_basic_block_20 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ \ 96:\ L96:\l\
|\ \ 156:\ NOTE_INSN_BASIC_BLOCK\ 20\l\
|\ \ \ 99:\ \{r108:DI=r131:DI*0x18;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r131:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 100:\ \{r109:DI=r108:DI+`spec_fd';clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r108:DI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 108:\ NOTE_INSN_DELETED\l\
|\ \ 109:\ NOTE_INSN_DELETED\l\
|\ \ 110:\ \{[r109:DI+0x4]=r60:SI+[r109:DI+0x4];clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 122:\ NOTE_INSN_DELETED\l\
|\ \ 123:\ NOTE_INSN_DELETED\l\
|\ \ 124:\ \{[r109:DI+0x8]=r60:SI+[r109:DI+0x8];clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r60:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
\ \ \ \ \ \ REG_DEAD\ r109:DI\l\
|\ \ 125:\ flags:CCGC=cmp([`dbglvl'],0x4)\l\
|\ \ 126:\ pc=\{(flags:CCGC\<=0)?L132:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_41_basic_block_25 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 211:\ NOTE_INSN_BASIC_BLOCK\ 25\l\
|\ \ 212:\ pc=L200\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_41_basic_block_21 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 200:\ L200:\l\
|\ \ 127:\ NOTE_INSN_BASIC_BLOCK\ 21\l\
|\ \ 128:\ si:SI=r75:SI\l\
|\ \ 129:\ di:DI=`*.LC18'\l\
|\ \ 130:\ ax:QI=0\l\
|\ \ 131:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 201:\ pc=L132\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_41_basic_block_22 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:75719 FREQ:10000 |\ \ 132:\ L132:\l\
|\ \ 133:\ NOTE_INSN_BASIC_BLOCK\ 22\l\
|\ \ 138:\ ax:SI=r75:SI\l\
\ \ \ \ \ \ REG_DEAD\ r75:SI\l\
|\ \ 141:\ use\ ax:SI\l\
}"];

	fn_41_basic_block_24 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 207:\ NOTE_INSN_BASIC_BLOCK\ 24\l\
|\ \ 208:\ pc=L199\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_41_basic_block_5 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 199:\ L199:\l\
|\ \ \ 23:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 24:\ dx:SI=r76:SI\l\
\ \ \ \ \ \ REG_DEAD\ r76:SI\l\
|\ \ \ 25:\ si:DI=`*.LC35'\l\
|\ \ \ 26:\ di:DI=[`stderr']\l\
|\ \ \ 27:\ ax:QI=0\l\
|\ \ \ 28:\ ax:SI=call\ [`fprintf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ \ 29:\ di:SI=0\l\
|\ \ \ 30:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_41_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_41_basic_block_0:s -> fn_41_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_2:s -> fn_41_basic_block_23:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_41_basic_block_2:s -> fn_41_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_23:s -> fn_41_basic_block_3:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_3:s -> fn_41_basic_block_4:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_4:s -> fn_41_basic_block_24:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_41_basic_block_4:s -> fn_41_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_24:s -> fn_41_basic_block_5:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_6:s -> fn_41_basic_block_14:n [style="solid,bold",color=black,weight=10,constraint=true, label="[20%]"];
	fn_41_basic_block_6:s -> fn_41_basic_block_7:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[80%]"];
	fn_41_basic_block_7:s -> fn_41_basic_block_9:n [style="solid,bold",color=black,weight=10,constraint=true, label="[90%]"];
	fn_41_basic_block_7:s -> fn_41_basic_block_8:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[10%]"];
	fn_41_basic_block_8:s -> fn_41_basic_block_9:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_9:s -> fn_41_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[90%]"];
	fn_41_basic_block_9:s -> fn_41_basic_block_10:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[10%]"];
	fn_41_basic_block_10:s -> fn_41_basic_block_11:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_11:s -> fn_41_basic_block_13:n [style="solid,bold",color=black,weight=10,constraint=true, label="[90%]"];
	fn_41_basic_block_11:s -> fn_41_basic_block_12:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[10%]"];
	fn_41_basic_block_12:s -> fn_41_basic_block_13:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_13:s -> fn_41_basic_block_14:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_14:s -> fn_41_basic_block_16:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_41_basic_block_14:s -> fn_41_basic_block_15:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_41_basic_block_15:s -> fn_41_basic_block_16:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_16:s -> fn_41_basic_block_18:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_41_basic_block_16:s -> fn_41_basic_block_17:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_41_basic_block_17:s -> fn_41_basic_block_18:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_18:s -> fn_41_basic_block_20:n [style="solid,bold",color=black,weight=10,constraint=true, label="[50%]"];
	fn_41_basic_block_18:s -> fn_41_basic_block_19:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[50%]"];
	fn_41_basic_block_19:s -> fn_41_basic_block_20:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_20:s -> fn_41_basic_block_25:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_41_basic_block_20:s -> fn_41_basic_block_22:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_25:s -> fn_41_basic_block_21:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_21:s -> fn_41_basic_block_22:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_41_basic_block_22:s -> fn_41_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_41_basic_block_0:s -> fn_41_basic_block_1:n [style="invis",constraint=true];
}
subgraph "main" {
	color="black";
	label="main";
	fn_43_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];

	fn_43_basic_block_2 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 47:\ NOTE_INSN_BASIC_BLOCK\ 2\l\
|\ \ \ 37:\ r136:SI=di:SI\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
|\ \ \ 38:\ r137:DI=si:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
|\ \ \ 39:\ NOTE_INSN_FUNCTION_BEG\l\
|\ \ \ 49:\ debug\ input_size\ =\>\ 0x40\l\
|\ \ \ 50:\ debug\ input_name\ =\>\ `*.LC39'\l\
|\ \ \ 51:\ [`seedi']=0xa\l\
|\ \ \ 52:\ flags:CCGC=cmp(r136:SI,0x1)\l\
|\ \ \ 53:\ pc=\{(flags:CCGC\<=0)?L306:pc\}\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_34 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 306:\ L306:\l\
|\ \ 305:\ NOTE_INSN_BASIC_BLOCK\ 34\l\
|\ \ \ 46:\ r98:DI=`*.LC39'\l\
}"];

	fn_43_basic_block_3 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 54:\ NOTE_INSN_BASIC_BLOCK\ 3\l\
|\ \ \ 55:\ r98:DI=[r137:DI+0x8]\l\
|\ \ \ 57:\ debug\ input_name\ =\>\ r98:DI\l\
|\ \ \ 58:\ flags:CCZ=cmp(r136:SI,0x2)\l\
|\ \ \ 59:\ pc=\{(flags:CCZ==0)?L293:pc\}\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_35 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 293:\ L293:\l\
|\ \ 294:\ NOTE_INSN_BASIC_BLOCK\ 35\l\
|\ \ 295:\ debug\ input_name\ =\>\ r98:DI\l\
|\ \ 296:\ debug\ input_size\ =\>\ 0x40\l\
|\ \ \ 41:\ r125:SI=0x40\l\
|\ \ \ 42:\ r96:SI=0x40\l\
|\ \ 324:\ pc=L83\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_43_basic_block_4 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 60:\ NOTE_INSN_BASIC_BLOCK\ 4\l\
|\ \ \ 61:\ debug\ __nptr\ =\>\ [r137:DI+0x10]\l\
|\ \ \ 62:\ r138:DI=[r137:DI+0x10]\l\
|\ \ \ 63:\ dx:SI=0xa\l\
|\ \ \ 64:\ si:DI=0\l\
|\ \ \ 65:\ di:DI=r138:DI\l\
\ \ \ \ \ \ REG_DEAD\ r138:DI\l\
|\ \ \ 66:\ ax:DI=call\ [`strtol']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 67:\ r124:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 68:\ r125:SI=r124:DI#0\l\
|\ \ \ 70:\ debug\ input_size\ =\>\ r124:DI#0\l\
|\ \ \ 40:\ r96:SI=r124:DI#0\l\
\ \ \ \ \ \ REG_DEAD\ r124:DI\l\
|\ \ \ 71:\ flags:CCZ=cmp(r136:SI,0x3)\l\
\ \ \ \ \ \ REG_DEAD\ r136:SI\l\
|\ \ \ 72:\ pc=\{(flags:CCZ==0)?L83:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_43_basic_block_37 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 326:\ NOTE_INSN_BASIC_BLOCK\ 37\l\
|\ \ 327:\ pc=L320\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_43_basic_block_5 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 320:\ L320:\l\
|\ \ \ 73:\ NOTE_INSN_BASIC_BLOCK\ 5\l\
|\ \ \ 74:\ debug\ __nptr\ =\>\ [r137:DI+0x18]\l\
|\ \ \ 75:\ r139:DI=[r137:DI+0x18]\l\
\ \ \ \ \ \ REG_DEAD\ r137:DI\l\
|\ \ \ 76:\ dx:SI=0xa\l\
|\ \ \ 77:\ si:DI=0\l\
|\ \ \ 78:\ di:DI=r139:DI\l\
\ \ \ \ \ \ REG_DEAD\ r139:DI\l\
|\ \ \ 79:\ ax:DI=call\ [`strtol']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 80:\ r126:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
|\ \ \ 81:\ r96:SI=r126:DI#0\l\
\ \ \ \ \ \ REG_DEAD\ r126:DI\l\
|\ \ \ 82:\ debug\ compressed_size\ =\>\ optimized\ away\l\
|\ \ 321:\ pc=L83\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_43_basic_block_6 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 83:\ L83:\l\
|\ \ \ 84:\ NOTE_INSN_BASIC_BLOCK\ 6\l\
|\ \ \ 85:\ debug\ compressed_size\ =\>\ r96:SI\l\
|\ \ \ 86:\ \{r101:SI=r125:SI\<\<0x14;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 88:\ [`spec_fd']=r101:SI\l\
|\ \ \ 90:\ \{r142:SI=r96:SI\<\<0x14;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_DEAD\ r96:SI\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 91:\ [const(`spec_fd'+0x18)]=r142:SI\l\
\ \ \ \ \ \ REG_DEAD\ r142:SI\l\
|\ \ \ 93:\ [const(`spec_fd'+0x30)]=r101:SI\l\
|\ \ \ 94:\ ax:QI=0\l\
|\ \ \ 95:\ ax:SI=call\ [`spec_init']\ argc:0\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ \ 96:\ flags:CCGC=cmp([`dbglvl'],0x2)\l\
|\ \ \ 97:\ pc=\{(flags:CCGC\<=0)?L101:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_7 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ \ 98:\ NOTE_INSN_BASIC_BLOCK\ 7\l\
|\ \ \ 99:\ di:DI=`*.LC40'\l\
|\ \ 100:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_8 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 101:\ L101:\l\
|\ \ 102:\ NOTE_INSN_BASIC_BLOCK\ 8\l\
|\ \ 103:\ dx:SI=r101:SI\l\
|\ \ 104:\ si:DI=r98:DI\l\
\ \ \ \ \ \ REG_DEAD\ r98:DI\l\
|\ \ 105:\ di:SI=0\l\
|\ \ 106:\ ax:SI=call\ [`spec_load']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:DI\l\
\ \ \ \ \ \ REG_DEAD\ dx:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ 107:\ flags:CCGC=cmp([`dbglvl'],0x3)\l\
|\ \ 108:\ pc=\{(flags:CCGC\<=0)?L115:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_9 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 109:\ NOTE_INSN_BASIC_BLOCK\ 9\l\
|\ \ 111:\ si:SI=[const(`spec_fd'+0x4)]\l\
|\ \ 112:\ di:DI=`*.LC41'\l\
|\ \ 113:\ ax:QI=0\l\
|\ \ 114:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_10 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 115:\ L115:\l\
|\ \ 116:\ NOTE_INSN_BASIC_BLOCK\ 10\l\
|\ \ 117:\ \{r145:SI=r125:SI\<\<0xa;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 118:\ r146:DI=sign_extend(r145:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r145:SI\l\
|\ \ 119:\ di:DI=r146:DI\l\
\ \ \ \ \ \ REG_DEAD\ r146:DI\l\
|\ \ 120:\ ax:DI=call\ [`malloc']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ 121:\ r108:DI=ax:DI\l\
\ \ \ \ \ \ REG_DEAD\ ax:DI\l\
\ \ \ \ \ \ REG_NOALIAS\ r147:DI\l\
|\ \ 123:\ debug\ validate_array\ =\>\ r108:DI\l\
|\ \ 124:\ flags:CCZ=cmp(r108:DI,0)\l\
|\ \ 125:\ pc=\{(flags:CCZ!=0)?L132:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_43_basic_block_12 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 132:\ L132:\l\
|\ \ 133:\ NOTE_INSN_BASIC_BLOCK\ 12\l\
|\ \ 134:\ debug\ i\ =\>\ 0\l\
|\ \ 136:\ r109:DI=[const(`spec_fd'+0x10)]\l\
|\ \ 137:\ r132:DI=r108:DI\l\
|\ \ \ 43:\ r131:DI=0\l\
}"];

	fn_43_basic_block_14 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:97001 FREQ:3333 |\ \ 146:\ L146:\l\
|\ \ 147:\ NOTE_INSN_BASIC_BLOCK\ 14\l\
|\ \ 148:\ debug\ i\ =\>\ optimized\ away\l\
|\ \ 150:\ flags:CCGC=cmp(r101:SI,r131:DI#0)\l\
|\ \ 151:\ pc=\{(flags:CCGC\>0)?L149:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_43_basic_block_15 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 152:\ NOTE_INSN_BASIC_BLOCK\ 15\l\
|\ \ 153:\ [`smallMode']=0\l\
|\ \ 154:\ [`verbosity']=0\l\
|\ \ 155:\ [`blockSize100k']=0x9\l\
|\ \ 156:\ [`workFactor']=0x1e\l\
|\ \ 158:\ debug\ level\ =\>\ 0x5\l\
|\ \ \ 44:\ r123:SI=0x5\l\
}"];

	fn_43_basic_block_16 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 162:\ L162:\l\
|\ \ 163:\ NOTE_INSN_BASIC_BLOCK\ 16\l\
|\ \ 164:\ debug\ level\ =\>\ r123:SI\l\
|\ \ 165:\ flags:CCGC=cmp([`dbglvl'],0x2)\l\
|\ \ 166:\ pc=\{(flags:CCGC\<=0)?L172:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_17 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 167:\ NOTE_INSN_BASIC_BLOCK\ 17\l\
|\ \ 168:\ si:SI=r123:SI\l\
|\ \ 169:\ di:DI=`*.LC43'\l\
|\ \ 170:\ ax:QI=0\l\
|\ \ 171:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_18 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 172:\ L172:\l\
|\ \ 173:\ NOTE_INSN_BASIC_BLOCK\ 18\l\
|\ \ 174:\ debug\ in\ =\>\ 0\l\
|\ \ 175:\ debug\ out\ =\>\ 0x1\l\
|\ \ 176:\ debug\ lev\ =\>\ r123:SI\l\
|\ \ 177:\ [`blockSize100k']=r123:SI\l\
|\ \ 178:\ si:SI=0x1\l\
|\ \ 179:\ di:SI=0\l\
|\ \ 180:\ call\ [`compressStream']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
|\ \ 181:\ flags:CCGC=cmp([`dbglvl'],0x3)\l\
|\ \ 182:\ pc=\{(flags:CCGC\<=0)?L189:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_19 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 183:\ NOTE_INSN_BASIC_BLOCK\ 19\l\
|\ \ 185:\ si:SI=[const(`spec_fd'+0x1c)]\l\
|\ \ 186:\ di:DI=`*.LC44'\l\
|\ \ 187:\ ax:QI=0\l\
|\ \ 188:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_20 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 189:\ L189:\l\
|\ \ 190:\ NOTE_INSN_BASIC_BLOCK\ 20\l\
|\ \ 191:\ di:SI=0\l\
|\ \ 192:\ ax:SI=call\ [`spec_reset']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ 193:\ debug\ fd\ =\>\ 0x1\l\
|\ \ 195:\ [const(`spec_fd'+0x20)]=0\l\
|\ \ 196:\ flags:CCGC=cmp([`dbglvl'],0x2)\l\
|\ \ 197:\ pc=\{(flags:CCGC\<=0)?L201:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_21 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 198:\ NOTE_INSN_BASIC_BLOCK\ 21\l\
|\ \ 199:\ di:DI=`*.LC45'\l\
|\ \ 200:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_22 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 201:\ L201:\l\
|\ \ 202:\ NOTE_INSN_BASIC_BLOCK\ 22\l\
|\ \ 203:\ debug\ in\ =\>\ 0x1\l\
|\ \ 204:\ debug\ out\ =\>\ 0\l\
|\ \ 205:\ debug\ lev\ =\>\ r123:SI\l\
|\ \ 206:\ [`blockSize100k']=0\l\
|\ \ 207:\ si:SI=0\l\
|\ \ 208:\ di:SI=0x1\l\
|\ \ 209:\ ax:QI=call\ [`uncompressStream']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:QI\l\
|\ \ 210:\ flags:CCGC=cmp([`dbglvl'],0x3)\l\
|\ \ 211:\ pc=\{(flags:CCGC\<=0)?L218:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_23 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 212:\ NOTE_INSN_BASIC_BLOCK\ 23\l\
|\ \ 214:\ si:SI=[const(`spec_fd'+0x4)]\l\
|\ \ 215:\ di:DI=`*.LC46'\l\
|\ \ 216:\ ax:QI=0\l\
|\ \ 217:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_24 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 218:\ L218:\l\
|\ \ 219:\ NOTE_INSN_BASIC_BLOCK\ 24\l\
|\ \ 221:\ debug\ i\ =\>\ 0\l\
|\ \ 222:\ flags:CCNO=cmp(r101:SI,0)\l\
|\ \ 223:\ pc=\{(flags:CCNO\<=0)?L265:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCNO\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_25 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 224:\ NOTE_INSN_BASIC_BLOCK\ 25\l\
|\ \ 226:\ r129:DI=[const(`spec_fd'+0x10)]\l\
|\ \ 227:\ NOTE_INSN_DELETED\l\
|\ \ 228:\ flags:CCZ=cmp([r108:DI],[r129:DI])\l\
|\ \ 229:\ pc=\{(flags:CCZ!=0)?L243:pc\}\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_28 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 253:\ NOTE_INSN_BASIC_BLOCK\ 28\l\
|\ \ 254:\ \{r97:DI=r108:DI+0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ \ 45:\ r127:DI=0\l\
}"];

	fn_43_basic_block_29 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:290994 FREQ:10000 |\ \ 255:\ L255:\l\
|\ \ 256:\ NOTE_INSN_BASIC_BLOCK\ 29\l\
|\ \ 258:\ debug\ D#8\ =\>\ optimized\ away\l\
|\ \ 260:\ debug\ i\ =\>\ D#8\l\
|\ \ 262:\ \{r155:SI=r127:DI#0+0x403;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 263:\ flags:CCGC=cmp(r101:SI,r155:SI)\l\
\ \ \ \ \ \ REG_DEAD\ r155:SI\l\
|\ \ 264:\ pc=\{(flags:CCGC\>0)?L261:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x2710\l\
}"];

	fn_43_basic_block_30 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 265:\ L265:\l\
|\ \ 266:\ NOTE_INSN_BASIC_BLOCK\ 30\l\
|\ \ 267:\ flags:CCGC=cmp([`dbglvl'],0x3)\l\
|\ \ 268:\ pc=\{(flags:CCGC\<=0)?L272:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCGC\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_31 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 269:\ NOTE_INSN_BASIC_BLOCK\ 31\l\
|\ \ 270:\ di:DI=`*.LC48'\l\
|\ \ 271:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_32 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:9 FREQ:0 |\ \ 272:\ L272:\l\
|\ \ 273:\ NOTE_INSN_BASIC_BLOCK\ 32\l\
|\ \ 274:\ di:SI=0x1\l\
|\ \ 275:\ ax:SI=call\ [`spec_reset']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
|\ \ 276:\ debug\ fd\ =\>\ 0\l\
|\ \ 278:\ [const(`spec_fd'+0x8)]=0\l\
|\ \ 279:\ \{r123:SI=r123:SI+0x2;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 281:\ debug\ level\ =\>\ r123:SI\l\
|\ \ 283:\ flags:CCZ=cmp(r123:SI,0xb)\l\
|\ \ 284:\ pc=\{(flags:CCZ!=0)?L162:pc\}\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0x1d4c\l\
}"];

	fn_43_basic_block_33 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:3 FREQ:0 |\ \ 285:\ NOTE_INSN_BASIC_BLOCK\ 33\l\
|\ \ 286:\ si:SI=r125:SI\l\
\ \ \ \ \ \ REG_DEAD\ r125:SI\l\
|\ \ 287:\ di:DI=`*.LC49'\l\
|\ \ 288:\ ax:QI=0\l\
|\ \ 289:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
}"];

	fn_43_basic_block_36 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:6 FREQ:0 |\ \ 311:\ NOTE_INSN_BASIC_BLOCK\ 36\l\
|\ \ 300:\ ax:SI=0\l\
|\ \ 303:\ use\ ax:SI\l\
}"];

	fn_43_basic_block_26 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:290985 FREQ:10000 |\ \ 261:\ L261:\l\
|\ \ 233:\ NOTE_INSN_BASIC_BLOCK\ 26\l\
|\ \ 234:\ r118:QI=[r97:DI]\l\
|\ \ 235:\ r121:QI=[r129:DI+r127:DI+0x403]\l\
|\ \ 236:\ \{r97:DI=r97:DI+0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 237:\ \{r127:DI=r127:DI+0x403;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 238:\ flags:CCZ=cmp(r118:QI,r121:QI)\l\
\ \ \ \ \ \ REG_DEAD\ r121:QI\l\
\ \ \ \ \ \ REG_DEAD\ r118:QI\l\
|\ \ 239:\ pc=\{(flags:CCZ!=0)?L243:pc\}\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
\ \ \ \ \ \ REG_DEAD\ flags:CCZ\l\
\ \ \ \ \ \ REG_BR_PROB\ 0\l\
}"];

	fn_43_basic_block_27 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 243:\ L243:\l\
|\ \ 244:\ NOTE_INSN_BASIC_BLOCK\ 27\l\
|\ \ 245:\ si:SI=r125:SI\l\
\ \ \ \ \ \ REG_DEAD\ r125:SI\l\
|\ \ 246:\ di:DI=`*.LC47'\l\
|\ \ 247:\ ax:QI=0\l\
|\ \ 248:\ ax:SI=call\ [`printf']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_DEAD\ si:SI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 249:\ di:SI=0\l\
|\ \ 250:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_43_basic_block_13 [shape=record,style=filled,fillcolor=lightpink,label="{COUNT:96998 FREQ:3333 |\ \ 149:\ L149:\l\
|\ \ 140:\ NOTE_INSN_BASIC_BLOCK\ 13\l\
|\ \ 141:\ r149:QI=[r109:DI+r131:DI]\l\
|\ \ 142:\ [r132:DI]=r149:QI\l\
\ \ \ \ \ \ REG_DEAD\ r149:QI\l\
|\ \ 143:\ debug\ i\ =\>\ optimized\ away\l\
|\ \ 144:\ \{r131:DI=r131:DI+0x403;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
|\ \ 145:\ \{r132:DI=r132:DI+0x1;clobber\ flags:CC;\}\l\
\ \ \ \ \ \ REG_UNUSED\ flags:CC\l\
}"];

	fn_43_basic_block_38 [shape=record,style=filled,fillcolor=lightpink,label="{ FREQ:0 |\ \ 330:\ NOTE_INSN_BASIC_BLOCK\ 38\l\
|\ \ 331:\ pc=L323\l\
\ \ \ \ \ \ REG_CROSSING_JUMP\ (nil)\l\
}"];

	fn_43_basic_block_11 [shape=record,style=filled,fillcolor=lightblue,label="{ FREQ:0 |\ \ 323:\ L323:\l\
|\ \ 126:\ NOTE_INSN_BASIC_BLOCK\ 11\l\
|\ \ 127:\ di:DI=`*.LC42'\l\
|\ \ 128:\ ax:SI=call\ [`puts']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:DI\l\
\ \ \ \ \ \ REG_UNUSED\ ax:SI\l\
|\ \ 129:\ di:SI=0\l\
|\ \ 130:\ call\ [`exit']\ argc:0\l\
\ \ \ \ \ \ REG_DEAD\ di:SI\l\
\ \ \ \ \ \ REG_NORETURN\ 0\l\
\ \ \ \ \ \ REG_EH_REGION\ 0\l\
}"];

	fn_43_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];

	fn_43_basic_block_0:s -> fn_43_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_2:s -> fn_43_basic_block_3:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_2:s -> fn_43_basic_block_34:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_3:s -> fn_43_basic_block_4:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_3:s -> fn_43_basic_block_35:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_4:s -> fn_43_basic_block_37:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_43_basic_block_4:s -> fn_43_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_37:s -> fn_43_basic_block_5:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_5:s -> fn_43_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_6:s -> fn_43_basic_block_7:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_6:s -> fn_43_basic_block_8:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_7:s -> fn_43_basic_block_8:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_8:s -> fn_43_basic_block_9:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_8:s -> fn_43_basic_block_10:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_9:s -> fn_43_basic_block_10:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_10:s -> fn_43_basic_block_38:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_43_basic_block_10:s -> fn_43_basic_block_12:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_38:s -> fn_43_basic_block_11:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_12:s -> fn_43_basic_block_14:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_13:s -> fn_43_basic_block_14:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[100%]"];
	fn_43_basic_block_14:s -> fn_43_basic_block_13:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_14:s -> fn_43_basic_block_15:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_43_basic_block_15:s -> fn_43_basic_block_16:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_16:s -> fn_43_basic_block_17:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_16:s -> fn_43_basic_block_18:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_17:s -> fn_43_basic_block_18:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_18:s -> fn_43_basic_block_19:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_18:s -> fn_43_basic_block_20:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_19:s -> fn_43_basic_block_20:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_20:s -> fn_43_basic_block_21:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_20:s -> fn_43_basic_block_22:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_21:s -> fn_43_basic_block_22:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_22:s -> fn_43_basic_block_23:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_22:s -> fn_43_basic_block_24:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_23:s -> fn_43_basic_block_24:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_24:s -> fn_43_basic_block_25:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_24:s -> fn_43_basic_block_30:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_25:s -> fn_43_basic_block_27:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_25:s -> fn_43_basic_block_28:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_26:s -> fn_43_basic_block_27:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_26:s -> fn_43_basic_block_29:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[100%]"];
	fn_43_basic_block_28:s -> fn_43_basic_block_29:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_29:s -> fn_43_basic_block_26:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_29:s -> fn_43_basic_block_30:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
	fn_43_basic_block_30:s -> fn_43_basic_block_31:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_30:s -> fn_43_basic_block_32:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
	fn_43_basic_block_31:s -> fn_43_basic_block_32:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_32:s -> fn_43_basic_block_16:n [style="dotted,bold",color=blue,weight=10,constraint=false, label="[75%]"];
	fn_43_basic_block_32:s -> fn_43_basic_block_33:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[25%]"];
	fn_43_basic_block_33:s -> fn_43_basic_block_36:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_34:s -> fn_43_basic_block_35:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_35:s -> fn_43_basic_block_6:n [style="solid,bold",color=black,weight=10,constraint=true, label="[100%]"];
	fn_43_basic_block_36:s -> fn_43_basic_block_1:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[100%]"];
	fn_43_basic_block_0:s -> fn_43_basic_block_1:n [style="invis",constraint=true];
}
}

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11 14:39                   ` Teresa Johnson
@ 2013-05-11 15:03                     ` Steven Bosscher
  2013-05-12 14:37                       ` Teresa Johnson
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Bosscher @ 2013-05-11 15:03 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

On Sat, May 11, 2013 at 4:38 PM, Teresa Johnson <tejohnson@google.com> wrote:
>   /* If we are partitioning hot/cold basic blocks, we don't want to
>      mess up unconditional or indirect jumps that cross between hot
>      and cold sections.
>
>      Basic block partitioning may result in some jumps that appear to
>      be optimizable (or blocks that appear to be mergeable), but which really
>      must be left untouched (they are required to make it safely across
>      partition boundaries).  See the comments at the top of
>      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
>
> And at least a bunch of these are when we are in cfglayout mode.
>
> But let me locate a reproducer so we can make sure it isn't due to
> some issue with my patch.

It sounds like the issue here is that the partitioning code insists on
having an explicit jump for a section switch even for single_succ_p
blocks, i.e. an unconditional jump, while normally in cfglayout mode
there are no unconditional jumps.

If that is the problem, then the proper solution is to not have the
explicit jump. Just forward the jump, set the EDGE_CROSSING flag on
the crossing edge, and add a REG_CROSSING_JUMP note on the branching
insn. (FWIW, these REG_CROSSING_JUMP notes are also an aberration --
we have the CFG edges with all the information in them! But oh
well...) If the edge cannot fall through when going out of cfglayout
mode after bbro (and crossing edges can't fall through) then an extra
"forwarder jump" block will be inserted automatically. This may
require some hacking to invert branch conditions for branching insns
such that the branch goes to the other section and that the fallthru
path stays in the same section, but that's something that should be
relatively easy to do in cfgcleanup.
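
As a rough illustration of the branch inversion idea, here is a toy C
sketch. All types and names (toy_block, toy_cond_branch,
toy_fixup_cond_branch) are hypothetical and are not GCC's real
basic_block/edge structures; crossing_note merely stands in for a
REG_CROSSING_JUMP note.

```c
#include <stdbool.h>

/* Toy model only: hypothetical types, not GCC's basic_block or edge.  */
enum toy_partition { TOY_HOT, TOY_COLD };

struct toy_block {
  enum toy_partition part;
};

struct toy_cond_branch {
  const struct toy_block *src;       /* block ending in the conditional branch */
  const struct toy_block *taken;     /* destination when the condition holds */
  const struct toy_block *fallthru;  /* destination otherwise */
  bool inverted;                     /* condition has been inverted */
  bool crossing_note;                /* stands in for REG_CROSSING_JUMP */
};

static bool
toy_crosses (const struct toy_block *a, const struct toy_block *b)
{
  return a->part != b->part;
}

/* Try to keep the fallthru path in SRC's section by inverting the
   condition when only the fallthru edge crosses.  Returns true if the
   fallthru edge stays in the same section afterwards; false means both
   successors cross, so a forwarder jump block would still be needed.  */
static bool
toy_fixup_cond_branch (struct toy_cond_branch *br)
{
  if (toy_crosses (br->src, br->fallthru)
      && !toy_crosses (br->src, br->taken))
    {
      const struct toy_block *tmp = br->taken;
      br->taken = br->fallthru;
      br->fallthru = tmp;
      br->inverted = !br->inverted;
    }
  /* Mark the branch insn if its taken edge goes to the other section.  */
  br->crossing_note = toy_crosses (br->src, br->taken);
  return !toy_crosses (br->src, br->fallthru);
}
```

In this model a hot block whose fallthru successor is cold and whose
taken successor is hot ends up with the branch going to the cold block
and the fallthru staying hot.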

Ciao!
Steven

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11 11:19                 ` Jan Hubicka
  2013-05-11 11:51                   ` Steven Bosscher
@ 2013-05-11 14:43                   ` Teresa Johnson
  1 sibling, 0 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-05-11 14:43 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Steven Bosscher, Xinliang David Li, Diego Novillo, gcc-patches

On Sat, May 11, 2013 at 4:19 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >
>> > BTW2: We badly need to figure out a way to create test cases for FDO... :-(
>>
>> Yes. I had tried testing awhile back with the gcc regression tests and
>> enabling -freorder-blocks-and-partition, but none of the issues I was
>> having with larger benchmarks fired. I think there just aren't enough
>> (or large/complex enough?) FDO tests in gcc.dg/tree-prof and elsewhere
>> to trigger this. I was able to trigger many of the issues when
>> compiling cpu2006 with fdo and partitioning enabled, but it will take
>> some work to cut them down.
>
> Yep, we do not have that many testcases in tree-prof and modifying e.g.
> gcc.c-torture/execute to run with -fprofile-generate/-fprofile-use by default
> is probably a bit of overkill. Having an easy way to do that optionally may be
> interesting though.
>
> Once -freorder-blocks-and-partition actually works, we should enable it by
> default with -fprofile-generate (I recall I was trying to do that once, but
> I am not sure what was outcome back then and why it did not happen).
> That should get it tested with profiledbootstrap, too.

I don't remember if I tried enabling splitting with a
profiledbootstrap - that sounds like a great stress test. I can try
enabling it locally with my patch and running that as a test.

Teresa

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11 11:45                 ` Steven Bosscher
@ 2013-05-11 14:39                   ` Teresa Johnson
  2013-05-11 15:03                     ` Steven Bosscher
  0 siblings, 1 reply; 35+ messages in thread
From: Teresa Johnson @ 2013-05-11 14:39 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

On Sat, May 11, 2013 at 4:44 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sat, May 11, 2013 at 5:21 AM, Teresa Johnson wrote:
>> Here there was a block that happened to be laid out at the very start
>> of the cold section (it was jumped to from elsewhere, not reached via
>> fall through from its layout predecessor). Thus it was preceded by a
>> switch section note, which was put into the bb header when we entered
>> cfglayout mode for compgoto. The note ended up in the middle of the
>> block when we did some block combining with its cfg predecessor (not
>> the block that preceded it in the layout chain, which was the last hot
>> block in the reorder chain).
>
> Yikes. So we also need an INSN_NOTE verifier to make sure that kind of
> nonsense doesn't happen? More reason to make the note go away :-)
>
> (See my recent patch that added note_outside_basic_block_p for other
> examples of NOTEs ending up where they don't belong. It's one of those
> most-people-want-it-but-nobody-does-it tasks: To create some
> framework for NOTE_INSN_* notes that is safe and sane...).
>
>
>>> Please make the note go away completely before pass_free_cfg, and earn
>>> greater admiration than Zeus. The note always was wrong, and now
>>> you've shown it's also a problem.
>
> Many, many thanks!

Some encouraging news: I removed the earlier calls to
insert_section_boundary_note and added a call from
rest_of_pass_free_cfg, and I also expanded that routine to do similar
sanity checking as in verify_hot_cold_block_grouping to ensure we only
switch sections once. And cpu2006 built fine with profile feedback and
splitting enabled. I am running them now, but all the previous
problems I chased down were compile-time not run-time, thankfully, so
that is a very good sign. I'll do some more testing with internal
benchmarks.
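
For illustration, the "only switch sections once" property can be
modelled on a plain array of partition tags in final layout order. This
is a toy sketch under that assumption, not the code of
verify_hot_cold_block_grouping; the layout_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one partition tag per basic block, in final layout order.  */
enum layout_partition { LAYOUT_HOT, LAYOUT_COLD };

/* Count the positions where the partition differs from the previous
   block, i.e. where a section switch would be emitted.  */
static size_t
layout_count_section_switches (const enum layout_partition *layout, size_t n)
{
  size_t switches = 0;
  for (size_t i = 1; i < n; i++)
    if (layout[i] != layout[i - 1])
      switches++;
  return switches;
}

/* Hot and cold blocks are properly grouped if the layout switches
   sections at most once.  */
static bool
layout_grouping_ok (const enum layout_partition *layout, size_t n)
{
  return layout_count_section_switches (layout, n) <= 1;
}
```

A layout such as hot, hot, cold, cold passes this check, while hot,
cold, hot fails it because it switches sections twice.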

>
>>> Right, I think it's good that you've centralized this code. But it
>>> seems to me that we should make sure that the hot blocks and cold
>>> blocks are still grouped (i.e. there is only one basic block B such
>>> that BB_PARTITION(B) != BB_PARTITION(B->next_bb)). That is something
>>> your code doesn't handle, AFAICT. It's just one thing that's difficult
>>> to maintain, probably there are others. It's also something the
>>> partitioning verifier should check, i.e. that the basic blocks in hot
>>> and cold partitions are properly grouped.
>>
>> Actually, there is already code that verifies this at the end of bbro
>> (verify_hot_cold_block_grouping()). Before bb reordering it doesn't
>> make sense to check this.
>>
>> And AFAICT, after bbro the only place we go into and out of cfglayout
>> mode is compgoto, which duplicates blocks along edges only if they
>> don't cross a partition boundary, and lays out the duplicated block
>> adjacent to the original. I haven't seen any places where this is
>> violated, probably as a result. But it wouldn't be a bad idea to call
>> verify_hot_cold_block_grouping again during the flow verification code
>> once we detect/flag that bbro is complete.
>
> I'd just make it part of verify_flow_info and only let it run if your
> new flag on crtl is set.

Yes, although the new flag is not sufficient since it just indicates
that there was partitioning done and not that we are done with bb
reordering. I don't see a way to detect that we are done with that
pass, so it looks like I will need to add another flag to the rtl_data
struct.
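
A minimal sketch of the two-flag gating, assuming a second flag is
added: has_bb_partition mirrors the flag from the patch, while
bb_reorder_complete and the toy_rtl_data struct are hypothetical names
used only for this example.

```c
#include <stdbool.h>

/* Toy stand-in for the relevant bits of the rtl_data struct.  */
struct toy_rtl_data {
  bool has_bb_partition;     /* partitioning was done for this function */
  bool bb_reorder_complete;  /* hypothetical: bb reordering has finished */
};

/* The grouping check only makes sense once blocks have been partitioned
   and the final layout has been committed.  */
static bool
toy_should_verify_grouping (const struct toy_rtl_data *d)
{
  return d->has_bb_partition && d->bb_reorder_complete;
}
```

With only has_bb_partition set, the verifier would be skipped, which
matches the point that grouping cannot be checked before bb reordering.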

>
> You shouldn't count on compgoto being the last and only pass to go
> into/out-of cfglayout mode. The compiler changes all the time, passes
> get shuffled around from time to time, and target-specific passes can
> be inserted anywhere. I have patches in my queue to lengthen the life
> of the CFG beyond pass_machine_reorg (many machine reorgs are CFG
> aware already anyway) and they should be allowed to go into/out-of
> cfglayout mode also.

Yep, I agree it will be better to add the checking code to catch issues early.

>
>
>
>>>> +  /* Invoke the cleanup again once we are out of cfg layout mode
>>>> +     after committing the final bb layout above. This enables removal
>>>> +     of forwarding blocks across the hot/cold section boundary when
>>>> +     splitting is enabled that were necessary while in cfg layout
>>>> +     mode.  */
>>>> +  if (crtl->has_bb_partition)
>>>> +    cleanup_cfg (CLEANUP_EXPENSIVE);
>>>
>>> There shouldn't be any forwarder blocks in cfg layout mode. What did
>>> you need this for?
>>
>> This was a performance fix.
>>
>> There is code in try_forward_edges, called from try_optimize_cfg that
>> we call from cleanup_cfg, typically in cfglayout mode, that will not
>> eliminate forwarding blocks when either the given block "b" or its
>> successor block ends with a region-crossing jump. The comments
>> indicate that these need to be left in to ensure we don't fall through
>> across section boundaries, which makes sense. The issue here was that
>> I saw the blocks in the hot partition ending in conditional branches,
>> which had a fall-through to another hot section block, and the
>> conditional jump led to yet another block in the hot section that
>> simply contained an unconditional jump to a cold section block. So in
>> this case when try_forward_edges was called with the block with the
>> conditional branch, when we look at its successor (the forwarding
>> block), we can't eliminate it since it ends in a region crossing
>> branch. I guess the concern is that if the conditional branch sense
>> was reversed in cfglayout mode we would end up falling through to a
>> different region. But once we leave cfglayout mode that should not
>> occur. So I loosened up the checks on the successor block so that it
>> is ok if it ends in a region crossing branch when we are in cfgrtl
>> mode (and added this call). That way, these forwarding blocks are
>> eliminated and we are able to have a region crossing conditional jump
>> directly to the cold section block, without the intervening forwarding
>> block.
>
> This sounds a bit scary to me. After bbro there shouldn't be any
> changes to the basic block layout anymore. Do you have a test case for
> this problem? I'd like to understand why those forwarder blocks are
> really necessary. In cfglayout mode, it's supposed to be simpler to
> modify the CFG without the constraints of cfgrtl mode. You describe a
> scenario where cfglayout mode is more constrained than cfgrtl, and
> that'd be a bug IMHO.

Let me find a small reproducer. I was looking at this on some internal
code a couple weeks ago. From my understanding of cfg layout, I
thought this was needed to enable the flexibility to redirect
branches, etc in that mode (i.e. enables the simplicity you mention
above). There is a comment block that shows up multiple places in the
cfg optimization-related files:

  /* If we are partitioning hot/cold basic blocks, we don't want to
     mess up unconditional or indirect jumps that cross between hot
     and cold sections.

     Basic block partitioning may result in some jumps that appear to
     be optimizable (or blocks that appear to be mergeable), but which really
     must be left untouched (they are required to make it safely across
     partition boundaries).  See the comments at the top of
     bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */

And at least a bunch of these are when we are in cfglayout mode.

But let me locate a reproducer so we can make sure it isn't due to
some issue with my patch.
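
To make the relaxed forwarding check described above concrete, here is
a toy C sketch of the decision. It is only a model of the rule as
described in this thread, not the actual try_forward_edges code; the
fwd_* names and fields are hypothetical.

```c
#include <stdbool.h>

enum fwd_cfg_mode { FWD_CFGLAYOUT, FWD_CFGRTL };

/* Toy block summary: hypothetical fields, not GCC's basic_block.  */
struct fwd_block {
  bool is_forwarder;           /* contains only a jump to its successor */
  bool ends_in_crossing_jump;  /* last insn is a region-crossing jump */
};

/* Decide whether an edge from B may be redirected past forwarder SUCC.  */
static bool
fwd_may_forward_through (enum fwd_cfg_mode mode,
                         const struct fwd_block *b,
                         const struct fwd_block *succ)
{
  /* A region-crossing jump at the end of B is always left untouched.  */
  if (b->ends_in_crossing_jump)
    return false;
  if (!succ->is_forwarder)
    return false;
  /* In cfglayout mode the crossing forwarder must stay; in cfgrtl mode
     it may be bypassed so B jumps directly to the cold block.  */
  if (succ->ends_in_crossing_jump && mode == FWD_CFGLAYOUT)
    return false;
  return true;
}
```

In this model the same hot forwarder block ending in a crossing jump is
kept in cfglayout mode and becomes removable in cfgrtl mode.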

Thanks,
Teresa

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11 11:51                   ` Steven Bosscher
@ 2013-05-11 12:28                     ` Jan Hubicka
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Hubicka @ 2013-05-11 12:28 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Jan Hubicka, Teresa Johnson, Xinliang David Li, Diego Novillo,
	gcc-patches

> On Sat, May 11, 2013 at 1:19 PM, Jan Hubicka wrote:
> > Once -freorder-blocks-and-partition actually works, we should enable it by
> > default with -fprofile-generate (I recall I was trying to do that once, but
> > I am not sure what was outcome back then and why it did not happen).
> > That should get it tested with profiledbootstrap, too.
> 
> I don't think -freorder-blocks-and-partition ever was stable enough to
> work with profiledbootstrap. From day one, it was fragile and not well
> covered in regression testing. I hope the verifiers will make life a
> bit more bearable, and that the fixes from Teresa will allow us to
> enable -freorder-blocks-and-partition with -fprofile-generate.

Yep, I hoped to slowly chase the bugs away but always got scared by
implementation details....
> 
> Has anyone ever investigated the effects of
> -freorder-blocks-and-partition vs. the function splitting part of
> flag_partial_inlining (ipa-split.c)?

ipa-split really does splitting only in very special cases where partial
inlining seems possible and feasible. Plus it really splits the function into
two, making it impossible to share local vars. So it is not a replacement for
partitioning...

I was considering more aggressive outlining of cold parts in ipa-split,
but did not get it implemented yet.

Honza

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11 11:19                 ` Jan Hubicka
@ 2013-05-11 11:51                   ` Steven Bosscher
  2013-05-11 12:28                     ` Jan Hubicka
  2013-05-11 14:43                   ` Teresa Johnson
  1 sibling, 1 reply; 35+ messages in thread
From: Steven Bosscher @ 2013-05-11 11:51 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Teresa Johnson, Xinliang David Li, Diego Novillo, gcc-patches

On Sat, May 11, 2013 at 1:19 PM, Jan Hubicka wrote:
> Once -freorder-blocks-and-partition actually works, we should enable it by
> default with -fprofile-generate (I recall I was trying to do that once, but
> I am not sure what was outcome back then and why it did not happen).
> That should get it tested with profiledbootstrap, too.

I don't think -freorder-blocks-and-partition ever was stable enough to
work with profiledbootstrap. From day one, it was fragile and not well
covered in regression testing. I hope the verifiers will make life a
bit more bearable, and that the fixes from Teresa will allow us to
enable -freorder-blocks-and-partition with -fprofile-generate.

Has anyone ever investigated the effects of
-freorder-blocks-and-partition vs. the function splitting part of
flag_partial_inlining (ipa-split.c)?

Ciao!
Steven

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11  3:21               ` Teresa Johnson
  2013-05-11 11:19                 ` Jan Hubicka
@ 2013-05-11 11:45                 ` Steven Bosscher
  2013-05-11 14:39                   ` Teresa Johnson
  1 sibling, 1 reply; 35+ messages in thread
From: Steven Bosscher @ 2013-05-11 11:45 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

On Sat, May 11, 2013 at 5:21 AM, Teresa Johnson wrote:
> Here there was a block that happened to be laid out at the very start
> of the cold section (it was jumped to from elsewhere, not reached via
> fall through from its layout predecessor). Thus it was preceded by a
> switch section note, which was put into the bb header when we entered
> cfglayout mode for compgoto. The note ended up in the middle of the
> block when we did some block combining with its cfg predecessor (not
> the block that preceded it in the layout chain, which was the last hot
> block in the reorder chain).

Yikes. So we also need an INSN_NOTE verifier to make sure that kind of
nonsense doesn't happen? More reason to make the note go away :-)

(See my recent patch that added note_outside_basic_block_p for other
examples of NOTEs ending up where they don't belong. It's one of those
most-people-want-it-but-nobody-does-it tasks: To create some
framework for NOTE_INSN_* notes that is safe and sane...).


>> Please make the note go away completely before pass_free_cfg, and earn
>> greater admiration than Zeus. The note always was wrong, and now
>> you've shown it's also a problem.

Many, many thanks!

>> Right, I think it's good that you've centralized this code. But it
>> seems to me that we should make sure that the hot blocks and cold
>> blocks are still grouped (i.e. there is only one basic block B such
>> that BB_PARTITION(B) != BB_PARTITION(B->next_bb)). That is something
>> your code doesn't handle, AFAICT. It's just one thing that's difficult
>> to maintain, probably there are others. It's also something the
>> partitioning verifier should check, i.e. that the basic blocks in hot
>> and cold partitions are properly grouped.
>
> Actually, there is already code that verifies this at the end of bbro
> (verify_hot_cold_block_grouping()). Before bb reordering it doesn't
> make sense to check this.
>
> And AFAICT, after bbro the only place we go into and out of cfglayout
> mode is compgoto, which duplicates blocks along edges only if they
> don't cross a partition boundary, and lays out the duplicated block
> adjacent to the original. I haven't seen any places where this is
> violated, probably as a result. But it wouldn't be a bad idea to call
> verify_hot_cold_block_grouping again during the flow verification code
> once we detect/flag that bbro is complete.

I'd just make it part of verify_flow_info and only let it run if your
new flag on crtl is set.
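
For illustration only, here is a minimal sketch of such a gated check on a
toy block chain. Nothing below is GCC's real API: toy_bb, toy_fn and
hot_cold_grouping_ok are hypothetical names, and the has_bb_partition field
merely mirrors the crtl flag discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for basic_block and BB_PARTITION; not GCC's real types.  */
enum partition { PART_HOT, PART_COLD };

struct toy_bb
{
  enum partition part;
  struct toy_bb *next_bb;
};

/* Toy stand-in for the per-function state; has_bb_partition plays the
   role of the new crtl flag.  */
struct toy_fn
{
  bool has_bb_partition;
  struct toy_bb *first_bb;
};

/* Return true if the layout is acceptable: either the function was never
   partitioned, or the chain changes partition at most once.  */
bool
hot_cold_grouping_ok (const struct toy_fn *fn)
{
  if (!fn->has_bb_partition)
    return true;

  int switches = 0;
  for (const struct toy_bb *bb = fn->first_bb;
       bb != NULL && bb->next_bb != NULL;
       bb = bb->next_bb)
    if (bb->part != bb->next_bb->part)
      switches++;

  return switches <= 1;
}
```

A hot, cold, hot chain is rejected once the flag is set, and the same chain
passes unchecked while the flag is clear.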

You shouldn't count on compgoto being the last and only pass to go
into/out-of cfglayout mode. The compiler changes all the time, passes
get shuffled around from time to time, and target-specific passes can
be inserted anywhere. I have patches in my queue to lengthen the life
of the CFG beyond pass_machine_reorg (many machine reorgs are CFG
aware already anyway) and they should be allowed to go into/out-of
cfglayout mode also.



>>> +  /* Invoke the cleanup again once we are out of cfg layout mode
>>> +     after committing the final bb layout above. This enables removal
>>> +     of forwarding blocks across the hot/cold section boundary when
>>> +     splitting is enabled that were necessary while in cfg layout
>>> +     mode.  */
>>> +  if (crtl->has_bb_partition)
>>> +    cleanup_cfg (CLEANUP_EXPENSIVE);
>>
>> There shouldn't be any forwarder blocks in cfg layout mode. What did
>> you need this for?
>
> This was a performance fix.
>
> There is code in try_forward_edges, called from try_optimize_cfg that
> we call from cleanup_cfg, typically in cfglayout mode, that will not
> eliminate forwarding blocks when either the given block "b" or its
> successor block ends with a region-crossing jump. The comments
> indicate that these need to be left in to ensure we don't fall through
> across section boundaries, which makes sense. The issue here was that
> I saw the blocks in the hot partition ending in conditional branches,
> which had a fall-through to another hot section block, and the
> conditional jump led to yet another block in the hot section that
> simply contained an unconditional jump to a cold section block. So in
> this case when try_forward_edges was called with the block with the
> conditional branch, when we look at its successor (the forwarding
> block), we can't eliminate it since it ends in a region crossing
> branch. I guess the concern is that if the conditional branch sense
> was reversed in cfglayout mode we would end up falling through to a
> different region. But once we leave cfglayout mode that should not
> occur. So I loosened up the checks on the successor block so that it
> is ok if it ends in a region crossing branch when we are in cfgrtl
> mode (and added this call). That way, these forwarding blocks are
> eliminated and we are able to have a region crossing conditional jump
> directly to the cold section block, without the intervening forwarding
> block.

This sounds a bit scary to me. After bbro there shouldn't be any
changes to the basic block layout anymore. Do you have a test case for
this problem? I'd like to understand why those forwarder blocks are
really necessary. In cfglayout mode, it's supposed to be simpler to
modify the CFG without the constraints of cfgrtl mode. You describe a
scenario where cfglayout mode is more constrained than cfgrtl, and
that'd be a bug IMHO.

Ciao!
Steven


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-11  3:21               ` Teresa Johnson
@ 2013-05-11 11:19                 ` Jan Hubicka
  2013-05-11 11:51                   ` Steven Bosscher
  2013-05-11 14:43                   ` Teresa Johnson
  2013-05-11 11:45                 ` Steven Bosscher
  1 sibling, 2 replies; 35+ messages in thread
From: Jan Hubicka @ 2013-05-11 11:19 UTC (permalink / raw)
  To: Teresa Johnson
  Cc: Steven Bosscher, Xinliang David Li, Diego Novillo, gcc-patches,
	Jan Hubicka

> >
> > BTW2: We badly need to figure out a way to create test cases for FDO... :-(
> 
> Yes. I had tried testing awhile back with the gcc regression tests and
> enabling -freorder-blocks-and-partition, but none of the issues I was
> having with larger benchmarks fired. I think there just aren't enough
> (or large/complex enough?) FDO tests in gcc.dg/tree-prof and elsewhere
> to trigger this. I was able to trigger many of the issues when
> compiling cpu2006 with fdo and partitioning enabled, but it will take
> some work to cut them down.

Yep, we do not have that many testcases in tree-prof, and modifying e.g.
gcc.c-torture/execute to run with -fprofile-generate/-fprofile-use by default
is probably a bit of overkill. Having an easy way to do that optionally may be
interesting though.

Once -freorder-blocks-and-partition actually works, we should enable it by
default with -fprofile-generate (I recall I was trying to do that once, but
I am not sure what the outcome was back then or why it did not happen).
That should get it tested with profiledbootstrap, too.

Honza
> 
> Thanks,
> Teresa
> 
> >
> > Ciao!
> > Steven
> 
> 
> 
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10 21:01             ` Steven Bosscher
  2013-05-10 21:10               ` Jan Hubicka
@ 2013-05-11  3:21               ` Teresa Johnson
  2013-05-11 11:19                 ` Jan Hubicka
  2013-05-11 11:45                 ` Steven Bosscher
  1 sibling, 2 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-05-11  3:21 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

On Fri, May 10, 2013 at 2:00 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Fri, May 10, 2013 at 5:54 PM, Teresa Johnson wrote:
>> The main issue I had here, and why I made this change, is that we go
>> in and out of cfglayout mode several times after bb partitioning and
>> then out_of_cfglayout. The problem was that when we subsequently went
>> in and out of cfglayout mode, the switch text section notes that had
>> been inserted by bbpart were getting messed up (they were moved into
>> the bb header when we enter cfglayout mode and then not being
>> transferred to the correct location upon exit).
>>
>> I investigated trying
>> to keep those in sync, but it is really difficult/impossible to do
>> during cfglayout mode when they are in the header. So I simply strip
>> them out completely on entry to cfglayout mode, and if there were any
>> there on entry, this change ensures that they are restored in the
>> appropriate location upon exit. I'm not sure what is a good
>> alternative?
>
> The problem is that the note exists at all. I'd like to see the note
> go away completely eventually (when we have a CFG all the way through
> pass_final). As a stop-gap, we should not emit the note until
> pass_free_cfg. Up to that point the basic blocks tell what partition
> they're in, and all hot blocks and cold blocks should be in a sequence.
> So during pass_free_cfg, just walk the basic blocks chain until
> there's a partition change, and emit the note between those two
> blocks.

Ok, let me take a stab at that.

>
>
>> I triggered the same error in 445.gobmk once I applied the
>> thread_prologue_and_epilogue_insns fixes. This is an assert in the
>> dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXT_SECTIONS
>> note not being preceded by a barrier:
>
> The problem here is that fixup_reorder_chain should force a jump for
> basic blocks in one partition falling through to bb->next. That should
> be dealt with in can_fallthru, which should return false if
> BB_PARTITION (target) != BB_PARTITION (src).

That wasn't the issue in this case.

Here there was a block that happened to be laid out at the very start
of the cold section (it was jumped to from elsewhere, not reached via
fall through from its layout predecessor). Thus it was preceded by a
switch section note, which was put into the bb header when we entered
cfglayout mode for compgoto. The note ended up in the middle of the
block when we did some block combining with its cfg predecessor (not
the block that preceded it in the layout chain, which was the last hot
block in the reorder chain).

So the issue here wasn't making sure there is no fall-through across
section boundaries. There was already code to prevent this, but my
patch contains a few fixes to make sure that is always happening.

>
>
>> The correct solution in my opinion is to strip out the SWITCH note
>> every time we enter cfglayout mode after bbro, and then invoke
>> insert_section_boundary_note when leaving cfglayout (if one was found
>> on entry to that cfglayout mode) to reapply it.
>
> Please make the note go away completely before pass_free_cfg, and earn
> greater admiration than Zeus. The note always was wrong, and now
> you've shown it's also a problem.

Ok, I will try. =)

>
>
>>> * Fixup redirected edges that did not cross partitions before but
>>> apparently do after redirection. This is not supposed to happen in the
>>> first place, so fixing up any of this just papers over an error
>>> elsewhere in the compiler.
>>
>> Looking back through the earlier email exchanges from last fall I
>> found some discussion on this where I had found a couple places
>> causing this to happen. Two were in
>> thread_prologue_and_epilogue_insns: when we duplicated tail blocks or
>> created a new block to hold a simple return, and redirected some of
>> the edges to the new copy. In some cases this caused edges that were
>> previously region crossing to become non-region crossing, and in other
>> cases the reverse happened. I think there are a couple of other cases
>> mentioned in the emails too, but I would need to do some more digging
>> to find them all.
>>
>> There were a few places in the code that tried to detect this type of
>> issue, and fix them up, but they weren't consistent and there were
>> other places that had the same issue. I've centralized all that
>> handling in this patch (fixup_partitions and fixup_partition_crossing,
>> called from a few places where we redirect edges) so that it is more
>> consistent and comprehensive.
>
> Right, I think it's good that you've centralized this code. But it
> seems to me that we should make sure that the hot blocks and cold
> blocks are still grouped (i.e. there is only one basic block B such
> that BB_PARTITION(B) != BB_PARTITION(B->next_bb)). That is something
> your code doesn't handle, AFAICT. It's just one thing that's difficult
> to maintain, probably there are others. It's also something the
> partitioning verifier should check, i.e. that the basic blocks in hot
> and cold partitions are properly grouped.

Actually, there is already code that verifies this at the end of bbro
(verify_hot_cold_block_grouping()). Before bb reordering it doesn't
make sense to check this.

And AFAICT, after bbro the only place we go into and out of cfglayout
mode is compgoto, which duplicates blocks along edges only if they
don't cross a partition boundary, and lays out the duplicated block
adjacent to the original. I haven't seen any places where this is
violated, probably as a result. But it wouldn't be a bad idea to call
verify_hot_cold_block_grouping again during the flow verification code
once we detect/flag that bbro is complete.

>
>
> BTW1: I don't understand this comment:
>
>> +  /* Invoke the cleanup again once we are out of cfg layout mode
>> +     after committing the final bb layout above. This enables removal
>> +     of forwarding blocks across the hot/cold section boundary when
>> +     splitting is enabled that were necessary while in cfg layout
>> +     mode.  */
>> +  if (crtl->has_bb_partition)
>> +    cleanup_cfg (CLEANUP_EXPENSIVE);
>
> There shouldn't be any forwarder blocks in cfg layout mode. What did
> you need this for?

This was a performance fix.

There is code in try_forward_edges, called from try_optimize_cfg that
we call from cleanup_cfg, typically in cfglayout mode, that will not
eliminate forwarding blocks when either the given block "b" or its
successor block ends with a region-crossing jump. The comments
indicate that these need to be left in to ensure we don't fall through
across section boundaries, which makes sense. The issue here was that
I saw the blocks in the hot partition ending in conditional branches,
which had a fall-through to another hot section block, and the
conditional jump led to yet another block in the hot section that
simply contained an unconditional jump to a cold section block. So in
this case when try_forward_edges was called with the block with the
conditional branch, when we look at its successor (the forwarding
block), we can't eliminate it since it ends in a region crossing
branch. I guess the concern is that if the conditional branch sense
was reversed in cfglayout mode we would end up falling through to a
different region. But once we leave cfglayout mode that should not
occur. So I loosened up the checks on the successor block so that it
is ok if it ends in a region crossing branch when we are in cfgrtl
mode (and added this call). That way, these forwarding blocks are
eliminated and we are able to have a region crossing conditional jump
directly to the cold section block, without the intervening forwarding
block.
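
The relaxed check described above can be sketched as a small predicate.
This is only a model of the description in this mail, not the code in
try_forward_edges: cfg_mode, toy_bb, ends_in_crossing_jump and
may_forward_through are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two RTL CFG representations.  */
enum cfg_mode { MODE_CFGRTL, MODE_CFGLAYOUT };

/* Toy block: we only record whether it ends in a region-crossing jump.  */
struct toy_bb
{
  bool ends_in_crossing_jump;
};

/* Return true if an edge out of B may be forwarded past SUCC.
   B itself must never end in a region-crossing jump.  SUCC ending in one
   is tolerated only outside cfglayout mode, where the layout is final and
   branch senses can no longer be reversed.  */
bool
may_forward_through (enum cfg_mode mode,
                     const struct toy_bb *b,
                     const struct toy_bb *succ)
{
  if (b->ends_in_crossing_jump)
    return false;
  if (succ->ends_in_crossing_jump && mode == MODE_CFGLAYOUT)
    return false;
  return true;
}
```

With this model, a hot block whose successor is a forwarder ending in a
crossing jump is left alone in cfglayout mode but can be forwarded in cfgrtl
mode.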

>
> BTW2: We badly need to figure out a way to create test cases for FDO... :-(

Yes. I had tried testing awhile back with the gcc regression tests and
enabling -freorder-blocks-and-partition, but none of the issues I was
having with larger benchmarks fired. I think there just aren't enough
(or large/complex enough?) FDO tests in gcc.dg/tree-prof and elsewhere
to trigger this. I was able to trigger many of the issues when
compiling cpu2006 with fdo and partitioning enabled, but it will take
some work to cut them down.

Thanks,
Teresa

>
> Ciao!
> Steven



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10 21:10               ` Jan Hubicka
@ 2013-05-10 21:14                 ` Steven Bosscher
  0 siblings, 0 replies; 35+ messages in thread
From: Steven Bosscher @ 2013-05-10 21:14 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Teresa Johnson, Xinliang David Li, Diego Novillo, gcc-patches

On Fri, May 10, 2013 at 11:10 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> There shouldn't be any forwarder blocks in cfg layout mode. What did
>> you need this for?
>>
>> BTW2: We badly need to figure out a way to create test cases for FDO... :-(
>
> We have gcc.dg/tree-prof and friends.  What do you need to add?

Something to more easily reproduce bugs and derive test cases from them.

Ciao!
Steven


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10 21:01             ` Steven Bosscher
@ 2013-05-10 21:10               ` Jan Hubicka
  2013-05-10 21:14                 ` Steven Bosscher
  2013-05-11  3:21               ` Teresa Johnson
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Hubicka @ 2013-05-10 21:10 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Teresa Johnson, Xinliang David Li, Diego Novillo, gcc-patches,
	Jan Hubicka

> There shouldn't be any forwarder blocks in cfg layout mode. What did
> you need this for?
> 
> BTW2: We badly need to figure out a way to create test cases for FDO... :-(

We have gcc.dg/tree-prof and friends.  What do you need to add?

Honza


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10 15:54           ` Teresa Johnson
@ 2013-05-10 21:01             ` Steven Bosscher
  2013-05-10 21:10               ` Jan Hubicka
  2013-05-11  3:21               ` Teresa Johnson
  0 siblings, 2 replies; 35+ messages in thread
From: Steven Bosscher @ 2013-05-10 21:01 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

On Fri, May 10, 2013 at 5:54 PM, Teresa Johnson wrote:
> The main issue I had here, and why I made this change, is that we go
> in and out of cfglayout mode several times after bb partitioning and
> then out_of_cfglayout. The problem was that when we subsequently went
> in and out of cfglayout mode, the switch text section notes that had
> been inserted by bbpart were getting messed up (they were moved into
> the bb header when we enter cfglayout mode and then not being
> transferred to the correct location upon exit).
>
> I investigated trying
> to keep those in sync, but it is really difficult/impossible to do
> during cfglayout mode when they are in the header. So I simply strip
> them out completely on entry to cfglayout mode, and if there were any
> there on entry, this change ensures that they are restored in the
> appropriate location upon exit. I'm not sure what is a good
> alternative?

The problem is that the note exists at all. I'd like to see the note
go away completely eventually (when we have a CFG all the way through
pass_final). As a stop-gap, we should not emit the note until
pass_free_cfg. Up to that point the basic blocks tell what partition
they're in, and all hot blocks and cold blocks should be in a sequence.
So during pass_free_cfg, just walk the basic blocks chain until
there's a partition change, and emit the note between those two
blocks.
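
A minimal sketch of that walk, on a toy block chain rather than the real
basic_block structures (toy_bb and find_section_boundary are hypothetical
names, and the sketch only locates the boundary; it does not emit anything):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for basic_block and BB_PARTITION; not GCC's real types.  */
enum partition { PART_HOT, PART_COLD };

struct toy_bb
{
  enum partition part;
  struct toy_bb *next_bb;
};

/* Walk the chain starting at FIRST and return the last block before the
   first partition change, i.e. the block after which a section-switch
   note would go.  Return NULL if the whole chain is in one partition.  */
struct toy_bb *
find_section_boundary (struct toy_bb *first)
{
  for (struct toy_bb *bb = first;
       bb != NULL && bb->next_bb != NULL;
       bb = bb->next_bb)
    if (bb->part != bb->next_bb->part)
      return bb;
  return NULL;
}
```

Because the partition is read off the blocks themselves, nothing has to be
kept in sync while passes go into and out of cfglayout mode.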


> I triggered the same error in 445.gobmk once I applied the
> thread_prologue_and_epilogue_insns fixes. This is an assert in the
> dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXT_SECTIONS
> note not being preceded by a barrier:

The problem here is that fixup_reorder_chain should force a jump for
basic blocks in one partition falling through to bb->next. That should
be dealt with in can_fallthru, which should return false if
BB_PARTITION (target) != BB_PARTITION (src).
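
Roughly, and only as a toy model (the real can_fallthru has other
conditions that are omitted here; toy_bb and toy_can_fallthru are
hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for basic_block and BB_PARTITION; not GCC's real types.  */
enum partition { PART_HOT, PART_COLD };

struct toy_bb
{
  enum partition part;
  struct toy_bb *next_bb;
};

/* SRC may fall through to TARGET only if TARGET is laid out directly
   after SRC and both blocks live in the same partition.  */
bool
toy_can_fallthru (const struct toy_bb *src, const struct toy_bb *target)
{
  if (src->next_bb != target)
    return false;
  return src->part == target->part;
}
```

A caller that sees false for the last hot block and the first cold block
would then have to end the hot block with an explicit jump.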


> The correct solution in my opinion is to strip out the SWITCH note
> every time we enter cfglayout mode after bbro, and then invoke
> insert_section_boundary_note when leaving cfglayout (if one was found
> on entry to that cfglayout mode) to reapply it.

Please make the note go away completely before pass_free_cfg, and earn
greater admiration than Zeus. The note always was wrong, and now
you've shown it's also a problem.


>> * Fixup redirected edges that did not cross partitions before but
>> apparently do after redirection. This is not supposed to happen in the
>> first place, so fixing up any of this just papers over an error
>> elsewhere in the compiler.
>
> Looking back through the earlier email exchanges from last fall I
> found some discussion on this where I had found a couple places
> causing this to happen. Two were in
> thread_prologue_and_epilogue_insns: when we duplicated tail blocks or
> created a new block to hold a simple return, and redirected some of
> the edges to the new copy. In some cases this caused edges that were
> previously region crossing to become non-region crossing, and in other
> cases the reverse happened. I think there are a couple of other cases
> mentioned in the emails too, but I would need to do some more digging
> to find them all.
>
> There were a few places in the code that tried to detect this type of
> issue, and fix them up, but they weren't consistent and there were
> other places that had the same issue. I've centralized all that
> handling in this patch (fixup_partitions and fixup_partition_crossing,
> called from a few places where we redirect edges) so that it is more
> consistent and comprehensive.

Right, I think it's good that you've centralized this code. But it
seems to me that we should make sure that the hot blocks and cold
blocks are still grouped (i.e. there is only one basic block B such
that BB_PARTITION(B) != BB_PARTITION(B->next_bb)). That is something
your code doesn't handle, AFAICT. It's just one thing that's difficult
to maintain, probably there are others. It's also something the
partitioning verifier should check, i.e. that the basic blocks in hot
and cold partitions are properly grouped.


BTW1: I don't understand this comment:

> +  /* Invoke the cleanup again once we are out of cfg layout mode
> +     after committing the final bb layout above. This enables removal
> +     of forwarding blocks across the hot/cold section boundary when
> +     splitting is enabled that were necessary while in cfg layout
> +     mode.  */
> +  if (crtl->has_bb_partition)
> +    cleanup_cfg (CLEANUP_EXPENSIVE);

There shouldn't be any forwarder blocks in cfg layout mode. What did
you need this for?

BTW2: We badly need to figure out a way to create test cases for FDO... :-(

Ciao!
Steven


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10 12:07         ` Steven Bosscher
@ 2013-05-10 15:54           ` Teresa Johnson
  2013-05-10 21:01             ` Steven Bosscher
  0 siblings, 1 reply; 35+ messages in thread
From: Teresa Johnson @ 2013-05-10 15:54 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Xinliang David Li, Diego Novillo, gcc-patches, Jan Hubicka

On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Thu, May 9, 2013 at 11:42 PM, Diego Novillo wrote:
>> On 2013-05-08 01:13 , Teresa Johnson wrote:
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>
>>
>> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
>> like it makes sense to call this from gimple, for instance.
>
> It also makes no sense calling it in IR_RTL_CFGLAYOUT mode. Barriers
> are meaningless in cfglayout mode.

Actually, the change above ensures we can call this routine when *not*
in CFGLAYOUT mode. It was previously only called when we were in
CFGLAYOUT mode (while in bbpart).

(This relates to a conversation we had on an earlier version of the
patch back in the fall, where we discussed this issue of bbpart adding
barriers while in cfglayout mode:
http://gcc.gnu.org/ml/gcc-patches/2012-10/msg02996.html).

>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> +     insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>
> We're practically always in cfglayout mode, but oh well...

Right, not always, which is why I needed to make this change.


On Fri, May 10, 2013 at 5:05 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Fri, May 10, 2013 at 12:57 AM, Xinliang David Li wrote:
>> On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher wrote:
>
>>> This patch mixes up things badly from the point of
>>> what-depends-on-what, the whole approach looks wrong to me.
>>
>>
>> Do you mean the 'source file dependency' or 'logical dependency'?
>>
>> If the former, the code can be easily refactored to remove the
>> dependency. I don't see how the latter can be avoided as bb-partition
>> etc does change cfg states and leads to different actions in cfg
>> layout finalize.
>
> I mean logical dependency. cfglayout is just a representation of the
> CFG, and bb-partition is a code transformation. By making
> fixup_reorder_chain emit the note, you've put part of the
> transformation into out-of-cfglayout which is just bogus. You also
> don't put GCSE or loop unrolling in out-of-cfglayout, and this change
> is IMHO in the same category: mixing transformations into internal
> representations. That may be a short-term fix but it is a long-term
> maintenance/cleanups nightmare.

The main issue I had here, and why I made this change, is that we go
in and out of cfglayout mode several times after bb partitioning and
then out_of_cfglayout. The problem was that when we subsequently went
in and out of cfglayout mode, the switch text section notes that had
been inserted by bbpart were getting messed up (they were moved into
the bb header when we enter cfglayout mode and then not being
transferred to the correct location upon exit). I investigated trying
to keep those in sync, but it is really difficult/impossible to do
during cfglayout mode when they are in the header. So I simply strip
them out completely on entry to cfglayout mode, and if there were any
there on entry, this change ensures that they are restored in the
appropriate location upon exit. I'm not sure what is a good
alternative?

I found a more detailed description of why I made this change in our
email exchange from the fall:

--------------
I triggered the same error in 445.gobmk once I applied the
thread_prologue_and_epilogue_insns fixes. This is an assert in the
dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXT_SECTIONS
note not being preceded by a barrier:

gcc -c -o engine/utils.o -DSPEC_CPU -DNDEBUG -DHAVE_CONFIG_H -I. -I..
-I../include -I./include   -fprofile-use
-freorder-blocks-and-partition -freorder-blocks -ffunction-sections
 -O2      engine/utils.c

engine/utils.c: In function ‘visible_along_edge’:
engine/utils.c:274:1: internal compiler error: in create_pseudo_cfg,
at dwarf2cfi.c:2742
 }
 ^

In this case the switch section note was inside a BB. What I found was
that this was due to several phases going into and back out of
cfglayout mode again. In this case it was the compgotos phase. There
aren't any computed gotos, but this change occurs during
cfg_layout_initialize (in try_optimize_cfg called via cleanup_cfg).
Here it merged two (non-contiguous) blocks that had a
single-successor/single-predecessor relationship. However, the source
block was previously on the section boundary and had a SWITCH note
prior. This note is put into the header of the bb when we go into
cfglayout mode, and ended up inside the new merged block, which was in
any case not on the new border between the hot and cold sections.

The correct solution in my opinion is to strip out the SWITCH note
every time we enter cfglayout mode after bbro, and then invoke
insert_section_boundary_note when leaving cfglayout (if one was found
on entry to that cfglayout mode) to reapply it.
--------------

>
> Although I've only skimmed the patch, I have noted several issues with it:
>
> * 3 different changes put into a single patch: the
> crtl->has_bb_partition change (which looks good to me), the
> verification stuff, and various fixes. The patch should be submitted
> in 3 parts to make testing/review easier.

I plan to do that based on your, Jeff's and Honza's suggestions.

>
> * Emitting barriers in cfglayout mode. That's nonsense.

See earlier description - that was always the case, not due to my change.

>
> * Fixup redirected edges that did not cross partitions before but
> apparently do after redirection. This is not supposed to happen in the
> first place, so fixing up any of this just papers over an error
> elsewhere in the compiler.

Looking back through the earlier email exchanges from last fall I
found some discussion on this where I had found a couple places
causing this to happen. Two were in
thread_prologue_and_epilogue_insns: when we duplicated tail blocks or
created a new block to hold a simple return, and redirected some of
the edges to the new copy. In some cases this caused edges that were
previously region crossing to become non-region crossing, and in other
cases the reverse happened. I think there are a couple of other cases
mentioned in the emails too, but I would need to do some more digging
to find them all.

There were a few places in the code that tried to detect this type of
issue, and fix them up, but they weren't consistent and there were
other places that had the same issue. I've centralized all that
handling in this patch (fixup_partitions and fixup_partition_crossing,
called from a few places where we redirect edges) so that it is more
consistent and comprehensive.

>
> * The fixup_reorder_chain changes I've mentioned above.

See explanation above.

Thanks!
Teresa

>
>
> Ciao!
> Steven



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413


* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10 11:52     ` Jan Hubicka
@ 2013-05-10 14:50       ` Teresa Johnson
  0 siblings, 0 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-05-10 14:50 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Jeff Law, gcc-patches, Steven Bosscher, David Li, reply

On Fri, May 10, 2013 at 4:52 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 05/07/13 23:13, Teresa Johnson wrote:
>> >----------------------
>> >Revised patch that fixes failures encountered when enabling
>> >-freorder-blocks-and-partition, including the failure reported in PR 53743.
>> >
>> >This includes new verification code to ensure no cold blocks dominate hot
>> >blocks contributed by Steven Bosscher.
>> Seems like a reasonable verification; presumably if we have a cold
>> block dominating a hot block, then the block/edge frequencies are
>> badly broken.  Ah, just saw the comments for the other case where
>> this happens.  cold entry, but hot loop inside pushing over the
>> barrier. Arguably given a cold block in the dominator graph, all its
>> children should have their frequencies scaled down to avoid that situation.
>
> Yep, also note that sanity checking anything about frequencies is really hard.
> There are very many places in the compiler that necessarily need to invalidate
> frequencies in weird ways (at least short of rebuilding the whole profile
> from probabilities again).

Yes, as noted in the comments this was in part due to several places
where counts/frequencies were not kept in sync. Rather than try to fix
all of these, or do any scaling of frequencies, the partitioning code
now just enforces that the partitioning is sane w.r.t. the given
counts. This is done during bb partitioning. The sanity checking
routine was also useful for finding places where optimization passes
were splitting edges and causing hot blocks previously reached by both
hot and cold blocks to become dominated by cold blocks (see comments
in commit_edge_insertions in my patch), and making sure they got fixed
up.

But there is the issue of what we should do in the case of an
infrequent but non-zero entry (marked cold by maybe_hot_count_p
because its count is less than the number of training runs) that leads
to a hot loop. The code I added to the partitioning routine
(find_rarely_executed_basic_blocks_and_crossing_edges) will cause the
entry to also be placed in the hot partition. I would argue this is
the desired behavior - if the routine contains code that is very hot
for, say, 1/2 its training runs, the entry and hot loop (and
everything on the path in between) should be in the hot partition.
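
To make the intended effect concrete, here is a toy sketch that promotes
every cold dominator of a hot block into the hot partition. It is not the
algorithm used in find_rarely_executed_basic_blocks_and_crossing_edges,
just one simple way to reach the same invariant; the idom/is_cold arrays
and the function name are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Blocks are numbered 0..N-1.  IDOM[i] is the immediate dominator of
   block i, or -1 for the entry block.  IS_COLD[i] is the current
   partition assignment and is updated in place.

   For each hot block, walk up the dominator chain and move every cold
   dominator into the hot partition.  The walk stops at the first hot
   ancestor, because that ancestor is handled by its own iteration (if it
   was hot to begin with) or by the walk that promoted it.  */
void
promote_dominators_of_hot_blocks (const int *idom, bool *is_cold, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (is_cold[i])
        continue;
      for (int d = idom[i]; d >= 0 && is_cold[d]; d = idom[d])
        is_cold[d] = false;
    }
}
```

For a chain entry -> preheader -> hot loop -> exit with only the loop
initially hot, the entry and preheader become hot and the exit stays cold.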

>
>> I can't really comment on the cfglayout and related stuff -- it was
>> added at a time when I wasn't doing much with GCC and thus I don't
>> know much about it.
>
> I think I can take a look at the cfglayout stuff. Splitting the patch would be great.

Thanks, that would be great. I can split the patch first.

>
> Honza
>>
>> However, I like the changes to record if we've done partitioning and
>> checking those instead of flag_reorder_blocks_and_partition.  That's
>> simple enough that I'd support pulling it out as a separate patch
>> and installing immediately if that can be done so without major
>> headaches.

Ok, thanks, will do.

>>
>> I think we could do something similar with the code to verify the
>> idom of a hot block is also hot.  Though looking at the
>> implementation I wonder if it could be simplified by walking the
>> dominator tree?  I can't look at it real closely tonight though.

Looking at this code again, I agree with you. It looks like it is
going to walk cold bb's more than once and be O(n^2) in the worst case. I
will fix this (there are a couple places in the patch that do a walk
to ensure that this is not violated).
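
One possible reading of the dominator-tree suggestion, sketched on a toy model and not taken from the patch: if every hot block has a hot immediate dominator, then by induction up the tree no cold block dominates any hot block, so a single pass over the blocks is enough. The names toy_bb and count_hot_blocks_with_cold_idom are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum partition { PART_HOT, PART_COLD };

/* Toy stand-in for a basic block: its partition and the index of its
   immediate dominator (-1 for the function entry).  */
struct toy_bb
{
  enum partition part;
  int idom;
};

/* Count hot blocks whose immediate dominator is cold.  A result of zero
   means no cold block dominates any hot block.  Each block is inspected
   exactly once.  */
size_t
count_hot_blocks_with_cold_idom (const struct toy_bb *bbs, size_t n)
{
  size_t violations = 0;

  for (size_t i = 0; i < n; i++)
    {
      int d = bbs[i].idom;

      assert (d < (int) n);
      if (bbs[i].part == PART_HOT && d >= 0 && bbs[d].part == PART_COLD)
        violations++;
    }
  return violations;
}
```

The check never revisits a cold block, which avoids the repeated walks mentioned above; whether the real verifier can be restructured this way depends on details of the patch not shown here.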

>>
>> Could you pull those two logical hunks of work out into individual
>> patches.

Will do. The only complication with splitting out the dominance
checking stuff is that there are a number of changes in the patch to
ensure that we don't violate this (hot block can't be dominated by
cold block). I am not sure it makes sense or will be easy to split all
of these out. I think what I will do is try to pull the big related
chunks of them out to a separate patch (the new verification code, the
code to prevent this in the partitioning routine, and
fixup_partitions), but there are going to be a few places in the other
patch that do some fixups related to this (e.g. in rtl_split_edge)
that I would like to leave in the larger correctness patch.

Thanks,
Teresa

>>
>> jeff



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-09 21:42   ` Diego Novillo
  2013-05-09 22:41     ` Steven Bosscher
@ 2013-05-10 14:29     ` Teresa Johnson
  1 sibling, 0 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-05-10 14:29 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc-patches, Steven Bosscher, David Li

On Thu, May 9, 2013 at 2:42 PM, Diego Novillo <dnovillo@google.com> wrote:
> On 2013-05-08 01:13 , Teresa Johnson wrote:
>>
>> Somehow Rietveld didn't upload the patch properly. I've attached the
>> patch to this email instead. Here is the description:
>
>
> Rietveld has turned out to be far less useful than I had hoped.  If you are
> running Ubuntu Precise, the upload script is having some bad interaction
> with the server, which makes it constantly reject your password.
>
> I do not recommend using Rietveld anymore.  I don't really have the cycles
> to invest in fixing the various usability warts we've found. Sorry.

Thanks for the note. The main reason I have tried to keep using
Rietveld is that it sends out the patch inline in the email with the
formatting preserved. I have found that cut-n-paste into a gmail
window messes up the spacing. Do you know of a good way to work around
this issue?

>
>
>> -static void
>> +void
>>  emit_barrier_after_bb (basic_block bb)
>>  {
>>    rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>
>
> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
> like it makes sense to call this from gimple, for instance.

This is called only from bb-reorder and cfgrtl, so we should only be
in IR_RTL; I can add an assert to this effect.
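
A rough sketch of what such an assert could look like, on a toy model only. The TOY_IR_* values, toy_bb and toy_emit_barrier_after_bb are hypothetical stand-ins; the real routine would use current_ir_type (), emit_barrier_after and BB_FOOTER as in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IR kinds discussed in this thread.  */
enum toy_ir_type { TOY_IR_GIMPLE, TOY_IR_RTL_CFGRTL, TOY_IR_RTL_CFGLAYOUT };

/* Toy basic block that only records where its barrier ended up.  */
struct toy_bb
{
  bool barrier_in_stream;
  bool barrier_in_footer;
};

/* Place a barrier after BB.  Calling this on a non-RTL IR is treated as a
   programming error; in cfglayout mode the barrier goes to the footer
   instead of staying in the insn stream.  */
void
toy_emit_barrier_after_bb (enum toy_ir_type ir, struct toy_bb *bb)
{
  assert (ir == TOY_IR_RTL_CFGRTL || ir == TOY_IR_RTL_CFGLAYOUT);

  if (ir == TOY_IR_RTL_CFGLAYOUT)
    bb->barrier_in_footer = true;
  else
    bb->barrier_in_stream = true;
}
```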

More on this change when I respond to Steven's comments.

>
>
>> +     several different possibilities. One is that there are edge weight
>> insanities
>> +     due to optimization phases that do not properly update basic block
>> profile
>> +     counts. The second is that the entry of the function may not be hot,
>> because
>> +     it is entered fewer times than the number of profile training runs,
>> but there
>> +     is a loop inside the function that causes blocks within the function
>> to be
>> +     above the threshold for hotness.  */
>> +  if (cold_bb_count)
>> +    {
>> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
>> +
>
>
> Move this out into its own function?

Will do.

>
>> +      if (dom_calculated_here)
>> +        calculate_dominance_info (CDI_DOMINATORS);
>> +
>> +      /* Keep examining hot bbs until we have either checked them all, or
>> +         re-marked all cold bbs hot.  */
>> +      while (! bbs_in_hot_partition.is_empty ()
>> +             && cold_bb_count)
>> +        {
>> +          basic_block dom_bb;
>> +
>> +          bb = bbs_in_hot_partition.pop ();
>> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
>> +
>> +          /* If bb's immediate dominator is also hot then it is ok.  */
>> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
>> +            continue;
>> +
>> +          /* We have a hot bb with an immediate dominator that is cold.
>> +             The dominator needs to be re-marked to hot.  */
>
>
> s/to hot/hot/

ok. Actually, I think s/to hot/as hot/ might sound better.

>
>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c    (revision 198686)
>> +++ cfgrtl.c    (working copy)
>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>
>
> You may need to modify Makefile.in to declare this new dependency.
>
>> +/* Called when edge E has been redirected to a new destination,
>> +   in order to update the region crossing flag on the edge and
>> +   jump.  */
>> +
>> +static void
>> +fixup_partition_crossing (edge e, basic_block target)
>> +{
>> +  rtx note;
>> +
>> +  gcc_assert (e->dest == target);
>
>
> Then, why not just take a single argument E?

Good idea, will do.

>
>> +fixup_bb_partition (basic_block bb)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  /* Now need to make bb's pred edges non-region crossing.  */
>
>
> This is hard to parse.

Ok, how about:

/* Ensure edges to bb reflect its new partition assignment with the appropriate
   region-crossing flag setting.  */
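
The two review points above (a single-argument fixup_partition_crossing and the reworded comment) can be sketched together on a toy model. This is an illustration under assumptions, not the patch's code: toy_bb, toy_edge and the toy_* functions are hypothetical, the region-crossing jump note is reduced to a boolean on the edge, and only edges into BB are refreshed, following the wording of the proposed comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum partition { PART_HOT, PART_COLD };

struct toy_bb
{
  enum partition part;
};

struct toy_edge
{
  struct toy_bb *src;
  struct toy_bb *dest;
  bool crossing;	/* Stand-in for the region-crossing flag.  */
};

/* Called after edge E has been redirected: recompute its crossing flag
   from its own endpoints, so no separate TARGET argument is needed.  */
void
toy_fixup_partition_crossing (struct toy_edge *e)
{
  assert (e->src && e->dest);
  e->crossing = e->src->part != e->dest->part;
}

/* Ensure edges to BB reflect its new partition assignment with the
   appropriate region-crossing flag setting.  */
void
toy_fixup_bb_partition (struct toy_bb *bb, struct toy_edge *edges,
			size_t n_edges)
{
  for (size_t i = 0; i < n_edges; i++)
    if (edges[i].dest == bb)
      toy_fixup_partition_crossing (&edges[i]);
}
```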

>
>> +  /* Delete any blocks that became unreachable and weren't
>> +     already cleaned up, for example during edge forwarding
>> +     and convert_jumps_to_returns. This will expose more
>> +     opportunities for fixing the partition boundaries here.
>> +     Also, the calculation of the dominance graph during verification
>> +     will assert if there are unreachable nodes.  */
>> +  delete_unreachable_blocks ();
>
>
> Why not just schedule a CFG cleanup as a prerequisite to this pass?

Which pass? This is called right after we try to optimize the cfg
during cleanup_cfg, which is invoked numerous places. try_optimize_cfg
performs a number of cfg optimizations, some of which can create
unreachable blocks. I found it was much easier to clean this up in one
pass at the end rather than try to detect and fix this up
incrementally.
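
As a rough illustration of the "one pass at the end" approach, here is a toy reachability sweep over a small CFG. It is only a sketch under assumptions: toy_cfg and toy_delete_unreachable_blocks are hypothetical, block 0 is taken to be the entry, and the real delete_unreachable_blocks works on GCC's own CFG structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BBS 16

/* Toy CFG: block 0 is the entry, succ[a][b] means there is an edge a->b.  */
struct toy_cfg
{
  size_t n;
  bool live[TOY_MAX_BBS];
  bool succ[TOY_MAX_BBS][TOY_MAX_BBS];
};

/* Mark everything reachable from the entry with a depth-first walk, then
   delete the live blocks that were never reached.  Returns the number of
   blocks deleted.  */
size_t
toy_delete_unreachable_blocks (struct toy_cfg *cfg)
{
  bool reachable[TOY_MAX_BBS] = { false };
  size_t stack[TOY_MAX_BBS];
  size_t sp = 0, deleted = 0;

  assert (cfg->n > 0 && cfg->n <= TOY_MAX_BBS && cfg->live[0]);

  /* A block is marked before it is pushed, so it is pushed at most once
     and the stack cannot overflow.  */
  reachable[0] = true;
  stack[sp++] = 0;
  while (sp > 0)
    {
      size_t b = stack[--sp];

      for (size_t s = 0; s < cfg->n; s++)
	if (cfg->succ[b][s] && cfg->live[s] && !reachable[s])
	  {
	    reachable[s] = true;
	    stack[sp++] = s;
	  }
    }

  for (size_t b = 0; b < cfg->n; b++)
    if (cfg->live[b] && !reachable[b])
      {
	cfg->live[b] = false;
	deleted++;
      }
  return deleted;
}
```

A cycle of blocks that is no longer reachable from the entry (for example two blocks that only branch to each other) is removed as a unit, which is harder to get right when fixing things up incrementally.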

>
>
> A minor formatting nit.  References to locals and function arguments should
> be done in capitals.

Ok, will clean this up.

Thanks!
Teresa

>
>
> Diego.



--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-09 22:57       ` Xinliang David Li
@ 2013-05-10 12:07         ` Steven Bosscher
  2013-05-10 15:54           ` Teresa Johnson
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Bosscher @ 2013-05-10 12:07 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Diego Novillo, Teresa Johnson, gcc-patches

On Fri, May 10, 2013 at 12:57 AM, Xinliang David Li wrote:
> On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher wrote:

>> This patch mixes up things badly from the point of
>> what-depends-on-what, the whole approach looks wrong to me.
>
>
> Do you mean the 'source file dependency' or 'logical dependency'?
>
> If the former, the code can be easily refactored to remove the
> dependency. I don't see how the latter can be avoided as bb-partition
> etc does change cfg states and leads to different actions in cfg
> layout finalize.

I mean logical dependency. cfglayout is just a representation of the
CFG, and bb-partition is a code transformation. By making
fixup_reorder_chain emit the note, you've put part of the
transformation into out-of-cfglayout which is just bogus. You also
don't put GCSE or loop unrolling in out-of-cfglayout, and this change
is IMHO in the same category: mixing transformations into internal
representations. That may be a short-term fix but it is a long-term
maintenance/cleanups nightmare.

Although I've only skimmed the patch, I have noted several issues with it:

* 3 different changes put into a single patch: the
crtl->has_bb_partition change (which looks good to me), the
verification stuff, and various fixes. The patch should be submitted
in 3 parts to make testing/review easier.

* Emitting barriers in cfglayout mode. That's nonsense.

* Fixup redirected edges that did not cross partitions before but
apparently do after redirection. This is not supposed to happen in the
first place, so fixing up any of this just papers over an error
elsewhere in the compiler.

* The fixup_reorder_chain changes I've mentioned above.


Ciao!
Steven

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-10  4:43   ` Jeff Law
@ 2013-05-10 11:52     ` Jan Hubicka
  2013-05-10 14:50       ` Teresa Johnson
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Hubicka @ 2013-05-10 11:52 UTC (permalink / raw)
  To: Jeff Law; +Cc: Teresa Johnson, gcc-patches, Steven Bosscher, David Li, reply

> On 05/07/13 23:13, Teresa Johnson wrote:
> >----------------------
> >Revised patch that fixes failures encountered when enabling
> >-freorder-blocks-and-partition, including the failure reported in PR 53743.
> >
> >This includes new verification code to ensure no cold blocks dominate hot
> >blocks contributed by Steven Bosscher.
> Seems like a reasonable verification; presumably if we have a cold
> block dominating a hot block, then the block/edge frequencies are
> badly broken.  Ah, just saw the comments for the other case where
> this happens.  cold entry, but hot loop inside pushing over the
> barrier. Arguably given a cold block in the dominator graph, all its

Yep, also note that sanity checking anything about frequencies is really hard.
There are very many places in the compiler that necessarily need to invalidate
frequencies in weird ways (at least short of rebuilding the whole profile
from probabilities again).

> I can't really comment on the cfglayout and related stuff -- it was
> added at a time when I wasn't doing much with GCC and thus I don't
> know much about it.

I think I can take a look at the cfglayout stuff. Splitting the patch would be great.

Honza
> 
> However, I like the changes to record if we've done partitioning and
> checking those instead of flag_reorder_blocks_and_partition.  That's
> simple enough that I'd support pulling it out as a separate patch
> and installing immediately if that can be done so without major
> headaches.
> 
> I think we could do something similar with the code to verify the
> idom of a hot block is also hot.  Though looking at the
> implementation I wonder if it could be simplified by walking the
> dominator tree?  I can't look at it real closely tonight though.
> 
> Could you pull those two logical hunks of work out into individual
> patches.
> 
> jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-08  5:13 ` Teresa Johnson
  2013-05-09 21:42   ` Diego Novillo
@ 2013-05-10  4:43   ` Jeff Law
  2013-05-10 11:52     ` Jan Hubicka
  1 sibling, 1 reply; 35+ messages in thread
From: Jeff Law @ 2013-05-10  4:43 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: gcc-patches, Steven Bosscher, David Li, reply

On 05/07/13 23:13, Teresa Johnson wrote:
> ----------------------
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code to ensure no cold blocks dominate hot
> blocks contributed by Steven Bosscher.
Seems like a reasonable verification; presumably if we have a cold block 
dominating a hot block, then the block/edge frequencies are badly 
broken.  Ah, just saw the comments for the other case where this 
happens.  cold entry, but hot loop inside pushing over the barrier. 
Arguably given a cold block in the dominator graph, all its children 
should have their frequencies scaled down to avoid that situation.

> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever the flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
> ----------------------
>
> Ok for trunk?
I can't really comment on the cfglayout and related stuff -- it was 
added at a time when I wasn't doing much with GCC and thus I don't know 
much about it.

However, I like the changes to record if we've done partitioning and 
checking those instead of flag_reorder_blocks_and_partition.  That's 
simple enough that I'd support pulling it out as a separate patch and 
installing immediately if that can be done so without major headaches.

I think we could do something similar with the code to verify the idom 
of a hot block is also hot.  Though looking at the implementation I 
wonder if it could be simplified by walking the dominator tree?  I can't 
look at it real closely tonight though.

Could you pull those two logical hunks of work out into individual 
patches.

jeff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-09 22:41     ` Steven Bosscher
@ 2013-05-09 22:57       ` Xinliang David Li
  2013-05-10 12:07         ` Steven Bosscher
  0 siblings, 1 reply; 35+ messages in thread
From: Xinliang David Li @ 2013-05-09 22:57 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Diego Novillo, Teresa Johnson, gcc-patches

On Thu, May 9, 2013 at 3:40 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Thu, May 9, 2013 at 11:42 PM, Diego Novillo wrote:
>> On 2013-05-08 01:13 , Teresa Johnson wrote:
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>
>>
>> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
>> like it makes sense to call this from gimple, for instance.
>
> It also makes no sense calling it in IR_RTL_CFGLAYOUT mode. Barriers
> are meaningless in cfglayout mode.
>
> -  emit_barrier_after (BB_END (jump_block));
> +  /* We might be in cfg layout mode, and if so, the following routine will
> +     insert the barrier correctly.  */
> +  emit_barrier_after_bb (jump_block);
>
> We're practically always in cfglayout mode, but oh well...
> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>      emit_barrier_after (BB_END (jump_block));
>
>
>>> Index: cfgrtl.c
>>> ===================================================================
>>> --- cfgrtl.c    (revision 198686)
>>> +++ cfgrtl.c    (working copy)
>>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree.h"
>>>  #include "hard-reg-set.h"
>>>  #include "basic-block.h"
>>> +#include "bb-reorder.h"
>>
>>
>> You may need to modify Makefile.in to declare this new dependency.
>
> Eh, no. cfgrtl should not depend on bb-reorder.
>
> And cfglayout should not depend on basic block partitioning, either,
> so this change:
>
>>  /* Finalize the changes: reorder insn list according to the sequence specified
>> -   by aux pointers, enter compensation code, rebuild scope forest.  */
>> +   by aux pointers, enter compensation code, rebuild scope forest. If
>> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
>> +   to fixup_reorder_chain so that it can insert the proper switch text
>> +   section notes.  */
>>
>>  void
>> -cfg_layout_finalize (void)
>> +cfg_layout_finalize (bool finalize_reorder_blocks)
>
> is Just Wrong (tm).
>
> This patch mixes up things badly from the point of
> what-depends-on-what, the whole approach looks wrong to me.


Do you mean the 'source file dependency' or 'logical dependency'?

If the former, the code can be easily refactored to remove the
dependency. I don't see how the latter can be avoided as bb-partition
etc does change cfg states and leads to different actions in cfg
layout finalize.

thanks,

David




>
> Ciao!
> Steven

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-09 21:42   ` Diego Novillo
@ 2013-05-09 22:41     ` Steven Bosscher
  2013-05-09 22:57       ` Xinliang David Li
  2013-05-10 14:29     ` Teresa Johnson
  1 sibling, 1 reply; 35+ messages in thread
From: Steven Bosscher @ 2013-05-09 22:41 UTC (permalink / raw)
  To: Diego Novillo; +Cc: Teresa Johnson, gcc-patches, David Li

On Thu, May 9, 2013 at 11:42 PM, Diego Novillo wrote:
> On 2013-05-08 01:13 , Teresa Johnson wrote:
>> -static void
>> +void
>>  emit_barrier_after_bb (basic_block bb)
>>  {
>>    rtx barrier = emit_barrier_after (BB_END (bb));
>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>
>
> What if the current IR is not RTL?  Should we fail here?  It doesn't seem
> like it makes sense to call this from gimple, for instance.

It also makes no sense calling it in IR_RTL_CFGLAYOUT mode. Barriers
are meaningless in cfglayout mode.

-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);

We're practically always in cfglayout mode, but oh well...
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
     emit_barrier_after (BB_END (jump_block));


>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c    (revision 198686)
>> +++ cfgrtl.c    (working copy)
>> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree.h"
>>  #include "hard-reg-set.h"
>>  #include "basic-block.h"
>> +#include "bb-reorder.h"
>
>
> You may need to modify Makefile.in to declare this new dependency.

Eh, no. cfgrtl should not depend on bb-reorder.

And cfglayout should not depend on basic block partitioning, either,
so this change:

>  /* Finalize the changes: reorder insn list according to the sequence specified
> -   by aux pointers, enter compensation code, rebuild scope forest.  */
> +   by aux pointers, enter compensation code, rebuild scope forest. If
> +   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
> +   to fixup_reorder_chain so that it can insert the proper switch text
> +   section notes.  */
>
>  void
> -cfg_layout_finalize (void)
> +cfg_layout_finalize (bool finalize_reorder_blocks)

is Just Wrong (tm).

This patch mixes up things badly from the point of
what-depends-on-what, the whole approach looks wrong to me.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-08  5:13 ` Teresa Johnson
@ 2013-05-09 21:42   ` Diego Novillo
  2013-05-09 22:41     ` Steven Bosscher
  2013-05-10 14:29     ` Teresa Johnson
  2013-05-10  4:43   ` Jeff Law
  1 sibling, 2 replies; 35+ messages in thread
From: Diego Novillo @ 2013-05-09 21:42 UTC (permalink / raw)
  To: Teresa Johnson; +Cc: gcc-patches, Steven Bosscher, David Li

On 2013-05-08 01:13 , Teresa Johnson wrote:
> Somehow Rietveld didn't upload the patch properly. I've attached the
> patch to this email instead. Here is the description:

Rietveld has turned out to be far less useful than I had hoped.  If you
are running Ubuntu Precise, the upload script is having some bad
interaction with the server, which makes it constantly reject your
password.

I do not recommend using Rietveld anymore.  I don't really have the 
cycles to invest in fixing the various usability warts we've found. Sorry.


> -static void
> +void
>  emit_barrier_after_bb (basic_block bb)
>  {
>    rtx barrier = emit_barrier_after (BB_END (bb));
> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);

What if the current IR is not RTL?  Should we fail here?  It doesn't 
seem like it makes sense to call this from gimple, for instance.


> +     several different possibilities. One is that there are edge 
> weight insanities
> +     due to optimization phases that do not properly update basic 
> block profile
> +     counts. The second is that the entry of the function may not be 
> hot, because
> +     it is entered fewer times than the number of profile training 
> runs, but there
> +     is a loop inside the function that causes blocks within the 
> function to be
> +     above the threshold for hotness.  */
> +  if (cold_bb_count)
> +    {
> +      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
> +

Move this out into its own function?

> +      if (dom_calculated_here)
> +        calculate_dominance_info (CDI_DOMINATORS);
> +
> +      /* Keep examining hot bbs until we have either checked them all, or
> +         re-marked all cold bbs hot.  */
> +      while (! bbs_in_hot_partition.is_empty ()
> +             && cold_bb_count)
> +        {
> +          basic_block dom_bb;
> +
> +          bb = bbs_in_hot_partition.pop ();
> +          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
> +
> +          /* If bb's immediate dominator is also hot then it is ok.  */
> +          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
> +            continue;
> +
> +          /* We have a hot bb with an immediate dominator that is cold.
> +             The dominator needs to be re-marked to hot.  */

s/to hot/hot/

> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c    (revision 198686)
> +++ cfgrtl.c    (working copy)
> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree.h"
>  #include "hard-reg-set.h"
>  #include "basic-block.h"
> +#include "bb-reorder.h"

You may need to modify Makefile.in to declare this new dependency.

> +/* Called when edge E has been redirected to a new destination,
> +   in order to update the region crossing flag on the edge and
> +   jump.  */
> +
> +static void
> +fixup_partition_crossing (edge e, basic_block target)
> +{
> +  rtx note;
> +
> +  gcc_assert (e->dest == target);

Then, why not just take a single argument E?

> +fixup_bb_partition (basic_block bb)
> +{
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Now need to make bb's pred edges non-region crossing.  */

This is hard to parse.

> +  /* Delete any blocks that became unreachable and weren't
> +     already cleaned up, for example during edge forwarding
> +     and convert_jumps_to_returns. This will expose more
> +     opportunities for fixing the partition boundaries here.
> +     Also, the calculation of the dominance graph during verification
> +     will assert if there are unreachable nodes.  */
> +  delete_unreachable_blocks ();

Why not just schedule a CFG cleanup as a prerequisite to this pass?


A minor formatting nit.  References to locals and function arguments 
should be done in capitals.


Diego.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
  2013-05-08  5:08 Teresa Johnson
@ 2013-05-08  5:13 ` Teresa Johnson
  2013-05-09 21:42   ` Diego Novillo
  2013-05-10  4:43   ` Jeff Law
  0 siblings, 2 replies; 35+ messages in thread
From: Teresa Johnson @ 2013-05-08  5:13 UTC (permalink / raw)
  To: gcc-patches, Steven Bosscher, David Li, reply

[-- Attachment #1: Type: text/plain, Size: 2713 bytes --]

Somehow Rietveld didn't upload the patch properly. I've attached the
patch to this email instead. Here is the description:

I had sent this patch awhile back to address failures when using
-freorder-blocks-and-partition. Could one of the global maintainers
review it? Without these fixes this option is broken for many codes.

The patch is largely identical to the version sent out before, but I
just updated my client and re-did the bootstrap and regression testing
on x86_64-unknown-linux-gnu. I also just rebuilt and tested cpu2006 (both int
and fp) with profile feedback and -freorder-blocks-and-partition.

Here is the description from the earlier mail:

----------------------
Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code to ensure no cold blocks dominate hot
blocks contributed by Steven Bosscher.

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The code to fixup partitions (including the BB_PARTITION
assignment, region crossing jump notes, and switch text section notes) is
now handled in a few centralized locations. For example, inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculation
of the dominance relation; however, as far as I can tell the routines which
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are invoked typically once (or a small number of times in the case of
try_optimize_cfg) per optimization pass. Additionally, I compared the
-ftime-report output for some large fdo compilations and saw only minimal
increases in the dominance computation times, which were only a tiny percent
of the overall compile time.

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever the flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).
----------------------

Ok for trunk?

Thanks,
Teresa

(patch attached)

2013/5/7 Teresa Johnson <tejohnson@google.com>:
>



-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

[-- Attachment #2: patch.diff.050713 --]
[-- Type: application/octet-stream, Size: 42292 bytes --]

2013-05-07  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

	* bb-reorder.c (connect_traces): Only look for partitions and skip
        block copying if any blocks in function actually partitioned.
	(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
        (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
        that no cold blocks dominate a hot block.
	(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
        as this is now done by force_nonfallthru_and_redirect.
	(add_reg_crossing_jump_notes): Handle the fact that some jumps may
        already be marked with region crossing note.
	(reorder_basic_blocks): Only need to verify partitions if any
        blocks in function actually partitioned.
	(insert_section_boundary_note): Only need to insert note if any
        blocks in function actually partitioned.
	(rest_of_handle_reorder_blocks): New cfg_layout_finalize
        parameter, and remove call to insert_section_boundary_note as this
        is now called via cfg_layout_finalize/fixup_reorder_chain.
        Invoke cleanup_cfg after exiting layout mode to enable additional
        cleanup.
	(duplicate_computed_gotos): New cfg_layout_finalize
        parameter.
	(partition_hot_cold_basic_blocks): Set flag indicating function
        has bb partitions.
	* bb-reorder.h: Declare insert_section_boundary_note and
        emit_barrier_after_bb, which are no longer static.
	* basic-block.h: Declare new function fixup_partitions.
	* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
	(fixup_partition_crossing): New function.
	(fixup_bb_partition): Ditto.
	(rtl_redirect_edge_and_branch): Fixup partition boundaries.
	(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
	(rtl_split_edge): Correctly fixup partition boundaries.
	(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_block, and set up insertion point correctly since
        block may now end in a jump.
	(commit_edge_insertions): Invoke fixup_partitions to sanitize partition
        boundaries after optimizations that modify cfg and before trying to
        verify the flow info.
	(fixup_partitions): New function.
	(rtl_verify_flow_info_1): Add verification that no cold bbs dominate
        hot bbs.
	(record_effective_endpoints): Remove region-crossing notes and set flag
        indicating that they need to be reinserted on exit from cfglayout mode.
	(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
	(fixup_reorder_chain): Call insert_section_boundary_note if necessary.
        Remove old code that attempted to fixup region crossing note as
        this is now handled in force_nonfallthru_and_redirect.
	(duplicate_insn_chain): Don't duplicate switch section notes.
	(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
	(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.
	* cfghooks.h (cfg_layout_finalize): New parameter.
	* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
        parameter.
	* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
        as this is now done by redirect_edge_and_branch_force.
	* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
        barriers, new cfg_layout_finalize parameter, and don't store exit
        predecessor BB until after it is potentially split.
	* function.h (struct rtl_data): New flag has_bb_partition.
	* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
	* cfgcleanup.c (try_forward_edges): Enable forwarding block removal
        across partition boundaries when in CFGRTL mode.
	(try_crossjump_to_edge): Only skip optimization if
        any blocks in function are actually partitioned.
	(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
        up partitioning.
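
The invariant behind most of the entries above (no cold block may dominate
a hot block) can be illustrated outside of GCC with a small model. The
sketch below is not code from the patch: the idom array, the partition
enum, and the three helper names are hypothetical stand-ins for GCC's
dominance info and BB_PARTITION. raise_hot roughly mirrors the re-marking
done at partitioning time (a cold dominator of a hot block becomes hot),
without the successor heuristic, and sink_cold roughly mirrors what
fixup_partitions does afterwards (blocks dominated by a cold block become
cold).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum partition { HOT, COLD };

/* idom[i] is the index of block i's immediate dominator, or -1 for the
   entry block.  This array is an assumed, simplified replacement for
   GCC's CDI_DOMINATORS information.  */

/* Return true if some hot block has a cold block on its dominator chain.  */
static bool
cold_dominates_hot (const int *idom, const enum partition *part, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (part[i] != HOT)
        continue;
      for (int d = idom[i]; d != -1; d = idom[d])
        if (part[d] == COLD)
          return true;
    }
  return false;
}

/* Partition-time repair: re-mark every cold dominator of a hot block
   as hot.  Returns the number of blocks re-marked.  */
static int
raise_hot (const int *idom, enum partition *part, int n)
{
  int changed = 0;
  for (int i = 0; i < n; i++)
    {
      if (part[i] != HOT)
        continue;
      for (int d = idom[i]; d != -1; d = idom[d])
        if (part[d] == COLD)
          {
            part[d] = HOT;
            changed++;
          }
    }
  return changed;
}

/* Later repair: re-mark every block dominated by a cold block as cold.
   Returns the number of blocks re-marked.  */
static int
sink_cold (const int *idom, enum partition *part, int n)
{
  int changed = 0;
  for (int i = 0; i < n; i++)
    {
      if (part[i] == COLD)
        continue;
      for (int d = idom[i]; d != -1; d = idom[d])
        if (part[d] == COLD)
          {
            part[i] = COLD;
            changed++;
            break;
          }
    }
  assert (!cold_dominates_hot (idom, part, n));
  return changed;
}
```

With a four-block function where the cold entry block dominates a hot
block, either helper restores the invariant; which direction is
appropriate depends on whether the profile or the existing partitioning
is trusted more at that point in the pipeline.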

Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 198686)
+++ bb-reorder.c	(working copy)
@@ -1053,7 +1053,7 @@ connect_traces (int n_traces, struct trace *traces
   current_partition = BB_PARTITION (traces[0].first);
   two_passes = false;
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     for (i = 0; i < n_traces && !two_passes; i++)
       if (BB_PARTITION (traces[0].first)
 	  != BB_PARTITION (traces[i].first))
@@ -1262,7 +1262,7 @@ connect_traces (int n_traces, struct trace *traces
 		      }
 		  }
 
-	      if (flag_reorder_blocks_and_partition)
+	      if (crtl->has_bb_partition)
 		try_copy = false;
 
 	      /* Copy tiny blocks always; copy larger blocks only when the
@@ -1380,13 +1380,14 @@ get_uncond_jump_length (void)
   return length;
 }
 
-/* Emit a barrier into the footer of BB.  */
+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
 
-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
 }
 
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
@@ -1462,18 +1463,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
-  edge e;
-  edge_iterator ei;
+  edge e, e2;
+  edge_iterator ei, ei2;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;
+  vec<basic_block> bbs_newly_hot = vNULL;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          bbs_in_hot_partition.safe_push (bb);
+        }
     }
 
+  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
+     several different possibilities. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness.  */
+  if (cold_bb_count)
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      /* Keep examining hot bbs until we have either checked them all, or
+         re-marked all cold bbs hot.  */
+      while (! bbs_in_hot_partition.is_empty ()
+             && cold_bb_count)
+        {
+          basic_block dom_bb;
+
+          bb = bbs_in_hot_partition.pop ();
+          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+          /* If bb's immediate dominator is also hot then it is ok.  */
+          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked to hot.  */
+          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot dom_bb to see if it is also
+             dominated by a cold bb.  */
+          bbs_in_hot_partition.safe_push (dom_bb);
+
+          /* We should also adjust any cold blocks that the newly-hot bb
+             feeds and see if it makes sense to re-mark those as hot as
+             well.  */
+          bbs_newly_hot.safe_push (dom_bb);
+          while (! bbs_newly_hot.is_empty ())
+            {
+              basic_block new_hot_bb = bbs_newly_hot.pop ();
+              /* Examine all successors of this newly-hot bb to see if they
+                 are cold and should be re-marked as hot.  */
+              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
+                {
+                  bool any_cold_preds = false;
+                  basic_block succ = e->dest;
+                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
+                    continue;
+                  /* Does this block have any cold predecessors now?  */
+                  FOR_EACH_EDGE (e2, ei2, succ->preds)
+                  {
+                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
+                      {
+                        any_cold_preds = true;
+                        break;
+                      }
+                  }
+                  if (any_cold_preds)
+                    continue;
+
+                  /* Here we have a successor of newly-hot bb that is cold
+                     but no longer has any cold predecessors. Since the original
+                     assignment of our newly-hot bb was incorrect, this successor's
+                     assignment as cold is also suspect. Go ahead and re-mark it
+                     as hot now too. Better heuristics may be in order here.  */
+                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
+                  cold_bb_count--;
+                  bbs_in_hot_partition.safe_push (succ);
+                  /* Examine this successor as a newly-hot bb.  */
+                  bbs_newly_hot.safe_push (succ);
+                }
+            }
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
@@ -1765,10 +1857,10 @@ fix_up_fall_thru_edges (void)
 		      new_bb->aux = cur_bb->aux;
 		      cur_bb->aux = new_bb;
 
-		      /* Make sure new fall-through bb is in same
-			 partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+		      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));
 
-		      BB_COPY_PARTITION (new_bb, cur_bb);
 		      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
 		    }
 		  else
@@ -2064,7 +2156,10 @@ add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-	  && JUMP_P (BB_END (e->src)))
+	  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
 	add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }
 
@@ -2157,7 +2252,7 @@ reorder_basic_blocks (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     verify_hot_cold_block_grouping ();
 }
 
@@ -2169,13 +2264,13 @@ reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */
 
-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
   int first_partition = 0;
 
-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;
 
   FOR_EACH_BB (bb)
@@ -2214,10 +2309,16 @@ rest_of_handle_reorder_blocks (void)
   FOR_EACH_BB (bb)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
-  cfg_layout_finalize ();
+  cfg_layout_finalize (true);
 
-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
+  /* Invoke the cleanup again once we are out of cfg layout mode
+     after committing the final bb layout above. This enables removal
+     of forwarding blocks across the hot/cold section boundary when
+     splitting is enabled that were necessary while in cfg layout
+     mode.  */
+  if (crtl->has_bb_partition)
+    cleanup_cfg (CLEANUP_EXPENSIVE);
+
   return 0;
 }
 
@@ -2358,7 +2459,7 @@ duplicate_computed_gotos (void)
     }
 
 done:
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   BITMAP_FREE (candidates);
   return 0;
@@ -2503,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
   if (!crossing_edges.exists ())
     return 0;
 
+  crtl->has_bb_partition = true;
+
   /* Make sure the source of any crossing edge ends in a jump and the
      destination of any crossing edge has a label.  */
   add_labels_and_missing_jumps (crossing_edges);
Index: bb-reorder.h
===================================================================
--- bb-reorder.h	(revision 198686)
+++ bb-reorder.h	(working copy)
@@ -35,4 +35,8 @@ extern struct target_bb_reorder *this_target_bb_re
 
 extern int get_uncond_jump_length (void);
 
+extern void insert_section_boundary_note (void);
+
+extern void emit_barrier_after_bb (basic_block bb);
+
 #endif
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 198686)
+++ basic-block.h	(working copy)
@@ -796,6 +796,7 @@ extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void fixup_partitions (void);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 198686)
+++ cfgrtl.c	(working copy)
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -65,11 +66,12 @@ along with GCC; see the file COPYING3.  If not see
    Only applicable if the CFG is in cfglayout mode.  */
 static GTY(()) rtx cfg_layout_function_footer;
 static GTY(()) rtx cfg_layout_function_header;
+static bool had_sec_boundary_notes;
 
 static rtx skip_insns_after_block (basic_block);
 static void record_effective_endpoints (void);
 static rtx label_for_bb (basic_block);
-static void fixup_reorder_chain (void);
+static void fixup_reorder_chain (bool finalize_reorder_blocks);
 
 void verify_insn_chain (void);
 static void fixup_fallthru_exit_predecessor (void);
@@ -981,8 +983,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;
 
   /* We can replace or remove a complex jump only when we have exactly
@@ -1291,6 +1292,70 @@ redirect_branch_edge (edge e, basic_block target)
   return e;
 }
 
+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);
+
+  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (target))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the region crossing note from jump at end of
+         e->src if it exists.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        remove_note (BB_END (e->src), note);
+    }
+}
+
+/* Called when block BB has been reassigned to a different partition,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e, e->dest);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e, e->dest);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.
 
@@ -1307,16 +1372,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;
 
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;
 
-  if (e->dest == target)
+  if (dest == target)
     return e;
 
   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret, target);
       return ret;
     }
 
@@ -1325,6 +1392,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;
 
   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret, target);
   return ret;
 }
 
@@ -1492,18 +1560,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */
 
       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && JUMP_P (BB_END (jump_block))
-	  && !any_condjump_p (BB_END (jump_block))
-	  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
-	add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
 
       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
       new_edge->probability = probability;
       new_edge->count = count;
 
+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge, jump_block);
+
       /* Redirect old edge.  */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1559,13 +1625,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }
 
-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);
 
   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);
 
   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }
 
@@ -1664,7 +1733,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;
 
   /* Abnormal edges cannot be split.  */
@@ -1697,12 +1766,26 @@ rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest).  */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }
 
   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
 
+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1815,17 +1898,13 @@ commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);
 
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && e->src != ENTRY_BLOCK_PTR
-	  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-	  && !(e->flags & EDGE_CROSSING)
-	  && JUMP_P (after)
-	  && !any_condjump_p (after)
-	  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
-	add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+	before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }
 
   /* Now that we've found the spot, do the insertion.  */
@@ -1865,6 +1944,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;
 
+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2058,7 +2145,75 @@ get_last_bb_insn (basic_block bb)
 
   return end;
 }
-\f
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty  ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty  ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here.  */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              bbs_to_fix.safe_push (bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold.  */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we update the region crossings in the minimum number of
+     places, which can require forcing edges to be non-fallthru.  */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.
 
@@ -2082,6 +2237,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;
 
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2199,6 +2355,8 @@ rtl_verify_flow_info_1 (void)
 
 	  if (e->flags & EDGE_ABNORMAL)
 	    n_abnormal++;
+
+          have_partitions |= is_crossing;
 	}
 
       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2323,6 +2481,40 @@ rtl_verify_flow_info_1 (void)
 	  }
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2996,14 +3188,41 @@ record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;
 
+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;
 
       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
-	BB_HEADER (bb) = unlink_insn_chain (next_insn,
-					      PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode.  */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+              {
+                had_sec_boundary_notes |= true;
+                /* Remove note from chain. Grab new next_insn first.  */
+                if (next_insn == check_insn)
+                  next_insn = NEXT_INSN (check_insn);
+                /* Delete note.  */
+                delete_insn (check_insn);
+                /* There will only be one.  */
+                break;
+              }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* If we still have header instructions left after above loop.  */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
 	BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3031,7 +3250,7 @@ outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
 
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   return 0;
 }
@@ -3151,10 +3370,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
 }
 \f
 
-/* Given a reorder chain, rearrange the code to match.  */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.  */
 
 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3181,7 +3403,7 @@ static void
 	  PREV_INSN (BB_HEADER (bb)) = insn;
 	  insn = BB_HEADER (bb);
 	  while (NEXT_INSN (insn))
-	    insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
 	}
       if (insn)
 	NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3206,6 +3428,11 @@ static void
     insn = NEXT_INSN (insn);
 
   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3218,7 +3445,7 @@ static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;
 
       if (EDGE_COUNT (bb->succs) == 0)
@@ -3353,7 +3580,6 @@ static void
       /* We got here if we need to add a new jump insn. 
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
@@ -3361,17 +3587,6 @@ static void
 	  bb->aux = nb;
 	  /* Don't process this new block.  */
 	  bb = nb;
-
-	  /* Make sure new bb is tagged for correct section (same as
-	     fall-thru source, since you cannot fall-thru across
-	     section boundaries).  */
-	  BB_COPY_PARTITION (src_bb, single_pred (bb));
-	  if (flag_reorder_blocks_and_partition
-	      && targetm_common.have_named_sections
-	      && JUMP_P (BB_END (bb))
-	      && !any_condjump_p (BB_END (bb))
-	      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-	    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
 	}
     }
 
@@ -3671,10 +3886,11 @@ duplicate_insn_chain (rtx from, rtx to)
 	    case NOTE_INSN_FUNCTION_BEG:
 	      /* There is always just single entry to function.  */
 	    case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      break;
 
 	    case NOTE_INSN_EPILOGUE_BEG:
-	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      emit_note_copy (insn);
 	      break;
 
@@ -3786,10 +4002,13 @@ break_superblocks (void)
 }
 
 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest.  */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes.  */
 
 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3802,7 +4021,7 @@ void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);
 
   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4486,8 +4705,7 @@ rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;
 
   if (!onlyjump_p (insn)
Index: cfghooks.h
===================================================================
--- cfghooks.h	(revision 198686)
+++ cfghooks.h	(working copy)
@@ -206,7 +206,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
 void account_profile_record (struct profile_record *, int);
 
 extern void cfg_layout_initialize (unsigned int);
-extern void cfg_layout_finalize (void);
+extern void cfg_layout_finalize (bool);
 
 /* Hooks containers.  */
 extern struct cfg_hooks gimple_cfg_hooks;
@@ -220,4 +220,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
 extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
-
Index: modulo-sched.c
===================================================================
--- modulo-sched.c	(revision 198686)
+++ modulo-sched.c	(working copy)
@@ -3346,7 +3346,7 @@ rest_of_handle_sms (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
   free_dominance_info (CDI_DOMINATORS);
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 #endif /* INSN_SCHEDULING */
   return 0;
 }
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 198686)
+++ ifcvt.c	(working copy)
@@ -3905,10 +3905,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }
 
   num_true_changes++;
Index: function.c
===================================================================
--- function.c	(revision 198686)
+++ function.c	(working copy)
@@ -6270,8 +6270,10 @@ thread_prologue_and_epilogue_insns (void)
 		    break;
 		if (e)
 		  {
-		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-						  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
 		    BB_COPY_PARTITION (copy_bb, e->src);
 		  }
 		else
@@ -6496,7 +6498,7 @@ thread_prologue_and_epilogue_insns (void)
 	if (cur_bb->index >= NUM_FIXED_BLOCKS
 	    && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
 	  cur_bb->aux = cur_bb->next_bb;
-      cfg_layout_finalize ();
+      cfg_layout_finalize (false);
     }
 
 epilogue_done:
@@ -6538,7 +6540,7 @@ epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;
 
       gcc_assert (entry_edge != orig_entry_edge);
@@ -6566,6 +6568,12 @@ epilogue_done:
 	    else
 	      pending_edge_cold = e;
 	  }
+      
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;
 
       FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
 	{
Index: function.h
===================================================================
--- function.h	(revision 198686)
+++ function.h	(working copy)
@@ -446,6 +446,11 @@ struct GTY(()) rtl_data {
      sched2) and is useful only if the port defines LEAF_REGISTERS.  */
   bool uses_only_leaf_regs;
 
+  /* Nonzero if the function being compiled has undergone hot/cold partitioning
+     (under flag_reorder_blocks_and_partition) and has at least one cold
+     block.  */
+  bool has_bb_partition;
+
   /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
      asm.  Unlike regs_ever_live, elements of this array corresponding
      to eliminable regs (like the frame pointer) are set if an asm
Index: hw-doloop.c
===================================================================
--- hw-doloop.c	(revision 198686)
+++ hw-doloop.c	(working copy)
@@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
       else
 	bb->aux = NULL;
     }
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
   clear_aux_for_blocks ();
   df_analyze ();
 }
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 198686)
+++ cfgcleanup.c	(working copy)
@@ -452,9 +452,11 @@ try_forward_edges (int mode, basic_block b)
 	 really must be left untouched (they are required to make it safely
 	 across partition boundaries).  See the comments at the top of
 	 bb-reorder.c:partition_hot_cold_basic_blocks for complete
-	 details.  */
+	 details. These forwarding blocks may be removed once we
+         leave CFGLAYOUT mode, however, and are done with bb layout.  */
 
       if (first != EXIT_BLOCK_PTR
+          && current_ir_type() != IR_RTL_CFGRTL
 	  && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
 	return false;
 
@@ -465,7 +467,8 @@ try_forward_edges (int mode, basic_block b)
 	  may_thread |= (target->flags & BB_MODIFIED) != 0;
 
 	  if (FORWARDER_BLOCK_P (target)
-	      && !(single_succ_edge (target)->flags & EDGE_CROSSING)
+	      && (!(single_succ_edge (target)->flags & EDGE_CROSSING)
+                  || current_ir_type() == IR_RTL_CFGRTL)
 	      && single_succ (target) != EXIT_BLOCK_PTR)
 	    {
 	      /* Bypass trivial infinite loops.  */
@@ -1864,7 +1867,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (flag_reorder_blocks_and_partition && reload_completed)
+  if (crtl->has_bb_partition && reload_completed)
     return false;
 
   /* Search backward through forwarder blocks.  We don't need to worry
@@ -2807,10 +2810,21 @@ try_optimize_cfg (int mode)
 	      df_analyze ();
 	    }
 
+	  if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification below to fail,
+                 and lead to now cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-	  if (changed)
-	    verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }
 
 	  changed_overall |= changed;
 	  first_pass = false;


* Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047)
@ 2013-05-08  5:08 Teresa Johnson
  2013-05-08  5:13 ` Teresa Johnson
  0 siblings, 1 reply; 35+ messages in thread
From: Teresa Johnson @ 2013-05-08  5:08 UTC (permalink / raw)
  To: gcc-patches, stevenb.gcc, davidxl, reply

I sent this patch a while back to address failures when using
-freorder-blocks-and-partition. Could one of the global maintainers
review it? Without these fixes, this option is broken for many codes.

The patch is largely identical to the version sent out before, but I
just updated my client and re-did the bootstrap and regression testing
on x86_64-unknown-linux-gnu. I also just rebuilt and tested cpu2006 (both int
and fp) with profile feedback and -freorder-blocks-and-partition.

Here is the description from the earlier mail:

----------------------
Revised patch that fixes failures encountered when enabling
-freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code, contributed by Steven Bosscher, to
ensure that no cold blocks dominate hot blocks.
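
To illustrate the invariant being verified, here is a minimal standalone
sketch. It is not the GCC code: struct block, its idom field, and
find_hot_block_dominated_by_cold are hypothetical stand-ins for basic_block,
the dominator tree, and the check in the verifier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's partition marks and basic blocks.  */
enum partition { HOT, COLD };

struct block {
  int idom;             /* Index of the immediate dominator, -1 for entry.  */
  enum partition part;  /* Partition this block was assigned to.  */
};

/* Return the index of the first hot block that has a cold block somewhere
   on its dominator chain, or -1 if the invariant holds.  */
int
find_hot_block_dominated_by_cold (const struct block *bbs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (bbs[i].part != HOT)
        continue;
      /* Walk up the dominator chain towards the entry block.  */
      for (int d = bbs[i].idom; d >= 0; d = bbs[d].idom)
        {
          assert ((size_t) d < n);
          if (bbs[d].part == COLD)
            return (int) i;
        }
    }
  return -1;
}
```

A result of -1 means no cold block dominates a hot block; any other value
names a hot block that would trip the verification.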

I attempted to make the handling of partition updates through the optimization
passes much more consistent, removing a number of partial fixes in the code
stream in the process. The fixup of partitions (including the BB_PARTITION
assignment, region crossing jump notes, and switch text section notes) is
now handled in a few centralized locations, for example inside
rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
don't need to attempt the fixup themselves.
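
The idea behind the centralized edge fixup can be sketched on a toy model.
This is only an illustration under assumed types: struct block, struct edge,
and update_crossing are hypothetical, with the bool fields standing in for
JUMP_P (BB_END (bb)), the REG_CROSSING_JUMP note, and EDGE_CROSSING.

```c
#include <assert.h>
#include <stdbool.h>

enum partition { HOT, COLD };

/* Hypothetical model of a basic block.  */
struct block {
  enum partition part;
  bool ends_in_jump;       /* Models JUMP_P (BB_END (bb)).  */
  bool has_crossing_note;  /* Models a REG_CROSSING_JUMP note on that jump.  */
};

/* Hypothetical model of a CFG edge.  */
struct edge {
  struct block *src, *dest;
  bool crossing;           /* Models the EDGE_CROSSING flag.  */
};

/* After an edge has been redirected, bring its crossing flag and the
   note on the source block's jump back in sync with the partitions.  */
void
update_crossing (struct edge *e)
{
  assert (e && e->src && e->dest);
  if (e->src->part != e->dest->part)
    {
      e->crossing = true;
      /* Only a jump can carry the note; never add it twice.  */
      if (e->src->ends_in_jump)
        e->src->has_crossing_note = true;
    }
  else
    {
      e->crossing = false;
      e->src->has_crossing_note = false;
    }
}
```

Because the update is derived purely from the two partitions, calling it from
the redirect routine means callers never have to patch the flag or note by hand.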

For optimization passes that make adjustments to the cfg while in cfg layout
mode that are not easy to fix up incrementally, the new routine
fixup_partitions handles the cleanup globally. This does require calculating
the dominance relation. However, as far as I can tell, the routines that
now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
are typically invoked once per optimization pass (or a small number of times
in the case of try_optimize_cfg). Additionally, I compared the
-ftime-report output for some large fdo compilations and saw only minimal
increases in the dominance computation times, which were only a tiny percentage
of the overall compile time.
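
The global cleanup amounts to pushing "cold" down the dominator tree. The
sketch below shows that idea on plain arrays rather than the worklist over
dominator sons used in the patch. The function name and the array layout are
hypothetical, and it assumes blocks are numbered so that every immediate
dominator has a smaller index than the blocks it dominates (as in a reverse
postorder numbering).

```c
#include <assert.h>
#include <stddef.h>

enum partition { HOT, COLD };

/* Re-mark as cold every block whose immediate dominator is cold.
   IDOM[i] is the index of block i's immediate dominator, or -1 for the
   entry block.  Assumes IDOM[i] < i, so a single forward pass sees each
   dominator before the blocks it dominates.  Returns the number of
   blocks that changed partition.  */
size_t
sink_cold_through_dominated (const int *idom, enum partition *part, size_t n)
{
  size_t changed = 0;
  for (size_t i = 0; i < n; i++)
    {
      int d = idom[i];
      if (d < 0)
        continue;
      assert ((size_t) d < i);
      if (part[d] == COLD && part[i] != COLD)
        {
          part[i] = COLD;
          changed++;
        }
    }
  return changed;
}
```

In the real routine the re-marked blocks would then have their crossing edges
and jump notes updated; the sketch only covers the re-marking step.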

Additionally, I added a flag to the rtl_data structure to indicate whether
any partitioning was actually performed, so that optimizations which were
conservatively disabled whenever flag_reorder_blocks_and_partition
is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
conservative for functions where no partitions were formed (e.g. they are
completely hot).
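
The effect of switching the guard from the global option to the per-function
flag can be sketched as follows. struct fn_state and the two helper functions
are hypothetical names for illustration; only the shape of the condition
mirrors the check in try_crossjump_to_edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function state; models crtl->has_bb_partition.  */
struct fn_state {
  bool has_bb_partition;
};

/* Old guard: give up whenever the option is on, even if this function
   ended up with no cold blocks at all.  */
bool
crossjump_allowed_old (bool option_enabled, bool reload_completed)
{
  return !(option_enabled && reload_completed);
}

/* New guard: give up only if this function really was partitioned.  */
bool
crossjump_allowed_new (const struct fn_state *fn, bool reload_completed)
{
  assert (fn);
  return !(fn->has_bb_partition && reload_completed);
}
```

For a completely hot function compiled with the option enabled, the old guard
blocks the optimization after reload while the new one lets it proceed.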
----------------------

Ok for trunk?

Thanks,
Teresa

2013-05-07  Teresa Johnson  <tejohnson@google.com>
            Steven Bosscher  <steven@gcc.gnu.org>

	* bb-reorder.c (connect_traces): Only look for partitions and skip
        block copying if any blocks in function actually partitioned.
	(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
        (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
        that no cold blocks dominate a hot block.
	(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
        as this is now done by force_nonfallthru_and_redirect.
	(add_reg_crossing_jump_notes): Handle the fact that some jumps may
        already be marked with region crossing note.
	(reorder_basic_blocks): Only need to verify partitions if any
        blocks in function actually partitioned.
	(insert_section_boundary_note): Only need to insert note if any
        blocks in function actually partitioned.
	(rest_of_handle_reorder_blocks): New cfg_layout_finalize
        parameter, and remove call to insert_section_boundary_note as this
        is now called via cfg_layout_finalize/fixup_reorder_chain.
        Invoke cleanup_cfg after exiting layout mode to enable additional
        cleanup.
	(duplicate_computed_gotos): New cfg_layout_finalize
        parameter.
	(partition_hot_cold_basic_blocks): Set flag indicating function
        has bb partitions.
	* bb-reorder.h: Declare insert_section_boundary_note and
        emit_barrier_after_bb, which are no longer static.
	* basic-block.h: Declare new function fixup_partitions.
	* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
        check for region crossing note.
	(fixup_partition_crossing): New function.
	(fixup_bb_partition): Ditto.
	(rtl_redirect_edge_and_branch): Fixup partition boundaries.
	(force_nonfallthru_and_redirect): Fixup partition boundaries,
        remove old code that tried to do this. Emit barrier correctly
        when we are in cfglayout mode.
	(rtl_split_edge): Correctly fixup partition boundaries.
	(commit_one_edge_insertion): Remove old code that tried to
        fixup region crossing edge since this is now handled in
        split_block, and set up insertion point correctly since
        block may now end in a jump.
	(commit_edge_insertions): Invoke fixup_partitions to sanitize partition
        boundaries after optimizations that modify cfg and before trying to
        verify the flow info.
	(fixup_partitions): New function.
	(rtl_verify_flow_info_1): Add verification that no cold bbs dominate
        hot bbs.
	(record_effective_endpoints): Remove region-crossing notes and set flag
        indicating that they need to be reinserted on exit from cfglayout mode.
	(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
	(fixup_reorder_chain): Call insert_section_boundary_note if necessary.
        Remove old code that attempted to fixup region crossing note as
        this is now handled in force_nonfallthru_and_redirect.
	(duplicate_insn_chain): Don't duplicate switch section notes.
	(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
	(rtl_can_remove_branch_p): Remove unnecessary check for region crossing
        note.
	* cfghooks.h (cfg_layout_finalize): New parameter.
	* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
        parameter.
	* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
        as this is now done by redirect_edge_and_branch_force.
	* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
        barriers, new cfg_layout_finalize parameter, and don't store exit
        predecessor BB until after it is potentially split.
	* function.h (struct rtl_data): New flag has_bb_partition.
	* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
	* cfgcleanup.c (try_forward_edges): Enable forwarding block removal
        across partition boundaries when in CFGRTL mode.
	(try_crossjump_to_edge): Only skip optimization if
        any blocks in function actually partitioned.
	(try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
        up partitioning.

Index: bb-reorder.c
===================================================================
--- bb-reorder.c	(revision 198686)
+++ bb-reorder.c	(working copy)
@@ -1053,7 +1053,7 @@ connect_traces (int n_traces, struct trace *traces
   current_partition = BB_PARTITION (traces[0].first);
   two_passes = false;
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     for (i = 0; i < n_traces && !two_passes; i++)
       if (BB_PARTITION (traces[0].first)
 	  != BB_PARTITION (traces[i].first))
@@ -1262,7 +1262,7 @@ connect_traces (int n_traces, struct trace *traces
 		      }
 		  }
 
-	      if (flag_reorder_blocks_and_partition)
+	      if (crtl->has_bb_partition)
 		try_copy = false;
 
 	      /* Copy tiny blocks always; copy larger blocks only when the
@@ -1380,13 +1380,14 @@ get_uncond_jump_length (void)
   return length;
 }
 
-/* Emit a barrier into the footer of BB.  */
+/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode.  */
 
-static void
+void
 emit_barrier_after_bb (basic_block bb)
 {
   rtx barrier = emit_barrier_after (BB_END (bb));
-  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
+  if (current_ir_type () == IR_RTL_CFGLAYOUT)
+    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
 }
 
 /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
@@ -1462,18 +1463,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg
 {
   vec<edge> crossing_edges = vNULL;
   basic_block bb;
-  edge e;
-  edge_iterator ei;
+  edge e, e2;
+  edge_iterator ei, ei2;
+  unsigned int cold_bb_count = 0;
+  vec<basic_block> bbs_in_hot_partition = vNULL;
+  vec<basic_block> bbs_newly_hot = vNULL;
 
   /* Mark which partition (hot/cold) each basic block belongs in.  */
   FOR_EACH_BB (bb)
     {
       if (probably_never_executed_bb_p (cfun, bb))
-	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+          cold_bb_count++;
+        }
       else
-	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+        {
+          BB_SET_PARTITION (bb, BB_HOT_PARTITION);
+          bbs_in_hot_partition.safe_push (bb);
+        }
     }
 
+  /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of
+     several different possibilities. One is that there are edge weight insanities
+     due to optimization phases that do not properly update basic block profile
+     counts. The second is that the entry of the function may not be hot, because
+     it is entered fewer times than the number of profile training runs, but there
+     is a loop inside the function that causes blocks within the function to be
+     above the threshold for hotness.  */
+  if (cold_bb_count)
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      /* Keep examining hot bbs until we have either checked them all, or
+         re-marked all cold bbs hot.  */
+      while (! bbs_in_hot_partition.is_empty ()
+             && cold_bb_count)
+        {
+          basic_block dom_bb;
+
+          bb = bbs_in_hot_partition.pop ();
+          dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+
+          /* If bb's immediate dominator is also hot then it is ok.  */
+          if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION)
+            continue;
+
+          /* We have a hot bb with an immediate dominator that is cold.
+             The dominator needs to be re-marked to hot.  */
+          BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION);
+          cold_bb_count--;
+
+          /* Now we need to examine newly-hot dom_bb to see if it is also
+             dominated by a cold bb.  */
+          bbs_in_hot_partition.safe_push (dom_bb);
+
+          /* We should also adjust any cold blocks that the newly-hot bb
+             feeds and see if it makes sense to re-mark those as hot as
+             well.  */
+          bbs_newly_hot.safe_push (dom_bb);
+          while (! bbs_newly_hot.is_empty ())
+            {
+              basic_block new_hot_bb = bbs_newly_hot.pop ();
+              /* Examine all successors of this newly-hot bb to see if they
+                 are cold and should be re-marked as hot.  */
+              FOR_EACH_EDGE (e, ei, new_hot_bb->succs)
+                {
+                  bool any_cold_preds = false;
+                  basic_block succ = e->dest;
+                  if (BB_PARTITION (succ) != BB_COLD_PARTITION)
+                    continue;
+                  /* Does this block have any cold predecessors now?  */
+                  FOR_EACH_EDGE (e2, ei2, succ->preds)
+                  {
+                    if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
+                      {
+                        any_cold_preds = true;
+                        break;
+                      }
+                  }
+                  if (any_cold_preds)
+                    continue;
+
+                  /* Here we have a successor of newly-hot bb that is cold
+                     but no longer has any cold predecessors. Since the original
+                     assignment of our newly-hot bb was incorrect, this successor's
+                     assignment as cold is also suspect. Go ahead and re-mark it
+                     as hot now too. Better heuristics may be in order here.  */
+                  BB_SET_PARTITION (succ, BB_HOT_PARTITION);
+                  cold_bb_count--;
+                  bbs_in_hot_partition.safe_push (succ);
+                  /* Examine this successor as a newly-hot bb.  */
+                  bbs_newly_hot.safe_push (succ);
+                }
+            }
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* The format of .gcc_except_table does not allow landing pads to
      be in a different partition as the throw.  Fix this by either
      moving or duplicating the landing pads.  */
@@ -1765,10 +1857,10 @@ fix_up_fall_thru_edges (void)
 		      new_bb->aux = cur_bb->aux;
 		      cur_bb->aux = new_bb;
 
-		      /* Make sure new fall-through bb is in same
-			 partition as bb it's falling through from.  */
+                      /* This is done by force_nonfallthru_and_redirect.  */
+		      gcc_assert (BB_PARTITION (new_bb)
+                                  == BB_PARTITION (cur_bb));
 
-		      BB_COPY_PARTITION (new_bb, cur_bb);
 		      single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
 		    }
 		  else
@@ -2064,7 +2156,10 @@ add_reg_crossing_jump_notes (void)
   FOR_EACH_BB (bb)
     FOR_EACH_EDGE (e, ei, bb->succs)
       if ((e->flags & EDGE_CROSSING)
-	  && JUMP_P (BB_END (e->src)))
+	  && JUMP_P (BB_END (e->src))
+          /* Some notes were added during fix_up_fall_thru_edges, via
+             force_nonfallthru_and_redirect.  */
+          && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX))
 	add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
 }
 
@@ -2157,7 +2252,7 @@ reorder_basic_blocks (void)
       dump_flow_info (dump_file, dump_flags);
     }
 
-  if (flag_reorder_blocks_and_partition)
+  if (crtl->has_bb_partition)
     verify_hot_cold_block_grouping ();
 }
 
@@ -2169,13 +2264,13 @@ reorder_basic_blocks (void)
    encountering this note will make the compiler switch between the
    hot and cold text sections.  */
 
-static void
+void
 insert_section_boundary_note (void)
 {
   basic_block bb;
   int first_partition = 0;
 
-  if (!flag_reorder_blocks_and_partition)
+  if (!crtl->has_bb_partition)
     return;
 
   FOR_EACH_BB (bb)
@@ -2214,10 +2309,16 @@ rest_of_handle_reorder_blocks (void)
   FOR_EACH_BB (bb)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
-  cfg_layout_finalize ();
+  cfg_layout_finalize (true);
 
-  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
-  insert_section_boundary_note ();
+  /* Invoke the cleanup again once we are out of cfg layout mode
+     after committing the final bb layout above. This enables removal
+     of forwarding blocks across the hot/cold section boundary when
+     splitting is enabled that were necessary while in cfg layout
+     mode.  */
+  if (crtl->has_bb_partition)
+    cleanup_cfg (CLEANUP_EXPENSIVE);
+
   return 0;
 }
 
@@ -2358,7 +2459,7 @@ duplicate_computed_gotos (void)
     }
 
 done:
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   BITMAP_FREE (candidates);
   return 0;
@@ -2503,6 +2604,8 @@ partition_hot_cold_basic_blocks (void)
   if (!crossing_edges.exists ())
     return 0;
 
+  crtl->has_bb_partition = true;
+
   /* Make sure the source of any crossing edge ends in a jump and the
      destination of any crossing edge has a label.  */
   add_labels_and_missing_jumps (crossing_edges);
Index: bb-reorder.h
===================================================================
--- bb-reorder.h	(revision 198686)
+++ bb-reorder.h	(working copy)
@@ -35,4 +35,8 @@ extern struct target_bb_reorder *this_target_bb_re
 
 extern int get_uncond_jump_length (void);
 
+extern void insert_section_boundary_note (void);
+
+extern void emit_barrier_after_bb (basic_block bb);
+
 #endif
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 198686)
+++ basic-block.h	(working copy)
@@ -796,6 +796,7 @@ extern basic_block force_nonfallthru_and_redirect
 extern bool contains_no_active_insn_p (const_basic_block);
 extern bool forwarder_block_p (const_basic_block);
 extern bool can_fallthru (basic_block, basic_block);
+extern void fixup_partitions (void);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: cfgrtl.c
===================================================================
--- cfgrtl.c	(revision 198686)
+++ cfgrtl.c	(working copy)
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "hard-reg-set.h"
 #include "basic-block.h"
+#include "bb-reorder.h"
 #include "regs.h"
 #include "flags.h"
 #include "function.h"
@@ -65,11 +66,12 @@ along with GCC; see the file COPYING3.  If not see
    Only applicable if the CFG is in cfglayout mode.  */
 static GTY(()) rtx cfg_layout_function_footer;
 static GTY(()) rtx cfg_layout_function_header;
+static bool had_sec_boundary_notes;
 
 static rtx skip_insns_after_block (basic_block);
 static void record_effective_endpoints (void);
 static rtx label_for_bb (basic_block);
-static void fixup_reorder_chain (void);
+static void fixup_reorder_chain (bool finalize_reorder_blocks);
 
 void verify_insn_chain (void);
 static void fixup_fallthru_exit_predecessor (void);
@@ -981,8 +983,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc
      partition boundaries).  See  the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return NULL;
 
   /* We can replace or remove a complex jump only when we have exactly
@@ -1291,6 +1292,70 @@ redirect_branch_edge (edge e, basic_block target)
   return e;
 }
 
+/* Called when edge E has been redirected to a new destination,
+   in order to update the region crossing flag on the edge and
+   jump.  */
+
+static void
+fixup_partition_crossing (edge e, basic_block target)
+{
+  rtx note;
+
+  gcc_assert (e->dest == target);
+
+  if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR)
+    return;
+  /* If we redirected an existing edge, it may already be marked
+     crossing, even though the new src is missing a reg crossing note.
+     But make sure reg crossing note doesn't already exist before
+     inserting.  */
+  if (BB_PARTITION (e->src) != BB_PARTITION (target))
+    {
+      e->flags |= EDGE_CROSSING;
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (JUMP_P (BB_END (e->src))
+          && !note)
+        add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+    }
+  else if (BB_PARTITION (e->src) == BB_PARTITION (target))
+    {
+      e->flags &= ~EDGE_CROSSING;
+      /* Remove the region crossing note from jump at end of
+         e->src if it exists.  */
+      note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+      if (note)
+        remove_note (BB_END (e->src), note);
+    }
+}
+
+/* Called when block BB has been reassigned to a different partition,
+   to ensure that the region crossing attributes are updated.  */
+
+static void
+fixup_bb_partition (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  /* Now need to make bb's pred edges non-region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      fixup_partition_crossing (e, e->dest);
+    }
+
+  /* Possibly need to make bb's successor edges region crossing,
+     or remove stale region crossing.  */
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    {
+      if ((e->flags & EDGE_FALLTHRU)
+          && BB_PARTITION (bb) != BB_PARTITION (e->dest)
+          && e->dest != EXIT_BLOCK_PTR)
+        force_nonfallthru (e);
+      else
+        fixup_partition_crossing (e, e->dest);
+    }
+}
+
 /* Attempt to change code to redirect edge E to TARGET.  Don't do that on
    expense of adding new instructions or reordering basic blocks.
 
@@ -1307,16 +1372,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block
 {
   edge ret;
   basic_block src = e->src;
+  basic_block dest = e->dest;
 
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return NULL;
 
-  if (e->dest == target)
+  if (dest == target)
     return e;
 
   if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL)
     {
       df_set_bb_dirty (src);
+      fixup_partition_crossing (ret, target);
       return ret;
     }
 
@@ -1325,6 +1392,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block
     return NULL;
 
   df_set_bb_dirty (src);
+  fixup_partition_crossing (ret, target);
   return ret;
 }
 
@@ -1492,18 +1560,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       /* Make sure new block ends up in correct hot/cold section.  */
 
       BB_COPY_PARTITION (jump_block, e->src);
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && JUMP_P (BB_END (jump_block))
-	  && !any_condjump_p (BB_END (jump_block))
-	  && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING))
-	add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX);
 
       /* Wire edge in.  */
       new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU);
       new_edge->probability = probability;
       new_edge->count = count;
 
+      /* If e->src was previously region crossing, it no longer is
+         and the reg crossing note should be removed.  */
+      fixup_partition_crossing (new_edge, jump_block);
+
       /* Redirect old edge.  */
       redirect_edge_pred (e, jump_block);
       e->probability = REG_BR_PROB_BASE;
@@ -1559,13 +1625,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc
       LABEL_NUSES (label)++;
     }
 
-  emit_barrier_after (BB_END (jump_block));
+  /* We might be in cfg layout mode, and if so, the following routine will
+     insert the barrier correctly.  */
+  emit_barrier_after_bb (jump_block);
   redirect_edge_succ_nodup (e, target);
 
   if (abnormal_edge_flags)
     make_edge (src, target, abnormal_edge_flags);
 
   df_mark_solutions_dirty ();
+  fixup_partition_crossing (e, target);
   return new_bb;
 }
 
@@ -1664,7 +1733,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU
 static basic_block
 rtl_split_edge (edge edge_in)
 {
-  basic_block bb;
+  basic_block bb, new_bb;
   rtx before;
 
   /* Abnormal edges cannot be split.  */
@@ -1697,12 +1766,26 @@ rtl_split_edge (edge edge_in)
   else
     {
       bb = create_basic_block (before, NULL, edge_in->dest->prev_bb);
-      /* ??? Why not edge_in->dest->prev_bb here?  */
-      BB_COPY_PARTITION (bb, edge_in->dest);
+      if (edge_in->src == ENTRY_BLOCK_PTR)
+        BB_COPY_PARTITION (bb, edge_in->dest);
+      else
+        /* Put the split bb into the src partition, to avoid creating
+           a situation where a cold bb dominates a hot bb, in the case
+           where src is cold and dest is hot. The src will dominate
+           the new bb (whereas it might not have dominated dest).  */
+        BB_COPY_PARTITION (bb, edge_in->src);
     }
 
   make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU);
 
+  /* Can't allow a region crossing edge to be fallthrough.  */
+  if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest)
+      && edge_in->dest != EXIT_BLOCK_PTR)
+    {
+      new_bb = force_nonfallthru (single_succ_edge (bb));
+      gcc_assert (!new_bb);
+    }
+
   /* For non-fallthru edges, we must adjust the predecessor's
      jump instruction to target our new block.  */
   if ((edge_in->flags & EDGE_FALLTHRU) == 0)
@@ -1815,17 +1898,13 @@ commit_one_edge_insertion (edge e)
   else
     {
       bb = split_edge (e);
-      after = BB_END (bb);
 
-      if (flag_reorder_blocks_and_partition
-	  && targetm_common.have_named_sections
-	  && e->src != ENTRY_BLOCK_PTR
-	  && BB_PARTITION (e->src) == BB_COLD_PARTITION
-	  && !(e->flags & EDGE_CROSSING)
-	  && JUMP_P (after)
-	  && !any_condjump_p (after)
-	  && (single_succ_edge (bb)->flags & EDGE_CROSSING))
-	add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX);
+      /* If e crossed a partition boundary, we needed to make bb end in
+         a region-crossing jump, even though it was originally fallthru.  */
+      if (JUMP_P (BB_END (bb)))
+	before = BB_END (bb);
+      else
+        after = BB_END (bb);
     }
 
   /* Now that we've found the spot, do the insertion.  */
@@ -1865,6 +1944,14 @@ commit_edge_insertions (void)
 {
   basic_block bb;
 
+  /* Optimization passes that invoke this routine can cause hot blocks
+     previously reached by both hot and cold blocks to become dominated only
+     by cold blocks. This will cause the verification below to fail,
+     and lead to now cold code in the hot section. In some cases this
+     may only be visible after newly unreachable blocks are deleted,
+     which will be done by fixup_partitions.  */
+  fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
 #endif
@@ -2058,7 +2145,75 @@ get_last_bb_insn (basic_block bb)
 
   return end;
 }
-\f
+
+/* Perform cleanup on the hot/cold bb partitioning after optimization
+   passes that modify the cfg.  */
+
+void
+fixup_partitions (void)
+{
+  basic_block bb;
+
+  if (!crtl->has_bb_partition)
+    return;
+
+  /* Delete any blocks that became unreachable and weren't
+     already cleaned up, for example during edge forwarding
+     and convert_jumps_to_returns. This will expose more
+     opportunities for fixing the partition boundaries here.
+     Also, the calculation of the dominance graph during verification
+     will assert if there are unreachable nodes.  */
+  delete_unreachable_blocks ();
+
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.
+     Fixup any that now violate this requirement, as a result of edge
+     forwarding and unreachable block deletion.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  vec<basic_block> bbs_to_fix = vNULL;
+  FOR_EACH_BB (bb)
+    if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+      bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty  ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty  ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          /* If bb is not yet cold (because it was added below as
+             a block dominated by a cold bb) then mark it cold here.  */
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              BB_SET_PARTITION (bb, BB_COLD_PARTITION);
+              bbs_to_fix.safe_push (bb);
+            }
+          /* Any blocks dominated by a block in the cold section
+             must also be cold.  */
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
+  /* Do the partition fixup after all necessary blocks have been converted to
+     cold, so that we update region crossings in the minimum number of
+     places; each update can require forcing edges to be non-fallthru.  */
+  while (! bbs_to_fix.is_empty ())
+    {
+      bb = bbs_to_fix.pop ();
+      fixup_bb_partition (bb);
+    }
+}
+
 /* Verify the CFG and RTL consistency common for both underlying RTL and
    cfglayout RTL.
 
@@ -2082,6 +2237,7 @@ rtl_verify_flow_info_1 (void)
   rtx x;
   int err = 0;
   basic_block bb;
+  bool have_partitions = false;
 
   /* Check the general integrity of the basic blocks.  */
   FOR_EACH_BB_REVERSE (bb)
@@ -2199,6 +2355,8 @@ rtl_verify_flow_info_1 (void)
 
 	  if (e->flags & EDGE_ABNORMAL)
 	    n_abnormal++;
+
+          have_partitions |= is_crossing;
 	}
 
       if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX))
@@ -2323,6 +2481,40 @@ rtl_verify_flow_info_1 (void)
 	  }
     }
 
+  /* If there are partitions, do a sanity check on them: A basic block in
+     a cold partition cannot dominate a basic block in a hot partition.  */
+  vec<basic_block> bbs_in_cold_partition = vNULL;
+  if (have_partitions && !err)
+    FOR_EACH_BB (bb)
+      if ((BB_PARTITION (bb) == BB_COLD_PARTITION))
+        bbs_in_cold_partition.safe_push (bb);
+  if (! bbs_in_cold_partition.is_empty ())
+    {
+      bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS);
+      basic_block son;
+
+      if (dom_calculated_here)
+        calculate_dominance_info (CDI_DOMINATORS);
+
+      while (! bbs_in_cold_partition.is_empty ())
+        {
+          bb = bbs_in_cold_partition.pop ();
+          if ((BB_PARTITION (bb) != BB_COLD_PARTITION))
+            {
+              error ("non-cold basic block %d dominated "
+                     "by a block in the cold partition", bb->index);
+              err = 1;
+            }
+          for (son = first_dom_son (CDI_DOMINATORS, bb);
+               son;
+               son = next_dom_son (CDI_DOMINATORS, son))
+            bbs_in_cold_partition.safe_push (son);
+        }
+
+      if (dom_calculated_here)
+        free_dominance_info (CDI_DOMINATORS);
+    }
+
   /* Clean up.  */
   return err;
 }
@@ -2996,14 +3188,41 @@ record_effective_endpoints (void)
   else
     cfg_layout_function_header = NULL_RTX;
 
+  had_sec_boundary_notes = false;
+
   next_insn = get_insns ();
   FOR_EACH_BB (bb)
     {
       rtx end;
 
       if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb))
-	BB_HEADER (bb) = unlink_insn_chain (next_insn,
-					      PREV_INSN (BB_HEAD (bb)));
+        {
+          /* Rather than try to keep section boundary notes incrementally
+             up-to-date through cfg layout optimizations, simply remove them
+             and flag that they should be re-inserted when exiting
+             cfg layout mode.  */
+          rtx check_insn = next_insn;
+          while (check_insn)
+            {
+              if (NOTE_P (check_insn)
+                  && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+              {
+                had_sec_boundary_notes |= true;
+                /* Remove note from chain. Grab new next_insn first.  */
+                if (next_insn == check_insn)
+                  next_insn = NEXT_INSN (check_insn);
+                /* Delete note.  */
+                delete_insn (check_insn);
+                /* There will only be one.  */
+                break;
+              }
+              check_insn = NEXT_INSN (check_insn);
+            }
+          /* If we still have header instructions left after above loop.  */
+          if (next_insn != BB_HEAD (bb))
+            BB_HEADER (bb) = unlink_insn_chain (next_insn,
+                                                PREV_INSN (BB_HEAD (bb)));
+        }
       end = skip_insns_after_block (bb);
       if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end)
 	BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end);
@@ -3031,7 +3250,7 @@ outof_cfg_layout_mode (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
 
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 
   return 0;
 }
@@ -3151,10 +3370,13 @@ relink_block_chain (bool stay_in_cfglayout_mode)
 }
 \f
 
-/* Given a reorder chain, rearrange the code to match.  */
+/* Given a reorder chain, rearrange the code to match. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, or when
+   section boundary notes were removed on entry to cfg layout
+   mode, insert section boundary notes here.  */
 
 static void
-fixup_reorder_chain (void)
+fixup_reorder_chain (bool finalize_reorder_blocks)
 {
   basic_block bb;
   rtx insn = NULL;
@@ -3181,7 +3403,7 @@ static void
 	  PREV_INSN (BB_HEADER (bb)) = insn;
 	  insn = BB_HEADER (bb);
 	  while (NEXT_INSN (insn))
-	    insn = NEXT_INSN (insn);
+            insn = NEXT_INSN (insn);
 	}
       if (insn)
 	NEXT_INSN (insn) = BB_HEAD (bb);
@@ -3206,6 +3428,11 @@ static void
     insn = NEXT_INSN (insn);
 
   set_last_insn (insn);
+
+  /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes.  */
+  if (had_sec_boundary_notes || finalize_reorder_blocks)
+    insert_section_boundary_note ();
+
 #ifdef ENABLE_CHECKING
   verify_insn_chain ();
 #endif
@@ -3218,7 +3445,7 @@ static void
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
       rtx ret_label = NULL_RTX;
-      basic_block nb, src_bb;
+      basic_block nb;
       edge_iterator ei;
 
       if (EDGE_COUNT (bb->succs) == 0)
@@ -3353,7 +3580,6 @@ static void
       /* We got here if we need to add a new jump insn. 
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
-      src_bb = e_fall->src;
       nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
@@ -3361,17 +3587,6 @@ static void
 	  bb->aux = nb;
 	  /* Don't process this new block.  */
 	  bb = nb;
-
-	  /* Make sure new bb is tagged for correct section (same as
-	     fall-thru source, since you cannot fall-thru across
-	     section boundaries).  */
-	  BB_COPY_PARTITION (src_bb, single_pred (bb));
-	  if (flag_reorder_blocks_and_partition
-	      && targetm_common.have_named_sections
-	      && JUMP_P (BB_END (bb))
-	      && !any_condjump_p (BB_END (bb))
-	      && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING))
-	    add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX);
 	}
     }
 
@@ -3671,10 +3886,11 @@ duplicate_insn_chain (rtx from, rtx to)
 	    case NOTE_INSN_FUNCTION_BEG:
 	      /* There is always just single entry to function.  */
 	    case NOTE_INSN_BASIC_BLOCK:
+              /* We should only switch text sections once.  */
+	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      break;
 
 	    case NOTE_INSN_EPILOGUE_BEG:
-	    case NOTE_INSN_SWITCH_TEXT_SECTIONS:
 	      emit_note_copy (insn);
 	      break;
 
@@ -3786,10 +4002,13 @@ break_superblocks (void)
 }
 
 /* Finalize the changes: reorder insn list according to the sequence specified
-   by aux pointers, enter compensation code, rebuild scope forest.  */
+   by aux pointers, enter compensation code, rebuild scope forest. If
+   this is called when we will FINALIZE_REORDER_BLOCKS, indicate that
+   to fixup_reorder_chain so that it can insert the proper switch text
+   section notes.  */
 
 void
-cfg_layout_finalize (void)
+cfg_layout_finalize (bool finalize_reorder_blocks)
 {
 #ifdef ENABLE_CHECKING
   verify_flow_info ();
@@ -3802,7 +4021,7 @@ void
 #endif
       )
     fixup_fallthru_exit_predecessor ();
-  fixup_reorder_chain ();
+  fixup_reorder_chain (finalize_reorder_blocks);
 
   rebuild_jump_labels (get_insns ());
   delete_dead_jumptables ();
@@ -4486,8 +4705,7 @@ rtl_can_remove_branch_p (const_edge e)
   if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
     return false;
 
-  if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX)
-      || BB_PARTITION (src) != BB_PARTITION (target))
+  if (BB_PARTITION (src) != BB_PARTITION (target))
     return false;
 
   if (!onlyjump_p (insn)
Index: cfghooks.h
===================================================================
--- cfghooks.h	(revision 198686)
+++ cfghooks.h	(working copy)
@@ -206,7 +206,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
 void account_profile_record (struct profile_record *, int);
 
 extern void cfg_layout_initialize (unsigned int);
-extern void cfg_layout_finalize (void);
+extern void cfg_layout_finalize (bool);
 
 /* Hooks containers.  */
 extern struct cfg_hooks gimple_cfg_hooks;
@@ -220,4 +220,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
 extern void gimple_register_cfg_hooks (void);
 extern struct cfg_hooks get_cfg_hooks (void);
 extern void set_cfg_hooks (struct cfg_hooks);
-
Index: modulo-sched.c
===================================================================
--- modulo-sched.c	(revision 198686)
+++ modulo-sched.c	(working copy)
@@ -3346,7 +3346,7 @@ rest_of_handle_sms (void)
     if (bb->next_bb != EXIT_BLOCK_PTR)
       bb->aux = bb->next_bb;
   free_dominance_info (CDI_DOMINATORS);
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
 #endif /* INSN_SCHEDULING */
   return 0;
 }
Index: ifcvt.c
===================================================================
--- ifcvt.c	(revision 198686)
+++ ifcvt.c	(working copy)
@@ -3905,10 +3905,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
   if (new_bb)
     {
       df_bb_replace (then_bb_index, new_bb);
-      /* Since the fallthru edge was redirected from test_bb to new_bb,
-         we need to ensure that new_bb is in the same partition as
-         test bb (you can not fall through across section boundaries).  */
-      BB_COPY_PARTITION (new_bb, test_bb);
+      /* This should have been done above via force_nonfallthru_and_redirect
+         (possibly called from redirect_edge_and_branch_force).  */
+      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
     }
 
   num_true_changes++;
Index: function.c
===================================================================
--- function.c	(revision 198686)
+++ function.c	(working copy)
@@ -6270,8 +6270,10 @@ thread_prologue_and_epilogue_insns (void)
 		    break;
 		if (e)
 		  {
-		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
-						  NULL_RTX, e->src);
+                    /* Make sure we insert after any barriers.  */
+                    rtx end = get_last_bb_insn (e->src);
+                    copy_bb = create_basic_block (NEXT_INSN (end),
+                                                  NULL_RTX, e->src);
 		    BB_COPY_PARTITION (copy_bb, e->src);
 		  }
 		else
@@ -6496,7 +6498,7 @@ thread_prologue_and_epilogue_insns (void)
 	if (cur_bb->index >= NUM_FIXED_BLOCKS
 	    && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
 	  cur_bb->aux = cur_bb->next_bb;
-      cfg_layout_finalize ();
+      cfg_layout_finalize (false);
     }
 
 epilogue_done:
@@ -6538,7 +6540,7 @@ epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;
 
       gcc_assert (entry_edge != orig_entry_edge);
@@ -6566,6 +6568,12 @@ epilogue_done:
 	    else
 	      pending_edge_cold = e;
 	  }
+      
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred.  */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;
 
       FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
 	{
Index: function.h
===================================================================
--- function.h	(revision 198686)
+++ function.h	(working copy)
@@ -446,6 +446,11 @@ struct GTY(()) rtl_data {
      sched2) and is useful only if the port defines LEAF_REGISTERS.  */
   bool uses_only_leaf_regs;
 
+  /* Nonzero if the function being compiled has undergone hot/cold partitioning
+     (under flag_reorder_blocks_and_partition) and has at least one cold
+     block.  */
+  bool has_bb_partition;
+
   /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
      asm.  Unlike regs_ever_live, elements of this array corresponding
      to eliminable regs (like the frame pointer) are set if an asm
Index: hw-doloop.c
===================================================================
--- hw-doloop.c	(revision 198686)
+++ hw-doloop.c	(working copy)
@@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
       else
 	bb->aux = NULL;
     }
-  cfg_layout_finalize ();
+  cfg_layout_finalize (false);
   clear_aux_for_blocks ();
   df_analyze ();
 }
Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 198686)
+++ cfgcleanup.c	(working copy)
@@ -452,9 +452,11 @@ try_forward_edges (int mode, basic_block b)
 	 really must be left untouched (they are required to make it safely
 	 across partition boundaries).  See the comments at the top of
 	 bb-reorder.c:partition_hot_cold_basic_blocks for complete
-	 details.  */
+	 details. These forwarding blocks may be removed once we
+         leave CFGLAYOUT mode, however, and are done with bb layout.  */
 
       if (first != EXIT_BLOCK_PTR
+          && current_ir_type() != IR_RTL_CFGRTL
 	  && find_reg_note (BB_END (first), REG_CROSSING_JUMP, NULL_RTX))
 	return false;
 
@@ -465,7 +467,8 @@ try_forward_edges (int mode, basic_block b)
 	  may_thread |= (target->flags & BB_MODIFIED) != 0;
 
 	  if (FORWARDER_BLOCK_P (target)
-	      && !(single_succ_edge (target)->flags & EDGE_CROSSING)
+	      && (!(single_succ_edge (target)->flags & EDGE_CROSSING)
+                  || current_ir_type() == IR_RTL_CFGRTL)
 	      && single_succ (target) != EXIT_BLOCK_PTR)
 	    {
 	      /* Bypass trivial infinite loops.  */
@@ -1864,7 +1867,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
      partition boundaries).  See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
 
-  if (flag_reorder_blocks_and_partition && reload_completed)
+  if (crtl->has_bb_partition && reload_completed)
     return false;
 
   /* Search backward through forwarder blocks.  We don't need to worry
@@ -2807,10 +2810,21 @@ try_optimize_cfg (int mode)
 	      df_analyze ();
 	    }
 
+	  if (changed)
+            {
+              /* Edge forwarding in particular can cause hot blocks previously
+                 reached by both hot and cold blocks to become dominated only
+                 by cold blocks. This will cause the verification below to fail,
+                 and lead to now-cold code in the hot section. This is not easy
+                 to detect and fix during edge forwarding, and in some cases
+                 is only visible after newly unreachable blocks are deleted,
+                 which will be done in fixup_partitions.  */
+              fixup_partitions ();
+
 #ifdef ENABLE_CHECKING
-	  if (changed)
-	    verify_flow_info ();
+              verify_flow_info ();
 #endif
+            }
 
 	  changed_overall |= changed;
 	  first_pass = false;
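
As an aside for reviewers, the dominance-based propagation done by
fixup_partitions in the cfgrtl.c hunk above can be tried outside of GCC.
What follows is a minimal standalone sketch and not part of the patch:
struct block, enum partition and propagate_cold are hypothetical stand-ins
for basic_block, BB_PARTITION and the first_dom_son/next_dom_son walk, and
the dominator tree is assumed to be precomputed as first-son/next-sibling
indices rather than obtained from calculate_dominance_info.

```c
/* Standalone sketch, NOT GCC code.  Every name here is hypothetical and
   only mirrors the shape of the worklist loop in fixup_partitions.  */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum partition { HOT, COLD };

struct block
{
  enum partition part;
  int first_son;  /* Index of the first dominator-tree son, or -1.  */
  int next_son;   /* Index of the next sibling in that tree, or -1.  */
};

/* Mark every block dominated by a cold block as cold.  Indices of blocks
   that were converted are stored in TO_FIX (capacity N) so the caller can
   repair region crossings afterwards.  Returns the number stored.  */
size_t
propagate_cold (struct block *blocks, size_t n, int *to_fix)
{
  assert (n == 0 || (blocks && to_fix));

  /* At most N seeds are live, plus fewer than N entries from the walk
     below a single seed, so 2 * N slots are enough.  */
  int *work = malloc (2 * n * sizeof *work);
  size_t top = 0, n_fixed = 0;

  if (!work)
    return 0;

  /* Seed the worklist with the blocks that are already cold.  */
  for (size_t i = 0; i < n; i++)
    if (blocks[i].part == COLD)
      work[top++] = (int) i;

  while (top > 0)
    {
      int b = work[--top];

      /* A block reached only as a dominated son is not yet cold.  */
      if (blocks[b].part != COLD)
        {
          blocks[b].part = COLD;
          to_fix[n_fixed++] = b;
        }

      /* Anything dominated by a cold block must also be cold.  */
      for (int s = blocks[b].first_son; s >= 0; s = blocks[s].next_son)
        work[top++] = s;
    }

  free (work);
  return n_fixed;
}
```

With a dominator tree 0 -> {1, 2}, 1 -> {3}, 3 -> {4} in which only block 1
starts out cold, the sketch converts blocks 3 and 4 and leaves 0 and 2 hot.
Collecting the converted blocks first and fixing them in a second step
follows the same reasoning as the separate bbs_to_fix vector in the patch.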

--
This patch is available for review at http://codereview.appspot.com/6823047

Thread overview: 35+ messages
2012-11-15 20:10 Fix PR 53743 and other -freorder-blocks-and-partition failures (issue6823047) Teresa Johnson
2012-11-26 15:55 ` Teresa Johnson
2012-11-26 16:25   ` Christophe Lyon
2012-11-26 20:20     ` Teresa Johnson
2012-11-26 20:29       ` Teresa Johnson
2012-11-26 20:43       ` Jack Howarth
2012-11-26 20:52         ` Teresa Johnson
2012-11-28 15:49           ` Christophe Lyon
2012-11-28 15:57             ` Teresa Johnson
2012-11-28 17:03               ` Christophe Lyon
     [not found]             ` <CAAe5K+UOyQrDyg=pY7za9YRK=8-3dVVsfcMuJdsJp4w2X6BaJg@mail.gmail.com>
2013-01-31 14:51               ` Christophe Lyon
2013-02-05 15:45                 ` Teresa Johnson
2013-05-08  5:08 Teresa Johnson
2013-05-08  5:13 ` Teresa Johnson
2013-05-09 21:42   ` Diego Novillo
2013-05-09 22:41     ` Steven Bosscher
2013-05-09 22:57       ` Xinliang David Li
2013-05-10 12:07         ` Steven Bosscher
2013-05-10 15:54           ` Teresa Johnson
2013-05-10 21:01             ` Steven Bosscher
2013-05-10 21:10               ` Jan Hubicka
2013-05-10 21:14                 ` Steven Bosscher
2013-05-11  3:21               ` Teresa Johnson
2013-05-11 11:19                 ` Jan Hubicka
2013-05-11 11:51                   ` Steven Bosscher
2013-05-11 12:28                     ` Jan Hubicka
2013-05-11 14:43                   ` Teresa Johnson
2013-05-11 11:45                 ` Steven Bosscher
2013-05-11 14:39                   ` Teresa Johnson
2013-05-11 15:03                     ` Steven Bosscher
2013-05-12 14:37                       ` Teresa Johnson
2013-05-10 14:29     ` Teresa Johnson
2013-05-10  4:43   ` Jeff Law
2013-05-10 11:52     ` Jan Hubicka
2013-05-10 14:50       ` Teresa Johnson
