public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/4] bb-reorder: Split out STC
  2015-09-23 22:09 [PATCH 0/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
@ 2015-09-23 22:09 ` Segher Boessenkool
  2015-09-24 16:12   ` Steven Bosscher
  2015-09-23 22:10 ` [PATCH 2/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-23 22:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This first patch simply factors code a little bit.


2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>

	* bb-reorder.c (reorder_basic_blocks_software_trace_cache): New
	function, factored out from ...
	(reorder_basic_blocks): ... here.

---
 gcc/bb-reorder.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 2110bd2..725cdc3 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2226,24 +2226,15 @@ update_crossing_jump_flags (void)
 	}
 }
 
-/* Reorder basic blocks.  The main entry point to this file.  FLAGS is
-   the set of flags to pass to cfg_layout_initialize().  */
+/* Reorder basic blocks using the software trace cache (STC) algorithm.  */
 
 static void
-reorder_basic_blocks (void)
+reorder_basic_blocks_software_trace_cache (void)
 {
   int n_traces;
   int i;
   struct trace *traces;
 
-  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
-
-  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
-    return;
-
-  set_edge_can_fallthru_flag ();
-  mark_dfs_back_edges ();
-
   /* We are estimating the length of uncond jump insn only once since the code
      for getting the insn length always returns the minimal length now.  */
   if (uncond_jump_length == 0)
@@ -2268,6 +2259,22 @@ reorder_basic_blocks (void)
   connect_traces (n_traces, traces);
   FREE (traces);
   FREE (bbd);
+}
+
+/* Reorder basic blocks.  The main entry point to this file.  */
+
+static void
+reorder_basic_blocks (void)
+{
+  gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
+
+  if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1)
+    return;
+
+  set_edge_can_fallthru_flag ();
+  mark_dfs_back_edges ();
+
+  reorder_basic_blocks_software_trace_cache ();
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
 
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/4] bb-reorder: Add the "simple" algorithm
@ 2015-09-23 22:09 Segher Boessenkool
  2015-09-23 22:09 ` [PATCH 1/4] bb-reorder: Split out STC Segher Boessenkool
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-23 22:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

The current basic block reordering always uses the "software trace cache"
algorithm.  That has a few problems:

1) It increases code size substantially; this makes it not suitable for
-O1 or -Os, and not at all for some architectures;
2) but it is enabled for -Os and all targets;
3) and -O1 gets nothing, resulting in pretty jumpy code.

This patch set adds a new simple greedy basic block reordering algorithm,
adds a flag -freorder-blocks-algorithm=, and sets things up so that -O1
and -Os use the simple algo.

Split into a few pieces for easier review.  Every intermediate step works.

Bootstrapped and tested on powerpc64-linux.  There are two new fails in
guality testresults for -Os.

Is this okay for mainline?


Segher


Segher Boessenkool (4):
  bb-reorder: Split out STC
  bb-reorder: Add the "simple" algorithm
  bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  bb-reorder: Documentation updates

 gcc/bb-reorder.c    | 196 +++++++++++++++++++++++++++++++++++++++++++++++++---
 gcc/common.opt      |  13 ++++
 gcc/doc/invoke.texi |  23 ++++--
 gcc/flag-types.h    |   7 ++
 gcc/opts.c          |   4 +-
 5 files changed, 227 insertions(+), 16 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/4] bb-reorder: Add the "simple" algorithm
  2015-09-23 22:09 [PATCH 0/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
  2015-09-23 22:09 ` [PATCH 1/4] bb-reorder: Split out STC Segher Boessenkool
@ 2015-09-23 22:10 ` Segher Boessenkool
  2015-09-24 11:13   ` Bernd Schmidt
                     ` (2 more replies)
  2015-09-23 23:01 ` [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up Segher Boessenkool
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-23 22:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This is the meat of this series: a new algorithm to do basic block
reordering.  It uses the simple greedy approach to maximum weighted
matching, where the weights are the predicted execution frequency of
the edges.  This always finds a solution that is within a factor two
of optimal, if you disregard loops (which we cannot allow) and the
complications of block partitioning.


2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>

	* bb-reorder.c (reorder_basic_blocks_software_trace_cache): Print
	a header to the dump file.
	(edge_order): New function.
	(reorder_basic_blocks_simple): New function.
	(reorder_basic_blocks): Choose between the STC and the simple
	algorithms (always choose the former).

---
 gcc/bb-reorder.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 725cdc3..40e9e50 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2231,6 +2231,9 @@ update_crossing_jump_flags (void)
 static void
 reorder_basic_blocks_software_trace_cache (void)
 {
+  if (dump_file)
+    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
+
   int n_traces;
   int i;
   struct trace *traces;
@@ -2261,6 +2264,158 @@ reorder_basic_blocks_software_trace_cache (void)
   FREE (bbd);
 }
 
+/* Return true if edge E1 is more desirable as a fallthrough edge than
+   edge E2 is.  */
+
+static bool
+edge_order (edge e1, edge e2)
+{
+  return EDGE_FREQUENCY (e1) > EDGE_FREQUENCY (e2);
+}
+
+/* Reorder basic blocks using the "simple" algorithm.  This tries to
+   maximize the dynamic number of branches that are fallthrough, without
+   copying instructions.  The algorithm is greedy, looking at the most
+   frequently executed branch first.  */
+
+static void
+reorder_basic_blocks_simple (void)
+{
+  if (dump_file)
+    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
+
+  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];
+
+  /* First, collect all edges that can be optimized by reordering blocks:
+     simple jumps and conditional jumps, as well as the function entry edge.  */
+
+  int n = 0;
+  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      rtx_insn *end = BB_END (bb);
+
+      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
+	continue;
+
+      if (any_condjump_p (end))
+	{
+	  edges[n++] = EDGE_SUCC (bb, 0);
+	  edges[n++] = EDGE_SUCC (bb, 1);
+	}
+      else if (single_succ_p (bb))
+	edges[n++] = EDGE_SUCC (bb, 0);
+    }
+
+  /* Sort the edges, the most desirable first.  */
+
+  std::stable_sort (edges, edges + n, edge_order);
+
+  /* Now decide which of those edges to make fallthrough edges.  We set
+     BB_VISITED if a block already has a fallthrough successor assigned
+     to it.  We make ->AUX of an endpoint point to the opposite endpoint
+     of a sequence of blocks that fall through, and ->AUX will be NULL
+     for a block that is in such a sequence but not an endpoint anymore.
+
+     To start with, everything points to itself, nothing is assigned yet.  */
+
+  FOR_ALL_BB_FN (bb, cfun)
+    bb->aux = bb;
+
+  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
+
+  /* Now for all edges, the most desirable first, see if that edge can
+     connect two sequences.  If it can, update AUX and BB_VISITED; if it
+     cannot, zero out the edge in the table.  */
+
+  int j;
+  for (j = 0; j < n; j++)
+    {
+      edge e = edges[j];
+
+      basic_block tail_a = e->src;
+      basic_block head_b = e->dest;
+      basic_block head_a = (basic_block) tail_a->aux;
+      basic_block tail_b = (basic_block) head_b->aux;
+
+      /* An edge cannot connect two sequences if:
+	 - it crosses partitions;
+	 - its src is not a current endpoint;
+	 - its dest is not a current endpoint;
+	 - or, it would create a loop.  */
+
+      if (e->flags & EDGE_CROSSING
+	  || tail_a->flags & BB_VISITED
+	  || !tail_b
+	  || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
+	  || tail_a == tail_b)
+	{
+	  edges[j] = 0;
+	  continue;
+	}
+
+      tail_a->aux = 0;
+      head_b->aux = 0;
+      head_a->aux = tail_b;
+      tail_b->aux = head_a;
+      tail_a->flags |= BB_VISITED;
+    }
+
+  /* Put the pieces together, in the same order that the start blocks of
+     the sequences already had.  The hot/cold partitioning gives a little
+     complication: as a first pass only do this for blocks in the same
+     partition as the start block, and (if there is anything left to do)
+     in a second pass handle the other partition.  */
+
+  basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+
+  int current_partition = BB_PARTITION (last_tail);
+  bool need_another_pass = true;
+
+  for (int pass = 0; pass < 2 && need_another_pass; pass++)
+    {
+      need_another_pass = false;
+
+      FOR_EACH_BB_FN (bb, cfun)
+	if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
+	  {
+	    if (BB_PARTITION (bb) != current_partition)
+	      {
+		need_another_pass = true;
+		continue;
+	      }
+
+	    last_tail->aux = bb;
+	    last_tail = (basic_block) bb->aux;
+	  }
+
+      current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
+    }
+
+  last_tail->aux = 0;
+
+  /* Finally, link all the chosen fallthrough edges.  */
+
+  for (j = 0; j < n; j++)
+    if (edges[j])
+      edges[j]->src->aux = edges[j]->dest;
+
+  delete[] edges;
+
+  /* If the entry edge no longer falls through we have to make a new
+     block so it can do so again.  */
+
+  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
+    {
+      force_nonfallthru (e);
+      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+      BB_COPY_PARTITION (e->src, e->dest);
+    }
+}
+
 /* Reorder basic blocks.  The main entry point to this file.  */
 
 static void
@@ -2274,7 +2429,10 @@ reorder_basic_blocks (void)
   set_edge_can_fallthru_flag ();
   mark_dfs_back_edges ();
 
-  reorder_basic_blocks_software_trace_cache ();
+  if (1)
+    reorder_basic_blocks_software_trace_cache ();
+  else
+    reorder_basic_blocks_simple ();
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
 
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  2015-09-23 22:09 [PATCH 0/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
  2015-09-23 22:09 ` [PATCH 1/4] bb-reorder: Split out STC Segher Boessenkool
  2015-09-23 22:10 ` [PATCH 2/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
@ 2015-09-23 23:01 ` Segher Boessenkool
  2015-09-24 11:00   ` Bernd Schmidt
                     ` (2 more replies)
  2015-09-23 23:36 ` [PATCH 4/4] bb-reorder: Documentation updates Segher Boessenkool
  2015-09-24 10:33 ` [PATCH 0/4] bb-reorder: Add the "simple" algorithm Bernd Schmidt
  4 siblings, 3 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-23 23:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
Targets that never want STC can override this.  This changes -freorder-blocks
to be on at -O1 and up (was -O2 and up).

In effect, the changes are for -O1 (which now gets "simple" instead of
nothing), -Os (which now gets "simple" instead of "stc", since STC results
in much bigger code), and for targets that wish to never use STC (not in
this patch though).


2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>

	* bb-reorder.c (reorder_basic_blocks): Use the algorithm selected
	with flag_reorder_blocks_algorithm.
	* common.opt (freorder-blocks-algorithm=): New flag.
	(reorder_blocks_algorithm): New enum.
	* flag-types.h (reorder_blocks_algorithm): New enum.
	* opts.c (default_options_table): Use -freorder-blocks at -O1 and up,
	and -freorder-blocks-algorithm=stc at -O2 and up (not at -Os).

---
 gcc/bb-reorder.c | 17 +++++++++++++----
 gcc/common.opt   | 13 +++++++++++++
 gcc/flag-types.h |  7 +++++++
 gcc/opts.c       |  4 +++-
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 40e9e50..e09f344 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2429,10 +2429,19 @@ reorder_basic_blocks (void)
   set_edge_can_fallthru_flag ();
   mark_dfs_back_edges ();
 
-  if (1)
-    reorder_basic_blocks_software_trace_cache ();
-  else
-    reorder_basic_blocks_simple ();
+  switch (flag_reorder_blocks_algorithm)
+    {
+    case REORDER_BLOCKS_ALGORITHM_SIMPLE:
+      reorder_basic_blocks_simple ();
+      break;
+
+    case REORDER_BLOCKS_ALGORITHM_STC:
+      reorder_basic_blocks_software_trace_cache ();
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
 
diff --git a/gcc/common.opt b/gcc/common.opt
index 94d1d88..b0f70fb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1910,6 +1910,19 @@ freorder-blocks
 Common Report Var(flag_reorder_blocks) Optimization
 Reorder basic blocks to improve code placement
 
+freorder-blocks-algorithm=
+Common Joined RejectNegative Enum(reorder_blocks_algorithm) Var(flag_reorder_blocks_algorithm) Init(REORDER_BLOCKS_ALGORITHM_SIMPLE) Optimization
+-freorder-blocks-algorithm=[simple|stc] Set the used basic block reordering algorithm
+
+Enum
+Name(reorder_blocks_algorithm) Type(enum reorder_blocks_algorithm) UnknownError(unknown basic block reordering algorithm %qs)
+
+EnumValue
+Enum(reorder_blocks_algorithm) String(simple) Value(REORDER_BLOCKS_ALGORITHM_SIMPLE)
+
+EnumValue
+Enum(reorder_blocks_algorithm) String(stc) Value(REORDER_BLOCKS_ALGORITHM_STC)
+
 freorder-blocks-and-partition
 Common Report Var(flag_reorder_blocks_and_partition) Optimization
 Reorder basic blocks and partition into hot and cold sections
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index ac9ca0b..6301cea 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -109,6 +109,13 @@ enum stack_reuse_level
   SR_ALL
 };
 
+/* The algorithm used for basic block reordering.  */
+enum reorder_blocks_algorithm
+{
+  REORDER_BLOCKS_ALGORITHM_SIMPLE,
+  REORDER_BLOCKS_ALGORITHM_STC
+};
+
 /* The algorithm used for the integrated register allocator (IRA).  */
 enum ira_algorithm
 {
diff --git a/gcc/opts.c b/gcc/opts.c
index f1a9acd..786fd3a 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -441,6 +441,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_1_PLUS, OPT_fipa_reference, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fipa_profile, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fmerge_constants, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_freorder_blocks, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fshrink_wrap, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
@@ -483,7 +484,8 @@ static const struct default_options default_options_table[] =
 #endif
     { OPT_LEVELS_2_PLUS, OPT_fstrict_aliasing, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 },
-    { OPT_LEVELS_2_PLUS, OPT_freorder_blocks, NULL, 1 },
+    { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_freorder_blocks_algorithm_, NULL,
+      REORDER_BLOCKS_ALGORITHM_STC },
     { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_ftree_builtin_call_dce, NULL, 1 },
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 4/4] bb-reorder: Documentation updates
  2015-09-23 22:09 [PATCH 0/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
                   ` (2 preceding siblings ...)
  2015-09-23 23:01 ` [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up Segher Boessenkool
@ 2015-09-23 23:36 ` Segher Boessenkool
  2015-09-24 10:33 ` [PATCH 0/4] bb-reorder: Add the "simple" algorithm Bernd Schmidt
  4 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-23 23:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This updates the documentation for the new option and new defaults.


2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>

	* doc/invoke.texi (Optimization Options): Add
	-freorder-blocks-algorithm=.
	(Optimize Options) <-O>: Add -freorder-blocks.
	<-O2>: Remove -freorder-blocks.  Add -freorder-blocks-algorithm=stc.
	<-Os>: Add -freorder-blocks-algorithm=stc as not enabled.
	<-freorder-blocks>: Also enabled at levels -O and -Os.
	<-freorder-blocks-algorithm=>: Document new option.

---
 gcc/doc/invoke.texi | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 09c58ee..ca18501 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -430,6 +430,7 @@ Objective-C and Objective-C++ Dialects}.
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fprofile-reorder-functions @gol
 -freciprocal-math -free -frename-registers -freorder-blocks @gol
+-freorder-blocks-algorithm=@var{algorithm} @gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
 -frounding-math -fsched2-use-superblocks -fsched-pressure @gol
@@ -7683,6 +7684,7 @@ compilation time.
 -fipa-reference @gol
 -fmerge-constants @gol
 -fmove-loop-invariants @gol
+-freorder-blocks @gol
 -fshrink-wrap @gol
 -fsplit-wide-types @gol
 -ftree-bit-ccp @gol
@@ -7739,7 +7741,8 @@ also turns on the following optimization flags:
 -foptimize-strlen @gol
 -fpartial-inlining @gol
 -fpeephole2 @gol
--freorder-blocks -freorder-blocks-and-partition -freorder-functions @gol
+-freorder-blocks-algorithm=stc @gol
+-freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop  @gol
 -fsched-interblock  -fsched-spec @gol
 -fschedule-insns  -fschedule-insns2 @gol
@@ -7776,8 +7779,8 @@ optimizations designed to reduce code size.
 
 @option{-Os} disables the following optimization flags:
 @gccoptlist{-falign-functions  -falign-jumps  -falign-loops @gol
--falign-labels  -freorder-blocks  -freorder-blocks-and-partition @gol
--fprefetch-loop-arrays}
+-falign-labels  -freorder-blocks  -freorder-blocks-algorithm=stc @gol
+-freorder-blocks-and-partition  -fprefetch-loop-arrays}
 
 @item -Ofast
 @opindex Ofast
@@ -9127,7 +9130,19 @@ The default is @option{-fguess-branch-probability} at levels
 Reorder basic blocks in the compiled function in order to reduce number of
 taken branches and improve code locality.
 
-Enabled at levels @option{-O2}, @option{-O3}.
+Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
+
+@item -freorder-blocks-algorithm=@var{algorithm}
+@opindex freorder-blocks-algorithm
+Use the specified algorithm for basic block reordering.  The
+@var{algorithm} argument can be @samp{simple}, which does not increase
+code size (except sometimes due to secondary effects like alignment),
+or @samp{stc}, the ``software trace cache'' algorithm, which tries to
+put all often executed code together, minimizing the number of branches
+executed by making extra copies of code.
+
+The default is @samp{simple} at levels @option{-O}, @option{-Os}, and
+@samp{stc} at levels @option{-O2}, @option{-O3}.
 
 @item -freorder-blocks-and-partition
 @opindex freorder-blocks-and-partition
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/4] bb-reorder: Add the "simple" algorithm
  2015-09-23 22:09 [PATCH 0/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
                   ` (3 preceding siblings ...)
  2015-09-23 23:36 ` [PATCH 4/4] bb-reorder: Documentation updates Segher Boessenkool
@ 2015-09-24 10:33 ` Bernd Schmidt
  2015-09-24 13:32   ` Segher Boessenkool
  4 siblings, 1 reply; 24+ messages in thread
From: Bernd Schmidt @ 2015-09-24 10:33 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches

On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> The current basic block reordering always uses the "software trace cache"
> algorithm.  That has a few problems:
>
> 1) It increases code size substantially; this makes it not suitable for
> -O1 or -Os, and not at all for some architectures;
> 2) but it is enabled for -Os and all targets;
> 3) and -O1 gets nothing, resulting in pretty jumpy code.

A general question first, I see code in bb-reorder.c (in copy_bb_p) that 
limits the amount of code growth if not optimizing for speed. Is that 
not working as expected or not sufficient?

Your code looks like a nice clean algorithm so I have no objections to 
it (detailed comments to follow), but I want to make sure it is 
necessary to add it.


Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  2015-09-23 23:01 ` [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up Segher Boessenkool
@ 2015-09-24 11:00   ` Bernd Schmidt
  2015-09-24 14:06     ` Segher Boessenkool
  2015-09-24 15:21   ` Andi Kleen
       [not found]   ` <56377569.7010904@arm.com>
  2 siblings, 1 reply; 24+ messages in thread
From: Bernd Schmidt @ 2015-09-24 11:00 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches

On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
> Targets that never want STC can override this.  This changes -freorder-blocks
> to be on at -O1 and up (was -O2 and up).
>
> In effect, the changes are for -O1 (which now gets "simple" instead of
> nothing), -Os (which now gets "simple" instead of "stc", since STC results
> in much bigger code), and for targets that wish to never use STC (not in
> this patch though).

This should be merged with its documentation in 4/4, and personally I'd 
have no problem reviewing a patch with 2/3/4 all in one. Splitting 
patches is most helpful if there are parts that rearrange things such as 
your 1/4, or if there are multiple independent functional changes. I'm 
not saying you did anything wrong by splitting, just that maybe you made 
unnecessary work for yourself.

No objections to 3/4 and 4/4 otherwise.


Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm
  2015-09-23 22:10 ` [PATCH 2/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
@ 2015-09-24 11:13   ` Bernd Schmidt
  2015-09-24 13:48     ` Segher Boessenkool
  2015-09-24 16:35   ` Steven Bosscher
  2015-09-25 14:31   ` [PATCH 2/4 v2] " Segher Boessenkool
  2 siblings, 1 reply; 24+ messages in thread
From: Bernd Schmidt @ 2015-09-24 11:13 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches

On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> This is the meat of this series: a new algorithm to do basic block
> reordering.  It uses the simple greedy approach to maximum weighted
> matching, where the weights are the predicted execution frequency of
> the edges.  This always finds a solution that is within a factor two
> of optimal, if you disregard loops (which we cannot allow) and the
> complications of block partitioning.

Looks really good for the most part.

The comment at the top of the file should be updated to mention both 
algorithms.

> +  /* Sort the edges, the most desirable first.  */
> +
> +  std::stable_sort (edges, edges + n, edge_order);

Any thoughts on this vs qsort? Do you need a stable sort?

> +  int j;
> +  for (j = 0; j < n; j++)

for (int j ...
here and in the other loop that uses j.

> +  /* If the entry edge no longer falls through we have to make a new
> +     block so it can do so again.  */
> +
> +  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> +  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
> +    {
> +      force_nonfallthru (e);
> +      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
> +      BB_COPY_PARTITION (e->src, e->dest);
> +    }
> +}

That's a bit odd, can this situation be prevented earlier? Why wouldn't 
we force the entry edge to fall thru?


Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/4] bb-reorder: Add the "simple" algorithm
  2015-09-24 10:33 ` [PATCH 0/4] bb-reorder: Add the "simple" algorithm Bernd Schmidt
@ 2015-09-24 13:32   ` Segher Boessenkool
  0 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 13:32 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Thu, Sep 24, 2015 at 11:56:22AM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >The current basic block reordering always uses the "software trace cache"
> >algorithm.  That has a few problems:
> >
> >1) It increases code size substantially; this makes it not suitable for
> >-O1 or -Os, and not at all for some architectures;
> >2) but it is enabled for -Os and all targets;
> >3) and -O1 gets nothing, resulting in pretty jumpy code.
> 
> A general question first, I see code in bb-reorder.c (in copy_bb_p) that 
> limits the amount of code growth if not optimizing for speed. Is that 
> not working as expected or not sufficient?

It works.  The "simple" algorithm generates slightly smaller code though
(less than a percent).  Defaulting -Os to STC is easy of course; do you
prefer that?

> Your code looks like a nice clean algorithm so I have no objections to 
> it (detailed comments to follow), but I want to make sure it is 
> necessary to add it.

It's not just for -Os, but also for -O1 (where we currently don't reorder
at all, although various passes leave the config in a pretty sorry state --
like, we run shrink-wrapping at -O1, and it can make quite a mess if some
blocks are copied and others not; but this is just an example, it was the
trigger for me though).

And, when I wrote the original for this, it was for a target where STC
does not help at all (there is no instruction cache); "simple" saves a
lot of space at -O2.  Quite important for embedded targets.

Finally, it lets us easily plug in other algorithms.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm
  2015-09-24 11:13   ` Bernd Schmidt
@ 2015-09-24 13:48     ` Segher Boessenkool
  2015-09-24 15:19       ` Segher Boessenkool
  0 siblings, 1 reply; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 13:48 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Thu, Sep 24, 2015 at 12:32:59PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This is the meat of this series: a new algorithm to do basic block
> >reordering.  It uses the simple greedy approach to maximum weighted
> >matching, where the weights are the predicted execution frequency of
> >the edges.  This always finds a solution that is within a factor two
> >of optimal, if you disregard loops (which we cannot allow) and the
> >complications of block partitioning.
> 
> Looks really good for the most part.
> 
> The comment at the top of the file should be updated to mention both 
> algorithms.

Will do.

> >+  /* Sort the edges, the most desirable first.  */
> >+
> >+  std::stable_sort (edges, edges + n, edge_order);
> 
> Any thoughts on this vs qsort? Do you need a stable sort?

We always need stable sorts in GCC; things are not reproducible across
targets with qsort (not every qsort is the same).

Also, you sometimes have two edges back-to-back, with the same weights;
a stable sort ensures we don't put a jump in the middle of that if we
can help it.

> >+  int j;
> >+  for (j = 0; j < n; j++)
> 
> for (int j ...
> here and in the other loop that uses j.

That is so ugly.  Will change though :-)

> >+  /* If the entry edge no longer falls through we have to make a new
> >+     block so it can do so again.  */
> >+
> >+  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> >+  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
> >+    {
> >+      force_nonfallthru (e);
> >+      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
> >+      BB_COPY_PARTITION (e->src, e->dest);
> >+    }
> >+}
> 
> That's a bit odd, can this situation be prevented earlier?

We could always add an extra jump at the start, but that's about the
same code so not helpful.

> Why wouldn't we force the entry edge to fall thru?

Because it pessimises code.  If the model thinks the first block after
entry belongs somewhere in the middle of a fall-through sequence, it
usually is right (typical examples are loops that start with the loop
test).  This algorithm does not peel loops (it does no duplication at
all).

All the optimisable blocks end with an unconditional jump, and this algo
tries to remove as many of those as it can; logically, the start block
has such a jump as well.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  2015-09-24 11:00   ` Bernd Schmidt
@ 2015-09-24 14:06     ` Segher Boessenkool
  0 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 14:06 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Thu, Sep 24, 2015 at 12:28:08PM +0200, Bernd Schmidt wrote:
> On 09/24/2015 12:06 AM, Segher Boessenkool wrote:
> >This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> >as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
> >Targets that never want STC can override this.  This changes 
> >-freorder-blocks
> >to be on at -O1 and up (was -O2 and up).
> >
> >In effect, the changes are for -O1 (which now gets "simple" instead of
> >nothing), -Os (which now gets "simple" instead of "stc", since STC results
> >in much bigger code), and for targets that wish to never use STC (not in
> >this patch though).
> 
> This should be merged with its documentation in 4/4, and personally I'd 
> have no problem reviewing a patch with 2/3/4 all in one. Splitting 
> patches is most helpful if there are parts that rearrange things such as 
> your 1/4, or if there are multiple independent functional changes. I'm 
> not saying you did anything wrong by splitting, just that maybe you made 
> unnecessary work for yourself.

I had the patches like that in my git tree, so I figured I'd send it like
that, makes review slightly easier (not a big deal for small patches like
this of course).  I did not waste time splitting things up, don't worry :-)


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm
  2015-09-24 13:48     ` Segher Boessenkool
@ 2015-09-24 15:19       ` Segher Boessenkool
  0 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 15:19 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Thu, Sep 24, 2015 at 08:39:30AM -0500, Segher Boessenkool wrote:
> > Any thoughts on this vs qsort? Do you need a stable sort?
> 
> We always need stable sorts in GCC; things are not reproducible across
> targets with qsort (not every qsort is the same).

s/targets/hosts/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  2015-09-23 23:01 ` [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up Segher Boessenkool
  2015-09-24 11:00   ` Bernd Schmidt
@ 2015-09-24 15:21   ` Andi Kleen
  2015-09-24 15:49     ` Segher Boessenkool
       [not found]   ` <56377569.7010904@arm.com>
  2 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2015-09-24 15:21 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

Segher Boessenkool <segher@kernel.crashing.org> writes:
>
> In effect, the changes are for -O1 (which now gets "simple" instead of
> nothing), -Os (which now gets "simple" instead of "stc", since STC results
> in much bigger code), and for targets that wish to never use STC (not in
> this patch though).

Do you have some data on the code size differences with -Os?

-Andi

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  2015-09-24 15:21   ` Andi Kleen
@ 2015-09-24 15:49     ` Segher Boessenkool
  2015-09-24 17:16       ` Segher Boessenkool
  0 siblings, 1 reply; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 15:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

On Thu, Sep 24, 2015 at 08:12:55AM -0700, Andi Kleen wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >
> > In effect, the changes are for -O1 (which now gets "simple" instead of
> > nothing), -Os (which now gets "simple" instead of "stc", since STC results
> > in much bigger code), and for targets that wish to never use STC (not in
> > this patch though).
> 
> Do you have some data on the code size differences with -Os?

It's about 0.1% for a quick combine.ii sniff test; I don't have big test
results for -Os.

"Much bigger code" is a mischaracterisation, that is true for -O2, not -Os.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] bb-reorder: Split out STC
  2015-09-23 22:09 ` [PATCH 1/4] bb-reorder: Split out STC Segher Boessenkool
@ 2015-09-24 16:12   ` Steven Bosscher
  0 siblings, 0 replies; 24+ messages in thread
From: Steven Bosscher @ 2015-09-24 16:12 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> This first patch simply factors code a little bit.
>
>
> 2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>
>
>         * bb-reorder.c (reorder_basic_blocks_software_trace_cache): New
>         function, factored out from ...
>         (reorder_basic_blocks): ... here.

OK.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm
  2015-09-23 22:10 ` [PATCH 2/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
  2015-09-24 11:13   ` Bernd Schmidt
@ 2015-09-24 16:35   ` Steven Bosscher
  2015-09-24 17:22     ` Segher Boessenkool
  2015-09-25 14:31   ` [PATCH 2/4 v2] " Segher Boessenkool
  2 siblings, 1 reply; 24+ messages in thread
From: Steven Bosscher @ 2015-09-24 16:35 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> +  /* First, collect all edges that can be optimized by reordering blocks:
> +     simple jumps and conditional jumps, as well as the function entry edge.  */
> +
> +  int n = 0;
> +  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      rtx_insn *end = BB_END (bb);
> +
> +      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
> +       continue;

Should handle ASM jumps.


> +  FOR_ALL_BB_FN (bb, cfun)
> +    bb->aux = bb;

Bit tricky for the ENTRY and EXIT blocks, that are not really basic
blocks. After the pass, EXIT should not end up pointing to itself.
Maybe use FOR_EACH_BB_FN and set ENTRY separately?


Other than that, looks good to me.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
  2015-09-24 15:49     ` Segher Boessenkool
@ 2015-09-24 17:16       ` Segher Boessenkool
  0 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 17:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

On Thu, Sep 24, 2015 at 08:12:55AM -0700, Andi Kleen wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> >
> > In effect, the changes are for -O1 (which now gets "simple" instead of
> > nothing), -Os (which now gets "simple" instead of "stc", since STC results
> > in much bigger code), and for targets that wish to never use STC (not in
> > this patch though).
> 
> Do you have some data on the code size differences with -Os?

It's about 0.1% for a quick combine.ii sniff test; I don't have big test
results for -Os.

"Much bigger code" is a mischaracterisation, that is true for -O2, not -Os.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] bb-reorder: Add the "simple" algorithm
  2015-09-24 16:35   ` Steven Bosscher
@ 2015-09-24 17:22     ` Segher Boessenkool
  0 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-24 17:22 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches

On Thu, Sep 24, 2015 at 06:03:33PM +0200, Steven Bosscher wrote:
> On Thu, Sep 24, 2015 at 12:06 AM, Segher Boessenkool wrote:
> > +  /* First, collect all edges that can be optimized by reordering blocks:
> > +     simple jumps and conditional jumps, as well as the function entry edge.  */
> > +
> > +  int n = 0;
> > +  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
> > +
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    {
> > +      rtx_insn *end = BB_END (bb);
> > +
> > +      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
> > +       continue;
> 
> Should handle ASM jumps.

Right, those are considered as optimisable now, although they are not.
Will fix.

> > +  FOR_ALL_BB_FN (bb, cfun)
> > +    bb->aux = bb;
> 
> Bit tricky for the ENTRY and EXIT blocks, that are not really basic
> blocks. After the pass, EXIT should not end up pointing to itself.

But it doesn't, the next line already takes care of it.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm
  2015-09-23 22:10 ` [PATCH 2/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
  2015-09-24 11:13   ` Bernd Schmidt
  2015-09-24 16:35   ` Steven Bosscher
@ 2015-09-25 14:31   ` Segher Boessenkool
  2015-09-25 16:09     ` Peter Bergner
  2015-09-29 13:19     ` Bernd Schmidt
  2 siblings, 2 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-25 14:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: bschmidt, stevenb.gcc

v2 changes:
- Add a file header comment;
- Use "for" loop initial declarations;
- Handle asm goto.

Testing this on x86_64-linux; okay if it succeeds?


Segher


2015-09-99   Segher Boessenkool  <segher@kernel.crashing.org>

	* bb-reorder.c: Add intro comment.
	(reorder_basic_blocks_software_trace_cache): Print a header to
	the dump file.
	(edge_order): New function.
	(reorder_basic_blocks_simple): New function.
	(reorder_basic_blocks): Choose between the STC and the simple
	algorithms (always choose the former).

---
 gcc/bb-reorder.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 174 insertions(+), 1 deletion(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 725cdc3..4d07b2e 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -17,6 +17,18 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */
 
+/* This file contains the "reorder blocks" pass, which changes the control
+   flow of a function to encounter fewer branches; the "partition blocks"
+   pass, which divides the basic blocks into "hot" and "cold" partitions,
+   which are kept separate; and the "duplicate computed gotos" pass, which
+   duplicates blocks ending in an indirect jump.
+
+   There are two algorithms for "reorder blocks": the "simple" algorithm,
+   which just rearranges blocks, trying to minimize the number of executed
+   unconditional branches; and the "software trace cache" algorithm, which
+   also copies code, and in general tries a lot harder to have long linear
+   pieces of machine code executed.  This algorithm is described next.  */
+
 /* This (greedy) algorithm constructs traces in several rounds.
    The construction starts from "seeds".  The seed for the first round
    is the entry point of the function.  When there are more than one seed,
@@ -2231,6 +2243,9 @@ update_crossing_jump_flags (void)
 static void
 reorder_basic_blocks_software_trace_cache (void)
 {
+  if (dump_file)
+    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
+
   int n_traces;
   int i;
   struct trace *traces;
@@ -2261,6 +2276,161 @@ reorder_basic_blocks_software_trace_cache (void)
   FREE (bbd);
 }
 
+/* Return true if edge E1 is more desirable as a fallthrough edge than
+   edge E2 is.  */
+
+static bool
+edge_order (edge e1, edge e2)
+{
+  return EDGE_FREQUENCY (e1) > EDGE_FREQUENCY (e2);
+}
+
+/* Reorder basic blocks using the "simple" algorithm.  This tries to
+   maximize the dynamic number of branches that are fallthrough, without
+   copying instructions.  The algorithm is greedy, looking at the most
+   frequently executed branch first.  */
+
+static void
+reorder_basic_blocks_simple (void)
+{
+  if (dump_file)
+    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
+
+  edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)];
+
+  /* First, collect all edges that can be optimized by reordering blocks:
+     simple jumps and conditional jumps, as well as the function entry edge.  */
+
+  int n = 0;
+  edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      rtx_insn *end = BB_END (bb);
+
+      if (computed_jump_p (end) || tablejump_p (end, NULL, NULL))
+	continue;
+
+      /* We cannot optimize asm goto.  */
+      if (JUMP_P (end) && extract_asm_operands (end))
+	continue;
+
+      if (any_condjump_p (end))
+	{
+	  edges[n++] = EDGE_SUCC (bb, 0);
+	  edges[n++] = EDGE_SUCC (bb, 1);
+	}
+      else if (single_succ_p (bb))
+	edges[n++] = EDGE_SUCC (bb, 0);
+    }
+
+  /* Sort the edges, the most desirable first.  */
+
+  std::stable_sort (edges, edges + n, edge_order);
+
+  /* Now decide which of those edges to make fallthrough edges.  We set
+     BB_VISITED if a block already has a fallthrough successor assigned
+     to it.  We make ->AUX of an endpoint point to the opposite endpoint
+     of a sequence of blocks that fall through, and ->AUX will be NULL
+     for a block that is in such a sequence but not an endpoint anymore.
+
+     To start with, everything points to itself, nothing is assigned yet.  */
+
+  FOR_ALL_BB_FN (bb, cfun)
+    bb->aux = bb;
+
+  EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
+
+  /* Now for all edges, the most desirable first, see if that edge can
+     connect two sequences.  If it can, update AUX and BB_VISITED; if it
+     cannot, zero out the edge in the table.  */
+
+  for (int j = 0; j < n; j++)
+    {
+      edge e = edges[j];
+
+      basic_block tail_a = e->src;
+      basic_block head_b = e->dest;
+      basic_block head_a = (basic_block) tail_a->aux;
+      basic_block tail_b = (basic_block) head_b->aux;
+
+      /* An edge cannot connect two sequences if:
+	 - it crosses partitions;
+	 - its src is not a current endpoint;
+	 - its dest is not a current endpoint;
+	 - or, it would create a loop.  */
+
+      if (e->flags & EDGE_CROSSING
+	  || tail_a->flags & BB_VISITED
+	  || !tail_b
+	  || (!(head_b->flags & BB_VISITED) && head_b != tail_b)
+	  || tail_a == tail_b)
+	{
+	  edges[j] = 0;
+	  continue;
+	}
+
+      tail_a->aux = 0;
+      head_b->aux = 0;
+      head_a->aux = tail_b;
+      tail_b->aux = head_a;
+      tail_a->flags |= BB_VISITED;
+    }
+
+  /* Put the pieces together, in the same order that the start blocks of
+     the sequences already had.  The hot/cold partitioning gives a little
+     complication: as a first pass only do this for blocks in the same
+     partition as the start block, and (if there is anything left to do)
+     in a second pass handle the other partition.  */
+
+  basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+
+  int current_partition = BB_PARTITION (last_tail);
+  bool need_another_pass = true;
+
+  for (int pass = 0; pass < 2 && need_another_pass; pass++)
+    {
+      need_another_pass = false;
+
+      FOR_EACH_BB_FN (bb, cfun)
+	if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb)
+	  {
+	    if (BB_PARTITION (bb) != current_partition)
+	      {
+		need_another_pass = true;
+		continue;
+	      }
+
+	    last_tail->aux = bb;
+	    last_tail = (basic_block) bb->aux;
+	  }
+
+      current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
+    }
+
+  last_tail->aux = 0;
+
+  /* Finally, link all the chosen fallthrough edges.  */
+
+  for (int j = 0; j < n; j++)
+    if (edges[j])
+      edges[j]->src->aux = edges[j]->dest;
+
+  delete[] edges;
+
+  /* If the entry edge no longer falls through we have to make a new
+     block so it can do so again.  */
+
+  edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0);
+  if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
+    {
+      force_nonfallthru (e);
+      e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
+      BB_COPY_PARTITION (e->src, e->dest);
+    }
+}
+
 /* Reorder basic blocks.  The main entry point to this file.  */
 
 static void
@@ -2274,7 +2444,10 @@ reorder_basic_blocks (void)
   set_edge_can_fallthru_flag ();
   mark_dfs_back_edges ();
 
-  reorder_basic_blocks_software_trace_cache ();
+  if (1)
+    reorder_basic_blocks_software_trace_cache ();
+  else
+    reorder_basic_blocks_simple ();
 
   relink_block_chain (/*stay_in_cfglayout_mode=*/true);
 
-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm
  2015-09-25 14:31   ` [PATCH 2/4 v2] " Segher Boessenkool
@ 2015-09-25 16:09     ` Peter Bergner
  2015-09-25 16:15       ` Segher Boessenkool
  2015-09-29 13:19     ` Bernd Schmidt
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Bergner @ 2015-09-25 16:09 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, bschmidt, stevenb.gcc

On Fri, 2015-09-25 at 09:16 -0500, Segher Boessenkool wrote:
>	(reorder_basic_blocks): Choose between the STC and the simple
> 	algorithms (always choose the former).
[snip]
@@ -2274,7 +2444,10 @@ reorder_basic_blocks (void)
>    set_edge_can_fallthru_flag ();
>    mark_dfs_back_edges ();
> 
> -  reorder_basic_blocks_software_trace_cache ();
> +  if (1)
> +    reorder_basic_blocks_software_trace_cache ();
> +  else
> +    reorder_basic_blocks_simple ();

Did you write the code this way because you're thinking of allowing
either reorder function to be called in the future?

Peter



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm
  2015-09-25 16:09     ` Peter Bergner
@ 2015-09-25 16:15       ` Segher Boessenkool
  0 siblings, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-09-25 16:15 UTC (permalink / raw)
  To: Peter Bergner; +Cc: gcc-patches, bschmidt, stevenb.gcc

On Fri, Sep 25, 2015 at 10:59:37AM -0500, Peter Bergner wrote:
> On Fri, 2015-09-25 at 09:16 -0500, Segher Boessenkool wrote:
> >	(reorder_basic_blocks): Choose between the STC and the simple
> > 	algorithms (always choose the former).
> [snip]
> @@ -2274,7 +2444,10 @@ reorder_basic_blocks (void)
> >    set_edge_can_fallthru_flag ();
> >    mark_dfs_back_edges ();
> > 
> > -  reorder_basic_blocks_software_trace_cache ();
> > +  if (1)
> > +    reorder_basic_blocks_software_trace_cache ();
> > +  else
> > +    reorder_basic_blocks_simple ();
> 
> Did you write the code this way because you're thinking of allowing
> either reorder function to be called in the future?

I _wrote_ it as "if (0)" actually, to test the new algo ;-)

The next patch (3/4) wires up a new compiler flag to choose between the
two algorithms.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm
  2015-09-25 14:31   ` [PATCH 2/4 v2] " Segher Boessenkool
  2015-09-25 16:09     ` Peter Bergner
@ 2015-09-29 13:19     ` Bernd Schmidt
  1 sibling, 0 replies; 24+ messages in thread
From: Bernd Schmidt @ 2015-09-29 13:19 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches; +Cc: stevenb.gcc

On 09/25/2015 04:16 PM, Segher Boessenkool wrote:
> v2 changes:
> - Add a file header comment;
> - Use "for" loop initial declarations;
> - Handle asm goto.
>
> Testing this on x86_64-linux; okay if it succeeds?

No objections from me. Let's give Steven another day or so to comment.


Bernd

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
       [not found]   ` <56377569.7010904@arm.com>
@ 2015-11-02 15:14     ` Alan Lawrence
  2015-11-02 21:04     ` Segher Boessenkool
  1 sibling, 0 replies; 24+ messages in thread
From: Alan Lawrence @ 2015-11-02 15:14 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 02/11/15 14:38, Alan Lawrence wrote:
 >
> I'm a bit puzzled as to why nobody else has been seeing this, as it's been
> happening to me as part of building gcc on x86_64, but since this patch I've
> been seeing an ICE in vec::operator[] in reorder_basic_blocks_simple, building
> libitm/beginend.cc. Preprocessed source attached

OK, so I realize now just how big that preprocessed source was (1.4MB) - 
apologies! I've filed PR/68182 with a .gz for now....


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up
       [not found]   ` <56377569.7010904@arm.com>
  2015-11-02 15:14     ` Alan Lawrence
@ 2015-11-02 21:04     ` Segher Boessenkool
  1 sibling, 0 replies; 24+ messages in thread
From: Segher Boessenkool @ 2015-11-02 21:04 UTC (permalink / raw)
  To: Alan Lawrence; +Cc: gcc-patches

On Mon, Nov 02, 2015 at 02:38:33PM +0000, Alan Lawrence wrote:
> On 23/09/15 23:06, Segher Boessenkool wrote:
> >This adds an -freorder-blocks-algorithm=[simple|stc] flag, with "simple"
> >as default.  For -O2 and up (except -Os) it is switched to "stc" instead.
> >Targets that never want STC can override this.  This changes 
> >-freorder-blocks
> >to be on at -O1 and up (was -O2 and up).
> >
> >In effect, the changes are for -O1 (which now gets "simple" instead of
> >nothing), -Os (which now gets "simple" instead of "stc", since STC results
> >in much bigger code), and for targets that wish to never use STC (not in
> >this patch though).
> >
> >
> >2015-09-23   Segher Boessenkool  <segher@kernel.crashing.org>
> >
> >	* bb-reorder.c (reorder_basic_blocks): Use the algorithm selected
> >	with flag_reorder_blocks_algorithm.
> >	* common.opt (freorder-blocks-algorithm=): New flag.
> >	(reorder_blocks_algorithm): New enum.
> >	* flag-types.h (reorder_blocks_algorithm): New enum.
> >	* opts.c (default_options_table): Use -freorder-blocks at -O1 and up,
> >	and -freorder-blocks-algorithm=stc at -O2 and up (not at -Os).
> 
> I'm a bit puzzled as to why nobody else has been seeing this, as it's been 
> happening to me as part of building gcc on x86_64, but since this patch 
> I've been seeing an ICE in vec::operator[] in reorder_basic_blocks_simple, 
> building libitm/beginend.cc. Preprocessed source attached, command line 
> reduced to:
> 
> $ /work/alalaw01/build/./gcc/xg++ -B/work/alalaw01/build/./gcc/ -mrtm -O1 
> -g -m32 -c temp.ii
> /work/alalaw01/src/gcc/libitm/beginend.cc: In static member function 
> ‘static uint32_t GTM::gtm_thread::begin_transaction(uint32_t, const 
> gtm_jmpbuf*)’:
> /work/alalaw01/src/gcc/libitm/beginend.cc:400:1: internal compiler error: 
> in operator[], at vec.h:714
>  }
>  ^
> 0x1310783 vec<edge_def*, va_gc, vl_embed>::operator[](unsigned int)
> 	/work/alalaw01/src/gcc/gcc/vec.h:714
> 0x1310783 reorder_basic_blocks_simple
> 	/work/alalaw01/src/gcc/gcc/bb-reorder.c:2322

That seems to be happening in the FOR_EACH_BB_FN then?  Or one of the
EDGE_SUCCs?  I cannot match up that line # with anything.  Both cases
suggest things are broken before already.

> 0x1310783 reorder_basic_blocks
> 	/work/alalaw01/src/gcc/gcc/bb-reorder.c:2450
> 0x1310783 execute
> 	/work/alalaw01/src/gcc/gcc/bb-reorder.c:2551
> 
> Compilation succeeds at -O2 (instead of -O1).

reorder_basic_blocks_simple isn't used at -O2 (it uses STC instead).

I'll have a look at this wednesday.


Segher

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-11-02 21:04 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-23 22:09 [PATCH 0/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
2015-09-23 22:09 ` [PATCH 1/4] bb-reorder: Split out STC Segher Boessenkool
2015-09-24 16:12   ` Steven Bosscher
2015-09-23 22:10 ` [PATCH 2/4] bb-reorder: Add the "simple" algorithm Segher Boessenkool
2015-09-24 11:13   ` Bernd Schmidt
2015-09-24 13:48     ` Segher Boessenkool
2015-09-24 15:19       ` Segher Boessenkool
2015-09-24 16:35   ` Steven Bosscher
2015-09-24 17:22     ` Segher Boessenkool
2015-09-25 14:31   ` [PATCH 2/4 v2] " Segher Boessenkool
2015-09-25 16:09     ` Peter Bergner
2015-09-25 16:15       ` Segher Boessenkool
2015-09-29 13:19     ` Bernd Schmidt
2015-09-23 23:01 ` [PATCH 3/4] bb-reorder: Add -freorder-blocks-algorithm= and wire it up Segher Boessenkool
2015-09-24 11:00   ` Bernd Schmidt
2015-09-24 14:06     ` Segher Boessenkool
2015-09-24 15:21   ` Andi Kleen
2015-09-24 15:49     ` Segher Boessenkool
2015-09-24 17:16       ` Segher Boessenkool
     [not found]   ` <56377569.7010904@arm.com>
2015-11-02 15:14     ` Alan Lawrence
2015-11-02 21:04     ` Segher Boessenkool
2015-09-23 23:36 ` [PATCH 4/4] bb-reorder: Documentation updates Segher Boessenkool
2015-09-24 10:33 ` [PATCH 0/4] bb-reorder: Add the "simple" algorithm Bernd Schmidt
2015-09-24 13:32   ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).