public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] Simplify the simple_return handling
@ 2016-05-03  7:02 Segher Boessenkool
  2016-05-03  7:03 ` [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns Segher Boessenkool
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-03  7:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This series teaches cfgcleanup how to optimize jumps and branches to and
around return statements, after which the shrink-wrap code doesn't have
to deal with it anymore.  The simplified code also catches a few more
cases.
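
Schematically, the new transformations are (pseudo-code only, condensed
from the description in patch 2/3; the exact insns are of course
target-dependent):

===
/* jump .L1; ... .L1: return        =>  return
   condjump .L1; ... .L1: return    =>  conditional return
   condjump .L2; return; .L2: ...   =>  inverted conditional return,
                                        then jump .L2 (which can be
                                        optimized further, as usual)  */
===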

Tested on powerpc64-linux (-m32 and -m64, all languages), and also on
x86_64-linux (actually, still in progress with the most recent version
of the patchset).  Is this okay for trunk?


Segher


  cfgcleanup: Bugfix
  cfgcleanup: Fold jumps and conditional branches with returns
  shrink-wrap: Remove complicated simple_return manipulations

 gcc/cfgcleanup.c  | 124 +++++++++++++++++++++++++++++++
 gcc/function.c    | 213 +-----------------------------------------------------
 gcc/shrink-wrap.c | 171 ++++---------------------------------------
 gcc/shrink-wrap.h |   6 --
 4 files changed, 140 insertions(+), 374 deletions(-)

-- 
1.9.3


* [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
  2016-05-03  7:02 [PATCH 0/3] Simplify the simple_return handling Segher Boessenkool
  2016-05-03  7:03 ` [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns Segher Boessenkool
@ 2016-05-03  7:03 ` Segher Boessenkool
  2016-05-09 13:54   ` Christophe Lyon
  2016-05-03  7:03 ` [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump Segher Boessenkool
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-03  7:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

Now that cfgcleanup knows how to optimize jumps to and around return
statements, the epilogue insertion code doesn't have to deal with them
itself anymore.
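
Condensed from the shrink-wrap.c hunk below, what now happens at each
exit edge E that runs without prologue is essentially (a summary of the
patch itself, not additional functionality):

===
rtx_insn *ret = targetm.gen_simple_return ();
rtx_jump_insn *start = emit_jump_insn_after (ret, BB_END (e->src));
JUMP_LABEL (start) = simple_return_rtx;
emit_barrier_after_bb (e->src);
===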


2016-05-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* function.c (emit_use_return_register_into_block): Delete.
	(gen_return_pattern): Delete.
	(emit_return_into_block): Delete.
	(active_insn_between): Delete.
	(convert_jumps_to_returns): Delete.
	(emit_return_for_exit): Delete.
	(thread_prologue_and_epilogue_insns): Delete all code dealing with
	simple_return for shrink-wrapped blocks.
	* shrink-wrap.c (try_shrink_wrapping): Insert simple_return at the
	end of blocks that need one.
	(get_unconverted_simple_return): Delete.
	(convert_to_simple_return): Delete.
	* shrink-wrap.h (get_unconverted_simple_return): Delete declaration.
	(convert_to_simple_return): Ditto.

---
 gcc/function.c    | 213 +-----------------------------------------------------
 gcc/shrink-wrap.c | 171 ++++---------------------------------------
 gcc/shrink-wrap.h |   6 --
 3 files changed, 16 insertions(+), 374 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index f6eb56c..b9a6338 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5753,49 +5753,6 @@ prologue_epilogue_contains (const_rtx insn)
   return 0;
 }
 
-/* Insert use of return register before the end of BB.  */
-
-static void
-emit_use_return_register_into_block (basic_block bb)
-{
-  start_sequence ();
-  use_return_register ();
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-  rtx_insn *insn = BB_END (bb);
-  if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, PATTERN (insn)))
-    insn = prev_cc0_setter (insn);
-
-  emit_insn_before (seq, insn);
-}
-
-
-/* Create a return pattern, either simple_return or return, depending on
-   simple_p.  */
-
-static rtx_insn *
-gen_return_pattern (bool simple_p)
-{
-  return (simple_p
-	  ? targetm.gen_simple_return ()
-	  : targetm.gen_return ());
-}
-
-/* Insert an appropriate return pattern at the end of block BB.  This
-   also means updating block_for_insn appropriately.  SIMPLE_P is
-   the same as in gen_return_pattern and passed to it.  */
-
-void
-emit_return_into_block (bool simple_p, basic_block bb)
-{
-  rtx_jump_insn *jump = emit_jump_insn_after (gen_return_pattern (simple_p),
-					      BB_END (bb));
-  rtx pat = PATTERN (jump);
-  if (GET_CODE (pat) == PARALLEL)
-    pat = XVECEXP (pat, 0, 0);
-  gcc_assert (ANY_RETURN_P (pat));
-  JUMP_LABEL (jump) = pat;
-}
 
 /* Set JUMP_LABEL for a return insn.  */
 
@@ -5811,135 +5768,6 @@ set_return_jump_label (rtx_insn *returnjump)
     JUMP_LABEL (returnjump) = ret_rtx;
 }
 
-/* Return true if there are any active insns between HEAD and TAIL.  */
-bool
-active_insn_between (rtx_insn *head, rtx_insn *tail)
-{
-  while (tail)
-    {
-      if (active_insn_p (tail))
-	return true;
-      if (tail == head)
-	return false;
-      tail = PREV_INSN (tail);
-    }
-  return false;
-}
-
-/* LAST_BB is a block that exits, and empty of active instructions.
-   Examine its predecessors for jumps that can be converted to
-   (conditional) returns.  */
-vec<edge>
-convert_jumps_to_returns (basic_block last_bb, bool simple_p,
-			  vec<edge> unconverted ATTRIBUTE_UNUSED)
-{
-  int i;
-  basic_block bb;
-  edge_iterator ei;
-  edge e;
-  auto_vec<basic_block> src_bbs (EDGE_COUNT (last_bb->preds));
-
-  FOR_EACH_EDGE (e, ei, last_bb->preds)
-    if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
-      src_bbs.quick_push (e->src);
-
-  rtx_insn *label = BB_HEAD (last_bb);
-
-  FOR_EACH_VEC_ELT (src_bbs, i, bb)
-    {
-      rtx_insn *jump = BB_END (bb);
-
-      if (!JUMP_P (jump) || JUMP_LABEL (jump) != label)
-	continue;
-
-      e = find_edge (bb, last_bb);
-
-      /* If we have an unconditional jump, we can replace that
-	 with a simple return instruction.  */
-      if (simplejump_p (jump))
-	{
-	  /* The use of the return register might be present in the exit
-	     fallthru block.  Either:
-	     - removing the use is safe, and we should remove the use in
-	     the exit fallthru block, or
-	     - removing the use is not safe, and we should add it here.
-	     For now, we conservatively choose the latter.  Either of the
-	     2 helps in crossjumping.  */
-	  emit_use_return_register_into_block (bb);
-
-	  emit_return_into_block (simple_p, bb);
-	  delete_insn (jump);
-	}
-
-      /* If we have a conditional jump branching to the last
-	 block, we can try to replace that with a conditional
-	 return instruction.  */
-      else if (condjump_p (jump))
-	{
-	  rtx dest;
-
-	  if (simple_p)
-	    dest = simple_return_rtx;
-	  else
-	    dest = ret_rtx;
-	  if (!redirect_jump (as_a <rtx_jump_insn *> (jump), dest, 0))
-	    {
-	      if (targetm.have_simple_return () && simple_p)
-		{
-		  if (dump_file)
-		    fprintf (dump_file,
-			     "Failed to redirect bb %d branch.\n", bb->index);
-		  unconverted.safe_push (e);
-		}
-	      continue;
-	    }
-
-	  /* See comment in simplejump_p case above.  */
-	  emit_use_return_register_into_block (bb);
-
-	  /* If this block has only one successor, it both jumps
-	     and falls through to the fallthru block, so we can't
-	     delete the edge.  */
-	  if (single_succ_p (bb))
-	    continue;
-	}
-      else
-	{
-	  if (targetm.have_simple_return () && simple_p)
-	    {
-	      if (dump_file)
-		fprintf (dump_file,
-			 "Failed to redirect bb %d branch.\n", bb->index);
-	      unconverted.safe_push (e);
-	    }
-	  continue;
-	}
-
-      /* Fix up the CFG for the successful change we just made.  */
-      redirect_edge_succ (e, EXIT_BLOCK_PTR_FOR_FN (cfun));
-      e->flags &= ~EDGE_CROSSING;
-    }
-  src_bbs.release ();
-  return unconverted;
-}
-
-/* Emit a return insn for the exit fallthru block.  */
-basic_block
-emit_return_for_exit (edge exit_fallthru_edge, bool simple_p)
-{
-  basic_block last_bb = exit_fallthru_edge->src;
-
-  if (JUMP_P (BB_END (last_bb)))
-    {
-      last_bb = split_edge (exit_fallthru_edge);
-      exit_fallthru_edge = single_succ_edge (last_bb);
-    }
-  emit_barrier_after (BB_END (last_bb));
-  emit_return_into_block (simple_p, last_bb);
-  exit_fallthru_edge->flags &= ~EDGE_FALLTHRU;
-  return last_bb;
-}
-
 
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
@@ -5993,7 +5821,6 @@ void
 thread_prologue_and_epilogue_insns (void)
 {
   bool inserted;
-  vec<edge> unconverted_simple_returns = vNULL;
   bitmap_head bb_flags;
   rtx_insn *returnjump;
   rtx_insn *epilogue_end ATTRIBUTE_UNUSED;
@@ -6088,40 +5915,8 @@ thread_prologue_and_epilogue_insns (void)
 
   exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds);
 
-  if (targetm.have_simple_return () && entry_edge != orig_entry_edge)
-    exit_fallthru_edge
-	= get_unconverted_simple_return (exit_fallthru_edge, bb_flags,
-					 &unconverted_simple_returns,
-					 &returnjump);
-  if (targetm.have_return ())
-    {
-      if (exit_fallthru_edge == NULL)
-	goto epilogue_done;
-
-      if (optimize)
-	{
-	  basic_block last_bb = exit_fallthru_edge->src;
-
-	  if (LABEL_P (BB_HEAD (last_bb))
-	      && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
-	    convert_jumps_to_returns (last_bb, false, vNULL);
-
-	  if (EDGE_COUNT (last_bb->preds) != 0
-	      && single_succ_p (last_bb))
-	    {
-	      last_bb = emit_return_for_exit (exit_fallthru_edge, false);
-	      epilogue_end = returnjump = BB_END (last_bb);
-
-	      /* Emitting the return may add a basic block.
-		 Fix bb_flags for the added block.  */
-	      if (targetm.have_simple_return ()
-		  && last_bb != exit_fallthru_edge->src)
-		bitmap_set_bit (&bb_flags, last_bb->index);
-
-	      goto epilogue_done;
-	    }
-	}
-    }
+  if (targetm.have_return () && exit_fallthru_edge == NULL)
+    goto epilogue_done;
 
   /* A small fib -- epilogue is not yet completed, but we wish to re-use
      this marker for the splits of EH_RETURN patterns, and nothing else
@@ -6229,10 +6024,6 @@ epilogue_done:
 	}
     }
 
-  if (targetm.have_simple_return ())
-    convert_to_simple_return (entry_edge, orig_entry_edge, bb_flags,
-			      returnjump, unconverted_simple_returns);
-
   /* Emit sibling epilogues before any sibling call sites.  */
   for (ei = ei_start (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds); (e =
 							     ei_safe_edge (ei));
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index fe79519..0ba1fed 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -946,10 +946,9 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
 	redirect_edge_and_branch_force (e, (basic_block) e->dest->aux);
       }
 
-  /* Change all the exits that should get a simple_return to FAKE.
-     They will be converted later.  */
+  /* Make a simple_return for those exits that run without prologue.  */
 
-  FOR_EACH_BB_FN (bb, cfun)
+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
     if (!bitmap_bit_p (bb_with, bb->index))
       FOR_EACH_EDGE (e, ei, bb->succs)
 	if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
@@ -958,7 +957,18 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
 
 	    e->flags &= ~EDGE_FALLTHRU;
 	    if (!(e->flags & EDGE_SIBCALL))
-	      e->flags |= EDGE_FAKE;
+	      {
+		rtx_insn *ret = targetm.gen_simple_return ();
+		rtx_insn *end = BB_END (e->src);
+		rtx_jump_insn *start = emit_jump_insn_after (ret, end);
+		JUMP_LABEL (start) = simple_return_rtx;
+		e->flags &= ~EDGE_FAKE;
+
+		if (dump_file)
+		  fprintf (dump_file,
+			   "Made simple_return with UID %d in bb %d\n",
+			   INSN_UID (start), e->src->index);
+	      }
 
 	    emit_barrier_after_bb (e->src);
 	  }
@@ -996,156 +1006,3 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
 
   free_dominance_info (CDI_DOMINATORS);
 }
-
-/* If we're allowed to generate a simple return instruction, then by
-   definition we don't need a full epilogue.  If the last basic
-   block before the exit block does not contain active instructions,
-   examine its predecessors and try to emit (conditional) return
-   instructions.  */
-
-edge
-get_unconverted_simple_return (edge exit_fallthru_edge, bitmap_head bb_flags,
-			       vec<edge> *unconverted_simple_returns,
-			       rtx_insn **returnjump)
-{
-  if (optimize)
-    {
-      unsigned i, last;
-
-      /* convert_jumps_to_returns may add to preds of the exit block
-         (but won't remove).  Stop at end of current preds.  */
-      last = EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds);
-      for (i = 0; i < last; i++)
-	{
-	  edge e = EDGE_I (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds, i);
-	  if (LABEL_P (BB_HEAD (e->src))
-	      && !bitmap_bit_p (&bb_flags, e->src->index)
-	      && !active_insn_between (BB_HEAD (e->src), BB_END (e->src)))
-	    *unconverted_simple_returns
-		  = convert_jumps_to_returns (e->src, true,
-					      *unconverted_simple_returns);
-	}
-    }
-
-  if (exit_fallthru_edge != NULL
-      && EDGE_COUNT (exit_fallthru_edge->src->preds) != 0
-      && !bitmap_bit_p (&bb_flags, exit_fallthru_edge->src->index))
-    {
-      basic_block last_bb;
-
-      last_bb = emit_return_for_exit (exit_fallthru_edge, true);
-      *returnjump = BB_END (last_bb);
-      exit_fallthru_edge = NULL;
-    }
-  return exit_fallthru_edge;
-}
-
-/* If there were branches to an empty LAST_BB which we tried to
-   convert to conditional simple_returns, but couldn't for some
-   reason, create a block to hold a simple_return insn and redirect
-   those remaining edges.  */
-
-void
-convert_to_simple_return (edge entry_edge, edge orig_entry_edge,
-			  bitmap_head bb_flags, rtx_insn *returnjump,
-			  vec<edge> unconverted_simple_returns)
-{
-  edge e;
-  edge_iterator ei;
-
-  if (!unconverted_simple_returns.is_empty ())
-    {
-      basic_block simple_return_block_hot = NULL;
-      basic_block simple_return_block_cold = NULL;
-      edge pending_edge_hot = NULL;
-      edge pending_edge_cold = NULL;
-      basic_block exit_pred;
-      int i;
-
-      gcc_assert (entry_edge != orig_entry_edge);
-
-      /* See if we can reuse the last insn that was emitted for the
-	 epilogue.  */
-      if (returnjump != NULL_RTX
-	  && JUMP_LABEL (returnjump) == simple_return_rtx)
-	{
-	  e = split_block (BLOCK_FOR_INSN (returnjump), PREV_INSN (returnjump));
-	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
-	    simple_return_block_hot = e->dest;
-	  else
-	    simple_return_block_cold = e->dest;
-	}
-
-      /* Also check returns we might need to add to tail blocks.  */
-      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
-	if (EDGE_COUNT (e->src->preds) != 0
-	    && (e->flags & EDGE_FAKE) != 0
-	    && !bitmap_bit_p (&bb_flags, e->src->index))
-	  {
-	    if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
-	      pending_edge_hot = e;
-	    else
-	      pending_edge_cold = e;
-	  }
-
-      /* Save a pointer to the exit's predecessor BB for use in
-         inserting new BBs at the end of the function.  Do this
-         after the call to split_block above which may split
-         the original exit pred.  */
-      exit_pred = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
-
-      FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
-	{
-	  basic_block *pdest_bb;
-	  edge pending;
-
-	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
-	    {
-	      pdest_bb = &simple_return_block_hot;
-	      pending = pending_edge_hot;
-	    }
-	  else
-	    {
-	      pdest_bb = &simple_return_block_cold;
-	      pending = pending_edge_cold;
-	    }
-
-	  if (*pdest_bb == NULL && pending != NULL)
-	    {
-	      emit_return_into_block (true, pending->src);
-	      pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
-	      *pdest_bb = pending->src;
-	    }
-	  else if (*pdest_bb == NULL)
-	    {
-	      basic_block bb;
-
-	      bb = create_basic_block (NULL, NULL, exit_pred);
-	      BB_COPY_PARTITION (bb, e->src);
-	      rtx_insn *ret = targetm.gen_simple_return ();
-	      rtx_jump_insn *start = emit_jump_insn_after (ret, BB_END (bb));
-	      JUMP_LABEL (start) = simple_return_rtx;
-	      emit_barrier_after (start);
-
-	      *pdest_bb = bb;
-	      make_edge (bb, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
-	    }
-	  redirect_edge_and_branch_force (e, *pdest_bb);
-	}
-      unconverted_simple_returns.release ();
-    }
-
-  if (entry_edge != orig_entry_edge)
-    {
-      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
-	if (EDGE_COUNT (e->src->preds) != 0
-	    && (e->flags & EDGE_FAKE) != 0
-	    && !bitmap_bit_p (&bb_flags, e->src->index))
-	  {
-	    e = fix_fake_fallthrough_edge (e);
-
-	    emit_return_into_block (true, e->src);
-	    e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
-	  }
-    }
-}
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index 6e7a4f7..4d821d7 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -26,12 +26,6 @@ along with GCC; see the file COPYING3.  If not see
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_flags,
 				 rtx_insn *prologue_seq);
-extern edge get_unconverted_simple_return (edge, bitmap_head,
-					   vec<edge> *, rtx_insn **);
-extern void convert_to_simple_return (edge entry_edge, edge orig_entry_edge,
-				      bitmap_head bb_flags,
-				      rtx_insn *returnjump,
-				      vec<edge> unconverted_simple_returns);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
-- 
1.9.3


* [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump
  2016-05-03  7:02 [PATCH 0/3] Simplify the simple_return handling Segher Boessenkool
  2016-05-03  7:03 ` [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns Segher Boessenkool
  2016-05-03  7:03 ` [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations Segher Boessenkool
@ 2016-05-03  7:03 ` Segher Boessenkool
  2016-05-03 20:29   ` Steven Bosscher
  2016-05-03 12:53 ` [PATCH 0/3] Simplify the simple_return handling Bernd Schmidt
  2016-05-03 15:03 ` Kyrill Tkachov
  4 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-03  7:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

If the jump_block here contains just a return, we will crash later
in invert_jump.  Don't allow that case.
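
Schematically, the failing shape is (jump_block is the fallthrough
block of the conditional branch; since it contains just a return, its
jump "destination" is the exit block, which invert_jump cannot handle):

===
/* cbranch_block:  conditional branch to cbranch_dest_block,
                   falls through to jump_block
   jump_block:     contains only a (simple_)return, so its single
                   successor is the exit block                     */
===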


2016-05-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* cfgcleanup.c (try_simplify_condjump): Don't try to simplify a
	branch to a return.

---
 gcc/cfgcleanup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 6e92d4c..19583a7 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -156,6 +156,7 @@ try_simplify_condjump (basic_block cbranch_block)
   cbranch_dest_block = cbranch_jump_edge->dest;
 
   if (cbranch_dest_block == EXIT_BLOCK_PTR_FOR_FN (cfun)
+      || jump_dest_block == EXIT_BLOCK_PTR_FOR_FN (cfun)
       || !can_fallthru (jump_block, cbranch_dest_block))
     return false;
 
-- 
1.9.3


* [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns
  2016-05-03  7:02 [PATCH 0/3] Simplify the simple_return handling Segher Boessenkool
@ 2016-05-03  7:03 ` Segher Boessenkool
  2016-05-10 19:34   ` Christophe Lyon
  2016-05-03  7:03 ` [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations Segher Boessenkool
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-03  7:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool

This patch makes cfgcleanup optimize jumps to returns.  There are three
cases this handles:

-- A jump to a return; this is simplified to just that return.
-- A conditional branch to a return; simplified to a conditional return.
-- A conditional branch that falls through to a return.  This is simplified
   to a conditional return (with the condition inverted), falling through
   to a jump to the original destination.  That jump can then be optimized
   further, as usual.

This handles all the cases the current function.c code does, and a few
it misses.
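
As an illustration (a hypothetical testcase; whether each case triggers
depends on the target and options), a function of this shape can
exercise the conditional cases: the branch around the early return can
become an inverted conditional return, and the remaining jump to the
final return can be folded into a return:

===
extern int g (int);

int
f (int x)
{
  if (x < 0)         /* conditional branch around a return ...  */
    return -1;       /* ... to this early return                */
  return g (x) + 1;  /* the function's final return             */
}
===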


2016-05-03  Segher Boessenkool  <segher@kernel.crashing.org>

	* cfgcleanup.c (bb_is_just_return): New function.
	(try_optimize_cfg): Simplify jumps to return, branches to return,
	and branches around return.

---
 gcc/cfgcleanup.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 19583a7..4f87811 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -2606,6 +2606,35 @@ trivially_empty_bb_p (basic_block bb)
     }
 }
 
+/* Return true if BB contains just a return and possibly a USE of the
+   return value.  Fill in *RET and *USE with the return and use insns
+   if any found, otherwise NULL.  */
+
+static bool
+bb_is_just_return (basic_block bb, rtx_insn **ret, rtx_insn **use)
+{
+  *ret = *use = NULL;
+  rtx_insn *insn;
+
+  if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
+    return false;
+
+  FOR_BB_INSNS (bb, insn)
+    if (NONDEBUG_INSN_P (insn))
+      {
+	if (!*ret && ANY_RETURN_P (PATTERN (insn)))
+	  *ret = insn;
+	else if (!*ret && !*use && GET_CODE (PATTERN (insn)) == USE
+	    && REG_P (XEXP (PATTERN (insn), 0))
+	    && REG_FUNCTION_VALUE_P (XEXP (PATTERN (insn), 0)))
+	  *use = insn;
+	else
+	  return false;
+      }
+
+  return !!*ret;
+}
+
 /* Do simple CFG optimizations - basic block merging, simplifying of jump
    instructions etc.  Return nonzero if changes were made.  */
 
@@ -2792,6 +2821,100 @@ try_optimize_cfg (int mode)
 		      }
 		}
 
+	      /* Try to change a branch to a return to just that return.  */
+	      rtx_insn *ret, *use;
+	      if (single_succ_p (b)
+		  && onlyjump_p (BB_END (b))
+		  && bb_is_just_return (single_succ (b), &ret, &use))
+		{
+		  if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
+				     PATTERN (ret), 0))
+		    {
+		      if (use)
+			emit_insn_before (copy_insn (PATTERN (use)),
+					  BB_END (b));
+		      if (dump_file)
+			fprintf (dump_file, "Changed jump %d->%d to return.\n",
+					    b->index, single_succ (b)->index);
+		      redirect_edge_succ (single_succ_edge (b),
+					  EXIT_BLOCK_PTR_FOR_FN (cfun));
+		      single_succ_edge (b)->flags &= ~EDGE_CROSSING;
+		      changed_here = true;
+		    }
+		}
+
+	      /* Try to change a conditional branch to a return to the
+		 respective conditional return.  */
+	      if (EDGE_COUNT (b->succs) == 2
+		  && any_condjump_p (BB_END (b))
+		  && bb_is_just_return (BRANCH_EDGE (b)->dest, &ret, &use))
+		{
+		  if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
+				     PATTERN (ret), 0))
+		    {
+		      if (use)
+			emit_insn_before (copy_insn (PATTERN (use)),
+					  BB_END (b));
+		      if (dump_file)
+			fprintf (dump_file, "Changed conditional jump %d->%d "
+					    "to conditional return.\n",
+					    b->index,
+					    BRANCH_EDGE (b)->dest->index);
+		      redirect_edge_succ (BRANCH_EDGE (b),
+					  EXIT_BLOCK_PTR_FOR_FN (cfun));
+		      BRANCH_EDGE (b)->flags &= ~EDGE_CROSSING;
+		      changed_here = true;
+		    }
+		}
+
+	      /* Try to flip a conditional branch that falls through to
+		 a return so that it becomes a conditional return and a
+		 new jump to the original branch target.  */
+	      if (EDGE_COUNT (b->succs) == 2
+		  && any_condjump_p (BB_END (b))
+		  && bb_is_just_return (FALLTHRU_EDGE (b)->dest, &ret, &use))
+		{
+		  if (invert_jump (as_a <rtx_jump_insn *> (BB_END (b)),
+				   JUMP_LABEL (BB_END (b)), 0))
+		    {
+		      basic_block new_ft = BRANCH_EDGE (b)->dest;
+		      if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
+					 PATTERN (ret), 0))
+			{
+			  if (use)
+			    emit_insn_before (copy_insn (PATTERN (use)),
+					      BB_END (b));
+			  if (dump_file)
+			    fprintf (dump_file, "Changed conditional jump "
+						"%d->%d to conditional return, "
+						"adding fall-through jump.\n",
+						b->index,
+						BRANCH_EDGE (b)->dest->index);
+			  redirect_edge_succ (BRANCH_EDGE (b),
+					      EXIT_BLOCK_PTR_FOR_FN (cfun));
+			  BRANCH_EDGE (b)->flags &= ~EDGE_CROSSING;
+			  std::swap (BRANCH_EDGE (b)->probability,
+				     FALLTHRU_EDGE (b)->probability);
+			  update_br_prob_note (b);
+			  basic_block jb = force_nonfallthru (FALLTHRU_EDGE (b));
+			  notice_new_block (jb);
+			  if (!redirect_jump (as_a <rtx_jump_insn *> (BB_END (jb)),
+					      block_label (new_ft), 0))
+			    gcc_unreachable ();
+			  redirect_edge_succ (single_succ_edge (jb), new_ft);
+			  changed_here = true;
+			}
+		      else
+			{
+			  /* Invert the jump back to what it was.  This should
+			     never fail.  */
+			  if (!invert_jump (as_a <rtx_jump_insn *> (BB_END (b)),
+					    JUMP_LABEL (BB_END (b)), 0))
+			    gcc_unreachable ();
+			}
+		    }
+		}
+
 	      /* Simplify branch over branch.  */
 	      if ((mode & CLEANUP_EXPENSIVE)
 		   && !(mode & CLEANUP_CFGLAYOUT)
-- 
1.9.3


* Re: [PATCH 0/3] Simplify the simple_return handling
  2016-05-03  7:02 [PATCH 0/3] Simplify the simple_return handling Segher Boessenkool
                   ` (2 preceding siblings ...)
  2016-05-03  7:03 ` [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump Segher Boessenkool
@ 2016-05-03 12:53 ` Bernd Schmidt
  2016-05-03 13:31   ` Segher Boessenkool
  2016-05-04  0:10   ` Segher Boessenkool
  2016-05-03 15:03 ` Kyrill Tkachov
  4 siblings, 2 replies; 25+ messages in thread
From: Bernd Schmidt @ 2016-05-03 12:53 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches



On 05/03/2016 08:59 AM, Segher Boessenkool wrote:
> This series teaches cfgcleanup how to optimize jumps and branches to and
> around return statements, after which the shrink-wrap code doesn't have
> to deal with it anymore.  The simplified code also catches a few more
> cases.
>
> Tested on powerpc64-linux (-m32 and -m64, all languages), and also on
> x86_64-linux (actually, still in progress with the most recent version
> of the patchset).  Is this okay for trunk?

Looking at the outputs I see a number of jump to return replaced with 
plain return, which seems like an improvement. There are random changes 
in .p2align output:

@@ -207,7 +207,7 @@
         jbe     .L20
         movl    $-1, %eax
         cmpl    $4, %r8d
-       ja      .L31
+       ja      .L28
         pushq   %r15
         .cfi_def_cfa_offset 16
         .cfi_offset 15, -16
@@ -288,8 +288,6 @@
         .cfi_offset 14, -24
         .cfi_offset 15, -16
         movl    $-1, %eax
-       .p2align 4,,10
-       .p2align 3
  .L18:
         addq    $8, %rsp
         .cfi_def_cfa_offset 56
@@ -311,7 +309,10 @@
         popq    %r15
         .cfi_restore 15
         .cfi_def_cfa_offset 8
-.L31:
+       ret
+       .p2align 4,,10
+       .p2align 3
+.L28:
         rep ret
         .cfi_endproc
  .LFE1257:

Do you have an explanation as to why this happens? (Testcase: basically 
any large file.)

Does ppc have conditional returns? If not, are you set up to test arm in 
any way? Ideally you'd want to run that as well.

I think all three patches look ok, except all your fprintf calls are 
misindented.


Bernd


* Re: [PATCH 0/3] Simplify the simple_return handling
  2016-05-03 12:53 ` [PATCH 0/3] Simplify the simple_return handling Bernd Schmidt
@ 2016-05-03 13:31   ` Segher Boessenkool
  2016-05-03 13:46     ` Bernd Schmidt
  2016-05-04  0:10   ` Segher Boessenkool
  1 sibling, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-03 13:31 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Tue, May 03, 2016 at 02:53:42PM +0200, Bernd Schmidt wrote:
> >This series teaches cfgcleanup how to optimize jumps and branches to and
> >around return statements, after which the shrink-wrap code doesn't have
> >to deal with it anymore.  The simplified code also catches a few more
> >cases.
> >
> >Tested on powerpc64-linux (-m32 and -m64, all languages), and also on
> >x86_64-linux (actually, still in progress with the most recent version
> >of the patchset).  Is this okay for trunk?
> 
> Looking at the outputs I see a number of jump to return replaced with 
> plain return, which seems like an improvement.

Yeah; not very many though.

> There are random changes 
> in .p2align output:
> 
> @@ -207,7 +207,7 @@
>         jbe     .L20
>         movl    $-1, %eax
>         cmpl    $4, %r8d
> -       ja      .L31
> +       ja      .L28
>         pushq   %r15
>         .cfi_def_cfa_offset 16
>         .cfi_offset 15, -16
> @@ -288,8 +288,6 @@
>         .cfi_offset 14, -24
>         .cfi_offset 15, -16
>         movl    $-1, %eax
> -       .p2align 4,,10
> -       .p2align 3
>  .L18:
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 56
> @@ -311,7 +309,10 @@
>         popq    %r15
>         .cfi_restore 15
>         .cfi_def_cfa_offset 8
> -.L31:
> +       ret
> +       .p2align 4,,10
> +       .p2align 3
> +.L28:
>         rep ret
>         .cfi_endproc
>  .LFE1257:
> 
> Do you have an explanation as to why this happens? (Testcase: basically 
> any large file.)

I have no explanation for this.  I'll investigate.

> Does ppc have conditional returns?

Yes it does.  "beqlr", etc.  They are generated fine.

> If not, are you set up to test arm in 
> any way? Ideally you'd want to run that as well.

Good plan.  There is arm64 in the cfarm; I'll see if I can build 32-bit
as well.

> I think all three patches look ok, except all your fprintf calls are 
> misindented.

In patch 2, bah I thought I fixed that.  I want to factor that huge function
anyway, reduce the indent a lot, okay to fix it then?  :-)


Segher


* Re: [PATCH 0/3] Simplify the simple_return handling
  2016-05-03 13:31   ` Segher Boessenkool
@ 2016-05-03 13:46     ` Bernd Schmidt
  0 siblings, 0 replies; 25+ messages in thread
From: Bernd Schmidt @ 2016-05-03 13:46 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 05/03/2016 03:31 PM, Segher Boessenkool wrote:
>> If not, are you set up to test arm in
>> any way? Ideally you'd want to run that as well.
>
> Good plan.  There is arm64 in the cfarm; I'll see if I can build 32-bit
> as well.

Simulator testing should work, but if ppc exercises this code there's 
probably no need to do it.

>> I think all three patches look ok, except all your fprintf calls are
>> misindented.
>
> In patch 2, bah I thought I fixed that.  I want to factor that huge function
> anyway, reduce the indent a lot, okay to fix it then?  :-)

Let's keep anything that's committed clean.


Bernd


* Re: [PATCH 0/3] Simplify the simple_return handling
  2016-05-03  7:02 [PATCH 0/3] Simplify the simple_return handling Segher Boessenkool
                   ` (3 preceding siblings ...)
  2016-05-03 12:53 ` [PATCH 0/3] Simplify the simple_return handling Bernd Schmidt
@ 2016-05-03 15:03 ` Kyrill Tkachov
  4 siblings, 0 replies; 25+ messages in thread
From: Kyrill Tkachov @ 2016-05-03 15:03 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches

Hi Segher,

On 03/05/16 07:59, Segher Boessenkool wrote:
> This series teaches cfgcleanup how to optimize jumps and branches to and
> around return statements, after which the shrink-wrap code doesn't have
> to deal with it anymore.  The simplified code also catches a few more
> cases.
>
> Tested on powerpc64-linux (-m32 and -m64, all languages), and also on
> x86_64-linux (actually, still in progress with the most recent version
> of the patchset).  Is this okay for trunk?
>

I tried make check on gcc on aarch64 and I didn't see any fallout.
I also had a quick look at SPECINT and there seems to be a consistent
small reduction in code size, mostly from redundant returns
being eliminated, for example:

     bgt    .L70
     ret
.L62:
     ret
.L76:

into:

     bgt    .L70
.L62:
     ret

So I think this should be a win overall.

Thanks,
Kyrill

> Segher
>
>
>    cfgcleanup: Bugfix
>    cfgcleanup: Fold jumps and conditional branches with returns
>    shrink-wrap: Remove complicated simple_return manipulations
>
>   gcc/cfgcleanup.c  | 124 +++++++++++++++++++++++++++++++
>   gcc/function.c    | 213 +-----------------------------------------------------
>   gcc/shrink-wrap.c | 171 ++++---------------------------------------
>   gcc/shrink-wrap.h |   6 --
>   4 files changed, 140 insertions(+), 374 deletions(-)
>


* Re: [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump
  2016-05-03  7:03 ` [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump Segher Boessenkool
@ 2016-05-03 20:29   ` Steven Bosscher
  2016-05-03 21:09     ` Segher Boessenkool
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Bosscher @ 2016-05-03 20:29 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Patches

On Tue, May 3, 2016 at 8:59 AM, Segher Boessenkool wrote:
> If the jump_block here contains just a return, we will crash later
> in invert_jump.  Don't allow that case.

But if jump_block contains a return, then it isn't the EXIT_BLOCK for
the function.
Is the conditional jump a conditional return insn?

Ciao!
Steven


* Re: [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump
  2016-05-03 20:29   ` Steven Bosscher
@ 2016-05-03 21:09     ` Segher Boessenkool
  0 siblings, 0 replies; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-03 21:09 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches

On Tue, May 03, 2016 at 10:28:37PM +0200, Steven Bosscher wrote:
> On Tue, May 3, 2016 at 8:59 AM, Segher Boessenkool wrote:
> > If the jump_block here contains just a return, we will crash later
> > in invert_jump.  Don't allow that case.
> 
> But if jump_block contains a return, then it isn't the EXIT_BLOCK for
> the function.

The single successor block of the jump block is the exit block.

Too many blocks here ;-)


Segher


* Re: [PATCH 0/3] Simplify the simple_return handling
  2016-05-03 12:53 ` [PATCH 0/3] Simplify the simple_return handling Bernd Schmidt
  2016-05-03 13:31   ` Segher Boessenkool
@ 2016-05-04  0:10   ` Segher Boessenkool
  2016-05-04 12:37     ` Bernd Schmidt
  1 sibling, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-04  0:10 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Tue, May 03, 2016 at 02:53:42PM +0200, Bernd Schmidt wrote:
> Looking at the outputs I see a number of jump to return replaced with 
> plain return, which seems like an improvement. There are random changes 
> in .p2align output:

<snip>

> Do you have an explanation as to why this happens? (Testcase: basically 
> any large file.)

A simple example is crtstuff.c:__do_global_dtors_aux, that is

===
void deregister_tm_clones (void);

static void __attribute__((used))
__do_global_dtors_aux (void)
{
  static _Bool completed;

  if (__builtin_expect (completed, 0))
    return;

  deregister_tm_clones ();

  completed = 1;
}
===

The difference happens in final.c:compute_alignments.  After the change,
you have a simple fork from bb2 fallthrough to bb3 (90%) and branch to
bb4 (10%), both bb3 and bb4 have a return.  compute_alignments decides
to align bb4.

Before the change, bb2 falls through to bb3 (90%) and branches to bb4
(10%); bb3 falls through to bb4; and bb4 has the wrong frequency (9000,
should be 10000).  bb4 is not aligned (because it is fallen into).
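
In CFG terms, a sketch of the two situations just described:

===
/* After:   bb2 --90%--> bb3 (return)
               \--10%--> bb4 (return)        bb4 gets aligned

   Before:  bb2 --90%--> bb3 --fallthru--> bb4 (return)
               \--10%----------------------^
            bb4 is not aligned, because it is fallen into.  */
===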

> I think all three patches look ok, except all your fprintf calls are 
> misindented.

Is this sufficient explanation?  Is it okay with the fprintfs fixed?

Thanks,


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/3] Simplify the simple_return handling
  2016-05-04  0:10   ` Segher Boessenkool
@ 2016-05-04 12:37     ` Bernd Schmidt
  0 siblings, 0 replies; 25+ messages in thread
From: Bernd Schmidt @ 2016-05-04 12:37 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 05/04/2016 02:10 AM, Segher Boessenkool wrote:
>
> Is this sufficient explanation, is it okay with the fprintf's fixed?

Yeah, I suppose. From looking at some of the examples I have here I 
think there's still room for doubt whether all the alignment choices 
make perfect sense, but it's probably not something your patchkit is 
doing wrong.


Bernd


* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
  2016-05-03  7:03 ` [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations Segher Boessenkool
@ 2016-05-09 13:54   ` Christophe Lyon
  2016-05-09 15:08     ` Segher Boessenkool
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Lyon @ 2016-05-09 13:54 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 3 May 2016 at 08:59, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> Now that cfgcleanup knows how to optimize jumps to and around return
> statements, the epilogue insertion code doesn't have to deal with them
> itself anymore.
>
>
> 2016-05-03  Segher Boessenkool  <segher@kernel.crashing.org>
>
>         * function.c (emit_use_return_register_into_block): Delete.
>         (gen_return_pattern): Delete.
>         (emit_return_into_block): Delete.
>         (active_insn_between): Delete.
>         (convert_jumps_to_returns): Delete.
>         (emit_return_for_exit): Delete.
>         (thread_prologue_and_epilogue_insns): Delete all code dealing with
>         simple_return for shrink-wrapped blocks.
>         * shrink-wrap.c (try_shrink_wrapping): Insert simple_return at the
>         end of blocks that need one.
>         (get_unconverted_simple_return): Delete.
>         (convert_to_simple_return): Delete.
>         * shrink-wrap.h (get_unconverted_simple_return): Delete declaration.
>         (convert_to_simple_return): Ditto.
>

Hi,

After this patch, I've noticed that
gcc.target/arm/pr43920-2.c
now fails at:
/* { dg-final { scan-assembler-times "pop" 2 } } */

Before the patch, the generated code was:
[...]
        pop     {r3, r4, r5, r6, r7, pc}
.L4:
        mov     r0, #-1
.L1:
        pop     {r3, r4, r5, r6, r7, pc}

it is now:
[...]
.L1:
        pop     {r3, r4, r5, r6, r7, pc}
.L4:
        mov     r0, #-1
        b       .L1

The new version does not seem better, as it adds a branch on the path
and it is not smaller.

Christophe.


> ---
>  gcc/function.c    | 213 +-----------------------------------------------------
>  gcc/shrink-wrap.c | 171 ++++---------------------------------------
>  gcc/shrink-wrap.h |   6 --
>  3 files changed, 16 insertions(+), 374 deletions(-)
>
> diff --git a/gcc/function.c b/gcc/function.c
> index f6eb56c..b9a6338 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -5753,49 +5753,6 @@ prologue_epilogue_contains (const_rtx insn)
>    return 0;
>  }
>
> -/* Insert use of return register before the end of BB.  */
> -
> -static void
> -emit_use_return_register_into_block (basic_block bb)
> -{
> -  start_sequence ();
> -  use_return_register ();
> -  rtx_insn *seq = get_insns ();
> -  end_sequence ();
> -  rtx_insn *insn = BB_END (bb);
> -  if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, PATTERN (insn)))
> -    insn = prev_cc0_setter (insn);
> -
> -  emit_insn_before (seq, insn);
> -}
> -
> -
> -/* Create a return pattern, either simple_return or return, depending on
> -   simple_p.  */
> -
> -static rtx_insn *
> -gen_return_pattern (bool simple_p)
> -{
> -  return (simple_p
> -         ? targetm.gen_simple_return ()
> -         : targetm.gen_return ());
> -}
> -
> -/* Insert an appropriate return pattern at the end of block BB.  This
> -   also means updating block_for_insn appropriately.  SIMPLE_P is
> -   the same as in gen_return_pattern and passed to it.  */
> -
> -void
> -emit_return_into_block (bool simple_p, basic_block bb)
> -{
> -  rtx_jump_insn *jump = emit_jump_insn_after (gen_return_pattern (simple_p),
> -                                             BB_END (bb));
> -  rtx pat = PATTERN (jump);
> -  if (GET_CODE (pat) == PARALLEL)
> -    pat = XVECEXP (pat, 0, 0);
> -  gcc_assert (ANY_RETURN_P (pat));
> -  JUMP_LABEL (jump) = pat;
> -}
>
>  /* Set JUMP_LABEL for a return insn.  */
>
> @@ -5811,135 +5768,6 @@ set_return_jump_label (rtx_insn *returnjump)
>      JUMP_LABEL (returnjump) = ret_rtx;
>  }
>
> -/* Return true if there are any active insns between HEAD and TAIL.  */
> -bool
> -active_insn_between (rtx_insn *head, rtx_insn *tail)
> -{
> -  while (tail)
> -    {
> -      if (active_insn_p (tail))
> -       return true;
> -      if (tail == head)
> -       return false;
> -      tail = PREV_INSN (tail);
> -    }
> -  return false;
> -}
> -
> -/* LAST_BB is a block that exits, and empty of active instructions.
> -   Examine its predecessors for jumps that can be converted to
> -   (conditional) returns.  */
> -vec<edge>
> -convert_jumps_to_returns (basic_block last_bb, bool simple_p,
> -                         vec<edge> unconverted ATTRIBUTE_UNUSED)
> -{
> -  int i;
> -  basic_block bb;
> -  edge_iterator ei;
> -  edge e;
> -  auto_vec<basic_block> src_bbs (EDGE_COUNT (last_bb->preds));
> -
> -  FOR_EACH_EDGE (e, ei, last_bb->preds)
> -    if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
> -      src_bbs.quick_push (e->src);
> -
> -  rtx_insn *label = BB_HEAD (last_bb);
> -
> -  FOR_EACH_VEC_ELT (src_bbs, i, bb)
> -    {
> -      rtx_insn *jump = BB_END (bb);
> -
> -      if (!JUMP_P (jump) || JUMP_LABEL (jump) != label)
> -       continue;
> -
> -      e = find_edge (bb, last_bb);
> -
> -      /* If we have an unconditional jump, we can replace that
> -        with a simple return instruction.  */
> -      if (simplejump_p (jump))
> -       {
> -         /* The use of the return register might be present in the exit
> -            fallthru block.  Either:
> -            - removing the use is safe, and we should remove the use in
> -            the exit fallthru block, or
> -            - removing the use is not safe, and we should add it here.
> -            For now, we conservatively choose the latter.  Either of the
> -            2 helps in crossjumping.  */
> -         emit_use_return_register_into_block (bb);
> -
> -         emit_return_into_block (simple_p, bb);
> -         delete_insn (jump);
> -       }
> -
> -      /* If we have a conditional jump branching to the last
> -        block, we can try to replace that with a conditional
> -        return instruction.  */
> -      else if (condjump_p (jump))
> -       {
> -         rtx dest;
> -
> -         if (simple_p)
> -           dest = simple_return_rtx;
> -         else
> -           dest = ret_rtx;
> -         if (!redirect_jump (as_a <rtx_jump_insn *> (jump), dest, 0))
> -           {
> -             if (targetm.have_simple_return () && simple_p)
> -               {
> -                 if (dump_file)
> -                   fprintf (dump_file,
> -                            "Failed to redirect bb %d branch.\n", bb->index);
> -                 unconverted.safe_push (e);
> -               }
> -             continue;
> -           }
> -
> -         /* See comment in simplejump_p case above.  */
> -         emit_use_return_register_into_block (bb);
> -
> -         /* If this block has only one successor, it both jumps
> -            and falls through to the fallthru block, so we can't
> -            delete the edge.  */
> -         if (single_succ_p (bb))
> -           continue;
> -       }
> -      else
> -       {
> -         if (targetm.have_simple_return () && simple_p)
> -           {
> -             if (dump_file)
> -               fprintf (dump_file,
> -                        "Failed to redirect bb %d branch.\n", bb->index);
> -             unconverted.safe_push (e);
> -           }
> -         continue;
> -       }
> -
> -      /* Fix up the CFG for the successful change we just made.  */
> -      redirect_edge_succ (e, EXIT_BLOCK_PTR_FOR_FN (cfun));
> -      e->flags &= ~EDGE_CROSSING;
> -    }
> -  src_bbs.release ();
> -  return unconverted;
> -}
> -
> -/* Emit a return insn for the exit fallthru block.  */
> -basic_block
> -emit_return_for_exit (edge exit_fallthru_edge, bool simple_p)
> -{
> -  basic_block last_bb = exit_fallthru_edge->src;
> -
> -  if (JUMP_P (BB_END (last_bb)))
> -    {
> -      last_bb = split_edge (exit_fallthru_edge);
> -      exit_fallthru_edge = single_succ_edge (last_bb);
> -    }
> -  emit_barrier_after (BB_END (last_bb));
> -  emit_return_into_block (simple_p, last_bb);
> -  exit_fallthru_edge->flags &= ~EDGE_FALLTHRU;
> -  return last_bb;
> -}
> -
>
>  /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
>     this into place with notes indicating where the prologue ends and where
> @@ -5993,7 +5821,6 @@ void
>  thread_prologue_and_epilogue_insns (void)
>  {
>    bool inserted;
> -  vec<edge> unconverted_simple_returns = vNULL;
>    bitmap_head bb_flags;
>    rtx_insn *returnjump;
>    rtx_insn *epilogue_end ATTRIBUTE_UNUSED;
> @@ -6088,40 +5915,8 @@ thread_prologue_and_epilogue_insns (void)
>
>    exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds);
>
> -  if (targetm.have_simple_return () && entry_edge != orig_entry_edge)
> -    exit_fallthru_edge
> -       = get_unconverted_simple_return (exit_fallthru_edge, bb_flags,
> -                                        &unconverted_simple_returns,
> -                                        &returnjump);
> -  if (targetm.have_return ())
> -    {
> -      if (exit_fallthru_edge == NULL)
> -       goto epilogue_done;
> -
> -      if (optimize)
> -       {
> -         basic_block last_bb = exit_fallthru_edge->src;
> -
> -         if (LABEL_P (BB_HEAD (last_bb))
> -             && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
> -           convert_jumps_to_returns (last_bb, false, vNULL);
> -
> -         if (EDGE_COUNT (last_bb->preds) != 0
> -             && single_succ_p (last_bb))
> -           {
> -             last_bb = emit_return_for_exit (exit_fallthru_edge, false);
> -             epilogue_end = returnjump = BB_END (last_bb);
> -
> -             /* Emitting the return may add a basic block.
> -                Fix bb_flags for the added block.  */
> -             if (targetm.have_simple_return ()
> -                 && last_bb != exit_fallthru_edge->src)
> -               bitmap_set_bit (&bb_flags, last_bb->index);
> -
> -             goto epilogue_done;
> -           }
> -       }
> -    }
> +  if (targetm.have_return () && exit_fallthru_edge == NULL)
> +    goto epilogue_done;
>
>    /* A small fib -- epilogue is not yet completed, but we wish to re-use
>       this marker for the splits of EH_RETURN patterns, and nothing else
> @@ -6229,10 +6024,6 @@ epilogue_done:
>         }
>      }
>
> -  if (targetm.have_simple_return ())
> -    convert_to_simple_return (entry_edge, orig_entry_edge, bb_flags,
> -                             returnjump, unconverted_simple_returns);
> -
>    /* Emit sibling epilogues before any sibling call sites.  */
>    for (ei = ei_start (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds); (e =
>                                                              ei_safe_edge (ei));
> diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
> index fe79519..0ba1fed 100644
> --- a/gcc/shrink-wrap.c
> +++ b/gcc/shrink-wrap.c
> @@ -946,10 +946,9 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
>         redirect_edge_and_branch_force (e, (basic_block) e->dest->aux);
>        }
>
> -  /* Change all the exits that should get a simple_return to FAKE.
> -     They will be converted later.  */
> +  /* Make a simple_return for those exits that run without prologue.  */
>
> -  FOR_EACH_BB_FN (bb, cfun)
> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>      if (!bitmap_bit_p (bb_with, bb->index))
>        FOR_EACH_EDGE (e, ei, bb->succs)
>         if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
> @@ -958,7 +957,18 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
>
>             e->flags &= ~EDGE_FALLTHRU;
>             if (!(e->flags & EDGE_SIBCALL))
> -             e->flags |= EDGE_FAKE;
> +             {
> +               rtx_insn *ret = targetm.gen_simple_return ();
> +               rtx_insn *end = BB_END (e->src);
> +               rtx_jump_insn *start = emit_jump_insn_after (ret, end);
> +               JUMP_LABEL (start) = simple_return_rtx;
> +               e->flags &= ~EDGE_FAKE;
> +
> +               if (dump_file)
> +                 fprintf (dump_file,
> +                          "Made simple_return with UID %d in bb %d\n",
> +                          INSN_UID (start), e->src->index);
> +             }
>
>             emit_barrier_after_bb (e->src);
>           }
> @@ -996,156 +1006,3 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
>
>    free_dominance_info (CDI_DOMINATORS);
>  }
> -
> -/* If we're allowed to generate a simple return instruction, then by
> -   definition we don't need a full epilogue.  If the last basic
> -   block before the exit block does not contain active instructions,
> -   examine its predecessors and try to emit (conditional) return
> -   instructions.  */
> -
> -edge
> -get_unconverted_simple_return (edge exit_fallthru_edge, bitmap_head bb_flags,
> -                              vec<edge> *unconverted_simple_returns,
> -                              rtx_insn **returnjump)
> -{
> -  if (optimize)
> -    {
> -      unsigned i, last;
> -
> -      /* convert_jumps_to_returns may add to preds of the exit block
> -         (but won't remove).  Stop at end of current preds.  */
> -      last = EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds);
> -      for (i = 0; i < last; i++)
> -       {
> -         edge e = EDGE_I (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds, i);
> -         if (LABEL_P (BB_HEAD (e->src))
> -             && !bitmap_bit_p (&bb_flags, e->src->index)
> -             && !active_insn_between (BB_HEAD (e->src), BB_END (e->src)))
> -           *unconverted_simple_returns
> -                 = convert_jumps_to_returns (e->src, true,
> -                                             *unconverted_simple_returns);
> -       }
> -    }
> -
> -  if (exit_fallthru_edge != NULL
> -      && EDGE_COUNT (exit_fallthru_edge->src->preds) != 0
> -      && !bitmap_bit_p (&bb_flags, exit_fallthru_edge->src->index))
> -    {
> -      basic_block last_bb;
> -
> -      last_bb = emit_return_for_exit (exit_fallthru_edge, true);
> -      *returnjump = BB_END (last_bb);
> -      exit_fallthru_edge = NULL;
> -    }
> -  return exit_fallthru_edge;
> -}
> -
> -/* If there were branches to an empty LAST_BB which we tried to
> -   convert to conditional simple_returns, but couldn't for some
> -   reason, create a block to hold a simple_return insn and redirect
> -   those remaining edges.  */
> -
> -void
> -convert_to_simple_return (edge entry_edge, edge orig_entry_edge,
> -                         bitmap_head bb_flags, rtx_insn *returnjump,
> -                         vec<edge> unconverted_simple_returns)
> -{
> -  edge e;
> -  edge_iterator ei;
> -
> -  if (!unconverted_simple_returns.is_empty ())
> -    {
> -      basic_block simple_return_block_hot = NULL;
> -      basic_block simple_return_block_cold = NULL;
> -      edge pending_edge_hot = NULL;
> -      edge pending_edge_cold = NULL;
> -      basic_block exit_pred;
> -      int i;
> -
> -      gcc_assert (entry_edge != orig_entry_edge);
> -
> -      /* See if we can reuse the last insn that was emitted for the
> -        epilogue.  */
> -      if (returnjump != NULL_RTX
> -         && JUMP_LABEL (returnjump) == simple_return_rtx)
> -       {
> -         e = split_block (BLOCK_FOR_INSN (returnjump), PREV_INSN (returnjump));
> -         if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
> -           simple_return_block_hot = e->dest;
> -         else
> -           simple_return_block_cold = e->dest;
> -       }
> -
> -      /* Also check returns we might need to add to tail blocks.  */
> -      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> -       if (EDGE_COUNT (e->src->preds) != 0
> -           && (e->flags & EDGE_FAKE) != 0
> -           && !bitmap_bit_p (&bb_flags, e->src->index))
> -         {
> -           if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
> -             pending_edge_hot = e;
> -           else
> -             pending_edge_cold = e;
> -         }
> -
> -      /* Save a pointer to the exit's predecessor BB for use in
> -         inserting new BBs at the end of the function.  Do this
> -         after the call to split_block above which may split
> -         the original exit pred.  */
> -      exit_pred = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
> -
> -      FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e)
> -       {
> -         basic_block *pdest_bb;
> -         edge pending;
> -
> -         if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
> -           {
> -             pdest_bb = &simple_return_block_hot;
> -             pending = pending_edge_hot;
> -           }
> -         else
> -           {
> -             pdest_bb = &simple_return_block_cold;
> -             pending = pending_edge_cold;
> -           }
> -
> -         if (*pdest_bb == NULL && pending != NULL)
> -           {
> -             emit_return_into_block (true, pending->src);
> -             pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
> -             *pdest_bb = pending->src;
> -           }
> -         else if (*pdest_bb == NULL)
> -           {
> -             basic_block bb;
> -
> -             bb = create_basic_block (NULL, NULL, exit_pred);
> -             BB_COPY_PARTITION (bb, e->src);
> -             rtx_insn *ret = targetm.gen_simple_return ();
> -             rtx_jump_insn *start = emit_jump_insn_after (ret, BB_END (bb));
> -             JUMP_LABEL (start) = simple_return_rtx;
> -             emit_barrier_after (start);
> -
> -             *pdest_bb = bb;
> -             make_edge (bb, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
> -           }
> -         redirect_edge_and_branch_force (e, *pdest_bb);
> -       }
> -      unconverted_simple_returns.release ();
> -    }
> -
> -  if (entry_edge != orig_entry_edge)
> -    {
> -      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> -       if (EDGE_COUNT (e->src->preds) != 0
> -           && (e->flags & EDGE_FAKE) != 0
> -           && !bitmap_bit_p (&bb_flags, e->src->index))
> -         {
> -           e = fix_fake_fallthrough_edge (e);
> -
> -           emit_return_into_block (true, e->src);
> -           e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
> -         }
> -    }
> -}
> diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
> index 6e7a4f7..4d821d7 100644
> --- a/gcc/shrink-wrap.h
> +++ b/gcc/shrink-wrap.h
> @@ -26,12 +26,6 @@ along with GCC; see the file COPYING3.  If not see
>  extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
>  extern void try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_flags,
>                                  rtx_insn *prologue_seq);
> -extern edge get_unconverted_simple_return (edge, bitmap_head,
> -                                          vec<edge> *, rtx_insn **);
> -extern void convert_to_simple_return (edge entry_edge, edge orig_entry_edge,
> -                                     bitmap_head bb_flags,
> -                                     rtx_insn *returnjump,
> -                                     vec<edge> unconverted_simple_returns);
>  #define SHRINK_WRAPPING_ENABLED \
>    (flag_shrink_wrap && targetm.have_simple_return ())
>
> --
> 1.9.3
>


* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
  2016-05-09 13:54   ` Christophe Lyon
@ 2016-05-09 15:08     ` Segher Boessenkool
  2016-05-11 11:52       ` Jiong Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-09 15:08 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

Hi Christophe,

On Mon, May 09, 2016 at 03:54:26PM +0200, Christophe Lyon wrote:
> After this patch, I've noticed that
> gcc.target/arm/pr43920-2.c
> now fails at:
> /* { dg-final { scan-assembler-times "pop" 2 } } */
> 
> Before the patch, the generated code was:
> [...]
>         pop     {r3, r4, r5, r6, r7, pc}
> .L4:
>         mov     r0, #-1
> .L1:
>         pop     {r3, r4, r5, r6, r7, pc}
> 
> it is now:
> [...]
> .L1:
>         pop     {r3, r4, r5, r6, r7, pc}
> .L4:
>         mov     r0, #-1
>         b       .L1
> 
> The new version does not seem better, as it adds a branch on the path
> and it is not smaller.

That looks like bb-reorder isn't doing its job?  Maybe it thinks that
pop is too expensive to copy?


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns
  2016-05-03  7:03 ` [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns Segher Boessenkool
@ 2016-05-10 19:34   ` Christophe Lyon
  2016-05-10 23:26     ` Segher Boessenkool
  0 siblings, 1 reply; 25+ messages in thread
From: Christophe Lyon @ 2016-05-10 19:34 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 3 May 2016 at 08:59, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> This patch makes cfgcleanup optimize jumps to returns.  There are three
> cases this handles:
>
> -- A jump to a return; this is simplified to just that return.
> -- A conditional branch to a return; simplified to a conditional return.
> -- A conditional branch that falls through to a return.  This is simplified
>    to a conditional return (with the condition inverted), falling through
>    to a jump to the original destination.  That jump can then be optimized
>    further, as usual.
>
> This handles all cases the current function.c does, and a few it misses.
>
>
> 2016-05-03  Segher Boessenkool  <segher@kernel.crashing.org>
>
>         * cfgcleanup.c (bb_is_just_return): New function.
>         (try_optimize_cfg): Simplify jumps to return, branches to return,
>         and branches around return.
>

Hi,

This patch causes an ICE on gcc.dg/20010822-1.c for target arm-none-eabi
--with-cpu=cortex-a9

/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:
In function 'bar':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:31:1:
internal compiler error: in redirect_jump, at jump.c:1560
0x949a27 redirect_jump(rtx_jump_insn*, rtx_def*, int)
        /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/jump.c:1560
0x10ec689 try_optimize_cfg
        /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:2899
0x10ec689 cleanup_cfg(int)
        /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3150
0x10ed1f6 execute
        /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3279

Christophe

> ---
>  gcc/cfgcleanup.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 123 insertions(+)
>
> diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
> index 19583a7..4f87811 100644
> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -2606,6 +2606,35 @@ trivially_empty_bb_p (basic_block bb)
>      }
>  }
>
> +/* Return true if BB contains just a return and possibly a USE of the
> +   return value.  Fill in *RET and *USE with the return and use insns
> +   if any found, otherwise NULL.  */
> +
> +static bool
> +bb_is_just_return (basic_block bb, rtx_insn **ret, rtx_insn **use)
> +{
> +  *ret = *use = NULL;
> +  rtx_insn *insn;
> +
> +  if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
> +    return false;
> +
> +  FOR_BB_INSNS (bb, insn)
> +    if (NONDEBUG_INSN_P (insn))
> +      {
> +       if (!*ret && ANY_RETURN_P (PATTERN (insn)))
> +         *ret = insn;
> +       else if (!*ret && !*use && GET_CODE (PATTERN (insn)) == USE
> +           && REG_P (XEXP (PATTERN (insn), 0))
> +           && REG_FUNCTION_VALUE_P (XEXP (PATTERN (insn), 0)))
> +         *use = insn;
> +       else
> +         return false;
> +      }
> +
> +  return !!*ret;
> +}
> +
>  /* Do simple CFG optimizations - basic block merging, simplifying of jump
>     instructions etc.  Return nonzero if changes were made.  */
>
> @@ -2792,6 +2821,100 @@ try_optimize_cfg (int mode)
>                       }
>                 }
>
> +             /* Try to change a branch to a return to just that return.  */
> +             rtx_insn *ret, *use;
> +             if (single_succ_p (b)
> +                 && onlyjump_p (BB_END (b))
> +                 && bb_is_just_return (single_succ (b), &ret, &use))
> +               {
> +                 if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
> +                                    PATTERN (ret), 0))
> +                   {
> +                     if (use)
> +                       emit_insn_before (copy_insn (PATTERN (use)),
> +                                         BB_END (b));
> +                     if (dump_file)
> +                       fprintf (dump_file, "Changed jump %d->%d to return.\n",
> +                                           b->index, single_succ (b)->index);
> +                     redirect_edge_succ (single_succ_edge (b),
> +                                         EXIT_BLOCK_PTR_FOR_FN (cfun));
> +                     single_succ_edge (b)->flags &= ~EDGE_CROSSING;
> +                     changed_here = true;
> +                   }
> +               }
> +
> +             /* Try to change a conditional branch to a return to the
> +                respective conditional return.  */
> +             if (EDGE_COUNT (b->succs) == 2
> +                 && any_condjump_p (BB_END (b))
> +                 && bb_is_just_return (BRANCH_EDGE (b)->dest, &ret, &use))
> +               {
> +                 if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
> +                                    PATTERN (ret), 0))
> +                   {
> +                     if (use)
> +                       emit_insn_before (copy_insn (PATTERN (use)),
> +                                         BB_END (b));
> +                     if (dump_file)
> +                       fprintf (dump_file, "Changed conditional jump %d->%d "
> +                                           "to conditional return.\n",
> +                                           b->index,
> +                                           BRANCH_EDGE (b)->dest->index);
> +                     redirect_edge_succ (BRANCH_EDGE (b),
> +                                         EXIT_BLOCK_PTR_FOR_FN (cfun));
> +                     BRANCH_EDGE (b)->flags &= ~EDGE_CROSSING;
> +                     changed_here = true;
> +                   }
> +               }
> +
> +             /* Try to flip a conditional branch that falls through to
> +                a return so that it becomes a conditional return and a
> +                new jump to the original branch target.  */
> +             if (EDGE_COUNT (b->succs) == 2
> +                 && any_condjump_p (BB_END (b))
> +                 && bb_is_just_return (FALLTHRU_EDGE (b)->dest, &ret, &use))
> +               {
> +                 if (invert_jump (as_a <rtx_jump_insn *> (BB_END (b)),
> +                                  JUMP_LABEL (BB_END (b)), 0))
> +                   {
> +                     basic_block new_ft = BRANCH_EDGE (b)->dest;
> +                     if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
> +                                        PATTERN (ret), 0))
> +                       {
> +                         if (use)
> +                           emit_insn_before (copy_insn (PATTERN (use)),
> +                                             BB_END (b));
> +                         if (dump_file)
> +                           fprintf (dump_file, "Changed conditional jump "
> +                                               "%d->%d to conditional return, "
> +                                               "adding fall-through jump.\n",
> +                                               b->index,
> +                                               BRANCH_EDGE (b)->dest->index);
> +                         redirect_edge_succ (BRANCH_EDGE (b),
> +                                             EXIT_BLOCK_PTR_FOR_FN (cfun));
> +                         BRANCH_EDGE (b)->flags &= ~EDGE_CROSSING;
> +                         std::swap (BRANCH_EDGE (b)->probability,
> +                                    FALLTHRU_EDGE (b)->probability);
> +                         update_br_prob_note (b);
> +                         basic_block jb = force_nonfallthru (FALLTHRU_EDGE (b));
> +                         notice_new_block (jb);
> +                         if (!redirect_jump (as_a <rtx_jump_insn *> (BB_END (jb)),
> +                                             block_label (new_ft), 0))
> +                           gcc_unreachable ();
> +                         redirect_edge_succ (single_succ_edge (jb), new_ft);
> +                         changed_here = true;
> +                       }
> +                     else
> +                       {
> +                         /* Invert the jump back to what it was.  This should
> +                            never fail.  */
> +                         if (!invert_jump (as_a <rtx_jump_insn *> (BB_END (b)),
> +                                           JUMP_LABEL (BB_END (b)), 0))
> +                           gcc_unreachable ();
> +                       }
> +                   }
> +               }
> +
>               /* Simplify branch over branch.  */
>               if ((mode & CLEANUP_EXPENSIVE)
>                    && !(mode & CLEANUP_CFGLAYOUT)
> --
> 1.9.3
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns
  2016-05-10 19:34   ` Christophe Lyon
@ 2016-05-10 23:26     ` Segher Boessenkool
  2016-05-11  8:17       ` Christophe Lyon
  0 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-05-10 23:26 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On Tue, May 10, 2016 at 09:33:56PM +0200, Christophe Lyon wrote:
> This patch causes an ICE on gcc.dg/20010822-1.c for target arm-none-eabi
> --with-cpu=cortex-a9

That is PR71028, I sent a patch to fix it, will commit in a minute.
(See https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00673.html ).

Sorry for the breakage,


Segher

> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:
> In function 'bar':
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:31:1:
> internal compiler error: in redirect_jump, at jump.c:1560
> 0x949a27 redirect_jump(rtx_jump_insn*, rtx_def*, int)
>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/jump.c:1560
> 0x10ec689 try_optimize_cfg
>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:2899
> 0x10ec689 cleanup_cfg(int)
>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3150
> 0x10ed1f6 execute
>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3279

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns
  2016-05-10 23:26     ` Segher Boessenkool
@ 2016-05-11  8:17       ` Christophe Lyon
  0 siblings, 0 replies; 25+ messages in thread
From: Christophe Lyon @ 2016-05-11  8:17 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On 11 May 2016 at 01:26, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Tue, May 10, 2016 at 09:33:56PM +0200, Christophe Lyon wrote:
>> This patch causes an ICE on gcc.dg/20010822-1.c for target arm-none-eabi
>> --with-cpu=cortex-a9
>
> That is PR71028, I sent a patch to fix it, will commit in a minute.
> (See https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00673.html ).
>
> Sorry for the breakage,
>

OK thanks. Sorry for the delay in reporting this. All the noise caused
by the cilkplus merge meant I had to dig longer to identify real
regressions.

I confirm your patch did fix the regression.

Christophe.

>
> Segher
>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:
>> In function 'bar':
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/20010822-1.c:31:1:
>> internal compiler error: in redirect_jump, at jump.c:1560
>> 0x949a27 redirect_jump(rtx_jump_insn*, rtx_def*, int)
>>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/jump.c:1560
>> 0x10ec689 try_optimize_cfg
>>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:2899
>> 0x10ec689 cleanup_cfg(int)
>>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3150
>> 0x10ed1f6 execute
>>         /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgcleanup.c:3279

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
  2016-05-09 15:08     ` Segher Boessenkool
@ 2016-05-11 11:52       ` Jiong Wang
  2016-06-14 14:54         ` [Patch, cfg] Improve jump to return optimization for complex return Jiong Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Jiong Wang @ 2016-05-11 11:52 UTC (permalink / raw)
  To: Segher Boessenkool, Christophe Lyon; +Cc: gcc-patches



On 09/05/16 16:08, Segher Boessenkool wrote:
> Hi Christophe,
>
> On Mon, May 09, 2016 at 03:54:26PM +0200, Christophe Lyon wrote:
>> After this patch, I've noticed that
>> gcc.target/arm/pr43920-2.c
>> now fails at:
>> /* { dg-final { scan-assembler-times "pop" 2 } } */
>>
>> Before the patch, the generated code was:
>> [...]
>>          pop     {r3, r4, r5, r6, r7, pc}
>> .L4:
>>          mov     r0, #-1
>> .L1:
>>          pop     {r3, r4, r5, r6, r7, pc}
>>
>> it is now:
>> [...]
>> .L1:
>>          pop     {r3, r4, r5, r6, r7, pc}
>> .L4:
>>          mov     r0, #-1
>>          b       .L1
>>
>> The new version does not seem better, as it adds a branch on the path
>> and it is not smaller.
> That looks like bb-reorder isn't doing its job?  Maybe it thinks that
> pop is too expensive to copy?
I think so. Filed PR71061

The ARM backend is not setting the length attribute correctly, so the bb
fails the copy_bb_p check.

Unfortunately, I am afraid that even if we fix the backend length issue,
this testcase will keep failing, because it specifies "-Os", so some bb
copies won't be triggered.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Patch, cfg] Improve jump to return optimization for complex return
  2016-05-11 11:52       ` Jiong Wang
@ 2016-06-14 14:54         ` Jiong Wang
  2016-06-14 20:30           ` Segher Boessenkool
  0 siblings, 1 reply; 25+ messages in thread
From: Jiong Wang @ 2016-06-14 14:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, Christophe Lyon

[-- Attachment #1: Type: text/plain, Size: 6036 bytes --]

On 11/05/16 12:52, Jiong Wang wrote:
>
>
> On 09/05/16 16:08, Segher Boessenkool wrote:
>> Hi Christophe,
>>
>> On Mon, May 09, 2016 at 03:54:26PM +0200, Christophe Lyon wrote:
>>> After this patch, I've noticed that
>>> gcc.target/arm/pr43920-2.c
>>> now fails at:
>>> /* { dg-final { scan-assembler-times "pop" 2 } } */
>>>
>>> Before the patch, the generated code was:
>>> [...]
>>>          pop     {r3, r4, r5, r6, r7, pc}
>>> .L4:
>>>          mov     r0, #-1
>>> .L1:
>>>          pop     {r3, r4, r5, r6, r7, pc}
>>>
>>> it is now:
>>> [...]
>>> .L1:
>>>          pop     {r3, r4, r5, r6, r7, pc}
>>> .L4:
>>>          mov     r0, #-1
>>>          b       .L1
>>>
>>> The new version does not seem better, as it adds a branch on the path
>>> and it is not smaller.
>> That looks like bb-reorder isn't doing its job?  Maybe it thinks that
>> pop is too expensive to copy?
> I think so. Filed PR71061
>
> The ARM backend is not setting the length attribute correctly, so the bb
> fails the copy_bb_p check.
>

The length attributes for these pop patterns have been corrected by r237331.

> Unfortunately, I am afraid that even if we fix the backend length issue,
> this testcase will keep failing, because it specifies "-Os", so some bb
> copies won't be triggered.
>

Further investigation shows the issue is actually that gcc couldn't
optimize a "b to pop" into a "pop", i.e. a "jump to return" into a
"return", so a better place to fix this is try_optimize_cfg, where we
already do these jump/return optimizations:

   /* Try to change a branch to a return to just that return.  */
   rtx_insn *ret, *use;
   ...

Currently we use ANY_RETURN_P to check whether an rtx is a return, but
ANY_RETURN_P only covers RETURN and SIMPLE_RETURN without side effects:

#define ANY_RETURN_P(X) \
   (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)

It is possible that some architectures support return instructions with
side effects; for example, ARM has a pop instruction which does a return
and pops registers at the same time.  The instruction pattern for that is
something like:

   (jump_insn 90 89 93 7 (parallel [
             (return)
             (set/f (reg/f:SI 13 sp)
                 (plus:SI (reg/f:SI 13 sp)
                     (const_int 24 [0x18])))
             (set/f (reg:SI 3 r3)
                 (mem/c:SI (reg/f:SI 13 sp) [3  S4 A32]))
             (set/f (reg:SI 4 r4)
                 (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                         (const_int 4 [0x4])) [3  S4 A32]))
   ...

The above pattern fails the ANY_RETURN_P check, so gcc can't optimize the
following jump to a return:

[bb 1]
(jump_insn 76 38 77 4 (set (pc)
         (label_ref 43)) pr43920-2.c:27 203 {*arm_jump}
      (nil)
  -> 43)

[bb 2]
(code_label 43)
...
(jump_insn 90 89 93 7 (parallel [
             (return)
             (set/f (reg/f:SI 13 sp)
                 (plus:SI (reg/f:SI 13 sp)
                     (const_int 24 [0x18])))
             (set/f (reg:SI 3 r3)
                 (mem/c:SI (reg/f:SI 13 sp) [3  S4 A32]))
             (set/f (reg:SI 4 r4)
                 (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                         (const_int 4 [0x4])) [3  S4 A32]))

into:

(jump_insn 76 38 77 4 (parallel [
             (return)
             (set/f (reg/f:SI 13 sp)
                 (plus:SI (reg/f:SI 13 sp)
                     (const_int 24 [0x18])))
             (set/f (reg:SI 3 r3)
                 (mem/c:SI (reg/f:SI 13 sp) [3  S4 A32]))
             (set/f (reg:SI 4 r4)
                 (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                         (const_int 4 [0x4])) [3  S4 A32]))
             (set/f (reg:SI 5 r5)
                 (mem/c:SI (plus:SI (reg/f:SI 13 sp)
                         (const_int 8 [0x8])) [3  S4 A32]))

This patch:
   * Renames the existing ANY_RETURN_P to ANY_PURE_RETURN_P.
   * Re-implements ANY_RETURN_P to consider a complex JUMP_INSN with a
     PARALLEL in the pattern.
   * Removes the side_effects_p check on the last insn in both bb1
     and bb2.  This is because, supposing bb1 and bb2 each contain
     the following single jump_insn and both fall through to EXIT_BB:

     bb 1                                bb 2

     (jump_insn                       (jump_insn
        (parallel [                      (parallel [
             (return)                         (return)
             (set/f                           (set/f
               (reg:SI 15 pc)                   (reg:SI 15 pc)
               (mem:SI                          (mem:SI
                 (post_inc:SI                     (post_inc:SI
                 (reg/f:SI 13 sp))                (reg/f:SI 13 sp))
         ])                               ])
      -> return)                        -> return)

                 \                    /
                  \                  /
                   \                /
                    v              v
                         EXIT_BB

     cross jump optimization will try to change the jump_insn in bb1 into
     an unconditional jump to bb2; then we enter the next iteration of
     try_optimize_cfg, where it is a new "jump to return", so bb1 is
     optimized back into the above pattern, triggering another round of
     cross jump optimization; we end up in an infinite loop here.

Bootstrap OK on x86_64/arm32/arm64, no gcc/g++ regressions on any of
them.

OK for trunk?

2016-06-14  Jiong Wang  <jiong.wang@arm.com>

     * cfgcleanup.c (output.h): New include.
     (try_optimize_cfg): Check instruction length when optimizing a jump
     to a return into a return; consider optimizing for size.
     (flow_find_cross_jump): Remove the side_effects_p check on the last
     insn in both bb1 and bb2.
     * jump.c (return_in_jump): New function.
     (redirect_jump_2): Consider complex pattern when updating jump label.
     * rtl.c (classify_insn): Use ANY_PURE_RETURN_P instead of
     ANY_RETURN_P.
     * rtl.h (return_in_jump): New declaration.
     (ANY_RETURN_P): Rename to ANY_PURE_RETURN_P.  Re-implement
     through return_in_jump.



[-- Attachment #2: cfg.patch --]
[-- Type: text/x-patch, Size: 4873 bytes --]

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 023b9d2..a4bea09 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "insn-config.h"
 #include "emit-rtl.h"
+#include "output.h"
 #include "cselib.h"
 #include "params.h"
 #include "tree-pass.h"
@@ -1337,7 +1338,7 @@ flow_find_cross_jump (basic_block bb1, basic_block bb2, rtx_insn **f1,
   i1 = BB_END (bb1);
   last1 = afterlast1 = last2 = afterlast2 = NULL;
   if (onlyjump_p (i1)
-      || (returnjump_p (i1) && !side_effects_p (PATTERN (i1))))
+      || returnjump_p (i1))
     {
       last1 = i1;
       i1 = PREV_INSN (i1);
@@ -1345,7 +1346,7 @@ flow_find_cross_jump (basic_block bb1, basic_block bb2, rtx_insn **f1,
 
   i2 = BB_END (bb2);
   if (onlyjump_p (i2)
-      || (returnjump_p (i2) && !side_effects_p (PATTERN (i2))))
+      || returnjump_p (i2))
     {
       last2 = i2;
       /* Count everything except for unconditional jump as insn.
@@ -2825,7 +2826,11 @@ try_optimize_cfg (int mode)
 	      rtx_insn *ret, *use;
 	      if (single_succ_p (b)
 		  && onlyjump_p (BB_END (b))
-		  && bb_is_just_return (single_succ (b), &ret, &use))
+		  && bb_is_just_return (single_succ (b), &ret, &use)
+		  && ((get_attr_min_length (ret)
+		       <= get_attr_min_length (BB_END (b)))
+		      || optimize_bb_for_speed_p (b)))
+
 		{
 		  if (redirect_jump (as_a <rtx_jump_insn *> (BB_END (b)),
 				     PATTERN (ret), 0))
diff --git a/gcc/jump.c b/gcc/jump.c
index 5b433af..e962921 100644
--- a/gcc/jump.c
+++ b/gcc/jump.c
@@ -1437,7 +1437,35 @@ delete_for_peephole (rtx_insn *from, rtx_insn *to)
      since the peephole that replaces this sequence
      is also an unconditional jump in that case.  */
 }
-\f
+
+/* If X is RETURN or SIMPLE_RETURN then return itself.  If X is PARALLEL, return
+   the contained RETURN or SIMPLE_RETURN if it's not a CALL_INSN, otherwise
+   return NULL_RTX.  */
+
+rtx
+return_in_jump (rtx x)
+{
+  if (ANY_PURE_RETURN_P (x))
+    return x;
+
+  rtx ret = NULL_RTX;
+
+  if (GET_CODE (x) == PARALLEL)
+    {
+      int j = 0;
+      for (; j < XVECLEN (x, 0); j++)
+	if (ANY_PURE_RETURN_P (XVECEXP (x, 0, j)))
+	  ret = XVECEXP (x, 0, j);
+        /* Return NULL_RTX for CALL_INSN.  */
+	else if (GET_CODE (XVECEXP (x, 0, j)) == CALL
+		 || (GET_CODE (XVECEXP (x, 0, j)) == SET
+		     && GET_CODE (SET_SRC (XVECEXP (x, 0, j))) == CALL))
+	  return NULL_RTX;
+    }
+
+  return ret;
+}
+
 /* A helper function for redirect_exp_1; examines its input X and returns
    either a LABEL_REF around a label, or a RETURN if X was NULL.  */
 static rtx
@@ -1586,8 +1614,14 @@ redirect_jump_2 (rtx_jump_insn *jump, rtx olabel, rtx nlabel, int delete_unused,
      moving FUNCTION_END note.  Just sanity check that no user still worry
      about this.  */
   gcc_assert (delete_unused >= 0);
-  JUMP_LABEL (jump) = nlabel;
-  if (!ANY_RETURN_P (nlabel))
+  /* "nlabel" might be a pattern where "return/simple_return" is inside a
+     PARALLEL that it can't be used for JUMP_LABEL directly.  */
+  rtx ret = return_in_jump (nlabel);
+  if (ret)
+    JUMP_LABEL (jump) = ret;
+  else
+    JUMP_LABEL (jump) = nlabel;
+  if (!ret)
     ++LABEL_NUSES (nlabel);
 
   /* Update labels in any REG_EQUAL note.  */
diff --git a/gcc/rtl.h b/gcc/rtl.h
index b531ab7..a2db61b 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -961,9 +961,12 @@ is_a_helper <rtx_note *>::test (rtx_insn *insn)
 }
 
 /* Predicate yielding nonzero iff X is a return or simple_return.  */
-#define ANY_RETURN_P(X) \
+#define ANY_PURE_RETURN_P(X) \
   (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)
 
+/* See comments for return_in_jump.  */
+#define ANY_RETURN_P(X) (return_in_jump (X) != NULL_RTX)
+
 /* 1 if X is a unary operator.  */
 
 #define UNARY_P(X)   \
@@ -3503,6 +3506,7 @@ extern enum rtx_code reversed_comparison_code_parts (enum rtx_code, const_rtx,
 						     const_rtx, const_rtx);
 extern void delete_for_peephole (rtx_insn *, rtx_insn *);
 extern int condjump_in_parallel_p (const rtx_insn *);
+extern rtx return_in_jump (rtx);
 
 /* In emit-rtl.c.  */
 extern int max_reg_num (void);
diff --git a/gcc/rtl.c b/gcc/rtl.c
index a445cdc..fd96839 100644
--- a/gcc/rtl.c
+++ b/gcc/rtl.c
@@ -694,7 +694,7 @@ classify_insn (rtx x)
     return CODE_LABEL;
   if (GET_CODE (x) == CALL)
     return CALL_INSN;
-  if (ANY_RETURN_P (x))
+  if (ANY_PURE_RETURN_P (x))
     return JUMP_INSN;
   if (GET_CODE (x) == SET)
     {
@@ -712,7 +712,7 @@ classify_insn (rtx x)
       for (j = XVECLEN (x, 0) - 1; j >= 0; j--)
 	if (GET_CODE (XVECEXP (x, 0, j)) == CALL)
 	  return CALL_INSN;
-	else if (ANY_RETURN_P (XVECEXP (x, 0, j)))
+	else if (ANY_PURE_RETURN_P (XVECEXP (x, 0, j)))
 	  has_return_p = true;
 	else if (GET_CODE (XVECEXP (x, 0, j)) == SET
 		 && GET_CODE (SET_DEST (XVECEXP (x, 0, j))) == PC)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Patch, cfg] Improve jump to return optimization for complex return
  2016-06-14 14:54         ` [Patch, cfg] Improve jump to return optimization for complex return Jiong Wang
@ 2016-06-14 20:30           ` Segher Boessenkool
  2016-06-15 17:01             ` Jiong Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Segher Boessenkool @ 2016-06-14 20:30 UTC (permalink / raw)
  To: Jiong Wang; +Cc: gcc-patches, Christophe Lyon

On Tue, Jun 14, 2016 at 03:53:59PM +0100, Jiong Wang wrote:
> "bl to pop" into "pop" which is "jump to return" into "return",  so a better
> place to fix this issue is at try_optimize_cfg where we are doing these
> jump/return optimization already:
> 
>   /* Try to change a branch to a return to just that return.  */
>   rtx_insn *ret, *use;
>   ...
> 
> Currently we use ANY_RETURN_P to check whether an rtx is a return, but
> ANY_RETURN_P only covers RETURN and SIMPLE_RETURN without side effects:
> 
> #define ANY_RETURN_P(X) \
>   (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)

On PowerPC we have a similar problem for jumps to trivial epilogues,
which are a parallel of a return with a (use reg:LR_REGNO).  But that
one shows up for conditional jumps.

> This patch:
>   * Renames the existing ANY_RETURN_P to ANY_PURE_RETURN_P.
>   * Re-implements ANY_RETURN_P to consider a complex JUMP_INSN with a
>     PARALLEL in the pattern.

ANY_RETURN_P is used in many other places; have you checked them all?

>   * Removes the side_effects_p check on the last insn in both bb1
>     and bb2.  This is because, supposing bb1 and bb2 each contain
>     the following single jump_insn and both fall through to EXIT_BB:

<snip>

>     cross jump optimization will try to change the jump_insn in bb1 into
>     an unconditional jump to bb2; then we enter the next iteration of
>     try_optimize_cfg, where it is a new "jump to return", so bb1 is
>     optimized back into the above pattern, triggering another round of
>     cross jump optimization; we end up in an infinite loop here.

Why is it safe to remove the side_effects_p check?  Why was it there
in the first place?

> --- a/gcc/cfgcleanup.c
> +++ b/gcc/cfgcleanup.c
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm_p.h"
>  #include "insn-config.h"
>  #include "emit-rtl.h"
> +#include "output.h"

What is this include used for?  Ah, get_attr_min_length?

> @@ -2825,7 +2826,11 @@ try_optimize_cfg (int mode)
>  	      rtx_insn *ret, *use;
>  	      if (single_succ_p (b)
>  		  && onlyjump_p (BB_END (b))
> -		  && bb_is_just_return (single_succ (b), &ret, &use))
> +		  && bb_is_just_return (single_succ (b), &ret, &use)
> +		  && ((get_attr_min_length (ret)
> +		       <= get_attr_min_length (BB_END (b)))
> +		      || optimize_bb_for_speed_p (b)))

This is for breaking the cycle, right?  It seems fragile.

> --- a/gcc/jump.c
> +++ b/gcc/jump.c
> @@ -1437,7 +1437,35 @@ delete_for_peephole (rtx_insn *from, rtx_insn *to)
>       since the peephole that replaces this sequence
>       is also an unconditional jump in that case.  */
>  }
> -\f
> +

Unrelated change.

> +/* If X is RETURN or SIMPLE_RETURN then return itself.  If X is PARALLEL, return
> +   the contained RETURN or SIMPLE_RETURN if it's not a CALL_INSN, otherwise
> +   return NULL_RTX.  */
> +
> +rtx
> +return_in_jump (rtx x)

Strange semantics, and the name does not capture the "call" semantics
at all.  Maybe split out that part?  What is that part for, anyway?


Segher

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Patch, cfg] Improve jump to return optimization for complex return
  2016-06-14 20:30           ` Segher Boessenkool
@ 2016-06-15 17:01             ` Jiong Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Jiong Wang @ 2016-06-15 17:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, Christophe Lyon


Segher Boessenkool writes:

> On Tue, Jun 14, 2016 at 03:53:59PM +0100, Jiong Wang wrote:
>> "bl to pop" into "pop" which is "jump to return" into "return",  so a better
>> place to fix this issue is at try_optimize_cfg where we are doing these
>> jump/return optimization already:
>> 
>>   /* Try to change a branch to a return to just that return.  */
>>   rtx_insn *ret, *use;
>>   ...
>> 
>> Currently we use ANY_RETURN_P to check whether an rtx is a return, but
>> ANY_RETURN_P only covers RETURN and SIMPLE_RETURN without side effects:
>> 
>> #define ANY_RETURN_P(X) \
>>   (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)
>
> On PowerPC we have a similar problem for jumps to trivial epilogues,
> which are a parallel of a return with a (use reg:LR_REGNO).  But that
> one shows up for conditional jumps.

On ARM, in a micro benchmark (enabling -fdump-rtl-pro_and_epilogue
during a gcc bootstrap), I can see that about 8.5x more "Changed
jump.*to return" optimizations happen:

 grep "Changed jump.*to return" BUILD/gcc/*.pro_and_epilogue | wc -l

The number is boosted from about a thousand to about ten thousand.

And although this patch itself adds more code, the size of the .text
section is slightly smaller after this patch.

It would be great if you could test this patch and see how the codegen is
affected on PowerPC.

>
>> This patch:
>>   * Renames the existing ANY_RETURN_P to ANY_PURE_RETURN_P.
>>   * Re-implements ANY_RETURN_P to consider a complex JUMP_INSN with a
>>     PARALLEL in the pattern.
>
> ANY_RETURN_P is used in many other places; have you checked them all?

I did a quick check on gcc/*.[ch] and think those places are safe, but I
missed gcc/config/*.

I will double check all of them.

>
>>   * Removes the side_effects_p check on the last insn in both bb1
>>     and bb2.  This is because, supposing bb1 and bb2 each contain
>>     the following single jump_insn and both fall through to EXIT_BB:
>
> <snip>
>
>>     cross jump optimization will try to change the jump_insn in bb1 into
>>     an unconditional jump to bb2; then we enter the next iteration of
>>     try_optimize_cfg, where it is a new "jump to return", so bb1 is
>>     optimized back into the above pattern, triggering another round of
>>     cross jump optimization; we end up in an infinite loop here.
>
> Why is it safe to remove the side_effects_p check?  Why was it there
> in the first place?

git blame shows the side_effects_p check has been there since the initial
cfg support was added (r45504, back in 2001).

My understanding is that it's there because the code wants to use
onlyjump_p and returnjump_p to cover simple jumps, but the author may
have found that onlyjump_p checks for side effects while returnjump_p
doesn't, so the side_effects_p check was added for the latter to match
the logic of onlyjump_p.

I found that onlyjump_p was introduced by r28584 back in 1999, where RTH
used it to do something like

  "/* If the jump insn has side effects, we can't kill the edge.  */"

That makes sense to me: it reads like killing an execution path, and if
there are side effects, it's not safe to do that.

But here for cross jumping, my understanding is that we are not killing
anything; instead, we are redirecting one path to the other when both
share the same instruction sequence.

So given the following instruction sequences, both ending with a jump to
dest where the jump has side effects, even if you redirect a jump to
insn A into a jump to insn A1, the side effects will still be executed.
I am assuming the "outgoing_edges_match/old_insns_match_p" checks, which
are done before "flow_find_cross_jump", make sure the side effects in
both sequences are identical.

     insn A                insn A1
     insn B                insn A2
     insn C                insn A3
     jump to dest      jump to dest
          \           /
           \         /
             dest

So I think removing the checks is safe; and if we don't remove them, we
trigger an endless optimization cycle after this patch, at least on ARM.

A complex return pattern is likely to have side effects, so it fails
these initial checks in flow_find_cross_jump; then the jump_insns in
both bb1 and bb2 fall through to the full comparison path, where they
are judged identical, counted into ninsns, and may trigger cross jump
optimization.


    bb 1                                bb 2

    (jump_insn                       (jump_insn
       (parallel [                      (parallel [
            (return)                         (return)
            (set/f                           (set/f
              (reg:SI 15 pc)                   (reg:SI 15 pc)
              (mem:SI                          (mem:SI
                (post_inc:SI                     (post_inc:SI
                (reg/f:SI 13 sp))                (reg/f:SI 13 sp))
        ])                               ])
     -> return)                        -> return)

                \                    /
                 \                  /
                  \                /
                   v              v
                        EXIT_BB

What I observed is that a new unconditional jump to bb 2 is generated;
that is another jump to a return, which is optimized into the same
pattern as above and then triggers cross jumping again.  The

  while (try_optimize_cfg (mode))
    {
    }

inside cleanup_cfg will never end, as the bbs keep changing.

>
>> --- a/gcc/cfgcleanup.c
>> +++ b/gcc/cfgcleanup.c
>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tm_p.h"
>>  #include "insn-config.h"
>>  #include "emit-rtl.h"
>> +#include "output.h"
>
> What is this include used for?  Ah, get_attr_min_length?

Yes.

>
>> @@ -2825,7 +2826,11 @@ try_optimize_cfg (int mode)
>>  	      rtx_insn *ret, *use;
>>  	      if (single_succ_p (b)
>>  		  && onlyjump_p (BB_END (b))
>> -		  && bb_is_just_return (single_succ (b), &ret, &use))
>> +		  && bb_is_just_return (single_succ (b), &ret, &use)
>> +		  && ((get_attr_min_length (ret)
>> +		       <= get_attr_min_length (BB_END (b)))
>> +		      || optimize_bb_for_speed_p (b)))
>
> This is for breaking the cycle, right?  It seems fragile.

It's not for breaking the cycle, it's for code size considerations only.
For example, under ARM Thumb-2, a pop might be either 2 bytes or 4 bytes
while an unconditional jump is 2 bytes, so the optimization from "b ->
pop" into "pop" might increase code size (replacing a 2-byte b with a
copy of a 4-byte pop grows that path by 2 bytes).

The removal of the "side_effects_p" check explained above is what breaks
the cycle.

>
>> --- a/gcc/jump.c
>> +++ b/gcc/jump.c
>> @@ -1437,7 +1437,35 @@ delete_for_peephole (rtx_insn *from, rtx_insn *to)
>>       since the peephole that replaces this sequence
>>       is also an unconditional jump in that case.  */
>>  }
>> -\f
>> +
>
> Unrelated change.
>
>> +/* If X is RETURN or SIMPLE_RETURN then return itself.  If X is PARALLEL, return
>> +   the contained RETURN or SIMPLE_RETURN if it's not a CALL_INSN, otherwise
>> +   return NULL_RTX.  */
>> +
>> +rtx
>> +return_in_jump (rtx x)
>
> Strange semantics, and the name does not capture the "call" semantics
> at all.  Maybe split out that part?  What is that part for, anyway?

Yeah, I am not good at English naming.  This function aims to handle
JUMP_INSN, not CALL_INSN, as I am seeing the following errors during
bootstrapping on arm:

  internal compiler error: RTL flag check: SIBLING_CALL_P used with
  unexpected rtx code 'jump_insn'...

  0xd4e9e4 rtl_check_failed_flag(char const*, rtx_def const*, char c...
	../../gcc-git-official/gcc/rtl.c:...
  0x14a6415 recog_...
	../../gcc-git-official/gcc/config/arm/arm.md: ...
  0x153443c recog(rtx_def*, rtx_def*, int*)

I think this is because an instruction with a CALL inside a PARALLEL
will be a CALL_INSN, while the instruction we want to optimize is a
JUMP_INSN; if we optimize them into one, the internal rtx structure will
break, so I want to skip instructions with a CALL inside the pattern.

This function is quite similar to returnjump_p, except that it returns
the contained "return/simple_return" instead of true/false, and it works
on PATTERN (insn) instead of an rtx_insn.
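
To illustrate the intended use, a minimal hypothetical caller sketch (not
part of the patch; "insn" here stands for an arbitrary JUMP_INSN):

  /* PATTERN (insn) may be a bare (simple_)return or a PARALLEL
     wrapping one; return_in_jump digs out the inner return rtx, and
     gives back NULL_RTX for a PARALLEL that contains a CALL.  */
  rtx ret = return_in_jump (PATTERN (insn));
  if (ret != NULL_RTX)
    /* Use the inner rtx, not the whole PARALLEL, as the jump label,
       as redirect_jump_2 does in the patch above.  */
    JUMP_LABEL (insn) = ret;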

>
>
> Segher

-- 
Regards,
Jiong

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
  2016-05-10 17:18 David Edelsohn
  2016-05-11  2:34 ` Jan Hubicka
@ 2016-05-11  2:34 ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2016-05-11  2:34 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Wilco Dijkstra, segher, Christophe Lyon, gcc-patches, nd, Jan Hubicka

Hi,
can you also send me the string/str(c)spn.c testcase you speak about?

Honza

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
  2016-05-10 17:18 David Edelsohn
@ 2016-05-11  2:34 ` Jan Hubicka
  2016-05-11  2:34 ` Jan Hubicka
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Hubicka @ 2016-05-11  2:34 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Wilco Dijkstra, segher, Christophe Lyon, gcc-patches, nd, Jan Hubicka

> >>>>> Wilco Dijkstra wrote:
> 
> > It relies on static branch probabilities, which are set completely
> > wrong in GCC, so it ends up optimizing the hot path in many functions
> > for size rather than speed and vice versa.
> 
> This sounds like the static branch prediction issue that we have been
> discussing with Honza.  Honza discussed plans to improve this
> infrastructure during this development cycle.

In fact I just started refactoring the profile updating code.
However, I do not see how it is going to help here, or what is wrong.
In the testcase:
void g(void);
int a;
int f(void)
{
  g();       
  if (a < 0x7ffffffe)  // or == 0 or != 0 or < 0
    return -1;
  a = 1;
  return 1;
}
triggers the negative return heuristic, which says that with 96% probability
the function will not return a negative constant (as those are usually error
states).  This is further combined with the default compare heuristic, but
that won't outvote this one, so return -1 stays the unlikely path and
return 1 the likely one.

With the opcode heuristic, we predict that comparisons a==b are false when b
is not 0.  We also predict that values are positive; this is what triggers
for the test a>0.

What is the particular problem with this?
Honza

> 
> - David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
@ 2016-05-10 17:18 David Edelsohn
  2016-05-11  2:34 ` Jan Hubicka
  2016-05-11  2:34 ` Jan Hubicka
  0 siblings, 2 replies; 25+ messages in thread
From: David Edelsohn @ 2016-05-10 17:18 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: segher, Christophe Lyon, gcc-patches, nd, Jan Hubicka

>>>>> Wilco Dijkstra wrote:

> It relies on static branch probabilities, which are set completely wrong
> in GCC, so it ends up optimizing the hot path in many functions for size
> rather than speed and vice versa.

This sounds like the static branch prediction issue that we have been
discussing with Honza.  Honza discussed plans to improve this
infrastructure during this development cycle.

- David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations
@ 2016-05-10 16:31 Wilco Dijkstra
  0 siblings, 0 replies; 25+ messages in thread
From: Wilco Dijkstra @ 2016-05-10 16:31 UTC (permalink / raw)
  To: segher, Christophe Lyon; +Cc: gcc-patches, nd

>> The new version does not seem better, as it adds a branch on the path
>> and it is not smaller.
>
> That looks like bb-reorder isn't doing its job?  Maybe it thinks that
> pop is too expensive to copy?

It relies on static branch probabilities, which are set completely wrong in
GCC, so it ends up optimizing the hot path in many functions for size rather
than speed and vice versa.  A simple example I tried on AArch64:

void g(void);
int a;
int f(void)
{
  g();       
  if (a == 0)  // or != 0 or < 0 or a < 0x7ffffffe
    return -1;
  a = 1;
  return 1;
}

The funny thing is that a == 0 and a != 0 behave in exactly the same way, but
a < 0 and a >= 0 are different.  However, a < C and a > C are always seen as
unlikely no matter the immediate, except for a > 0, which inlines the
return...

I also noticed that GCC ignores the explicit __builtin_expect used in the
string/str(c)spn.c implementations in GLIBC (which you need to avoid
incorrect block ordering), and not only inlines returns in the cold path but
also fails to inline them in the hot path...
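
For reference, the shape of the annotation I mean is something like this
(a minimal, hypothetical sketch, not the actual glibc code; the function
name is made up):

  #include <stddef.h>

  /* Mark the single-character case as the expected one, so its return
     stays on the hot, fall-through path and the general case is laid
     out as the cold path.  */
  size_t
  sketch_strspn (const char *s, const char *accept)
  {
    size_t n = 0;

    if (__builtin_expect (accept[0] != '\0' && accept[1] == '\0', 1))
      {
        /* Hot path: a one-character accept set.  */
        while (s[n] == accept[0])
          n++;
        return n;
      }

    /* Cold path: general accept set.  */
    for (; s[n] != '\0'; n++)
      {
        const char *a = accept;
        while (*a != '\0' && *a != s[n])
          a++;
        if (*a == '\0')
          return n;
      }
    return n;
  }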

Wilco


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-06-15 17:01 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-03  7:02 [PATCH 0/3] Simplify the simple_return handling Segher Boessenkool
2016-05-03  7:03 ` [PATCH 2/3] cfgcleanup: Fold jumps and conditional branches with returns Segher Boessenkool
2016-05-10 19:34   ` Christophe Lyon
2016-05-10 23:26     ` Segher Boessenkool
2016-05-11  8:17       ` Christophe Lyon
2016-05-03  7:03 ` [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations Segher Boessenkool
2016-05-09 13:54   ` Christophe Lyon
2016-05-09 15:08     ` Segher Boessenkool
2016-05-11 11:52       ` Jiong Wang
2016-06-14 14:54         ` [Patch, cfg] Improve jump to return optimization for complex return Jiong Wang
2016-06-14 20:30           ` Segher Boessenkool
2016-06-15 17:01             ` Jiong Wang
2016-05-03  7:03 ` [PATCH 1/3] cfgcleanup: Bugfix in try_simplify_condjump Segher Boessenkool
2016-05-03 20:29   ` Steven Bosscher
2016-05-03 21:09     ` Segher Boessenkool
2016-05-03 12:53 ` [PATCH 0/3] Simplify the simple_return handling Bernd Schmidt
2016-05-03 13:31   ` Segher Boessenkool
2016-05-03 13:46     ` Bernd Schmidt
2016-05-04  0:10   ` Segher Boessenkool
2016-05-04 12:37     ` Bernd Schmidt
2016-05-03 15:03 ` Kyrill Tkachov
2016-05-10 16:31 [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations Wilco Dijkstra
2016-05-10 17:18 David Edelsohn
2016-05-11  2:34 ` Jan Hubicka
2016-05-11  2:34 ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).