public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* PowerPC shrink-wrap support 0 of 3
@ 2011-09-17  7:59 Alan Modra
  2011-09-17  8:22 ` PowerPC shrink-wrap support 1 " Alan Modra
                   ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Alan Modra @ 2011-09-17  7:59 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt, David Edelsohn

This patch series adds shrink-wrap support for PowerPC.  The patches
are on top of Bernd's "Initial shrink-wrapping patch":
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
The spec results were a little disappointing as I expected to see some
gains, but my baseline was a -O3 run and I suppose most of the
shrink-wrap opportunities were lost to inlining.

I deliberately omitted defining RETURN_ADDR_REGNUM for PowerPC as I
don't see why it is necessary to treat LR any differently to other
regs saved by the prologue.  I believe code in requires_stack_frame_p
  CLEAR_HARD_REG_SET (hardregs);
  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
  if (hard_reg_set_intersect_p (hardregs, prologue_used))
    return true;
will correctly pick up any set of LR.  If code in a function body sets
LR without saving and restoring around the use, then the prologue must
save LR, so LR will be in prologue_used.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* PowerPC shrink-wrap support 1 of 3
  2011-09-17  7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra
@ 2011-09-17  8:22 ` Alan Modra
  2011-09-17  9:13 ` PowerPC shrink-wrap support 2 " Alan Modra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: Alan Modra @ 2011-09-17  8:22 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt

This obvious patch extends my 2011-08-03 fix to simple_return.
Necessary for the same reason as the original patch, namely that
rs6000.md has a peephole2 that matches and recreates a conditional
jump.  With the third patch in this series applied, rs6000 will start
to use simple_return in conditional jumps.  Even though I think this
patch meets the "obvious" requirement, I won't apply it without the
other two since the assert may trigger on mips without the second
patch in this series.

	PR rtl-optimization/49941
	* jump.c (mark_jump_label_1): Set JUMP_LABEL for simple_return jumps.

diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/jump.c gcc-current/gcc/jump.c
--- gcc-bernd/gcc/jump.c	2011-09-16 11:52:14.000000000 +0930
+++ gcc-current/gcc/jump.c	2011-09-16 09:59:39.000000000 +0930
@@ -1086,6 +1086,7 @@ mark_jump_label_1 (rtx x, rtx insn, bool
       return;
 
     case RETURN:
+    case SIMPLE_RETURN:
       if (is_target)
 	{
 	  gcc_assert (JUMP_LABEL (insn) == NULL || JUMP_LABEL (insn) == x);
@@ -1408,7 +1409,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
   int i;
   const char *fmt;
 
-      if ((code == LABEL_REF && XEXP (x, 0) == olabel)
+  if ((code == LABEL_REF && XEXP (x, 0) == olabel)
       || x == olabel)
     {
       x = redirect_target (nlabel);

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* PowerPC shrink-wrap support 2 of 3
  2011-09-17  7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra
  2011-09-17  8:22 ` PowerPC shrink-wrap support 1 " Alan Modra
@ 2011-09-17  9:13 ` Alan Modra
  2011-09-17  9:26 ` PowerPC shrink-wrap support 3 " Alan Modra
  2011-09-17 18:16 ` PowerPC shrink-wrap support 0 of 3 Bernd Schmidt
  3 siblings, 0 replies; 48+ messages in thread
From: Alan Modra @ 2011-09-17  9:13 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt

This patch corrects some cases where a return jump jump_label wrongly
uses ret_rtx with a simple_return, thus triggering the assert in
mark_jump_label_1.  PowerPC doesn't need this patch with the final
version of my shrink wrap support, but at one stage I was using

(define_expand "return"
  [(simple_return)]
  "direct_return ()")

(define_insn "simple_return"
  [(simple_return)]
  ""
  "{br|blr}"
  [(set_attr "type" "jmpreg")])

in rs6000.md, similar to mips.md.  So gen_return() would create an
instruction using simple_return, and the code patched here would put
ret_rtx into JUMP_LABEL.

	* rtl.h (set_return_jump_label): Declare.
	* function.c (set_return_jump_label): New function, extracted..
	(thread_prologue_and_epilogue_insns): ..from here.  Use it in
	another instance to set return jump_label.
	* cfgrtl.c (force_nonfallthru_and_redirect): Use set_return_jump_label.
	* reorg.c (find_end_label): Likewise.

diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/rtl.h gcc-current/gcc/rtl.h
--- gcc-bernd/gcc/rtl.h	2011-09-16 11:52:14.000000000 +0930
+++ gcc-current/gcc/rtl.h	2011-09-16 00:24:55.000000000 +0930
@@ -2502,6 +2502,7 @@ extern int sibcall_epilogue_contains (co
 extern void mark_temp_addr_taken (rtx);
 extern void update_temp_slot_address (rtx, rtx);
 extern void maybe_copy_prologue_epilogue_insn (rtx, rtx);
+extern void set_return_jump_label (rtx);
 
 /* In stmt.c */
 extern void expand_null_return (void);
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/function.c gcc-current/gcc/function.c
--- gcc-bernd/gcc/function.c	2011-09-16 11:53:40.000000000 +0930
+++ gcc-current/gcc/function.c	2011-09-16 00:24:55.000000000 +0930
@@ -5395,6 +5395,20 @@ emit_return_into_block (bool simple_p, b
 }
 #endif
 
+/* Set JUMP_LABEL for a return insn.  */
+
+void
+set_return_jump_label (rtx returnjump)
+{
+  rtx pat = PATTERN (returnjump);
+  if (GET_CODE (pat) == PARALLEL)
+    pat = XVECEXP (pat, 0, 0);
+  if (ANY_RETURN_P (pat))
+    JUMP_LABEL (returnjump) = pat;
+  else
+    JUMP_LABEL (returnjump) = ret_rtx;
+}
+
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
    the epilogue begins.  Update the basic block information when possible.
@@ -5883,7 +5897,7 @@ thread_prologue_and_epilogue_insns (void
 	  emit_return_into_block (false, last_bb);
 	  epilogue_end = BB_END (last_bb);
 	  if (JUMP_P (epilogue_end))
-	    JUMP_LABEL (epilogue_end) = ret_rtx;
+	    set_return_jump_label (epilogue_end);
 	  single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU;
 	  goto epilogue_done;
 	}
@@ -5948,15 +5962,7 @@ thread_prologue_and_epilogue_insns (void
       inserted = true;
 
       if (JUMP_P (returnjump))
-	{
-	  rtx pat = PATTERN (returnjump);
-	  if (GET_CODE (pat) == PARALLEL)
-	    pat = XVECEXP (pat, 0, 0);
-	  if (ANY_RETURN_P (pat))
-	    JUMP_LABEL (returnjump) = pat;
-	  else
-	    JUMP_LABEL (returnjump) = ret_rtx;
-	}
+	set_return_jump_label (returnjump);
     }
   else
 #endif
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/cfgrtl.c gcc-current/gcc/cfgrtl.c
--- gcc-bernd/gcc/cfgrtl.c	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/cfgrtl.c	2011-09-16 09:55:38.000000000 +0930
@@ -1273,7 +1273,7 @@ force_nonfallthru_and_redirect (edge e, 
 	  gcc_unreachable ();
 #endif
 	}
-      JUMP_LABEL (BB_END (jump_block)) = jump_label;
+      set_return_jump_label (BB_END (jump_block));
     }
   else
     {
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/reorg.c gcc-current/gcc/reorg.c
--- gcc-bernd/gcc/reorg.c	2011-09-16 11:52:12.000000000 +0930
+++ gcc-current/gcc/reorg.c	2011-09-16 00:07:46.000000000 +0930
@@ -467,7 +467,7 @@ find_end_label (rtx kind)
 	      /* The return we make may have delay slots too.  */
 	      rtx insn = gen_return ();
 	      insn = emit_jump_insn (insn);
-	      JUMP_LABEL (insn) = ret_rtx;
+	      set_return_jump_label (insn);
 	      emit_barrier ();
 	      if (num_delay_slots (insn) > 0)
 		obstack_ptr_grow (&unfilled_slots_obstack, insn);

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* PowerPC shrink-wrap support 3 of 3
  2011-09-17  7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra
  2011-09-17  8:22 ` PowerPC shrink-wrap support 1 " Alan Modra
  2011-09-17  9:13 ` PowerPC shrink-wrap support 2 " Alan Modra
@ 2011-09-17  9:26 ` Alan Modra
  2011-09-26 14:25   ` Alan Modra
  2011-09-17 18:16 ` PowerPC shrink-wrap support 0 of 3 Bernd Schmidt
  3 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-17  9:26 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt, David Edelsohn

Finally, the powerpc backend changes.  These are mostly just
mechanical.  I'll note that we need both "simple_return" and "return"
variants of the conditional returns because they can only be used when
no epilogue is required.  The "return" variant must use
direct_return() as a predicate to check this requirement.  The
"simple_return" variant is used by the shrink_wrap optimization to
enable a conditional return before the function prologue, and should
*not* use direct_return() because the optimization is valid for
functions that do require an epilogue.  All other instructions in the
rs6000 backend that currently use "return" are generated from the
epilogue or are sibling calls, so "simple_return" is appropriate as
per md.texi.

	* config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
	declaration.
	(rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
	simple_return in pattern, emit instruction, and set jump_label.
	(rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
	simple_return rather than return.
	(rs6000_output_mi_thunk): Use simple_return rather than return.
	* config/rs6000/altivec.md (restore_world): Likewise.
	* config/rs6000/spe.md (return_and_restore_gpregs_spe): Likewise.
	* config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
	(return_internal*, return_and_restore*): Likewise.
	(any_return, return_pred, return_str): New iterators.
	(return, conditional return insns): Provide both return and
	simple_return variants.

diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/rs6000.c gcc-current/gcc/config/rs6000/rs6000.c
--- gcc-bernd/gcc/config/rs6000/rs6000.c	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.c	2011-09-15 22:25:50.000000000 +0930
@@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
-static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
-				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -19729,10 +19764,11 @@ rs6000_emit_stack_reset (rs6000_stack_t 
 }
 
 /* Construct a parallel rtx describing the effect of a call to an
-   out-of-line register save/restore routine.  */
+   out-of-line register save/restore routine, and emit the insn
+   or jump_insn as appropriate.  */
 
 static rtx
-rs6000_make_savres_rtx (rs6000_stack_t *info,
+rs6000_emit_savres_rtx (rs6000_stack_t *info,
 			rtx frame_reg_rtx, int save_area_offset,
 			enum machine_mode reg_mode,
 			bool savep, bool gpr, bool lr)
@@ -19742,6 +19778,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   int reg_size = GET_MODE_SIZE (reg_mode);
   rtx sym;
   rtvec p;
+  rtx par, insn;
 
   offset = 0;
   start_reg = (gpr
@@ -19752,7 +19789,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   p = rtvec_alloc ((lr ? 4 : 3) + n_regs);
 
   if (!savep && lr)
-    RTVEC_ELT (p, offset++) = ret_rtx;
+    RTVEC_ELT (p, offset++) = simple_return_rtx;
 
   RTVEC_ELT (p, offset++)
     = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
@@ -19788,7 +19825,16 @@ rs6000_make_savres_rtx (rs6000_stack_t *
       RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg);
     }
 
-  return gen_rtx_PARALLEL (VOIDmode, p);
+  par = gen_rtx_PARALLEL (VOIDmode, p);
+
+  if (!savep && lr)
+    {
+      insn = emit_jump_insn (par);
+      JUMP_LABEL (insn) = simple_return_rtx;
+    }
+  else
+    insn = emit_insn (par);
+  return insn;
 }
 
 /* Determine whether the gp REG is really used.  */
@@ -20087,16 +20133,13 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info) && info->first_fp_reg_save != 64)
     {
-      rtx par;
-
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->fp_save_offset + sp_offset,
-				    DFmode,
-				    /*savep=*/true, /*gpr=*/false,
-				    /*lr=*/(strategy
-					    & SAVE_NOINLINE_FPRS_SAVES_LR)
-					   != 0);
-      insn = emit_insn (par);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->fp_save_offset + sp_offset,
+				     DFmode,
+				     /*savep=*/true, /*gpr=*/false,
+				     /*lr=*/((strategy
+					      & SAVE_NOINLINE_FPRS_SAVES_LR)
+					     != 0));
       rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 			    NULL_RTX, NULL_RTX);
     }
@@ -20186,13 +20229,10 @@ rs6000_emit_prologue (void)
 	}
       else
 	{
-	  rtx par;
-
-	  par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11),
-					0, reg_mode,
-					/*savep=*/true, /*gpr=*/true,
-					/*lr=*/false);
-	  insn = emit_insn (par);
+	  insn = rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11),
+					 0, reg_mode,
+					 /*savep=*/true, /*gpr=*/true,
+					 /*lr=*/false);
 	  rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 				NULL_RTX, NULL_RTX);
 	}
@@ -20204,8 +20244,6 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info) && !saving_GPRs_inline)
     {
-      rtx par;
-
       /* Need to adjust r11 (r12) if we saved any FPRs.  */
       if (info->first_fp_reg_save != 64)
         {
@@ -20216,14 +20254,13 @@ rs6000_emit_prologue (void)
 	  emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, offset));
         }
 
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->gp_save_offset + sp_offset,
-				    reg_mode,
-				    /*savep=*/true, /*gpr=*/true,
-				    /*lr=*/(strategy
-					    & SAVE_NOINLINE_GPRS_SAVES_LR)
-					   != 0);
-      insn = emit_insn (par);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->gp_save_offset + sp_offset,
+				     reg_mode,
+				     /*savep=*/true, /*gpr=*/true,
+				     /*lr=*/((strategy
+					      & SAVE_NOINLINE_GPRS_SAVES_LR)
+					     != 0));
       rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 			    NULL_RTX, NULL_RTX);
     }
@@ -20750,7 +20805,7 @@ rs6000_emit_epilogue (int sibcall)
       alloc_rname = ggc_strdup (rname);
 
       j = 0;
-      RTVEC_ELT (p, j++) = ret_rtx;
+      RTVEC_ELT (p, j++) = simple_return_rtx;
       RTVEC_ELT (p, j++) = gen_rtx_USE (VOIDmode,
 					gen_rtx_REG (Pmode,
 						     LR_REGNO));
@@ -21165,13 +21214,11 @@ rs6000_emit_epilogue (int sibcall)
 	}
       else
 	{
-	  rtx par;
+	  rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11),
+				  0, reg_mode,
+				  /*savep=*/false, /*gpr=*/true,
+				  /*lr=*/true);
 
-	  par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11),
-					0, reg_mode,
-					/*savep=*/false, /*gpr=*/true,
-					/*lr=*/true);
-	  emit_jump_insn (par);
 	  /* We don't want anybody else emitting things after we jumped
 	     back.  */
 	  return;
@@ -21181,12 +21228,22 @@ rs6000_emit_epilogue (int sibcall)
     {
       /* We are jumping to an out-of-line function.  */
       bool can_use_exit = info->first_fp_reg_save == 64;
-      rtx par;
 
       /* Emit stack reset code if we need it.  */
       if (can_use_exit)
-	rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx,
-				 sp_offset, can_use_exit);
+	{
+	  rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx,
+				   sp_offset, can_use_exit);
+	  if (info->cr_save_p)
+	    {
+	      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
+	      if (DEFAULT_ABI == ABI_V4)
+		cfa_restores
+		  = alloc_reg_note (REG_CFA_RESTORE,
+				    gen_rtx_REG (SImode, CR2_REGNO),
+				    cfa_restores);
+	    }
+	}
       else
 	{
 	  emit_insn (gen_add3_insn (gen_rtx_REG (Pmode, DEFAULT_ABI == ABI_AIX
@@ -21197,31 +21254,18 @@ rs6000_emit_epilogue (int sibcall)
 	    sp_offset += info->fp_size;
 	}
 
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->gp_save_offset, reg_mode,
-				    /*savep=*/false, /*gpr=*/true,
-				    /*lr=*/can_use_exit);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->gp_save_offset, reg_mode,
+				     /*savep=*/false, /*gpr=*/true,
+				     /*lr=*/can_use_exit);
 
       if (can_use_exit)
 	{
-	  if (info->cr_save_p)
-	    {
-	      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-	      if (DEFAULT_ABI == ABI_V4)
-		cfa_restores
-		  = alloc_reg_note (REG_CFA_RESTORE,
-				    gen_rtx_REG (SImode, CR2_REGNO),
-				    cfa_restores);
-	    }
-
-	  emit_jump_insn (par);
-
 	  /* We don't want anybody else emitting things after we jumped
 	     back.  */
 	  return;
 	}
 
-      insn = emit_insn (par);
       if (DEFAULT_ABI == ABI_V4)
 	{
 	  if (frame_pointer_needed)
@@ -21366,7 +21410,7 @@ rs6000_emit_epilogue (int sibcall)
       else
 	p = rtvec_alloc (2);
 
-      RTVEC_ELT (p, 0) = ret_rtx;
+      RTVEC_ELT (p, 0) = simple_return_rtx;
       RTVEC_ELT (p, 1) = ((restoring_FPRs_inline || !lr)
 			  ? gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 65))
 			  : gen_rtx_CLOBBER (VOIDmode,
@@ -21768,7 +21812,7 @@ rs6000_output_mi_thunk (FILE *file, tree
 			gen_rtx_USE (VOIDmode,
 				     gen_rtx_REG (SImode,
 						  LR_REGNO)),
-			ret_rtx)));
+			simple_return_rtx)));
   SIBLING_CALL_P (insn) = 1;
   emit_barrier ();
 
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/altivec.md gcc-current/gcc/config/rs6000/altivec.md
--- gcc-bernd/gcc/config/rs6000/altivec.md	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/altivec.md	2011-09-15 18:28:43.000000000 +0930
@@ -306,7 +306,7 @@
 
 (define_insn "*restore_world"
  [(match_parallel 0 "restore_world_operation"
-                  [(return)
+                  [(simple_return)
 		   (use (reg:SI 65))
                    (use (match_operand:SI 1 "call_operand" "s"))
                    (clobber (match_operand:SI 2 "gpc_reg_operand" "=r"))])]
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/rs6000.md gcc-current/gcc/config/rs6000/rs6000.md
--- gcc-bernd/gcc/config/rs6000/rs6000.md	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.md	2011-09-16 11:32:31.000000000 +0930
@@ -264,6 +265,12 @@
 ; Iterator for just SF/DF
 (define_mode_iterator SFDF [SF DF])
 
+; Conditional returns.
+(define_code_iterator any_return [return simple_return])
+(define_code_attr return_pred [(return "direct_return ()")
+			       (simple_return "")])
+(define_code_attr return_str [(return "") (simple_return "simple_")])
+
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
 (define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d")])
@@ -12814,7 +12831,7 @@
 		    (match_operand 1 "" ""))
 	      (use (match_operand 2 "" ""))
 	      (use (reg:SI LR_REGNO))
-	      (return)])]
+	      (simple_return)])]
   ""
   "
 {
@@ -12838,7 +12855,7 @@
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:SI 2 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(INTVAL (operands[2]) & CALL_LONG) == 0"
   "*
 {
@@ -12858,7 +12875,7 @@
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:SI 2 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "TARGET_64BIT && (INTVAL (operands[2]) & CALL_LONG) == 0"
   "*
 {
@@ -12879,7 +12896,7 @@
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:SI 3 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(INTVAL (operands[3]) & CALL_LONG) == 0"
   "*
 {
@@ -12901,7 +12918,7 @@
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:SI 3 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "TARGET_64BIT && (INTVAL (operands[3]) & CALL_LONG) == 0"
   "*
 {
@@ -12921,7 +12938,7 @@
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:SI 2 "immediate_operand" "O,O"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "DEFAULT_ABI == ABI_AIX
    && (INTVAL (operands[2]) & CALL_LONG) == 0"
   "@
@@ -12936,7 +12953,7 @@
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:SI 3 "immediate_operand" "O,O"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "DEFAULT_ABI == ABI_AIX
    && (INTVAL (operands[3]) & CALL_LONG) == 0"
   "@
@@ -12950,7 +12967,7 @@
 	 (match_operand 1 "" ""))
    (use (match_operand 2 "immediate_operand" "O,n,O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(DEFAULT_ABI == ABI_DARWIN
     || DEFAULT_ABI == ABI_V4)
    && (INTVAL (operands[2]) & CALL_LONG) == 0"
@@ -12981,7 +12998,7 @@
 		      (match_operand 2 "" "")))
 	      (use (match_operand 3 "" ""))
 	      (use (reg:SI LR_REGNO))
-	      (return)])]
+	      (simple_return)])]
   ""
   "
 {
@@ -13002,7 +13019,7 @@
 	      (match_operand 2 "" "")))
    (use (match_operand:SI 3 "immediate_operand" "O,n,O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(DEFAULT_ABI == ABI_DARWIN
     || DEFAULT_ABI == ABI_V4)
    && (INTVAL (operands[3]) & CALL_LONG) == 0"
@@ -15328,9 +15345,9 @@
 				      [(match_operand 1
 						      "cc_reg_operand" "y")
 				       (const_int 0)])
-		      (return)
+		      (any_return)
 		      (pc)))]
-  "direct_return ()"
+  "<return_pred>"
   "*
 {
   return output_cbranch (operands[0], NULL, 0, insn);
@@ -15360,8 +15377,8 @@
 						      "cc_reg_operand" "y")
 				       (const_int 0)])
 		      (pc)
-		      (return)))]
-  "direct_return ()"
+		      (any_return)))]
+  "<return_pred>"
   "*
 {
   return output_cbranch (operands[0], NULL, 1, insn);
@@ -15491,9 +15508,9 @@
   "b %l0"
   [(set_attr "type" "branch")])
 
-(define_insn "return"
-  [(return)]
-  "direct_return ()"
+(define_insn "<return_str>return"
+  [(any_return)]
+  "<return_pred>"
   "{br|blr}"
   [(set_attr "type" "jmpreg")])
 
@@ -16015,7 +16032,7 @@
    (set_attr "cell_micro" "always")])
 
 (define_insn "*return_internal_<mode>"
-  [(return)
+  [(simple_return)
    (use (match_operand:P 0 "register_operand" "lc"))]
   ""
   "b%T0"
@@ -16077,7 +16094,7 @@
 
 (define_insn "*return_and_restore_gpregs_<mode>_r11"
  [(match_parallel 0 "any_parallel_operand"
-                  [(return)
+                  [(simple_return)
 		   (clobber (match_operand:P 1 "register_operand" "=l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
                    (use (reg:P 11))
@@ -16090,7 +16107,7 @@
 
 (define_insn "*return_and_restore_gpregs_<mode>_r12"
  [(match_parallel 0 "any_parallel_operand"
-                  [(return)
+                  [(simple_return)
 		   (clobber (match_operand:P 1 "register_operand" "=l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
                    (use (reg:P 12))
@@ -16103,7 +16120,7 @@
 
 (define_insn "*return_and_restore_gpregs_<mode>_r1"
  [(match_parallel 0 "any_parallel_operand"
-                  [(return)
+                  [(simple_return)
 		   (clobber (match_operand:P 1 "register_operand" "=l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
                    (use (reg:P 1))
@@ -16116,7 +16133,7 @@
 
 (define_insn "*return_and_restore_fpregs_<mode>_r11"
  [(match_parallel 0 "any_parallel_operand"
-                  [(return)
+                  [(simple_return)
 		   (clobber (match_operand:P 1 "register_operand" "=l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
                    (use (reg:P 11))
@@ -16129,7 +16146,7 @@
 
 (define_insn "*return_and_restore_fpregs_<mode>_r12"
  [(match_parallel 0 "any_parallel_operand"
-                  [(return)
+                  [(simple_return)
 		   (clobber (match_operand:P 1 "register_operand" "=l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
                    (use (reg:P 12))
@@ -16142,7 +16159,7 @@
 
 (define_insn "*return_and_restore_fpregs_<mode>_r1"
  [(match_parallel 0 "any_parallel_operand"
-                  [(return)
+                  [(simple_return)
 		   (clobber (match_operand:P 1 "register_operand" "=l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
                    (use (reg:P 1))
@@ -16155,7 +16172,7 @@
 
 (define_insn "*return_and_restore_fpregs_aix_<mode>_r11"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
 		   (use (match_operand:P 1 "register_operand" "l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
 		   (use (reg:P 11))
@@ -16168,7 +16185,7 @@
 
 (define_insn "*return_and_restore_fpregs_aix_<mode>_r1"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
 		   (use (match_operand:P 1 "register_operand" "l"))
 		   (use (match_operand:P 2 "symbol_ref_operand" "s"))
 		   (use (reg:P 1))
diff -urp -x.svn -x'*~' -x'*.orig' gcc-bernd/gcc/config/rs6000/spe.md gcc-current/gcc/config/rs6000/spe.md
--- gcc-bernd/gcc/config/rs6000/spe.md	2011-09-16 11:52:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/spe.md	2011-09-15 18:28:43.000000000 +0930
@@ -3178,7 +3178,7 @@
 
 (define_insn "*return_and_restore_gpregs_spe"
  [(match_parallel 0 "any_parallel_operand"
-		  [(return)
+		  [(simple_return)
 		   (clobber (reg:P 65))
 		   (use (match_operand:P 1 "symbol_ref_operand" "s"))
 		   (use (reg:P 11))

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 0 of 3
  2011-09-17  7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra
                   ` (2 preceding siblings ...)
  2011-09-17  9:26 ` PowerPC shrink-wrap support 3 " Alan Modra
@ 2011-09-17 18:16 ` Bernd Schmidt
  2011-09-19  5:39   ` Alan Modra
  3 siblings, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-09-17 18:16 UTC (permalink / raw)
  To: gcc-patches, David Edelsohn

On 09/17/11 09:16, Alan Modra wrote:
> This patch series adds shrink-wrap support for PowerPC.  The patches
> are on top of Bernd's "Initial shrink-wrapping patch":
> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
> tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
> tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
> The spec results were a little disappointing as I expected to see some
> gains, but my baseline was a -O3 run and I suppose most of the
> shrink-wrap opportunities were lost to inlining.

The last posted version had a bug that crept in during the review cycle,
and which made it quite ineffective. Find this block
          /* As a special case, check for jumps to the last bb that
             cannot successfully be converted to simple_returns later
             on, and mark them as requiring a frame.  These are
             conditional jumps that jump to their fallthru block, so
             it's not a case that is expected to occur often.  */

and replace "condjump_p" in the if statement with "any_condjump_p".

There'll be further patches which increase the number of shrink-wrapping
opportunities we get to see. One of them was submitted earlier:

 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html
 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02191.html


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 0 of 3
  2011-09-17 18:16 ` PowerPC shrink-wrap support 0 of 3 Bernd Schmidt
@ 2011-09-19  5:39   ` Alan Modra
  2011-09-19 13:36     ` Bernd Schmidt
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-19  5:39 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, David Edelsohn

On Sat, Sep 17, 2011 at 03:26:21PM +0200, Bernd Schmidt wrote:
> On 09/17/11 09:16, Alan Modra wrote:
> > This patch series adds shrink-wrap support for PowerPC.  The patches
> > are on top of Bernd's "Initial shrink-wrapping patch":
> > http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
> > tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
> > tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
> > The spec results were a little disappointing as I expected to see some
> > gains, but my baseline was a -O3 run and I suppose most of the
> > shrink-wrap opportunities were lost to inlining.
> 
> The last posted version had a bug that crept in during the review cycle,
> and which made it quite ineffective.

I wasn't complaining!  My disappointment really stemmed from having
unrealistically high expectations.  I still think this optimization is
a great feature.  Thanks for contributing it!

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 0 of 3
  2011-09-19  5:39   ` Alan Modra
@ 2011-09-19 13:36     ` Bernd Schmidt
       [not found]       ` <20110921152851.GE10321@bubble.grove.modra.org>
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-09-19 13:36 UTC (permalink / raw)
  To: gcc-patches, David Edelsohn

On 09/19/11 02:16, Alan Modra wrote:
> On Sat, Sep 17, 2011 at 03:26:21PM +0200, Bernd Schmidt wrote:
>> On 09/17/11 09:16, Alan Modra wrote:
>>> This patch series adds shrink-wrap support for PowerPC.  The patches
>>> are on top of Bernd's "Initial shrink-wrapping patch":
>>> http://gcc.gnu.org/ml/gcc-patches/2011-08/msg02557.html, but with the
>>> tm.texi patch applied to tm.texi.in.  Bootstrapped and regression
>>> tested powerpc64-linux all langs except ada, and spec CPU2006 tested.
>>> The spec results were a little disappointing as I expected to see some
>>> gains, but my baseline was a -O3 run and I suppose most of the
>>> shrink-wrap opportunities were lost to inlining.
>>
>> The last posted version had a bug that crept in during the review cycle,
>> and which made it quite ineffective.
> 
> I wasn't complaining!

I wasn't implying you were :) I was just hoping to see some good
benchmark results too, so that I can feel that all the pain was actually
worth it...


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-17  9:26 ` PowerPC shrink-wrap support 3 " Alan Modra
@ 2011-09-26 14:25   ` Alan Modra
  2011-09-27  0:15     ` Alan Modra
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-26 14:25 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt, David Edelsohn

This patch fixes an issue that limit opportunities for shrink-wrapping
on PowerPC.  The rs6000 REG_ALLOC_ORDER chooses r0 as the very first
gpr to use in code, with r11 also having high priority.  This means it
is quite likely that r0 or r11 is live on the edge chosen for
shrink-wrapping.  That's unfortunate since rs6000 uses r0, r11, and
r12 as temporaries in the prologue, and thus can't insert the prologue
on that edge.

I avoid this situation in most cases by simply changing the register
allocation order, making r0 and r11 the last of the volatile gprs to
be used.  I also noticed that r12 is not used until after the
non-volatile regs are allocated.  According to the comment, the reason
this is done is to avoid putting a DI into the r12,r13 pair (and as
r13 is non-volatile would mean saving and restoring all the
non-volatile gprs when not inlining the saves/restores).  However,
when r13 is a fixed reg this cannot occur, so moving r12 before the
non-volatile regs makes another reg available before we need to start
saving regs.

After making the register allocation order changes I found gcc
bootstrap failed.  This turned out to be caused by shrink-wrapping
and register renaming moving some pre-prologue code after the
epilogue.  To support this we need to ensure no prologue unwind info
is left active after the epilogue.

Bootstrapped and regression tested powerpc-linux.  Two regressions
appeared due to a problem in the shrink-wrap code.  More on that
later.

	* config/rs6000/rs6000.c (rs6000_emit_epilogue): Emit cfa_restores
	when flag_shrink_wrap.
	* gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
	(REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
	Move r11 and r0 later to suit shrink-wrapping.

diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/config/rs6000/rs6000.c gcc-current/gcc/config/rs6000/rs6000.c
--- gcc-alanshrink1/gcc/config/rs6000/rs6000.c	2011-09-26 15:51:19.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.c	2011-09-23 11:54:00.000000000 +0930
@@ -21001,7 +21045,7 @@ rs6000_emit_epilogue (int sibcall)
 
 	    reg = gen_rtx_REG (V4SImode, i);
 	    emit_move_insn (reg, mem);
-	    if (DEFAULT_ABI == ABI_V4)
+	    if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	      cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
 					     cfa_restores);
 	  }
@@ -21052,10 +21096,16 @@ rs6000_emit_epilogue (int sibcall)
     }
 
   /* Set LR here to try to overlap restores below.  LR is always saved
-     above incoming stack, so it never needs REG_CFA_RESTORE.  */
+     above incoming stack, so it never needs REG_CFA_RESTORE except
+     when shrink-wrapping as in that case we may have blocks from
+     before the prologue being moved after the epilogue.  */
   if (restore_lr && restoring_GPRs_inline)
-    emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO),
-		    gen_rtx_REG (Pmode, 0));
+    {
+      rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
+      emit_move_insn (lr, gen_rtx_REG (Pmode, 0));
+      if (flag_shrink_wrap)
+	cfa_restores = alloc_reg_note (REG_CFA_RESTORE, lr, cfa_restores);
+    }
 
   /* Load exception handler data registers, if needed.  */
   if (crtl->calls_eh_return)
@@ -21146,7 +21196,7 @@ rs6000_emit_epilogue (int sibcall)
 		reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
 		insn = emit_move_insn (reg, mem);
-		if (DEFAULT_ABI == ABI_V4)
+		if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 		  {
 		    if (frame_pointer_needed
 			&& info->first_gp_reg_save + i
@@ -21188,7 +21238,7 @@ rs6000_emit_epilogue (int sibcall)
 	  if (info->cr_save_p)
 	    {
 	      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-	      if (DEFAULT_ABI == ABI_V4)
+	      if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 		cfa_restores
 		  = alloc_reg_note (REG_CFA_RESTORE,
 				    gen_rtx_REG (SImode, CR2_REGNO),
@@ -21217,7 +21267,7 @@ rs6000_emit_epilogue (int sibcall)
 	  return;
 	}
 
-      if (DEFAULT_ABI == ABI_V4)
+      if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	{
 	  if (frame_pointer_needed)
 	    {
@@ -21246,7 +21296,7 @@ rs6000_emit_epilogue (int sibcall)
 	  rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
 	  RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
 					   cfa_restores);
 	}
@@ -21271,7 +21321,7 @@ rs6000_emit_epilogue (int sibcall)
 	    rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
 	    insn = emit_move_insn (reg, mem);
-	    if (DEFAULT_ABI == ABI_V4)
+	    if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	      {
 	        if (frame_pointer_needed
 		    && info->first_gp_reg_save + i
@@ -21292,10 +21342,12 @@ rs6000_emit_epilogue (int sibcall)
     {
       rtx mem = gen_frame_mem_offset (Pmode, frame_reg_rtx,
 				     info->lr_save_offset + sp_offset);
+      rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
 
       emit_move_insn (gen_rtx_REG (Pmode, 0), mem);
-      emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO),
-		      gen_rtx_REG (Pmode, 0));
+      emit_move_insn (lr, gen_rtx_REG (Pmode, 0));
+      if (flag_shrink_wrap)
+	cfa_restores = alloc_reg_note (REG_CFA_RESTORE, lr, cfa_restores);
     }
 
   /* Restore fpr's if we need to do it without calling a function.  */
@@ -21316,7 +21368,7 @@ rs6000_emit_epilogue (int sibcall)
 			     info->first_fp_reg_save + i);
 
  	  emit_move_insn (reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
 					   cfa_restores);
 	}
@@ -21325,7 +21377,7 @@ rs6000_emit_epilogue (int sibcall)
   if (info->cr_save_p)
     {
       rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-      if (DEFAULT_ABI == ABI_V4)
+      if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	cfa_restores
 	  = alloc_reg_note (REG_CFA_RESTORE, gen_rtx_REG (SImode, CR2_REGNO),
 			    cfa_restores);
diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/config/rs6000/rs6000.h gcc-current/gcc/config/rs6000/rs6000.h
--- gcc-alanshrink1/gcc/config/rs6000/rs6000.h	2011-09-26 15:29:15.000000000 +0930
+++ gcc-current/gcc/config/rs6000/rs6000.h	2011-09-22 02:30:15.000000000 +0930
@@ -894,10 +894,11 @@ extern unsigned rs6000_pointer_size;
 	cr1		(not saved, but used for FP operations)
 	cr0		(not saved, but used for arithmetic operations)
 	cr4, cr3, cr2	(saved)
-	r0		(not saved; cannot be base reg)
 	r9		(not saved; best for TImode)
-	r11, r10, r8-r4	(not saved; highest used first to make less conflict)
+	r10, r8-r4	(not saved; highest first for less conflict with params)
 	r3		(not saved; return value register)
+	r11		(not saved; later alloc to help shrink-wrap)
+	r0		(not saved; cannot be base reg)
 	r31 - r13	(saved; order given to save least number)
 	r12		(not saved; if used for DImode or DFmode would use r13)
 	mq		(not saved; best to use it if we can)
@@ -922,6 +923,14 @@ extern unsigned rs6000_pointer_size;
 #define MAYBE_R2_FIXED
 #endif
 
+#if FIXED_R13 == 1
+#define EARLY_R12 12,
+#define LATE_R12
+#else
+#define EARLY_R12
+#define LATE_R12 12,
+#endif
+
 #define REG_ALLOC_ORDER						\
   {32,								\
    45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
@@ -929,11 +938,11 @@ extern unsigned rs6000_pointer_size;
    63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
    50, 49, 48, 47, 46,						\
    75, 74, 69, 68, 72, 71, 70,					\
-   0, MAYBE_R2_AVAILABLE					\
-   9, 11, 10, 8, 7, 6, 5, 4,					\
-   3,								\
+   MAYBE_R2_AVAILABLE						\
+   9, 10, 8, 7, 6, 5, 4,					\
+   3, EARLY_R12 11, 0,						\
    31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19,		\
-   18, 17, 16, 15, 14, 13, 12,					\
+   18, 17, 16, 15, 14, 13, LATE_R12				\
    64, 66, 65,							\
    73, 1, MAYBE_R2_FIXED 67, 76,				\
    /* AltiVec registers.  */					\

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] PowerPC shrink-wrap support benchmark gains
       [not found]         ` <20110922144017.GF10321@bubble.grove.modra.org>
@ 2011-09-26 14:35           ` Alan Modra
  0 siblings, 0 replies; 48+ messages in thread
From: Alan Modra @ 2011-09-26 14:35 UTC (permalink / raw)
  To: Bernd Schmidt, Peter Bergner, gcc-patches

This patch increases opportunities for shrink-wrapping.  With this
applied, http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00754.html and
http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01499.html plus the fix
mentioned in http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01016.html,
along with my powerpc changes, I finally see some gains in CPU2006.

453.povray gives a massive 12% gain in one particular setup, with
other benchmarks in the suite mostly showing small improvements.
Some of this gain is likely due to having an extra gpr to play with
before we start saving regs.

	* gcc/ifcvt.c (dead_or_predicable): Disable if-conversion when
	doing so is likely to kill a shrink-wrapping opportunity.

diff -urp -x'*~' -x'*.orig' -x'*.rej' -x.svn gcc-alanshrink1/gcc/ifcvt.c gcc-current/gcc/ifcvt.c
--- gcc-alanshrink1/gcc/ifcvt.c	2011-09-26 15:29:14.000000000 +0930
+++ gcc-current/gcc/ifcvt.c	2011-09-26 14:24:53.000000000 +0930
@@ -4138,6 +4138,64 @@ dead_or_predicable (basic_block test_bb,
       FOR_BB_INSNS (merge_bb, insn)
 	if (NONDEBUG_INSN_P (insn))
 	  df_simulate_find_defs (insn, merge_set);
+
+      /* If shrink-wrapping, disable this optimization when test_bb is
+	 the first basic block and merge_bb exits.  The idea is to not
+	 move code setting up a return register as that may clobber a
+	 register used to pass function parameters, which then must be
+	 saved in caller-saved regs.  A caller-saved reg requires the
+	 prologue, killing a shrink-wrap opportunity.  */
+      if ((flag_shrink_wrap && !epilogue_completed)
+	  && ENTRY_BLOCK_PTR->next_bb == test_bb
+	  && single_succ_p (new_dest)
+	  && single_succ (new_dest) == EXIT_BLOCK_PTR
+	  && bitmap_intersect_p (df_get_live_in (new_dest), merge_set))
+	{
+	  regset return_regs;
+	  unsigned int i;
+
+	  return_regs = BITMAP_ALLOC (&reg_obstack);
+
+	  /* Start off with the intersection of regs used to pass
+	     params and regs used to return values.  */
+	  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+	    if (FUNCTION_ARG_REGNO_P (i)
+		&& FUNCTION_VALUE_REGNO_P (i))
+	      bitmap_set_bit (return_regs, INCOMING_REGNO (i));
+
+	  bitmap_and_into (return_regs, df_get_live_out (ENTRY_BLOCK_PTR));
+	  bitmap_and_into (return_regs, df_get_live_in (EXIT_BLOCK_PTR));
+	  if (!bitmap_empty_p (return_regs))
+	    {
+	      FOR_BB_INSNS_REVERSE (new_dest, insn)
+		if (NONDEBUG_INSN_P (insn))
+		  {
+		    df_ref *def_rec;
+		    unsigned int uid = INSN_UID (insn);
+
+		    /* If this insn sets any reg in return_regs..  */
+		    for (def_rec = DF_INSN_UID_DEFS (uid); *def_rec; def_rec++)
+		      {
+			df_ref def = *def_rec;
+			unsigned r = DF_REF_REGNO (def);
+
+			if (bitmap_bit_p (return_regs, r))
+			  break;
+		      }
+		    /* ..then add all reg uses to the set of regs
+		       we're interested in.  */
+		    if (*def_rec)
+		      df_simulate_uses (insn, return_regs);
+		  }
+	      if (bitmap_intersect_p (merge_set, return_regs))
+		{
+		  BITMAP_FREE (return_regs);
+		  BITMAP_FREE (merge_set);
+		  return FALSE;
+		}
+	    }
+	  BITMAP_FREE (return_regs);
+	}
     }
 
  no_body:

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-26 14:25   ` Alan Modra
@ 2011-09-27  0:15     ` Alan Modra
  2011-09-27  0:19       ` Bernd Schmidt
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-27  0:15 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt, Richard Henderson

On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
> Two regressions appeared due to a problem in the shrink-wrap code.

These two.
+FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
+FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)

Both "internal compiler error: in maybe_record_trace_start, at
dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that
needs no prologue.  ie. thread_prologue_and_epilogue_insns gives

====
code needing no prologue
====  <- prologue inserted here
code needing prologue
====
more no prologue code
====
more code needing prologue
====
epilogue

That second block needing no prologue now wrongly tries to use unwind
info from the prologue.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-27  0:15     ` Alan Modra
@ 2011-09-27  0:19       ` Bernd Schmidt
  2011-09-27  0:49         ` Alan Modra
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-09-27  0:19 UTC (permalink / raw)
  To: gcc-patches, Richard Henderson

On 09/27/11 00:32, Alan Modra wrote:
> On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
>> Two regressions appeared due to a problem in the shrink-wrap code.
> 
> These two.
> +FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
> +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)
> 
> Both "internal compiler error: in maybe_record_trace_start, at
> dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that
> needs no prologue.  ie. thread_prologue_and_epilogue_insns gives
> 
> ====
> code needing no prologue
> ====  <- prologue inserted here
> code needing prologue
> ====
> more no prologue code
> ====
> more code needing prologue
> ====
> epilogue
> 
> That second block needing no prologue now wrongly tries to use unwind
> info from the prologue.

What's the actual CFG structure here?


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-27  0:19       ` Bernd Schmidt
@ 2011-09-27  0:49         ` Alan Modra
  2011-09-27  1:08           ` Bernd Schmidt
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-27  0:49 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Richard Henderson

On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote:
> On 09/27/11 00:32, Alan Modra wrote:
> > On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
> >> Two regressions appeared due to a problem in the shrink-wrap code.
> > 
> > These two.
> > +FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
> > +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)
> > 
> > Both "internal compiler error: in maybe_record_trace_start, at
> > dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that
> > needs no prologue.  ie. thread_prologue_and_epilogue_insns gives
> > 
> > ====
> > code needing no prologue
> > ====  <- prologue inserted here
> > code needing prologue
> > ====
> > more no prologue code
> > ====
> > more code needing prologue
> > ====
> > epilogue
> > 
> > That second block needing no prologue now wrongly tries to use unwind
> > info from the prologue.
> 
> What's the actual CFG structure here?

Immediately after thread_prologue_and_epilogue_insns

        bb2 -> simple_ret
         |
        bb3
        /\
      /    \
    bb4    bb5
   (pro)    |
     |      |<--|
    bb7     |   |
   (epi)   bb6--|
     |      |
    bb8<-|  |
  (s ret)|  |
         |-bb9


-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-27  0:49         ` Alan Modra
@ 2011-09-27  1:08           ` Bernd Schmidt
  2011-09-27  2:16             ` Alan Modra
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-09-27  1:08 UTC (permalink / raw)
  To: gcc-patches, Richard Henderson

On 09/27/11 02:11, Alan Modra wrote:
> On Tue, Sep 27, 2011 at 12:39:36AM +0200, Bernd Schmidt wrote:
>> On 09/27/11 00:32, Alan Modra wrote:
>>> On Mon, Sep 26, 2011 at 11:22:54PM +0930, Alan Modra wrote:
>>>> Two regressions appeared due to a problem in the shrink-wrap code.
>>>
>>> These two.
>>> +FAIL: g++.dg/torture/pr46111.C  -O1  (internal compiler error)
>>> +FAIL: gcc.dg/autopar/pr46099.c (internal compiler error)
>>>
>>> Both "internal compiler error: in maybe_record_trace_start, at
>>> dwarf2cfi.c:2243", caused by disjoint blocks in the set of blocks that
>>> needs no prologue.  ie. thread_prologue_and_epilogue_insns gives
>>>
>>> ====
>>> code needing no prologue
>>> ====  <- prologue inserted here
>>> code needing prologue
>>> ====
>>> more no prologue code
>>> ====
>>> more code needing prologue
>>> ====
>>> epilogue
>>>
>>> That second block needing no prologue now wrongly tries to use unwind
>>> info from the prologue.
>>
>> What's the actual CFG structure here?
> 
> Immediately after thread_prologue_and_epilogue_insns
> 
>         bb2 -> simple_ret
>          |
>         bb3
>         /\
>       /    \
>     bb4    bb5
>    (pro)    |
>      |      |<--|
>     bb7     |   |
>    (epi)   bb6--|
>      |      |
>     bb8<-|  |
>   (s ret)|  |
>          |-bb9

That looks perfectly reasonable, and even if it is ordered in the way
shown above, dwarf2cfi still should be able to deal with it if the
prologue and epilogue are annotated correctly.

Again, at the crash site, examine the start insn to find out which basic
block is the problem, and use dump_cfi_row on the two rows it tried to
compare (and found an unexpected difference).


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-27  1:08           ` Bernd Schmidt
@ 2011-09-27  2:16             ` Alan Modra
  2011-09-28 16:35               ` Alan Modra
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-27  2:16 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Richard Henderson

On Tue, Sep 27, 2011 at 02:15:24AM +0200, Bernd Schmidt wrote:
> That looks perfectly reasonable, and even if it is ordered in the way
> shown above, dwarf2cfi still should be able to deal with it if the
> prologue and epilogue are annotated correctly.

Here I was thinking that it was someone else's problem.  :-)
I looked at this before and saw some epilogue cfa_restores, but I see
now we're missing some for -m64.  More work needed in
rs6000_emit_epilogue..  Sorry for the noise.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-27  2:16             ` Alan Modra
@ 2011-09-28 16:35               ` Alan Modra
  2011-10-16 20:19                 ` David Edelsohn
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-09-28 16:35 UTC (permalink / raw)
  To: gcc-patches, David Edelsohn; +Cc: Bernd Schmidt

This supercedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html
and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing
the two regressions introduced by those patches.  The first patch is
unchanged except to leave all the out-of-line restore functions using
"return" rather than "simple_return".  We don't want these being
confused with a plain "simple_return" and perhaps used by the shrink-
wrapping to return from pre-prologue code.

The second of these two patches was way too simplistic.  It was a real
pain getting the cfa_restores correct.  A lot were missing, or emitted
at the wrong place (due to bug in rs6000_emit_stack_reset).  I also
had the real restore insn move past the cfa_restores ("mtlr 0" insn
scheduled over loads from stack).

	* config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
	declaration.
	(rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp.
	(rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
	simple_return in pattern, emit instruction, and set jump_label.
	(rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
	simple_return rather than return.
	(emit_cfa_restores): New function.
	(rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap.
	Add missing cfa_restores for SAVE_WORLD.  Add missing LR cfa_restore
	when using out-of-line gpr restore.  Add missing LR and FP regs
	cfa_restores for out-of-line fpr restore.  Consolidate code setting
	up cfa_restores.  Formatting.  Use LR_REGNO define.
	(rs6000_output_mi_thunk): Use simple_return rather than return.
	* config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
	(return_internal*): Likewise.
	(any_return, return_pred, return_str): New iterators.
	(return, conditional return insns): Provide both return and
	simple_return variants.
	* gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
	(REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
	Move r11 and r0 later to suit shrink-wrapping.

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 178876)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c
 static void rs6000_set_default_type_attributes (tree);
 static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool);
 static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool);
-static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int,
-				   enum machine_mode, bool, bool, bool);
 static bool rs6000_reg_live_or_pic_offset_p (int);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static tree rs6000_builtin_vectorized_function (tree, tree, tree);
@@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t 
       if (sp_offset != 0)
 	{
 	  rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx;
-	  return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx,
-					   GEN_INT (sp_offset)));
+	  rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx,
+					       GEN_INT (sp_offset)));
+	  if (!savres)
+	    return insn;
 	}
       else if (!savres)
 	return emit_move_insn (sp_reg_rtx, frame_reg_rtx);
@@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t 
 }
 
 /* Construct a parallel rtx describing the effect of a call to an
-   out-of-line register save/restore routine.  */
+   out-of-line register save/restore routine, and emit the insn
+   or jump_insn as appropriate.  */
 
 static rtx
-rs6000_make_savres_rtx (rs6000_stack_t *info,
+rs6000_emit_savres_rtx (rs6000_stack_t *info,
 			rtx frame_reg_rtx, int save_area_offset,
 			enum machine_mode reg_mode,
 			bool savep, bool gpr, bool lr)
@@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   int reg_size = GET_MODE_SIZE (reg_mode);
   rtx sym;
   rtvec p;
+  rtx par, insn;
 
   offset = 0;
   start_reg = (gpr
@@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
     RTVEC_ELT (p, offset++) = ret_rtx;
 
   RTVEC_ELT (p, offset++)
-    = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
+    = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
 
   sym = rs6000_savres_routine_sym (info, savep, gpr, lr);
   RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym);
@@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t *
       RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg);
     }
 
-  return gen_rtx_PARALLEL (VOIDmode, p);
+  par = gen_rtx_PARALLEL (VOIDmode, p);
+
+  if (!savep && lr)
+    {
+      insn = emit_jump_insn (par);
+      JUMP_LABEL (insn) = ret_rtx;
+    }
+  else
+    insn = emit_insn (par);
+  return insn;
 }
 
 /* Determine whether the gp REG is really used.  */
@@ -20087,16 +20124,13 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info) && info->first_fp_reg_save != 64)
     {
-      rtx par;
-
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->fp_save_offset + sp_offset,
-				    DFmode,
-				    /*savep=*/true, /*gpr=*/false,
-				    /*lr=*/(strategy
-					    & SAVE_NOINLINE_FPRS_SAVES_LR)
-					   != 0);
-      insn = emit_insn (par);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->fp_save_offset + sp_offset,
+				     DFmode,
+				     /*savep=*/true, /*gpr=*/false,
+				     /*lr=*/((strategy
+					      & SAVE_NOINLINE_FPRS_SAVES_LR)
+					     != 0));
       rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 			    NULL_RTX, NULL_RTX);
     }
@@ -20186,13 +20220,10 @@ rs6000_emit_prologue (void)
 	}
       else
 	{
-	  rtx par;
-
-	  par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11),
-					0, reg_mode,
-					/*savep=*/true, /*gpr=*/true,
-					/*lr=*/false);
-	  insn = emit_insn (par);
+	  insn = rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11),
+					 0, reg_mode,
+					 /*savep=*/true, /*gpr=*/true,
+					 /*lr=*/false);
 	  rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 				NULL_RTX, NULL_RTX);
 	}
@@ -20204,8 +20235,6 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info) && !saving_GPRs_inline)
     {
-      rtx par;
-
       /* Need to adjust r11 (r12) if we saved any FPRs.  */
       if (info->first_fp_reg_save != 64)
         {
@@ -20216,14 +20245,13 @@ rs6000_emit_prologue (void)
 	  emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, offset));
         }
 
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->gp_save_offset + sp_offset,
-				    reg_mode,
-				    /*savep=*/true, /*gpr=*/true,
-				    /*lr=*/(strategy
-					    & SAVE_NOINLINE_GPRS_SAVES_LR)
-					   != 0);
-      insn = emit_insn (par);
+      insn = rs6000_emit_savres_rtx (info, frame_reg_rtx,
+				     info->gp_save_offset + sp_offset,
+				     reg_mode,
+				     /*savep=*/true, /*gpr=*/true,
+				     /*lr=*/((strategy
+					      & SAVE_NOINLINE_GPRS_SAVES_LR)
+					     != 0));
       rs6000_frame_related (insn, frame_ptr_rtx, info->total_size,
 			    NULL_RTX, NULL_RTX);
     }
@@ -20672,6 +20718,20 @@ offset_below_red_zone_p (HOST_WIDE_INT o
 		   : TARGET_32BIT ? -220 : -288);
 }
 
+/* Append CFA_RESTORES to any existing REG_NOTES on the last insn.  */
+
+static void
+emit_cfa_restores (rtx cfa_restores)
+{
+  rtx insn = get_last_insn ();
+  rtx *loc = &REG_NOTES (insn);
+
+  while (*loc)
+    loc = &XEXP (*loc, 1);
+  *loc = cfa_restores;
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+
 /* Emit function epilogue as insns.  */
 
 void
@@ -20769,6 +20829,14 @@ rs6000_emit_epilogue (int sibcall)
 	rtx mem = gen_frame_mem (reg_mode, addr);
 
 	RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem);
+
+	if (flag_shrink_wrap)
+	  {
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE,
+					   gen_rtx_REG (Pmode, LR_REGNO),
+					   cfa_restores);
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
+	  }
       }
 
       for (i = 0; i < 32 - info->first_gp_reg_save; i++)
@@ -20780,6 +20848,8 @@ rs6000_emit_epilogue (int sibcall)
 	  rtx mem = gen_frame_mem (reg_mode, addr);
 
 	  RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem);
+	  if (flag_shrink_wrap)
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
 	}
       for (i = 0; info->first_altivec_reg_save + i <= LAST_ALTIVEC_REGNO; i++)
 	{
@@ -20790,6 +20860,8 @@ rs6000_emit_epilogue (int sibcall)
 	  rtx mem = gen_frame_mem (V4SImode, addr);
 
 	  RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem);
+	  if (flag_shrink_wrap)
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
 	}
       for (i = 0; info->first_fp_reg_save + i <= 63; i++)
 	{
@@ -20803,6 +20875,8 @@ rs6000_emit_epilogue (int sibcall)
 				     ? DFmode : SFmode), addr);
 
 	  RTVEC_ELT (p, j++) = gen_rtx_SET (VOIDmode, reg, mem);
+	  if (flag_shrink_wrap)
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
 	}
       RTVEC_ELT (p, j++)
 	= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 0));
@@ -20814,8 +20888,14 @@ rs6000_emit_epilogue (int sibcall)
 	= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (SImode, 8));
       RTVEC_ELT (p, j++)
 	= gen_rtx_USE (VOIDmode, gen_rtx_REG (SImode, 10));
-      emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
+      insn = emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
 
+      if (flag_shrink_wrap)
+	{
+	  REG_NOTES (insn) = cfa_restores;
+	  add_reg_note (insn, REG_CFA_DEF_CFA, sp_reg_rtx);
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
       return;
     }
 
@@ -20860,9 +20940,10 @@ rs6000_emit_epilogue (int sibcall)
 
 	    reg = gen_rtx_REG (V4SImode, i);
 	    emit_move_insn (reg, mem);
-	    if (offset_below_red_zone_p (info->altivec_save_offset
-					 + (i - info->first_altivec_reg_save)
-					   * 16))
+	    if (flag_shrink_wrap
+		|| offset_below_red_zone_p (info->altivec_save_offset
+					    + (i - info->first_altivec_reg_save)
+					    * 16))
 	      cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
 					     cfa_restores);
 	  }
@@ -21001,7 +21082,7 @@ rs6000_emit_epilogue (int sibcall)
 
 	    reg = gen_rtx_REG (V4SImode, i);
 	    emit_move_insn (reg, mem);
-	    if (DEFAULT_ABI == ABI_V4)
+	    if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
 	      cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
 					     cfa_restores);
 	  }
@@ -21051,8 +21132,7 @@ rs6000_emit_epilogue (int sibcall)
       emit_move_insn (cr_save_reg, mem);
     }
 
-  /* Set LR here to try to overlap restores below.  LR is always saved
-     above incoming stack, so it never needs REG_CFA_RESTORE.  */
+  /* Set LR here to try to overlap restores below.  */
   if (restore_lr && restoring_GPRs_inline)
     emit_move_insn (gen_rtx_REG (Pmode, LR_REGNO),
 		    gen_rtx_REG (Pmode, 0));
@@ -21090,7 +21170,7 @@ rs6000_emit_epilogue (int sibcall)
   /* Restore GPRs.  This is done as a PARALLEL if we are using
      the load-multiple instructions.  */
   if (TARGET_SPE_ABI
-      && info->spe_64bit_regs_used != 0
+      && info->spe_64bit_regs_used
       && info->first_gp_reg_save != 32)
     {
       /* Determine whether we can address all of the registers that need
@@ -21114,7 +21194,7 @@ rs6000_emit_epilogue (int sibcall)
 	  int ool_adjust = (restoring_GPRs_inline
 			    ? 0
 			    : (info->first_gp_reg_save
-			       - (FIRST_SAVRES_REGISTER+1))*8);
+			       - (FIRST_SAVRES_REGISTER + 1)) * 8);
 
 	  if (frame_reg_rtx == sp_reg_rtx)
 	    frame_reg_rtx = gen_rtx_REG (Pmode, 11);
@@ -21145,48 +21225,28 @@ rs6000_emit_epilogue (int sibcall)
 		mem = gen_rtx_MEM (V2SImode, addr);
 		reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
-		insn = emit_move_insn (reg, mem);
-		if (DEFAULT_ABI == ABI_V4)
-		  {
-		    if (frame_pointer_needed
-			&& info->first_gp_reg_save + i
-			   == HARD_FRAME_POINTER_REGNUM)
-		      {
-			add_reg_note (insn, REG_CFA_DEF_CFA,
-				      plus_constant (frame_reg_rtx,
-						     sp_offset));
-			RTX_FRAME_RELATED_P (insn) = 1;
-		      }
-
-		    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
-						   cfa_restores);
-		  }
+		emit_move_insn (reg, mem);
 	      }
 	}
       else
-	{
-	  rtx par;
-
-	  par = rs6000_make_savres_rtx (info, gen_rtx_REG (Pmode, 11),
-					0, reg_mode,
-					/*savep=*/false, /*gpr=*/true,
-					/*lr=*/true);
-	  emit_jump_insn (par);
-	  /* We don't want anybody else emitting things after we jumped
-	     back.  */
-	  return;
-	}
+	rs6000_emit_savres_rtx (info, gen_rtx_REG (Pmode, 11),
+				0, reg_mode,
+				/*savep=*/false, /*gpr=*/true,
+				/*lr=*/true);
     }
   else if (!restoring_GPRs_inline)
     {
       /* We are jumping to an out-of-line function.  */
       bool can_use_exit = info->first_fp_reg_save == 64;
-      rtx par;
 
       /* Emit stack reset code if we need it.  */
       if (can_use_exit)
-	rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx,
-				 sp_offset, can_use_exit);
+	{
+	  rs6000_emit_stack_reset (info, sp_reg_rtx, frame_reg_rtx,
+				   sp_offset, can_use_exit);
+	  if (info->cr_save_p)
+	    rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
+	}
       else
 	{
 	  emit_insn (gen_add3_insn (gen_rtx_REG (Pmode, DEFAULT_ABI == ABI_AIX
@@ -21197,45 +21257,10 @@ rs6000_emit_epilogue (int sibcall)
 	    sp_offset += info->fp_size;
 	}
 
-      par = rs6000_make_savres_rtx (info, frame_reg_rtx,
-				    info->gp_save_offset, reg_mode,
-				    /*savep=*/false, /*gpr=*/true,
-				    /*lr=*/can_use_exit);
-
-      if (can_use_exit)
-	{
-	  if (info->cr_save_p)
-	    {
-	      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-	      if (DEFAULT_ABI == ABI_V4)
-		cfa_restores
-		  = alloc_reg_note (REG_CFA_RESTORE,
-				    gen_rtx_REG (SImode, CR2_REGNO),
-				    cfa_restores);
-	    }
-
-	  emit_jump_insn (par);
-
-	  /* We don't want anybody else emitting things after we jumped
-	     back.  */
-	  return;
-	}
-
-      insn = emit_insn (par);
-      if (DEFAULT_ABI == ABI_V4)
-	{
-	  if (frame_pointer_needed)
-	    {
-	      add_reg_note (insn, REG_CFA_DEF_CFA,
-			    plus_constant (frame_reg_rtx, sp_offset));
-	      RTX_FRAME_RELATED_P (insn) = 1;
-	    }
-
-	  for (i = info->first_gp_reg_save; i < 32; i++)
-	    cfa_restores
-	      = alloc_reg_note (REG_CFA_RESTORE,
-				gen_rtx_REG (reg_mode, i), cfa_restores);
-	}
+      rs6000_emit_savres_rtx (info, frame_reg_rtx,
+			      info->gp_save_offset, reg_mode,
+			      /*savep=*/false, /*gpr=*/true,
+			      /*lr=*/can_use_exit);
     }
   else if (using_load_multiple)
     {
@@ -21251,17 +21276,8 @@ rs6000_emit_epilogue (int sibcall)
 	  rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
 	  RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
-	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
-					   cfa_restores);
-	}
-      insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
-      if (DEFAULT_ABI == ABI_V4 && frame_pointer_needed)
-	{
-	  add_reg_note (insn, REG_CFA_DEF_CFA,
-			plus_constant (frame_reg_rtx, sp_offset));
-	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
+      emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
     }
   else
     {
@@ -21275,24 +21291,70 @@ rs6000_emit_epilogue (int sibcall)
             rtx mem = gen_frame_mem (reg_mode, addr);
 	    rtx reg = gen_rtx_REG (reg_mode, info->first_gp_reg_save + i);
 
-	    insn = emit_move_insn (reg, mem);
-	    if (DEFAULT_ABI == ABI_V4)
-	      {
-	        if (frame_pointer_needed
-		    && info->first_gp_reg_save + i
-		       == HARD_FRAME_POINTER_REGNUM)
-		  {
-		    add_reg_note (insn, REG_CFA_DEF_CFA,
-				  plus_constant (frame_reg_rtx, sp_offset));
-		    RTX_FRAME_RELATED_P (insn) = 1;
-		  }
-
-		cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
-					       cfa_restores);
-	      }
+	    emit_move_insn (reg, mem);
           }
     }
 
+  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
+    {
+      /* If the frame pointer was used then we can't delay emitting
+	 a REG_CFA_DEF_CFA note.  This must happen on the insn that
+	 restores the frame pointer, r31.  We may have already emitted
+	 a REG_CFA_DEF_CFA note, but that's OK;  A duplicate is
+	 discarded by dwarf2cfi.c/dwarf2out.c, and in any case would
+	 be harmless if emitted.  */
+      if (frame_pointer_needed)
+	{
+	  insn = get_last_insn ();
+	  add_reg_note (insn, REG_CFA_DEF_CFA,
+			plus_constant (frame_reg_rtx, sp_offset));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      /* Set up cfa_restores.  We always need these when
+	 shrink-wrapping.  If not shrink-wrapping then we only need
+	 the cfa_restore when the stack location is no longer valid.
+	 The cfa_restores must be emitted on or before the insn that
+	 invalidates the stack, and of course must not be emitted
+	 before the insn that actually does the restore.  The latter
+	 is why the LR cfa_restore condition below is a little
+	 complicated.  It's also why it is a bad idea to emit the
+	 cfa_restores as a group on the last instruction here that
+	 actually does a restore: That insn may be reordered with
+	 respect to others doing restores.  */
+      if (info->cr_save_p)
+	cfa_restores = alloc_reg_note (REG_CFA_RESTORE,
+				       gen_rtx_REG (SImode, CR2_REGNO),
+				       cfa_restores);
+      if (flag_shrink_wrap
+	  && (restore_lr
+	      || (info->lr_save_p
+		  && !restoring_GPRs_inline
+		  && info->first_fp_reg_save == 64)))
+	cfa_restores = alloc_reg_note (REG_CFA_RESTORE,
+				       gen_rtx_REG (Pmode, LR_REGNO),
+				       cfa_restores);
+
+      for (i = info->first_gp_reg_save; i < 32; i++)
+	if (!restoring_GPRs_inline
+	    || using_load_multiple
+	    || rs6000_reg_live_or_pic_offset_p (i))
+	  {
+	    rtx reg = gen_rtx_REG (reg_mode, i);
+
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
+	  }
+    }
+
+  if (!restoring_GPRs_inline
+      && info->first_fp_reg_save == 64)
+    {
+      /* We are jumping to an out-of-line function.  */
+      if (cfa_restores)
+	emit_cfa_restores (cfa_restores);
+      return;
+    }
+
   if (restore_lr && !restoring_GPRs_inline)
     {
       rtx mem = gen_frame_mem_offset (Pmode, frame_reg_rtx,
@@ -21306,8 +21368,8 @@ rs6000_emit_epilogue (int sibcall)
   /* Restore fpr's if we need to do it without calling a function.  */
   if (restoring_FPRs_inline)
     for (i = 0; i < 64 - info->first_fp_reg_save; i++)
-      if ((df_regs_ever_live_p (info->first_fp_reg_save+i)
-	   && ! call_used_regs[info->first_fp_reg_save+i]))
+      if ((df_regs_ever_live_p (info->first_fp_reg_save + i)
+	   && !call_used_regs[info->first_fp_reg_save + i]))
 	{
 	  rtx addr, mem, reg;
 	  addr = gen_rtx_PLUS (Pmode, frame_reg_rtx,
@@ -21321,20 +21383,13 @@ rs6000_emit_epilogue (int sibcall)
 			     info->first_fp_reg_save + i);
 
  	  emit_move_insn (reg, mem);
-	  if (DEFAULT_ABI == ABI_V4)
-	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
-					   cfa_restores);
+	  if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
 	}
 
   /* If we saved cr, restore it here.  Just those that were used.  */
   if (info->cr_save_p)
-    {
-      rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
-      if (DEFAULT_ABI == ABI_V4)
-	cfa_restores
-	  = alloc_reg_note (REG_CFA_RESTORE, gen_rtx_REG (SImode, CR2_REGNO),
-			    cfa_restores);
-    }
+    rs6000_restore_saved_cr (cr_save_reg, using_mtcr_multiple);
 
   /* If this is V.4, unwind the stack pointer after all of the loads
      have been done.  */
@@ -21362,15 +21417,40 @@ rs6000_emit_epilogue (int sibcall)
       rtvec p;
       bool lr = (strategy & REST_NOINLINE_FPRS_DOESNT_RESTORE_LR) == 0;
       if (! restoring_FPRs_inline)
-	p = rtvec_alloc (4 + 64 - info->first_fp_reg_save);
+	{
+	  p = rtvec_alloc (4 + 64 - info->first_fp_reg_save);
+	  RTVEC_ELT (p, 0) = ret_rtx;
+	}
       else
-	p = rtvec_alloc (2);
+	{
+	  if (cfa_restores)
+	    {
+	      /* We can't hang the cfa_restores off a simple return,
+		 since the shrink-wrap code sometimes uses an existing
+		 return.  This means there might be a path from
+		 pre-prologue code to this return, and dwarf2cfi code
+		 wants the eh_frame unwinder state to be the same on
+		 all paths to any point.  So we need to emit the
+		 cfa_restores before the return.  For -m64 we really
+		 don't need epilogue cfa_restores at all, except for
+		 this irritating dwarf2cfi with shrink-wrap
+		 requirement;  The stack red-zone means eh_frame info
+		 from the prologue telling the unwinder to restore
+		 from the stack is perfectly good right to the end of
+		 the function.  */
+	      emit_insn (gen_blockage ());
+	      emit_cfa_restores (cfa_restores);
+	      cfa_restores = NULL_RTX;
+	    }
+	  p = rtvec_alloc (2);
+	  RTVEC_ELT (p, 0) = simple_return_rtx;
+	}
 
-      RTVEC_ELT (p, 0) = ret_rtx;
       RTVEC_ELT (p, 1) = ((restoring_FPRs_inline || !lr)
-			  ? gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 65))
+			  ? gen_rtx_USE (VOIDmode,
+					 gen_rtx_REG (Pmode, LR_REGNO))
 			  : gen_rtx_CLOBBER (VOIDmode,
-					     gen_rtx_REG (Pmode, 65)));
+					     gen_rtx_REG (Pmode, LR_REGNO)));
 
       /* If we have to restore more than two FP registers, branch to the
 	 restore function.  It will return to our caller.  */
@@ -21379,6 +21459,12 @@ rs6000_emit_epilogue (int sibcall)
 	  int i;
 	  rtx sym;
 
+	  if ((DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
+	      && lr)
+	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE,
+					   gen_rtx_REG (Pmode, LR_REGNO),
+					   cfa_restores);
+
 	  sym = rs6000_savres_routine_sym (info,
 					   /*savep=*/false,
 					   /*gpr=*/false,
@@ -21390,20 +21476,32 @@ rs6000_emit_epilogue (int sibcall)
 						       ? 1 : 11));
 	  for (i = 0; i < 64 - info->first_fp_reg_save; i++)
 	    {
-	      rtx addr, mem;
+	      rtx addr, mem, reg;
+
 	      addr = gen_rtx_PLUS (Pmode, sp_reg_rtx,
-				   GEN_INT (info->fp_save_offset + 8*i));
+				   GEN_INT (info->fp_save_offset + 8 * i));
 	      mem = gen_frame_mem (DFmode, addr);
+	      reg = gen_rtx_REG (DFmode, info->first_fp_reg_save + i);
 
-	      RTVEC_ELT (p, i+4) =
-		gen_rtx_SET (VOIDmode,
-			     gen_rtx_REG (DFmode, info->first_fp_reg_save + i),
-			     mem);
+	      RTVEC_ELT (p, i + 4) = gen_rtx_SET (VOIDmode, reg, mem);
+	      if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
+		cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg,
+					       cfa_restores);
 	    }
 	}
 
       emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
     }
+
+  if (cfa_restores)
+    {
+      if (sibcall)
+	/* Ensure the cfa_restores are hung off an insn that won't
+	   be reordered above other restores.  */
+	emit_insn (gen_blockage ());
+
+      emit_cfa_restores (cfa_restores);
+    }
 }
 
 /* Write function epilogue.  */
@@ -21768,7 +21866,7 @@ rs6000_output_mi_thunk (FILE *file, tree
 			gen_rtx_USE (VOIDmode,
 				     gen_rtx_REG (SImode,
 						  LR_REGNO)),
-			ret_rtx)));
+			simple_return_rtx)));
   SIBLING_CALL_P (insn) = 1;
   emit_barrier ();
 
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 178876)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -894,10 +894,11 @@ extern unsigned rs6000_pointer_size;
 	cr1		(not saved, but used for FP operations)
 	cr0		(not saved, but used for arithmetic operations)
 	cr4, cr3, cr2	(saved)
-	r0		(not saved; cannot be base reg)
 	r9		(not saved; best for TImode)
-	r11, r10, r8-r4	(not saved; highest used first to make less conflict)
+	r10, r8-r4	(not saved; highest first for less conflict with params)
 	r3		(not saved; return value register)
+	r11		(not saved; later alloc to help shrink-wrap)
+	r0		(not saved; cannot be base reg)
 	r31 - r13	(saved; order given to save least number)
 	r12		(not saved; if used for DImode or DFmode would use r13)
 	mq		(not saved; best to use it if we can)
@@ -922,6 +923,14 @@ extern unsigned rs6000_pointer_size;
 #define MAYBE_R2_FIXED
 #endif
 
+#if FIXED_R13 == 1
+#define EARLY_R12 12,
+#define LATE_R12
+#else
+#define EARLY_R12
+#define LATE_R12 12,
+#endif
+
 #define REG_ALLOC_ORDER						\
   {32,								\
    45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34,		\
@@ -929,11 +938,11 @@ extern unsigned rs6000_pointer_size;
    63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51,		\
    50, 49, 48, 47, 46,						\
    75, 74, 69, 68, 72, 71, 70,					\
-   0, MAYBE_R2_AVAILABLE					\
-   9, 11, 10, 8, 7, 6, 5, 4,					\
-   3,								\
+   MAYBE_R2_AVAILABLE						\
+   9, 10, 8, 7, 6, 5, 4,					\
+   3, EARLY_R12 11, 0,						\
    31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19,		\
-   18, 17, 16, 15, 14, 13, 12,					\
+   18, 17, 16, 15, 14, 13, LATE_R12				\
    64, 66, 65,							\
    73, 1, MAYBE_R2_FIXED 67, 76,				\
    /* AltiVec registers.  */					\
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 178876)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -264,6 +265,12 @@ (define_mode_iterator RECIPF [SF DF V4SF
 ; Iterator for just SF/DF
 (define_mode_iterator SFDF [SF DF])
 
+; Conditional returns.
+(define_code_iterator any_return [return simple_return])
+(define_code_attr return_pred [(return "direct_return ()")
+			       (simple_return "")])
+(define_code_attr return_str [(return "") (simple_return "simple_")])
+
 ; Various instructions that come in SI and DI forms.
 ; A generic w/d attribute, for things like cmpw/cmpd.
 (define_mode_attr wd [(QI "b") (HI "h") (SI "w") (DI "d")])
@@ -12814,7 +12831,7 @@ (define_expand "sibcall"
 		    (match_operand 1 "" ""))
 	      (use (match_operand 2 "" ""))
 	      (use (reg:SI LR_REGNO))
-	      (return)])]
+	      (simple_return)])]
   ""
   "
 {
@@ -12838,7 +12855,7 @@ (define_insn "*sibcall_local32"
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:SI 2 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(INTVAL (operands[2]) & CALL_LONG) == 0"
   "*
 {
@@ -12858,7 +12875,7 @@ (define_insn "*sibcall_local64"
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:SI 2 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "TARGET_64BIT && (INTVAL (operands[2]) & CALL_LONG) == 0"
   "*
 {
@@ -12879,7 +12896,7 @@ (define_insn "*sibcall_value_local32"
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:SI 3 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(INTVAL (operands[3]) & CALL_LONG) == 0"
   "*
 {
@@ -12901,7 +12918,7 @@ (define_insn "*sibcall_value_local64"
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:SI 3 "immediate_operand" "O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "TARGET_64BIT && (INTVAL (operands[3]) & CALL_LONG) == 0"
   "*
 {
@@ -12921,7 +12938,7 @@ (define_insn "*sibcall_nonlocal_aix<mode
 	 (match_operand 1 "" "g,g"))
    (use (match_operand:SI 2 "immediate_operand" "O,O"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "DEFAULT_ABI == ABI_AIX
    && (INTVAL (operands[2]) & CALL_LONG) == 0"
   "@
@@ -12936,7 +12953,7 @@ (define_insn "*sibcall_value_nonlocal_ai
 	      (match_operand 2 "" "g,g")))
    (use (match_operand:SI 3 "immediate_operand" "O,O"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "DEFAULT_ABI == ABI_AIX
    && (INTVAL (operands[3]) & CALL_LONG) == 0"
   "@
@@ -12950,7 +12967,7 @@ (define_insn "*sibcall_nonlocal_sysv<mod
 	 (match_operand 1 "" ""))
    (use (match_operand 2 "immediate_operand" "O,n,O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(DEFAULT_ABI == ABI_DARWIN
     || DEFAULT_ABI == ABI_V4)
    && (INTVAL (operands[2]) & CALL_LONG) == 0"
@@ -12981,7 +12998,7 @@ (define_expand "sibcall_value"
 		      (match_operand 2 "" "")))
 	      (use (match_operand 3 "" ""))
 	      (use (reg:SI LR_REGNO))
-	      (return)])]
+	      (simple_return)])]
   ""
   "
 {
@@ -13002,7 +13019,7 @@ (define_insn "*sibcall_value_nonlocal_sy
 	      (match_operand 2 "" "")))
    (use (match_operand:SI 3 "immediate_operand" "O,n,O,n"))
    (use (reg:SI LR_REGNO))
-   (return)]
+   (simple_return)]
   "(DEFAULT_ABI == ABI_DARWIN
     || DEFAULT_ABI == ABI_V4)
    && (INTVAL (operands[3]) & CALL_LONG) == 0"
@@ -15328,9 +15345,9 @@ (define_insn ""
 				      [(match_operand 1
 						      "cc_reg_operand" "y")
 				       (const_int 0)])
-		      (return)
+		      (any_return)
 		      (pc)))]
-  "direct_return ()"
+  "<return_pred>"
   "*
 {
   return output_cbranch (operands[0], NULL, 0, insn);
@@ -15360,8 +15377,8 @@ (define_insn ""
 						      "cc_reg_operand" "y")
 				       (const_int 0)])
 		      (pc)
-		      (return)))]
-  "direct_return ()"
+		      (any_return)))]
+  "<return_pred>"
   "*
 {
   return output_cbranch (operands[0], NULL, 1, insn);
@@ -15491,9 +15508,9 @@ (define_insn "jump"
   "b %l0"
   [(set_attr "type" "branch")])
 
-(define_insn "return"
-  [(return)]
-  "direct_return ()"
+(define_insn "<return_str>return"
+  [(any_return)]
+  "<return_pred>"
   "{br|blr}"
   [(set_attr "type" "jmpreg")])
 
@@ -16015,7 +16032,7 @@ (define_insn "*lmw"
    (set_attr "cell_micro" "always")])
 
 (define_insn "*return_internal_<mode>"
-  [(return)
+  [(simple_return)
    (use (match_operand:P 0 "register_operand" "lc"))]
   ""
   "b%T0"

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-09-28 16:35               ` Alan Modra
@ 2011-10-16 20:19                 ` David Edelsohn
  2011-10-26 13:03                   ` Alan Modra
  0 siblings, 1 reply; 48+ messages in thread
From: David Edelsohn @ 2011-10-16 20:19 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt, Alan Modra

On Wed, Sep 28, 2011 at 11:47 AM, Alan Modra <amodra@gmail.com> wrote:

>        * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded
>        declaration.
>        (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp.
>        (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx.  Use
>        simple_return in pattern, emit instruction, and set jump_label.
>        (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx.  Use
>        simple_return rather than return.
>        (emit_cfa_restores): New function.
>        (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap.
>        Add missing cfa_restores for SAVE_WORLD.  Add missing LR cfa_restore
>        when using out-of-line gpr restore.  Add missing LR and FP regs
>        cfa_restores for out-of-line fpr restore.  Consolidate code setting
>        up cfa_restores.  Formatting.  Use LR_REGNO define.
>        (rs6000_output_mi_thunk): Use simple_return rather than return.
>        * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise.
>        (return_internal*): Likewise.
>        (any_return, return_pred, return_str): New iterators.
>        (return, conditional return insns): Provide both return and
>        simple_return variants.
>        * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define.
>        (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13.
>        Move r11 and r0 later to suit shrink-wrapping.

Alan,

The patch is okay, although I am not thrilled about the need to change
the register allocation order.

Thanks, David

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-16 20:19                 ` David Edelsohn
@ 2011-10-26 13:03                   ` Alan Modra
  2011-10-26 13:42                     ` Bernd Schmidt
  2011-10-31 15:14                     ` Alan Modra
  0 siblings, 2 replies; 48+ messages in thread
From: Alan Modra @ 2011-10-26 13:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Bernd Schmidt

On Sun, Oct 16, 2011 at 02:51:01PM -0400, David Edelsohn wrote:
> The patch is okay, although I am not thrilled about the need to change
> the register allocation order.

Committed revision 180522.  It turns out that shrink-wrapping isn't as
effective as it used to be with the 20110915 based sources I was using
originally.  povray Ray_In_Bound no longer gets the benefit of shrink
wrap, likely due to some cfg optimization.  We end up with a simple
block that just does r3=1 then jumps to last_bb being reached from
blocks that need a prologue as well as blocks that don't.  That's
enough to kill our current shrink wrap implementation.  What we need
is something to duplicate these tail blocks..

Patch here for comment.  I haven't yet run benchmarks, but this passes
bootstrap and regression test (on rev 180286, current virgin mainline
fails bootstrap on powerpc-linux).

	* function.c (thread_prologue_and_epilogue_insns): Delete
	shadowing variables.  Don't do prologue register clobber tests
	when shrink wrapping already failed.  Compute tail block
	candidates for duplicating exit path.  Remove these from
	antic set.  Duplicate tails when reached from both blocks
	needing a prologue/epilogue and blocks not needing such.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 180467)
+++ gcc/function.c	(working copy)
@@ -5697,11 +5697,11 @@ thread_prologue_and_epilogue_insns (void
       HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
       HARD_REG_SET set_up_by_prologue;
       rtx p_insn;
-
       VEC(basic_block, heap) *vec;
       basic_block bb;
       bitmap_head bb_antic_flags;
       bitmap_head bb_on_list;
+      bitmap_head bb_tail;
 
       if (dump_file)
 	fprintf (dump_file, "Attempting shrink-wrapping optimization.\n");
@@ -5732,6 +5732,7 @@ thread_prologue_and_epilogue_insns (void
 
       bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
       bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
+      bitmap_initialize (&bb_tail, &bitmap_default_obstack);
 
       /* Find the set of basic blocks that require a stack frame.  */
 
@@ -5774,19 +5775,22 @@ thread_prologue_and_epilogue_insns (void
 		}
 	}
 
+      /* Save a copy of blocks that really need a prologue.  */
+      bitmap_copy (&bb_antic_flags, &bb_flags);
+
       /* For every basic block that needs a prologue, mark all blocks
 	 reachable from it, so as to ensure they are also seen as
 	 requiring a prologue.  */
       while (!VEC_empty (basic_block, vec))
 	{
 	  basic_block tmp_bb = VEC_pop (basic_block, vec);
-	  edge e;
-	  edge_iterator ei;
+
 	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
 	    if (e->dest != EXIT_BLOCK_PTR
 		&& bitmap_set_bit (&bb_flags, e->dest->index))
 	      VEC_quick_push (basic_block, vec, e->dest);
 	}
+
       /* If the last basic block contains only a label, we'll be able
 	 to convert jumps to it to (potentially conditional) return
 	 insns later.  This means we don't necessarily need a prologue
@@ -5799,14 +5803,29 @@ thread_prologue_and_epilogue_insns (void
 	    goto fail_shrinkwrap;
 	}
 
+      /* Find the set of basic blocks that need no prologue and only
+	 go to the exit.  */
+      bitmap_set_bit (&bb_tail, EXIT_BLOCK_PTR->index);
+      VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR);
+      while (!VEC_empty (basic_block, vec))
+	{
+	  basic_block tmp_bb = VEC_pop (basic_block, vec);
+
+	  FOR_EACH_EDGE (e, ei, tmp_bb->preds)
+	    if (single_succ_p (e->src)
+		&& !bitmap_bit_p (&bb_antic_flags, e->src->index)
+		&& bitmap_set_bit (&bb_tail, e->src->index))
+	      VEC_quick_push (basic_block, vec, e->src);
+	}
+
       /* Now walk backwards from every block that is marked as needing
-	 a prologue to compute the bb_antic_flags bitmap.  */
-      bitmap_copy (&bb_antic_flags, &bb_flags);
+	 a prologue to compute the bb_antic_flags bitmap.  Exclude
+	 tail blocks; They can be duplicated to be used on paths not
+	 needing a prologue.  */
+      bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail);
       FOR_EACH_BB (bb)
 	{
-	  edge e;
-	  edge_iterator ei;
-	  if (!bitmap_bit_p (&bb_flags, bb->index))
+	  if (!bitmap_bit_p (&bb_antic_flags, bb->index))
 	    continue;
 	  FOR_EACH_EDGE (e, ei, bb->preds)
 	    if (!bitmap_bit_p (&bb_antic_flags, e->src->index)
@@ -5816,8 +5835,6 @@ thread_prologue_and_epilogue_insns (void
       while (!VEC_empty (basic_block, vec))
 	{
 	  basic_block tmp_bb = VEC_pop (basic_block, vec);
-	  edge e;
-	  edge_iterator ei;
 	  bool all_set = true;
 
 	  bitmap_clear_bit (&bb_on_list, tmp_bb->index);
@@ -5862,28 +5879,172 @@ thread_prologue_and_epilogue_insns (void
 		}
 	  }
 
-      /* Test whether the prologue is known to clobber any register
-	 (other than FP or SP) which are live on the edge.  */
-      CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
-      if (frame_pointer_needed)
-	CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
-      CLEAR_HARD_REG_SET (live_on_edge);
-      reg_set_to_hard_reg_set (&live_on_edge,
-			       df_get_live_in (entry_edge->dest));
-      if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
+      if (entry_edge != orig_entry_edge)
 	{
-	  entry_edge = orig_entry_edge;
-	  if (dump_file)
-	    fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n");
+	  /* Test whether the prologue is known to clobber any register
+	     (other than FP or SP) which are live on the edge.  */
+	  CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
+	  if (frame_pointer_needed)
+	    CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
+	  CLEAR_HARD_REG_SET (live_on_edge);
+	  reg_set_to_hard_reg_set (&live_on_edge,
+				   df_get_live_in (entry_edge->dest));
+	  if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
+	    {
+	      entry_edge = orig_entry_edge;
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Shrink-wrapping aborted due to clobber.\n");
+	    }
 	}
-      else if (entry_edge != orig_entry_edge)
+      if (entry_edge != orig_entry_edge)
 	{
 	  crtl->shrink_wrapped = true;
 	  if (dump_file)
 	    fprintf (dump_file, "Performing shrink-wrapping.\n");
+
+	  /* Find tail blocks reachable from both blocks needing a
+	     prologue and blocks not needing a prologue.  */
+	  bitmap_clear_bit (&bb_tail, EXIT_BLOCK_PTR->index);
+	  if (!bitmap_empty_p (&bb_tail))
+	    FOR_EACH_BB (bb)
+	      {
+		bool some_pro, some_no_pro;
+		if (!bitmap_bit_p (&bb_tail, bb->index))
+		  continue;
+		some_pro = some_no_pro = false;
+		FOR_EACH_EDGE (e, ei, bb->preds)
+		  {
+		    if (bitmap_bit_p (&bb_flags, e->src->index))
+		      some_pro = true;
+		    else
+		      some_no_pro = true;
+		  }
+		if (some_pro && some_no_pro)
+		  VEC_quick_push (basic_block, vec, bb);
+		else
+		  bitmap_clear_bit (&bb_tail, bb->index);
+	      }
+	  /* Find the head of each tail.  */
+	  while (!VEC_empty (basic_block, vec))
+	    {
+	      basic_block tbb = VEC_pop (basic_block, vec);
+
+	      if (!bitmap_bit_p (&bb_tail, tbb->index))
+		continue;
+
+	      while (single_succ_p (tbb))
+		{
+		  tbb = single_succ (tbb);
+		  bitmap_clear_bit (&bb_tail, tbb->index);
+		}
+	    }
+	  /* Now duplicate the tails.  */
+	  if (!bitmap_empty_p (&bb_tail))
+	    FOR_EACH_BB_REVERSE (bb)
+	      {
+		basic_block copy_bb, next_bb, fall_bb;
+		edge fall_into;
+		rtx insert_point, last;
+
+		if (!bitmap_clear_bit (&bb_tail, bb->index))
+		  continue;
+
+		copy_bb = NULL;
+		next_bb = single_succ (bb);
+		fall_into = find_fallthru_edge (bb->preds);
+		fall_bb = (fall_into ? fall_into->src : NULL);
+
+		if (fall_into
+		    && !bitmap_bit_p (&bb_flags, fall_bb->index))
+		  e = fall_into;
+		else
+		  {
+		    FOR_EACH_EDGE (e, ei, bb->preds)
+		      if (!bitmap_bit_p (&bb_flags, e->src->index))
+			break;
+		    gcc_assert (e);
+		  }
+
+		/* Create a copy of BB, instructions and all, for
+		   use on paths that don't need a prologue.  We know
+		   BB has a single successor, so there is no need to
+		   copy a jump at the end of BB.  */
+		start_sequence ();
+		last = BB_END (bb);
+		if (simplejump_p (last))
+		  last = PREV_INSN (last);
+		duplicate_insn_chain (BB_HEAD (bb), last);
+		copy_bb = split_edge (e);
+		emit_insn_after (get_insns (), BB_END (copy_bb));
+		end_sequence ();
+		df_set_bb_dirty (copy_bb);
+		force_nonfallthru_and_redirect (single_succ_edge (copy_bb),
+						EXIT_BLOCK_PTR,
+						simple_return_rtx);
+		insert_point = BB_END (copy_bb);
+		gcc_assert (JUMP_P (insert_point));
+
+		/* If there was a fall-thru edge into bb, then
+		   creating copy_bb may have required inserting a
+		   jump block.  Set bb_flags for the jump block.  */
+		if (fall_bb
+		    && bitmap_bit_p (&bb_flags, fall_bb->index))
+		  FOR_EACH_EDGE (e, ei, fall_bb->succs)
+		    if (e->dest != EXIT_BLOCK_PTR)
+		      bitmap_set_bit (&bb_flags, e->dest->index);
+
+		/* Redirect all the paths that need no prologue into
+		   copy_bb.  */
+		for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+		  if (!bitmap_bit_p (&bb_flags, e->src->index))
+		    {
+		      redirect_edge_and_branch_force (e, copy_bb);
+		      continue;
+		    }
+		  else
+		    ei_next (&ei);
+
+		while (next_bb != EXIT_BLOCK_PTR)
+		  {
+		    basic_block tbb = next_bb;
+		    next_bb = single_succ (tbb);
+		    e = split_block (copy_bb, PREV_INSN (insert_point));
+		    copy_bb = e->dest;
+		    start_sequence ();
+		    last = BB_END (tbb);
+		    if (simplejump_p (last))
+		      last = PREV_INSN (last);
+		    duplicate_insn_chain (BB_HEAD (tbb), last);
+		    emit_insn_before (get_insns (), insert_point);
+		    end_sequence ();
+
+		    for (ei = ei_start (tbb->preds); (e = ei_safe_edge (ei)); )
+		      if (!bitmap_bit_p (&bb_flags, e->src->index))
+			{
+			  redirect_edge_and_branch_force (e, copy_bb);
+			  continue;
+			}
+		      else
+			ei_next (&ei);
+		  }
+		/* Our tail may just end in a sibling call, in which
+		   case we don't want the simple_return jump added by
+		   force_nonfallthru_and_redirect.  */
+		if (CALL_P (PREV_INSN (insert_point))
+		    && SIBLING_CALL_P (PREV_INSN (insert_point)))
+		  {
+		    delete_insn (insert_point);
+		    e = single_succ_edge (copy_bb);
+		    e->flags = EDGE_SIBCALL | EDGE_ABNORMAL;
+		  }
+		if (bitmap_empty_p (&bb_tail))
+		  break;
+	      }
 	}
 
     fail_shrinkwrap:
+      bitmap_clear (&bb_tail);
       bitmap_clear (&bb_antic_flags);
       bitmap_clear (&bb_on_list);
       VEC_free (basic_block, heap, vec);

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-26 13:03                   ` Alan Modra
@ 2011-10-26 13:42                     ` Bernd Schmidt
  2011-10-26 14:40                       ` Alan Modra
  2011-10-31 15:14                     ` Alan Modra
  1 sibling, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-10-26 13:42 UTC (permalink / raw)
  To: gcc-patches

On 10/26/11 14:27, Alan Modra wrote:
> Committed revision 180522.  It turns out that shrink-wrapping isn't as
> effective as it used to be with the 20110915 based sources I was using
> originally.  povray Ray_In_Bound no longer gets the benefit of shrink
> wrap, likely due to some cfg optimization.  We end up with a simple
> block that just does r3=1 then jumps to last_bb being reached from
> blocks that need a prologue as well as blocks that don't.  That's
> enough to kill our current shrink wrap implementation.  What we need
> is something to duplicate these tail blocks..

Would it work to insert the epilogue on some edges to this R3=1 block,
and not on the others? (How many edges of each kind are there?)

Now that we have an initial patch in the tree and it mostly seems to
work, we can think about making it a little stronger - the initial
implementation is really quite conservative.


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-26 13:42                     ` Bernd Schmidt
@ 2011-10-26 14:40                       ` Alan Modra
  2011-10-26 14:44                         ` Bernd Schmidt
  2011-10-28  0:41                         ` Alan Modra
  0 siblings, 2 replies; 48+ messages in thread
From: Alan Modra @ 2011-10-26 14:40 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote:
> On 10/26/11 14:27, Alan Modra wrote:
> > Committed revision 180522.  It turns out that shrink-wrapping isn't as
> > effective as it used to be with the 20110915 based sources I was using
> > originally.  povray Ray_In_Bound no longer gets the benefit of shrink
> > wrap, likely due to some cfg optimization.  We end up with a simple
> > block that just does r3=1 then jumps to last_bb being reached from
> > blocks that need a prologue as well as blocks that don't.  That's
> > enough to kill our current shrink wrap implementation.  What we need
> > is something to duplicate these tail blocks..
> 
> Would it work to insert the epilogue on some edges to this R3=1 block,
> and not on the others?

Wouldn't you need to modify all the target epilogue code?  Our
epilogues return.

> (How many edges of each kind are there?)
In the povray case there was one edge of each kind, but I have seen
other cases where there were 4 edges from blocks needing no prologue
and 2 edges from blocks needing a prologue.  I can't tell you what the
testcase was now;  It was something I looked at when ironing out bugs
in my code.  You wouldn't believe how many ways it is possible to
write buggy cfg manipulation code..

I guess the tradeoff between the classic shrink-wrap epilogue scheme
and my duplicate tail idea is whether duplicating tail blocks adds
more code than duplicating epilogues.  From what I've seen, the
duplicate tails are generally very small.  I guess I should dump out
some info so we can get a better idea.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-26 14:40                       ` Alan Modra
@ 2011-10-26 14:44                         ` Bernd Schmidt
  2011-10-26 15:40                           ` Alan Modra
  2011-10-28  0:41                         ` Alan Modra
  1 sibling, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-10-26 14:44 UTC (permalink / raw)
  To: gcc-patches

On 10/26/11 15:54, Alan Modra wrote:
> On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote:
>> On 10/26/11 14:27, Alan Modra wrote:
>>> Committed revision 180522.  It turns out that shrink-wrapping isn't as
>>> effective as it used to be with the 20110915 based sources I was using
>>> originally.  povray Ray_In_Bound no longer gets the benefit of shrink
>>> wrap, likely due to some cfg optimization.  We end up with a simple
>>> block that just does r3=1 then jumps to last_bb being reached from
>>> blocks that need a prologue as well as blocks that don't.  That's
>>> enough to kill our current shrink wrap implementation.  What we need
>>> is something to duplicate these tail blocks..
>>
>> Would it work to insert the epilogue on some edges to this R3=1 block,
>> and not on the others?
> 
> Wouldn't you need to modify all the target epilogue code?  Our
> epilogues return.

Not all of them at once - you could require that if a target has a
simple_return pattern, the epilogue does not return. But yes, these
kinds of complications are a reason why I went for a simple variant first.

> I guess the tradeoff between the classic shrink-wrap epilogue scheme
> and my duplicate tail idea is whether duplicating tail blocks adds
> more code than duplicating epilogues.  From what I've seen, the
> duplicate tails are generally very small.  I guess I should dump out
> some info so we can get a better idea.

I suppose if one wanted to avoid inserting more than one epilogue for
code-size reasons, one could make a new basic block containing the
epilogue, and redirect edges as appropriate.


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-26 14:44                         ` Bernd Schmidt
@ 2011-10-26 15:40                           ` Alan Modra
  0 siblings, 0 replies; 48+ messages in thread
From: Alan Modra @ 2011-10-26 15:40 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches

On Wed, Oct 26, 2011 at 03:59:36PM +0200, Bernd Schmidt wrote:
> On 10/26/11 15:54, Alan Modra wrote:
> > I guess the tradeoff between the classic shrink-wrap epilogue scheme
> > and my duplicate tail idea is whether duplicating tail blocks adds
> > more code than duplicating epilogues.  From what I've seen, the
> > duplicate tails are generally very small.  I guess I should dump out
> > some info so we can get a better idea.
> 
> I suppose if one wanted to avoid inserting more than one epilogue for
> code-size reasons, one could make a new basic block containing the
> epilogue, and redirect edges as appropriate.

Suppose you have a function that returns r3=0 in one tail block and
r3=1 in another, and these blocks are reached both by paths needing
a prologue and by paths not needing a prologue.  Which seems a likely
common case.  I'm fairly certain that would require two copies of the
normal epilogue, or duplicating the tail blocks.  (But it's late here
and I'm ready to nod off so may not be thinking straight.)

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-26 14:40                       ` Alan Modra
  2011-10-26 14:44                         ` Bernd Schmidt
@ 2011-10-28  0:41                         ` Alan Modra
  1 sibling, 0 replies; 48+ messages in thread
From: Alan Modra @ 2011-10-28  0:41 UTC (permalink / raw)
  To: Bernd Schmidt, gcc-patches

On Thu, Oct 27, 2011 at 12:24:46AM +1030, Alan Modra wrote:
> more code than duplicating epilogues.  From what I've seen, the
> duplicate tails are generally very small.  I guess I should dump out
> some info so we can get a better idea.

There were 545 occurrences of shrink-wrap in the gcc/ dir for a
-O2 powerpc-linux bootstrap.  I counted active insns in duplicated
blocks.  244 had zero insns (all the ones I looked at were just uses
of r3), 96 had one insn (all the ones I looked at were setting r3),
with the remainder being:

cfgrtl.c.199r.pro_and_epilogue:Duplicating bb 6, 2 insns.
  tail call
function.c.199r.pro_and_epilogue:Duplicating bb 17, 2 insns.
  setting a value via pointer (*poffset in instantiate_new_reg)
insn-automata.c.199r.pro_and_epilogue:Duplicating bb 229, 2 insns.
  tail call
varasm.c.199r.pro_and_epilogue:Duplicating bb 48, 2 insns.
  setting a global var
rs6000.c.199r.pro_and_epilogue:Duplicating bb 300, 3 insns.
  tail call
rs6000.c.199r.pro_and_epilogue:Duplicating bb 8, 3 insns.
  tail call
toplev.c.199r.pro_and_epilogue:Duplicating bb 5, 4 insns.
  loading two word return value from "random_seed" var.
reginfo.c.199r.pro_and_epilogue:Duplicating bb 4, 7 insns.
  setting a two word global var and r3
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call
dbxout.c.199r.pro_and_epilogue:Duplicating bb 4, 8 insns.
  tail call

Note that having 350 duplicated blocks doesn't mean we had 350 extra
cases of shrink-wrapping, because some tails were two blocks, and I
recall seeing a case when developing my code where one function had
two duplicated tails.  Certainly many more shrink wrapped functions
though.  :-)

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-26 13:03                   ` Alan Modra
  2011-10-26 13:42                     ` Bernd Schmidt
@ 2011-10-31 15:14                     ` Alan Modra
  2011-11-01 15:34                       ` Alan Modra
  1 sibling, 1 reply; 48+ messages in thread
From: Alan Modra @ 2011-10-31 15:14 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt

So I'm at the point where I'm reasonably happy with this work.  This
patch doesn't do anything particularly clever regarding our
shrink-wrap implementation.  We still only insert one copy of the
prologue, and one of the epilogue in thread_prologue_and_epilogue.
All it really does is replaces Bernd's !last_bb_active code (allowing
one tail block with no active insns to be shared by paths needing a
prologue and paths not needing a prologue), with what I think is
conceptually simpler, duplicating a shared tail block.  Then I extend
this to duplicating a chain of tail blocks.

That leads to some simplification as all the special cases and
restrictions of !last_bb_active disappear.  For example,
convert_jumps_to_returns looks much like the code in gcc-4.6.  We also
get many more functions being shrink-wrapped.  Some numbers from my
latest gcc bootstraps:

powerpc-linux
.../gcc-virgin/gcc> grep 'Performing shrink' *.pro_and_epilogue | wc -l
453
.../gcc-curr/gcc> grep 'Performing shrink' *.pro_and_epilogue | wc -l
648

i686-linux
.../gcc-virgin/gcc$ grep 'Performing shrink' *pro_and_epilogue | wc -l
329
.../gcc-curr/gcc$ grep 'Performing shrink' *.pro_and_epilogue | wc -l
416

Bits left to do
- limit size of duplicated tails
- don't duplicate sibling call blocks, but instead split the block
  after the sibling call epilogue has been added, redirecting
  non-prologue paths past the epilogue.

Is this OK to apply as is?

	* function.c (bb_active_p): Delete.
	(dup_block_and_redirect, active_insn_between): New functions.
	(convert_jumps_to_returns, emit_return_for_exit): New functions,
	split out from..
	(thread_prologue_and_epilogue_insns): ..here.  Delete
	shadowing variables.  Don't do prologue register clobber tests
	when shrink wrapping already failed.  Delete all last_bb_active
	code.  Instead compute tail block candidates for duplicating
	exit path.  Remove these from antic set.  Duplicate tails when
	reached from both blocks needing a prologue/epilogue and
	blocks not needing such.

Index: gcc/function.c
===================================================================
*** gcc/function.c	(revision 180588)
--- gcc/function.c	(working copy)
*************** set_return_jump_label (rtx returnjump)
*** 5514,5535 ****
      JUMP_LABEL (returnjump) = ret_rtx;
  }
  
! /* Return true if BB has any active insns.  */
  static bool
! bb_active_p (basic_block bb)
  {
    rtx label;
  
!   /* Test whether there are active instructions in BB.  */
!   label = BB_END (bb);
!   while (label && !LABEL_P (label))
      {
!       if (active_insn_p (label))
! 	break;
!       label = PREV_INSN (label);
      }
!   return BB_HEAD (bb) != label || !LABEL_P (label);
  }
  
  /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
     this into place with notes indicating where the prologue ends and where
--- 5514,5698 ----
      JUMP_LABEL (returnjump) = ret_rtx;
  }
  
! #ifdef HAVE_simple_return
! /* Create a copy of BB instructions and insert at BEFORE.  Redirect
!    preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
! static void
! dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
! 			bitmap_head *need_prologue)
! {
!   edge_iterator ei;
!   edge e;
!   rtx insn = BB_END (bb);
! 
!   /* We know BB has a single successor, so there is no need to copy a
!      simple jump at the end of BB.  */
!   if (simplejump_p (insn))
!     insn = PREV_INSN (insn);
! 
!   start_sequence ();
!   duplicate_insn_chain (BB_HEAD (bb), insn);
!   if (dump_file)
!     {
!       unsigned count = 0;
!       for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
! 	if (active_insn_p (insn))
! 	  ++count;
!       fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n",
! 	       bb->index, copy_bb->index, count);
!     }
!   insn = get_insns ();
!   end_sequence ();
!   emit_insn_before (insn, before);
! 
!   /* Redirect all the paths that need no prologue into copy_bb.  */
!   for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
!     if (!bitmap_bit_p (need_prologue, e->src->index))
!       {
! 	redirect_edge_and_branch_force (e, copy_bb);
! 	continue;
!       }
!     else
!       ei_next (&ei);
! }
! #endif
! 
! #if defined (HAVE_return) || defined (HAVE_simple_return)
! /* Return true if there are any active insns between HEAD and TAIL.  */
  static bool
! active_insn_between (rtx head, rtx tail)
! {
!   while (tail)
!     {
!       if (active_insn_p (tail))
! 	return true;
!       if (tail == head)
! 	return false;
!       tail = PREV_INSN (tail);
!     }
!   return false;
! }
! 
! /* LAST_BB is a block that exits, and empty of active instructions.
!    Examine its predecessors for jumps that can be converted to
!    (conditional) returns.  */
! static VEC (edge, heap) *
! convert_jumps_to_returns (basic_block last_bb, bool simple_p,
! 			  VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED)
  {
+   int i;
+   basic_block bb;
    rtx label;
+   edge_iterator ei;
+   edge e;
+   VEC(basic_block,heap) *src_bbs;
+ 
+   src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
+   FOR_EACH_EDGE (e, ei, last_bb->preds)
+     if (e->src != ENTRY_BLOCK_PTR)
+       VEC_quick_push (basic_block, src_bbs, e->src);
  
!   label = BB_HEAD (last_bb);
! 
!   FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb)
      {
!       rtx jump = BB_END (bb);
! 
!       if (!JUMP_P (jump) || JUMP_LABEL (jump) != label)
! 	continue;
! 
!       e = find_edge (bb, last_bb);
! 
!       /* If we have an unconditional jump, we can replace that
! 	 with a simple return instruction.  */
!       if (simplejump_p (jump))
! 	{
! 	  /* The use of the return register might be present in the exit
! 	     fallthru block.  Either:
! 	     - removing the use is safe, and we should remove the use in
! 	     the exit fallthru block, or
! 	     - removing the use is not safe, and we should add it here.
! 	     For now, we conservatively choose the latter.  Either of the
! 	     2 helps in crossjumping.  */
! 	  emit_use_return_register_into_block (bb);
! 
! 	  emit_return_into_block (simple_p, bb);
! 	  delete_insn (jump);
! 	}
! 
!       /* If we have a conditional jump branching to the last
! 	 block, we can try to replace that with a conditional
! 	 return instruction.  */
!       else if (condjump_p (jump))
! 	{
! 	  rtx dest;
! 
! 	  if (simple_p)
! 	    dest = simple_return_rtx;
! 	  else
! 	    dest = ret_rtx;
! 	  if (!redirect_jump (jump, dest, 0))
! 	    {
! #ifdef HAVE_simple_return
! 	      if (simple_p)
! 		{
! 		  if (dump_file)
! 		    fprintf (dump_file,
! 			     "Failed to redirect bb %d branch.\n", bb->index);
! 		  VEC_safe_push (edge, heap, unconverted, e);
! 		}
! #endif
! 	      continue;
! 	    }
! 
! 	  /* See comment in simplejump_p case above.  */
! 	  emit_use_return_register_into_block (bb);
! 
! 	  /* If this block has only one successor, it both jumps
! 	     and falls through to the fallthru block, so we can't
! 	     delete the edge.  */
! 	  if (single_succ_p (bb))
! 	    continue;
! 	}
!       else
! 	{
! #ifdef HAVE_simple_return
! 	  if (simple_p)
! 	    {
! 	      if (dump_file)
! 		fprintf (dump_file,
! 			 "Failed to redirect bb %d branch.\n", bb->index);
! 	      VEC_safe_push (edge, heap, unconverted, e);
! 	    }
! #endif
! 	  continue;
! 	}
! 
!       /* Fix up the CFG for the successful change we just made.  */
!       redirect_edge_succ (e, EXIT_BLOCK_PTR);
!     }
!   VEC_free (basic_block, heap, src_bbs);
!   return unconverted;
! }
! 
! /* Emit a return insn for the exit fallthru block.  */
! static basic_block
! emit_return_for_exit (edge exit_fallthru_edge, bool simple_p)
! {
!   basic_block last_bb = exit_fallthru_edge->src;
! 
!   if (JUMP_P (BB_END (last_bb)))
!     {
!       last_bb = split_edge (exit_fallthru_edge);
!       exit_fallthru_edge = single_succ_edge (last_bb);
      }
!   emit_barrier_after (BB_END (last_bb));
!   emit_return_into_block (simple_p, last_bb);
!   exit_fallthru_edge->flags &= ~EDGE_FALLTHRU;
!   return last_bb;
  }
+ #endif
+ 
  
  /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
     this into place with notes indicating where the prologue ends and where
*************** static void
*** 5583,5602 ****
  thread_prologue_and_epilogue_insns (void)
  {
    bool inserted;
-   basic_block last_bb;
-   bool last_bb_active ATTRIBUTE_UNUSED;
  #ifdef HAVE_simple_return
!   VEC (rtx, heap) *unconverted_simple_returns = NULL;
!   basic_block simple_return_block_hot = NULL;
!   basic_block simple_return_block_cold = NULL;
    bool nonempty_prologue;
  #endif
!   rtx returnjump ATTRIBUTE_UNUSED;
    rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED;
    rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED;
    edge e, entry_edge, orig_entry_edge, exit_fallthru_edge;
    edge_iterator ei;
-   bitmap_head bb_flags;
  
    df_analyze ();
  
--- 5746,5761 ----
  thread_prologue_and_epilogue_insns (void)
  {
    bool inserted;
  #ifdef HAVE_simple_return
!   VEC (edge, heap) *unconverted_simple_returns = NULL;
    bool nonempty_prologue;
+   bitmap_head bb_flags;
  #endif
!   rtx returnjump;
    rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED;
    rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED;
    edge e, entry_edge, orig_entry_edge, exit_fallthru_edge;
    edge_iterator ei;
  
    df_analyze ();
  
*************** thread_prologue_and_epilogue_insns (void
*** 5614,5631 ****
    entry_edge = single_succ_edge (ENTRY_BLOCK_PTR);
    orig_entry_edge = entry_edge;
  
-   exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
-   if (exit_fallthru_edge != NULL)
-     {
-       last_bb = exit_fallthru_edge->src;
-       last_bb_active = bb_active_p (last_bb);
-     }
-   else
-     {
-       last_bb = NULL;
-       last_bb_active = false;
-     }
- 
    split_prologue_seq = NULL_RTX;
    if (flag_split_stack
        && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
--- 5773,5778 ----
*************** thread_prologue_and_epilogue_insns (void
*** 5675,5683 ****
      }
  #endif
  
    bitmap_initialize (&bb_flags, &bitmap_default_obstack);
  
- #ifdef HAVE_simple_return
    /* Try to perform a kind of shrink-wrapping, making sure the
       prologue/epilogue is emitted only around those parts of the
       function that require it.  */
--- 5822,5830 ----
      }
  #endif
  
+ #ifdef HAVE_simple_return
    bitmap_initialize (&bb_flags, &bitmap_default_obstack);
  
    /* Try to perform a kind of shrink-wrapping, making sure the
       prologue/epilogue is emitted only around those parts of the
       function that require it.  */
*************** thread_prologue_and_epilogue_insns (void
*** 5697,5707 ****
        HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
        HARD_REG_SET set_up_by_prologue;
        rtx p_insn;
- 
        VEC(basic_block, heap) *vec;
        basic_block bb;
        bitmap_head bb_antic_flags;
        bitmap_head bb_on_list;
  
        if (dump_file)
  	fprintf (dump_file, "Attempting shrink-wrapping optimization.\n");
--- 5844,5854 ----
        HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
        HARD_REG_SET set_up_by_prologue;
        rtx p_insn;
        VEC(basic_block, heap) *vec;
        basic_block bb;
        bitmap_head bb_antic_flags;
        bitmap_head bb_on_list;
+       bitmap_head bb_tail;
  
        if (dump_file)
  	fprintf (dump_file, "Attempting shrink-wrapping optimization.\n");
*************** thread_prologue_and_epilogue_insns (void
*** 5726,5737 ****
  
        prepare_shrink_wrap (entry_edge->dest);
  
-       /* That may have inserted instructions into the last block.  */
-       if (last_bb && !last_bb_active)
- 	last_bb_active = bb_active_p (last_bb);
- 
        bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
        bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
  
        /* Find the set of basic blocks that require a stack frame.  */
  
--- 5873,5881 ----
  
        prepare_shrink_wrap (entry_edge->dest);
  
        bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
        bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
+       bitmap_initialize (&bb_tail, &bitmap_default_obstack);
  
        /* Find the set of basic blocks that require a stack frame.  */
  
*************** thread_prologue_and_epilogue_insns (void
*** 5750,5812 ****
        FOR_EACH_BB (bb)
  	{
  	  rtx insn;
! 	  /* As a special case, check for jumps to the last bb that
! 	     cannot successfully be converted to simple_returns later
! 	     on, and mark them as requiring a frame.  These are
! 	     conditional jumps that jump to their fallthru block, so
! 	     it's not a case that is expected to occur often.  */
! 	  if (JUMP_P (BB_END (bb)) && any_condjump_p (BB_END (bb))
! 	      && single_succ_p (bb)
! 	      && !last_bb_active
! 	      && single_succ (bb) == last_bb)
! 	    {
! 	      bitmap_set_bit (&bb_flags, bb->index);
! 	      VEC_quick_push (basic_block, vec, bb);
! 	    }
! 	  else
! 	    FOR_BB_INSNS (bb, insn)
! 	      if (requires_stack_frame_p (insn, prologue_used,
! 					  set_up_by_prologue))
! 		{
! 		  bitmap_set_bit (&bb_flags, bb->index);
! 		  VEC_quick_push (basic_block, vec, bb);
! 		  break;
! 		}
  	}
  
        /* For every basic block that needs a prologue, mark all blocks
  	 reachable from it, so as to ensure they are also seen as
  	 requiring a prologue.  */
        while (!VEC_empty (basic_block, vec))
  	{
  	  basic_block tmp_bb = VEC_pop (basic_block, vec);
! 	  edge e;
! 	  edge_iterator ei;
  	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
  	    if (e->dest != EXIT_BLOCK_PTR
  		&& bitmap_set_bit (&bb_flags, e->dest->index))
  	      VEC_quick_push (basic_block, vec, e->dest);
  	}
!       /* If the last basic block contains only a label, we'll be able
! 	 to convert jumps to it to (potentially conditional) return
! 	 insns later.  This means we don't necessarily need a prologue
! 	 for paths reaching it.  */
!       if (last_bb && optimize)
! 	{
! 	  if (!last_bb_active)
! 	    bitmap_clear_bit (&bb_flags, last_bb->index);
! 	  else if (!bitmap_bit_p (&bb_flags, last_bb->index))
! 	    goto fail_shrinkwrap;
  	}
  
        /* Now walk backwards from every block that is marked as needing
! 	 a prologue to compute the bb_antic_flags bitmap.  */
!       bitmap_copy (&bb_antic_flags, &bb_flags);
        FOR_EACH_BB (bb)
  	{
! 	  edge e;
! 	  edge_iterator ei;
! 	  if (!bitmap_bit_p (&bb_flags, bb->index))
  	    continue;
  	  FOR_EACH_EDGE (e, ei, bb->preds)
  	    if (!bitmap_bit_p (&bb_antic_flags, e->src->index)
--- 5894,5947 ----
        FOR_EACH_BB (bb)
  	{
  	  rtx insn;
! 	  FOR_BB_INSNS (bb, insn)
! 	    if (requires_stack_frame_p (insn, prologue_used,
! 					set_up_by_prologue))
! 	      {
! 		bitmap_set_bit (&bb_flags, bb->index);
! 		VEC_quick_push (basic_block, vec, bb);
! 		break;
! 	      }
  	}
  
+       /* Save a copy of blocks that really need a prologue.  */
+       bitmap_copy (&bb_antic_flags, &bb_flags);
+ 
        /* For every basic block that needs a prologue, mark all blocks
  	 reachable from it, so as to ensure they are also seen as
  	 requiring a prologue.  */
        while (!VEC_empty (basic_block, vec))
  	{
  	  basic_block tmp_bb = VEC_pop (basic_block, vec);
! 
  	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
  	    if (e->dest != EXIT_BLOCK_PTR
  		&& bitmap_set_bit (&bb_flags, e->dest->index))
  	      VEC_quick_push (basic_block, vec, e->dest);
  	}
! 
!       /* Find the set of basic blocks that need no prologue, have a
! 	 single successor, and only go to the exit.  */
!       VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR);
!       while (!VEC_empty (basic_block, vec))
! 	{
! 	  basic_block tmp_bb = VEC_pop (basic_block, vec);
! 
! 	  FOR_EACH_EDGE (e, ei, tmp_bb->preds)
! 	    if (single_succ_p (e->src)
! 		&& !bitmap_bit_p (&bb_antic_flags, e->src->index)
! 		&& bitmap_set_bit (&bb_tail, e->src->index))
! 	      VEC_quick_push (basic_block, vec, e->src);
  	}
  
        /* Now walk backwards from every block that is marked as needing
! 	 a prologue to compute the bb_antic_flags bitmap.  Exclude
! 	 tail blocks; They can be duplicated to be used on paths not
! 	 needing a prologue.  */
!       bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail);
        FOR_EACH_BB (bb)
  	{
! 	  if (!bitmap_bit_p (&bb_antic_flags, bb->index))
  	    continue;
  	  FOR_EACH_EDGE (e, ei, bb->preds)
  	    if (!bitmap_bit_p (&bb_antic_flags, e->src->index)
*************** thread_prologue_and_epilogue_insns (void
*** 5816,5823 ****
        while (!VEC_empty (basic_block, vec))
  	{
  	  basic_block tmp_bb = VEC_pop (basic_block, vec);
- 	  edge e;
- 	  edge_iterator ei;
  	  bool all_set = true;
  
  	  bitmap_clear_bit (&bb_on_list, tmp_bb->index);
--- 5951,5956 ----
*************** thread_prologue_and_epilogue_insns (void
*** 5862,5889 ****
  		}
  	  }
  
!       /* Test whether the prologue is known to clobber any register
! 	 (other than FP or SP) which are live on the edge.  */
!       CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
!       if (frame_pointer_needed)
! 	CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
!       CLEAR_HARD_REG_SET (live_on_edge);
!       reg_set_to_hard_reg_set (&live_on_edge,
! 			       df_get_live_in (entry_edge->dest));
!       if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
  	{
! 	  entry_edge = orig_entry_edge;
! 	  if (dump_file)
! 	    fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n");
  	}
!       else if (entry_edge != orig_entry_edge)
  	{
  	  crtl->shrink_wrapped = true;
  	  if (dump_file)
  	    fprintf (dump_file, "Performing shrink-wrapping.\n");
  	}
  
      fail_shrinkwrap:
        bitmap_clear (&bb_antic_flags);
        bitmap_clear (&bb_on_list);
        VEC_free (basic_block, heap, vec);
--- 5995,6132 ----
  		}
  	  }
  
!       if (entry_edge != orig_entry_edge)
  	{
! 	  /* Test whether the prologue is known to clobber any register
! 	     (other than FP or SP) which are live on the edge.  */
! 	  CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
! 	  if (frame_pointer_needed)
! 	    CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
! 	  CLEAR_HARD_REG_SET (live_on_edge);
! 	  reg_set_to_hard_reg_set (&live_on_edge,
! 				   df_get_live_in (entry_edge->dest));
! 	  if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
! 	    {
! 	      entry_edge = orig_entry_edge;
! 	      if (dump_file)
! 		fprintf (dump_file,
! 			 "Shrink-wrapping aborted due to clobber.\n");
! 	    }
  	}
!       if (entry_edge != orig_entry_edge)
  	{
  	  crtl->shrink_wrapped = true;
  	  if (dump_file)
  	    fprintf (dump_file, "Performing shrink-wrapping.\n");
+ 
+ 	  /* Find tail blocks reachable from both blocks needing a
+ 	     prologue and blocks not needing a prologue.  */
+ 	  if (!bitmap_empty_p (&bb_tail))
+ 	    FOR_EACH_BB (bb)
+ 	      {
+ 		bool some_pro, some_no_pro;
+ 		if (!bitmap_bit_p (&bb_tail, bb->index))
+ 		  continue;
+ 		some_pro = some_no_pro = false;
+ 		FOR_EACH_EDGE (e, ei, bb->preds)
+ 		  {
+ 		    if (bitmap_bit_p (&bb_flags, e->src->index))
+ 		      some_pro = true;
+ 		    else
+ 		      some_no_pro = true;
+ 		  }
+ 		if (some_pro && some_no_pro)
+ 		  VEC_quick_push (basic_block, vec, bb);
+ 		else
+ 		  bitmap_clear_bit (&bb_tail, bb->index);
+ 	      }
+ 	  /* Find the head of each tail.  */
+ 	  while (!VEC_empty (basic_block, vec))
+ 	    {
+ 	      basic_block tbb = VEC_pop (basic_block, vec);
+ 
+ 	      if (!bitmap_bit_p (&bb_tail, tbb->index))
+ 		continue;
+ 
+ 	      while (single_succ_p (tbb))
+ 		{
+ 		  tbb = single_succ (tbb);
+ 		  bitmap_clear_bit (&bb_tail, tbb->index);
+ 		}
+ 	    }
+ 	  /* Now duplicate the tails.  */
+ 	  if (!bitmap_empty_p (&bb_tail))
+ 	    FOR_EACH_BB_REVERSE (bb)
+ 	      {
+ 		basic_block copy_bb, next_bb;
+ 		rtx insert_point;
+ 
+ 		if (!bitmap_clear_bit (&bb_tail, bb->index))
+ 		  continue;
+ 
+ 		next_bb = single_succ (bb);
+ 
+ 		/* Create a copy of BB, instructions and all, for
+ 		   use on paths that don't need a prologue.
+ 		   Ideal placement of the copy is on a fall-thru edge
+ 		   or after a block that would jump to the copy.  */ 
+ 		FOR_EACH_EDGE (e, ei, bb->preds)
+ 		  if (!bitmap_bit_p (&bb_flags, e->src->index)
+ 		      && single_succ_p (e->src))
+ 		    break;
+ 		if (e)
+ 		  {
+ 		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
+ 						  NULL_RTX, e->src);
+ 		    BB_COPY_PARTITION (copy_bb, e->src);
+ 		  }
+ 		else
+ 		  {
+ 		    /* Otherwise put the copy at the end of the function.  */
+ 		    copy_bb = create_basic_block (NULL_RTX, NULL_RTX,
+ 						  EXIT_BLOCK_PTR->prev_bb);
+ 		    BB_COPY_PARTITION (copy_bb, bb);
+ 		  }
+ 
+ 		insert_point = emit_note_after (NOTE_INSN_DELETED,
+ 						BB_END (copy_bb));
+ 		emit_barrier_after (BB_END (copy_bb));
+ 
+ 		make_single_succ_edge (copy_bb, EXIT_BLOCK_PTR, 0);
+ 
+ 		dup_block_and_redirect (bb, copy_bb, insert_point,
+ 					&bb_flags);
+ 
+ 		while (next_bb != EXIT_BLOCK_PTR)
+ 		  {
+ 		    basic_block tbb = next_bb;
+ 		    next_bb = single_succ (tbb);
+ 		    e = split_block (copy_bb, PREV_INSN (insert_point));
+ 		    copy_bb = e->dest;
+ 		    dup_block_and_redirect (tbb, copy_bb, insert_point,
+ 					    &bb_flags);
+ 		  }
+ 
+ 		/* Fiddle with edge flags to quiet verify_flow_info.
+ 		   We have yet to add a simple_return to the tails,
+ 		   as we'd like to first convert_jumps_to_returns in
+ 		   case the block is no longer used after that.  */
+ 		e = single_succ_edge (copy_bb);
+ 		if (CALL_P (PREV_INSN (insert_point))
+ 		    && SIBLING_CALL_P (PREV_INSN (insert_point)))
+ 		  e->flags = EDGE_SIBCALL | EDGE_ABNORMAL;
+ 		else
+ 		  e->flags = EDGE_FAKE;
+ 		/* verify_flow_info doesn't like a note after a
+ 		   sibling call.  */
+ 		delete_insn (insert_point);
+ 		if (bitmap_empty_p (&bb_tail))
+ 		  break;
+ 	      }
  	}
  
      fail_shrinkwrap:
+       bitmap_clear (&bb_tail);
        bitmap_clear (&bb_antic_flags);
        bitmap_clear (&bb_on_list);
        VEC_free (basic_block, heap, vec);
*************** thread_prologue_and_epilogue_insns (void
*** 5911,6057 ****
  
    rtl_profile_for_bb (EXIT_BLOCK_PTR);
  
! #ifdef HAVE_return
    /* If we're allowed to generate a simple return instruction, then by
       definition we don't need a full epilogue.  If the last basic
       block before the exit block does not contain active instructions,
       examine its predecessors and try to emit (conditional) return
       instructions.  */
!   if (optimize && !last_bb_active
!       && (HAVE_return || entry_edge != orig_entry_edge))
      {
!       edge_iterator ei2;
!       int i;
!       basic_block bb;
!       rtx label;
!       VEC(basic_block,heap) *src_bbs;
! 
!       if (exit_fallthru_edge == NULL)
! 	goto epilogue_done;
!       label = BB_HEAD (last_bb);
! 
!       src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
!       FOR_EACH_EDGE (e, ei2, last_bb->preds)
! 	if (e->src != ENTRY_BLOCK_PTR)
! 	  VEC_quick_push (basic_block, src_bbs, e->src);
! 
!       FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb)
  	{
! 	  bool simple_p;
! 	  rtx jump;
! 	  e = find_edge (bb, last_bb);
  
! 	  jump = BB_END (bb);
! 
! #ifdef HAVE_simple_return
! 	  simple_p = (entry_edge != orig_entry_edge
! 		      && !bitmap_bit_p (&bb_flags, bb->index));
! #else
! 	  simple_p = false;
! #endif
! 
! 	  if (!simple_p
! 	      && (!HAVE_return || !JUMP_P (jump)
! 		  || JUMP_LABEL (jump) != label))
! 	    continue;
! 
! 	  /* If we have an unconditional jump, we can replace that
! 	     with a simple return instruction.  */
! 	  if (!JUMP_P (jump))
! 	    {
! 	      emit_barrier_after (BB_END (bb));
! 	      emit_return_into_block (simple_p, bb);
! 	    }
! 	  else if (simplejump_p (jump))
  	    {
! 	      /* The use of the return register might be present in the exit
! 	         fallthru block.  Either:
! 	         - removing the use is safe, and we should remove the use in
! 	           the exit fallthru block, or
! 	         - removing the use is not safe, and we should add it here.
! 	         For now, we conservatively choose the latter.  Either of the
! 	         2 helps in crossjumping.  */
! 	      emit_use_return_register_into_block (bb);
! 
! 	      emit_return_into_block (simple_p, bb);
! 	      delete_insn (jump);
  	    }
! 	  else if (condjump_p (jump) && JUMP_LABEL (jump) != label)
! 	    {
! 	      basic_block new_bb;
! 	      edge new_e;
  
! 	      gcc_assert (simple_p);
! 	      new_bb = split_edge (e);
! 	      emit_barrier_after (BB_END (new_bb));
! 	      emit_return_into_block (simple_p, new_bb);
! #ifdef HAVE_simple_return
! 	      if (BB_PARTITION (new_bb) == BB_HOT_PARTITION)
! 		simple_return_block_hot = new_bb;
! 	      else
! 		simple_return_block_cold = new_bb;
  #endif
! 	      new_e = single_succ_edge (new_bb);
! 	      redirect_edge_succ (new_e, EXIT_BLOCK_PTR);
  
! 	      continue;
! 	    }
! 	  /* If we have a conditional jump branching to the last
! 	     block, we can try to replace that with a conditional
! 	     return instruction.  */
! 	  else if (condjump_p (jump))
! 	    {
! 	      rtx dest;
! 	      if (simple_p)
! 		dest = simple_return_rtx;
! 	      else
! 		dest = ret_rtx;
! 	      if (! redirect_jump (jump, dest, 0))
! 		{
! #ifdef HAVE_simple_return
! 		  if (simple_p)
! 		    VEC_safe_push (rtx, heap,
! 				   unconverted_simple_returns, jump);
! #endif
! 		  continue;
! 		}
  
! 	      /* See comment in simple_jump_p case above.  */
! 	      emit_use_return_register_into_block (bb);
  
! 	      /* If this block has only one successor, it both jumps
! 		 and falls through to the fallthru block, so we can't
! 		 delete the edge.  */
! 	      if (single_succ_p (bb))
! 		continue;
! 	    }
! 	  else
  	    {
  #ifdef HAVE_simple_return
! 	      if (simple_p)
! 		VEC_safe_push (rtx, heap,
! 			       unconverted_simple_returns, jump);
  #endif
! 	      continue;
  	    }
- 
- 	  /* Fix up the CFG for the successful change we just made.  */
- 	  redirect_edge_succ (e, EXIT_BLOCK_PTR);
- 	}
-       VEC_free (basic_block, heap, src_bbs);
- 
-       if (HAVE_return)
- 	{
- 	  /* Emit a return insn for the exit fallthru block.  Whether
- 	     this is still reachable will be determined later.  */
- 
- 	  emit_barrier_after (BB_END (last_bb));
- 	  emit_return_into_block (false, last_bb);
- 	  epilogue_end = BB_END (last_bb);
- 	  if (JUMP_P (epilogue_end))
- 	    set_return_jump_label (epilogue_end);
- 	  single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU;
- 	  goto epilogue_done;
  	}
      }
  #endif
--- 6154,6226 ----
  
    rtl_profile_for_bb (EXIT_BLOCK_PTR);
  
!   exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
! 
    /* If we're allowed to generate a simple return instruction, then by
       definition we don't need a full epilogue.  If the last basic
       block before the exit block does not contain active instructions,
       examine its predecessors and try to emit (conditional) return
       instructions.  */
! #ifdef HAVE_simple_return
!   if (entry_edge != orig_entry_edge)
      {
!       if (optimize)
  	{
! 	  unsigned i, last;
  
! 	  /* convert_jumps_to_returns may add to EXIT_BLOCK_PTR->preds
! 	     (but won't remove).  Stop at end of current preds.  */
! 	  last = EDGE_COUNT (EXIT_BLOCK_PTR->preds);
! 	  for (i = 0; i < last; i++)
  	    {
! 	      e = EDGE_I (EXIT_BLOCK_PTR->preds, i);
! 	      if (LABEL_P (BB_HEAD (e->src))
! 		  && !bitmap_bit_p (&bb_flags, e->src->index)
! 		  && !active_insn_between (BB_HEAD (e->src), BB_END (e->src)))
! 		unconverted_simple_returns
! 		  = convert_jumps_to_returns (e->src, true,
! 					      unconverted_simple_returns);
  	    }
! 	}
  
!       if (exit_fallthru_edge != NULL
! 	  && EDGE_COUNT (exit_fallthru_edge->src->preds) != 0
! 	  && !bitmap_bit_p (&bb_flags, exit_fallthru_edge->src->index))
! 	{
! 	  basic_block last_bb;
! 
! 	  last_bb = emit_return_for_exit (exit_fallthru_edge, true);
! 	  returnjump = BB_END (last_bb);
! 	  exit_fallthru_edge = NULL;
! 	}
!     }
  #endif
! #ifdef HAVE_return
!   if (HAVE_return)
!     {
!       if (exit_fallthru_edge == NULL)
! 	goto epilogue_done;
  
!       if (optimize)
! 	{
! 	  basic_block last_bb = exit_fallthru_edge->src;
  
! 	  if (LABEL_P (BB_HEAD (last_bb))
! 	      && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
! 	    convert_jumps_to_returns (last_bb, false, NULL);
  
! 	  if (EDGE_COUNT (exit_fallthru_edge->src->preds) != 0)
  	    {
+ 	      last_bb = emit_return_for_exit (exit_fallthru_edge, false);
+ 	      epilogue_end = returnjump = BB_END (last_bb);
  #ifdef HAVE_simple_return
! 	      /* Emitting the return may add a basic block.
! 		 Fix bb_flags for the added block.  */
! 	      if (last_bb != exit_fallthru_edge->src)
! 		bitmap_set_bit (&bb_flags, last_bb->index);
  #endif
! 	      goto epilogue_done;
  	    }
  	}
      }
  #endif
*************** epilogue_done:
*** 6171,6180 ****
       convert to conditional simple_returns, but couldn't for some
       reason, create a block to hold a simple_return insn and redirect
       those remaining edges.  */
!   if (!VEC_empty (rtx, unconverted_simple_returns))
      {
        basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
-       rtx jump;
        int i;
  
        gcc_assert (entry_edge != orig_entry_edge);
--- 6340,6352 ----
       convert to conditional simple_returns, but couldn't for some
       reason, create a block to hold a simple_return insn and redirect
       those remaining edges.  */
!   if (!VEC_empty (edge, unconverted_simple_returns))
      {
+       basic_block simple_return_block_hot = NULL;
+       basic_block simple_return_block_cold = NULL;
+       edge pending_edge_hot = NULL;
+       edge pending_edge_cold = NULL;
        basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
        int i;
  
        gcc_assert (entry_edge != orig_entry_edge);
*************** epilogue_done:
*** 6184,6208 ****
        if (returnjump != NULL_RTX
  	  && JUMP_LABEL (returnjump) == simple_return_rtx)
  	{
! 	  edge e = split_block (exit_fallthru_edge->src,
! 				PREV_INSN (returnjump));
  	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
  	    simple_return_block_hot = e->dest;
  	  else
  	    simple_return_block_cold = e->dest;
  	}
  
!       FOR_EACH_VEC_ELT (rtx, unconverted_simple_returns, i, jump)
  	{
- 	  basic_block src_bb = BLOCK_FOR_INSN (jump);
- 	  edge e = find_edge (src_bb, last_bb);
  	  basic_block *pdest_bb;
  
! 	  if (BB_PARTITION (src_bb) == BB_HOT_PARTITION)
! 	    pdest_bb = &simple_return_block_hot;
  	  else
! 	    pdest_bb = &simple_return_block_cold;
! 	  if (*pdest_bb == NULL)
  	    {
  	      basic_block bb;
  	      rtx start;
--- 6356,6403 ----
        if (returnjump != NULL_RTX
  	  && JUMP_LABEL (returnjump) == simple_return_rtx)
  	{
! 	  e = split_block (BLOCK_FOR_INSN (returnjump), PREV_INSN (returnjump));
  	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
  	    simple_return_block_hot = e->dest;
  	  else
  	    simple_return_block_cold = e->dest;
  	}
  
!       /* Also check returns we might need to add to tail blocks.  */
!       FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
! 	if (EDGE_COUNT (e->src->preds) != 0
! 	    && (e->flags & EDGE_FAKE) != 0
! 	    && !bitmap_bit_p (&bb_flags, e->src->index))
! 	  {
! 	    if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
! 	      pending_edge_hot = e;
! 	    else
! 	      pending_edge_cold = e;
! 	  }
! 
!       FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
  	{
  	  basic_block *pdest_bb;
+ 	  edge pending;
  
! 	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
! 	    {
! 	      pdest_bb = &simple_return_block_hot;
! 	      pending = pending_edge_hot;
! 	    }
  	  else
! 	    {
! 	      pdest_bb = &simple_return_block_cold;
! 	      pending = pending_edge_cold;
! 	    }
! 
! 	  if (*pdest_bb == NULL && pending != NULL)
! 	    {
! 	      emit_return_into_block (true, pending->src);
! 	      pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
! 	      *pdest_bb = pending->src;
! 	    }
! 	  else if (*pdest_bb == NULL)
  	    {
  	      basic_block bb;
  	      rtx start;
*************** epilogue_done:
*** 6219,6225 ****
  	    }
  	  redirect_edge_and_branch_force (e, *pdest_bb);
  	}
!       VEC_free (rtx, heap, unconverted_simple_returns);
      }
  #endif
  
--- 6414,6432 ----
  	    }
  	  redirect_edge_and_branch_force (e, *pdest_bb);
  	}
!       VEC_free (edge, heap, unconverted_simple_returns);
!     }
! 
!   if (entry_edge != orig_entry_edge)
!     {
!       FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
! 	if (EDGE_COUNT (e->src->preds) != 0
! 	    && (e->flags & EDGE_FAKE) != 0
! 	    && !bitmap_bit_p (&bb_flags, e->src->index))
! 	  {
! 	    emit_return_into_block (true, e->src);
! 	    e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
! 	  }
      }
  #endif
  
*************** epilogue_done:
*** 6233,6240 ****
  
        if (!CALL_P (insn)
  	  || ! SIBLING_CALL_P (insn)
  	  || (entry_edge != orig_entry_edge
! 	      && !bitmap_bit_p (&bb_flags, bb->index)))
  	{
  	  ei_next (&ei);
  	  continue;
--- 6440,6450 ----
  
        if (!CALL_P (insn)
  	  || ! SIBLING_CALL_P (insn)
+ #ifdef HAVE_simple_return
  	  || (entry_edge != orig_entry_edge
! 	      && !bitmap_bit_p (&bb_flags, bb->index))
! #endif
! 	  )
  	{
  	  ei_next (&ei);
  	  continue;
*************** epilogue_done:
*** 6281,6287 ****
--- 6491,6499 ----
      }
  #endif
  
+ #ifdef HAVE_simple_return
    bitmap_clear (&bb_flags);
+ #endif
  
    /* Threading the prologue and epilogue changes the artificial refs
       in the entry and exit blocks.  */

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-10-31 15:14                     ` Alan Modra
@ 2011-11-01 15:34                       ` Alan Modra
  2011-11-07 17:27                         ` Jakub Jelinek
  2011-11-09  9:48                         ` Hans-Peter Nilsson
  0 siblings, 2 replies; 48+ messages in thread
From: Alan Modra @ 2011-11-01 15:34 UTC (permalink / raw)
  To: gcc-patches, Bernd Schmidt

On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
> Bits left to do
> - limit size of duplicated tails

Done here.  Also fixes a hole in that I took no notice of
targetm.cannot_copy_insn_p when duplicating tails.

One interesting result is that the tail duplication actually reduces
the text size of libstdc++.so from 1074042 to 1073478 bytes on
powerpc-linux.  The reason being that a shrink-wrapped function that
needs a prologue only on paths ending in a sibling call will lose one
copy of the epilogue.  That must happen enough to more than make up
for duplicated tails.

Bootstrapped and regression tested powerpc-linux.  OK to apply?
(And I won't be posting any more versions of the patch until this is
reviewed.  Please excuse me for spamming the list.)

	* function.c (bb_active_p): Delete.
	(dup_block_and_redirect, active_insn_between): New functions.
	(convert_jumps_to_returns, emit_return_for_exit): New functions,
	split out from..
	(thread_prologue_and_epilogue_insns): ..here.  Delete
	shadowing variables.  Don't do prologue register clobber tests
	when shrink wrapping already failed.  Delete all last_bb_active
	code.  Instead compute tail block candidates for duplicating
	exit path.  Remove these from antic set.  Duplicate tails when
	reached from both blocks needing a prologue/epilogue and
	blocks not needing such.
	* ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
	HAVE_simple_return.
	* bb-reorder.c (get_uncond_jump_length): Make global.
	* bb-reorder.h (get_uncond_jump_length): Declare.
	* cfgrtl.c (rtl_create_basic_block): Comment typo fix.
	(rtl_split_edge): Likewise.  Warning fix.
	(rtl_duplicate_bb): New function.
	(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 180588)
+++ gcc/function.c	(working copy)
@@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  
 #include "df.h"
 #include "timevar.h"
 #include "vecprim.h"
+#include "params.h"
+#include "bb-reorder.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -5290,8 +5292,6 @@ requires_stack_frame_p (rtx insn, HARD_R
   HARD_REG_SET hardregs;
   unsigned regno;
 
-  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
-    return false;
   if (CALL_P (insn))
     return !SIBLING_CALL_P (insn);
 
@@ -5514,23 +5514,186 @@ set_return_jump_label (rtx returnjump)
     JUMP_LABEL (returnjump) = ret_rtx;
 }
 
-/* Return true if BB has any active insns.  */
+#ifdef HAVE_simple_return
+/* Create a copy of BB instructions and insert at BEFORE.  Redirect
+   preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
+static void
+dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before,
+			bitmap_head *need_prologue)
+{
+  edge_iterator ei;
+  edge e;
+  rtx insn = BB_END (bb);
+
+  /* We know BB has a single successor, so there is no need to copy a
+     simple jump at the end of BB.  */
+  if (simplejump_p (insn))
+    insn = PREV_INSN (insn);
+
+  start_sequence ();
+  duplicate_insn_chain (BB_HEAD (bb), insn);
+  if (dump_file)
+    {
+      unsigned count = 0;
+      for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+	if (active_insn_p (insn))
+	  ++count;
+      fprintf (dump_file, "Duplicating bb %d to bb %d, %u active insns.\n",
+	       bb->index, copy_bb->index, count);
+    }
+  insn = get_insns ();
+  end_sequence ();
+  emit_insn_before (insn, before);
+
+  /* Redirect all the paths that need no prologue into copy_bb.  */
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+    if (!bitmap_bit_p (need_prologue, e->src->index))
+      {
+	redirect_edge_and_branch_force (e, copy_bb);
+	continue;
+      }
+    else
+      ei_next (&ei);
+}
+#endif
+
+#if defined (HAVE_return) || defined (HAVE_simple_return)
+/* Return true if there are any active insns between HEAD and TAIL.  */
 static bool
-bb_active_p (basic_block bb)
+active_insn_between (rtx head, rtx tail)
 {
+  while (tail)
+    {
+      if (active_insn_p (tail))
+	return true;
+      if (tail == head)
+	return false;
+      tail = PREV_INSN (tail);
+    }
+  return false;
+}
+
+/* LAST_BB is a block that exits, and empty of active instructions.
+   Examine its predecessors for jumps that can be converted to
+   (conditional) returns.  */
+static VEC (edge, heap) *
+convert_jumps_to_returns (basic_block last_bb, bool simple_p,
+			  VEC (edge, heap) *unconverted ATTRIBUTE_UNUSED)
+{
+  int i;
+  basic_block bb;
   rtx label;
+  edge_iterator ei;
+  edge e;
+  VEC(basic_block,heap) *src_bbs;
+
+  src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
+  FOR_EACH_EDGE (e, ei, last_bb->preds)
+    if (e->src != ENTRY_BLOCK_PTR)
+      VEC_quick_push (basic_block, src_bbs, e->src);
 
-  /* Test whether there are active instructions in BB.  */
-  label = BB_END (bb);
-  while (label && !LABEL_P (label))
+  label = BB_HEAD (last_bb);
+
+  FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb)
     {
-      if (active_insn_p (label))
-	break;
-      label = PREV_INSN (label);
+      rtx jump = BB_END (bb);
+
+      if (!JUMP_P (jump) || JUMP_LABEL (jump) != label)
+	continue;
+
+      e = find_edge (bb, last_bb);
+
+      /* If we have an unconditional jump, we can replace that
+	 with a simple return instruction.  */
+      if (simplejump_p (jump))
+	{
+	  /* The use of the return register might be present in the exit
+	     fallthru block.  Either:
+	     - removing the use is safe, and we should remove the use in
+	     the exit fallthru block, or
+	     - removing the use is not safe, and we should add it here.
+	     For now, we conservatively choose the latter.  Either of the
+	     2 helps in crossjumping.  */
+	  emit_use_return_register_into_block (bb);
+
+	  emit_return_into_block (simple_p, bb);
+	  delete_insn (jump);
+	}
+
+      /* If we have a conditional jump branching to the last
+	 block, we can try to replace that with a conditional
+	 return instruction.  */
+      else if (condjump_p (jump))
+	{
+	  rtx dest;
+
+	  if (simple_p)
+	    dest = simple_return_rtx;
+	  else
+	    dest = ret_rtx;
+	  if (!redirect_jump (jump, dest, 0))
+	    {
+#ifdef HAVE_simple_return
+	      if (simple_p)
+		{
+		  if (dump_file)
+		    fprintf (dump_file,
+			     "Failed to redirect bb %d branch.\n", bb->index);
+		  VEC_safe_push (edge, heap, unconverted, e);
+		}
+#endif
+	      continue;
+	    }
+
+	  /* See comment in simplejump_p case above.  */
+	  emit_use_return_register_into_block (bb);
+
+	  /* If this block has only one successor, it both jumps
+	     and falls through to the fallthru block, so we can't
+	     delete the edge.  */
+	  if (single_succ_p (bb))
+	    continue;
+	}
+      else
+	{
+#ifdef HAVE_simple_return
+	  if (simple_p)
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Failed to redirect bb %d branch.\n", bb->index);
+	      VEC_safe_push (edge, heap, unconverted, e);
+	    }
+#endif
+	  continue;
+	}
+
+      /* Fix up the CFG for the successful change we just made.  */
+      redirect_edge_succ (e, EXIT_BLOCK_PTR);
     }
-  return BB_HEAD (bb) != label || !LABEL_P (label);
+  VEC_free (basic_block, heap, src_bbs);
+  return unconverted;
 }
 
+/* Emit a return insn for the exit fallthru block.  */
+static basic_block
+emit_return_for_exit (edge exit_fallthru_edge, bool simple_p)
+{
+  basic_block last_bb = exit_fallthru_edge->src;
+
+  if (JUMP_P (BB_END (last_bb)))
+    {
+      last_bb = split_edge (exit_fallthru_edge);
+      exit_fallthru_edge = single_succ_edge (last_bb);
+    }
+  emit_barrier_after (BB_END (last_bb));
+  emit_return_into_block (simple_p, last_bb);
+  exit_fallthru_edge->flags &= ~EDGE_FALLTHRU;
+  return last_bb;
+}
+#endif
+
+
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
    the epilogue begins.  Update the basic block information when possible.
@@ -5583,20 +5746,17 @@ static void
 thread_prologue_and_epilogue_insns (void)
 {
   bool inserted;
-  basic_block last_bb;
-  bool last_bb_active ATTRIBUTE_UNUSED;
 #ifdef HAVE_simple_return
-  VEC (rtx, heap) *unconverted_simple_returns = NULL;
-  basic_block simple_return_block_hot = NULL;
-  basic_block simple_return_block_cold = NULL;
+  VEC (edge, heap) *unconverted_simple_returns = NULL;
   bool nonempty_prologue;
+  bitmap_head bb_flags;
+  unsigned max_grow_size;
 #endif
-  rtx returnjump ATTRIBUTE_UNUSED;
+  rtx returnjump;
   rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED;
   rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED;
   edge e, entry_edge, orig_entry_edge, exit_fallthru_edge;
   edge_iterator ei;
-  bitmap_head bb_flags;
 
   df_analyze ();
 
@@ -5614,18 +5774,6 @@ thread_prologue_and_epilogue_insns (void
   entry_edge = single_succ_edge (ENTRY_BLOCK_PTR);
   orig_entry_edge = entry_edge;
 
-  exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
-  if (exit_fallthru_edge != NULL)
-    {
-      last_bb = exit_fallthru_edge->src;
-      last_bb_active = bb_active_p (last_bb);
-    }
-  else
-    {
-      last_bb = NULL;
-      last_bb_active = false;
-    }
-
   split_prologue_seq = NULL_RTX;
   if (flag_split_stack
       && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
@@ -5675,9 +5823,9 @@ thread_prologue_and_epilogue_insns (void
     }
 #endif
 
+#ifdef HAVE_simple_return
   bitmap_initialize (&bb_flags, &bitmap_default_obstack);
 
-#ifdef HAVE_simple_return
   /* Try to perform a kind of shrink-wrapping, making sure the
      prologue/epilogue is emitted only around those parts of the
      function that require it.  */
@@ -5697,11 +5845,11 @@ thread_prologue_and_epilogue_insns (void
       HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge;
       HARD_REG_SET set_up_by_prologue;
       rtx p_insn;
-
       VEC(basic_block, heap) *vec;
       basic_block bb;
       bitmap_head bb_antic_flags;
       bitmap_head bb_on_list;
+      bitmap_head bb_tail;
 
       if (dump_file)
 	fprintf (dump_file, "Attempting shrink-wrapping optimization.\n");
@@ -5726,14 +5874,12 @@ thread_prologue_and_epilogue_insns (void
 
       prepare_shrink_wrap (entry_edge->dest);
 
-      /* That may have inserted instructions into the last block.  */
-      if (last_bb && !last_bb_active)
-	last_bb_active = bb_active_p (last_bb);
-
       bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
       bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
+      bitmap_initialize (&bb_tail, &bitmap_default_obstack);
 
-      /* Find the set of basic blocks that require a stack frame.  */
+      /* Find the set of basic blocks that require a stack frame,
+	 and blocks that are too big to be duplicated.  */
 
       vec = VEC_alloc (basic_block, heap, n_basic_blocks);
 
@@ -5747,66 +5893,80 @@ thread_prologue_and_epilogue_insns (void
 	add_to_hard_reg_set (&set_up_by_prologue, Pmode,
 			     PIC_OFFSET_TABLE_REGNUM);
 
+      /* We don't use a different max size depending on
+	 optimize_bb_for_speed_p because increasing shrink-wrapping
+	 opportunities by duplicating tail blocks can actually result
+	 in an overall decrease in code size.  */
+      max_grow_size = get_uncond_jump_length ();
+      max_grow_size *= PARAM_VALUE (PARAM_MAX_GROW_COPY_BB_INSNS);
+
       FOR_EACH_BB (bb)
 	{
 	  rtx insn;
-	  /* As a special case, check for jumps to the last bb that
-	     cannot successfully be converted to simple_returns later
-	     on, and mark them as requiring a frame.  These are
-	     conditional jumps that jump to their fallthru block, so
-	     it's not a case that is expected to occur often.  */
-	  if (JUMP_P (BB_END (bb)) && any_condjump_p (BB_END (bb))
-	      && single_succ_p (bb)
-	      && !last_bb_active
-	      && single_succ (bb) == last_bb)
-	    {
-	      bitmap_set_bit (&bb_flags, bb->index);
-	      VEC_quick_push (basic_block, vec, bb);
-	    }
-	  else
-	    FOR_BB_INSNS (bb, insn)
-	      if (requires_stack_frame_p (insn, prologue_used,
-					  set_up_by_prologue))
-		{
-		  bitmap_set_bit (&bb_flags, bb->index);
-		  VEC_quick_push (basic_block, vec, bb);
-		  break;
-		}
+	  unsigned size = 0;
+
+	  FOR_BB_INSNS (bb, insn)
+	    if (NONDEBUG_INSN_P (insn))
+	      {
+		if (requires_stack_frame_p (insn, prologue_used,
+					    set_up_by_prologue))
+		  {
+		    if (bb == entry_edge->dest)
+		      goto fail_shrinkwrap;
+		    bitmap_set_bit (&bb_flags, bb->index);
+		    VEC_quick_push (basic_block, vec, bb);
+		    break;
+		  }
+		else if (size <= max_grow_size)
+		  {
+		    size += get_attr_min_length (insn);
+		    if (size > max_grow_size)
+		      bitmap_set_bit (&bb_on_list, bb->index);
+		  }
+	      }
 	}
 
+      /* Blocks that really need a prologue, or are too big for tails.  */
+      bitmap_ior_into (&bb_on_list, &bb_flags);
+
       /* For every basic block that needs a prologue, mark all blocks
 	 reachable from it, so as to ensure they are also seen as
 	 requiring a prologue.  */
       while (!VEC_empty (basic_block, vec))
 	{
 	  basic_block tmp_bb = VEC_pop (basic_block, vec);
-	  edge e;
-	  edge_iterator ei;
+
 	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
 	    if (e->dest != EXIT_BLOCK_PTR
 		&& bitmap_set_bit (&bb_flags, e->dest->index))
 	      VEC_quick_push (basic_block, vec, e->dest);
 	}
-      /* If the last basic block contains only a label, we'll be able
-	 to convert jumps to it to (potentially conditional) return
-	 insns later.  This means we don't necessarily need a prologue
-	 for paths reaching it.  */
-      if (last_bb && optimize)
-	{
-	  if (!last_bb_active)
-	    bitmap_clear_bit (&bb_flags, last_bb->index);
-	  else if (!bitmap_bit_p (&bb_flags, last_bb->index))
-	    goto fail_shrinkwrap;
+
+      /* Find the set of basic blocks that need no prologue, have a
+	 single successor, can be duplicated, meet a max size
+	 requirement, and go to the exit via like blocks.  */
+      VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR);
+      while (!VEC_empty (basic_block, vec))
+	{
+	  basic_block tmp_bb = VEC_pop (basic_block, vec);
+
+	  FOR_EACH_EDGE (e, ei, tmp_bb->preds)
+	    if (single_succ_p (e->src)
+		&& !bitmap_bit_p (&bb_on_list, e->src->index)
+		&& can_duplicate_block_p (e->src)
+		&& bitmap_set_bit (&bb_tail, e->src->index))
+	      VEC_quick_push (basic_block, vec, e->src);
 	}
 
       /* Now walk backwards from every block that is marked as needing
-	 a prologue to compute the bb_antic_flags bitmap.  */
-      bitmap_copy (&bb_antic_flags, &bb_flags);
+	 a prologue to compute the bb_antic_flags bitmap.  Exclude
+	 tail blocks; They can be duplicated to be used on paths not
+	 needing a prologue.  */
+      bitmap_clear (&bb_on_list);
+      bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail);
       FOR_EACH_BB (bb)
 	{
-	  edge e;
-	  edge_iterator ei;
-	  if (!bitmap_bit_p (&bb_flags, bb->index))
+	  if (!bitmap_bit_p (&bb_antic_flags, bb->index))
 	    continue;
 	  FOR_EACH_EDGE (e, ei, bb->preds)
 	    if (!bitmap_bit_p (&bb_antic_flags, e->src->index)
@@ -5816,8 +5976,6 @@ thread_prologue_and_epilogue_insns (void
       while (!VEC_empty (basic_block, vec))
 	{
 	  basic_block tmp_bb = VEC_pop (basic_block, vec);
-	  edge e;
-	  edge_iterator ei;
 	  bool all_set = true;
 
 	  bitmap_clear_bit (&bb_on_list, tmp_bb->index);
@@ -5862,28 +6020,134 @@ thread_prologue_and_epilogue_insns (void
 		}
 	  }
 
-      /* Test whether the prologue is known to clobber any register
-	 (other than FP or SP) which are live on the edge.  */
-      CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
-      if (frame_pointer_needed)
-	CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
-      CLEAR_HARD_REG_SET (live_on_edge);
-      reg_set_to_hard_reg_set (&live_on_edge,
-			       df_get_live_in (entry_edge->dest));
-      if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
+      if (entry_edge != orig_entry_edge)
 	{
-	  entry_edge = orig_entry_edge;
-	  if (dump_file)
-	    fprintf (dump_file, "Shrink-wrapping aborted due to clobber.\n");
+	  /* Test whether the prologue is known to clobber any register
+	     (other than FP or SP) which are live on the edge.  */
+	  CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
+	  if (frame_pointer_needed)
+	    CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
+	  CLEAR_HARD_REG_SET (live_on_edge);
+	  reg_set_to_hard_reg_set (&live_on_edge,
+				   df_get_live_in (entry_edge->dest));
+	  if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
+	    {
+	      entry_edge = orig_entry_edge;
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Shrink-wrapping aborted due to clobber.\n");
+	    }
 	}
-      else if (entry_edge != orig_entry_edge)
+      if (entry_edge != orig_entry_edge)
 	{
 	  crtl->shrink_wrapped = true;
 	  if (dump_file)
 	    fprintf (dump_file, "Performing shrink-wrapping.\n");
+
+	  /* Find tail blocks reachable from both blocks needing a
+	     prologue and blocks not needing a prologue.  */
+	  if (!bitmap_empty_p (&bb_tail))
+	    FOR_EACH_BB (bb)
+	      {
+		bool some_pro, some_no_pro;
+		if (!bitmap_bit_p (&bb_tail, bb->index))
+		  continue;
+		some_pro = some_no_pro = false;
+		FOR_EACH_EDGE (e, ei, bb->preds)
+		  {
+		    if (bitmap_bit_p (&bb_flags, e->src->index))
+		      some_pro = true;
+		    else
+		      some_no_pro = true;
+		  }
+		if (some_pro && some_no_pro)
+		  VEC_quick_push (basic_block, vec, bb);
+		else
+		  bitmap_clear_bit (&bb_tail, bb->index);
+	      }
+	  /* Find the head of each tail.  */
+	  while (!VEC_empty (basic_block, vec))
+	    {
+	      basic_block tbb = VEC_pop (basic_block, vec);
+
+	      if (!bitmap_bit_p (&bb_tail, tbb->index))
+		continue;
+
+	      while (single_succ_p (tbb))
+		{
+		  tbb = single_succ (tbb);
+		  bitmap_clear_bit (&bb_tail, tbb->index);
+		}
+	    }
+	  /* Now duplicate the tails.  */
+	  if (!bitmap_empty_p (&bb_tail))
+	    FOR_EACH_BB_REVERSE (bb)
+	      {
+		basic_block copy_bb, tbb;
+		rtx insert_point;
+		int eflags;
+
+		if (!bitmap_clear_bit (&bb_tail, bb->index))
+		  continue;
+
+		/* Create a copy of BB, instructions and all, for
+		   use on paths that don't need a prologue.
+		   Ideal placement of the copy is on a fall-thru edge
+		   or after a block that would jump to the copy.  */ 
+		FOR_EACH_EDGE (e, ei, bb->preds)
+		  if (!bitmap_bit_p (&bb_flags, e->src->index)
+		      && single_succ_p (e->src))
+		    break;
+		if (e)
+		  {
+		    copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
+						  NULL_RTX, e->src);
+		    BB_COPY_PARTITION (copy_bb, e->src);
+		  }
+		else
+		  {
+		    /* Otherwise put the copy at the end of the function.  */
+		    copy_bb = create_basic_block (NULL_RTX, NULL_RTX,
+						  EXIT_BLOCK_PTR->prev_bb);
+		    BB_COPY_PARTITION (copy_bb, bb);
+		  }
+
+		insert_point = emit_note_after (NOTE_INSN_DELETED,
+						BB_END (copy_bb));
+		emit_barrier_after (BB_END (copy_bb));
+
+		tbb = bb;
+		while (1)
+		  {
+		    dup_block_and_redirect (tbb, copy_bb, insert_point,
+					    &bb_flags);
+		    tbb = single_succ (tbb);
+		    if (tbb == EXIT_BLOCK_PTR)
+		      break;
+		    e = split_block (copy_bb, PREV_INSN (insert_point));
+		    copy_bb = e->dest;
+		  }
+
+		/* Quiet verify_flow_info by (ab)using EDGE_FAKE.
+		   We have yet to add a simple_return to the tails,
+		   as we'd like to first convert_jumps_to_returns in
+		   case the block is no longer used after that.  */
+		eflags = EDGE_FAKE;
+		if (CALL_P (PREV_INSN (insert_point))
+		    && SIBLING_CALL_P (PREV_INSN (insert_point)))
+		  eflags = EDGE_SIBCALL | EDGE_ABNORMAL;
+		make_single_succ_edge (copy_bb, EXIT_BLOCK_PTR, eflags);
+
+		/* verify_flow_info doesn't like a note after a
+		   sibling call.  */
+		delete_insn (insert_point);
+		if (bitmap_empty_p (&bb_tail))
+		  break;
+	      }
 	}
 
     fail_shrinkwrap:
+      bitmap_clear (&bb_tail);
       bitmap_clear (&bb_antic_flags);
       bitmap_clear (&bb_on_list);
       VEC_free (basic_block, heap, vec);
@@ -5911,147 +6175,73 @@ thread_prologue_and_epilogue_insns (void
 
   rtl_profile_for_bb (EXIT_BLOCK_PTR);
 
-#ifdef HAVE_return
+  exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
+
   /* If we're allowed to generate a simple return instruction, then by
      definition we don't need a full epilogue.  If the last basic
      block before the exit block does not contain active instructions,
      examine its predecessors and try to emit (conditional) return
      instructions.  */
-  if (optimize && !last_bb_active
-      && (HAVE_return || entry_edge != orig_entry_edge))
+#ifdef HAVE_simple_return
+  if (entry_edge != orig_entry_edge)
     {
-      edge_iterator ei2;
-      int i;
-      basic_block bb;
-      rtx label;
-      VEC(basic_block,heap) *src_bbs;
-
-      if (exit_fallthru_edge == NULL)
-	goto epilogue_done;
-      label = BB_HEAD (last_bb);
-
-      src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
-      FOR_EACH_EDGE (e, ei2, last_bb->preds)
-	if (e->src != ENTRY_BLOCK_PTR)
-	  VEC_quick_push (basic_block, src_bbs, e->src);
-
-      FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb)
+      if (optimize)
 	{
-	  bool simple_p;
-	  rtx jump;
-	  e = find_edge (bb, last_bb);
-
-	  jump = BB_END (bb);
+	  unsigned i, last;
 
-#ifdef HAVE_simple_return
-	  simple_p = (entry_edge != orig_entry_edge
-		      && !bitmap_bit_p (&bb_flags, bb->index));
-#else
-	  simple_p = false;
-#endif
+	  /* convert_jumps_to_returns may add to EXIT_BLOCK_PTR->preds
+	     (but won't remove).  Stop at end of current preds.  */
+	  last = EDGE_COUNT (EXIT_BLOCK_PTR->preds);
+	  for (i = 0; i < last; i++)
+	    {
+	      e = EDGE_I (EXIT_BLOCK_PTR->preds, i);
+	      if (LABEL_P (BB_HEAD (e->src))
+		  && !bitmap_bit_p (&bb_flags, e->src->index)
+		  && !active_insn_between (BB_HEAD (e->src), BB_END (e->src)))
+		unconverted_simple_returns
+		  = convert_jumps_to_returns (e->src, true,
+					      unconverted_simple_returns);
+	    }
+	}
 
-	  if (!simple_p
-	      && (!HAVE_return || !JUMP_P (jump)
-		  || JUMP_LABEL (jump) != label))
-	    continue;
+      if (exit_fallthru_edge != NULL
+	  && EDGE_COUNT (exit_fallthru_edge->src->preds) != 0
+	  && !bitmap_bit_p (&bb_flags, exit_fallthru_edge->src->index))
+	{
+	  basic_block last_bb;
 
-	  /* If we have an unconditional jump, we can replace that
-	     with a simple return instruction.  */
-	  if (!JUMP_P (jump))
-	    {
-	      emit_barrier_after (BB_END (bb));
-	      emit_return_into_block (simple_p, bb);
-	    }
-	  else if (simplejump_p (jump))
-	    {
-	      /* The use of the return register might be present in the exit
-	         fallthru block.  Either:
-	         - removing the use is safe, and we should remove the use in
-	           the exit fallthru block, or
-	         - removing the use is not safe, and we should add it here.
-	         For now, we conservatively choose the latter.  Either of the
-	         2 helps in crossjumping.  */
-	      emit_use_return_register_into_block (bb);
-
-	      emit_return_into_block (simple_p, bb);
-	      delete_insn (jump);
-	    }
-	  else if (condjump_p (jump) && JUMP_LABEL (jump) != label)
-	    {
-	      basic_block new_bb;
-	      edge new_e;
-
-	      gcc_assert (simple_p);
-	      new_bb = split_edge (e);
-	      emit_barrier_after (BB_END (new_bb));
-	      emit_return_into_block (simple_p, new_bb);
-#ifdef HAVE_simple_return
-	      if (BB_PARTITION (new_bb) == BB_HOT_PARTITION)
-		simple_return_block_hot = new_bb;
-	      else
-		simple_return_block_cold = new_bb;
+	  last_bb = emit_return_for_exit (exit_fallthru_edge, true);
+	  returnjump = BB_END (last_bb);
+	  exit_fallthru_edge = NULL;
+	}
+    }
 #endif
-	      new_e = single_succ_edge (new_bb);
-	      redirect_edge_succ (new_e, EXIT_BLOCK_PTR);
+#ifdef HAVE_return
+  if (HAVE_return)
+    {
+      if (exit_fallthru_edge == NULL)
+	goto epilogue_done;
 
-	      continue;
-	    }
-	  /* If we have a conditional jump branching to the last
-	     block, we can try to replace that with a conditional
-	     return instruction.  */
-	  else if (condjump_p (jump))
-	    {
-	      rtx dest;
-	      if (simple_p)
-		dest = simple_return_rtx;
-	      else
-		dest = ret_rtx;
-	      if (! redirect_jump (jump, dest, 0))
-		{
-#ifdef HAVE_simple_return
-		  if (simple_p)
-		    VEC_safe_push (rtx, heap,
-				   unconverted_simple_returns, jump);
-#endif
-		  continue;
-		}
+      if (optimize)
+	{
+	  basic_block last_bb = exit_fallthru_edge->src;
 
-	      /* See comment in simple_jump_p case above.  */
-	      emit_use_return_register_into_block (bb);
+	  if (LABEL_P (BB_HEAD (last_bb))
+	      && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
+	    convert_jumps_to_returns (last_bb, false, NULL);
 
-	      /* If this block has only one successor, it both jumps
-		 and falls through to the fallthru block, so we can't
-		 delete the edge.  */
-	      if (single_succ_p (bb))
-		continue;
-	    }
-	  else
+	  if (EDGE_COUNT (exit_fallthru_edge->src->preds) != 0)
 	    {
+	      last_bb = emit_return_for_exit (exit_fallthru_edge, false);
+	      epilogue_end = returnjump = BB_END (last_bb);
 #ifdef HAVE_simple_return
-	      if (simple_p)
-		VEC_safe_push (rtx, heap,
-			       unconverted_simple_returns, jump);
+	      /* Emitting the return may add a basic block.
+		 Fix bb_flags for the added block.  */
+	      if (last_bb != exit_fallthru_edge->src)
+		bitmap_set_bit (&bb_flags, last_bb->index);
 #endif
-	      continue;
+	      goto epilogue_done;
 	    }
-
-	  /* Fix up the CFG for the successful change we just made.  */
-	  redirect_edge_succ (e, EXIT_BLOCK_PTR);
-	}
-      VEC_free (basic_block, heap, src_bbs);
-
-      if (HAVE_return)
-	{
-	  /* Emit a return insn for the exit fallthru block.  Whether
-	     this is still reachable will be determined later.  */
-
-	  emit_barrier_after (BB_END (last_bb));
-	  emit_return_into_block (false, last_bb);
-	  epilogue_end = BB_END (last_bb);
-	  if (JUMP_P (epilogue_end))
-	    set_return_jump_label (epilogue_end);
-	  single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU;
-	  goto epilogue_done;
 	}
     }
 #endif
@@ -6171,10 +6361,13 @@ epilogue_done:
      convert to conditional simple_returns, but couldn't for some
      reason, create a block to hold a simple_return insn and redirect
      those remaining edges.  */
-  if (!VEC_empty (rtx, unconverted_simple_returns))
+  if (!VEC_empty (edge, unconverted_simple_returns))
     {
+      basic_block simple_return_block_hot = NULL;
+      basic_block simple_return_block_cold = NULL;
+      edge pending_edge_hot = NULL;
+      edge pending_edge_cold = NULL;
       basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
-      rtx jump;
       int i;
 
       gcc_assert (entry_edge != orig_entry_edge);
@@ -6184,25 +6377,48 @@ epilogue_done:
       if (returnjump != NULL_RTX
 	  && JUMP_LABEL (returnjump) == simple_return_rtx)
 	{
-	  edge e = split_block (exit_fallthru_edge->src,
-				PREV_INSN (returnjump));
+	  e = split_block (BLOCK_FOR_INSN (returnjump), PREV_INSN (returnjump));
 	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
 	    simple_return_block_hot = e->dest;
 	  else
 	    simple_return_block_cold = e->dest;
 	}
 
-      FOR_EACH_VEC_ELT (rtx, unconverted_simple_returns, i, jump)
+      /* Also check returns we might need to add to tail blocks.  */
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+	if (EDGE_COUNT (e->src->preds) != 0
+	    && (e->flags & EDGE_FAKE) != 0
+	    && !bitmap_bit_p (&bb_flags, e->src->index))
+	  {
+	    if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
+	      pending_edge_hot = e;
+	    else
+	      pending_edge_cold = e;
+	  }
+
+      FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
 	{
-	  basic_block src_bb = BLOCK_FOR_INSN (jump);
-	  edge e = find_edge (src_bb, last_bb);
 	  basic_block *pdest_bb;
+	  edge pending;
 
-	  if (BB_PARTITION (src_bb) == BB_HOT_PARTITION)
-	    pdest_bb = &simple_return_block_hot;
+	  if (BB_PARTITION (e->src) == BB_HOT_PARTITION)
+	    {
+	      pdest_bb = &simple_return_block_hot;
+	      pending = pending_edge_hot;
+	    }
 	  else
-	    pdest_bb = &simple_return_block_cold;
-	  if (*pdest_bb == NULL)
+	    {
+	      pdest_bb = &simple_return_block_cold;
+	      pending = pending_edge_cold;
+	    }
+
+	  if (*pdest_bb == NULL && pending != NULL)
+	    {
+	      emit_return_into_block (true, pending->src);
+	      pending->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
+	      *pdest_bb = pending->src;
+	    }
+	  else if (*pdest_bb == NULL)
 	    {
 	      basic_block bb;
 	      rtx start;
@@ -6219,7 +6435,19 @@ epilogue_done:
 	    }
 	  redirect_edge_and_branch_force (e, *pdest_bb);
 	}
-      VEC_free (rtx, heap, unconverted_simple_returns);
+      VEC_free (edge, heap, unconverted_simple_returns);
+    }
+
+  if (entry_edge != orig_entry_edge)
+    {
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
+	if (EDGE_COUNT (e->src->preds) != 0
+	    && (e->flags & EDGE_FAKE) != 0
+	    && !bitmap_bit_p (&bb_flags, e->src->index))
+	  {
+	    emit_return_into_block (true, e->src);
+	    e->flags &= ~(EDGE_FALLTHRU | EDGE_FAKE);
+	  }
     }
 #endif
 
@@ -6233,8 +6461,11 @@ epilogue_done:
 
       if (!CALL_P (insn)
 	  || ! SIBLING_CALL_P (insn)
+#ifdef HAVE_simple_return
 	  || (entry_edge != orig_entry_edge
-	      && !bitmap_bit_p (&bb_flags, bb->index)))
+	      && !bitmap_bit_p (&bb_flags, bb->index))
+#endif
+	  )
 	{
 	  ei_next (&ei);
 	  continue;
@@ -6281,7 +6512,9 @@ epilogue_done:
     }
 #endif
 
+#ifdef HAVE_simple_return
   bitmap_clear (&bb_flags);
+#endif
 
   /* Threading the prologue and epilogue changes the artificial refs
      in the entry and exit blocks.  */
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 180588)
+++ gcc/ifcvt.c	(working copy)
@@ -4167,13 +4167,14 @@ dead_or_predicable (basic_block test_bb,
 	if (NONDEBUG_INSN_P (insn))
 	  df_simulate_find_defs (insn, merge_set);
 
+#ifdef HAVE_simple_return
       /* If shrink-wrapping, disable this optimization when test_bb is
 	 the first basic block and merge_bb exits.  The idea is to not
 	 move code setting up a return register as that may clobber a
 	 register used to pass function parameters, which then must be
 	 saved in caller-saved regs.  A caller-saved reg requires the
 	 prologue, killing a shrink-wrap opportunity.  */
-      if ((flag_shrink_wrap && !epilogue_completed)
+      if ((flag_shrink_wrap && HAVE_simple_return && !epilogue_completed)
 	  && ENTRY_BLOCK_PTR->next_bb == test_bb
 	  && single_succ_p (new_dest)
 	  && single_succ (new_dest) == EXIT_BLOCK_PTR
@@ -4224,6 +4225,7 @@ dead_or_predicable (basic_block test_bb,
 	    }
 	  BITMAP_FREE (return_regs);
 	}
+#endif
     }
 
  no_body:
Index: gcc/bb-reorder.c
===================================================================
--- gcc/bb-reorder.c	(revision 180588)
+++ gcc/bb-reorder.c	(working copy)
@@ -181,7 +181,6 @@ static fibheapkey_t bb_to_key (basic_blo
 static bool better_edge_p (const_basic_block, const_edge, int, int, int, int, const_edge);
 static void connect_traces (int, struct trace *);
 static bool copy_bb_p (const_basic_block, int);
-static int get_uncond_jump_length (void);
 static bool push_to_next_round_p (const_basic_block, int, int, int, gcov_type);
 \f
 /* Check to see if bb should be pushed into the next round of trace
@@ -1193,7 +1192,7 @@ copy_bb_p (const_basic_block bb, int cod
 
 /* Return the length of unconditional jump instruction.  */
 
-static int
+int
 get_uncond_jump_length (void)
 {
   rtx label, jump;
Index: gcc/bb-reorder.h
===================================================================
--- gcc/bb-reorder.h	(revision 180588)
+++ gcc/bb-reorder.h	(working copy)
@@ -34,4 +34,6 @@ extern struct target_bb_reorder *this_ta
 #define this_target_bb_reorder (&default_target_bb_reorder)
 #endif
 
+extern int get_uncond_jump_length (void);
+
 #endif
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	(revision 180588)
+++ gcc/cfgrtl.c	(working copy)
@@ -322,9 +322,9 @@ create_basic_block_structure (rtx head, 
 }
 
 /* Create new basic block consisting of instructions in between HEAD and END
-   and place it to the BB chain after block AFTER.  END can be NULL in to
-   create new empty basic block before HEAD.  Both END and HEAD can be NULL to
-   create basic block at the end of INSN chain.  */
+   and place it to the BB chain after block AFTER.  END can be NULL to
+   create a new empty basic block before HEAD.  Both END and HEAD can be
+   NULL to create basic block at the end of INSN chain.  */
 
 static basic_block
 rtl_create_basic_block (void *headp, void *endp, basic_block after)
@@ -1411,8 +1411,8 @@ rtl_split_edge (edge edge_in)
     before = NULL_RTX;
 
   /* If this is a fall through edge to the exit block, the blocks might be
-     not adjacent, and the right place is the after the source.  */
-  if (edge_in->flags & EDGE_FALLTHRU && edge_in->dest == EXIT_BLOCK_PTR)
+     not adjacent, and the right place is after the source.  */
+  if ((edge_in->flags & EDGE_FALLTHRU) && edge_in->dest == EXIT_BLOCK_PTR)
     {
       before = NEXT_INSN (BB_END (edge_in->src));
       bb = create_basic_block (before, NULL, edge_in->src);
@@ -3175,6 +3175,21 @@ rtl_can_remove_branch_p (const_edge e)
   return true;
 }
 
+/* We do not want to declare these functions in a header file, since they
+   should only be used through the cfghooks interface, and we do not want to
+   move them here since it would require also moving quite a lot of related
+   code.  They are in cfglayout.c.  */
+extern bool cfg_layout_can_duplicate_bb_p (const_basic_block);
+extern basic_block cfg_layout_duplicate_bb (basic_block);
+
+static basic_block
+rtl_duplicate_bb (basic_block bb)
+{
+  bb = cfg_layout_duplicate_bb (bb);
+  bb->aux = NULL;
+  return bb;
+}
+
 /* Implementation of CFG manipulation for linearized RTL.  */
 struct cfg_hooks rtl_cfg_hooks = {
   "rtl",
@@ -3191,8 +3206,8 @@ struct cfg_hooks rtl_cfg_hooks = {
   rtl_merge_blocks,
   rtl_predict_edge,
   rtl_predicted_by_p,
-  NULL, /* can_duplicate_block_p */
-  NULL, /* duplicate_block */
+  cfg_layout_can_duplicate_bb_p,
+  rtl_duplicate_bb,
   rtl_split_edge,
   rtl_make_forwarder_block,
   rtl_tidy_fallthru_edge,
@@ -3214,13 +3229,6 @@ struct cfg_hooks rtl_cfg_hooks = {
    This representation will hopefully become the default one in future
    version of the compiler.  */
 
-/* We do not want to declare these functions in a header file, since they
-   should only be used through the cfghooks interface, and we do not want to
-   move them here since it would require also moving quite a lot of related
-   code.  They are in cfglayout.c.  */
-extern bool cfg_layout_can_duplicate_bb_p (const_basic_block);
-extern basic_block cfg_layout_duplicate_bb (basic_block);
-
 struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
   "cfglayout mode",
   rtl_verify_flow_info_1,

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-11-01 15:34                       ` Alan Modra
@ 2011-11-07 17:27                         ` Jakub Jelinek
  2011-11-09  9:48                         ` Hans-Peter Nilsson
  1 sibling, 0 replies; 48+ messages in thread
From: Jakub Jelinek @ 2011-11-07 17:27 UTC (permalink / raw)
  To: Alan Modra, Bernd Schmidt, Richard Henderson; +Cc: gcc-patches

On Wed, Nov 02, 2011 at 02:03:40AM +1030, Alan Modra wrote:
> Bootstrapped and regression tested powerpc-linux.  OK to apply?
> (And I won't be posting any more versions of the patch until this is
> reviewed.  Please excuse me for spamming the list.)

Looks reasonable to me, appart from

> 	* function.c (bb_active_p): Delete.
> 	(dup_block_and_redirect, active_insn_between): New functions.
> 	(convert_jumps_to_returns, emit_return_for_exit): New functions,
> 	split out from..
> 	(thread_prologue_and_epilogue_insns): ..here.  Delete
> 	shadowing variables.  Don't do prologue register clobber tests
> 	when shrink wrapping already failed.  Delete all last_bb_active
> 	code.  Instead compute tail block candidates for duplicating
> 	exit path.  Remove these from antic set.  Duplicate tails when
> 	reached from both blocks needing a prologue/epilogue and
> 	blocks not needing such.
> 	* ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
> 	HAVE_simple_return.
> 	* bb-reorder.c (get_uncond_jump_length): Make global.
> 	* bb-reorder.h (get_uncond_jump_length): Declare.
> 	* cfgrtl.c (rtl_create_basic_block): Comment typo fix.
> 	(rtl_split_edge): Likewise.  Warning fix.
> 	(rtl_duplicate_bb): New function.
> 	(rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
> 
> Index: gcc/function.c
> ===================================================================
> --- gcc/function.c	(revision 180588)
> +++ gcc/function.c	(working copy)
> @@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  
>  #include "df.h"
>  #include "timevar.h"
>  #include "vecprim.h"
> +#include "params.h"
> +#include "bb-reorder.h"

This needs corresponding change in Makefile.in, function.o
needs $(PARAMS_H) bb-reorder.h dependencies added.

Ok for trunk with that change if no maintainer objects within 24 hours.

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: PowerPC shrink-wrap support 3 of 3
  2011-11-01 15:34                       ` Alan Modra
  2011-11-07 17:27                         ` Jakub Jelinek
@ 2011-11-09  9:48                         ` Hans-Peter Nilsson
  2011-11-10 11:25                           ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson
  1 sibling, 1 reply; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-09  9:48 UTC (permalink / raw)
  To: amodra; +Cc: gcc-patches, bernds

> From: Alan Modra <amodra@gmail.com>
> Date: Tue, 1 Nov 2011 16:33:40 +0100

> On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:

>         * function.c (bb_active_p): Delete.
>         (dup_block_and_redirect, active_insn_between): New functions.
>         (convert_jumps_to_returns, emit_return_for_exit): New functions,
>         split out from..
>         (thread_prologue_and_epilogue_insns): ..here.  Delete
>         shadowing variables.  Don't do prologue register clobber tests
>         when shrink wrapping already failed.  Delete all last_bb_active
>         code.  Instead compute tail block candidates for duplicating
>         exit path.  Remove these from antic set.  Duplicate tails when
>         reached from both blocks needing a prologue/epilogue and
>         blocks not needing such.
>         * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
>         HAVE_simple_return.
>         * bb-reorder.c (get_uncond_jump_length): Make global.
>         * bb-reorder.h (get_uncond_jump_length): Declare.
>         * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
>         (rtl_split_edge): Likewise.  Warning fix.
>         (rtl_duplicate_bb): New function.
>         (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.

This (a revision in the range 181187:181189) broke build for
cris-elf like so:

libtool: compile:  /tmp/hpautotest-gcc1/cris-elf/gccobj/./gcc/xgcc -shared-libgcc -B/tmp/hpautotest-gcc1/cris-elf/gccobj/./gcc -nostdinc++ -L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/src -L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/src/.libs -nostdinc -B/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/newlib/ -isystem /tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/newlib/targ-include -isystem /tmp/hpautotest-gcc1/gcc/newlib/libc/include -B/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libgloss/cris -L/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libgloss/libnosys -L/tmp/hpautotest-gcc1/gcc/libgloss/cris -B/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/bin/ -B/tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/lib/ -isystem /tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/include -isystem /tmp/hpautotest-gcc1/cris-elf/pre/cris-elf/sys-include -I/tmp/hpautotest-gcc1/gcc/libstdc++-v3/../libgcc -I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/include/cris-elf
  -I/tmp/hpautotest-gcc1/cris-elf/gccobj/cris-elf/libstdc++-v3/include -I/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++ -fno-implicit-templates -Wall -Wextra -Wwrite-strings -Wcast-qual -fdiagnostics-show-location=once -ffunction-sections -fdata-sections -frandom-seed=new_opv.lo -g -O2 -c /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc -o new_opv.o
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc: In function 'void* operator new [](std::size_t)':
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: missing REG_EH_REGION note in the end of bb 2
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: call edges for non-call insn in bb 2
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: in basic block 2:
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: error: flow control insn inside a basic block
(call_insn 8 6 50 2 (parallel [
            (set (reg:SI 10 r10)
                (call (mem:QI (symbol_ref:SI ("_Znwm") [flags 0x41] <function_decl 0x7f94a90bf300 operator new>) [0 operator new S1 A8])
                    (const_int 0 [0])))
            (clobber (reg:SI 16 srp))
        ]) /tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:33 222 {*expanded_call_value_non_v32}
     (expr_list:REG_EH_REGION (const_int 1 [0x1])
        (nil))
    (expr_list:REG_UNUSED (use (reg:SI 10 r10))
        (nil)))
/tmp/hpautotest-gcc1/gcc/libstdc++-v3/libsupc++/new_opv.cc:34:1: internal compiler error: in rtl_verify_flow_info_1, at cfgrtl.c:2001

See PR51051.

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-09  9:48                         ` Hans-Peter Nilsson
@ 2011-11-10 11:25                           ` Hans-Peter Nilsson
  2011-11-10 12:10                             ` Richard Guenther
  2015-04-08 11:11                             ` Gerald Pfeifer
  0 siblings, 2 replies; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-10 11:25 UTC (permalink / raw)
  To: amodra; +Cc: gcc-patches, bernds

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Wed, 9 Nov 2011 09:55:59 +0100

> > From: Alan Modra <amodra@gmail.com>
> > Date: Tue, 1 Nov 2011 16:33:40 +0100
> 
> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
> 
> >         * function.c (bb_active_p): Delete.
> >         (dup_block_and_redirect, active_insn_between): New functions.
> >         (convert_jumps_to_returns, emit_return_for_exit): New functions,
> >         split out from..
> >         (thread_prologue_and_epilogue_insns): ..here.  Delete
> >         shadowing variables.  Don't do prologue register clobber tests
> >         when shrink wrapping already failed.  Delete all last_bb_active
> >         code.  Instead compute tail block candidates for duplicating
> >         exit path.  Remove these from antic set.  Duplicate tails when
> >         reached from both blocks needing a prologue/epilogue and
> >         blocks not needing such.
> >         * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
> >         HAVE_simple_return.
> >         * bb-reorder.c (get_uncond_jump_length): Make global.
> >         * bb-reorder.h (get_uncond_jump_length): Declare.
> >         * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
> >         (rtl_split_edge): Likewise.  Warning fix.
> >         (rtl_duplicate_bb): New function.
> >         (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
> 
> This (a revision in the range 181187:181189) broke build for
> cris-elf like so:
> See PR51051.

Given that this also broke arm-linux-gnueabi, a primary
platform, and Alan being absent until the 15th according to a
message on IRC, I move to revert r181188.

I think I need someone with appropriate write privileges to
agree with that, and to also give 48h for someone to fix the
problem.  Sorry for not forthcoming on the second point.

brgds, H-P
PS. where is the policy written down, besides the mailing list archives?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 11:25                           ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson
@ 2011-11-10 12:10                             ` Richard Guenther
  2011-11-10 13:29                               ` Hans-Peter Nilsson
  2015-04-08 11:11                             ` Gerald Pfeifer
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Guenther @ 2011-11-10 12:10 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: amodra, gcc-patches, bernds

On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson
<hans-peter.nilsson@axis.com> wrote:
>> From: Hans-Peter Nilsson <hp@axis.com>
>> Date: Wed, 9 Nov 2011 09:55:59 +0100
>
>> > From: Alan Modra <amodra@gmail.com>
>> > Date: Tue, 1 Nov 2011 16:33:40 +0100
>>
>> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
>>
>> >         * function.c (bb_active_p): Delete.
>> >         (dup_block_and_redirect, active_insn_between): New functions.
>> >         (convert_jumps_to_returns, emit_return_for_exit): New functions,
>> >         split out from..
>> >         (thread_prologue_and_epilogue_insns): ..here.  Delete
>> >         shadowing variables.  Don't do prologue register clobber tests
>> >         when shrink wrapping already failed.  Delete all last_bb_active
>> >         code.  Instead compute tail block candidates for duplicating
>> >         exit path.  Remove these from antic set.  Duplicate tails when
>> >         reached from both blocks needing a prologue/epilogue and
>> >         blocks not needing such.
>> >         * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
>> >         HAVE_simple_return.
>> >         * bb-reorder.c (get_uncond_jump_length): Make global.
>> >         * bb-reorder.h (get_uncond_jump_length): Declare.
>> >         * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
>> >         (rtl_split_edge): Likewise.  Warning fix.
>> >         (rtl_duplicate_bb): New function.
>> >         (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
>>
>> This (a revision in the range 181187:181189) broke build for
>> cris-elf like so:
>> See PR51051.
>
> Given that this also broke arm-linux-gnueabi, a primary
> platform, and Alan being absent until the 15th according to a
> message on IRC, I move to revert r181188.

Is there a PR for the arm issue?

> I think I need someone with appropriate write privileges to
> agree with that, and to also give 48h for someone to fix the
> problem.  Sorry for not forthcoming on the second point.

Did you or somebody else try to look into the problem?  To decide
whether it's the "best course of action" it would be nice to know if
it's a simple error in the patch that is easy to fix.

> brgds, H-P
> PS. where is the policy written down, besides the mailing list archives?

http://gcc.gnu.org/develop.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 12:10                             ` Richard Guenther
@ 2011-11-10 13:29                               ` Hans-Peter Nilsson
  2011-11-10 13:44                                 ` Richard Guenther
  0 siblings, 1 reply; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-10 13:29 UTC (permalink / raw)
  To: richard.guenther; +Cc: hp, amodra, gcc-patches, bernds

> From: Richard Guenther <richard.guenther@gmail.com>
> Date: Thu, 10 Nov 2011 12:22:56 +0100

> On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson
> <hans-peter.nilsson@axis.com> wrote:
> >> From: Hans-Peter Nilsson <hp@axis.com>
> >> Date: Wed, 9 Nov 2011 09:55:59 +0100
> >
> >> > From: Alan Modra <amodra@gmail.com>
> >> > Date: Tue, 1 Nov 2011 16:33:40 +0100
> >>
> >> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
> >>
> >> >         * function.c (bb_active_p): Delete.
> >> >         (dup_block_and_redirect, active_insn_between): New functions.
> >> >         (convert_jumps_to_returns, emit_return_for_exit): New functions,
> >> >         split out from..
> >> >         (thread_prologue_and_epilogue_insns): ..here.  Delete
> >> >         shadowing variables.  Don't do prologue register clobber tests
> >> >         when shrink wrapping already failed.  Delete all last_bb_active
> >> >         code.  Instead compute tail block candidates for duplicating
> >> >         exit path.  Remove these from antic set.  Duplicate tails when
> >> >         reached from both blocks needing a prologue/epilogue and
> >> >         blocks not needing such.
> >> >         * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
> >> >         HAVE_simple_return.
> >> >         * bb-reorder.c (get_uncond_jump_length): Make global.
> >> >         * bb-reorder.h (get_uncond_jump_length): Declare.
> >> >         * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
> >> >         (rtl_split_edge): Likewise.  Warning fix.
> >> >         (rtl_duplicate_bb): New function.
> >> >         (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
> >>
> >> This (a revision in the range 181187:181189) broke build for
> >> cris-elf like so:
> >> See PR51051.
> >
> > Given that this also broke arm-linux-gnueabi, a primary
> > platform, and Alan being absent until the 15th according to a
> > message on IRC, I move to revert r181188.
> 
> Is there a PR for the arm issue?

It's covered by the same PR, see comment #1.
I've now updated the target field.

> > I think I need someone with appropriate write privileges to
> > agree with that, and to also give 48h for someone to fix the
> > problem.  Sorry for not forthcoming on the second point.
> 
> Did you or somebody else try to look into the problem?  To decide
> whether it's the "best course of action" it would be nice to know if
> it's a simple error in the patch that is easy to fix.

Nope, not really.  Wouldn't FWIW, de jure matter, me not having
write privileges to the affected area.  Though, I had a quick
look at the patch and nothing stood out except its
intrusiveness, and it seems the patch wasn't tested on a
!simple_return target (just "powerpc-linux" according to the
replied-to message).

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 13:29                               ` Hans-Peter Nilsson
@ 2011-11-10 13:44                                 ` Richard Guenther
  2011-11-10 14:13                                   ` Bernd Schmidt
  0 siblings, 1 reply; 48+ messages in thread
From: Richard Guenther @ 2011-11-10 13:44 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: hp, amodra, gcc-patches, bernds

On Thu, Nov 10, 2011 at 12:43 PM, Hans-Peter Nilsson
<hans-peter.nilsson@axis.com> wrote:
>> From: Richard Guenther <richard.guenther@gmail.com>
>> Date: Thu, 10 Nov 2011 12:22:56 +0100
>
>> On Thu, Nov 10, 2011 at 11:38 AM, Hans-Peter Nilsson
>> <hans-peter.nilsson@axis.com> wrote:
>> >> From: Hans-Peter Nilsson <hp@axis.com>
>> >> Date: Wed, 9 Nov 2011 09:55:59 +0100
>> >
>> >> > From: Alan Modra <amodra@gmail.com>
>> >> > Date: Tue, 1 Nov 2011 16:33:40 +0100
>> >>
>> >> > On Tue, Nov 01, 2011 at 12:57:22AM +1030, Alan Modra wrote:
>> >>
>> >> >         * function.c (bb_active_p): Delete.
>> >> >         (dup_block_and_redirect, active_insn_between): New functions.
>> >> >         (convert_jumps_to_returns, emit_return_for_exit): New functions,
>> >> >         split out from..
>> >> >         (thread_prologue_and_epilogue_insns): ..here.  Delete
>> >> >         shadowing variables.  Don't do prologue register clobber tests
>> >> >         when shrink wrapping already failed.  Delete all last_bb_active
>> >> >         code.  Instead compute tail block candidates for duplicating
>> >> >         exit path.  Remove these from antic set.  Duplicate tails when
>> >> >         reached from both blocks needing a prologue/epilogue and
>> >> >         blocks not needing such.
>> >> >         * ifcvt.c (dead_or_predicable): Test both flag_shrink_wrap and
>> >> >         HAVE_simple_return.
>> >> >         * bb-reorder.c (get_uncond_jump_length): Make global.
>> >> >         * bb-reorder.h (get_uncond_jump_length): Declare.
>> >> >         * cfgrtl.c (rtl_create_basic_block): Comment typo fix.
>> >> >         (rtl_split_edge): Likewise.  Warning fix.
>> >> >         (rtl_duplicate_bb): New function.
>> >> >         (rtl_cfg_hooks): Enable can_duplicate_block_p and duplicate_block.
>> >>
>> >> This (a revision in the range 181187:181189) broke build for
>> >> cris-elf like so:
>> >> See PR51051.
>> >
>> > Given that this also broke arm-linux-gnueabi, a primary
>> > platform, and Alan being absent until the 15th according to a
>> > message on IRC, I move to revert r181188.
>>
>> Is there a PR for the arm issue?
>
> It's covered by the same PR, see comment #1.
> I've now updated the target field.
>
>> > I think I need someone with appropriate write privileges to
>> > agree with that, and to also give 48h for someone to fix the
>> > problem.  Sorry for not forthcoming on the second point.
>>
>> Did you or somebody else try to look into the problem?  To decide
>> whether it's the "best course of action" it would be nice to know if
>> it's a simple error in the patch that is easy to fix.
>
> Nope, not really.  Wouldn't FWIW, de jure matter, me not having
> write privileges to the affected area.  Though, I had a quick
> look at the patch and nothing stood out except its
> intrusiveness, and it seems the patch wasn't tested on a
> !simple_return target (just "powerpc-linux" according to the
> replied-to message).

Fair enough.  You can count me as "one" then, and I'll defer to Bernd
to either provide a fix or ack the revert.

Thanks,
Richard.

> brgds, H-P
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 13:44                                 ` Richard Guenther
@ 2011-11-10 14:13                                   ` Bernd Schmidt
  2011-11-10 15:23                                     ` Hans-Peter Nilsson
  2011-11-11  0:22                                     ` Revert " Michael Meissner
  0 siblings, 2 replies; 48+ messages in thread
From: Bernd Schmidt @ 2011-11-10 14:13 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Hans-Peter Nilsson, hp, amodra, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1032 bytes --]

On 11/10/11 13:14, Richard Guenther wrote:
> Fair enough.  You can count me as "one" then, and I'll defer to Bernd
> to either provide a fix or ack the revert.

I'm trying to track it down.

In 189r.outof_cfglayout, we have

(insn 31 33 35 3 (use (reg/i:SI 0 r0))
../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1
     (nil))

;; Successors:  EXIT [100.0%]  (fallthru)
;; lr  out       0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp]
;; live  out     0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp]

followed by a number of other basic blocks, so that looks wrong to me.
outof_cfglayout seems to assume that fallthrough edges to the exit block
are OK and don't need fixing up, and changing that seems nontrivial at
first glance.

The situation is first created during cfgcleanup in into_cfglayout. The
following patch makes the testcase compile by stopping the compiler from
moving the exit fallthru block around, but I've not checked whether it
has a negative effect on code quality. HP, can you run full tests?


Bernd

[-- Attachment #2: cfgl-exit.diff --]
[-- Type: text/plain, Size: 856 bytes --]

Index: ../baseline-trunk/gcc/cfgrtl.c
===================================================================
--- ../baseline-trunk/gcc/cfgrtl.c	(revision 181252)
+++ ../baseline-trunk/gcc/cfgrtl.c	(working copy)
@@ -2735,6 +2735,16 @@ cfg_layout_can_merge_blocks_p (basic_blo
   if (BB_PARTITION (a) != BB_PARTITION (b))
     return false;
 
+  /* If we would end up moving B's instructions, make sure it doesn't fall
+     through into the exit block, since we cannot recover from a fallthrough
+     edge into the exit block occurring in the middle of a function.  */
+  if (NEXT_INSN (BB_END (a)) != BB_HEAD (b))
+    {
+      edge e = find_fallthru_edge (b->succs);
+      if (e && e->dest == EXIT_BLOCK_PTR)
+	return false;
+    }
+
   /* There must be exactly one edge in between the blocks.  */
   return (single_succ_p (a)
 	  && single_succ (a) == b

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 14:13                                   ` Bernd Schmidt
@ 2011-11-10 15:23                                     ` Hans-Peter Nilsson
  2011-11-10 18:06                                       ` Hans-Peter Nilsson
  2011-11-11  0:22                                     ` Revert " Michael Meissner
  1 sibling, 1 reply; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-10 15:23 UTC (permalink / raw)
  To: bernds; +Cc: richard.guenther, amodra, gcc-patches

> From: Bernd Schmidt <bernds@codesourcery.com>
> Date: Thu, 10 Nov 2011 14:29:04 +0100

> HP, can you run full tests?

Cross-test to cris-elf in progress.
Thanks!

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 15:23                                     ` Hans-Peter Nilsson
@ 2011-11-10 18:06                                       ` Hans-Peter Nilsson
  2011-11-11 22:09                                         ` Hans-Peter Nilsson
  0 siblings, 1 reply; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-10 18:06 UTC (permalink / raw)
  To: bernds; +Cc: richard.guenther, amodra, gcc-patches

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Thu, 10 Nov 2011 15:12:54 +0100

> > From: Bernd Schmidt <bernds@codesourcery.com>
> > Date: Thu, 10 Nov 2011 14:29:04 +0100
> 
> > HP, can you run full tests?
> 
> Cross-test to cris-elf in progress.
> Thanks!

Works, no regressions compared to before the breakage (r181187).
Thanks!  According to
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3> it fixes
building for arm-unknown-linux-gnueabi too.

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 14:13                                   ` Bernd Schmidt
  2011-11-10 15:23                                     ` Hans-Peter Nilsson
@ 2011-11-11  0:22                                     ` Michael Meissner
  1 sibling, 0 replies; 48+ messages in thread
From: Michael Meissner @ 2011-11-11  0:22 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Richard Guenther, Hans-Peter Nilsson, hp, amodra, gcc-patches

On Thu, Nov 10, 2011 at 02:29:04PM +0100, Bernd Schmidt wrote:
> On 11/10/11 13:14, Richard Guenther wrote:
> > Fair enough.  You can count me as "one" then, and I'll defer to Bernd
> > to either provide a fix or ack the revert.
> 
> I'm trying to track it down.
> 
> In 189r.outof_cfglayout, we have
> 
> (insn 31 33 35 3 (use (reg/i:SI 0 r0))
> ../../../../baseline-trunk/libstdc++-v3/libsupc++/new_opv.cc:34 -1
>      (nil))
> 
> ;; Successors:  EXIT [100.0%]  (fallthru)
> ;; lr  out       0 [r0] 11 [fp] 13 [sp] 14 [lr] 25 [sfp] 26 [afp]
> ;; live  out     0 [r0] 11 [fp] 13 [sp] 25 [sfp] 26 [afp]
> 
> followed by a number of other basic blocks, so that looks wrong to me.
> outof_cfglayout seems to assume that fallthrough edges to the exit block
> are OK and don't need fixing up, and changing that seems nontrivial at
> first glance.
> 
> The situation is first created during cfgcleanup in into_cfglayout. The
> following patch makes the testcase compile by stopping the compiler from
> moving the exit fallthru block around, but I've not checked whether it
> has a negative effect on code quality. HP, can you run full tests?

FWIW, I did bootstrap and make check with/without this patch, and it introduces
no regressions in the PowerPC, but I haven't look at the code generated.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meissner@linux.vnet.ibm.com	fax +1 (978) 399-6899

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 18:06                                       ` Hans-Peter Nilsson
@ 2011-11-11 22:09                                         ` Hans-Peter Nilsson
  2011-11-14 14:59                                           ` Bernd Schmidt
  0 siblings, 1 reply; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-11 22:09 UTC (permalink / raw)
  To: bernds; +Cc: gcc-patches

> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Thu, 10 Nov 2011 18:52:39 +0100

> > From: Hans-Peter Nilsson <hp@axis.com>
> > Date: Thu, 10 Nov 2011 15:12:54 +0100
> 
> > > From: Bernd Schmidt <bernds@codesourcery.com>
> > > Date: Thu, 10 Nov 2011 14:29:04 +0100
> > 
> > > HP, can you run full tests?
> > 
> > Cross-test to cris-elf in progress.
> > Thanks!
> 
> Works, no regressions compared to before the breakage (r181187).
> Thanks!  According to
> <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51051#c3> it fixes
> building for arm-unknown-linux-gnueabi too.

AFAICT, your patch has got sufficiently testing now (on three
targets to boot) to be considered safe to check in.  Or is
something amiss?

(If it's the unchecked code quality you mentioned, that can be
just as well dealt with having the tree in a working state;
having the tree broken for some targets accomplishes nothing.)

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-11 22:09                                         ` Hans-Peter Nilsson
@ 2011-11-14 14:59                                           ` Bernd Schmidt
  2011-11-14 16:49                                             ` CFG review needed for fix of " Hans-Peter Nilsson
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-11-14 14:59 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: gcc-patches

On 11/11/11 20:13, Hans-Peter Nilsson wrote:
> AFAICT, your patch has got sufficiently testing now (on three
> targets to boot) to be considered safe to check in.  Or is
> something amiss?
> 
> (If it's the unchecked code quality you mentioned, that can be
> just as well dealt with having the tree in a working state;
> having the tree broken for some targets accomplishes nothing.)

I briefly looked at that, and while it does tend to move stuff around
compared to an unpatched compiler, the effects don't seem to be all that
bad. If we find a testcase where it's a real problem, we could do more
aggressive reordering after epilogue generation, when we no longer have
exit fallthroguh edges. So I'd like to ask for an OK here.


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 14:59                                           ` Bernd Schmidt
@ 2011-11-14 16:49                                             ` Hans-Peter Nilsson
  2011-11-14 17:06                                               ` Ramana Radhakrishnan
  2011-11-14 18:21                                               ` Richard Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-14 16:49 UTC (permalink / raw)
  To: bernds; +Cc: hp, gcc-patches

> From: Bernd Schmidt <bernds@codesourcery.com>
> Date: Mon, 14 Nov 2011 10:51:56 +0100

> On 11/11/11 20:13, Hans-Peter Nilsson wrote:
> > AFAICT, your patch has got sufficiently testing now (on three
> > targets to boot) to be considered safe to check in.  Or is
> > something amiss?
> > 
> > (If it's the unchecked code quality you mentioned, that can be
> > just as well dealt with having the tree in a working state;
> > having the tree broken for some targets accomplishes nothing.)
> 
> I briefly looked at that, and while it does tend to move stuff around
> compared to an unpatched compiler, the effects don't seem to be all that
> bad. If we find a testcase where it's a real problem, we could do more
> aggressive reordering after epilogue generation, when we no longer have
> exit fallthroguh edges. So I'd like to ask for an OK here.

Looks like all we need is a positive review of
<http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a
ChangeLog entry to unbreak three or more targets.

Someone with approval rights: pretty please?

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 16:49                                             ` CFG review needed for fix of " Hans-Peter Nilsson
@ 2011-11-14 17:06                                               ` Ramana Radhakrishnan
  2011-11-14 17:15                                                 ` Rainer Orth
  2011-11-14 18:21                                               ` Richard Henderson
  1 sibling, 1 reply; 48+ messages in thread
From: Ramana Radhakrishnan @ 2011-11-14 17:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: bernds, hp

>
> Someone with approval rights: pretty please?

Can I add my +1 "pretty please" as well here :) ? According to #c3
this fixes arm-linux-gnueabi cross-builds for C++ as well and
potentially allows this to bootstrap again. I have kicked off a
bootstrap and test run on arm-linux-gnueabi .

cheers
Ramana

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 17:06                                               ` Ramana Radhakrishnan
@ 2011-11-14 17:15                                                 ` Rainer Orth
  0 siblings, 0 replies; 48+ messages in thread
From: Rainer Orth @ 2011-11-14 17:15 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, bernds, hp

Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> writes:

>> Someone with approval rights: pretty please?
>
> Can I add my +1 "pretty please" as well here :) ? According to #c3
> this fixes arm-linux-gnueabi cross-builds for C++ as well and
> potentially allows this to bootstrap again. I have kicked off a
> bootstrap and test run on arm-linux-gnueabi .

It's better than before, but Ada bootstrap (on Solaris/SPARC, cf. PR
bootstrap/51086) still fails with the same ICE.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 16:49                                             ` CFG review needed for fix of " Hans-Peter Nilsson
  2011-11-14 17:06                                               ` Ramana Radhakrishnan
@ 2011-11-14 18:21                                               ` Richard Henderson
  2011-11-14 22:44                                                 ` Alan Modra
  2011-11-15  0:45                                                 ` Hans-Peter Nilsson
  1 sibling, 2 replies; 48+ messages in thread
From: Richard Henderson @ 2011-11-14 18:21 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: bernds, hp, gcc-patches

On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
> Looks like all we need is a positive review of
> <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a
> ChangeLog entry to unbreak three or more targets.
> 
> Someone with approval rights: pretty please?

That patch is ok.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 18:21                                               ` Richard Henderson
@ 2011-11-14 22:44                                                 ` Alan Modra
  2011-11-15  2:50                                                   ` Hans-Peter Nilsson
                                                                     ` (2 more replies)
  2011-11-15  0:45                                                 ` Hans-Peter Nilsson
  1 sibling, 3 replies; 48+ messages in thread
From: Alan Modra @ 2011-11-14 22:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Hans-Peter Nilsson, bernds, hp, gcc-patches

On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote:
> On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
> > Looks like all we need is a positive review of
> > <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a
> > ChangeLog entry to unbreak three or more targets.
> > 
> > Someone with approval rights: pretty please?
> 
> That patch is ok.

Sorry for the bootstrap problems.  I believe the cause is my
extraction of convert_jumps_to_returns and emit_return_for_exit.  The
new HAVE_return code misses a single_succ_p test (covered by the
bitmap_bit_p (&bb_flags, ...) test in the HAVE_simple_return case).

I haven't really looked into what Bernd's fix does.  I know this one
fixes what I broke..

	* function.c (thread_prologue_and_epilogue_insns): Guard
	emitting return with single_succ_p test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 181188)
+++ gcc/function.c	(working copy)
@@ -6230,7 +6230,8 @@ thread_prologue_and_epilogue_insns (void
 	      && !active_insn_between (BB_HEAD (last_bb), BB_END (last_bb)))
 	    convert_jumps_to_returns (last_bb, false, NULL);
 
-	  if (EDGE_COUNT (exit_fallthru_edge->src->preds) != 0)
+	  if (EDGE_COUNT (last_bb->preds) != 0
+	      && single_succ_p (last_bb))
 	    {
 	      last_bb = emit_return_for_exit (exit_fallthru_edge, false);
 	      epilogue_end = returnjump = BB_END (last_bb);

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 18:21                                               ` Richard Henderson
  2011-11-14 22:44                                                 ` Alan Modra
@ 2011-11-15  0:45                                                 ` Hans-Peter Nilsson
  1 sibling, 0 replies; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-15  0:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: bernds

> From: Richard Henderson <rth@redhat.com>
> Date: Mon, 14 Nov 2011 18:48:03 +0100

> On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
> > Looks like all we need is a positive review of
> > <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a
> > ChangeLog entry to unbreak three or more targets.
> > 
> > Someone with approval rights: pretty please?
> 
> That patch is ok.

Committed with this ChangeLog entry, obvious from the function
header and the added comment.

2011-11-15  Bernd Schmidt  <bernds@codesourcery.com>

	PR rtl-optimization/51051
	* cfgrtl.c (cfg_layout_can_merge_blocks_p): Return FALSE if the
	move would cause fallthrough into the exit block.

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 22:44                                                 ` Alan Modra
@ 2011-11-15  2:50                                                   ` Hans-Peter Nilsson
  2011-11-15  5:03                                                   ` Richard Henderson
  2011-11-15 18:54                                                   ` Richard Henderson
  2 siblings, 0 replies; 48+ messages in thread
From: Hans-Peter Nilsson @ 2011-11-15  2:50 UTC (permalink / raw)
  To: amodra; +Cc: gcc-patches

> From: Alan Modra <amodra@gmail.com>
> Date: Mon, 14 Nov 2011 22:56:48 +0100

> I haven't really looked into what Bernd's fix does.  I know this one
> fixes what I broke..

Hm...  Oh well, I'm trusting RTH and Bernd that it fixed a real issue.
Thanks for looking (and a belated thanks to RTH for the review).

brgds, H-P

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 22:44                                                 ` Alan Modra
  2011-11-15  2:50                                                   ` Hans-Peter Nilsson
@ 2011-11-15  5:03                                                   ` Richard Henderson
  2011-11-15  6:11                                                     ` Bernd Schmidt
  2011-11-15 18:54                                                   ` Richard Henderson
  2 siblings, 1 reply; 48+ messages in thread
From: Richard Henderson @ 2011-11-15  5:03 UTC (permalink / raw)
  To: Hans-Peter Nilsson, bernds, hp, gcc-patches

On 11/14/2011 11:56 AM, Alan Modra wrote:
> On Mon, Nov 14, 2011 at 07:48:03AM -1000, Richard Henderson wrote:
>> On 11/14/2011 04:10 AM, Hans-Peter Nilsson wrote:
>>> Looks like all we need is a positive review of
>>> <http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01409.html> and a
>>> ChangeLog entry to unbreak three or more targets.
>>>
>>> Someone with approval rights: pretty please?
>>
>> That patch is ok.
> 
> Sorry for the bootstrap problems.  I believe the cause is my
> extraction of convert_jumps_to_returns and emit_return_for_exit.  The
> new HAVE_return code misses a single_succ_p test (covered by the
> bitmap_bit_p (&bb_flags, ...) test in the HAVE_simple_return case).
> 
> I haven't really looked into what Bernd's fix does.  I know this one
> fixes what I broke..
> 
> 	* function.c (thread_prologue_and_epilogue_insns): Guard
> 	emitting return with single_succ_p test.


Hmm.  This looks plausible too.  

Bernd's patch made sure that cfglayout didn't do something impossible.
Of course, it's possible that his patch should merely be an assert,
and the correct fix goes here.

Thoughts?


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-15  5:03                                                   ` Richard Henderson
@ 2011-11-15  6:11                                                     ` Bernd Schmidt
  2011-11-15  7:09                                                       ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2011-11-15  6:11 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Hans-Peter Nilsson, hp, gcc-patches

On 11/15/11 01:43, Richard Henderson wrote:
> On 11/14/2011 11:56 AM, Alan Modra wrote:
>> 	* function.c (thread_prologue_and_epilogue_insns): Guard
>> 	emitting return with single_succ_p test.
> 
> Hmm.  This looks plausible too.  
> 
> Bernd's patch made sure that cfglayout didn't do something impossible.
> Of course, it's possible that his patch should merely be an assert,
> and the correct fix goes here.

I haven't really looked at Alan's patch; I'm thinking there were
probably two bugs - the one I found being latent before. Note that the
problematic CFG exists way before thread_prologue_and_epilogue even
runs, so an assert would still trigger.


Bernd

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-15  6:11                                                     ` Bernd Schmidt
@ 2011-11-15  7:09                                                       ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2011-11-15  7:09 UTC (permalink / raw)
  To: bernds; +Cc: rth, hans-peter.nilsson, hp, gcc-patches

From: Bernd Schmidt <bernds@codesourcery.com>
Date: Tue, 15 Nov 2011 01:54:34 +0100

> On 11/15/11 01:43, Richard Henderson wrote:
>> On 11/14/2011 11:56 AM, Alan Modra wrote:
>>> 	* function.c (thread_prologue_and_epilogue_insns): Guard
>>> 	emitting return with single_succ_p test.
>> 
>> Hmm.  This looks plausible too.  
>> 
>> Bernd's patch made sure that cfglayout didn't do something impossible.
>> Of course, it's possible that his patch should merely be an assert,
>> and the correct fix goes here.
> 
> I haven't really looked at Alan's patch; I'm thinking there were
> probably two bugs - the one I found being latent before. Note that the
> problematic CFG exists way before thread_prologue_and_epilogue even
> runs, so an assert would still trigger.

I suspect that Alan's fix here will take care of the sparc ada bootstrap
regression reported, and regardless if it's correct it should be installed.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: CFG review needed for fix of "PowerPC shrink-wrap support 3 of 3"
  2011-11-14 22:44                                                 ` Alan Modra
  2011-11-15  2:50                                                   ` Hans-Peter Nilsson
  2011-11-15  5:03                                                   ` Richard Henderson
@ 2011-11-15 18:54                                                   ` Richard Henderson
  2 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 2011-11-15 18:54 UTC (permalink / raw)
  To: Hans-Peter Nilsson, bernds, hp, gcc-patches

On 11/14/2011 11:56 AM, Alan Modra wrote:
> 	* function.c (thread_prologue_and_epilogue_insns): Guard
> 	emitting return with single_succ_p test.

Ok.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Revert "PowerPC shrink-wrap support 3 of 3"
  2011-11-10 11:25                           ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson
  2011-11-10 12:10                             ` Richard Guenther
@ 2015-04-08 11:11                             ` Gerald Pfeifer
  1 sibling, 0 replies; 48+ messages in thread
From: Gerald Pfeifer @ 2015-04-08 11:11 UTC (permalink / raw)
  To: Hans-Peter Nilsson; +Cc: Alan Modra, gcc-patches, Bernd Schmidt

On Thu, 10 Nov 2011, Hans-Peter Nilsson wrote:
> I think I need someone with appropriate write privileges to
> agree with that, and to also give 48h for someone to fix the
> problem.  Sorry for not forthcoming on the second point.
> 
> brgds, H-P
> PS. where is the policy written down, besides the mailing list archives?

https://gcc.gnu.org/develop.html has had the following since 2003 or so:

  Patch Reversion

  If a patch is committed which introduces a regression on any target 
  which the Steering Committee considers to be important and if:

   * the problem is reported to the original poster;
   * 48 hours pass without the original poster or any other party 
     indicating that a fix will be forthcoming in the very near future;
   * two people with write privileges to the affected area of the compiler 
     determine that the best course of action is to revert the patch;

  then they may revert the patch.

  (The list of important targets will be revised at the beginning of each 
  release cycle, if necessary, and is part of the release criteria.)

  After the patch has been reverted, the poster may appeal the decision to 
  the Steering Committee.

  Note that no distinction is made between patches which are themselves 
  buggy and patches that expose latent bugs elsewhere in the compiler.

This is there as part of the overall (release) methodology, though
svnwrite.html would be at least as natural.  Perhaps a reference
there?  Thoughts?

Gerald

PS: Yes, I am catching up with some older mails. ;-)

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2015-04-08 11:11 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-17  7:59 PowerPC shrink-wrap support 0 of 3 Alan Modra
2011-09-17  8:22 ` PowerPC shrink-wrap support 1 " Alan Modra
2011-09-17  9:13 ` PowerPC shrink-wrap support 2 " Alan Modra
2011-09-17  9:26 ` PowerPC shrink-wrap support 3 " Alan Modra
2011-09-26 14:25   ` Alan Modra
2011-09-27  0:15     ` Alan Modra
2011-09-27  0:19       ` Bernd Schmidt
2011-09-27  0:49         ` Alan Modra
2011-09-27  1:08           ` Bernd Schmidt
2011-09-27  2:16             ` Alan Modra
2011-09-28 16:35               ` Alan Modra
2011-10-16 20:19                 ` David Edelsohn
2011-10-26 13:03                   ` Alan Modra
2011-10-26 13:42                     ` Bernd Schmidt
2011-10-26 14:40                       ` Alan Modra
2011-10-26 14:44                         ` Bernd Schmidt
2011-10-26 15:40                           ` Alan Modra
2011-10-28  0:41                         ` Alan Modra
2011-10-31 15:14                     ` Alan Modra
2011-11-01 15:34                       ` Alan Modra
2011-11-07 17:27                         ` Jakub Jelinek
2011-11-09  9:48                         ` Hans-Peter Nilsson
2011-11-10 11:25                           ` Revert "PowerPC shrink-wrap support 3 of 3" Hans-Peter Nilsson
2011-11-10 12:10                             ` Richard Guenther
2011-11-10 13:29                               ` Hans-Peter Nilsson
2011-11-10 13:44                                 ` Richard Guenther
2011-11-10 14:13                                   ` Bernd Schmidt
2011-11-10 15:23                                     ` Hans-Peter Nilsson
2011-11-10 18:06                                       ` Hans-Peter Nilsson
2011-11-11 22:09                                         ` Hans-Peter Nilsson
2011-11-14 14:59                                           ` Bernd Schmidt
2011-11-14 16:49                                             ` CFG review needed for fix of " Hans-Peter Nilsson
2011-11-14 17:06                                               ` Ramana Radhakrishnan
2011-11-14 17:15                                                 ` Rainer Orth
2011-11-14 18:21                                               ` Richard Henderson
2011-11-14 22:44                                                 ` Alan Modra
2011-11-15  2:50                                                   ` Hans-Peter Nilsson
2011-11-15  5:03                                                   ` Richard Henderson
2011-11-15  6:11                                                     ` Bernd Schmidt
2011-11-15  7:09                                                       ` David Miller
2011-11-15 18:54                                                   ` Richard Henderson
2011-11-15  0:45                                                 ` Hans-Peter Nilsson
2011-11-11  0:22                                     ` Revert " Michael Meissner
2015-04-08 11:11                             ` Gerald Pfeifer
2011-09-17 18:16 ` PowerPC shrink-wrap support 0 of 3 Bernd Schmidt
2011-09-19  5:39   ` Alan Modra
2011-09-19 13:36     ` Bernd Schmidt
     [not found]       ` <20110921152851.GE10321@bubble.grove.modra.org>
     [not found]         ` <20110922144017.GF10321@bubble.grove.modra.org>
2011-09-26 14:35           ` [PATCH] PowerPC shrink-wrap support benchmark gains Alan Modra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).