public inbox for gcc-patches@gcc.gnu.org
* Shrink-wrapping: Introduction
@ 2011-03-23 14:44 Bernd Schmidt
  2011-03-23 14:46 ` [PATCH 1/6] Disallow predicating the prologue Bernd Schmidt
                   ` (5 more replies)
  0 siblings, 6 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:44 UTC (permalink / raw)
  To: GCC Patches

I'll be posting a series of patches that add an implementation of
shrink-wrapping to gcc. I've used the algorithm in Muchnick's book as a
guideline, but ended up with an implementation that is somewhat more
conservative.

One of my goals was to reuse the existing prologue/epilogue
infrastructure as much as possible so that it is relatively easy to add
shrink-wrapping capability to a new port, rather than having to rewrite
prologue/epilogue generation for each. This implies that we still only
generate one prologue per function, rather than saving individual
registers at exactly the location where it becomes necessary. Still,
this is good enough to detect and optimize many early-exit cases.
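As a sketch (my own illustration, not code from the patch — the function and names are invented), the kind of early-exit function that benefits looks like this: the len == 0 path touches no call-saved register, so a shrink-wrapped compiler can push the single prologue down onto the path that actually makes calls.

```c
/* Illustrative only: process() and maybe_work() are made-up names.
   The early-exit path (len == 0) needs no stack frame, so with
   shrink-wrapping the prologue is emitted only on the path where
   `sum' must live across calls in a call-saved register.  */
static int ncalls;

static int process (int x)
{
  ncalls++;
  return x * 2;
}

int maybe_work (int len)
{
  if (len == 0)        /* early exit: no frame needed here */
    return -1;

  int sum = 0;         /* lives across calls -> call-saved register */
  for (int i = 1; i <= len; i++)
    sum += process (i);
  return sum;
}
```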

There's a new comment at the top of thread_prologue_and_epilogue_insns
which describes the algorithm in some detail.

For now, the targets that can make use of this are ARM, MIPS and i386.
This turned out to be a fortunate choice, as these cover a large range
of situations. ARM has conditional return instructions, and requires a
distinction between return (which may pop lots of registers) and
simple_return (which just branches). MIPS has delay slots, so making it
work required fixing up reorg.c for the JUMP_LABEL changes.

One of the nastiest problems I ran into is that various RTL-level CFG
functions interpret a branch target of NULL to be the exit block, and
create branches using "return". This needed surgery to distinguish
between return and simple_return exits. JUMP_LABEL is now set and
maintained for return jumps.

A possible future enhancement is to add a targetm.gen_epilogue hook,
rather than using instruction patterns, and use that to pass information
about the set of registers that need to be saved. Epilogues for
tail-calls can be generated in multiple places, and it isn't always
necessary to restore the entire set of registers.

The patch has been in our local tree as well as the Linaro gcc tree for
a while now. That hasn't entirely been without pain, but most of the
issues we had should be ironed out by now. For this submission, I've
retested the patch (in some cases slightly earlier versions) on arm-linux,
mips64-elf and i686-linux (the latter including bootstraps).


Bernd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 1/6] Disallow predicating the prologue
  2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
@ 2011-03-23 14:46 ` Bernd Schmidt
  2011-03-31 13:20   ` Jeff Law
  2011-04-01 18:59   ` H.J. Lu
  2011-03-23 14:48 ` [PATCH 2/6] Unique return rtx Bernd Schmidt
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:46 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 208 bytes --]

With prologues appearing in blocks other than the entry block, ifcvt can
decide to predicate them. This is not a good idea, as dwarf2out will
blow up trying to handle predicated frame-related things.


Bernd

[-- Attachment #2: no-ifcvt-prologue.diff --]
[-- Type: text/plain, Size: 1305 bytes --]

	* ifcvt.c (cond_exec_process_insns): Disallow converting a block
	that contains the prologue.

	* gcc.c-torture/compile/20110322-1.c: New test.

Index: gcc/ifcvt.c
===================================================================
--- gcc.orig/ifcvt.c
+++ gcc/ifcvt.c
@@ -304,6 +304,10 @@ cond_exec_process_insns (ce_if_block_t *
 
   for (insn = start; ; insn = NEXT_INSN (insn))
     {
+      /* dwarf2out can't cope with conditional prologues.  */
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END)
+	return FALSE;
+
       if (NOTE_P (insn) || DEBUG_INSN_P (insn))
 	goto insn_done;
 
Index: gcc/testsuite/gcc.c-torture/compile/20110322-1.c
===================================================================
--- /dev/null
+++ gcc/testsuite/gcc.c-torture/compile/20110322-1.c
@@ -0,0 +1,22 @@
+void asn1_length_der (unsigned long int len, unsigned char *ans, int *ans_len)
+{
+    int k;
+    unsigned char temp[4];
+    if (len < 128) {
+	if (ans != ((void *) 0))
+	    ans[0] = (unsigned char) len;
+	*ans_len = 1;
+    } else {
+	k = 0;
+	while (len) {
+	    temp[k++] = len & 0xFF;
+	    len = len >> 8;
+	}
+	*ans_len = k + 1;
+	if (ans != ((void *) 0)) {
+	    ans[0] = ((unsigned char) k & 0x7F) + 128;
+	    while (k--)
+		ans[*ans_len - 1 - k] = temp[k];
+	}
+    }
+}


* [PATCH 2/6] Unique return rtx
  2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
  2011-03-23 14:46 ` [PATCH 1/6] Disallow predicating the prologue Bernd Schmidt
@ 2011-03-23 14:48 ` Bernd Schmidt
  2011-03-31 13:23   ` Jeff Law
  2011-03-23 14:51 ` [PATCH 3/6] Allow jumps in epilogues Bernd Schmidt
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:48 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 230 bytes --]

We'll start putting "return" into JUMP_LABELs in a subsequent patch, so
I've decided to make it unique as a small cleanup.

There's already another macro called "return_rtx", so the new one goes
by the name of "ret_rtx".
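A toy model (my own sketch, not GCC's actual rtx representation) of why a single shared object is convenient: with one canonical ret_rtx, "is this a return?" becomes a pointer comparison, and the sharing invariant holds by construction.

```c
/* Toy model, not real GCC internals: one canonical `toy_ret_rtx'
   object means testing for a return is a pointer compare, and any
   generator that hands out the shared instance preserves the
   rtx-sharing invariant automatically.  */
enum toy_code { TOY_RETURN, TOY_SET };

struct toy_rtx { enum toy_code code; };

static struct toy_rtx ret_obj = { TOY_RETURN };
static struct toy_rtx *const toy_ret_rtx = &ret_obj;

static struct toy_rtx *toy_gen_return (void)
{
  return toy_ret_rtx;           /* always the shared instance */
}

static int toy_is_return (const struct toy_rtx *x)
{
  return x == toy_ret_rtx;      /* pointer identity suffices */
}
```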


Bernd


[-- Attachment #2: unique-ret.diff --]
[-- Type: text/plain, Size: 11492 bytes --]

	* gengenrtl.c (special_rtx): PC, CC0 and RETURN are special.
	* genemit.c (gen_exp): Handle RETURN.
	* emit-rtl.c (verify_rtx_sharing): Likewise.
	(init_emit_regs): Create pc_rtx, ret_rtx and cc0_rtx specially.
	* rtl.c (copy_rtx): RETURN is shared.
	* rtl.h (enum global_rtl_index): Add GR_RETURN.
	(ret_rtx): New.
	* jump.c (redirect_exp_1): Don't use gen_rtx_RETURN.
	* config/s390/s390.c (s390_emit_epilogue): Likewise.
	* config/rx/rx.c (gen_rx_rtsd_vector): Likewise.
	* config/m68hc11/m68hc11.md (return): Likewise.
	* config/cris/cris.c (cris_expand_return): Likewise.
	* config/m68k/m68k.c (m68k_expand_epilogue): Likewise.
	* config/rs6000/rs6000.c (rs6000_make_savres_rtx,
	rs6000_emit_epilogue, rs6000_output_mi_thunk): Likewise.
	* config/picochip/picochip.c (picochip_expand_epilogue): Likewise.
	* config/h8300/h8300.c (h8300_push_pop, h8300_expand_epilogue):
	Likewise.
	* config/v850/v850.c (expand_epilogue): Likewise.
	* config/bfin/bfin.c (bfin_expand_call): Likewise.
	* config/arm/arm.md (epilogue): Likewise.

Index: gcc/gengenrtl.c
===================================================================
--- gcc.orig/gengenrtl.c
+++ gcc/gengenrtl.c
@@ -128,6 +128,9 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "REG") == 0
 	  || strcmp (defs[idx].enumname, "SUBREG") == 0
 	  || strcmp (defs[idx].enumname, "MEM") == 0
+	  || strcmp (defs[idx].enumname, "PC") == 0
+	  || strcmp (defs[idx].enumname, "CC0") == 0
+	  || strcmp (defs[idx].enumname, "RETURN") == 0
 	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
 }
 
Index: gcc/genemit.c
===================================================================
--- gcc.orig/genemit.c
+++ gcc/genemit.c
@@ -223,6 +223,9 @@ gen_exp (rtx x, enum rtx_code subroutine
     case PC:
       printf ("pc_rtx");
       return;
+    case RETURN:
+      printf ("ret_rtx");
+      return;
     case CLOBBER:
       if (REG_P (XEXP (x, 0)))
 	{
Index: gcc/emit-rtl.c
===================================================================
--- gcc.orig/emit-rtl.c
+++ gcc/emit-rtl.c
@@ -2447,6 +2447,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
     case SCRATCH:
       return;
       /* SCRATCH must be shared because they represent distinct values.  */
@@ -5651,8 +5652,9 @@ init_emit_regs (void)
   init_reg_modes_target ();
 
   /* Assign register numbers to the globally defined register rtx.  */
-  pc_rtx = gen_rtx_PC (VOIDmode);
-  cc0_rtx = gen_rtx_CC0 (VOIDmode);
+  pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
+  ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
+  cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
   stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
   frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);
   hard_frame_pointer_rtx = gen_raw_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
Index: gcc/rtl.c
===================================================================
--- gcc.orig/rtl.c
+++ gcc/rtl.c
@@ -255,6 +255,7 @@ copy_rtx (rtx orig)
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return orig;
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -2045,6 +2045,7 @@ enum global_rtl_index
 {
   GR_PC,
   GR_CC0,
+  GR_RETURN,
   GR_STACK_POINTER,
   GR_FRAME_POINTER,
 /* For register elimination to work properly these hard_frame_pointer_rtx,
@@ -2134,6 +2135,7 @@ extern struct target_rtl *this_target_rt
 
 /* Standard pieces of rtx, to be substituted directly into things.  */
 #define pc_rtx                  (global_rtl[GR_PC])
+#define ret_rtx                 (global_rtl[GR_RETURN])
 #define cc0_rtx                 (global_rtl[GR_CC0])
 
 /* All references to certain hard regs, except those created
Index: gcc/config/s390/s390.c
===================================================================
--- gcc.orig/config/s390/s390.c
+++ gcc/config/s390/s390.c
@@ -8516,7 +8516,7 @@ s390_emit_epilogue (bool sibcall)
 
       p = rtvec_alloc (2);
 
-      RTVEC_ELT (p, 0) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (p, 0) = ret_rtx;
       RTVEC_ELT (p, 1) = gen_rtx_USE (VOIDmode, return_reg);
       emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
     }
Index: gcc/config/rx/rx.c
===================================================================
--- gcc.orig/config/rx/rx.c
+++ gcc/config/rx/rx.c
@@ -1550,7 +1550,7 @@ gen_rx_rtsd_vector (unsigned int adjust,
 				: plus_constant (stack_pointer_rtx,
 						 i * UNITS_PER_WORD)));
 
-  XVECEXP (vector, 0, count - 1) = gen_rtx_RETURN (VOIDmode);
+  XVECEXP (vector, 0, count - 1) = ret_rtx;
 
   return vector;
 }
Index: gcc/config/m68hc11/m68hc11.md
===================================================================
--- gcc.orig/config/m68hc11/m68hc11.md
+++ gcc/config/m68hc11/m68hc11.md
@@ -6576,7 +6576,7 @@
   if (ret_size && ret_size <= 2)
     {
       emit_jump_insn (gen_rtx_PARALLEL (VOIDmode,
-		      gen_rtvec (2, gen_rtx_RETURN (VOIDmode),
+		      gen_rtvec (2, ret_rtx,
 			         gen_rtx_USE (VOIDmode,
 					      gen_rtx_REG (HImode, 1)))));
       DONE;
@@ -6584,7 +6584,7 @@
   if (ret_size)
     {
       emit_jump_insn (gen_rtx_PARALLEL (VOIDmode,
-		      gen_rtvec (2, gen_rtx_RETURN (VOIDmode),
+		      gen_rtvec (2, ret_rtx,
 			         gen_rtx_USE (VOIDmode,
 					      gen_rtx_REG (SImode, 0)))));
       DONE;
Index: gcc/config/cris/cris.c
===================================================================
--- gcc.orig/config/cris/cris.c
+++ gcc/config/cris/cris.c
@@ -1788,7 +1788,7 @@ cris_expand_return (bool on_stack)
      we do that until they're fixed.  Currently, all return insns in a
      function must be the same (not really a limiting factor) so we need
      to check that it doesn't change half-way through.  */
-  emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+  emit_jump_insn (ret_rtx);
 
   CRIS_ASSERT (cfun->machine->return_type != CRIS_RETINSN_RET || !on_stack);
   CRIS_ASSERT (cfun->machine->return_type != CRIS_RETINSN_JUMP || on_stack);
Index: gcc/config/m68k/m68k.c
===================================================================
--- gcc.orig/config/m68k/m68k.c
+++ gcc/config/m68k/m68k.c
@@ -1384,7 +1384,7 @@ m68k_expand_epilogue (bool sibcall_p)
 			   EH_RETURN_STACKADJ_RTX));
 
   if (!sibcall_p)
-    emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+    emit_jump_insn (ret_rtx);
 }
 \f
 /* Return true if X is a valid comparison operator for the dbcc 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc.orig/config/rs6000/rs6000.c
+++ gcc/config/rs6000/rs6000.c
@@ -20277,7 +20277,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   p = rtvec_alloc ((lr ? 4 : 3) + n_regs);
 
   if (!savep && lr)
-    RTVEC_ELT (p, offset++) = gen_rtx_RETURN (VOIDmode);
+    RTVEC_ELT (p, offset++) = ret_rtx;
 
   RTVEC_ELT (p, offset++)
     = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
@@ -21269,7 +21269,7 @@ rs6000_emit_epilogue (int sibcall)
       alloc_rname = ggc_strdup (rname);
 
       j = 0;
-      RTVEC_ELT (p, j++) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (p, j++) = ret_rtx;
       RTVEC_ELT (p, j++) = gen_rtx_USE (VOIDmode,
 					gen_rtx_REG (Pmode,
 						     LR_REGNO));
@@ -21885,7 +21885,7 @@ rs6000_emit_epilogue (int sibcall)
       else
 	p = rtvec_alloc (2);
 
-      RTVEC_ELT (p, 0) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (p, 0) = ret_rtx;
       RTVEC_ELT (p, 1) = ((restoring_FPRs_inline || !lr)
 			  ? gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 65))
 			  : gen_rtx_CLOBBER (VOIDmode,
@@ -22323,7 +22323,7 @@ rs6000_output_mi_thunk (FILE *file, tree
 			gen_rtx_USE (VOIDmode,
 				     gen_rtx_REG (SImode,
 						  LR_REGNO)),
-			gen_rtx_RETURN (VOIDmode))));
+			ret_rtx)));
   SIBLING_CALL_P (insn) = 1;
   emit_barrier ();
 
Index: gcc/config/picochip/picochip.c
===================================================================
--- gcc.orig/config/picochip/picochip.c
+++ gcc/config/picochip/picochip.c
@@ -2273,7 +2273,7 @@ picochip_expand_epilogue (int is_sibling
     rtvec p;
     p = rtvec_alloc (2);
 
-    RTVEC_ELT (p, 0) = gen_rtx_RETURN (VOIDmode);
+    RTVEC_ELT (p, 0) = ret_rtx;
     RTVEC_ELT (p, 1) = gen_rtx_USE (VOIDmode,
 				    gen_rtx_REG (Pmode, LINK_REGNUM));
     emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
Index: gcc/config/h8300/h8300.c
===================================================================
--- gcc.orig/config/h8300/h8300.c
+++ gcc/config/h8300/h8300.c
@@ -702,7 +702,7 @@ h8300_push_pop (int regno, int nregs, bo
   /* Add the return instruction.  */
   if (return_p)
     {
-      RTVEC_ELT (vec, i) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (vec, i) = ret_rtx;
       i++;
     }
 
@@ -986,7 +986,7 @@ h8300_expand_epilogue (void)
     }
 
   if (!returned_p)
-    emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+    emit_jump_insn (ret_rtx);
 }
 
 /* Return nonzero if the current function is an interrupt
Index: gcc/config/v850/v850.c
===================================================================
--- gcc.orig/config/v850/v850.c
+++ gcc/config/v850/v850.c
@@ -1886,7 +1886,7 @@ expand_epilogue (void)
 	  int offset;
 	  restore_all = gen_rtx_PARALLEL (VOIDmode,
 					  rtvec_alloc (num_restore + 2));
-	  XVECEXP (restore_all, 0, 0) = gen_rtx_RETURN (VOIDmode);
+	  XVECEXP (restore_all, 0, 0) = ret_rtx;
 	  XVECEXP (restore_all, 0, 1)
 	    = gen_rtx_SET (VOIDmode, stack_pointer_rtx,
 			    gen_rtx_PLUS (Pmode,
Index: gcc/config/bfin/bfin.c
===================================================================
--- gcc.orig/config/bfin/bfin.c
+++ gcc/config/bfin/bfin.c
@@ -2347,7 +2347,7 @@ bfin_expand_call (rtx retval, rtx fnaddr
     XVECEXP (pat, 0, n++) = gen_rtx_USE (VOIDmode, picreg);
   XVECEXP (pat, 0, n++) = gen_rtx_USE (VOIDmode, cookie);
   if (sibcall)
-    XVECEXP (pat, 0, n++) = gen_rtx_RETURN (VOIDmode);
+    XVECEXP (pat, 0, n++) = ret_rtx;
   else
     XVECEXP (pat, 0, n++) = gen_rtx_CLOBBER (VOIDmode, retsreg);
   call = emit_call_insn (pat);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc.orig/config/arm/arm.md
+++ gcc/config/arm/arm.md
@@ -9970,9 +9970,7 @@
       DONE;
     }
   emit_jump_insn (gen_rtx_UNSPEC_VOLATILE (VOIDmode,
-	gen_rtvec (1,
-		gen_rtx_RETURN (VOIDmode)),
-	VUNSPEC_EPILOGUE));
+	gen_rtvec (1, ret_rtx), VUNSPEC_EPILOGUE));
   DONE;
   "
 )
Index: gcc/jump.c
===================================================================
--- gcc.orig/jump.c
+++ gcc/jump.c
@@ -1349,7 +1349,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
 	  if (nlabel)
 	    n = gen_rtx_LABEL_REF (Pmode, nlabel);
 	  else
-	    n = gen_rtx_RETURN (VOIDmode);
+	    n = ret_rtx;
 
 	  validate_change (insn, loc, n, 1);
 	  return;
@@ -1360,7 +1360,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
       if (nlabel)
 	x = gen_rtx_LABEL_REF (Pmode, nlabel);
       else
-	x = gen_rtx_RETURN (VOIDmode);
+	x = ret_rtx;
       if (loc == &PATTERN (insn))
 	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
       validate_change (insn, loc, x, 1);
@@ -1371,7 +1371,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
       && GET_CODE (SET_SRC (x)) == LABEL_REF
       && XEXP (SET_SRC (x), 0) == olabel)
     {
-      validate_change (insn, loc, gen_rtx_RETURN (VOIDmode), 1);
+      validate_change (insn, loc, ret_rtx, 1);
       return;
     }
 


* [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
  2011-03-23 14:46 ` [PATCH 1/6] Disallow predicating the prologue Bernd Schmidt
  2011-03-23 14:48 ` [PATCH 2/6] Unique return rtx Bernd Schmidt
@ 2011-03-23 14:51 ` Bernd Schmidt
  2011-03-23 16:46   ` Richard Henderson
  2011-03-23 14:56 ` [PATCH 5/6] Generate more shrink-wrapping opportunities Bernd Schmidt
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:51 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 348 bytes --]

dwarf2out has code that starts scanning from NOTE_INSN_EPILOGUE_BEG
until it finds the return jump. When there is common code in several
blocks ending in a return, we might want to share this, and in that case
it would be possible to encounter a simplejump rather than a returnjump.
This should be safe, and the following patch allows it.


Bernd


[-- Attachment #2: sj-ep.diff --]
[-- Type: text/plain, Size: 2806 bytes --]

    	* cfgcleanup.c (flow_find_head_matching_sequence): Ignore
    	epilogue notes.
    	* df-problems.c (can_move_insns_across): Don't stop at epilogue
    	notes.
    	* dwarf2out.c (dwarf2out_cfi_begin_epilogue): Also allow a
    	simplejump to end the block.

Index: gcc/cfgcleanup.c
===================================================================
--- gcc.orig/cfgcleanup.c
+++ gcc/cfgcleanup.c
@@ -1184,20 +1184,12 @@ flow_find_head_matching_sequence (basic_
 
   while (true)
     {
-      /* Ignore notes, except NOTE_INSN_EPILOGUE_BEG.  */
+      /* Ignore notes.  */
       while (!NONDEBUG_INSN_P (i1) && i1 != BB_END (bb1))
-	{
-	  if (NOTE_P (i1) && NOTE_KIND (i1) == NOTE_INSN_EPILOGUE_BEG)
-	    break;
-	  i1 = NEXT_INSN (i1);
-	}
+	i1 = NEXT_INSN (i1);
 
       while (!NONDEBUG_INSN_P (i2) && i2 != BB_END (bb2))
-	{
-	  if (NOTE_P (i2) && NOTE_KIND (i2) == NOTE_INSN_EPILOGUE_BEG)
-	    break;
-	  i2 = NEXT_INSN (i2);
-	}
+	i2 = NEXT_INSN (i2);
 
       if ((i1 == BB_END (bb1) && !NONDEBUG_INSN_P (i1))
 	  || (i2 == BB_END (bb2) && !NONDEBUG_INSN_P (i2)))
Index: gcc/df-problems.c
===================================================================
--- gcc.orig/df-problems.c
+++ gcc/df-problems.c
@@ -3953,8 +3953,6 @@ can_move_insns_across (rtx from, rtx to,
     {
       if (CALL_P (insn))
 	break;
-      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
-	break;
       if (NONDEBUG_INSN_P (insn))
 	{
 	  if (may_trap_or_fault_p (PATTERN (insn))
Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -2939,10 +2939,10 @@ dwarf2out_frame_debug (rtx insn, bool af
     dwarf2out_flush_queued_reg_saves ();
 }
 
-/* Determine if we need to save and restore CFI information around this
-   epilogue.  If SIBCALL is true, then this is a sibcall epilogue.  If
-   we do need to save/restore, then emit the save now, and insert a
-   NOTE_INSN_CFA_RESTORE_STATE at the appropriate place in the stream.  */
+/* Determine if we need to save and restore CFI information around
+   this epilogue.  If we do need to save/restore, then emit the save
+   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
+   place in the stream.  */
 
 void
 dwarf2out_cfi_begin_epilogue (rtx insn)
@@ -2957,8 +2957,10 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
       if (!INSN_P (i))
 	continue;
 
-      /* Look for both regular and sibcalls to end the block.  */
-      if (returnjump_p (i))
+      /* Look for both regular and sibcalls to end the block.  Various
+	 optimization passes may cause us to jump to a common epilogue
+	 tail, so we also accept simplejumps.  */
+      if (returnjump_p (i) || simplejump_p (i))
 	break;
       if (CALL_P (i) && SIBLING_CALL_P (i))
 	break;


* [PATCH 5/6] Generate more shrink-wrapping opportunities
  2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
                   ` (2 preceding siblings ...)
  2011-03-23 14:51 ` [PATCH 3/6] Allow jumps in epilogues Bernd Schmidt
@ 2011-03-23 14:56 ` Bernd Schmidt
  2011-03-23 15:03   ` Jeff Law
  2011-03-31 13:26   ` Jeff Law
  2011-03-23 14:56 ` [PATCH 4/6] Shrink-wrapping Bernd Schmidt
  2011-03-23 14:57 ` [PATCH 6/6] A testcase Bernd Schmidt
  5 siblings, 2 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:56 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

The first basic block contains insns to move incoming argument registers
to pseudos. When these pseudos live across calls, they get allocated to
call-saved registers. This in turn disables shrink-wrapping, since the
move instruction requires the prologue (saving the call-saved reg) to
occur before it.

This patch addresses the problem by sinking such moves down through
the CFG until we find a place where the destination is used or the
incoming argument is clobbered.
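The pattern being targeted can be sketched in C (function names invented for illustration): `arg` is live across calls, so its value ends up in a call-saved register, and before this patch the entry-block copy into that register anchored the prologue even though the early-exit path needs no frame.

```c
/* Invented example: `arg' arrives in an argument register but is
   live across two calls, so it is copied into a call-saved register.
   Sinking that copy below the early exit means the (arg < 0) path no
   longer forces a prologue.  */
static int use (int v)
{
  return v + 1;
}

int f (int arg)
{
  if (arg < 0)                  /* early exit: copy not needed here */
    return 0;
  /* Only this path needs arg in a call-saved register.  */
  return use (arg) + use (arg);
}
```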


Bernd


[-- Attachment #2: enhance-sw.diff --]
[-- Type: text/plain, Size: 3775 bytes --]

	* function.c (prepare_shrink_wrap): New function.
	(thread_prologue_and_epilogue_insns): Call it.

Index: gcc/function.c
===================================================================
--- gcc.orig/function.c
+++ gcc/function.c
@@ -5299,6 +5299,127 @@ requires_stack_frame_p (rtx insn)
       return true;
   return false;
 }
+
+/* Look for sets of call-saved registers in the first block of the
+   function, and move them down into successor blocks if the register
+   is used only on one path.  This exposes more opportunities for
+   shrink-wrapping.
+   These kinds of sets often occur when incoming argument registers are
+   moved to call-saved registers because their values are live across
+   one or more calls during the function.  */
+
+static void
+prepare_shrink_wrap (basic_block entry_block)
+{
+  rtx insn, curr;
+  FOR_BB_INSNS_SAFE (entry_block, insn, curr)
+    {
+      basic_block next_bb;
+      edge e, live_edge;
+      edge_iterator ei;
+      rtx set, scan;
+      unsigned destreg, srcreg;
+
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      set = single_set (insn);
+      if (!set)
+	continue;
+
+      if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set)))
+	continue;
+      srcreg = REGNO (SET_SRC (set));
+      destreg = REGNO (SET_DEST (set));
+      if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1
+	  || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1)
+	continue;
+
+      next_bb = entry_block;
+      scan = insn;
+
+      for (;;)
+	{
+	  live_edge = NULL;
+	  FOR_EACH_EDGE (e, ei, next_bb->succs)
+	    {
+	      if (REGNO_REG_SET_P (df_get_live_in (e->dest), destreg))
+		{
+		  if (live_edge)
+		    {
+		      live_edge = NULL;
+		      break;
+		    }
+		  live_edge = e;
+		}
+	    }
+	  if (!live_edge)
+	    break;
+	  /* We can sometimes encounter dead code.  Don't try to move it
+	     into the exit block.  */
+	  if (live_edge->dest == EXIT_BLOCK_PTR)
+	    break;
+	  if (EDGE_COUNT (live_edge->dest->preds) > 1)
+	    break;
+	  while (scan != BB_END (next_bb))
+	    {
+	      scan = NEXT_INSN (scan);
+	      if (NONDEBUG_INSN_P (scan))
+		{
+		  rtx link;
+		  HARD_REG_SET set_regs;
+
+		  CLEAR_HARD_REG_SET (set_regs);
+		  note_stores (PATTERN (scan), record_hard_reg_sets,
+			       &set_regs);
+		  if (CALL_P (scan))
+		    IOR_HARD_REG_SET (set_regs, call_used_reg_set);
+		  for (link = REG_NOTES (scan); link; link = XEXP (link, 1))
+		    if (REG_NOTE_KIND (link) == REG_INC)
+		      record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
+
+		  if (TEST_HARD_REG_BIT (set_regs, srcreg)
+		      || reg_referenced_p (SET_DEST (set),
+					   PATTERN (scan)))
+		    {
+		      scan = NULL_RTX;
+		      break;
+		    }
+		  if (CALL_P (scan))
+		    {
+		      rtx link = CALL_INSN_FUNCTION_USAGE (scan);
+		      while (link)
+			{
+			  rtx tmp = XEXP (link, 0);
+			  if (GET_CODE (tmp) == USE
+			      && reg_referenced_p (SET_DEST (set), tmp))
+			    break;
+			  link = XEXP (link, 1);
+			}
+		      if (link)
+			{
+			  scan = NULL_RTX;
+			  break;
+			}
+		    }
+		}
+	    }
+	  if (!scan)
+	    break;
+	  next_bb = live_edge->dest;
+	}
+
+      if (next_bb != entry_block)
+	{
+	  rtx after = BB_HEAD (next_bb);
+	  while (!NOTE_P (after)
+		 || NOTE_KIND (after) != NOTE_INSN_BASIC_BLOCK)
+	    after = NEXT_INSN (after);
+	  emit_insn_after (PATTERN (insn), after);
+	  delete_insn (insn);
+	}
+    }
+}
+
 #endif
 
 #ifdef HAVE_return
@@ -5499,6 +5620,8 @@ thread_prologue_and_epilogue_insns (void
       bitmap_head bb_antic_flags;
       bitmap_head bb_on_list;
 
+      prepare_shrink_wrap (entry_edge->dest);
+
       bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
       bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
 


* [PATCH 4/6] Shrink-wrapping
  2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
                   ` (3 preceding siblings ...)
  2011-03-23 14:56 ` [PATCH 5/6] Generate more shrink-wrapping opportunities Bernd Schmidt
@ 2011-03-23 14:56 ` Bernd Schmidt
  2011-07-07 14:51   ` Richard Sandiford
  2011-07-07 21:41   ` Michael Hope
  2011-03-23 14:57 ` [PATCH 6/6] A testcase Bernd Schmidt
  5 siblings, 2 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:56 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 203 bytes --]

This adds the actual optimization, and reworks the JUMP_LABEL handling
for return blocks. See the introduction mail or the new comment ahead of
thread_prologue_and_epilogue_insns for more notes.


Bernd

[-- Attachment #2: sw-full.diff --]
[-- Type: text/plain, Size: 105552 bytes --]

	* doc/tm.texi (RETURN_ADDR_REGNUM): Document.
	* doc/md.texi (simple_return): Document pattern.
	(return): Add a sentence to clarify.
	* doc/rtl.texi (simple_return): Document.
	* doc/invoke.texi (Optimize Options): Document -fshrink-wrap.
	* common.opt (fshrink-wrap): New.
	* opts.c (default_options_table): Enable it for -O1 and above.
	* gengenrtl.c (special_rtx): SIMPLE_RETURN is special.
	* rtl.h (ANY_RETURN_P): New macro.
	(global_rtl_index): Add GR_SIMPLE_RETURN.
	(simple_return_rtx): New macro.
	* genemit.c (gen_exp): SIMPLE_RETURN has a unique rtx.
	(gen_expand, gen_split): Use ANY_RETURN_P.
	* rtl.c (copy_rtx): SIMPLE_RETURN is shared.
	* emit-rtl.c (verify_rtx_sharing): Likewise.
	(skip_consecutive_labels): Return the argument if it is a return rtx.
	(classify_insn): Handle both kinds of return.
	(init_emit_regs): Create global rtl for ret_rtx and simple_return_rtx.
	* df-scan.c (df_uses_record): Handle SIMPLE_RETURN.
	* rtl.def (SIMPLE_RETURN): New.
	* rtlanal.c (tablejump_p): Check JUMP_LABEL for returns.
	* final.c (final_scan_insn): Recognize both kinds of return.
	* reorg.c (function_return_label, function_simple_return_label): New
	static variables.
	(end_of_function_label): Remove.
	(simplejump_or_return_p): New static function.
	(find_end_label): Add a new arg, KIND.  All callers changed.
	Depending on KIND, look for a label suitable for return or
	simple_return.
	(make_return_insns): Make corresponding changes.
	(get_jump_flags): Check JUMP_LABELs for returns.
	(follow_jumps): Likewise.
	(get_branch_condition): Check target for return patterns rather
	than NULL.
	(own_thread_p): Likewise for thread.
	(steal_delay_list_from_target): Check JUMP_LABELs for returns.
	Use simplejump_or_return_p.
	(fill_simple_delay_slots): Likewise.
	(optimize_skip): Likewise.
	(fill_slots_from_thread): Likewise.
	(relax_delay_slots): Likewise.
	(dbr_schedule): Adjust handling of end_of_function_label for the
	two new variables.
	* ifcvt.c (find_if_case_1): Take care when redirecting jumps to the
	exit block.
	(dead_or_predicable): Change NEW_DEST arg to DEST_EDGE.  All callers
	changed.  Ensure that the right label is passed to redirect_jump.
	* jump.c (condjump_p, condjump_in_parallel_p, any_condjump_p,
	returnjump_p): Handle SIMPLE_RETURNs.
	(delete_related_insns): Check JUMP_LABEL for returns.
	(redirect_target): New static function.
	(redirect_exp_1): Use it.  Handle any kind of return rtx as a label
	rather than interpreting NULL as a return.
	(redirect_jump_1): Assert that nlabel is not NULL.
	(redirect_jump): Likewise.
	(redirect_jump_2): Handle any kind of return rtx as a label rather
	than interpreting NULL as a return.
	* dwarf2out.c (compute_barrier_args_size_1): Check JUMP_LABEL for
	returns.
	* function.c (emit_return_into_block): Remove useless declaration.
	(record_hard_reg_sets, frame_required_for_rtx, gen_return_pattern,
	requires_stack_frame_p): New static functions.
	(emit_return_into_block): New arg SIMPLE_P.  All callers changed.
	Generate either kind of return pattern and update the JUMP_LABEL.
	(thread_prologue_and_epilogue_insns): Implement a form of
	shrink-wrapping.  Ensure JUMP_LABELs for return insns are set.
	* print-rtl.c (print_rtx): Handle returns in JUMP_LABELs.
	* cfglayout.c (fixup_reorder_chain): Ensure JUMP_LABELs for returns
	remain correct.
	* resource.c (find_dead_or_set_registers): Check JUMP_LABELs for
	returns.
	(mark_target_live_regs): Don't pass a return rtx to next_active_insn.
	* basic-block.h (force_nonfallthru_and_redirect): Declare.
	* sched-vis.c (print_pattern): Add case for SIMPLE_RETURN.
	* cfgrtl.c (force_nonfallthru_and_redirect): No longer static.  New arg
	JUMP_LABEL.  All callers changed.  Use the label when generating
	return insns.

	* config/i386/i386.md (returns, return_str, return_cond): New
	code_iterator and corresponding code_attrs.
	(<return_str>return): Renamed from return and adapted.
	(<return_str>return_internal): Likewise for return_internal.
	(<return_str>return_internal_long): Likewise for return_internal_long.
	(<return_str>return_pop_internal): Likewise for return_pop_internal.
	(<return_str>return_indirect_internal): Likewise for
	return_indirect_internal.
	* config/i386/i386.c (ix86_expand_epilogue): Expand a simple_return as
	the last insn.
	(ix86_pad_returns): Handle both kinds of return rtx.
	* config/arm/arm.c (use_simple_return_p): New function.
	(is_jump_table): Handle returns in JUMP_LABELs.
	(output_return_instruction): New arg SIMPLE.  All callers changed.
	Use it to determine which kind of return to generate.
	(arm_final_prescan_insn): Handle both kinds of return.
	* config/arm/arm.md (returns, return_str, return_simple_p,
	return_cond): New code_iterator and corresponding code_attrs.
	(<return_str>return): Renamed from return and adapted.
	(arm_<return_str>return): Renamed from arm_return and adapted.
	(cond_<return_str>return): Renamed from cond_return and adapted.
	(cond_<return_str>return_inverted): Renamed from cond_return_inverted
	and adapted.
	* config/arm/thumb2.md (thumb2_<return_str>return): Renamed from
	thumb2_return and adapted.
	* config/arm/arm.h (RETURN_ADDR_REGNUM): Define.
	* config/arm/arm-protos.h (use_simple_return_p): Declare.
	(output_return_instruction): Adjust declaration.
	* config/mips/mips.c (mips_expand_epilogue): Generate a simple_return
	as final insn.
	* config/mips/mips.md (simple_return): New expander.
	(*simple_return, simple_return_internal): New patterns.
	* config/sh/sh.c (barrier_align): Handle return in a JUMP_LABEL.
	(split_branches): Don't pass a null label to redirect_jump.

Index: gcc/basic-block.h
===================================================================
--- gcc.orig/basic-block.h
+++ gcc/basic-block.h
@@ -795,6 +795,7 @@ extern void flow_edge_list_print (const 
 
 /* In cfgrtl.c  */
 extern basic_block force_nonfallthru (edge);
+extern basic_block force_nonfallthru_and_redirect (edge, basic_block, rtx);
 extern rtx block_label (basic_block);
 extern bool purge_all_dead_edges (void);
 extern bool purge_dead_edges (basic_block);
Index: gcc/cfglayout.c
===================================================================
--- gcc.orig/cfglayout.c
+++ gcc/cfglayout.c
@@ -766,6 +766,7 @@ fixup_reorder_chain (void)
     {
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
+      rtx ret_label = NULL_RTX;
       basic_block nb;
       edge_iterator ei;
 
@@ -785,6 +786,7 @@ fixup_reorder_chain (void)
       bb_end_insn = BB_END (bb);
       if (JUMP_P (bb_end_insn))
 	{
+	  ret_label = JUMP_LABEL (bb_end_insn);
 	  if (any_condjump_p (bb_end_insn))
 	    {
 	      /* This might happen if the conditional jump has side
@@ -895,7 +897,7 @@ fixup_reorder_chain (void)
 	}
 
       /* We got here if we need to add a new jump insn.  */
-      nb = force_nonfallthru (e_fall);
+      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
 	  nb->il.rtl->visited = 1;
@@ -1129,24 +1131,30 @@ extern bool cfg_layout_can_duplicate_bb_
 bool
 cfg_layout_can_duplicate_bb_p (const_basic_block bb)
 {
+  rtx insn;
+
   /* Do not attempt to duplicate tablejumps, as we need to unshare
      the dispatch table.  This is difficult to do, as the instructions
      computing jump destination may be hoisted outside the basic block.  */
   if (tablejump_p (BB_END (bb), NULL, NULL))
     return false;
 
-  /* Do not duplicate blocks containing insns that can't be copied.  */
-  if (targetm.cannot_copy_insn_p)
+  insn = BB_HEAD (bb);
+  while (1)
     {
-      rtx insn = BB_HEAD (bb);
-      while (1)
-	{
-	  if (INSN_P (insn) && targetm.cannot_copy_insn_p (insn))
-	    return false;
-	  if (insn == BB_END (bb))
-	    break;
-	  insn = NEXT_INSN (insn);
-	}
+      /* Do not duplicate blocks containing insns that can't be copied.  */
+      if (INSN_P (insn) && targetm.cannot_copy_insn_p
+	  && targetm.cannot_copy_insn_p (insn))
+	return false;
+      /* dwarf2out expects that these notes are always paired with a
+	 returnjump or sibling call.  */
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG
+	  && !returnjump_p (BB_END (bb))
+	  && (!CALL_P (BB_END (bb)) || !SIBLING_CALL_P (BB_END (bb))))
+	return false;
+      if (insn == BB_END (bb))
+	break;
+      insn = NEXT_INSN (insn);
     }
 
   return true;
@@ -1191,6 +1199,9 @@ duplicate_insn_chain (rtx from, rtx to)
 	      break;
 	    }
 	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
+	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
+	      && ANY_RETURN_P (JUMP_LABEL (insn)))
+	    JUMP_LABEL (copy) = JUMP_LABEL (insn);
           maybe_copy_prologue_epilogue_insn (insn, copy);
 	  break;
 
Index: gcc/cfgrtl.c
===================================================================
--- gcc.orig/cfgrtl.c
+++ gcc/cfgrtl.c
@@ -1114,10 +1114,13 @@ rtl_redirect_edge_and_branch (edge e, ba
 }
 
 /* Like force_nonfallthru below, but additionally performs redirection
-   Used by redirect_edge_and_branch_force.  */
+   Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
+   when redirecting to the EXIT_BLOCK: it is either a return or a
+   simple_return rtx indicating which kind of returnjump to create.
+   It should be NULL otherwise.  */
 
-static basic_block
-force_nonfallthru_and_redirect (edge e, basic_block target)
+basic_block
+force_nonfallthru_and_redirect (edge e, basic_block target, rtx jump_label)
 {
   basic_block jump_block, new_bb = NULL, src = e->src;
   rtx note;
@@ -1249,11 +1252,25 @@ force_nonfallthru_and_redirect (edge e, 
   e->flags &= ~EDGE_FALLTHRU;
   if (target == EXIT_BLOCK_PTR)
     {
+      if (jump_label == ret_rtx)
+	{
 #ifdef HAVE_return
-	emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc);
+	  emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block),
+				       loc);
 #else
-	gcc_unreachable ();
+	  gcc_unreachable ();
 #endif
+	}
+      else
+	{
+	  gcc_assert (jump_label == simple_return_rtx);
+#ifdef HAVE_simple_return
+	  emit_jump_insn_after_setloc (gen_simple_return (),
+				       BB_END (jump_block), loc);
+#else
+	  gcc_unreachable ();
+#endif
+	}
     }
   else
     {
@@ -1280,7 +1297,7 @@ force_nonfallthru_and_redirect (edge e, 
 basic_block
 force_nonfallthru (edge e)
 {
-  return force_nonfallthru_and_redirect (e, e->dest);
+  return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX);
 }
 
 /* Redirect edge even at the expense of creating new jump insn or
@@ -1297,7 +1314,7 @@ rtl_redirect_edge_and_branch_force (edge
   /* In case the edge redirection failed, try to force it to be non-fallthru
      and redirect newly created simplejump.  */
   df_set_bb_dirty (e->src);
-  return force_nonfallthru_and_redirect (e, target);
+  return force_nonfallthru_and_redirect (e, target, NULL_RTX);
 }
 
 /* The given edge should potentially be a fallthru edge.  If that is in
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc.orig/config/arm/arm-protos.h
+++ gcc/config/arm/arm-protos.h
@@ -24,6 +24,7 @@
 #define GCC_ARM_PROTOS_H
 
 extern int use_return_insn (int, rtx);
+extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
 extern void arm_load_pic_register (unsigned long);
 extern int arm_volatile_func (void);
@@ -135,7 +136,7 @@ extern int arm_address_offset_is_imm (rt
 extern const char *output_add_immediate (rtx *);
 extern const char *arithmetic_instr (rtx, int);
 extern void output_ascii_pseudo_op (FILE *, const unsigned char *, int);
-extern const char *output_return_instruction (rtx, int, int);
+extern const char *output_return_instruction (rtx, bool, bool, bool);
 extern void arm_poke_function_name (FILE *, const char *);
 extern void arm_final_prescan_insn (rtx);
 extern int arm_debugger_arg_offset (int, rtx);
Index: gcc/config/arm/arm.c
===================================================================
--- gcc.orig/config/arm/arm.c
+++ gcc/config/arm/arm.c
@@ -2252,6 +2252,18 @@ arm_trampoline_adjust_address (rtx addr)
   return addr;
 }
 \f
+/* Return true if we should try to use a simple_return insn, i.e. perform
+   shrink-wrapping if possible.  This is the case if we need to emit a
+   prologue, which we can test by looking at the offsets.  */
+bool
+use_simple_return_p (void)
+{
+  arm_stack_offsets *offsets;
+
+  offsets = arm_get_frame_offsets ();
+  return offsets->outgoing_args != 0;
+}
+
 /* Return 1 if it is possible to return using a single instruction.
    If SIBLING is non-null, this is a test for a return before a sibling
    call.  SIBLING is the call insn, so we can examine its register usage.  */
@@ -11388,6 +11400,7 @@ is_jump_table (rtx insn)
 
   if (GET_CODE (insn) == JUMP_INSN
       && JUMP_LABEL (insn) != NULL
+      && !ANY_RETURN_P (JUMP_LABEL (insn))
       && ((table = next_real_insn (JUMP_LABEL (insn)))
 	  == next_real_insn (insn))
       && table != NULL
@@ -14234,7 +14247,7 @@ arm_get_vfp_saved_size (void)
 /* Generate a function exit sequence.  If REALLY_RETURN is false, then do
    everything bar the final return instruction.  */
 const char *
-output_return_instruction (rtx operand, int really_return, int reverse)
+output_return_instruction (rtx operand, bool really_return, bool reverse,
+			   bool simple)
 {
   char conditional[10];
   char instr[100];
@@ -14272,10 +14285,15 @@ output_return_instruction (rtx operand, 
 
   sprintf (conditional, "%%?%%%c0", reverse ? 'D' : 'd');
 
-  cfun->machine->return_used_this_function = 1;
+  if (simple)
+    live_regs_mask = 0;
+  else
+    {
+      cfun->machine->return_used_this_function = 1;
 
-  offsets = arm_get_frame_offsets ();
-  live_regs_mask = offsets->saved_regs_mask;
+      offsets = arm_get_frame_offsets ();
+      live_regs_mask = offsets->saved_regs_mask;
+    }
 
   if (live_regs_mask)
     {
@@ -17283,6 +17301,7 @@ arm_final_prescan_insn (rtx insn)
 
   /* If we start with a return insn, we only succeed if we find another one.  */
   int seeking_return = 0;
+  enum rtx_code return_code = UNKNOWN;
 
   /* START_INSN will hold the insn from where we start looking.  This is the
      first insn after the following code_label if REVERSE is true.  */
@@ -17321,7 +17340,7 @@ arm_final_prescan_insn (rtx insn)
 	  else
 	    return;
 	}
-      else if (GET_CODE (body) == RETURN)
+      else if (ANY_RETURN_P (body))
         {
 	  start_insn = next_nonnote_insn (start_insn);
 	  if (GET_CODE (start_insn) == BARRIER)
@@ -17332,6 +17351,7 @@ arm_final_prescan_insn (rtx insn)
 	    {
 	      reverse = TRUE;
 	      seeking_return = 1;
+	      return_code = GET_CODE (body);
 	    }
 	  else
 	    return;
@@ -17372,11 +17392,15 @@ arm_final_prescan_insn (rtx insn)
 	  label = XEXP (XEXP (SET_SRC (body), 2), 0);
 	  then_not_else = FALSE;
 	}
-      else if (GET_CODE (XEXP (SET_SRC (body), 1)) == RETURN)
-	seeking_return = 1;
-      else if (GET_CODE (XEXP (SET_SRC (body), 2)) == RETURN)
+      else if (ANY_RETURN_P (XEXP (SET_SRC (body), 1)))
+	{
+	  seeking_return = 1;
+	  return_code = GET_CODE (XEXP (SET_SRC (body), 1));
+	}
+      else if (ANY_RETURN_P (XEXP (SET_SRC (body), 2)))
         {
 	  seeking_return = 1;
+	  return_code = GET_CODE (XEXP (SET_SRC (body), 2));
 	  then_not_else = FALSE;
         }
       else
@@ -17477,8 +17501,7 @@ arm_final_prescan_insn (rtx insn)
 		       && !use_return_insn (TRUE, NULL)
 		       && !optimize_size)
 		fail = TRUE;
-	      else if (GET_CODE (scanbody) == RETURN
-		       && seeking_return)
+	      else if (GET_CODE (scanbody) == return_code)
 	        {
 		  arm_ccfsm_state = 2;
 		  succeed = TRUE;
Index: gcc/config/arm/arm.h
===================================================================
--- gcc.orig/config/arm/arm.h
+++ gcc/config/arm/arm.h
@@ -2255,6 +2255,8 @@ extern int making_const_table;
 #define RETURN_ADDR_RTX(COUNT, FRAME) \
   arm_return_addr (COUNT, FRAME)
 
+#define RETURN_ADDR_REGNUM LR_REGNUM
+
 /* Mask of the bits in the PC that contain the real return address
    when running in 26-bit mode.  */
 #define RETURN_ADDR_MASK26 (0x03fffffc)
Index: gcc/config/arm/arm.md
===================================================================
--- gcc.orig/config/arm/arm.md
+++ gcc/config/arm/arm.md
@@ -8116,66 +8116,65 @@
   [(set_attr "type" "call")]
 )
 
-(define_expand "return"
-  [(return)]
-  "TARGET_32BIT && USE_RETURN_INSN (FALSE)"
+(define_expand "<return_str>return"
+  [(returns)]
+  "TARGET_32BIT<return_cond>"
   "")
 
-;; Often the return insn will be the same as loading from memory, so set attr
-(define_insn "*arm_return"
-  [(return)]
-  "TARGET_ARM && USE_RETURN_INSN (FALSE)"
-  "*
-  {
-    if (arm_ccfsm_state == 2)
-      {
-        arm_ccfsm_state += 2;
-        return \"\";
-      }
-    return output_return_instruction (const_true_rtx, TRUE, FALSE);
-  }"
+(define_insn "*arm_<return_str>return"
+  [(returns)]
+  "TARGET_ARM<return_cond>"
+{
+  if (arm_ccfsm_state == 2)
+    {
+      arm_ccfsm_state += 2;
+      return "";
+    }
+  return output_return_instruction (const_true_rtx, true, false,
+				    <return_simple_p>);
+}
   [(set_attr "type" "load1")
    (set_attr "length" "12")
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*cond_return"
+(define_insn "*cond_<return_str>return"
   [(set (pc)
         (if_then_else (match_operator 0 "arm_comparison_operator"
 		       [(match_operand 1 "cc_register" "") (const_int 0)])
-                      (return)
+                      (returns)
                       (pc)))]
-  "TARGET_ARM && USE_RETURN_INSN (TRUE)"
-  "*
-  {
-    if (arm_ccfsm_state == 2)
-      {
-        arm_ccfsm_state += 2;
-        return \"\";
-      }
-    return output_return_instruction (operands[0], TRUE, FALSE);
-  }"
+  "TARGET_ARM<return_cond>"
+{
+  if (arm_ccfsm_state == 2)
+    {
+      arm_ccfsm_state += 2;
+      return "";
+    }
+  return output_return_instruction (operands[0], true, false,
+				    <return_simple_p>);
+}
   [(set_attr "conds" "use")
    (set_attr "length" "12")
    (set_attr "type" "load1")]
 )
 
-(define_insn "*cond_return_inverted"
+(define_insn "*cond_<return_str>return_inverted"
   [(set (pc)
         (if_then_else (match_operator 0 "arm_comparison_operator"
 		       [(match_operand 1 "cc_register" "") (const_int 0)])
                       (pc)
-		      (return)))]
-  "TARGET_ARM && USE_RETURN_INSN (TRUE)"
-  "*
-  {
-    if (arm_ccfsm_state == 2)
-      {
-        arm_ccfsm_state += 2;
-        return \"\";
-      }
-    return output_return_instruction (operands[0], TRUE, TRUE);
-  }"
+		      (returns)))]
+  "TARGET_ARM<return_cond>"
+{
+  if (arm_ccfsm_state == 2)
+    {
+      arm_ccfsm_state += 2;
+      return "";
+    }
+  return output_return_instruction (operands[0], true, true,
+				    <return_simple_p>);
+}
   [(set_attr "conds" "use")
    (set_attr "length" "12")
    (set_attr "type" "load1")]
@@ -9970,7 +9969,8 @@
       DONE;
     }
   emit_jump_insn (gen_rtx_UNSPEC_VOLATILE (VOIDmode,
-	gen_rtvec (1, ret_rtx), VUNSPEC_EPILOGUE));
+	gen_rtvec (1, ret_rtx),
+	VUNSPEC_EPILOGUE));
   DONE;
   "
 )
@@ -9986,7 +9986,7 @@
   "TARGET_32BIT"
   "*
   if (use_return_insn (FALSE, next_nonnote_insn (insn)))
-    return output_return_instruction (const_true_rtx, FALSE, FALSE);
+    return output_return_instruction (const_true_rtx, false, false, false);
   return arm_output_epilogue (next_nonnote_insn (insn));
   "
 ;; Length is absolute worst case
Index: gcc/config/arm/iterators.md
===================================================================
--- gcc.orig/config/arm/iterators.md
+++ gcc/config/arm/iterators.md
@@ -403,3 +403,11 @@
 
 ;; Assembler mnemonics for signedness of widening operations.
 (define_code_attr US [(sign_extend "s") (zero_extend "u")])
+
+;; Both kinds of return insn.
+(define_code_iterator returns [return simple_return])
+(define_code_attr return_str [(return "") (simple_return "simple_")])
+(define_code_attr return_simple_p [(return "false") (simple_return "true")])
+(define_code_attr return_cond [(return " && USE_RETURN_INSN (FALSE)")
+			       (simple_return " && use_simple_return_p ()")])
+
Index: gcc/config/arm/thumb2.md
===================================================================
--- gcc.orig/config/arm/thumb2.md
+++ gcc/config/arm/thumb2.md
@@ -635,16 +635,15 @@
 
 ;; Note: this is not predicable, to avoid issues with linker-generated
 ;; interworking stubs.
-(define_insn "*thumb2_return"
-  [(return)]
-  "TARGET_THUMB2 && USE_RETURN_INSN (FALSE)"
-  "*
-  {
-    return output_return_instruction (const_true_rtx, TRUE, FALSE);
-  }"
+(define_insn "*thumb2_<return_str>return"
+  [(returns)]
+  "TARGET_THUMB2<return_cond>"
+{
+  return output_return_instruction (const_true_rtx, true, false,
+				    <return_simple_p>);
+}
   [(set_attr "type" "load1")
-   (set_attr "length" "12")]
-)
+   (set_attr "length" "12")])
 
 (define_insn_and_split "thumb2_eh_return"
   [(unspec_volatile [(match_operand:SI 0 "s_register_operand" "r")]
Index: gcc/config/i386/i386.c
===================================================================
--- gcc.orig/config/i386/i386.c
+++ gcc/config/i386/i386.c
@@ -11230,13 +11230,13 @@ ix86_expand_epilogue (int style)
 
 	  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 				     popc, -1, true);
-	  emit_jump_insn (gen_return_indirect_internal (ecx));
+	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
 	}
       else
-	emit_jump_insn (gen_return_pop_internal (popc));
+	emit_jump_insn (gen_simple_return_pop_internal (popc));
     }
   else
-    emit_jump_insn (gen_return_internal ());
+    emit_jump_insn (gen_simple_return_internal ());
 
   /* Restore the state back to the state from the prologue,
      so that it's correct for the next epilogue.  */
@@ -30176,7 +30176,7 @@ ix86_pad_returns (void)
       rtx prev;
       bool replace = false;
 
-      if (!JUMP_P (ret) || GET_CODE (PATTERN (ret)) != RETURN
+      if (!JUMP_P (ret) || !ANY_RETURN_P (PATTERN (ret))
 	  || optimize_bb_for_size_p (bb))
 	continue;
       for (prev = PREV_INSN (ret); prev; prev = PREV_INSN (prev))
@@ -30206,7 +30206,10 @@ ix86_pad_returns (void)
 	}
       if (replace)
 	{
-	  emit_jump_insn_before (gen_return_internal_long (), ret);
+	  if (PATTERN (ret) == ret_rtx)
+	    emit_jump_insn_before (gen_return_internal_long (), ret);
+	  else
+	    emit_jump_insn_before (gen_simple_return_internal_long (), ret);
 	  delete_insn (ret);
 	}
     }
@@ -30227,7 +30230,7 @@ ix86_count_insn_bb (basic_block bb)
     {
       /* Only happen in exit blocks.  */
       if (JUMP_P (insn)
-	  && GET_CODE (PATTERN (insn)) == RETURN)
+	  && ANY_RETURN_P (PATTERN (insn)))
 	break;
 
       if (NONDEBUG_INSN_P (insn)
@@ -30300,7 +30303,7 @@ ix86_pad_short_function (void)
   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
     {
       rtx ret = BB_END (e->src);
-      if (JUMP_P (ret) && GET_CODE (PATTERN (ret)) == RETURN)
+      if (JUMP_P (ret) && ANY_RETURN_P (PATTERN (ret)))
 	{
 	  int insn_count = ix86_count_insn (e->src);
 
Index: gcc/config/i386/i386.md
===================================================================
--- gcc.orig/config/i386/i386.md
+++ gcc/config/i386/i386.md
@@ -11670,24 +11670,29 @@
   ""
   [(set_attr "length" "0")])
 
+(define_code_iterator returns [return simple_return])
+(define_code_attr return_str [(return "") (simple_return "simple_")])
+(define_code_attr return_cond [(return "ix86_can_use_return_insn_p ()")
+			       (simple_return "")])
+
 ;; Insn emitted into the body of a function to return from a function.
 ;; This is only done if the function's epilogue is known to be simple.
 ;; See comments for ix86_can_use_return_insn_p in i386.c.
 
-(define_expand "return"
-  [(return)]
-  "ix86_can_use_return_insn_p ()"
+(define_expand "<return_str>return"
+  [(returns)]
+  "<return_cond>"
 {
   if (crtl->args.pops_args)
     {
       rtx popc = GEN_INT (crtl->args.pops_args);
-      emit_jump_insn (gen_return_pop_internal (popc));
+      emit_jump_insn (gen_<return_str>return_pop_internal (popc));
       DONE;
     }
 })
 
-(define_insn "return_internal"
-  [(return)]
+(define_insn "<return_str>return_internal"
+  [(returns)]
   "reload_completed"
   "ret"
   [(set_attr "length" "1")
@@ -11698,8 +11703,8 @@
 ;; Used by x86_machine_dependent_reorg to avoid penalty on single byte RET
 ;; instruction Athlon and K8 have.
 
-(define_insn "return_internal_long"
-  [(return)
+(define_insn "<return_str>return_internal_long"
+  [(returns)
    (unspec [(const_int 0)] UNSPEC_REP)]
   "reload_completed"
   "rep\;ret"
@@ -11709,8 +11714,8 @@
    (set_attr "prefix_rep" "1")
    (set_attr "modrm" "0")])
 
-(define_insn "return_pop_internal"
-  [(return)
+(define_insn "<return_str>return_pop_internal"
+  [(returns)
    (use (match_operand:SI 0 "const_int_operand" ""))]
   "reload_completed"
   "ret\t%0"
@@ -11719,8 +11724,8 @@
    (set_attr "length_immediate" "2")
    (set_attr "modrm" "0")])
 
-(define_insn "return_indirect_internal"
-  [(return)
+(define_insn "<return_str>return_indirect_internal"
+  [(returns)
    (use (match_operand:SI 0 "register_operand" "r"))]
   "reload_completed"
   "jmp\t%A0"
Index: gcc/config/mips/mips.c
===================================================================
--- gcc.orig/config/mips/mips.c
+++ gcc/config/mips/mips.c
@@ -10543,7 +10543,8 @@ mips_expand_epilogue (bool sibcall_p)
 	    regno = GP_REG_FIRST + 7;
 	  else
 	    regno = RETURN_ADDR_REGNUM;
-	  emit_jump_insn (gen_return_internal (gen_rtx_REG (Pmode, regno)));
+	  emit_jump_insn (gen_simple_return_internal (gen_rtx_REG (Pmode,
+								   regno)));
 	}
     }
 
Index: gcc/config/mips/mips.md
===================================================================
--- gcc.orig/config/mips/mips.md
+++ gcc/config/mips/mips.md
@@ -5716,6 +5716,18 @@
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")])
 
+(define_expand "simple_return"
+  [(simple_return)]
+  "!mips_can_use_return_insn ()"
+  { mips_expand_before_return (); })
+
+(define_insn "*simple_return"
+  [(simple_return)]
+  "!mips_can_use_return_insn ()"
+  "%*j\t$31%/"
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")])
+
 ;; Normal return.
 
 (define_insn "return_internal"
@@ -5726,6 +5738,14 @@
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")])
 
+(define_insn "simple_return_internal"
+  [(simple_return)
+   (use (match_operand 0 "pmode_register_operand" ""))]
+  ""
+  "%*j\t%0%/"
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")])
+
 ;; Exception return.
 (define_insn "mips_eret"
   [(return)
Index: gcc/config/sh/sh.c
===================================================================
--- gcc.orig/config/sh/sh.c
+++ gcc/config/sh/sh.c
@@ -5455,7 +5455,8 @@ barrier_align (rtx barrier_or_label)
 	}
       if (prev
 	  && JUMP_P (prev)
-	  && JUMP_LABEL (prev))
+	  && JUMP_LABEL (prev)
+	  && !ANY_RETURN_P (JUMP_LABEL (prev)))
 	{
 	  rtx x;
 	  if (jump_to_next
@@ -6154,7 +6155,7 @@ split_branches (rtx first)
 			JUMP_LABEL (insn) = far_label;
 			LABEL_NUSES (far_label)++;
 		      }
-		    redirect_jump (insn, NULL_RTX, 1);
+		    redirect_jump (insn, ret_rtx, 1);
 		    far_label = 0;
 		  }
 	      }
Index: gcc/config/sparc/sparc.c
===================================================================
--- gcc.orig/config/sparc/sparc.c
+++ gcc/config/sparc/sparc.c
@@ -6105,7 +6105,7 @@ sparc_struct_value_rtx (tree fndecl, int
 	  /* We must check and adjust the return address, as it is
 	     optional as to whether the return object is really
 	     provided.  */
-	  rtx ret_rtx = gen_rtx_REG (Pmode, 31);
+	  rtx ret_reg = gen_rtx_REG (Pmode, 31);
 	  rtx scratch = gen_reg_rtx (SImode);
 	  rtx endlab = gen_label_rtx ();
 
@@ -6122,12 +6122,12 @@ sparc_struct_value_rtx (tree fndecl, int
 	     it's an unimp instruction (the most significant 10 bits
 	     will be zero).  */
 	  emit_move_insn (scratch, gen_rtx_MEM (SImode,
-						plus_constant (ret_rtx, 8)));
+						plus_constant (ret_reg, 8)));
 	  /* Assume the size is valid and pre-adjust */
-	  emit_insn (gen_add3_insn (ret_rtx, ret_rtx, GEN_INT (4)));
+	  emit_insn (gen_add3_insn (ret_reg, ret_reg, GEN_INT (4)));
 	  emit_cmp_and_jump_insns (scratch, size_rtx, EQ, const0_rtx, SImode,
 				   0, endlab);
-	  emit_insn (gen_sub3_insn (ret_rtx, ret_rtx, GEN_INT (4)));
+	  emit_insn (gen_sub3_insn (ret_reg, ret_reg, GEN_INT (4)));
 	  /* Write the address of the memory pointed to by temp_val into
 	     the memory pointed to by mem */
 	  emit_move_insn (mem, XEXP (temp_val, 0));
Index: gcc/df-scan.c
===================================================================
--- gcc.orig/df-scan.c
+++ gcc/df-scan.c
@@ -3181,6 +3181,7 @@ df_uses_record (struct df_collection_rec
       }
 
     case RETURN:
+    case SIMPLE_RETURN:
       break;
 
     case ASM_OPERANDS:
Index: gcc/doc/md.texi
===================================================================
--- gcc.orig/doc/md.texi
+++ gcc/doc/md.texi
@@ -4947,7 +4947,19 @@ RTL generation phase.  In this case it i
 multiple instructions are usually needed to return from a function, but
 some class of functions only requires one instruction to implement a
 return.  Normally, the applicable functions are those which do not need
-to save any registers or allocate stack space.
+to save any registers or allocate stack space, although some targets
+have instructions that can perform both the epilogue and function return
+in one instruction.
+
+@cindex @code{simple_return} instruction pattern
+@item @samp{simple_return}
+Subroutine return instruction.  This instruction pattern name should be
+defined only if a single instruction can do all the work of returning
+from a function on a path where no epilogue is required.  This pattern
+is very similar to the @code{return} instruction pattern, but it is emitted
+only by the shrink-wrapping optimization on paths where the function
+prologue has not been executed, and a function return should occur without
+any of the effects of the epilogue.
 
 @findex reload_completed
 @findex leaf_function_p
Index: gcc/doc/rtl.texi
===================================================================
--- gcc.orig/doc/rtl.texi
+++ gcc/doc/rtl.texi
@@ -2895,6 +2895,13 @@ placed in @code{pc} to return to the cal
 Note that an insn pattern of @code{(return)} is logically equivalent to
 @code{(set (pc) (return))}, but the latter form is never used.
 
+@findex simple_return
+@item (simple_return)
+Like @code{(return)}, but truly represents only a function return, while
+@code{(return)} may represent an insn that also performs other functions
+of the function epilogue.  Like @code{(return)}, this may also occur in
+conditional jumps.
+
 @findex call
 @item (call @var{function} @var{nargs})
 Represents a function call.  @var{function} is a @code{mem} expression
@@ -3024,7 +3031,7 @@ Represents several side effects performe
 brackets stand for a vector; the operand of @code{parallel} is a
 vector of expressions.  @var{x0}, @var{x1} and so on are individual
 side effect expressions---expressions of code @code{set}, @code{call},
-@code{return}, @code{clobber} or @code{use}.
+@code{return}, @code{simple_return}, @code{clobber} or @code{use}.
 
 ``In parallel'' means that first all the values used in the individual
 side-effects are computed, and second all the actual side-effects are
@@ -3663,14 +3670,16 @@ and @code{call_insn} insns:
 @table @code
 @findex PATTERN
 @item PATTERN (@var{i})
-An expression for the side effect performed by this insn.  This must be
-one of the following codes: @code{set}, @code{call}, @code{use},
-@code{clobber}, @code{return}, @code{asm_input}, @code{asm_output},
-@code{addr_vec}, @code{addr_diff_vec}, @code{trap_if}, @code{unspec},
-@code{unspec_volatile}, @code{parallel}, @code{cond_exec}, or @code{sequence}.  If it is a @code{parallel},
-each element of the @code{parallel} must be one these codes, except that
-@code{parallel} expressions cannot be nested and @code{addr_vec} and
-@code{addr_diff_vec} are not permitted inside a @code{parallel} expression.
+An expression for the side effect performed by this insn.  This must
+be one of the following codes: @code{set}, @code{call}, @code{use},
+@code{clobber}, @code{return}, @code{simple_return}, @code{asm_input},
+@code{asm_output}, @code{addr_vec}, @code{addr_diff_vec},
+@code{trap_if}, @code{unspec}, @code{unspec_volatile},
+@code{parallel}, @code{cond_exec}, or @code{sequence}.  If it is a
+@code{parallel}, each element of the @code{parallel} must be one these
+codes, except that @code{parallel} expressions cannot be nested and
+@code{addr_vec} and @code{addr_diff_vec} are not permitted inside a
+@code{parallel} expression.
 
 @findex INSN_CODE
 @item INSN_CODE (@var{i})
Index: gcc/doc/tm.texi
===================================================================
--- gcc.orig/doc/tm.texi
+++ gcc/doc/tm.texi
@@ -3210,6 +3210,12 @@ Define this if the return address of a p
 from the frame pointer of the previous stack frame.
 @end defmac
 
+@defmac RETURN_ADDR_REGNUM
+If defined, a C expression whose value is the register number of the return
+address for the current function.  Targets that pass the return address on
+the stack should not define this macro.
+@end defmac
+
 @defmac INCOMING_RETURN_ADDR_RTX
 A C expression whose value is RTL representing the location of the
 incoming return address at the beginning of any function, before the
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc.orig/doc/tm.texi.in
+++ gcc/doc/tm.texi.in
@@ -3198,6 +3198,12 @@ Define this if the return address of a p
 from the frame pointer of the previous stack frame.
 @end defmac
 
+@defmac RETURN_ADDR_REGNUM
+If defined, a C expression whose value is the register number of the return
+address for the current function.  Targets that pass the return address on
+the stack should not define this macro.
+@end defmac
+
 @defmac INCOMING_RETURN_ADDR_RTX
 A C expression whose value is RTL representing the location of the
 incoming return address at the beginning of any function, before the
Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -1490,7 +1490,7 @@ compute_barrier_args_size_1 (rtx insn, H
     {
       rtx dest = JUMP_LABEL (insn);
 
-      if (dest)
+      if (dest && !ANY_RETURN_P (dest))
 	{
 	  if (barrier_args_size [INSN_UID (dest)] < 0)
 	    {
Index: gcc/emit-rtl.c
===================================================================
--- gcc.orig/emit-rtl.c
+++ gcc/emit-rtl.c
@@ -2448,6 +2448,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       return;
       /* SCRATCH must be shared because they represent distinct values.  */
@@ -3252,14 +3253,17 @@ prev_label (rtx insn)
   return insn;
 }
 
-/* Return the last label to mark the same position as LABEL.  Return null
-   if LABEL itself is null.  */
+/* Return the last label to mark the same position as LABEL.  Return LABEL
+   itself if it is null or any return rtx.  */
 
 rtx
 skip_consecutive_labels (rtx label)
 {
   rtx insn;
 
+  if (label && ANY_RETURN_P (label))
+    return label;
+
   for (insn = label; insn != 0 && !INSN_P (insn); insn = NEXT_INSN (insn))
     if (LABEL_P (insn))
       label = insn;
@@ -5145,7 +5149,7 @@ classify_insn (rtx x)
     return CODE_LABEL;
   if (GET_CODE (x) == CALL)
     return CALL_INSN;
-  if (GET_CODE (x) == RETURN)
+  if (GET_CODE (x) == RETURN || GET_CODE (x) == SIMPLE_RETURN)
     return JUMP_INSN;
   if (GET_CODE (x) == SET)
     {
@@ -5654,6 +5658,7 @@ init_emit_regs (void)
   /* Assign register numbers to the globally defined register rtx.  */
   pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
   ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
+  simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
   cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
   stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
   frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -2425,7 +2425,8 @@ final_scan_insn (rtx insn, FILE *file, i
 	        delete_insn (insn);
 		break;
 	      }
-	    else if (GET_CODE (SET_SRC (body)) == RETURN)
+	    else if (GET_CODE (SET_SRC (body)) == RETURN
+		     || GET_CODE (SET_SRC (body)) == SIMPLE_RETURN)
 	      /* Replace (set (pc) (return)) with (return).  */
 	      PATTERN (insn) = body = SET_SRC (body);
 
Index: gcc/function.c
===================================================================
--- gcc.orig/function.c
+++ gcc/function.c
@@ -146,9 +146,6 @@ extern tree debug_find_var_in_block_tree
    can always export `prologue_epilogue_contains'.  */
 static void record_insns (rtx, rtx, htab_t *) ATTRIBUTE_UNUSED;
 static bool contains (const_rtx, htab_t);
-#ifdef HAVE_return
-static void emit_return_into_block (basic_block);
-#endif
 static void prepare_function_start (void);
 static void do_clobber_return_reg (rtx, void *);
 static void do_use_return_reg (rtx, void *);
@@ -5262,42 +5259,181 @@ prologue_epilogue_contains (const_rtx in
   return 0;
 }
 
+#ifdef HAVE_simple_return
+/* A subroutine of requires_stack_frame_p, called via for_each_rtx.
+   Return 1 if *LOC is a reference to one of the special registers
+   that require the stack frame to be set up.  */
+
+static int
+frame_required_for_rtx (rtx *loc, void *data ATTRIBUTE_UNUSED)
+{
+  rtx x = *loc;
+  if (x == stack_pointer_rtx || x == hard_frame_pointer_rtx
+      || x == arg_pointer_rtx || x == pic_offset_table_rtx
+#ifdef RETURN_ADDR_REGNUM
+      || (REG_P (x) && REGNO (x) == RETURN_ADDR_REGNUM)
+#endif
+      )
+    return 1;
+  return 0;
+}
+
+/* Return true if INSN requires the stack frame to be set up.  */
+static bool
+requires_stack_frame_p (rtx insn)
+{
+  HARD_REG_SET hardregs;
+  unsigned regno;
+
+  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
+    return false;
+  if (CALL_P (insn))
+    return !SIBLING_CALL_P (insn);
+  if (for_each_rtx (&PATTERN (insn), frame_required_for_rtx, NULL))
+    return true;
+  CLEAR_HARD_REG_SET (hardregs);
+  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
+  AND_COMPL_HARD_REG_SET (hardregs, call_used_reg_set);
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if (TEST_HARD_REG_BIT (hardregs, regno)
+	&& df_regs_ever_live_p (regno))
+      return true;
+  return false;
+}
+#endif
+
 #ifdef HAVE_return
-/* Insert gen_return at the end of block BB.  This also means updating
-   block_for_insn appropriately.  */
+
+static rtx
+gen_return_pattern (bool simple_p)
+{
+#ifdef HAVE_simple_return
+  return simple_p ? gen_simple_return () : gen_return ();
+#else
+  gcc_assert (!simple_p);
+  return gen_return ();
+#endif
+}
+
+/* Insert an appropriate return pattern at the end of block BB.  This
+   also means updating block_for_insn appropriately.  */
 
 static void
-emit_return_into_block (basic_block bb)
+emit_return_into_block (bool simple_p, basic_block bb)
 {
-  emit_jump_insn_after (gen_return (), BB_END (bb));
+  rtx jump;
+  jump = emit_jump_insn_after (gen_return_pattern (simple_p), BB_END (bb));
+  JUMP_LABEL (jump) = simple_p ? simple_return_rtx : ret_rtx;
 }
-#endif /* HAVE_return */
+#endif
 
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
-   the epilogue begins.  Update the basic block information when possible.  */
+   the epilogue begins.  Update the basic block information when possible.
+
+   Notes on epilogue placement:
+   There are several kinds of edges to the exit block:
+   * a single fallthru edge from LAST_BB
+   * possibly, edges from blocks containing sibcalls
+   * possibly, fake edges from infinite loops
+
+   The epilogue is always emitted on the fallthru edge from the last basic
+   block in the function, LAST_BB, into the exit block.
+
+   If LAST_BB is empty except for a label, it is the target of every
+   other basic block in the function that ends in a return.  If a
+   target has a return or simple_return pattern (possibly with
+   conditional variants), these basic blocks can be changed so that a
+   return insn is emitted into them, and their target is adjusted to
+   the real exit block.
+
+   Notes on shrink wrapping: We implement a fairly conservative
+   version of shrink-wrapping rather than the textbook one.  We only
+   generate a single prologue and a single epilogue.  This is
+   sufficient to catch a number of interesting cases involving early
+   exits.
+
+   First, we identify the blocks that require the prologue to occur before
+   them.  These are the ones that modify a call-saved register, or reference
+   any of the stack or frame pointer registers.  To simplify things, we then
+   mark everything reachable from these blocks as also requiring a prologue.
+   This takes care of loops automatically, and avoids the need to examine
+   whether MEMs reference the frame, since it is sufficient to check for
+   occurrences of the stack or frame pointer.
+
+   We then compute the set of blocks for which the need for a prologue
+   is anticipatable (borrowing terminology from the shrink-wrapping
+   description in Muchnick's book).  These are the blocks which either
+   require a prologue themselves, or those that have only successors
+   where the prologue is anticipatable.  The prologue needs to be
+   inserted on all edges from BB1->BB2 where BB2 is in ANTIC and BB1
+   is not.  For the moment, we ensure that only one such edge exists.
+
+   The epilogue is placed as described above, but we make a
+   distinction between inserting return and simple_return patterns
+   when modifying other blocks that end in a return.  Blocks that end
+   in a sibcall omit the sibcall_epilogue if the block is not in
+   ANTIC.  */
 
 static void
 thread_prologue_and_epilogue_insns (void)
 {
   bool inserted;
+  basic_block last_bb;
+  bool last_bb_active;
+#ifdef HAVE_simple_return
+  bool unconverted_simple_returns = false;
+  basic_block simple_return_block = NULL;
+#endif
+  rtx returnjump ATTRIBUTE_UNUSED;
   rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED;
-  edge entry_edge ATTRIBUTE_UNUSED;
+  rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED;
+  edge entry_edge, orig_entry_edge, exit_fallthru_edge;
   edge e;
   edge_iterator ei;
+  bitmap_head bb_flags;
+
+  df_analyze ();
 
   rtl_profile_for_bb (ENTRY_BLOCK_PTR);
 
   inserted = false;
   seq = NULL_RTX;
+  prologue_seq = NULL_RTX;
   epilogue_end = NULL_RTX;
+  returnjump = NULL_RTX;
 
   /* Can't deal with multiple successors of the entry block at the
      moment.  Function should always have at least one entry
      point.  */
   gcc_assert (single_succ_p (ENTRY_BLOCK_PTR));
   entry_edge = single_succ_edge (ENTRY_BLOCK_PTR);
+  orig_entry_edge = entry_edge;
 
+  exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
+  if (exit_fallthru_edge != NULL)
+    {
+      rtx label;
+
+      last_bb = exit_fallthru_edge->src;
+      /* Test whether there are active instructions in the last block.  */
+      label = BB_END (last_bb);
+      while (label && !LABEL_P (label))
+	{
+	  if (active_insn_p (label))
+	    break;
+	  label = PREV_INSN (label);
+	}
+
+      last_bb_active = BB_HEAD (last_bb) != label || !LABEL_P (label);
+    }
+  else
+    {
+      last_bb = NULL;
+      last_bb_active = false;
+    }
+
+  split_prologue_seq = NULL_RTX;
   if (flag_split_stack
       && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl))
 	  == NULL))
@@ -5309,21 +5445,15 @@ thread_prologue_and_epilogue_insns (void
 
       start_sequence ();
       emit_insn (gen_split_stack_prologue ());
-      seq = get_insns ();
+      split_prologue_seq = get_insns ();
       end_sequence ();
 
-      record_insns (seq, NULL, &prologue_insn_hash);
-      set_insn_locators (seq, prologue_locator);
-
-      /* This relies on the fact that committing the edge insertion
-	 will look for basic blocks within the inserted instructions,
-	 which in turn relies on the fact that we are not in CFG
-	 layout mode here.  */
-      insert_insn_on_edge (seq, entry_edge);
-      inserted = true;
+      record_insns (split_prologue_seq, NULL, &prologue_insn_hash);
+      set_insn_locators (split_prologue_seq, prologue_locator);
 #endif
     }
 
+  prologue_seq = NULL_RTX;
 #ifdef HAVE_prologue
   if (HAVE_prologue)
     {
@@ -5346,15 +5476,182 @@ thread_prologue_and_epilogue_insns (void
       if (!targetm.profile_before_prologue () && crtl->profile)
         emit_insn (gen_blockage ());
 
-      seq = get_insns ();
+      prologue_seq = get_insns ();
       end_sequence ();
-      set_insn_locators (seq, prologue_locator);
+      set_insn_locators (prologue_seq, prologue_locator);
+    }
+#endif
 
-      insert_insn_on_edge (seq, entry_edge);
-      inserted = true;
+  bitmap_initialize (&bb_flags, &bitmap_default_obstack);
+
+#ifdef HAVE_simple_return
+  /* Try to perform a kind of shrink-wrapping, making sure the
+     prologue/epilogue is emitted only around those parts of the
+     function that require it.  */
+
+  if (flag_shrink_wrap && HAVE_simple_return && !flag_non_call_exceptions
+      && HAVE_prologue && !crtl->calls_eh_return)
+    {
+      HARD_REG_SET prologue_clobbered, live_on_edge;
+      rtx p_insn;
+
+      VEC(basic_block, heap) *vec;
+      basic_block bb;
+      bitmap_head bb_antic_flags;
+      bitmap_head bb_on_list;
+
+      bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
+      bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
+
+      vec = VEC_alloc (basic_block, heap, n_basic_blocks);
+
+      FOR_EACH_BB (bb)
+	{
+	  rtx insn;
+	  FOR_BB_INSNS (bb, insn)
+	    {
+	      if (requires_stack_frame_p (insn))
+		{
+		  bitmap_set_bit (&bb_flags, bb->index);
+		  VEC_quick_push (basic_block, vec, bb);
+		  break;
+		}
+	    }
+	}
+
+      /* For every basic block that needs a prologue, mark all blocks
+	 reachable from it, so as to ensure they are also seen as
+	 requiring a prologue.  */
+      while (!VEC_empty (basic_block, vec))
+	{
+	  basic_block tmp_bb = VEC_pop (basic_block, vec);
+	  edge e;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
+	    {
+	      if (e->dest == EXIT_BLOCK_PTR
+		  || bitmap_bit_p (&bb_flags, e->dest->index))
+		continue;
+	      bitmap_set_bit (&bb_flags, e->dest->index);
+	      VEC_quick_push (basic_block, vec, e->dest);
+	    }
+	}
+      /* If the last basic block contains only a label, we'll be able
+	 to convert jumps to it to (potentially conditional) return
+	 insns later.  This means we don't necessarily need a prologue
+	 for paths reaching it.  */
+      if (last_bb)
+	{
+	  if (!last_bb_active)
+	    bitmap_clear_bit (&bb_flags, last_bb->index);
+	  else if (!bitmap_bit_p (&bb_flags, last_bb->index))
+	    goto fail_shrinkwrap;
+	}
+
+      /* Now walk backwards from every block that is marked as needing
+	 a prologue to compute the bb_antic_flags bitmap.  */
+      bitmap_copy (&bb_antic_flags, &bb_flags);
+      FOR_EACH_BB (bb)
+	{
+	  edge e;
+	  edge_iterator ei;
+	  if (!bitmap_bit_p (&bb_flags, bb->index))
+	    continue;
+	  FOR_EACH_EDGE (e, ei, bb->preds)
+	    if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
+	      {
+		VEC_quick_push (basic_block, vec, e->src);
+		bitmap_set_bit (&bb_on_list, e->src->index);
+	      }
+	}
+      while (!VEC_empty (basic_block, vec))
+	{
+	  basic_block tmp_bb = VEC_pop (basic_block, vec);
+	  edge e;
+	  edge_iterator ei;
+	  bool all_set = true;
+
+	  bitmap_clear_bit (&bb_on_list, tmp_bb->index);
+	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
+	    {
+	      if (!bitmap_bit_p (&bb_antic_flags, e->dest->index))
+		{
+		  all_set = false;
+		  break;
+		}
+	    }
+	  if (all_set)
+	    {
+	      bitmap_set_bit (&bb_antic_flags, tmp_bb->index);
+	      FOR_EACH_EDGE (e, ei, tmp_bb->preds)
+		if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
+		  {
+		    VEC_quick_push (basic_block, vec, e->src);
+		    bitmap_set_bit (&bb_on_list, e->src->index);
+		  }
+	    }
+	}
+      /* Find exactly one edge that leads to a block in ANTIC from
+	 a block that isn't.  */
+      if (!bitmap_bit_p (&bb_antic_flags, entry_edge->dest->index))
+	FOR_EACH_BB (bb)
+	  {
+	    if (!bitmap_bit_p (&bb_antic_flags, bb->index))
+	      continue;
+	    FOR_EACH_EDGE (e, ei, bb->preds)
+	      if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
+		{
+		  if (entry_edge != orig_entry_edge)
+		    {
+		      entry_edge = orig_entry_edge;
+		      goto fail_shrinkwrap;
+		    }
+		  entry_edge = e;
+		}
+	  }
+
+      /* Test whether the prologue is known to clobber any register
+	 (other than FP or SP) that is live on the edge.  */
+      CLEAR_HARD_REG_SET (prologue_clobbered);
+      for (p_insn = prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn))
+	if (NONDEBUG_INSN_P (p_insn))
+	  note_stores (PATTERN (p_insn), record_hard_reg_sets,
+		       &prologue_clobbered);
+      for (p_insn = split_prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn))
+	if (NONDEBUG_INSN_P (p_insn))
+	  note_stores (PATTERN (p_insn), record_hard_reg_sets,
+		       &prologue_clobbered);
+      CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM);
+      if (frame_pointer_needed)
+	CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM);
+      CLEAR_HARD_REG_SET (live_on_edge);
+      reg_set_to_hard_reg_set (&live_on_edge,
+			       df_get_live_in (entry_edge->dest));
+      if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
+	entry_edge = orig_entry_edge;
+
+    fail_shrinkwrap:
+      bitmap_clear (&bb_antic_flags);
+      bitmap_clear (&bb_on_list);
+      VEC_free (basic_block, heap, vec);
     }
 #endif
 
+  if (split_prologue_seq != NULL_RTX)
+    {
+      /* This relies on the fact that committing the edge insertion
+	 will look for basic blocks within the inserted instructions,
+	 which in turn relies on the fact that we are not in CFG
+	 layout mode here.  */
+      insert_insn_on_edge (split_prologue_seq, entry_edge);
+      inserted = true;
+    }
+  if (prologue_seq != NULL_RTX)
+    {
+      insert_insn_on_edge (prologue_seq, entry_edge);
+      inserted = true;
+    }
+
   /* If the exit block has no non-fake predecessors, we don't need
      an epilogue.  */
   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
@@ -5364,98 +5661,130 @@ thread_prologue_and_epilogue_insns (void
     goto epilogue_done;
 
   rtl_profile_for_bb (EXIT_BLOCK_PTR);
+
 #ifdef HAVE_return
-  if (optimize && HAVE_return)
+  /* If we're allowed to generate a simple return instruction, then by
+     definition we don't need a full epilogue.  If the last basic
+     block before the exit block does not contain active instructions,
+     examine its predecessors and try to emit (conditional) return
+     instructions.  */
+  if (optimize && !last_bb_active
+      && (HAVE_return || entry_edge != orig_entry_edge))
     {
-      /* If we're allowed to generate a simple return instruction,
-	 then by definition we don't need a full epilogue.  Examine
-	 the block that falls through to EXIT.   If it does not
-	 contain any code, examine its predecessors and try to
-	 emit (conditional) return instructions.  */
-
-      basic_block last;
+      edge_iterator ei2;
+      int i;
+      basic_block bb;
       rtx label;
+      VEC(basic_block,heap) *src_bbs;
 
-      e = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
-      if (e == NULL)
+      if (exit_fallthru_edge == NULL)
 	goto epilogue_done;
-      last = e->src;
+      label = BB_HEAD (last_bb);
 
-      /* Verify that there are no active instructions in the last block.  */
-      label = BB_END (last);
-      while (label && !LABEL_P (label))
-	{
-	  if (active_insn_p (label))
-	    break;
-	  label = PREV_INSN (label);
-	}
+      src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds));
+      FOR_EACH_EDGE (e, ei2, last_bb->preds)
+	if (e->src != ENTRY_BLOCK_PTR)
+	  VEC_quick_push (basic_block, src_bbs, e->src);
 
-      if (BB_HEAD (last) == label && LABEL_P (label))
+      FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb)
 	{
-	  edge_iterator ei2;
+	  bool simple_p;
+	  rtx jump;
+	  e = find_edge (bb, last_bb);
 
-	  for (ei2 = ei_start (last->preds); (e = ei_safe_edge (ei2)); )
-	    {
-	      basic_block bb = e->src;
-	      rtx jump;
+	  jump = BB_END (bb);
 
-	      if (bb == ENTRY_BLOCK_PTR)
-		{
-		  ei_next (&ei2);
-		  continue;
-		}
+#ifdef HAVE_simple_return
+	  simple_p = (entry_edge != orig_entry_edge
+		      ? !bitmap_bit_p (&bb_flags, bb->index) : false);
+#else
+	  simple_p = false;
+#endif
 
-	      jump = BB_END (bb);
-	      if (!JUMP_P (jump) || JUMP_LABEL (jump) != label)
-		{
-		  ei_next (&ei2);
-		  continue;
-		}
+	  if (!simple_p
+	      && (!HAVE_return || !JUMP_P (jump)
+		  || JUMP_LABEL (jump) != label))
+	    continue;
 
-	      /* If we have an unconditional jump, we can replace that
-		 with a simple return instruction.  */
-	      if (simplejump_p (jump))
-		{
-		  emit_return_into_block (bb);
-		  delete_insn (jump);
-		}
+	  /* If the block falls through, or ends in an unconditional
+	     jump, we can replace that with a return instruction.  */
+	  if (!JUMP_P (jump))
+	    {
+	      emit_barrier_after (BB_END (bb));
+	      emit_return_into_block (simple_p, bb);
+	    }
+	  else if (simplejump_p (jump))
+	    {
+	      emit_return_into_block (simple_p, bb);
+	      delete_insn (jump);
+	    }
+	  else if (condjump_p (jump) && JUMP_LABEL (jump) != label)
+	    {
+	      basic_block new_bb;
+	      edge new_e;
 
-	      /* If we have a conditional jump, we can try to replace
-		 that with a conditional return instruction.  */
-	      else if (condjump_p (jump))
-		{
-		  if (! redirect_jump (jump, 0, 0))
-		    {
-		      ei_next (&ei2);
-		      continue;
-		    }
+	      gcc_assert (simple_p);
+	      new_bb = split_edge (e);
+	      emit_barrier_after (BB_END (new_bb));
+	      emit_return_into_block (simple_p, new_bb);
+#ifdef HAVE_simple_return
+	      simple_return_block = new_bb;
+#endif
+	      new_e = single_succ_edge (new_bb);
+	      redirect_edge_succ (new_e, EXIT_BLOCK_PTR);
 
-		  /* If this block has only one successor, it both jumps
-		     and falls through to the fallthru block, so we can't
-		     delete the edge.  */
-		  if (single_succ_p (bb))
-		    {
-		      ei_next (&ei2);
-		      continue;
-		    }
-		}
+	      continue;
+	    }
+	  /* If we have a conditional jump branching to the last
+	     block, we can try to replace that with a conditional
+	     return instruction.  */
+	  else if (condjump_p (jump))
+	    {
+	      rtx dest;
+	      if (simple_p)
+		dest = simple_return_rtx;
 	      else
+		dest = ret_rtx;
+	      if (! redirect_jump (jump, dest, 0))
 		{
-		  ei_next (&ei2);
+#ifdef HAVE_simple_return
+		  if (simple_p)
+		    unconverted_simple_returns = true;
+#endif
 		  continue;
 		}
 
-	      /* Fix up the CFG for the successful change we just made.  */
-	      redirect_edge_succ (e, EXIT_BLOCK_PTR);
+	      /* If this block has only one successor, it both jumps
+		 and falls through to the fallthru block, so we can't
+		 delete the edge.  */
+	      if (single_succ_p (bb))
+		continue;
 	    }
+	  else
+	    {
+#ifdef HAVE_simple_return
+	      if (simple_p)
+		unconverted_simple_returns = true;
+#endif
+	      continue;
+	    }
+
+	  /* Fix up the CFG for the successful change we just made.  */
+	  redirect_edge_succ (e, EXIT_BLOCK_PTR);
+	}
+      VEC_free (basic_block, heap, src_bbs);
 
+      if (HAVE_return)
+	{
 	  /* Emit a return insn for the exit fallthru block.  Whether
 	     this is still reachable will be determined later.  */
 
-	  emit_barrier_after (BB_END (last));
-	  emit_return_into_block (last);
-	  epilogue_end = BB_END (last);
-	  single_succ_edge (last)->flags &= ~EDGE_FALLTHRU;
+	  emit_barrier_after (BB_END (last_bb));
+	  emit_return_into_block (false, last_bb);
+	  epilogue_end = BB_END (last_bb);
+	  if (JUMP_P (epilogue_end))
+	    JUMP_LABEL (epilogue_end) = ret_rtx;
+	  single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU;
 	  goto epilogue_done;
 	}
     }
@@ -5492,13 +5821,10 @@ thread_prologue_and_epilogue_insns (void
     }
 #endif
 
-  /* Find the edge that falls through to EXIT.  Other edges may exist
-     due to RETURN instructions, but those don't need epilogues.
-     There really shouldn't be a mixture -- either all should have
-     been converted or none, however...  */
+  /* If nothing falls through into the exit block, we don't need an
+     epilogue.  */
 
-  e = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
-  if (e == NULL)
+  if (exit_fallthru_edge == NULL)
     goto epilogue_done;
 
 #ifdef HAVE_epilogue
@@ -5515,25 +5841,38 @@ thread_prologue_and_epilogue_insns (void
       set_insn_locators (seq, epilogue_locator);
 
       seq = get_insns ();
+      returnjump = get_last_insn ();
       end_sequence ();
 
-      insert_insn_on_edge (seq, e);
+      insert_insn_on_edge (seq, exit_fallthru_edge);
       inserted = true;
+      if (JUMP_P (returnjump))
+	{
+	  rtx pat = PATTERN (returnjump);
+	  if (GET_CODE (pat) == PARALLEL)
+	    pat = XVECEXP (pat, 0, 0);
+	  if (ANY_RETURN_P (pat))
+	    JUMP_LABEL (returnjump) = pat;
+	  else
+	    JUMP_LABEL (returnjump) = ret_rtx;
+	}
+      else
+	returnjump = NULL_RTX;
     }
   else
 #endif
     {
       basic_block cur_bb;
 
-      if (! next_active_insn (BB_END (e->src)))
+      if (! next_active_insn (BB_END (exit_fallthru_edge->src)))
 	goto epilogue_done;
       /* We have a fall-through edge to the exit block, the source is not
          at the end of the function, and there will be an assembler epilogue
          at the end of the function.
          We can't use force_nonfallthru here, because that would try to
-         use return.  Inserting a jump 'by hand' is extremely messy, so
+	 use return.  Inserting a jump 'by hand' is extremely messy, so
 	 we take advantage of cfg_layout_finalize using
-	fixup_fallthru_exit_predecessor.  */
+	 fixup_fallthru_exit_predecessor.  */
       cfg_layout_initialize (0);
       FOR_EACH_BB (cur_bb)
 	if (cur_bb->index >= NUM_FIXED_BLOCKS
@@ -5542,6 +5881,7 @@ thread_prologue_and_epilogue_insns (void
       cfg_layout_finalize ();
     }
 epilogue_done:
+
   default_rtl_profile ();
 
   if (inserted)
@@ -5558,33 +5898,93 @@ epilogue_done:
 	}
     }
 
+#ifdef HAVE_simple_return
+  /* If there were branches to an empty LAST_BB which we tried to
+     convert to conditional simple_returns, but couldn't for some
+     reason, create a block to hold a simple_return insn and redirect
+     those remaining edges.  */
+  if (unconverted_simple_returns)
+    {
+      edge_iterator ei2;
+      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+
+      gcc_assert (entry_edge != orig_entry_edge);
+
+#ifdef HAVE_epilogue
+      if (simple_return_block == NULL && returnjump != NULL_RTX
+	  && JUMP_LABEL (returnjump) == simple_return_rtx)
+	{
+	  edge e = split_block (exit_fallthru_edge->src,
+				PREV_INSN (returnjump));
+	  simple_return_block = e->dest;
+	}
+#endif
+      if (simple_return_block == NULL)
+	{
+	  basic_block bb;
+	  rtx start;
+
+	  bb = create_basic_block (NULL, NULL, exit_pred);
+	  start = emit_jump_insn_after (gen_simple_return (),
+					BB_END (bb));
+	  JUMP_LABEL (start) = simple_return_rtx;
+	  emit_barrier_after (start);
+
+	  simple_return_block = bb;
+	  make_edge (bb, EXIT_BLOCK_PTR, 0);
+	}
+
+    restart_scan:
+      for (ei2 = ei_start (last_bb->preds); (e = ei_safe_edge (ei2)); )
+	{
+	  basic_block bb = e->src;
+
+	  if (bb != ENTRY_BLOCK_PTR
+	      && !bitmap_bit_p (&bb_flags, bb->index))
+	    {
+	      redirect_edge_and_branch_force (e, simple_return_block);
+	      goto restart_scan;
+	    }
+	  ei_next (&ei2);
+
+	}
+    }
+#endif
+
 #ifdef HAVE_sibcall_epilogue
   /* Emit sibling epilogues before any sibling call sites.  */
   for (ei = ei_start (EXIT_BLOCK_PTR->preds); (e = ei_safe_edge (ei)); )
     {
       basic_block bb = e->src;
       rtx insn = BB_END (bb);
+      rtx ep_seq;
 
       if (!CALL_P (insn)
-	  || ! SIBLING_CALL_P (insn))
+	  || ! SIBLING_CALL_P (insn)
+	  || (entry_edge != orig_entry_edge
+	      && !bitmap_bit_p (&bb_flags, bb->index)))
 	{
 	  ei_next (&ei);
 	  continue;
 	}
 
-      start_sequence ();
-      emit_note (NOTE_INSN_EPILOGUE_BEG);
-      emit_insn (gen_sibcall_epilogue ());
-      seq = get_insns ();
-      end_sequence ();
+      ep_seq = gen_sibcall_epilogue ();
+      if (ep_seq)
+	{
+	  start_sequence ();
+	  emit_note (NOTE_INSN_EPILOGUE_BEG);
+	  emit_insn (ep_seq);
+	  seq = get_insns ();
+	  end_sequence ();
 
-      /* Retain a map of the epilogue insns.  Used in life analysis to
-	 avoid getting rid of sibcall epilogue insns.  Do this before we
-	 actually emit the sequence.  */
-      record_insns (seq, NULL, &epilogue_insn_hash);
-      set_insn_locators (seq, epilogue_locator);
+	  /* Retain a map of the epilogue insns.  Used in life analysis to
+	     avoid getting rid of sibcall epilogue insns.  Do this before we
+	     actually emit the sequence.  */
+	  record_insns (seq, NULL, &epilogue_insn_hash);
+	  set_insn_locators (seq, epilogue_locator);
 
-      emit_insn_before (seq, insn);
+	  emit_insn_before (seq, insn);
+	}
       ei_next (&ei);
     }
 #endif
@@ -5609,6 +6009,8 @@ epilogue_done:
     }
 #endif
 
+  bitmap_clear (&bb_flags);
+
   /* Threading the prologue and epilogue changes the artificial refs
      in the entry and exit blocks.  */
   epilogue_completed = 1;
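The anticipatability computation described in the big comment above
thread_prologue_and_epilogue_insns can be sketched in isolation: a block is
in ANTIC if it needs the prologue itself, or if all of its successors are in
ANTIC, iterated to a fixed point.  The toy CFG, block indices and
"needs prologue" flags below are made up for illustration; this is not code
from the patch.

```c
#include <stdbool.h>

#define N_BLOCKS 4

/* succ[b][s] lists the successors of block b; -1 terminates each list.
   Toy CFG: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> exit.  */
static const int succ[N_BLOCKS][3] = {
  { 1, 2, -1 }, { 3, -1, -1 }, { 3, -1, -1 }, { -1, -1, -1 }
};

/* Compute ANTIC: start from the blocks that require a prologue, then
   repeatedly add any block all of whose successors are already in
   ANTIC, until nothing changes (the patch uses a worklist instead).  */
static void
compute_antic (const bool *needs_prologue, bool *antic)
{
  bool changed;
  int b, s;

  for (b = 0; b < N_BLOCKS; b++)
    antic[b] = needs_prologue[b];

  do
    {
      changed = false;
      for (b = 0; b < N_BLOCKS; b++)
	{
	  bool all_set = true;

	  /* Skip blocks already in ANTIC, and blocks with no
	     successors (edges to the exit block don't count).  */
	  if (antic[b] || succ[b][0] == -1)
	    continue;
	  for (s = 0; succ[b][s] != -1; s++)
	    if (!antic[succ[b][s]])
	      all_set = false;
	  if (all_set)
	    antic[b] = changed = true;
	}
    }
  while (changed);
}
```

With only block 1 flagged, ANTIC stays {1} and the prologue lands on the
single edge 0->1; flagging blocks 1 and 2 pulls block 0 into ANTIC, so the
prologue reverts to the entry edge, matching the "exactly one edge" check in
the patch.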
Index: gcc/genemit.c
===================================================================
--- gcc.orig/genemit.c
+++ gcc/genemit.c
@@ -226,6 +226,9 @@ gen_exp (rtx x, enum rtx_code subroutine
     case RETURN:
       printf ("ret_rtx");
       return;
+    case SIMPLE_RETURN:
+      printf ("simple_return_rtx");
+      return;
     case CLOBBER:
       if (REG_P (XEXP (x, 0)))
 	{
@@ -549,8 +552,8 @@ gen_expand (rtx expand)
 	  || (GET_CODE (next) == PARALLEL
 	      && ((GET_CODE (XVECEXP (next, 0, 0)) == SET
 		   && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC)
-		  || GET_CODE (XVECEXP (next, 0, 0)) == RETURN))
-	  || GET_CODE (next) == RETURN)
+		  || ANY_RETURN_P (XVECEXP (next, 0, 0))))
+	  || ANY_RETURN_P (next))
 	printf ("  emit_jump_insn (");
       else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL)
 	       || GET_CODE (next) == CALL
@@ -668,7 +671,7 @@ gen_split (rtx split)
 	  || (GET_CODE (next) == PARALLEL
 	      && GET_CODE (XVECEXP (next, 0, 0)) == SET
 	      && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC)
-	  || GET_CODE (next) == RETURN)
+	  || ANY_RETURN_P (next))
 	printf ("  emit_jump_insn (");
       else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL)
 	       || GET_CODE (next) == CALL
Index: gcc/haifa-sched.c
===================================================================
--- gcc.orig/haifa-sched.c
+++ gcc/haifa-sched.c
@@ -5310,6 +5310,11 @@ check_cfg (rtx head, rtx tail)
 		    gcc_assert (/* Usual case.  */
                                 (EDGE_COUNT (bb->succs) > 1
                                  && !BARRIER_P (NEXT_INSN (head)))
+				/* Special cases, see cfglayout.c:
+				   fixup_reorder_chain.  */
+				|| (EDGE_COUNT (bb->succs) == 1
+				    && (!onlyjump_p (head)
+					|| returnjump_p (head)))
                                 /* Or jump to the next instruction.  */
                                 || (EDGE_COUNT (bb->succs) == 1
                                     && (BB_HEAD (EDGE_I (bb->succs, 0)->dest)
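For concreteness, here is a hypothetical C function of the shape the
shrink-wrapping comments in the function.c hunk above describe (my example,
not taken from the patch): the early-exit path touches no call-saved
register and makes no call, so on a target with a simple_return pattern the
prologue only needs to execute on the path that reaches the loop.

```c
#include <stddef.h>

/* Hypothetical illustration of an early-exit function that benefits
   from shrink-wrapping.  The NULL check can return via simple_return
   without any stack frame; the prologue is only required on the edge
   leading to the loop, which may need call-saved registers.  */
size_t
count_nonzero (const int *p, size_t n)
{
  size_t i, count = 0;

  if (p == NULL)	/* Fast path: no frame, simple_return.  */
    return 0;

  for (i = 0; i < n; i++)	/* Slow path: prologue inserted here.  */
    if (p[i] != 0)
      count++;
  return count;
}
```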
Index: gcc/ifcvt.c
===================================================================
--- gcc.orig/ifcvt.c
+++ gcc/ifcvt.c
@@ -103,7 +103,7 @@ static int cond_exec_find_if_block (ce_i
 static int find_if_case_1 (basic_block, edge, edge);
 static int find_if_case_2 (basic_block, edge, edge);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
-			       basic_block, int);
+			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx block_has_only_trap (basic_block);
 \f
@@ -3793,6 +3793,7 @@ find_if_case_1 (basic_block test_bb, edg
   basic_block then_bb = then_edge->dest;
   basic_block else_bb = else_edge->dest;
   basic_block new_bb;
+  rtx else_target = NULL_RTX;
   int then_bb_index;
 
   /* If we are partitioning hot/cold basic blocks, we don't want to
@@ -3842,9 +3843,16 @@ find_if_case_1 (basic_block test_bb, edg
 				    predictable_edge_p (then_edge)))))
     return FALSE;
 
+  if (else_bb == EXIT_BLOCK_PTR)
+    {
+      rtx jump = BB_END (else_edge->src);
+      gcc_assert (JUMP_P (jump));
+      else_target = JUMP_LABEL (jump);
+    }
+
   /* Registers set are dead, or are predicable.  */
   if (! dead_or_predicable (test_bb, then_bb, else_bb,
-			    single_succ (then_bb), 1))
+			    single_succ_edge (then_bb), 1))
     return FALSE;
 
   /* Conversion went ok, including moving the insns and fixing up the
@@ -3861,6 +3869,9 @@ find_if_case_1 (basic_block test_bb, edg
       redirect_edge_succ (FALLTHRU_EDGE (test_bb), else_bb);
       new_bb = 0;
     }
+  else if (else_bb == EXIT_BLOCK_PTR)
+    new_bb = force_nonfallthru_and_redirect (FALLTHRU_EDGE (test_bb),
+					     else_bb, else_target);
   else
     new_bb = redirect_edge_and_branch_force (FALLTHRU_EDGE (test_bb),
 					     else_bb);
@@ -3959,7 +3970,7 @@ find_if_case_2 (basic_block test_bb, edg
     return FALSE;
 
   /* Registers set are dead, or are predicable.  */
-  if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ->dest, 0))
+  if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ, 0))
     return FALSE;
 
   /* Conversion went ok, including moving the insns and fixing up the
@@ -3988,12 +3999,34 @@ find_if_case_2 (basic_block test_bb, edg
 
 static int
 dead_or_predicable (basic_block test_bb, basic_block merge_bb,
-		    basic_block other_bb, basic_block new_dest, int reversep)
+		    basic_block other_bb, edge dest_edge, int reversep)
 {
-  rtx head, end, jump, earliest = NULL_RTX, old_dest, new_label = NULL_RTX;
+  basic_block new_dest = dest_edge->dest;
+  rtx head, end, jump, earliest = NULL_RTX, old_dest;
   bitmap merge_set = NULL;
   /* Number of pending changes.  */
   int n_validated_changes = 0;
+  rtx new_dest_label;
+
+  jump = BB_END (dest_edge->src);
+  if (JUMP_P (jump))
+    {
+      new_dest_label = JUMP_LABEL (jump);
+      if (new_dest_label == NULL_RTX)
+	{
+	  new_dest_label = PATTERN (jump);
+	  gcc_assert (ANY_RETURN_P (new_dest_label));
+	}
+    }
+  else if (other_bb != new_dest)
+    {
+      if (new_dest == EXIT_BLOCK_PTR)
+	new_dest_label = ret_rtx;
+      else
+	new_dest_label = block_label (new_dest);
+    }
+  else
+    new_dest_label = NULL_RTX;
 
   jump = BB_END (test_bb);
 
@@ -4131,10 +4164,9 @@ dead_or_predicable (basic_block test_bb,
   old_dest = JUMP_LABEL (jump);
   if (other_bb != new_dest)
     {
-      new_label = block_label (new_dest);
       if (reversep
-	  ? ! invert_jump_1 (jump, new_label)
-	  : ! redirect_jump_1 (jump, new_label))
+	  ? ! invert_jump_1 (jump, new_dest_label)
+	  : ! redirect_jump_1 (jump, new_dest_label))
 	goto cancel;
     }
 
@@ -4145,7 +4177,7 @@ dead_or_predicable (basic_block test_bb,
 
   if (other_bb != new_dest)
     {
-      redirect_jump_2 (jump, old_dest, new_label, 0, reversep);
+      redirect_jump_2 (jump, old_dest, new_dest_label, 0, reversep);
 
       redirect_edge_succ (BRANCH_EDGE (test_bb), new_dest);
       if (reversep)
Index: gcc/jump.c
===================================================================
--- gcc.orig/jump.c
+++ gcc/jump.c
@@ -29,7 +29,8 @@ along with GCC; see the file COPYING3.  
    JUMP_LABEL internal field.  With this we can detect labels that
    become unused because of the deletion of all the jumps that
    formerly used them.  The JUMP_LABEL info is sometimes looked
-   at by later passes.
+   at by later passes.  For return insns, it contains either a
+   RETURN or a SIMPLE_RETURN rtx.
 
    The subroutines redirect_jump and invert_jump are used
    from other passes as well.  */
@@ -741,10 +742,10 @@ condjump_p (const_rtx insn)
     return (GET_CODE (x) == IF_THEN_ELSE
 	    && ((GET_CODE (XEXP (x, 2)) == PC
 		 && (GET_CODE (XEXP (x, 1)) == LABEL_REF
-		     || GET_CODE (XEXP (x, 1)) == RETURN))
+		     || ANY_RETURN_P (XEXP (x, 1))))
 		|| (GET_CODE (XEXP (x, 1)) == PC
 		    && (GET_CODE (XEXP (x, 2)) == LABEL_REF
-			|| GET_CODE (XEXP (x, 2)) == RETURN))));
+			|| ANY_RETURN_P (XEXP (x, 2))))));
 }
 
 /* Return nonzero if INSN is a (possibly) conditional jump inside a
@@ -773,11 +774,11 @@ condjump_in_parallel_p (const_rtx insn)
     return 0;
   if (XEXP (SET_SRC (x), 2) == pc_rtx
       && (GET_CODE (XEXP (SET_SRC (x), 1)) == LABEL_REF
-	  || GET_CODE (XEXP (SET_SRC (x), 1)) == RETURN))
+	  || ANY_RETURN_P (XEXP (SET_SRC (x), 1))))
     return 1;
   if (XEXP (SET_SRC (x), 1) == pc_rtx
       && (GET_CODE (XEXP (SET_SRC (x), 2)) == LABEL_REF
-	  || GET_CODE (XEXP (SET_SRC (x), 2)) == RETURN))
+	  || ANY_RETURN_P (XEXP (SET_SRC (x), 2))))
     return 1;
   return 0;
 }
@@ -839,8 +840,9 @@ any_condjump_p (const_rtx insn)
   a = GET_CODE (XEXP (SET_SRC (x), 1));
   b = GET_CODE (XEXP (SET_SRC (x), 2));
 
-  return ((b == PC && (a == LABEL_REF || a == RETURN))
-	  || (a == PC && (b == LABEL_REF || b == RETURN)));
+  return ((b == PC && (a == LABEL_REF || a == RETURN || a == SIMPLE_RETURN))
+	  || (a == PC
+	      && (b == LABEL_REF || b == RETURN || b == SIMPLE_RETURN)));
 }
 
 /* Return the label of a conditional jump.  */
@@ -877,6 +879,7 @@ returnjump_p_1 (rtx *loc, void *data ATT
   switch (GET_CODE (x))
     {
     case RETURN:
+    case SIMPLE_RETURN:
     case EH_RETURN:
       return true;
 
@@ -1199,7 +1202,7 @@ delete_related_insns (rtx insn)
   /* If deleting a jump, decrement the count of the label,
      and delete the label if it is now unused.  */
 
-  if (JUMP_P (insn) && JUMP_LABEL (insn))
+  if (JUMP_P (insn) && JUMP_LABEL (insn) && !ANY_RETURN_P (JUMP_LABEL (insn)))
     {
       rtx lab = JUMP_LABEL (insn), lab_next;
 
@@ -1330,6 +1333,18 @@ delete_for_peephole (rtx from, rtx to)
      is also an unconditional jump in that case.  */
 }
 \f
+/* A helper function for redirect_exp_1; returns a RETURN if X is NULL,
+   X itself if X is any kind of return, else a LABEL_REF around X.  */
+static rtx
+redirect_target (rtx x)
+{
+  if (x == NULL_RTX)
+    return ret_rtx;
+  if (!ANY_RETURN_P (x))
+    return gen_rtx_LABEL_REF (Pmode, x);
+  return x;
+}
+
 /* Throughout LOC, redirect OLABEL to NLABEL.  Treat null OLABEL or
    NLABEL as a return.  Accrue modifications into the change group.  */
 
@@ -1341,37 +1356,19 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
   int i;
   const char *fmt;
 
-  if (code == LABEL_REF)
-    {
-      if (XEXP (x, 0) == olabel)
-	{
-	  rtx n;
-	  if (nlabel)
-	    n = gen_rtx_LABEL_REF (Pmode, nlabel);
-	  else
-	    n = ret_rtx;
-
-	  validate_change (insn, loc, n, 1);
-	  return;
-	}
-    }
-  else if (code == RETURN && olabel == 0)
+  if ((code == LABEL_REF && XEXP (x, 0) == olabel)
+      || x == olabel)
     {
-      if (nlabel)
-	x = gen_rtx_LABEL_REF (Pmode, nlabel);
-      else
-	x = ret_rtx;
-      if (loc == &PATTERN (insn))
-	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
-      validate_change (insn, loc, x, 1);
+      validate_change (insn, loc, redirect_target (nlabel), 1);
       return;
     }
 
-  if (code == SET && nlabel == 0 && SET_DEST (x) == pc_rtx
+  if (code == SET && SET_DEST (x) == pc_rtx
+      && ANY_RETURN_P (nlabel)
       && GET_CODE (SET_SRC (x)) == LABEL_REF
       && XEXP (SET_SRC (x), 0) == olabel)
     {
-      validate_change (insn, loc, ret_rtx, 1);
+      validate_change (insn, loc, nlabel, 1);
       return;
     }
 
@@ -1408,6 +1405,7 @@ redirect_jump_1 (rtx jump, rtx nlabel)
   int ochanges = num_validated_changes ();
   rtx *loc, asmop;
 
+  gcc_assert (nlabel);
   asmop = extract_asm_operands (PATTERN (jump));
   if (asmop)
     {
@@ -1429,17 +1427,20 @@ redirect_jump_1 (rtx jump, rtx nlabel)
    jump target label is unused as a result, it and the code following
    it may be deleted.
 
-   If NLABEL is zero, we are to turn the jump into a (possibly conditional)
-   RETURN insn.
+   Normally, NLABEL will be a label, but it may also be a RETURN or
+   SIMPLE_RETURN rtx; in that case we are to turn the jump into a
+   (possibly conditional) return insn.
 
    The return value will be 1 if the change was made, 0 if it wasn't
-   (this can only occur for NLABEL == 0).  */
+   (this can only occur when trying to produce return insns).  */
 
 int
 redirect_jump (rtx jump, rtx nlabel, int delete_unused)
 {
   rtx olabel = JUMP_LABEL (jump);
 
+  gcc_assert (nlabel != NULL_RTX);
+
   if (nlabel == olabel)
     return 1;
 
@@ -1451,7 +1452,7 @@ redirect_jump (rtx jump, rtx nlabel, int
 }
 
 /* Fix up JUMP_LABEL and label ref counts after OLABEL has been replaced with
-   NLABEL in JUMP.
+   NLABEL in JUMP.  NLABEL may also be a RETURN or SIMPLE_RETURN rtx.
    If DELETE_UNUSED is positive, delete related insn to OLABEL if its ref
    count has dropped to zero.  */
 void
@@ -1467,13 +1468,14 @@ redirect_jump_2 (rtx jump, rtx olabel, r
      about this.  */
   gcc_assert (delete_unused >= 0);
   JUMP_LABEL (jump) = nlabel;
-  if (nlabel)
+  if (nlabel && !ANY_RETURN_P (nlabel))
     ++LABEL_NUSES (nlabel);
 
   /* Update labels in any REG_EQUAL note.  */
   if ((note = find_reg_note (jump, REG_EQUAL, NULL_RTX)) != NULL_RTX)
     {
-      if (!nlabel || (invert && !invert_exp_1 (XEXP (note, 0), jump)))
+      if (ANY_RETURN_P (nlabel)
+	  || (invert && !invert_exp_1 (XEXP (note, 0), jump)))
 	remove_note (jump, note);
       else
 	{
@@ -1482,7 +1484,8 @@ redirect_jump_2 (rtx jump, rtx olabel, r
 	}
     }
 
-  if (olabel && --LABEL_NUSES (olabel) == 0 && delete_unused > 0
+  if (olabel && !ANY_RETURN_P (olabel)
+      && --LABEL_NUSES (olabel) == 0 && delete_unused > 0
       /* Undefined labels will remain outside the insn stream.  */
       && INSN_UID (olabel))
     delete_related_insns (olabel);
Index: gcc/print-rtl.c
===================================================================
--- gcc.orig/print-rtl.c
+++ gcc/print-rtl.c
@@ -314,9 +314,16 @@ print_rtx (const_rtx in_rtx)
 	      }
 	  }
 	else if (i == 8 && JUMP_P (in_rtx) && JUMP_LABEL (in_rtx) != NULL)
-	  /* Output the JUMP_LABEL reference.  */
-	  fprintf (outfile, "\n%s%*s -> %d", print_rtx_head, indent * 2, "",
-		   INSN_UID (JUMP_LABEL (in_rtx)));
+	  {
+	    /* Output the JUMP_LABEL reference.  */
+	    fprintf (outfile, "\n%s%*s -> ", print_rtx_head, indent * 2, "");
+	    if (GET_CODE (JUMP_LABEL (in_rtx)) == RETURN)
+	      fprintf (outfile, "return");
+	    else if (GET_CODE (JUMP_LABEL (in_rtx)) == SIMPLE_RETURN)
+	      fprintf (outfile, "simple_return");
+	    else
+	      fprintf (outfile, "%d", INSN_UID (JUMP_LABEL (in_rtx)));
+	  }
 	else if (i == 0 && GET_CODE (in_rtx) == VALUE)
 	  {
 #ifndef GENERATOR_FILE
Index: gcc/reorg.c
===================================================================
--- gcc.orig/reorg.c
+++ gcc/reorg.c
@@ -161,8 +161,11 @@ static rtx *unfilled_firstobj;
 #define unfilled_slots_next	\
   ((rtx *) obstack_next_free (&unfilled_slots_obstack))
 
-/* Points to the label before the end of the function.  */
-static rtx end_of_function_label;
+/* Points to the label before the end of the function, or before a
+   return insn.  */
+static rtx function_return_label;
+/* Likewise for a simple_return.  */
+static rtx function_simple_return_label;
 
 /* Mapping between INSN_UID's and position in the code since INSN_UID's do
    not always monotonically increase.  */
@@ -175,7 +178,7 @@ static int stop_search_p (rtx, int);
 static int resource_conflicts_p (struct resources *, struct resources *);
 static int insn_references_resource_p (rtx, struct resources *, bool);
 static int insn_sets_resource_p (rtx, struct resources *, bool);
-static rtx find_end_label (void);
+static rtx find_end_label (rtx);
 static rtx emit_delay_sequence (rtx, rtx, int);
 static rtx add_to_delay_list (rtx, rtx);
 static rtx delete_from_delay_slot (rtx);
@@ -220,6 +223,15 @@ static void relax_delay_slots (rtx);
 static void make_return_insns (rtx);
 #endif
 \f
+/* Return true iff INSN is a simplejump, or any kind of return insn.  */
+
+static bool
+simplejump_or_return_p (rtx insn)
+{
+  return (JUMP_P (insn)
+	  && (simplejump_p (insn) || ANY_RETURN_P (PATTERN (insn))));
+}
+\f
 /* Return TRUE if this insn should stop the search for insn to fill delay
    slots.  LABELS_P indicates that labels should terminate the search.
    In all cases, jumps terminate the search.  */
@@ -335,23 +347,29 @@ insn_sets_resource_p (rtx insn, struct r
 
    ??? There may be a problem with the current implementation.  Suppose
    we start with a bare RETURN insn and call find_end_label.  It may set
-   end_of_function_label just before the RETURN.  Suppose the machinery
+   function_return_label just before the RETURN.  Suppose the machinery
    is able to fill the delay slot of the RETURN insn afterwards.  Then
-   end_of_function_label is no longer valid according to the property
+   function_return_label is no longer valid according to the property
    described above and find_end_label will still return it unmodified.
    Note that this is probably mitigated by the following observation:
-   once end_of_function_label is made, it is very likely the target of
+   once function_return_label is made, it is very likely the target of
    a jump, so filling the delay slot of the RETURN will be much more
    difficult.  */
 
 static rtx
-find_end_label (void)
+find_end_label (rtx kind)
 {
   rtx insn;
+  rtx *plabel;
+
+  if (kind == ret_rtx)
+    plabel = &function_return_label;
+  else
+    plabel = &function_simple_return_label;
 
   /* If we found one previously, return it.  */
-  if (end_of_function_label)
-    return end_of_function_label;
+  if (*plabel)
+    return *plabel;
 
   /* Otherwise, see if there is a label at the end of the function.  If there
      is, it must be that RETURN insns aren't needed, so that is our return
@@ -366,44 +384,44 @@ find_end_label (void)
 
   /* When a target threads its epilogue we might already have a
      suitable return insn.  If so put a label before it for the
-     end_of_function_label.  */
+     function_return_label.  */
   if (BARRIER_P (insn)
       && JUMP_P (PREV_INSN (insn))
-      && GET_CODE (PATTERN (PREV_INSN (insn))) == RETURN)
+      && PATTERN (PREV_INSN (insn)) == kind)
     {
       rtx temp = PREV_INSN (PREV_INSN (insn));
-      end_of_function_label = gen_label_rtx ();
-      LABEL_NUSES (end_of_function_label) = 0;
+      rtx label = gen_label_rtx ();
+      LABEL_NUSES (label) = 0;
 
       /* Put the label before an USE insns that may precede the RETURN insn.  */
       while (GET_CODE (temp) == USE)
 	temp = PREV_INSN (temp);
 
-      emit_label_after (end_of_function_label, temp);
+      emit_label_after (label, temp);
+      *plabel = label;
     }
 
   else if (LABEL_P (insn))
-    end_of_function_label = insn;
+    *plabel = insn;
   else
     {
-      end_of_function_label = gen_label_rtx ();
-      LABEL_NUSES (end_of_function_label) = 0;
+      rtx label = gen_label_rtx ();
+      LABEL_NUSES (label) = 0;
       /* If the basic block reorder pass moves the return insn to
 	 some other place try to locate it again and put our
-	 end_of_function_label there.  */
-      while (insn && ! (JUMP_P (insn)
-		        && (GET_CODE (PATTERN (insn)) == RETURN)))
+	 function_return_label there.  */
+      while (insn && ! (JUMP_P (insn) && (PATTERN (insn) == kind)))
 	insn = PREV_INSN (insn);
       if (insn)
 	{
 	  insn = PREV_INSN (insn);
 
-	  /* Put the label before an USE insns that may proceed the
+	  /* Put the label before any USE insns that may precede the
 	     RETURN insn.  */
 	  while (GET_CODE (insn) == USE)
 	    insn = PREV_INSN (insn);
 
-	  emit_label_after (end_of_function_label, insn);
+	  emit_label_after (label, insn);
 	}
       else
 	{
@@ -413,19 +431,16 @@ find_end_label (void)
 	      && ! HAVE_return
 #endif
 	      )
-	    {
-	      /* The RETURN insn has its delay slot filled so we cannot
-		 emit the label just before it.  Since we already have
-		 an epilogue and cannot emit a new RETURN, we cannot
-		 emit the label at all.  */
-	      end_of_function_label = NULL_RTX;
-	      return end_of_function_label;
-	    }
+	    /* The RETURN insn has its delay slot filled so we cannot
+	       emit the label just before it.  Since we already have
+	       an epilogue and cannot emit a new RETURN, we cannot
+	       emit the label at all.  */
+	    return NULL_RTX;
 #endif /* HAVE_epilogue */
 
 	  /* Otherwise, make a new label and emit a RETURN and BARRIER,
 	     if needed.  */
-	  emit_label (end_of_function_label);
+	  emit_label (label);
 #ifdef HAVE_return
 	  /* We don't bother trying to create a return insn if the
 	     epilogue has filled delay-slots; we would have to try and
@@ -437,19 +452,21 @@ find_end_label (void)
 	      /* The return we make may have delay slots too.  */
 	      rtx insn = gen_return ();
 	      insn = emit_jump_insn (insn);
+	      JUMP_LABEL (insn) = ret_rtx;
 	      emit_barrier ();
 	      if (num_delay_slots (insn) > 0)
 		obstack_ptr_grow (&unfilled_slots_obstack, insn);
 	    }
 #endif
 	}
+      *plabel = label;
     }
 
   /* Show one additional use for this label so it won't go away until
      we are done.  */
-  ++LABEL_NUSES (end_of_function_label);
+  ++LABEL_NUSES (*plabel);
 
-  return end_of_function_label;
+  return *plabel;
 }
 \f
 /* Put INSN and LIST together in a SEQUENCE rtx of LENGTH, and replace
@@ -797,10 +814,8 @@ optimize_skip (rtx insn)
   if ((next_trial == next_active_insn (JUMP_LABEL (insn))
        && ! (next_trial == 0 && crtl->epilogue_delay_list != 0))
       || (next_trial != 0
-	  && JUMP_P (next_trial)
-	  && JUMP_LABEL (insn) == JUMP_LABEL (next_trial)
-	  && (simplejump_p (next_trial)
-	      || GET_CODE (PATTERN (next_trial)) == RETURN)))
+	  && simplejump_or_return_p (next_trial)
+	  && JUMP_LABEL (insn) == JUMP_LABEL (next_trial)))
     {
       if (eligible_for_annul_false (insn, 0, trial, flags))
 	{
@@ -819,13 +834,11 @@ optimize_skip (rtx insn)
 	 branch, thread our jump to the target of that branch.  Don't
 	 change this into a RETURN here, because it may not accept what
 	 we have in the delay slot.  We'll fix this up later.  */
-      if (next_trial && JUMP_P (next_trial)
-	  && (simplejump_p (next_trial)
-	      || GET_CODE (PATTERN (next_trial)) == RETURN))
+      if (next_trial && simplejump_or_return_p (next_trial))
 	{
 	  rtx target_label = JUMP_LABEL (next_trial);
-	  if (target_label == 0)
-	    target_label = find_end_label ();
+	  if (ANY_RETURN_P (target_label))
+	    target_label = find_end_label (target_label);
 
 	  if (target_label)
 	    {
@@ -866,7 +879,7 @@ get_jump_flags (rtx insn, rtx label)
   if (JUMP_P (insn)
       && (condjump_p (insn) || condjump_in_parallel_p (insn))
       && INSN_UID (insn) <= max_uid
-      && label != 0
+      && label != 0 && !ANY_RETURN_P (label)
       && INSN_UID (label) <= max_uid)
     flags
       = (uid_to_ruid[INSN_UID (label)] > uid_to_ruid[INSN_UID (insn)])
@@ -1038,7 +1051,7 @@ get_branch_condition (rtx insn, rtx targ
     pat = XVECEXP (pat, 0, 0);
 
   if (GET_CODE (pat) == RETURN)
-    return target == 0 ? const_true_rtx : 0;
+    return ANY_RETURN_P (target) ? const_true_rtx : 0;
 
   else if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
     return 0;
@@ -1358,8 +1371,7 @@ steal_delay_list_from_fallthrough (rtx i
   /* We can't do anything if SEQ's delay insn isn't an
      unconditional branch.  */
 
-  if (! simplejump_p (XVECEXP (seq, 0, 0))
-      && GET_CODE (PATTERN (XVECEXP (seq, 0, 0))) != RETURN)
+  if (! simplejump_or_return_p (XVECEXP (seq, 0, 0)))
     return delay_list;
 
   for (i = 1; i < XVECLEN (seq, 0); i++)
@@ -2245,7 +2257,8 @@ fill_simple_delay_slots (int non_jumps_p
 	  && (!JUMP_P (insn)
 	      || ((condjump_p (insn) || condjump_in_parallel_p (insn))
 		  && ! simplejump_p (insn)
-		  && JUMP_LABEL (insn) != 0)))
+		  && JUMP_LABEL (insn) != 0
+		  && !ANY_RETURN_P (JUMP_LABEL (insn)))))
 	{
 	  /* Invariant: If insn is a JUMP_INSN, the insn's jump
 	     label.  Otherwise, zero.  */
@@ -2270,7 +2283,7 @@ fill_simple_delay_slots (int non_jumps_p
 		target = JUMP_LABEL (insn);
 	    }
 
-	  if (target == 0)
+	  if (target == 0 || ANY_RETURN_P (target))
 	    for (trial = next_nonnote_insn (insn); trial; trial = next_trial)
 	      {
 		next_trial = next_nonnote_insn (trial);
@@ -2349,6 +2362,7 @@ fill_simple_delay_slots (int non_jumps_p
 	      && JUMP_P (trial)
 	      && simplejump_p (trial)
 	      && (target == 0 || JUMP_LABEL (trial) == target)
+	      && !ANY_RETURN_P (JUMP_LABEL (trial))
 	      && (next_trial = next_active_insn (JUMP_LABEL (trial))) != 0
 	      && ! (NONJUMP_INSN_P (next_trial)
 		    && GET_CODE (PATTERN (next_trial)) == SEQUENCE)
@@ -2371,7 +2385,7 @@ fill_simple_delay_slots (int non_jumps_p
 	      if (new_label != 0)
 		new_label = get_label_before (new_label);
 	      else
-		new_label = find_end_label ();
+		new_label = find_end_label (simple_return_rtx);
 
 	      if (new_label)
 	        {
@@ -2503,7 +2517,8 @@ fill_simple_delay_slots (int non_jumps_p
 \f
 /* Follow any unconditional jump at LABEL;
    return the ultimate label reached by any such chain of jumps.
-   Return null if the chain ultimately leads to a return instruction.
+   Return a suitable return rtx if the chain ultimately leads to a
+   return instruction.
    If LABEL is not followed by a jump, return LABEL.
    If the chain loops or we can't find end, return LABEL,
    since that tells caller to avoid changing the insn.  */
@@ -2518,6 +2533,7 @@ follow_jumps (rtx label)
 
   for (depth = 0;
        (depth < 10
+	&& !ANY_RETURN_P (value)
 	&& (insn = next_active_insn (value)) != 0
 	&& JUMP_P (insn)
 	&& ((JUMP_LABEL (insn) != 0 && any_uncondjump_p (insn)
@@ -2527,18 +2543,22 @@ follow_jumps (rtx label)
 	&& BARRIER_P (next));
        depth++)
     {
-      rtx tem;
+      rtx this_label = JUMP_LABEL (insn);
 
       /* If we have found a cycle, make the insn jump to itself.  */
-      if (JUMP_LABEL (insn) == label)
+      if (this_label == label)
 	return label;
 
-      tem = next_active_insn (JUMP_LABEL (insn));
-      if (tem && (GET_CODE (PATTERN (tem)) == ADDR_VEC
+      if (!ANY_RETURN_P (this_label))
+	{
+	  rtx tem = next_active_insn (this_label);
+	  if (tem
+	      && (GET_CODE (PATTERN (tem)) == ADDR_VEC
 		  || GET_CODE (PATTERN (tem)) == ADDR_DIFF_VEC))
-	break;
+	    break;
+	}
 
-      value = JUMP_LABEL (insn);
+      value = this_label;
     }
   if (depth == 10)
     return label;
@@ -2985,16 +3005,14 @@ fill_slots_from_thread (rtx insn, rtx co
 
       gcc_assert (thread_if_true);
 
-      if (new_thread && JUMP_P (new_thread)
-	  && (simplejump_p (new_thread)
-	      || GET_CODE (PATTERN (new_thread)) == RETURN)
+      if (new_thread && simplejump_or_return_p (new_thread)
 	  && redirect_with_delay_list_safe_p (insn,
 					      JUMP_LABEL (new_thread),
 					      delay_list))
 	new_thread = follow_jumps (JUMP_LABEL (new_thread));
 
-      if (new_thread == 0)
-	label = find_end_label ();
+      if (ANY_RETURN_P (new_thread))
+	label = find_end_label (new_thread);
       else if (LABEL_P (new_thread))
 	label = new_thread;
       else
@@ -3340,11 +3358,12 @@ relax_delay_slots (rtx first)
 	 group of consecutive labels.  */
       if (JUMP_P (insn)
 	  && (condjump_p (insn) || condjump_in_parallel_p (insn))
-	  && (target_label = JUMP_LABEL (insn)) != 0)
+	  && (target_label = JUMP_LABEL (insn)) != 0
+	  && !ANY_RETURN_P (target_label))
 	{
 	  target_label = skip_consecutive_labels (follow_jumps (target_label));
-	  if (target_label == 0)
-	    target_label = find_end_label ();
+	  if (ANY_RETURN_P (target_label))
+	    target_label = find_end_label (target_label);
 
 	  if (target_label && next_active_insn (target_label) == next
 	      && ! condjump_in_parallel_p (insn))
@@ -3359,9 +3378,8 @@ relax_delay_slots (rtx first)
 	  /* See if this jump conditionally branches around an unconditional
 	     jump.  If so, invert this jump and point it to the target of the
 	     second jump.  */
-	  if (next && JUMP_P (next)
+	  if (next && simplejump_or_return_p (next)
 	      && any_condjump_p (insn)
-	      && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN)
 	      && target_label
 	      && next_active_insn (target_label) == next_active_insn (next)
 	      && no_labels_between_p (insn, next))
@@ -3376,7 +3394,7 @@ relax_delay_slots (rtx first)
 		 invert_jump fails.  */
 
 	      ++LABEL_NUSES (target_label);
-	      if (label)
+	      if (label && LABEL_P (label))
 		++LABEL_NUSES (label);
 
 	      if (invert_jump (insn, label, 1))
@@ -3385,7 +3403,7 @@ relax_delay_slots (rtx first)
 		  next = insn;
 		}
 
-	      if (label)
+	      if (label && LABEL_P (label))
 		--LABEL_NUSES (label);
 
 	      if (--LABEL_NUSES (target_label) == 0)
@@ -3403,8 +3421,7 @@ relax_delay_slots (rtx first)
 	 Don't do this if we expect the conditional branch to be true, because
 	 we would then be making the more common case longer.  */
 
-      if (JUMP_P (insn)
-	  && (simplejump_p (insn) || GET_CODE (PATTERN (insn)) == RETURN)
+      if (simplejump_or_return_p (insn)
 	  && (other = prev_active_insn (insn)) != 0
 	  && any_condjump_p (other)
 	  && no_labels_between_p (other, insn)
@@ -3445,10 +3462,10 @@ relax_delay_slots (rtx first)
 	 Only do so if optimizing for size since this results in slower, but
 	 smaller code.  */
       if (optimize_function_for_size_p (cfun)
-	  && GET_CODE (PATTERN (delay_insn)) == RETURN
+	  && ANY_RETURN_P (PATTERN (delay_insn))
 	  && next
 	  && JUMP_P (next)
-	  && GET_CODE (PATTERN (next)) == RETURN)
+	  && PATTERN (next) == PATTERN (delay_insn))
 	{
 	  rtx after;
 	  int i;
@@ -3487,14 +3504,16 @@ relax_delay_slots (rtx first)
 	continue;
 
       target_label = JUMP_LABEL (delay_insn);
+      if (target_label && ANY_RETURN_P (target_label))
+	continue;
 
       if (target_label)
 	{
 	  /* If this jump goes to another unconditional jump, thread it, but
 	     don't convert a jump into a RETURN here.  */
 	  trial = skip_consecutive_labels (follow_jumps (target_label));
-	  if (trial == 0)
-	    trial = find_end_label ();
+	  if (ANY_RETURN_P (trial))
+	    trial = find_end_label (trial);
 
 	  if (trial && trial != target_label
 	      && redirect_with_delay_slots_safe_p (delay_insn, trial, insn))
@@ -3517,7 +3536,7 @@ relax_delay_slots (rtx first)
 		 later incorrectly compute register live/death info.  */
 	      rtx tmp = next_active_insn (trial);
 	      if (tmp == 0)
-		tmp = find_end_label ();
+		tmp = find_end_label (simple_return_rtx);
 
 	      if (tmp)
 	        {
@@ -3537,14 +3556,12 @@ relax_delay_slots (rtx first)
 	     delay list and that insn is redundant, thread the jump.  */
 	  if (trial && GET_CODE (PATTERN (trial)) == SEQUENCE
 	      && XVECLEN (PATTERN (trial), 0) == 2
-	      && JUMP_P (XVECEXP (PATTERN (trial), 0, 0))
-	      && (simplejump_p (XVECEXP (PATTERN (trial), 0, 0))
-		  || GET_CODE (PATTERN (XVECEXP (PATTERN (trial), 0, 0))) == RETURN)
+	      && simplejump_or_return_p (XVECEXP (PATTERN (trial), 0, 0))
 	      && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0))
 	    {
 	      target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0));
-	      if (target_label == 0)
-		target_label = find_end_label ();
+	      if (ANY_RETURN_P (target_label))
+		target_label = find_end_label (target_label);
 
 	      if (target_label
 	          && redirect_with_delay_slots_safe_p (delay_insn, target_label,
@@ -3622,16 +3639,15 @@ relax_delay_slots (rtx first)
 	 a RETURN here.  */
       if (! INSN_ANNULLED_BRANCH_P (delay_insn)
 	  && any_condjump_p (delay_insn)
-	  && next && JUMP_P (next)
-	  && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN)
+	  && next && simplejump_or_return_p (next)
 	  && next_active_insn (target_label) == next_active_insn (next)
 	  && no_labels_between_p (insn, next))
 	{
 	  rtx label = JUMP_LABEL (next);
 	  rtx old_label = JUMP_LABEL (delay_insn);
 
-	  if (label == 0)
-	    label = find_end_label ();
+	  if (ANY_RETURN_P (label))
+	    label = find_end_label (label);
 
 	  /* find_end_label can generate a new label. Check this first.  */
 	  if (label
@@ -3692,7 +3708,8 @@ static void
 make_return_insns (rtx first)
 {
   rtx insn, jump_insn, pat;
-  rtx real_return_label = end_of_function_label;
+  rtx real_return_label = function_return_label;
+  rtx real_simple_return_label = function_simple_return_label;
   int slots, i;
 
 #ifdef DELAY_SLOTS_FOR_EPILOGUE
@@ -3710,15 +3727,22 @@ make_return_insns (rtx first)
      made for END_OF_FUNCTION_LABEL.  If so, set up anything we can't change
      into a RETURN to jump to it.  */
   for (insn = first; insn; insn = NEXT_INSN (insn))
-    if (JUMP_P (insn) && GET_CODE (PATTERN (insn)) == RETURN)
+    if (JUMP_P (insn) && ANY_RETURN_P (PATTERN (insn)))
       {
-	real_return_label = get_label_before (insn);
+	rtx t = get_label_before (insn);
+	if (PATTERN (insn) == ret_rtx)
+	  real_return_label = t;
+	else
+	  real_simple_return_label = t;
 	break;
       }
 
   /* Show an extra usage of REAL_RETURN_LABEL so it won't go away if it
      was equal to END_OF_FUNCTION_LABEL.  */
-  LABEL_NUSES (real_return_label)++;
+  if (real_return_label)
+    LABEL_NUSES (real_return_label)++;
+  if (real_simple_return_label)
+    LABEL_NUSES (real_simple_return_label)++;
 
   /* Clear the list of insns to fill so we can use it.  */
   obstack_free (&unfilled_slots_obstack, unfilled_firstobj);
@@ -3726,13 +3750,27 @@ make_return_insns (rtx first)
   for (insn = first; insn; insn = NEXT_INSN (insn))
     {
       int flags;
+      rtx kind, real_label;
 
       /* Only look at filled JUMP_INSNs that go to the end of function
 	 label.  */
       if (!NONJUMP_INSN_P (insn)
 	  || GET_CODE (PATTERN (insn)) != SEQUENCE
-	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0))
-	  || JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) != end_of_function_label)
+	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0)))
+	continue;
+
+      if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) == function_return_label)
+	{
+	  kind = ret_rtx;
+	  real_label = real_return_label;
+	}
+      else if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0))
+	       == function_simple_return_label)
+	{
+	  kind = simple_return_rtx;
+	  real_label = real_simple_return_label;
+	}
+      else
 	continue;
 
       pat = PATTERN (insn);
@@ -3740,14 +3778,12 @@ make_return_insns (rtx first)
 
       /* If we can't make the jump into a RETURN, try to redirect it to the best
 	 RETURN and go on to the next insn.  */
-      if (! reorg_redirect_jump (jump_insn, NULL_RTX))
+      if (! reorg_redirect_jump (jump_insn, kind))
 	{
 	  /* Make sure redirecting the jump will not invalidate the delay
 	     slot insns.  */
-	  if (redirect_with_delay_slots_safe_p (jump_insn,
-						real_return_label,
-						insn))
-	    reorg_redirect_jump (jump_insn, real_return_label);
+	  if (redirect_with_delay_slots_safe_p (jump_insn, real_label, insn))
+	    reorg_redirect_jump (jump_insn, real_label);
 	  continue;
 	}
 
@@ -3787,7 +3823,7 @@ make_return_insns (rtx first)
 	 RETURN, delete the SEQUENCE and output the individual insns,
 	 followed by the RETURN.  Then set things up so we try to find
 	 insns for its delay slots, if it needs some.  */
-      if (GET_CODE (PATTERN (jump_insn)) == RETURN)
+      if (ANY_RETURN_P (PATTERN (jump_insn)))
 	{
 	  rtx prev = PREV_INSN (insn);
 
@@ -3804,13 +3840,16 @@ make_return_insns (rtx first)
       else
 	/* It is probably more efficient to keep this with its current
 	   delay slot as a branch to a RETURN.  */
-	reorg_redirect_jump (jump_insn, real_return_label);
+	reorg_redirect_jump (jump_insn, real_label);
     }
 
   /* Now delete REAL_RETURN_LABEL if we never used it.  Then try to fill any
      new delay slots we have created.  */
-  if (--LABEL_NUSES (real_return_label) == 0)
+  if (real_return_label != NULL_RTX && --LABEL_NUSES (real_return_label) == 0)
     delete_related_insns (real_return_label);
+  if (real_simple_return_label != NULL_RTX
+      && --LABEL_NUSES (real_simple_return_label) == 0)
+    delete_related_insns (real_simple_return_label);
 
   fill_simple_delay_slots (1);
   fill_simple_delay_slots (0);
@@ -3878,7 +3917,7 @@ dbr_schedule (rtx first)
   init_resource_info (epilogue_insn);
 
   /* Show we haven't computed an end-of-function label yet.  */
-  end_of_function_label = 0;
+  function_return_label = function_simple_return_label = NULL_RTX;
 
   /* Initialize the statistics for this function.  */
   memset (num_insns_needing_delays, 0, sizeof num_insns_needing_delays);
@@ -3900,11 +3939,23 @@ dbr_schedule (rtx first)
   /* If we made an end of function label, indicate that it is now
      safe to delete it by undoing our prior adjustment to LABEL_NUSES.
      If it is now unused, delete it.  */
-  if (end_of_function_label && --LABEL_NUSES (end_of_function_label) == 0)
-    delete_related_insns (end_of_function_label);
+  if (function_return_label && --LABEL_NUSES (function_return_label) == 0)
+    delete_related_insns (function_return_label);
+  if (function_simple_return_label
+      && --LABEL_NUSES (function_simple_return_label) == 0)
+    delete_related_insns (function_simple_return_label);
 
+#if defined HAVE_return || defined HAVE_simple_return
+  if (
 #ifdef HAVE_return
-  if (HAVE_return && end_of_function_label != 0)
+      (HAVE_return && function_return_label != 0)
+#else
+      0
+#endif
+#ifdef HAVE_simple_return
+      || (HAVE_simple_return && function_simple_return_label != 0)
+#endif
+      )
     make_return_insns (first);
 #endif
 
Index: gcc/resource.c
===================================================================
--- gcc.orig/resource.c
+++ gcc/resource.c
@@ -495,6 +495,8 @@ find_dead_or_set_registers (rtx target, 
 		  || GET_CODE (PATTERN (this_jump_insn)) == RETURN)
 		{
 		  next = JUMP_LABEL (this_jump_insn);
+		  if (next && ANY_RETURN_P (next))
+		    next = NULL_RTX;
 		  if (jump_insn == 0)
 		    {
 		      jump_insn = insn;
@@ -562,9 +564,10 @@ find_dead_or_set_registers (rtx target, 
 		  AND_COMPL_HARD_REG_SET (scratch, needed.regs);
 		  AND_COMPL_HARD_REG_SET (fallthrough_res.regs, scratch);
 
-		  find_dead_or_set_registers (JUMP_LABEL (this_jump_insn),
-					      &target_res, 0, jump_count,
-					      target_set, needed);
+		  if (!ANY_RETURN_P (JUMP_LABEL (this_jump_insn)))
+		    find_dead_or_set_registers (JUMP_LABEL (this_jump_insn),
+						&target_res, 0, jump_count,
+						target_set, needed);
 		  find_dead_or_set_registers (next,
 					      &fallthrough_res, 0, jump_count,
 					      set, needed);
@@ -1097,6 +1100,8 @@ mark_target_live_regs (rtx insns, rtx ta
       struct resources new_resources;
       rtx stop_insn = next_active_insn (jump_insn);
 
+      if (jump_target && ANY_RETURN_P (jump_target))
+	jump_target = NULL_RTX;
       mark_target_live_regs (insns, next_active_insn (jump_target),
 			     &new_resources);
       CLEAR_RESOURCE (&set);
Index: gcc/rtl.c
===================================================================
--- gcc.orig/rtl.c
+++ gcc/rtl.c
@@ -256,6 +256,7 @@ copy_rtx (rtx orig)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return orig;
Index: gcc/rtl.def
===================================================================
--- gcc.orig/rtl.def
+++ gcc/rtl.def
@@ -296,6 +296,10 @@ DEF_RTL_EXPR(CALL, "call", "ee", RTX_EXT
 
 DEF_RTL_EXPR(RETURN, "return", "", RTX_EXTRA)
 
+/* A plain return, to be used on paths that are reached without going
+   through the function prologue.  */
+DEF_RTL_EXPR(SIMPLE_RETURN, "simple_return", "", RTX_EXTRA)
+
 /* Special for EH return from subroutine.  */
 
 DEF_RTL_EXPR(EH_RETURN, "eh_return", "", RTX_EXTRA)
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -412,6 +412,10 @@ struct GTY((variable_size)) rtvec_def {
   (JUMP_P (INSN) && (GET_CODE (PATTERN (INSN)) == ADDR_VEC || \
 		     GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC))
 
+/* Predicate yielding nonzero iff X is a return or simple_return.  */
+#define ANY_RETURN_P(X) \
+  (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)
+
 /* 1 if X is a unary operator.  */
 
 #define UNARY_P(X)   \
@@ -2046,6 +2050,7 @@ enum global_rtl_index
   GR_PC,
   GR_CC0,
   GR_RETURN,
+  GR_SIMPLE_RETURN,
   GR_STACK_POINTER,
   GR_FRAME_POINTER,
 /* For register elimination to work properly these hard_frame_pointer_rtx,
@@ -2136,6 +2141,7 @@ extern struct target_rtl *this_target_rt
 /* Standard pieces of rtx, to be substituted directly into things.  */
 #define pc_rtx                  (global_rtl[GR_PC])
 #define ret_rtx                 (global_rtl[GR_RETURN])
+#define simple_return_rtx       (global_rtl[GR_SIMPLE_RETURN])
 #define cc0_rtx                 (global_rtl[GR_CC0])
 
 /* All references to certain hard regs, except those created
Index: gcc/rtlanal.c
===================================================================
--- gcc.orig/rtlanal.c
+++ gcc/rtlanal.c
@@ -2662,6 +2662,7 @@ tablejump_p (const_rtx insn, rtx *labelp
 
   if (JUMP_P (insn)
       && (label = JUMP_LABEL (insn)) != NULL_RTX
+      && !ANY_RETURN_P (label)
       && (table = next_active_insn (label)) != NULL_RTX
       && JUMP_TABLE_DATA_P (table))
     {
Index: gcc/sched-vis.c
===================================================================
--- gcc.orig/sched-vis.c
+++ gcc/sched-vis.c
@@ -554,6 +554,9 @@ print_pattern (char *buf, const_rtx x, i
     case RETURN:
       sprintf (buf, "return");
       break;
+    case SIMPLE_RETURN:
+      sprintf (buf, "simple_return");
+      break;
     case CALL:
       print_exp (buf, x, verbose);
       break;
Index: gcc/gengenrtl.c
===================================================================
--- gcc.orig/gengenrtl.c
+++ gcc/gengenrtl.c
@@ -131,6 +131,7 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "PC") == 0
 	  || strcmp (defs[idx].enumname, "CC0") == 0
 	  || strcmp (defs[idx].enumname, "RETURN") == 0
+	  || strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
 	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
 }
 
Index: gcc/common.opt
===================================================================
--- gcc.orig/common.opt
+++ gcc/common.opt
@@ -1718,6 +1718,11 @@ fshow-column
 Common Report Var(flag_show_column) Init(1)
 Show column numbers in diagnostics, when available.  Default on
 
+fshrink-wrap
+Common Report Var(flag_shrink_wrap) Optimization
+Emit function prologues only before parts of the function that need it,
+rather than at the top of the function.
+
 fsignaling-nans
 Common Report Var(flag_signaling_nans) Optimization SetByCombined
 Disable optimizations observable by IEEE signaling NaNs
Index: gcc/opts.c
===================================================================
--- gcc.orig/opts.c
+++ gcc/opts.c
@@ -442,6 +442,7 @@ static const struct default_options defa
     { OPT_LEVELS_1_PLUS, OPT_fipa_reference, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fipa_profile, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fmerge_constants, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fshrink_wrap, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_ftree_bit_ccp, NULL, 1 },
Index: gcc/doc/invoke.texi
===================================================================
--- gcc.orig/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -384,10 +384,10 @@ Objective-C and Objective-C++ Dialects}.
 -fschedule-insns -fschedule-insns2 -fsection-anchors @gol
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
--fsignaling-nans -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
--fsplit-wide-types -fstack-protector -fstack-protector-all @gol
--fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
--ftree-bit-ccp @gol
+-fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol
+-fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
+-fstack-protector-all -fstrict-aliasing -fstrict-overflow @gol
+-fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -6708,6 +6708,12 @@ This option has no effect until one of @
 When pipelining loops during selective scheduling, also pipeline outer loops.
 This option has no effect until @option{-fsel-sched-pipelining} is turned on.
 
+@item -fshrink-wrap
+@opindex fshrink-wrap
+Emit function prologues only before parts of the function that need it,
+rather than at the top of the function.  This flag is enabled by default at
+@option{-O} and higher.
+
 @item -fcaller-saves
 @opindex fcaller-saves
 Enable values to be allocated in registers that will be clobbered by

^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH 6/6] A testcase
  2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
                   ` (4 preceding siblings ...)
  2011-03-23 14:56 ` [PATCH 4/6] Shrink-wrapping Bernd Schmidt
@ 2011-03-23 14:57 ` Bernd Schmidt
  5 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 14:57 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 180 bytes --]

I don't see a good way to scan assembly output for this optimization, so
I've just added the following testcase based on scanning the dump file.
Better ideas are welcome.


Bernd


[-- Attachment #2: tests.diff --]
[-- Type: text/plain, Size: 1329 bytes --]

	* function.c (thread_prologue_and_epilogue_insns): Emit information
	about shrink wrapping in the dump file.

	* gcc.target/i386/sw-1.c: New test.

Index: gcc/function.c
===================================================================
--- gcc.orig/function.c
+++ gcc/function.c
@@ -5752,6 +5752,9 @@ thread_prologue_and_epilogue_insns (void
       if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered))
 	entry_edge = orig_entry_edge;
 
+      if (dump_file && entry_edge != orig_entry_edge)
+	fprintf (dump_file, "Prologue moved down by shrink-wrapping.\n");
+
     fail_shrinkwrap:
       bitmap_clear (&bb_antic_flags);
       bitmap_clear (&bb_on_list);
Index: gcc/testsuite/gcc.target/i386/sw-1.c
===================================================================
--- /dev/null
+++ gcc/testsuite/gcc.target/i386/sw-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fshrink-wrap -fdump-rtl-pro_and_epilogue" } */
+
+#include <string.h>
+
+int c;
+int x[2000];
+__attribute__((regparm(1))) int foo (int a, int b)
+ {
+   int t[200];
+   if (a == 0)
+     return 1;
+   if (c == 0)
+     return 2;
+   memcpy (t, x + b, sizeof t);
+   return t[a];
+ }
+
+/* { dg-final { scan-rtl-dump "Prologue moved down" "pro_and_epilogue" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */


* Re: [PATCH 5/6] Generate more shrink-wrapping opportunities
  2011-03-23 14:56 ` [PATCH 5/6] Generate more shrink-wrapping opportunities Bernd Schmidt
@ 2011-03-23 15:03   ` Jeff Law
  2011-03-23 15:05     ` Bernd Schmidt
  2011-03-31 13:26   ` Jeff Law
  1 sibling, 1 reply; 73+ messages in thread
From: Jeff Law @ 2011-03-23 15:03 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/11 08:55, Bernd Schmidt wrote:
> The first basic block contains insns to move incoming argument registers
> to pseudos. When these pseudos live across calls, they get allocated to
> call-saved registers. This in turn disables shrink-wrapping, since the
> move instruction requires the prologue (saving the call-saved reg) to
> occur before it.
> 
> This patch addresses the problem by moving such moves downwards through
> the CFG until we find a place where the destination is used or the
> incoming argument is clobbered.
FWIW, downward motion of the moves out of arg registers (or loads from
arg slots) is definitely a good thing.  This was a regular source of
unnecessary register pressure leading to spills in codes I've looked at.

I hope your sinking code works better than the quick and dirty one I
wrote but never contributed.

jeff


* Re: [PATCH 5/6] Generate more shrink-wrapping opportunities
  2011-03-23 15:03   ` Jeff Law
@ 2011-03-23 15:05     ` Bernd Schmidt
  2011-03-23 15:18       ` Jeff Law
  0 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 15:05 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On 03/23/2011 04:03 PM, Jeff Law wrote:
> On 03/23/11 08:55, Bernd Schmidt wrote:
>> The first basic block contains insns to move incoming argument registers
>> to pseudos. When these pseudos live across calls, they get allocated to
>> call-saved registers. This in turn disables shrink-wrapping, since the
>> move instruction requires the prologue (saving the call-saved reg) to
>> occur before it.
> 
>> This patch addresses the problem by moving such moves downwards through
>> the CFG until we find a place where the destination is used or the
>> incoming argument is clobbered.
> FWIW, downward motion of the moves out of arg registers (or loads from
> arg slots) is definitely a good thing.  This was a regular source of
> unnecessary register pressure leading to spills in codes I've looked at.
> 
> I hope your sinking code works better than the quick and dirty one I
> wrote but never contributed.

Sadly I'm doing it after register allocation, so it wouldn't help with
your problem.


Bernd


* Re: [PATCH 5/6] Generate more shrink-wrapping opportunities
  2011-03-23 15:05     ` Bernd Schmidt
@ 2011-03-23 15:18       ` Jeff Law
  0 siblings, 0 replies; 73+ messages in thread
From: Jeff Law @ 2011-03-23 15:18 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/11 09:04, Bernd Schmidt wrote:
> On 03/23/2011 04:03 PM, Jeff Law wrote:
>> On 03/23/11 08:55, Bernd Schmidt wrote:
>>> The first basic block contains insns to move incoming argument registers
>>> to pseudos. When these pseudos live across calls, they get allocated to
>>> call-saved registers. This in turn disables shrink-wrapping, since the
>>> move instruction requires the prologue (saving the call-saved reg) to
>>> occur before it.
>>
>>> This patch addresses the problem by moving such moves downwards through
>>> the CFG until we find a place where the destination is used or the
>>> incoming argument is clobbered.
>> FWIW, downward motion of the moves out of arg registers (or loads from
>> arg slots) is definitely a good thing.  This was a regular source of
>> unnecessary register pressure leading to spills in codes I've looked at.
>>
>> I hope your sinking code works better than the quick and dirty one I
>> wrote but never contributed.
> 
> Sadly I'm doing it after register allocation, so it wouldn't help with
> your problem.
That'll still help :-)  I can run your sinking code, then use the
existing IRA callbacks to attempt to allocate any pseudos which didn't
previously get hard regs.  It's actually quite easy.

jeff


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 14:51 ` [PATCH 3/6] Allow jumps in epilogues Bernd Schmidt
@ 2011-03-23 16:46   ` Richard Henderson
  2011-03-23 16:49     ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Richard Henderson @ 2011-03-23 16:46 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/2011 07:50 AM, Bernd Schmidt wrote:
> dwarf2out has code that starts scanning from NOTE_INSN_EPILOGUE_BEG
> until it finds the return jump. When there is common code in several
> blocks ending in a return, we might want to share this, and in that case
> it would be possible to encounter a simplejump rather than a returnjump.
> This should be safe, and the following patch allows it.

With no more code than this, I cannot believe you're generating correct
unwind info anymore.

It would be possible to handle code merging including epilogue blocks
if (and IMO only if) you track unwind state on a per-block basis, and
propagate this information around the CFG, finally linearizing this
when blocks are re-ordered for the last time before final.

At present, sadly, we assume steady-state for the unwind info except
before PROLOGUE_END and after EPILOGUE_BEG.


r~


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 16:46   ` Richard Henderson
@ 2011-03-23 16:49     ` Bernd Schmidt
  2011-03-23 17:19       ` Richard Henderson
  0 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 16:49 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On 03/23/2011 05:46 PM, Richard Henderson wrote:
> On 03/23/2011 07:50 AM, Bernd Schmidt wrote:
>> dwarf2out has code that starts scanning from NOTE_INSN_EPILOGUE_BEG
>> until it finds the return jump. When there is common code in several
>> blocks ending in a return, we might want to share this, and in that case
>> it would be possible to encounter a simplejump rather than a returnjump.
>> This should be safe, and the following patch allows it.
> 
> With no more code than this, I cannot believe you're generating correct
> unwind info anymore.

Why not? Are you worried about the code at the destination of the jump?
That should be preceded by another block falling through into it which
also has a NOTE_INSN_EPILOGUE_BEG.

If that isn't the problem you have in mind, what is and how can we test
for it?


Bernd


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 16:49     ` Bernd Schmidt
@ 2011-03-23 17:19       ` Richard Henderson
  2011-03-23 17:24         ` Bernd Schmidt
  2011-03-25 17:51         ` Bernd Schmidt
  0 siblings, 2 replies; 73+ messages in thread
From: Richard Henderson @ 2011-03-23 17:19 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/2011 09:48 AM, Bernd Schmidt wrote:
>> With no more code than this, I cannot believe you're generating correct
>> unwind info anymore.
> 
> Why not? Are you worried about the code at the destination of the jump?
> That should be preceded by another block falling through into it which
> also has a NOTE_INSN_EPILOGUE_BEG.
> 
> If that isn't the problem you have in mind, what is and how can we test
> for it?

	body
	body
	restore r1	XXX
	restore r2	XXX
	jmp L2		XXX

L1:	body		YYY
	body		YYY
	restore r2

L2:	restore r3
	return

Assume for the moment that "restore" on this target is something that
can't be delayed or repeated.  E.g. a pop, rather than a move which
leaves the saved value in memory at a unknown offset from the CFA.

This means we have to emit unwind directives immediately after the 
restore insn and cannot delay the epilogue unwind until we deallocate
the entire stack frame.

This means that your patch either gets the unwind info wrong for
the XXX sequence or the YYY sequence.

Correct unwind info would look like

	body
	body
	.cfi_remember_state
	restore r1
	.cfi_restore r1
	restore r2
	.cfi_restore r2
	jmp L2
	.cfi_restore_state

L1:	body
	body
	restore r2
	.cfi_restore r2

L2:	// validate the unwind info across the CFG making sure that the incoming
	// edges contain the same unwind info here.
	restore r3
	.cfi_restore r3
	return

In general, with shrink-wrapping, we can have essentially arbitrary
differences in unwind info between blocks that are sequential.  We have
to be prepared to fully adjust the unwind state between blocks.

Assume a { x } is the set of registers saved into the stack frame in a
given block.  We have both incoming and outgoing sets.

foo:				// in: { }
	cmp	r1,r2
	jne	L1		// out: { }

L0:				// in: { }
	save r8
	save r9
	body
	...			// out: { r8, r9 }

L2:				// in: { r8, r9, r10 }
	body
	body
	...			// out: { r8, r9, r10 }

L1:				// in: { }
	save r8
	save r9
	save r10
	body
	...			// out: { r8, r9, r10 }

L3:				// in: { r8, r9, r10 }
	restore r10		// out: { r8, r9 }

L4:				// in: { r8, r9 }
	restore r9
	restore r8
	return

This layout requires more than just .cfi_remember_state/restore_state
between blocks.  We have to be prepared to emit full unwind info at
any point.  Assume cfi info marked with XXX exists between basic blocks
to fixup the transition points:

L0:	save r8
	save r9
	.cfi_offset r8,x
	.cfi_offset r9,y
	body
	
	.cfi_offset r10,z	XXX

L2:	body
	body

	.cfi_restore r8		XXX
	.cfi_restore r9		XXX
	.cfi_restore r10	XXX

L1:	save r8
	save r9
	save r10
	.cfi_offset r8,x
	.cfi_offset r9,y
	.cfi_offset r10,z
	body

If this isn't clear, please ask questions.  The problem of unwinding is
way more complicated than what you appear to be assuming.


r~


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 17:19       ` Richard Henderson
@ 2011-03-23 17:24         ` Bernd Schmidt
  2011-03-23 17:27           ` Richard Henderson
  2011-03-25 17:51         ` Bernd Schmidt
  1 sibling, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-23 17:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On 03/23/2011 06:19 PM, Richard Henderson wrote:
> 	body
> 	body
> 	restore r1	XXX
> 	restore r2	XXX
> 	jmp L2		XXX
> 
> L1:	body		YYY
> 	body		YYY
> 	restore r2
> 
> L2:	restore r3
> 	return

> In general, with shrink-wrapping, we can have essentially arbitrary
> differences in unwind info between blocks that are sequential.

I don't think this can actually happen with the current implementation.
There is only one prologue, and all epilogues (the normal one and the
sibcall epilogues) match it exactly. I don't believe we can generate
code as in the example above, both before and after my patch.


Bernd


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 17:24         ` Bernd Schmidt
@ 2011-03-23 17:27           ` Richard Henderson
  2011-03-24 10:30             ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Richard Henderson @ 2011-03-23 17:27 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/2011 10:22 AM, Bernd Schmidt wrote:
> On 03/23/2011 06:19 PM, Richard Henderson wrote:
>> 	body
>> 	body
>> 	restore r1	XXX
>> 	restore r2	XXX
>> 	jmp L2		XXX
>>
>> L1:	body		YYY
>> 	body		YYY
>> 	restore r2
>>
>> L2:	restore r3
>> 	return
> 
>> In general, with shrink-wrapping, we can have essentially arbitrary
>> differences in unwind info between blocks that are sequential.
> 
> I don't think this can actually happen with the current implementation.
> There is only one prologue, and all epilogues (the normal one and the
> sibcall epilogues) match it exactly. I don't believe we can generate
> code as in the example above, both before and after my patch.

Um.. then what's this "allow jumps in epilogues" thing of which you speak?
If there's a jump, then it goes somewhere, and branches over something.
I see no constraints on what that something might be.

Could you give an example of a transformation that is allowed by this?



r~


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 17:27           ` Richard Henderson
@ 2011-03-24 10:30             ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-24 10:30 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On 03/23/2011 06:27 PM, Richard Henderson wrote:
> On 03/23/2011 10:22 AM, Bernd Schmidt wrote:
>> On 03/23/2011 06:19 PM, Richard Henderson wrote:
>>> 	body
>>> 	body
>>> 	restore r1	XXX
>>> 	restore r2	XXX
>>> 	jmp L2		XXX
>>>
>>> L1:	body		YYY
>>> 	body		YYY
>>> 	restore r2
>>>
>>> L2:	restore r3
>>> 	return
>>
>>> In general, with shrink-wrapping, we can have essentially arbitrary
>>> differences in unwind info between blocks that are sequential.
>>
>> I don't think this can actually happen with the current implementation.
>> There is only one prologue, and all epilogues (the normal one and the
>> sibcall epilogues) match it exactly. I don't believe we can generate
>> code as in the example above, both before and after my patch.
> 
> Um.. then what's this "allow jumps in epilogues" thing of which you speak?
> If there's a jump, then it goes somewhere, and branches over something.
> I see no constraints on what that something might be.
> 
> Could you give an example of a transformation that is allowed by this?

The idea was to be able to share a single return instruction between
epilogue/non-epilogue return paths, so that e.g. on i686 a conditional
return could be implemented as a conditional jump to a common return
insn. The allow-jumps patch then becomes necessary because bbro can move
the blocks around.
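
As a sketch of the intended code shape (hypothetical i686-style assembly
with invented labels, not output from the patch):

	testl	%eax, %eax
	je	.Lret		# conditional "return": jump to the shared return insn
	pushl	%ebx		# prologue, only on the path that needs it
	...			# function body clobbering %ebx
	popl	%ebx		# epilogue
.Lret:
	ret			# single return insn shared by both paths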

It does seem, however, that bbro can in fact cause problems for the
unwind information when the prologue is no longer in the first block.
Let me try to come up with a solution for that.


Bernd


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-23 17:19       ` Richard Henderson
  2011-03-23 17:24         ` Bernd Schmidt
@ 2011-03-25 17:51         ` Bernd Schmidt
  2011-03-26  5:33           ` Richard Henderson
  1 sibling, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-25 17:51 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On 03/23/2011 06:19 PM, Richard Henderson wrote:
> In general, with shrink-wrapping, we can have essentially arbitrary
> differences in unwind info between blocks that are sequential.

So, while that isn't the case just yet with the current shrink-wrapping
patch, it seems I will either have to make dwarf2out fully general, or
ensure that basic blocks occur in only a certain order (the prologue
must be written out only after all basic blocks that can be executed
before or without reaching it).

I don't know much about the unwinding code. I'm currently thinking about
writing out a cfi_remember_state at the start of the function, restoring
that clean state when necessary at the start of a new block and emitting
the necessary directives to reach the correct state. What directives
should I expect to be required? Can I get by just with cfi_offset and
cfi_def_cfa_offset, or will something else be necessary?


Bernd


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-25 17:51         ` Bernd Schmidt
@ 2011-03-26  5:33           ` Richard Henderson
  2011-03-31 20:09             ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Richard Henderson @ 2011-03-26  5:33 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/25/2011 10:34 AM, Bernd Schmidt wrote:
> I don't know much about the unwinding code. I'm currently thinking about
> writing out a cfi_remember_state at the start of the function, restoring
> that clean state when necessary at the start of a new block and emitting
> the necessary directives to reach the correct state. What directives
> should I expect to be required? Can I get by just with cfi_offset and
> cfi_def_cfa_offset, or will something else be necessary?

Yes, several things: register, expression, gnu_args_size, perhaps a few more.

I think the ideal thing would be a pass while the cfg is still extant that
captures the unwind info into notes; these can be recorded at basic block
boundaries, so that they persist until the end of compilation.

So long as late compilation passes continue to not move frame-related
insns across basic block boundaries, we should be fine.

Irritatingly, the exact place to locate this pass is difficult to pin down.
Immediately before md_reorg is the last place we have the cfg.  But we do
strange things in, e.g. ia64 where we rebuild the cfg and run sched_ebb
during md_reorg.

Of course, ia64 is a bad example because its unwind info is target-specific,
and quite a lot of the possible benefit of shrink wrapping is lost via the
register windowing.

I'm willing to work with you on the problem of cfg-aware unwind info.  We
have needed this for a really long time; there are existing bugs related 
to exception handling and !ACCUMULATE_OUTGOING_ARGS that would be fixed by
this.


r~


* Re: [PATCH 1/6] Disallow predicating the prologue
  2011-03-23 14:46 ` [PATCH 1/6] Disallow predicating the prologue Bernd Schmidt
@ 2011-03-31 13:20   ` Jeff Law
  2011-04-01 18:59   ` H.J. Lu
  1 sibling, 0 replies; 73+ messages in thread
From: Jeff Law @ 2011-03-31 13:20 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/11 08:45, Bernd Schmidt wrote:
> With prologues appearing in blocks other than the entry block, ifcvt can
> decide to predicate them. This is not a good idea, as dwarf2out will
> blow up trying to handle predicated frame-related things.
OK.
jeff


* Re: [PATCH 2/6] Unique return rtx
  2011-03-23 14:48 ` [PATCH 2/6] Unique return rtx Bernd Schmidt
@ 2011-03-31 13:23   ` Jeff Law
  2011-05-03 11:54     ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff Law @ 2011-03-31 13:23 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/11 08:47, Bernd Schmidt wrote:
> We'll start putting "return" into JUMP_LABELS in a subsequent patch, so
> I've decided to make it unique as a small cleanup.
> 
> There's already another macro called "return_rtx", so the new one goes
> by the name of "ret_rtx".
OK.
Jeff


* Re: [PATCH 5/6] Generate more shrink-wrapping opportunities
  2011-03-23 14:56 ` [PATCH 5/6] Generate more shrink-wrapping opportunities Bernd Schmidt
  2011-03-23 15:03   ` Jeff Law
@ 2011-03-31 13:26   ` Jeff Law
  2011-03-31 13:34     ` Bernd Schmidt
  1 sibling, 1 reply; 73+ messages in thread
From: Jeff Law @ 2011-03-31 13:26 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/23/11 08:55, Bernd Schmidt wrote:
> The first basic block contains insns to move incoming argument registers
> to pseudos. When these pseudos live across calls, they get allocated to
> call-saved registers. This in turn disables shrink-wrapping, since the
> move instruction requires the prologue (saving the call-saved reg) to
> occur before it.
> 
> This patch addresses the problem by moving such moves downwards through
> the CFG until we find a place where the destination is used or the
> incoming argument is clobbered.
OK.  At some point I'll probably want to move this code to run
immediately after IRA completes and extend it in a few ways, but for now
it's OK as it stands.

jeff


* Re: [PATCH 5/6] Generate more shrink-wrapping opportunities
  2011-03-31 13:26   ` Jeff Law
@ 2011-03-31 13:34     ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-31 13:34 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1422 bytes --]

On 03/31/2011 03:23 PM, Jeff Law wrote:
> On 03/23/11 08:55, Bernd Schmidt wrote:
>> The first basic block contains insns to move incoming argument registers
>> to pseudos. When these pseudos live across calls, they get allocated to
>> call-saved registers. This in turn disables shrink-wrapping, since the
>> move instruction requires the prologue (saving the call-saved reg) to
>> occur before it.
> 
>> This patch addresses the problem by moving such moves downwards through
>> the CFG until we find a place where the destination is used or the
>> incoming argument is clobbered.
> OK.

Thanks for the reviews. This patch is still blocked by the issues raised
by rth in 3/6 (and of course the full shrink-wrapping patch 4/6); I hope
to have at least draft patches for the dwarf2out problems by the end of
the week.

We've also discovered a problem with this particular patch; it requires
the additional fix posted below, which I'll include in an eventual commit.

> At some point I'll probably want to move this code to run
> immediately after IRA completes and extend it in a few ways, but for now
> it's OK as it stands.

Note that the prepare_shrink_wrap code can be further enhanced by
running a regcprop pass before thread_prologue_and_epilogue_insns, since
that tends to eliminate uses of the call-used destination reg. I'm going
to submit a patch for this once all the other shrink-wrapping bits are in.


Bernd

[-- Attachment #2: active.diff --]
[-- Type: text/plain, Size: 1866 bytes --]

Index: function.c
===================================================================
--- function.c	(revision 318184)
+++ function.c	(working copy)
@@ -5190,6 +5190,23 @@ emit_return_into_block (bool simple_p, b
 }
 #endif
 
+/* Return true if BB has any active insns.  */
+static bool
+bb_active_p (basic_block bb)
+{
+  rtx label;
+
+  /* Test whether there are active insns in BB.  */
+  label = BB_END (bb);
+  while (label && !LABEL_P (label))
+    {
+      if (active_insn_p (label))
+	break;
+      label = PREV_INSN (label);
+    }
+  return BB_HEAD (bb) != label || !LABEL_P (label);
+}
+
 /* Generate the prologue and epilogue RTL if the machine supports it.  Thread
    this into place with notes indicating where the prologue ends and where
    the epilogue begins.  Update the basic block information when possible.
@@ -5275,19 +5292,8 @@ thread_prologue_and_epilogue_insns (void
   exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds);
   if (exit_fallthru_edge != NULL)
     {
-      rtx label;
-
       last_bb = exit_fallthru_edge->src;
-      /* Test whether there are active instructions in the last block.  */
-      label = BB_END (last_bb);
-      while (label && !LABEL_P (label))
-	{
-	  if (active_insn_p (label))
-	    break;
-	  label = PREV_INSN (label);
-	}
-
-      last_bb_active = BB_HEAD (last_bb) != label || !LABEL_P (label);
+      last_bb_active = bb_active_p (last_bb);
     }
   else
     {
@@ -5344,6 +5350,10 @@ thread_prologue_and_epilogue_insns (void
 
       prepare_shrink_wrap (entry_edge->dest);
 
+      /* That may have inserted instructions into the last block.  */
+      if (last_bb && !last_bb_active)
+	last_bb_active = bb_active_p (last_bb);
+
       bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
       bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
 

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-26  5:33           ` Richard Henderson
@ 2011-03-31 20:09             ` Bernd Schmidt
  2011-03-31 21:51               ` Richard Henderson
  0 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-31 20:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2365 bytes --]

On 03/26/2011 04:26 AM, Richard Henderson wrote:
> I think the ideal thing would be a pass while the cfg is still extant that
> captures the unwind info into notes; these can be recorded at basic block
> boundaries, so that they persist until the end of compilation.
> 
> So long as late compilation passes continue not to move frame-related
> insns across basic block boundaries, we should be fine.

I'm nervous about this, as the reorg pass can do arbitrary
transformations. On Blackfin for example, we can reorder basic blocks
for the sake of loop optimizations; sched-ebb can create new blocks,
etc. I think it would be best if we can somehow make it work during
final, without a CFG.

> I'm willing to work with you on the problem of cfg-aware unwind info.  We
> have needed this for a really long time; there are existing bugs related 
> to exception handling and !ACCUMULATE_OUTGOING_ARGS that would be fixed by
> this.

I'm appending a series of draft patches. The new code tries to compute
CFIs for stretches of straight-line code and records state for potential
jump targets, iterating until everything is covered. Then, a pass through all
insns from start to finish reassembles the pieces into a coherent string
of CFI insns.

Rather than use a CFG, I've tried to do something similar to
compute_barrier_args_size, using JUMP_LABELs etc.

Summary of the patches:
001 - Just create a dwarf2out_frame_debug_init function.
002 - Make it walk the function in a first pass and record CFIs to
      be output later.
003 - Store dw_cfi_refs in VECs rather than linked lists. Looks
      larger than it is due to reindentation.
004 - Change the function walk introduced in 002 so that it records
      and restores state when reaching jumps/barriers.

For now I'd just like some input on whether this looks remotely viable.
There are a number of known problems with it:
* The generated CFIs are inefficient (poor use of remember/restore)
* -freorder-blocks-and-partition is broken
* i386.c uses dwarf2out_frame_debug directly in some cases and is
  unconverted
* I haven't tested whether my attempt to use
  get_eh_landing_pad_from_rtx in the absence of a CFG actually works
* Computed jumps and nonlocal gotos aren't handled. I think this
  could be done by recording the state at NOTE_INSN_PROLOGUE_END
  and using that for all labels we can't otherwise reach.


Bernd

[-- Attachment #2: 001-dwinit.diff --]
[-- Type: text/plain, Size: 3095 bytes --]

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -2790,38 +2790,6 @@ dwarf2out_frame_debug (rtx insn, bool af
   rtx note, n;
   bool handled_one = false;
 
-  if (insn == NULL_RTX)
-    {
-      size_t i;
-
-      /* Flush any queued register saves.  */
-      dwarf2out_flush_queued_reg_saves ();
-
-      /* Set up state for generating call frame debug info.  */
-      lookup_cfa (&cfa);
-      gcc_assert (cfa.reg
-		  == (unsigned long)DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM));
-
-      cfa.reg = STACK_POINTER_REGNUM;
-      cfa_store = cfa;
-      cfa_temp.reg = -1;
-      cfa_temp.offset = 0;
-
-      for (i = 0; i < num_regs_saved_in_regs; i++)
-	{
-	  regs_saved_in_regs[i].orig_reg = NULL_RTX;
-	  regs_saved_in_regs[i].saved_in_reg = NULL_RTX;
-	}
-      num_regs_saved_in_regs = 0;
-
-      if (barrier_args_size)
-	{
-	  XDELETEVEC (barrier_args_size);
-	  barrier_args_size = NULL;
-	}
-      return;
-    }
-
   if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
 
@@ -2939,6 +2907,38 @@ dwarf2out_frame_debug (rtx insn, bool af
     dwarf2out_flush_queued_reg_saves ();
 }
 
+void
+dwarf2out_frame_debug_init (void)
+{
+  size_t i;
+
+  /* Flush any queued register saves.  */
+  dwarf2out_flush_queued_reg_saves ();
+
+  /* Set up state for generating call frame debug info.  */
+  lookup_cfa (&cfa);
+  gcc_assert (cfa.reg
+	      == (unsigned long)DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM));
+
+  cfa.reg = STACK_POINTER_REGNUM;
+  cfa_store = cfa;
+  cfa_temp.reg = -1;
+  cfa_temp.offset = 0;
+
+  for (i = 0; i < num_regs_saved_in_regs; i++)
+    {
+      regs_saved_in_regs[i].orig_reg = NULL_RTX;
+      regs_saved_in_regs[i].saved_in_reg = NULL_RTX;
+    }
+  num_regs_saved_in_regs = 0;
+
+  if (barrier_args_size)
+    {
+      XDELETEVEC (barrier_args_size);
+      barrier_args_size = NULL;
+    }
+}
+
 /* Determine if we need to save and restore CFI information around this
    epilogue.  If SIBCALL is true, then this is a sibcall epilogue.  If
    we do need to save/restore, then emit the save now, and insert a
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  
 
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
+extern void dwarf2out_frame_debug_init (void);
 extern void dwarf2out_cfi_begin_epilogue (rtx);
 extern void dwarf2out_frame_debug_restore_state (void);
 extern void dwarf2out_flush_queued_reg_saves (void);
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -1561,7 +1561,7 @@ final_start_function (rtx first ATTRIBUT
 
 #if defined (HAVE_prologue)
   if (dwarf2out_do_frame ())
-    dwarf2out_frame_debug (NULL_RTX, false);
+    dwarf2out_frame_debug_init ();
 #endif
 
   /* If debugging, assign block numbers to all of the blocks in this

[-- Attachment #3: 002-scanfirst.diff --]
[-- Type: text/plain, Size: 6919 bytes --]

    	* cfgcleanup.c (flow_find_head_matching_sequence): Ignore
    	epilogue notes.
    	* df-problems.c (can_move_insns_across): Don't stop at epilogue
    	notes.
    	* dwarf2out.c (dwarf2out_cfi_begin_epilogue): Also allow a
    	simplejump to end the block.

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -471,6 +471,8 @@ static void output_call_frame_info (int)
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_cfi_begin_epilogue (rtx);
+static void dwarf2out_frame_debug_restore_state (void);
 
 /* Support for complex CFA locations.  */
 static void output_cfa_loc (dw_cfi_ref, int);
@@ -879,6 +881,9 @@ dwarf2out_cfi_label (bool force)
   return label;
 }
 
+/* The insn after which a new CFI note should be emitted.  */
+static rtx cfi_insn;
+
 /* True if remember_state should be emitted before following CFI directive.  */
 static bool emit_cfa_remember;
 
@@ -961,7 +966,8 @@ add_fde_cfi (const char *label, dw_cfi_r
 	        }
 	    }
 
-	  output_cfi_directive (cfi);
+	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
+	  NOTE_CFI (cfi_insn) = cfi;
 
 	  list_head = &fde->dw_fde_cfi;
 	  any_cfis_emitted = true;
@@ -2790,6 +2796,11 @@ dwarf2out_frame_debug (rtx insn, bool af
   rtx note, n;
   bool handled_one = false;
 
+  if (after_p)
+    cfi_insn = insn;
+  else
+    cfi_insn = PREV_INSN (insn);
+
   if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
 
@@ -2911,6 +2922,7 @@ void
 dwarf2out_frame_debug_init (void)
 {
   size_t i;
+  rtx insn;
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2937,12 +2949,64 @@ dwarf2out_frame_debug_init (void)
       XDELETEVEC (barrier_args_size);
       barrier_args_size = NULL;
     }
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      rtx pat;
+      if (BARRIER_P (insn))
+	{
+	  dwarf2out_frame_debug (insn, false);
+	  continue;
+	}
+      else if (NOTE_P (insn))
+	{
+	  switch (NOTE_KIND (insn))
+	    {
+	    case NOTE_INSN_EPILOGUE_BEG:
+#if defined (HAVE_epilogue)
+	      dwarf2out_cfi_begin_epilogue (insn);
+#endif
+	      break;
+	    case NOTE_INSN_CFA_RESTORE_STATE:
+	      cfi_insn = insn;
+	      dwarf2out_frame_debug_restore_state ();
+	      break;
+	    }
+	  continue;
+	}
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      pat = PATTERN (insn);
+      if (asm_noperands (pat) >= 0)
+	continue;
+      if (GET_CODE (pat) == SEQUENCE)
+	{
+	  int j;
+	  for (j = 1; j < XVECLEN (pat, 0); j++)
+	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
+	  insn = XVECEXP (pat, 0, 0);
+	}
+
+      if (CALL_P (insn) && dwarf2out_do_frame ())
+	dwarf2out_frame_debug (insn, false);
+      if (dwarf2out_do_frame ()
+#if !defined (HAVE_prologue)
+	  && !ACCUMULATE_OUTGOING_ARGS 
+#endif
+	  )
+	dwarf2out_frame_debug (insn, true);
+    }
+}
+
+void
+dwarf2out_emit_cfi (dw_cfi_ref cfi)
+{
+  output_cfi_directive (cfi);
 }
 
-/* Determine if we need to save and restore CFI information around this
-   epilogue.  If SIBCALL is true, then this is a sibcall epilogue.  If
-   we do need to save/restore, then emit the save now, and insert a
-   NOTE_INSN_CFA_RESTORE_STATE at the appropriate place in the stream.  */
+/* Determine if we need to save and restore CFI information around
+   this epilogue.  If we do need to save/restore, then emit the save
+   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
+   place in the stream.  */
 
 void
 dwarf2out_cfi_begin_epilogue (rtx insn)
@@ -2957,8 +3021,10 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
       if (!INSN_P (i))
 	continue;
 
-      /* Look for both regular and sibcalls to end the block.  */
-      if (returnjump_p (i))
+      /* Look for both regular and sibcalls to end the block.  Various
+	 optimization passes may cause us to jump to a common epilogue
+	 tail, so we also accept simplejumps.  */
+      if (returnjump_p (i) || simplejump_p (i))
 	break;
       if (CALL_P (i) && SIBLING_CALL_P (i))
 	break;
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -18,11 +18,11 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+struct dw_cfi_struct;
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
 extern void dwarf2out_frame_debug_init (void);
-extern void dwarf2out_cfi_begin_epilogue (rtx);
-extern void dwarf2out_frame_debug_restore_state (void);
+extern void dwarf2out_emit_cfi (struct dw_cfi_struct *);
 extern void dwarf2out_flush_queued_reg_saves (void);
 
 extern void debug_dwarf (void);
Index: gcc/insn-notes.def
===================================================================
--- gcc.orig/insn-notes.def
+++ gcc/insn-notes.def
@@ -77,4 +77,8 @@ INSN_NOTE (SWITCH_TEXT_SECTIONS)
    when an epilogue appears in the middle of a function.  */
 INSN_NOTE (CFA_RESTORE_STATE)
 
+/* When emitting dwarf2 frame information, contains a directive that
+   should be emitted.  */
+INSN_NOTE (CFI)
+
 #undef INSN_NOTE
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -180,6 +180,7 @@ union rtunion_def
   mem_attrs *rt_mem;
   reg_attrs *rt_reg;
   struct constant_descriptor_rtx *rt_constant;
+  struct dw_cfi_struct *rt_cfi;
 };
 typedef union rtunion_def rtunion;
 
@@ -708,6 +709,7 @@ extern void rtl_check_failed_flag (const
 #define XTREE(RTX, N)   (RTL_CHECK1 (RTX, N, 't').rt_tree)
 #define XBBDEF(RTX, N)	(RTL_CHECK1 (RTX, N, 'B').rt_bb)
 #define XTMPL(RTX, N)	(RTL_CHECK1 (RTX, N, 'T').rt_str)
+#define XCFI(RTX, N)	(RTL_CHECK1 (RTX, N, 'C').rt_cfi)
 
 #define XVECEXP(RTX, N, M)	RTVEC_ELT (XVEC (RTX, N), M)
 #define XVECLEN(RTX, N)		GET_NUM_ELEM (XVEC (RTX, N))
@@ -740,6 +742,7 @@ extern void rtl_check_failed_flag (const
 #define XCMODE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_type)
 #define XCTREE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_tree)
 #define XCBBDEF(RTX, N, C)    (RTL_CHECKC1 (RTX, N, C).rt_bb)
+#define XCCFI(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_cfi)
 #define XCCSELIB(RTX, N, C)   (RTL_CHECKC1 (RTX, N, C).rt_cselib)
 
 #define XCVECEXP(RTX, N, M, C)	RTVEC_ELT (XCVEC (RTX, N, C), M)
@@ -882,6 +885,7 @@ extern const char * const reg_note_name[
 #define NOTE_BLOCK(INSN)	XCTREE (INSN, 4, NOTE)
 #define NOTE_EH_HANDLER(INSN)	XCINT (INSN, 4, NOTE)
 #define NOTE_BASIC_BLOCK(INSN)	XCBBDEF (INSN, 4, NOTE)
+#define NOTE_CFI(INSN)		XCCFI (INSN, 4, NOTE)
 #define NOTE_VAR_LOCATION(INSN)	XCEXP (INSN, 4, NOTE)
 
 /* In a NOTE that is a line number, this is the line number.

[-- Attachment #4: 003-cfivec.diff --]
[-- Type: text/plain, Size: 18457 bytes --]

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -267,7 +267,6 @@ typedef union GTY(()) dw_cfi_oprnd_struc
 dw_cfi_oprnd;
 
 typedef struct GTY(()) dw_cfi_struct {
-  dw_cfi_ref dw_cfi_next;
   enum dwarf_call_frame_info dw_cfi_opc;
   dw_cfi_oprnd GTY ((desc ("dw_cfi_oprnd1_desc (%1.dw_cfi_opc)")))
     dw_cfi_oprnd1;
@@ -276,6 +275,12 @@ typedef struct GTY(()) dw_cfi_struct {
 }
 dw_cfi_node;
 
+DEF_VEC_P (dw_cfi_ref);
+DEF_VEC_ALLOC_P (dw_cfi_ref, heap);
+DEF_VEC_ALLOC_P (dw_cfi_ref, gc);
+
+typedef VEC(dw_cfi_ref, gc) *cfi_vec;
+
 /* This is how we define the location of the CFA. We use to handle it
    as REG + OFFSET all the time,  but now it can be more complex.
    It can now be either REG + CFA_OFFSET or *(REG + BASE_OFFSET) + CFA_OFFSET.
@@ -306,8 +311,8 @@ typedef struct GTY(()) dw_fde_struct {
   const char *dw_fde_hot_section_end_label;
   const char *dw_fde_unlikely_section_label;
   const char *dw_fde_unlikely_section_end_label;
-  dw_cfi_ref dw_fde_cfi;
-  dw_cfi_ref dw_fde_switch_cfi; /* Last CFI before switching sections.  */
+  cfi_vec dw_fde_cfi;
+  int dw_fde_switch_cfi_index; /* Last CFI before switching sections.  */
   HOST_WIDE_INT stack_realignment;
   unsigned funcdef_number;
   /* Dynamic realign argument pointer register.  */
@@ -416,8 +421,8 @@ current_fde (void)
   return fde_table_in_use ? &fde_table[fde_table_in_use - 1] : NULL;
 }
 
-/* A list of call frame insns for the CIE.  */
-static GTY(()) dw_cfi_ref cie_cfi_head;
+/* A vector of call frame insns for the CIE.  */
+static GTY(()) cfi_vec cie_cfi_vec;
 
 /* Some DWARF extensions (e.g., MIPS/SGI) implement a subprogram
    attribute that accelerates the lookup of the FDE associated
@@ -457,7 +462,7 @@ static GTY(()) section *cold_text_sectio
 static char *stripattributes (const char *);
 static const char *dwarf_cfi_name (unsigned);
 static dw_cfi_ref new_cfi (void);
-static void add_cfi (dw_cfi_ref *, dw_cfi_ref);
+static void add_cfi (cfi_vec *, dw_cfi_ref);
 static void add_fde_cfi (const char *, dw_cfi_ref);
 static void lookup_cfa_1 (dw_cfi_ref, dw_cfa_location *, dw_cfa_location *);
 static void lookup_cfa (dw_cfa_location *);
@@ -815,7 +820,6 @@ new_cfi (void)
 {
   dw_cfi_ref cfi = ggc_alloc_dw_cfi_node ();
 
-  cfi->dw_cfi_next = NULL;
   cfi->dw_cfi_oprnd1.dw_cfi_reg_num = 0;
   cfi->dw_cfi_oprnd2.dw_cfi_reg_num = 0;
 
@@ -825,9 +829,8 @@ new_cfi (void)
 /* Add a Call Frame Instruction to list of instructions.  */
 
 static inline void
-add_cfi (dw_cfi_ref *list_head, dw_cfi_ref cfi)
+add_cfi (cfi_vec *vec, dw_cfi_ref cfi)
 {
-  dw_cfi_ref *p;
   dw_fde_ref fde = current_fde ();
 
   /* When DRAP is used, CFA is defined with an expression.  Redefine
@@ -849,11 +852,7 @@ add_cfi (dw_cfi_ref *list_head, dw_cfi_r
           break;
       }
 
-  /* Find the end of the chain.  */
-  for (p = list_head; (*p) != NULL; p = &(*p)->dw_cfi_next)
-    ;
-
-  *p = cfi;
+  VEC_safe_push (dw_cfi_ref, gc, *vec, cfi);
 }
 
 /* Generate a new label for the CFI info to refer to.  FORCE is true
@@ -896,7 +895,12 @@ static bool any_cfis_emitted;
 static void
 add_fde_cfi (const char *label, dw_cfi_ref cfi)
 {
-  dw_cfi_ref *list_head;
+  cfi_vec *vec;
+
+  if (cie_cfi_vec == NULL)
+    cie_cfi_vec = VEC_alloc (dw_cfi_ref, gc, 20);
+
+  vec = &cie_cfi_vec;
 
   if (emit_cfa_remember)
     {
@@ -909,8 +913,6 @@ add_fde_cfi (const char *label, dw_cfi_r
       add_fde_cfi (label, cfi_remember);
     }
 
-  list_head = &cie_cfi_head;
-
   if (dwarf2out_do_cfi_asm ())
     {
       if (label)
@@ -969,7 +971,7 @@ add_fde_cfi (const char *label, dw_cfi_r
 	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
 	  NOTE_CFI (cfi_insn) = cfi;
 
-	  list_head = &fde->dw_fde_cfi;
+	  vec = &fde->dw_fde_cfi;
 	  any_cfis_emitted = true;
 	}
       /* ??? If this is a CFI for the CIE, we don't emit.  This
@@ -1007,11 +1009,11 @@ add_fde_cfi (const char *label, dw_cfi_r
 	  fde->dw_fde_current_label = label;
 	}
 
-      list_head = &fde->dw_fde_cfi;
+      vec = &fde->dw_fde_cfi;
       any_cfis_emitted = true;
     }
 
-  add_cfi (list_head, cfi);
+  add_cfi (vec, cfi);
 }
 
 /* Subroutine of lookup_cfa.  */
@@ -1058,6 +1060,7 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_loc
 static void
 lookup_cfa (dw_cfa_location *loc)
 {
+  int ix;
   dw_cfi_ref cfi;
   dw_fde_ref fde;
   dw_cfa_location remember;
@@ -1066,12 +1069,12 @@ lookup_cfa (dw_cfa_location *loc)
   loc->reg = INVALID_REGNUM;
   remember = *loc;
 
-  for (cfi = cie_cfi_head; cfi; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, cie_cfi_vec, ix, cfi)
     lookup_cfa_1 (cfi, loc, &remember);
 
   fde = current_fde ();
   if (fde)
-    for (cfi = fde->dw_fde_cfi; cfi; cfi = cfi->dw_cfi_next)
+    FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
       lookup_cfa_1 (cfi, loc, &remember);
 }
 
@@ -3496,169 +3499,181 @@ output_cfi_directive (dw_cfi_ref cfi)
     }
 }
 
-DEF_VEC_P (dw_cfi_ref);
-DEF_VEC_ALLOC_P (dw_cfi_ref, heap);
-
-/* Output CFIs to bring current FDE to the same state as after executing
-   CFIs in CFI chain.  DO_CFI_ASM is true if .cfi_* directives shall
-   be emitted, false otherwise.  If it is false, FDE and FOR_EH are the
-   other arguments to pass to output_cfi.  */
+/* Output CFIs from VEC, up to index UPTO, to bring current FDE to the
+   same state as after executing CFIs in CFI chain.  DO_CFI_ASM is
+   true if .cfi_* directives shall be emitted, false otherwise.  If it
+   is false, FDE and FOR_EH are the other arguments to pass to
+   output_cfi.  */
 
 static void
-output_cfis (dw_cfi_ref cfi, bool do_cfi_asm, dw_fde_ref fde, bool for_eh)
+output_cfis (cfi_vec vec, int upto, bool do_cfi_asm,
+	     dw_fde_ref fde, bool for_eh)
 {
+  int ix;
   struct dw_cfi_struct cfi_buf;
   dw_cfi_ref cfi2;
   dw_cfi_ref cfi_args_size = NULL, cfi_cfa = NULL, cfi_cfa_offset = NULL;
-  VEC (dw_cfi_ref, heap) *regs = VEC_alloc (dw_cfi_ref, heap, 32);
+  VEC(dw_cfi_ref, heap) *regs = VEC_alloc (dw_cfi_ref, heap, 32);
   unsigned int len, idx;
 
-  for (;; cfi = cfi->dw_cfi_next)
-    switch (cfi ? cfi->dw_cfi_opc : DW_CFA_nop)
-      {
-      case DW_CFA_advance_loc:
-      case DW_CFA_advance_loc1:
-      case DW_CFA_advance_loc2:
-      case DW_CFA_advance_loc4:
-      case DW_CFA_MIPS_advance_loc8:
-      case DW_CFA_set_loc:
-	/* All advances should be ignored.  */
-	break;
-      case DW_CFA_remember_state:
+  for (ix = 0; ix < upto + 1; ix++)
+    {
+      dw_cfi_ref cfi = ix < upto ? VEC_index (dw_cfi_ref, vec, ix) : NULL;
+      switch (cfi ? cfi->dw_cfi_opc : DW_CFA_nop)
 	{
-	  dw_cfi_ref args_size = cfi_args_size;
-
-	  /* Skip everything between .cfi_remember_state and
-	     .cfi_restore_state.  */
-	  for (cfi2 = cfi->dw_cfi_next; cfi2; cfi2 = cfi2->dw_cfi_next)
-	    if (cfi2->dw_cfi_opc == DW_CFA_restore_state)
-	      break;
-	    else if (cfi2->dw_cfi_opc == DW_CFA_GNU_args_size)
-	      args_size = cfi2;
-	    else
-	      gcc_assert (cfi2->dw_cfi_opc != DW_CFA_remember_state);
-
-	  if (cfi2 == NULL)
-	    goto flush_all;
-	  else
-	    {
-	      cfi = cfi2;
-	      cfi_args_size = args_size;
-	    }
+	case DW_CFA_advance_loc:
+	case DW_CFA_advance_loc1:
+	case DW_CFA_advance_loc2:
+	case DW_CFA_advance_loc4:
+	case DW_CFA_MIPS_advance_loc8:
+	case DW_CFA_set_loc:
+	  /* All advances should be ignored.  */
 	  break;
-	}
-      case DW_CFA_GNU_args_size:
-	cfi_args_size = cfi;
-	break;
-      case DW_CFA_GNU_window_save:
-	goto flush_all;
-      case DW_CFA_offset:
-      case DW_CFA_offset_extended:
-      case DW_CFA_offset_extended_sf:
-      case DW_CFA_restore:
-      case DW_CFA_restore_extended:
-      case DW_CFA_undefined:
-      case DW_CFA_same_value:
-      case DW_CFA_register:
-      case DW_CFA_val_offset:
-      case DW_CFA_val_offset_sf:
-      case DW_CFA_expression:
-      case DW_CFA_val_expression:
-      case DW_CFA_GNU_negative_offset_extended:
-	if (VEC_length (dw_cfi_ref, regs) <= cfi->dw_cfi_oprnd1.dw_cfi_reg_num)
-	  VEC_safe_grow_cleared (dw_cfi_ref, heap, regs,
-				 cfi->dw_cfi_oprnd1.dw_cfi_reg_num + 1);
-	VEC_replace (dw_cfi_ref, regs, cfi->dw_cfi_oprnd1.dw_cfi_reg_num, cfi);
-	break;
-      case DW_CFA_def_cfa:
-      case DW_CFA_def_cfa_sf:
-      case DW_CFA_def_cfa_expression:
-	cfi_cfa = cfi;
-	cfi_cfa_offset = cfi;
-	break;
-      case DW_CFA_def_cfa_register:
-	cfi_cfa = cfi;
-	break;
-      case DW_CFA_def_cfa_offset:
-      case DW_CFA_def_cfa_offset_sf:
-	cfi_cfa_offset = cfi;
-	break;
-      case DW_CFA_nop:
-	gcc_assert (cfi == NULL);
-      flush_all:
-	len = VEC_length (dw_cfi_ref, regs);
-	for (idx = 0; idx < len; idx++)
+	case DW_CFA_remember_state:
 	  {
-	    cfi2 = VEC_replace (dw_cfi_ref, regs, idx, NULL);
-	    if (cfi2 != NULL
-		&& cfi2->dw_cfi_opc != DW_CFA_restore
-		&& cfi2->dw_cfi_opc != DW_CFA_restore_extended)
+	    dw_cfi_ref args_size = cfi_args_size;
+
+	    /* Skip everything between .cfi_remember_state and
+	       .cfi_restore_state.  */
+	    for (; ix < upto; ix++)
 	      {
-		if (do_cfi_asm)
-		  output_cfi_directive (cfi2);
+		cfi2 = VEC_index (dw_cfi_ref, vec, ix);
+		if (cfi2->dw_cfi_opc == DW_CFA_restore_state)
+		  break;
+		else if (cfi2->dw_cfi_opc == DW_CFA_GNU_args_size)
+		  args_size = cfi2;
 		else
-		  output_cfi (cfi2, fde, for_eh);
+		  gcc_assert (cfi2->dw_cfi_opc != DW_CFA_remember_state);
 	      }
+
+	    if (cfi2 == NULL)
+	      goto flush_all;
+	    cfi_args_size = args_size;
+	    break;
 	  }
-	if (cfi_cfa && cfi_cfa_offset && cfi_cfa_offset != cfi_cfa)
-	  {
-	    gcc_assert (cfi_cfa->dw_cfi_opc != DW_CFA_def_cfa_expression);
-	    cfi_buf = *cfi_cfa;
-	    switch (cfi_cfa_offset->dw_cfi_opc)
-	      {
-	      case DW_CFA_def_cfa_offset:
-		cfi_buf.dw_cfi_opc = DW_CFA_def_cfa;
-		cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
-		break;
-	      case DW_CFA_def_cfa_offset_sf:
-		cfi_buf.dw_cfi_opc = DW_CFA_def_cfa_sf;
-		cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
-		break;
-	      case DW_CFA_def_cfa:
-	      case DW_CFA_def_cfa_sf:
-		cfi_buf.dw_cfi_opc = cfi_cfa_offset->dw_cfi_opc;
-		cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd2;
-		break;
-	      default:
-		gcc_unreachable ();
-	      }
-	    cfi_cfa = &cfi_buf;
-	  }
-	else if (cfi_cfa_offset)
-	  cfi_cfa = cfi_cfa_offset;
-	if (cfi_cfa)
-	  {
-	    if (do_cfi_asm)
-	      output_cfi_directive (cfi_cfa);
-	    else
-	      output_cfi (cfi_cfa, fde, for_eh);
-	  }
-	cfi_cfa = NULL;
-	cfi_cfa_offset = NULL;
-	if (cfi_args_size
-	    && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
-	  {
-	    if (do_cfi_asm)
-	      output_cfi_directive (cfi_args_size);
-	    else
-	      output_cfi (cfi_args_size, fde, for_eh);
-	  }
-	cfi_args_size = NULL;
-	if (cfi == NULL)
-	  {
-	    VEC_free (dw_cfi_ref, heap, regs);
-	    return;
-	  }
-	else if (do_cfi_asm)
-	  output_cfi_directive (cfi);
-	else
-	  output_cfi (cfi, fde, for_eh);
-	break;
-      default:
-	gcc_unreachable ();
+	case DW_CFA_GNU_args_size:
+	  cfi_args_size = cfi;
+	  break;
+	case DW_CFA_GNU_window_save:
+	  goto flush_all;
+	case DW_CFA_offset:
+	case DW_CFA_offset_extended:
+	case DW_CFA_offset_extended_sf:
+	case DW_CFA_restore:
+	case DW_CFA_restore_extended:
+	case DW_CFA_undefined:
+	case DW_CFA_same_value:
+	case DW_CFA_register:
+	case DW_CFA_val_offset:
+	case DW_CFA_val_offset_sf:
+	case DW_CFA_expression:
+	case DW_CFA_val_expression:
+	case DW_CFA_GNU_negative_offset_extended:
+	  if (VEC_length (dw_cfi_ref, regs)
+	      <= cfi->dw_cfi_oprnd1.dw_cfi_reg_num)
+	    VEC_safe_grow_cleared (dw_cfi_ref, heap, regs,
+				   cfi->dw_cfi_oprnd1.dw_cfi_reg_num + 1);
+	  VEC_replace (dw_cfi_ref, regs, cfi->dw_cfi_oprnd1.dw_cfi_reg_num,
+		       cfi);
+	  break;
+	case DW_CFA_def_cfa:
+	case DW_CFA_def_cfa_sf:
+	case DW_CFA_def_cfa_expression:
+	  cfi_cfa = cfi;
+	  cfi_cfa_offset = cfi;
+	  break;
+	case DW_CFA_def_cfa_register:
+	  cfi_cfa = cfi;
+	  break;
+	case DW_CFA_def_cfa_offset:
+	case DW_CFA_def_cfa_offset_sf:
+	  cfi_cfa_offset = cfi;
+	  break;
+	case DW_CFA_nop:
+	  gcc_assert (cfi == NULL);
+	flush_all:
+	  len = VEC_length (dw_cfi_ref, regs);
+	  for (idx = 0; idx < len; idx++)
+	    {
+	      cfi2 = VEC_replace (dw_cfi_ref, regs, idx, NULL);
+	      if (cfi2 != NULL
+		  && cfi2->dw_cfi_opc != DW_CFA_restore
+		  && cfi2->dw_cfi_opc != DW_CFA_restore_extended)
+		{
+		  if (do_cfi_asm)
+		    output_cfi_directive (cfi2);
+		  else
+		    output_cfi (cfi2, fde, for_eh);
+		}
+	    }
+	  if (cfi_cfa && cfi_cfa_offset && cfi_cfa_offset != cfi_cfa)
+	    {
+	      gcc_assert (cfi_cfa->dw_cfi_opc != DW_CFA_def_cfa_expression);
+	      cfi_buf = *cfi_cfa;
+	      switch (cfi_cfa_offset->dw_cfi_opc)
+		{
+		case DW_CFA_def_cfa_offset:
+		  cfi_buf.dw_cfi_opc = DW_CFA_def_cfa;
+		  cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
+		  break;
+		case DW_CFA_def_cfa_offset_sf:
+		  cfi_buf.dw_cfi_opc = DW_CFA_def_cfa_sf;
+		  cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
+		  break;
+		case DW_CFA_def_cfa:
+		case DW_CFA_def_cfa_sf:
+		  cfi_buf.dw_cfi_opc = cfi_cfa_offset->dw_cfi_opc;
+		  cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd2;
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+	      cfi_cfa = &cfi_buf;
+	    }
+	  else if (cfi_cfa_offset)
+	    cfi_cfa = cfi_cfa_offset;
+	  if (cfi_cfa)
+	    {
+	      if (do_cfi_asm)
+		output_cfi_directive (cfi_cfa);
+	      else
+		output_cfi (cfi_cfa, fde, for_eh);
+	    }
+	  cfi_cfa = NULL;
+	  cfi_cfa_offset = NULL;
+	  if (cfi_args_size
+	      && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
+	    {
+	      if (do_cfi_asm)
+		output_cfi_directive (cfi_args_size);
+	      else
+		output_cfi (cfi_args_size, fde, for_eh);
+	    }
+	  cfi_args_size = NULL;
+	  if (cfi == NULL)
+	    {
+	      VEC_free (dw_cfi_ref, heap, regs);
+	      return;
+	    }
+	  else if (do_cfi_asm)
+	    output_cfi_directive (cfi);
+	  else
+	    output_cfi (cfi, fde, for_eh);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
     }
 }
 
+/* Like output_cfis, but emit all CFIs in the vector.  */
+static void
+output_all_cfis (cfi_vec vec, bool do_cfi_asm,
+		 dw_fde_ref fde, bool for_eh)
+{
+  output_cfis (vec, VEC_length (dw_cfi_ref, vec), do_cfi_asm, fde, for_eh);
+}
+
 /* Output one FDE.  */
 
 static void
@@ -3666,6 +3681,7 @@ output_fde (dw_fde_ref fde, bool for_eh,
 	    char *section_start_label, int fde_encoding, char *augmentation,
 	    bool any_lsda_needed, int lsda_encoding)
 {
+  int ix;
   const char *begin, *end;
   static unsigned int j;
   char l1[20], l2[20];
@@ -3773,31 +3789,31 @@ output_fde (dw_fde_ref fde, bool for_eh,
      this FDE.  */
   fde->dw_fde_current_label = begin;
   if (!fde->dw_fde_switched_sections)
-    for (cfi = fde->dw_fde_cfi; cfi != NULL; cfi = cfi->dw_cfi_next)
+    FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
       output_cfi (cfi, fde, for_eh);
   else if (!second)
     {
-      if (fde->dw_fde_switch_cfi)
-	for (cfi = fde->dw_fde_cfi; cfi != NULL; cfi = cfi->dw_cfi_next)
+      if (fde->dw_fde_switch_cfi_index)
+	FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
 	  {
 	    output_cfi (cfi, fde, for_eh);
-	    if (cfi == fde->dw_fde_switch_cfi)
+	    if (ix == fde->dw_fde_switch_cfi_index)
 	      break;
 	  }
     }
   else
     {
-      dw_cfi_ref cfi_next = fde->dw_fde_cfi;
+      int i, from = 0;
+      int until = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
 
-      if (fde->dw_fde_switch_cfi)
+      if (fde->dw_fde_switch_cfi_index > 0)
 	{
-	  cfi_next = fde->dw_fde_switch_cfi->dw_cfi_next;
-	  fde->dw_fde_switch_cfi->dw_cfi_next = NULL;
-	  output_cfis (fde->dw_fde_cfi, false, fde, for_eh);
-	  fde->dw_fde_switch_cfi->dw_cfi_next = cfi_next;
+	  from = fde->dw_fde_switch_cfi_index;
+	  output_cfis (fde->dw_fde_cfi, from, false, fde, for_eh);
 	}
-      for (cfi = cfi_next; cfi != NULL; cfi = cfi->dw_cfi_next)
-	output_cfi (cfi, fde, for_eh);
+      for (i = from; i < until; i++)
+	output_cfi (VEC_index (dw_cfi_ref, fde->dw_fde_cfi, i),
+		    fde, for_eh);
     }
 
   /* If we are to emit a ref/link from function bodies to their frame tables,
@@ -4033,7 +4049,7 @@ output_call_frame_info (int for_eh)
 			     eh_data_format_name (fde_encoding));
     }
 
-  for (cfi = cie_cfi_head; cfi != NULL; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, cie_cfi_vec, i, cfi)
     output_cfi (cfi, NULL, for_eh);
 
   /* Pad the CIE out to an address sized boundary.  */
@@ -4179,8 +4195,8 @@ dwarf2out_begin_prologue (unsigned int l
   fde->dw_fde_end = NULL;
   fde->dw_fde_vms_end_prologue = NULL;
   fde->dw_fde_vms_begin_epilogue = NULL;
-  fde->dw_fde_cfi = NULL;
-  fde->dw_fde_switch_cfi = NULL;
+  fde->dw_fde_cfi = VEC_alloc (dw_cfi_ref, gc, 20);
+  fde->dw_fde_switch_cfi_index = 0;
   fde->funcdef_number = current_function_funcdef_no;
   fde->all_throwers_are_sibcalls = crtl->all_throwers_are_sibcalls;
   fde->uses_eh_lsda = crtl->uses_eh_lsda;
@@ -4385,18 +4401,10 @@ dwarf2out_switch_text_section (void)
       dwarf2out_do_cfi_startproc (true);
       /* As this is a different FDE, insert all current CFI instructions
 	 again.  */
-      output_cfis (fde->dw_fde_cfi, true, fde, true);
+      output_all_cfis (fde->dw_fde_cfi, true, fde, true);
     }
   else
-    {
-      dw_cfi_ref cfi = fde->dw_fde_cfi;
-
-      cfi = fde->dw_fde_cfi;
-      if (cfi)
-	while (cfi->dw_cfi_next != NULL)
-	  cfi = cfi->dw_cfi_next;
-      fde->dw_fde_switch_cfi = cfi;
-    }
+    fde->dw_fde_switch_cfi_index = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
 }
 \f
 /* And now, the subset of the debugging information support code necessary
@@ -17258,6 +17266,7 @@ tree_add_const_value_attribute_for_decl 
 static dw_loc_list_ref
 convert_cfa_to_fb_loc_list (HOST_WIDE_INT offset)
 {
+  int ix;
   dw_fde_ref fde;
   dw_loc_list_ref list, *list_tail;
   dw_cfi_ref cfi;
@@ -17280,13 +17289,13 @@ convert_cfa_to_fb_loc_list (HOST_WIDE_IN
 
   /* ??? Bald assumption that the CIE opcode list does not contain
      advance opcodes.  */
-  for (cfi = cie_cfi_head; cfi; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, cie_cfi_vec, ix, cfi)
     lookup_cfa_1 (cfi, &next_cfa, &remember);
 
   last_cfa = next_cfa;
   last_label = start_label;
 
-  for (cfi = fde->dw_fde_cfi; cfi; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
     switch (cfi->dw_cfi_opc)
       {
       case DW_CFA_set_loc:

[-- Attachment #5: 004-dw2cfg.diff --]
[-- Type: text/plain, Size: 19693 bytes --]

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -471,13 +471,14 @@ static void initial_return_save (rtx);
 static HOST_WIDE_INT stack_adjust_offset (const_rtx, HOST_WIDE_INT,
 					  HOST_WIDE_INT);
 static void output_cfi (dw_cfi_ref, dw_fde_ref, int);
-static void output_cfi_directive (dw_cfi_ref);
+static void output_cfi_directive (FILE *, dw_cfi_ref);
 static void output_call_frame_info (int);
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx, const char *);
-static void dwarf2out_cfi_begin_epilogue (rtx);
-static void dwarf2out_frame_debug_restore_state (void);
+
+extern void dwarf2out_cfi_begin_epilogue (rtx);
+extern void dwarf2out_frame_debug_restore_state (const char *);
 
 /* Support for complex CFA locations.  */
 static void output_cfa_loc (dw_cfi_ref, int);
@@ -889,6 +890,17 @@ static bool emit_cfa_remember;
 /* True if any CFI directives were emitted at the current insn.  */
 static bool any_cfis_emitted;
 
+static void
+add_cfa_remember (const char *label)
+{
+  dw_cfi_ref cfi_remember;
+
+  /* Emit the state save.  */
+  cfi_remember = new_cfi ();
+  cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
+  add_fde_cfi (label, cfi_remember);
+}
+
 /* Add CFI to the current fde at the PC value indicated by LABEL if specified,
    or to the CIE if LABEL is NULL.  */
 
@@ -904,13 +916,8 @@ add_fde_cfi (const char *label, dw_cfi_r
 
   if (emit_cfa_remember)
     {
-      dw_cfi_ref cfi_remember;
-
-      /* Emit the state save.  */
       emit_cfa_remember = false;
-      cfi_remember = new_cfi ();
-      cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
-      add_fde_cfi (label, cfi_remember);
+      add_cfa_remember (label);
     }
 
   if (dwarf2out_do_cfi_asm ())
@@ -1436,7 +1443,7 @@ stack_adjust_offset (const_rtx pattern, 
 
   return offset;
 }
-
+#if 0
 /* Precomputed args_size for CODE_LABELs and BARRIERs preceding them,
    indexed by INSN_UID.  */
 
@@ -1541,7 +1548,11 @@ compute_barrier_args_size (void)
 	  cur_args_size = barrier_args_size[INSN_UID (insn)];
 	  prev = prev_nonnote_insn (insn);
 	  if (prev && BARRIER_P (prev))
-	    barrier_args_size[INSN_UID (prev)] = cur_args_size;
+	    {
+	      gcc_assert (LABEL_P (insn));
+	      barrier_args_size[INSN_UID (prev)] = cur_args_size;
+	      barrier_args_size[INSN_UID (insn)] = cur_args_size;
+	    }
 
 	  for (; insn; insn = NEXT_INSN (insn))
 	    {
@@ -1609,6 +1620,7 @@ compute_barrier_args_size (void)
   VEC_free (rtx, heap, worklist);
   VEC_free (rtx, heap, next);
 }
+#endif
 
 /* Add a CFI to update the running total of the size of arguments
    pushed onto the stack.  */
@@ -1707,25 +1719,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
       return;
     }
   else if (BARRIER_P (insn))
-    {
-      /* Don't call compute_barrier_args_size () if the only
-	 BARRIER is at the end of function.  */
-      if (barrier_args_size == NULL && next_nonnote_insn (insn))
-	compute_barrier_args_size ();
-      if (barrier_args_size == NULL)
-	offset = 0;
-      else
-	{
-	  offset = barrier_args_size[INSN_UID (insn)];
-	  if (offset < 0)
-	    offset = 0;
-	}
-
-      offset -= args_size;
-#ifndef STACK_GROWS_DOWNWARD
-      offset = -offset;
-#endif
-    }
+    return;
   else if (GET_CODE (PATTERN (insn)) == SET)
     offset = stack_adjust_offset (PATTERN (insn), args_size, 0);
   else if (GET_CODE (PATTERN (insn)) == PARALLEL
@@ -2921,11 +2915,102 @@ dwarf2out_frame_debug (rtx insn, bool af
     dwarf2out_flush_queued_reg_saves ();
 }
 
+typedef struct
+{
+  cfi_vec cfis;
+  dw_cfa_location cfa, cfa_store;
+  bool visited;
+  bool used_as_start;
+  int args_size;
+} jump_target_info;
+
+static void
+maybe_record_jump_target (rtx label, VEC (rtx, heap) **worklist,
+			  int *uid_luid, jump_target_info *info)
+{
+  dw_fde_ref fde = current_fde ();
+  int uid;
+
+  if (GET_CODE (label) == LABEL_REF)
+    label = XEXP (label, 0);
+  gcc_assert (LABEL_P (label));
+  uid = INSN_UID (label);
+  info += uid_luid[uid];
+  if (info->visited || info->cfis)
+    return;
+
+  if (dump_file)
+    fprintf (dump_file, "recording label %d as possible jump target\n", uid);
+
+  VEC_safe_push (rtx, heap, *worklist, label);
+  info->cfis = VEC_copy (dw_cfi_ref, gc, fde->dw_fde_cfi);
+  info->args_size = args_size;
+  info->cfa = cfa;
+  info->cfa_store = cfa_store;
+}
+
+static bool
+vec_is_prefix_of (cfi_vec vec1, cfi_vec vec2)
+{
+  int i;
+  int len1 = VEC_length (dw_cfi_ref, vec1);
+  int len2 = VEC_length (dw_cfi_ref, vec2);
+  if (len1 > len2)
+    return false;
+  for (i = 0; i < len1; i++)
+    if (VEC_index (dw_cfi_ref, vec1, i) != VEC_index (dw_cfi_ref, vec2, i))
+      return false;
+  return true;
+}
+
+static void
+append_extra_cfis (dw_fde_ref fde, cfi_vec prefix, cfi_vec full, const char *label)
+{
+  int i;
+  int len = VEC_length (dw_cfi_ref, full);
+  int prefix_len = VEC_length (dw_cfi_ref, prefix);
+  for (i = 0; i < len; i++)
+    {
+      dw_cfi_ref elt = VEC_index (dw_cfi_ref, full, i);
+      if (i < prefix_len)
+	gcc_assert (elt == VEC_index (dw_cfi_ref, prefix, i));
+      else
+	{
+	  if (label)
+	    {
+	      dw_cfi_ref cfi2 = new_cfi ();
+	      *cfi2 = *elt;
+	      add_fde_cfi (label, cfi2);
+	    }
+	  else
+	    VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, elt);
+	}
+    }
+}
+
+extern void debug_cfi_vec (FILE *, cfi_vec v);
+void debug_cfi_vec (FILE *f, cfi_vec v)
+{
+  int ix;
+  dw_cfi_ref cfi;
+
+  FOR_EACH_VEC_ELT (dw_cfi_ref, v, ix, cfi)
+    output_cfi_directive (f, cfi);
+}
+
 void
 dwarf2out_frame_debug_init (void)
 {
-  size_t i;
+  int max_uid = get_max_uid ();
+  size_t j;
+  int i;
   rtx insn;
+  VEC (rtx, heap) *worklist;
+  cfi_vec incoming_cfis;
+  int n_points;
+  int *uid_luid;
+  jump_target_info *point_info;
+  dw_fde_ref fde = current_fde ();
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2940,70 +3025,249 @@ dwarf2out_frame_debug_init (void)
   cfa_temp.reg = -1;
   cfa_temp.offset = 0;
 
-  for (i = 0; i < num_regs_saved_in_regs; i++)
+  for (j = 0; j < num_regs_saved_in_regs; j++)
     {
-      regs_saved_in_regs[i].orig_reg = NULL_RTX;
-      regs_saved_in_regs[i].saved_in_reg = NULL_RTX;
+      regs_saved_in_regs[j].orig_reg = NULL_RTX;
+      regs_saved_in_regs[j].saved_in_reg = NULL_RTX;
     }
   num_regs_saved_in_regs = 0;
-
+#if 0
   if (barrier_args_size)
     {
       XDELETEVEC (barrier_args_size);
       barrier_args_size = NULL;
     }
-  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+#endif
+
+  n_points = 0;
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (LABEL_P (insn) || BARRIER_P (insn))
+      n_points++;
+  uid_luid = XCNEWVEC (int, max_uid);
+  n_points = 0;
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (LABEL_P (insn) || BARRIER_P (insn))
+      uid_luid[INSN_UID (insn)] = n_points++;
+
+  point_info = XCNEWVEC (jump_target_info, n_points);
+  for (i = 0; i < n_points; i++)
+    point_info[i].args_size = -1;
+
+  worklist = VEC_alloc (rtx, heap, 20);
+  insn = get_insns ();
+  incoming_cfis = VEC_copy (dw_cfi_ref, gc, fde->dw_fde_cfi);
+
+  args_size = old_args_size = 0;
+
+  for (;;)
     {
-      rtx pat;
-      if (BARRIER_P (insn))
-	{
-	  dwarf2out_frame_debug (insn, false);
-	  continue;
-	}
-      else if (NOTE_P (insn))
-	{
-	  switch (NOTE_KIND (insn))
+      HOST_WIDE_INT offset;
+      rtx new_insn, best, next;
+      bool best_has_barrier;
+      jump_target_info *restart_info;
+
+      for (; insn != NULL_RTX; insn = next)
+	{
+	  int uid = INSN_UID (insn);
+	  rtx pat, note;
+
+	  next = NEXT_INSN (insn);
+	  if (LABEL_P (insn) || BARRIER_P (insn))
+	    {
+	      int luid = uid_luid[uid];
+	      jump_target_info *info = point_info + luid;
+	      if (info->used_as_start)
+		{
+		  if (dump_file)
+		    fprintf (dump_file,
+			     "Stopping scan at insn %d; previously reached\n",
+			     uid);
+		  break;
+		}
+	      info->visited = true;
+	      if (BARRIER_P (insn))
+		{
+		  dwarf2out_frame_debug (insn, false);
+		  gcc_assert (info->cfis == NULL);
+		  info->cfis = fde->dw_fde_cfi;
+		  if (dump_file)
+		    {
+		      fprintf (dump_file, "Stopping scan at barrier %d\n", uid);
+		      if (dump_flags & TDF_DETAILS)
+			debug_cfi_vec (dump_file, fde->dw_fde_cfi);
+		    }
+		  break;
+		}
+	    }
+	  if (!NONDEBUG_INSN_P (insn))
+	    continue;
+	  pat = PATTERN (insn);
+	  if (asm_noperands (pat) >= 0)
+	    continue;
+	  if (GET_CODE (pat) == SEQUENCE)
 	    {
-	    case NOTE_INSN_EPILOGUE_BEG:
-#if defined (HAVE_epilogue)
-	      dwarf2out_cfi_begin_epilogue (insn);
+	      for (i = 1; i < XVECLEN (pat, 0); i++)
+		dwarf2out_frame_debug (XVECEXP (pat, 0, i), false);
+	      insn = XVECEXP (pat, 0, 0);
+	    }
+
+	  if (CALL_P (insn) && dwarf2out_do_frame ())
+	    dwarf2out_frame_debug (insn, false);
+	  if (dwarf2out_do_frame ()
+#if !defined (HAVE_prologue)
+	      && !ACCUMULATE_OUTGOING_ARGS 
 #endif
-	      break;
-	    case NOTE_INSN_CFA_RESTORE_STATE:
-	      cfi_insn = insn;
-	      dwarf2out_frame_debug_restore_state ();
-	      break;
+	      )
+	    dwarf2out_frame_debug (insn, true);
+	  if (JUMP_P (insn))
+	    {
+	      rtx label = JUMP_LABEL (insn);
+	      if (label)
+		{
+		  rtx next = next_real_insn (label);
+		  if (next && JUMP_P (next)
+		      && (GET_CODE (PATTERN (next)) == ADDR_VEC
+			  || GET_CODE (PATTERN (next)) == ADDR_DIFF_VEC))
+		    {
+		      rtx pat = PATTERN (next);
+		      int eltnum = GET_CODE (pat) == ADDR_DIFF_VEC ? 1 : 0;
+
+		      for (i = 0; i < XVECLEN (pat, eltnum); i++)
+			maybe_record_jump_target (XVECEXP (pat, eltnum, i),
+						  &worklist, uid_luid,
+						  point_info);
+		    }
+		  else
+		    maybe_record_jump_target (label, &worklist, uid_luid,
+					      point_info);
+		}
+	    }
+	  note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
+	  if (note)
+	    {
+	      eh_landing_pad lp;
+
+	      lp = get_eh_landing_pad_from_rtx (insn);
+	      if (lp)
+		maybe_record_jump_target (lp->landing_pad, &worklist,
+					  uid_luid, point_info);
 	    }
-	  continue;
 	}
-      if (!NONDEBUG_INSN_P (insn))
-	continue;
-      pat = PATTERN (insn);
-      if (asm_noperands (pat) >= 0)
-	continue;
-      if (GET_CODE (pat) == SEQUENCE)
+      best = NULL_RTX;
+      best_has_barrier = false;
+      FOR_EACH_VEC_ELT (rtx, worklist, i, new_insn)
 	{
-	  int j;
-	  for (j = 1; j < XVECLEN (pat, 0); j++)
-	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
-	  insn = XVECEXP (pat, 0, 0);
+	  rtx prev;
+	  bool this_has_barrier;
+	  restart_info = point_info + uid_luid[INSN_UID (new_insn)];
+	  if (restart_info->visited)
+	    continue;
+	  prev = prev_nonnote_nondebug_insn (new_insn);
+	  this_has_barrier = prev && BARRIER_P (prev);
+	  if (best == NULL_RTX
+	      || prev == insn
+	      || (!best_has_barrier && this_has_barrier))
+	    {
+	      best = new_insn;
+	      best_has_barrier = this_has_barrier;
+	    }
 	}
+      if (best == NULL_RTX)
+	break;
+      if (dump_file)
+	fprintf (dump_file, "restarting scan at label %d", INSN_UID (best));
+      restart_info = point_info + uid_luid[INSN_UID (best)];
+      restart_info->used_as_start = true;
+      insn = best;
+      gcc_assert (LABEL_P (insn));
+      fde->dw_fde_cfi = VEC_copy (dw_cfi_ref, gc, restart_info->cfis);
+      cfa = restart_info->cfa;
+      cfa_store = restart_info->cfa_store;
+      offset = restart_info->args_size;
+      if (offset >= 0)
+	{
+	  const char *label;
+
+	  if (dump_file && offset != args_size)
+	    fprintf (dump_file, ", args_size " HOST_WIDE_INT_PRINT_DEC
+		     "  -> " HOST_WIDE_INT_PRINT_DEC,
+		     args_size, offset);
 
-      if (CALL_P (insn) && dwarf2out_do_frame ())
-	dwarf2out_frame_debug (insn, false);
-      if (dwarf2out_do_frame ()
-#if !defined (HAVE_prologue)
-	  && !ACCUMULATE_OUTGOING_ARGS 
+	  offset -= args_size;
+#ifndef STACK_GROWS_DOWNWARD
+	  offset = -offset;
 #endif
-	  )
-	dwarf2out_frame_debug (insn, true);
+	  label = dwarf2out_cfi_label (false);
+	  cfi_insn = prev_nonnote_nondebug_insn (insn);
+	  dwarf2out_stack_adjust (offset, label);
+	}
+      if (dump_file)
+	{
+	  fprintf (dump_file, "\n");
+	  if (dump_flags & TDF_DETAILS)
+	    debug_cfi_vec (dump_file, fde->dw_fde_cfi);
+	}
+
+      restart_info->visited = true;
+      insn = NEXT_INSN (insn);
+    }
+
+  /* Now splice the various CFI fragments together into a coherent whole.  */
+  fde->dw_fde_cfi = VEC_alloc (dw_cfi_ref, gc, 20);
+  insn = get_insns ();
+  while (!NOTE_P (insn) || NOTE_KIND (insn) != NOTE_INSN_FUNCTION_BEG)
+    insn = NEXT_INSN (insn);
+  cfi_insn = insn;
+  add_cfa_remember (dwarf2out_cfi_label (false));
+
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      if (BARRIER_P (insn))
+	{
+	  cfi_vec new_cfi_vec;
+	  jump_target_info *info = point_info + uid_luid[INSN_UID (insn)];
+	  cfi_vec barrier_cfi = info->cfis;
+	  const char *label = dwarf2out_cfi_label (false);
+
+	  append_extra_cfis (fde, incoming_cfis, barrier_cfi, NULL);
+
+	  /* Find the start of the next sequence we processed.  */
+	  do
+	    {
+	      if (LABEL_P (insn))
+		{
+		  info = point_info + uid_luid[INSN_UID (insn)];
+		  if (info->used_as_start)
+		    break;
+		}
+	      insn = NEXT_INSN (insn);
+	    }
+	  while (insn != NULL_RTX);
+	  if (insn == NULL_RTX)
+	    break;
+
+	  /* Emit extra CFIs as necessary to achieve the correct state.  */
+	  gcc_assert (LABEL_P (insn));
+	  new_cfi_vec = info->cfis;
+	  cfi_insn = insn;
+	  if (vec_is_prefix_of (barrier_cfi, new_cfi_vec))
+	    append_extra_cfis (fde, barrier_cfi, new_cfi_vec, label);
+	  else
+	    {
+	      dwarf2out_frame_debug_restore_state (label);
+	      add_cfa_remember (label);
+
+	      append_extra_cfis (fde, NULL, new_cfi_vec, label);
+	    }
+	  incoming_cfis = new_cfi_vec;
+	}
     }
 }
 
 void
 dwarf2out_emit_cfi (dw_cfi_ref cfi)
 {
-  output_cfi_directive (cfi);
+  output_cfi_directive (asm_out_file, cfi);
 }
 
 /* Determine if we need to save and restore CFI information around
@@ -3091,17 +3355,12 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
    required.  */
 
 void
-dwarf2out_frame_debug_restore_state (void)
+dwarf2out_frame_debug_restore_state (const char *label)
 {
   dw_cfi_ref cfi = new_cfi ();
-  const char *label = dwarf2out_cfi_label (false);
 
   cfi->dw_cfi_opc = DW_CFA_restore_state;
   add_fde_cfi (label, cfi);
-
-  gcc_assert (cfa_remember.in_use);
-  cfa = cfa_remember;
-  cfa_remember.in_use = 0;
 }
 
 /* Describe for the GTY machinery what parts of dw_cfi_oprnd1 are used.  */
@@ -3401,7 +3660,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref f
 /* Similar, but do it via assembler directives instead.  */
 
 static void
-output_cfi_directive (dw_cfi_ref cfi)
+output_cfi_directive (FILE *f, dw_cfi_ref cfi)
 {
   unsigned long r, r2;
 
@@ -3422,76 +3681,76 @@ output_cfi_directive (dw_cfi_ref cfi)
     case DW_CFA_offset_extended:
     case DW_CFA_offset_extended_sf:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_offset %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
+      fprintf (f, "\t.cfi_offset %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
 	       r, cfi->dw_cfi_oprnd2.dw_cfi_offset);
       break;
 
     case DW_CFA_restore:
     case DW_CFA_restore_extended:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_restore %lu\n", r);
+      fprintf (f, "\t.cfi_restore %lu\n", r);
       break;
 
     case DW_CFA_undefined:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_undefined %lu\n", r);
+      fprintf (f, "\t.cfi_undefined %lu\n", r);
       break;
 
     case DW_CFA_same_value:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_same_value %lu\n", r);
+      fprintf (f, "\t.cfi_same_value %lu\n", r);
       break;
 
     case DW_CFA_def_cfa:
     case DW_CFA_def_cfa_sf:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_def_cfa %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
+      fprintf (f, "\t.cfi_def_cfa %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
 	       r, cfi->dw_cfi_oprnd2.dw_cfi_offset);
       break;
 
     case DW_CFA_def_cfa_register:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_def_cfa_register %lu\n", r);
+      fprintf (f, "\t.cfi_def_cfa_register %lu\n", r);
       break;
 
     case DW_CFA_register:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
       r2 = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd2.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_register %lu, %lu\n", r, r2);
+      fprintf (f, "\t.cfi_register %lu, %lu\n", r, r2);
       break;
 
     case DW_CFA_def_cfa_offset:
     case DW_CFA_def_cfa_offset_sf:
-      fprintf (asm_out_file, "\t.cfi_def_cfa_offset "
+      fprintf (f, "\t.cfi_def_cfa_offset "
 	       HOST_WIDE_INT_PRINT_DEC"\n",
 	       cfi->dw_cfi_oprnd1.dw_cfi_offset);
       break;
 
     case DW_CFA_remember_state:
-      fprintf (asm_out_file, "\t.cfi_remember_state\n");
+      fprintf (f, "\t.cfi_remember_state\n");
       break;
     case DW_CFA_restore_state:
-      fprintf (asm_out_file, "\t.cfi_restore_state\n");
+      fprintf (f, "\t.cfi_restore_state\n");
       break;
 
     case DW_CFA_GNU_args_size:
-      fprintf (asm_out_file, "\t.cfi_escape %#x,", DW_CFA_GNU_args_size);
+      fprintf (f, "\t.cfi_escape %#x,", DW_CFA_GNU_args_size);
       dw2_asm_output_data_uleb128_raw (cfi->dw_cfi_oprnd1.dw_cfi_offset);
       if (flag_debug_asm)
-	fprintf (asm_out_file, "\t%s args_size "HOST_WIDE_INT_PRINT_DEC,
+	fprintf (f, "\t%s args_size "HOST_WIDE_INT_PRINT_DEC,
 		 ASM_COMMENT_START, cfi->dw_cfi_oprnd1.dw_cfi_offset);
-      fputc ('\n', asm_out_file);
+      fputc ('\n', f);
       break;
 
     case DW_CFA_GNU_window_save:
-      fprintf (asm_out_file, "\t.cfi_window_save\n");
+      fprintf (f, "\t.cfi_window_save\n");
       break;
 
     case DW_CFA_def_cfa_expression:
     case DW_CFA_expression:
-      fprintf (asm_out_file, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
+      fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
       output_cfa_loc_raw (cfi);
-      fputc ('\n', asm_out_file);
+      fputc ('\n', f);
       break;
 
     default:
@@ -3601,7 +3860,7 @@ output_cfis (cfi_vec vec, int upto, bool
 		  && cfi2->dw_cfi_opc != DW_CFA_restore_extended)
 		{
 		  if (do_cfi_asm)
-		    output_cfi_directive (cfi2);
+		    output_cfi_directive (asm_out_file, cfi2);
 		  else
 		    output_cfi (cfi2, fde, for_eh);
 		}
@@ -3635,7 +3894,7 @@ output_cfis (cfi_vec vec, int upto, bool
 	  if (cfi_cfa)
 	    {
 	      if (do_cfi_asm)
-		output_cfi_directive (cfi_cfa);
+		output_cfi_directive (asm_out_file, cfi_cfa);
 	      else
 		output_cfi (cfi_cfa, fde, for_eh);
 	    }
@@ -3645,7 +3904,7 @@ output_cfis (cfi_vec vec, int upto, bool
 	      && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
 	    {
 	      if (do_cfi_asm)
-		output_cfi_directive (cfi_args_size);
+		output_cfi_directive (asm_out_file, cfi_args_size);
 	      else
 		output_cfi (cfi_args_size, fde, for_eh);
 	    }
@@ -3656,7 +3915,7 @@ output_cfis (cfi_vec vec, int upto, bool
 	      return;
 	    }
 	  else if (do_cfi_asm)
-	    output_cfi_directive (cfi);
+	    output_cfi_directive (asm_out_file, cfi);
 	  else
 	    output_cfi (cfi, fde, for_eh);
 	  break;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-31 20:09             ` Bernd Schmidt
@ 2011-03-31 21:51               ` Richard Henderson
  2011-03-31 22:36                 ` Bernd Schmidt
  2011-04-05 21:59                 ` Bernd Schmidt
  0 siblings, 2 replies; 73+ messages in thread
From: Richard Henderson @ 2011-03-31 21:51 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/31/2011 12:59 PM, Bernd Schmidt wrote:
>> So long as late compilation passes continue to not move frame-related
>> insns across basic block boundaries, we should be fine.
> 
> I'm nervous about this as the reorg pass can do arbitrary
> transformations. On Blackfin for example, we can reorder basic blocks
> for the sake of loop optimizations; sched-ebb can create new blocks,
> etc. I think it would be best if we can somehow make it work during
> final, without a CFG.

I guess that's the best thing for now.  I'm sure we all agree that long
term all transformations should preserve the CFG all the way through final.
At which point this could be implemented as a pass on the function after
all transformations are complete.

> Rather than use a CFG, I've tried to do something similar to
> compute_barrier_args_size, using JUMP_LABELs etc.

A reasonable solution for now, I suppose.

> 
> Summary of the patches:
> 001 - just create a dwarf2out_frame_debug_init function.

Ok.

> 002 - Make it walk the function in a first pass and record CFIs to
>       be output later

Do I correctly understand that NOTE_INSN_CFI isn't actually being
used in this patch?

> 003 - Store dw_cfi_refs in VECs rather than linked lists. Looks
>       larger than it is due to reindentation

Like 001, this one looks like it's totally independent of any of
the other changes, and a good cleanup.  Please go ahead and test
and commit this one independently.

> 004 - Change the function walk introduced in 002 so that it records
>       and restores state when reaching jumps/barriers

I'm not too fond of vec_is_prefix_of.  The problem is that you're
not applying any understanding of the actual data, just doing a
string comparison (effectively).

Imagine two code paths A and B that both save R2 and R3 into their
respective stack slots.  Imagine that -- for whatever reason -- the
stores have been scheduled differently such that on path A R2 is 
saved before R3, and the reverse on path B.

Your prefix test will conclude that paths A and B end with different
unwind info, even though they are in fact compatible.

Using some mechanism by which we can compare aggregate CFI information
on a per-register basis ought to also vastly improve the efficiency in
adjusting the cfi info between code points.  It should also enable
proper information in the -freorder-blocks-and-partition case.

> * i386.c uses dwarf2out_frame_debug directly in some cases and is
>   unconverted

Hum.  I wonder what the best way to attack this is.  It's a local change,
adjusting and then restoring the unwind state between two insns that
should not be scheduled separately.

We could turn them into two unspec_volatiles, and lose scheduling 
across this pattern.  But ideally this is a value that ought to be
shrink-wrapped.  It's expensive to compute, and there are many
early-return situations in which we don't need it.

I suppose we could split this pattern manually in i386 reorg; 
forcing this to be split before final even at -O0.  At that point
all shrink-wrapping would be done and an unspecv replacement 
would be ok.

> * I haven't tested whether my attempt to use
>   get_eh_landing_pad_from_rtx in the absence of a CFG actually works

It will.  This information is stored in cfun->eh.  By design this
information must survive until final, so that we can emit the 
actual eh info into the appropriate tables.

> * Computed jumps and nonlocal gotos aren't handled. I think this
>   could be done by recording the state at NOTE_INSN_PROLOGUE_END
>   and using that for all labels we can't otherwise reach.

That should be reasonable.  You could assert that all of these 
labels are in forced_labels.  All computed branch targets should
be listed therein.


r~

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-31 21:51               ` Richard Henderson
@ 2011-03-31 22:36                 ` Bernd Schmidt
  2011-03-31 23:57                   ` Richard Henderson
  2011-04-05 21:59                 ` Bernd Schmidt
  1 sibling, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-03-31 22:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

On 03/31/2011 11:28 PM, Richard Henderson wrote:
>> Rather than use a CFG, I've tried to do something similar to
>> compute_barrier_args_size, using JUMP_LABELs etc.
> 
> A reasonable solution for now, I suppose.

Ok, sounds like you haven't discovered a fatal flaw; I'll keep hacking
on it.

>> Summary of the patches:
>> 001 - just create a dwarf2out_frame_debug_init function.
> 
> Ok.
> 
>> 003 - Store dw_cfi_refs in VECs rather than linked lists. Looks
>>       larger than it is due to reindentation
> 
> Like 001, this one looks like it's totally independent of any of
> the other changes, and a good cleanup.  Please go ahead and test
> and commit this one independently.

Will do.

>> 002 - Make it walk the function in a first pass and record CFIs to
>>       be output later
>
> Do I correctly understand that NOTE_INSN_CFI isn't actually being
> used in this patch?

No, it's used - but it looks like I forgot to quilt refresh and the
final.c changes weren't included. New patch below. After this patch, the
whole function is processed before final, and rather than emitting cfi
directives immediately, we create these notes which cause the directives
to be emitted during final.

This probably shouldn't be committed separately when these changes go
in, as (I think) it breaks -freorder-blocks-and-partition as well as the
code in i386.c; it's split out simply to show an intermediate stage.

>> 004 - Change the function walk introduced in 002 so that it records
>>       and restores state when reaching jumps/barriers
> 
> I'm not too fond of vec_is_prefix_of.  The problem is that you're
> not applying any understanding of the actual data, just doing a
> string comparison (effectively).

Yes, this falls under "inefficient CFI insns". I wanted to post a
preliminary proof-of-concept patch set now which generates
correct(-looking) output, but not necessarily optimized output. Not
quite sure yet how to tackle this but I'll think of something.


Bernd

[-- Attachment #2: 002-scanfirst.diff --]
[-- Type: text/plain, Size: 9344 bytes --]

    	* cfgcleanup.c (flow_find_head_matching_sequence): Ignore
    	epilogue notes.
    	* df-problems.c (can_move_insns_across): Don't stop at epilogue
    	notes.
    	* dwarf2out.c (dwarf2out_cfi_begin_epilogue): Also allow a
    	simplejump to end the block.

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -471,6 +471,8 @@ static void output_call_frame_info (int)
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_cfi_begin_epilogue (rtx);
+static void dwarf2out_frame_debug_restore_state (void);
 
 /* Support for complex CFA locations.  */
 static void output_cfa_loc (dw_cfi_ref, int);
@@ -879,6 +881,9 @@ dwarf2out_cfi_label (bool force)
   return label;
 }
 
+/* The insn after which a new CFI note should be emitted.  */
+static rtx cfi_insn;
+
 /* True if remember_state should be emitted before following CFI directive.  */
 static bool emit_cfa_remember;
 
@@ -961,7 +966,8 @@ add_fde_cfi (const char *label, dw_cfi_r
 	        }
 	    }
 
-	  output_cfi_directive (cfi);
+	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
+	  NOTE_CFI (cfi_insn) = cfi;
 
 	  list_head = &fde->dw_fde_cfi;
 	  any_cfis_emitted = true;
@@ -2790,6 +2796,11 @@ dwarf2out_frame_debug (rtx insn, bool af
   rtx note, n;
   bool handled_one = false;
 
+  if (after_p)
+    cfi_insn = insn;
+  else
+    cfi_insn = PREV_INSN (insn);
+
   if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
 
@@ -2911,6 +2922,7 @@ void
 dwarf2out_frame_debug_init (void)
 {
   size_t i;
+  rtx insn;
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2937,12 +2949,64 @@ dwarf2out_frame_debug_init (void)
       XDELETEVEC (barrier_args_size);
       barrier_args_size = NULL;
     }
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      rtx pat;
+      if (BARRIER_P (insn))
+	{
+	  dwarf2out_frame_debug (insn, false);
+	  continue;
+	}
+      else if (NOTE_P (insn))
+	{
+	  switch (NOTE_KIND (insn))
+	    {
+	    case NOTE_INSN_EPILOGUE_BEG:
+#if defined (HAVE_epilogue)
+	      dwarf2out_cfi_begin_epilogue (insn);
+#endif
+	      break;
+	    case NOTE_INSN_CFA_RESTORE_STATE:
+	      cfi_insn = insn;
+	      dwarf2out_frame_debug_restore_state ();
+	      break;
+	    }
+	  continue;
+	}
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      pat = PATTERN (insn);
+      if (asm_noperands (pat) >= 0)
+	continue;
+      if (GET_CODE (pat) == SEQUENCE)
+	{
+	  int j;
+	  for (j = 1; j < XVECLEN (pat, 0); j++)
+	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
+	  insn = XVECEXP (pat, 0, 0);
+	}
+
+      if (CALL_P (insn) && dwarf2out_do_frame ())
+	dwarf2out_frame_debug (insn, false);
+      if (dwarf2out_do_frame ()
+#if !defined (HAVE_prologue)
+	  && !ACCUMULATE_OUTGOING_ARGS
+#endif
+	  )
+	dwarf2out_frame_debug (insn, true);
+    }
+}
+
+void
+dwarf2out_emit_cfi (dw_cfi_ref cfi)
+{
+  output_cfi_directive (cfi);
 }
 
-/* Determine if we need to save and restore CFI information around this
-   epilogue.  If SIBCALL is true, then this is a sibcall epilogue.  If
-   we do need to save/restore, then emit the save now, and insert a
-   NOTE_INSN_CFA_RESTORE_STATE at the appropriate place in the stream.  */
+/* Determine if we need to save and restore CFI information around
+   this epilogue.  If we do need to save/restore, then emit the save
+   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
+   place in the stream.  */
 
 void
 dwarf2out_cfi_begin_epilogue (rtx insn)
@@ -2957,8 +3021,10 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
       if (!INSN_P (i))
 	continue;
 
-      /* Look for both regular and sibcalls to end the block.  */
-      if (returnjump_p (i))
+      /* Look for both regular and sibcalls to end the block.  Various
+	 optimization passes may cause us to jump to a common epilogue
+	 tail, so we also accept simplejumps.  */
+      if (returnjump_p (i) || simplejump_p (i))
 	break;
       if (CALL_P (i) && SIBLING_CALL_P (i))
 	break;
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -18,11 +18,11 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+struct dw_cfi_struct;
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
 extern void dwarf2out_frame_debug_init (void);
-extern void dwarf2out_cfi_begin_epilogue (rtx);
-extern void dwarf2out_frame_debug_restore_state (void);
+extern void dwarf2out_emit_cfi (struct dw_cfi_struct *);
 extern void dwarf2out_flush_queued_reg_saves (void);
 
 extern void debug_dwarf (void);
Index: gcc/insn-notes.def
===================================================================
--- gcc.orig/insn-notes.def
+++ gcc/insn-notes.def
@@ -77,4 +77,8 @@ INSN_NOTE (SWITCH_TEXT_SECTIONS)
    when an epilogue appears in the middle of a function.  */
 INSN_NOTE (CFA_RESTORE_STATE)
 
+/* When emitting dwarf2 frame information, contains a directive that
+   should be emitted.  */
+INSN_NOTE (CFI)
+
 #undef INSN_NOTE
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -180,6 +180,7 @@ union rtunion_def
   mem_attrs *rt_mem;
   reg_attrs *rt_reg;
   struct constant_descriptor_rtx *rt_constant;
+  struct dw_cfi_struct *rt_cfi;
 };
 typedef union rtunion_def rtunion;
 
@@ -708,6 +709,7 @@ extern void rtl_check_failed_flag (const
 #define XTREE(RTX, N)   (RTL_CHECK1 (RTX, N, 't').rt_tree)
 #define XBBDEF(RTX, N)	(RTL_CHECK1 (RTX, N, 'B').rt_bb)
 #define XTMPL(RTX, N)	(RTL_CHECK1 (RTX, N, 'T').rt_str)
+#define XCFI(RTX, N)	(RTL_CHECK1 (RTX, N, 'C').rt_cfi)
 
 #define XVECEXP(RTX, N, M)	RTVEC_ELT (XVEC (RTX, N), M)
 #define XVECLEN(RTX, N)		GET_NUM_ELEM (XVEC (RTX, N))
@@ -740,6 +742,7 @@ extern void rtl_check_failed_flag (const
 #define XCMODE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_type)
 #define XCTREE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_tree)
 #define XCBBDEF(RTX, N, C)    (RTL_CHECKC1 (RTX, N, C).rt_bb)
+#define XCCFI(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_cfi)
 #define XCCSELIB(RTX, N, C)   (RTL_CHECKC1 (RTX, N, C).rt_cselib)
 
 #define XCVECEXP(RTX, N, M, C)	RTVEC_ELT (XCVEC (RTX, N, C), M)
@@ -882,6 +885,7 @@ extern const char * const reg_note_name[
 #define NOTE_BLOCK(INSN)	XCTREE (INSN, 4, NOTE)
 #define NOTE_EH_HANDLER(INSN)	XCINT (INSN, 4, NOTE)
 #define NOTE_BASIC_BLOCK(INSN)	XCBBDEF (INSN, 4, NOTE)
+#define NOTE_CFI(INSN)		XCCFI (INSN, 4, NOTE)
 #define NOTE_VAR_LOCATION(INSN)	XCEXP (INSN, 4, NOTE)
 
 /* In a NOTE that is a line number, this is the line number.
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -1899,16 +1899,15 @@ final_scan_insn (rtx insn, FILE *file, i
 	  break;
 
 	case NOTE_INSN_EPILOGUE_BEG:
-#if defined (HAVE_epilogue)
-	  if (dwarf2out_do_frame ())
-	    dwarf2out_cfi_begin_epilogue (insn);
-#endif
 	  (*debug_hooks->begin_epilogue) (last_linenum, last_filename);
 	  targetm.asm_out.function_begin_epilogue (file);
 	  break;
 
 	case NOTE_INSN_CFA_RESTORE_STATE:
-	  dwarf2out_frame_debug_restore_state ();
+	  break;
+
+	case NOTE_INSN_CFI:
+	  dwarf2out_emit_cfi (NOTE_CFI (insn));
 	  break;
 
 	case NOTE_INSN_FUNCTION_BEG:
@@ -2018,8 +2017,6 @@ final_scan_insn (rtx insn, FILE *file, i
       break;
 
     case BARRIER:
-      if (dwarf2out_do_frame ())
-	dwarf2out_frame_debug (insn, false);
       break;
 
     case CODE_LABEL:
@@ -2285,12 +2282,6 @@ final_scan_insn (rtx insn, FILE *file, i
 
 	    final_sequence = body;
 
-	    /* Record the delay slots' frame information before the branch.
-	       This is needed for delayed calls: see execute_cfa_program().  */
-	    if (dwarf2out_do_frame ())
-	      for (i = 1; i < XVECLEN (body, 0); i++)
-		dwarf2out_frame_debug (XVECEXP (body, 0, i), false);
-
 	    /* The first insn in this SEQUENCE might be a JUMP_INSN that will
 	       force the restoration of a comparison that was previously
 	       thought unnecessary.  If that happens, cancel this sequence
@@ -2604,9 +2595,6 @@ final_scan_insn (rtx insn, FILE *file, i
 
 	current_output_insn = debug_insn = insn;
 
-	if (CALL_P (insn) && dwarf2out_do_frame ())
-	  dwarf2out_frame_debug (insn, false);
-
 	/* Find the proper template for this insn.  */
 	templ = get_insn_template (insn_code_number, insn);
 
@@ -2686,16 +2674,6 @@ final_scan_insn (rtx insn, FILE *file, i
 	  targetm.asm_out.final_postscan_insn (file, insn, recog_data.operand,
 					       recog_data.n_operands);
 
-	/* If necessary, report the effect that the instruction has on
-	   the unwind info.   We've already done this for delay slots
-	   and call instructions.  */
-	if (final_sequence == 0
-#if !defined (HAVE_prologue)
-	    && !ACCUMULATE_OUTGOING_ARGS
-#endif
-	    && dwarf2out_do_frame ())
-	  dwarf2out_frame_debug (insn, true);
-
 	if (!targetm.asm_out.unwind_emit_before_insn
 	    && targetm.asm_out.unwind_emit)
 	  targetm.asm_out.unwind_emit (asm_out_file, insn);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-31 22:36                 ` Bernd Schmidt
@ 2011-03-31 23:57                   ` Richard Henderson
  0 siblings, 0 replies; 73+ messages in thread
From: Richard Henderson @ 2011-03-31 23:57 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 03/31/2011 03:07 PM, Bernd Schmidt wrote:
> No, it's used - but it looks like I forgot to quilt refresh and the
> final.c changes weren't included. New patch below. After this patch, the
> whole function is processed before final, and rather than emitting cfi
> directives immediately, we create these notes which cause the directives
> to be emitted during final.

Ah, much better.  I had wondered what I was missing.

> This probably shouldn't be committed separately when these changes go
> in, as (I think) it breaks -freorder-blocks-and-partition as well as the
> code in i386.c; it's split out simply to show an intermediate stage.

Sure.

> Yes, this falls under "inefficient CFI insns". I wanted to post a
> preliminary proof-of-concept patch set now which generates
> correct(-looking) output, but not necessarily optimized output. Not
> quite sure yet how to tackle this but I'll think of something.

Ok, I'll go ahead and apply all the patches locally and see what the
output actually looks like.  Perhaps I'll have more suggestions.


r~

* Re: [PATCH 1/6] Disallow predicating the prologue
  2011-03-23 14:46 ` [PATCH 1/6] Disallow predicating the prologue Bernd Schmidt
  2011-03-31 13:20   ` Jeff Law
@ 2011-04-01 18:59   ` H.J. Lu
  2011-04-01 21:08     ` Bernd Schmidt
  1 sibling, 1 reply; 73+ messages in thread
From: H.J. Lu @ 2011-04-01 18:59 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Wed, Mar 23, 2011 at 7:45 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> With prologues appearing in blocks other than the entry block, ifcvt can
> decide to predicate them. This is not a good idea, as dwarf2out will
> blow up trying to handle predicated frame-related things.
>

One of your changes breaks GCC bootstrap:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48403

-- 
H.J.

* Re: [PATCH 1/6] Disallow predicating the prologue
  2011-04-01 18:59   ` H.J. Lu
@ 2011-04-01 21:08     ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-01 21:08 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On 04/01/2011 08:59 PM, H.J. Lu wrote:
> On Wed, Mar 23, 2011 at 7:45 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> With prologues appearing in blocks other than the entry block, ifcvt can
>> decide to predicate them. This is not a good idea, as dwarf2out will
>> blow up trying to handle predicated frame-related things.
>>
> 
> One of your changes breaks GCC bootstrap:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48403

I bootstrapped and tested on i686-linux, so it would be useful if you
could narrow it down.


Bernd

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-03-31 21:51               ` Richard Henderson
  2011-03-31 22:36                 ` Bernd Schmidt
@ 2011-04-05 21:59                 ` Bernd Schmidt
  2011-04-11 17:10                   ` Richard Henderson
  1 sibling, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-05 21:59 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 653 bytes --]

On 03/31/2011 11:28 PM, Richard Henderson wrote:

>> 003 - Store dw_cfi_refs in VECs rather than linked lists. Looks
>>       larger than it is due to reindentation
> 
> Like 001, this one looks like it's totally independent of any of
> the other changes, and a good cleanup.  Please go ahead and test
> and commit this one independently.

Here's a new version - the code had changed underneath me in the
meantime, and I had some off-by-one errors involving the switch_index.
Bootstrapped and tested on i686-linux; I've also built my set of
testcases with -freorder-blocks-and-partition and verified that code
generation did not change from before to after. Ok?


Bernd

[-- Attachment #2: dw03b.diff --]
[-- Type: text/plain, Size: 20547 bytes --]

	* dwarf2out.c (struct dw_cfi_struct): Remove member dw_cfi_next.
	(dw_cfi_ref): Add DEF_VEC_P and some DEF_VEC_ALLOC_Ps.
	(cfi_vec): New typedef.
	(struct dw_fde_struct): Make dw_fde_cfi a cfi_vec. Replace
	dw_fde_switch_cfi with an integer dw_fde_switch_cfi_index.
	(cie_cfi_vec): New static variable.
	(cie_cfi_head): Delete.
	(add_cfi): Accept a cfi_vec * as first argument. All callers and
	declaration changed. Use vector rather than list operations.
	(new_cfi): Don't initialize the dw_cfi_next field.
	(add_fde_cfi): Allocate cie_cfi_vec if necessary. Use vector
	rather than list operations.
	(lookup_cfa): Use vector rather than list operations.
	(output_cfis): New argument upto. Accept a cfi_vec rather than
	a dw_cfi_ref list head as argument. All callers changed.
	Iterate over the vector using upto as a maximum index.
	(output_all_cfis): New static function.
	(output_fde): Use vector rather than list operations. Use the
	new upto argument for output_cfis rather than manipulating a
	list.
	(dwarf2out_begin_prologue): Change initializations to match
	new struct members.
	(dwarf2out_switch_text_section): Initialize dw_fde_switch_cfi_index
	from the vector length rather than searching for the end of a list.
	Use output_all_cfis.
	(convert_cfa_to_fb_loc_list): Use vector rather than list operations.

Index: dwarf2out.c
===================================================================
--- dwarf2out.c	(revision 171839)
+++ dwarf2out.c	(working copy)
@@ -267,7 +267,6 @@ typedef union GTY(()) dw_cfi_oprnd_struc
 dw_cfi_oprnd;
 
 typedef struct GTY(()) dw_cfi_struct {
-  dw_cfi_ref dw_cfi_next;
   enum dwarf_call_frame_info dw_cfi_opc;
   dw_cfi_oprnd GTY ((desc ("dw_cfi_oprnd1_desc (%1.dw_cfi_opc)")))
     dw_cfi_oprnd1;
@@ -276,6 +275,12 @@ typedef struct GTY(()) dw_cfi_struct {
 }
 dw_cfi_node;
 
+DEF_VEC_P (dw_cfi_ref);
+DEF_VEC_ALLOC_P (dw_cfi_ref, heap);
+DEF_VEC_ALLOC_P (dw_cfi_ref, gc);
+
+typedef VEC(dw_cfi_ref, gc) *cfi_vec;
+
 /* This is how we define the location of the CFA. We use to handle it
    as REG + OFFSET all the time,  but now it can be more complex.
    It can now be either REG + CFA_OFFSET or *(REG + BASE_OFFSET) + CFA_OFFSET.
@@ -304,8 +309,8 @@ typedef struct GTY(()) dw_fde_struct {
   const char *dw_fde_vms_begin_epilogue;
   const char *dw_fde_second_begin;
   const char *dw_fde_second_end;
-  dw_cfi_ref dw_fde_cfi;
-  dw_cfi_ref dw_fde_switch_cfi; /* Last CFI before switching sections.  */
+  cfi_vec dw_fde_cfi;
+  int dw_fde_switch_cfi_index; /* Last CFI before switching sections.  */
   HOST_WIDE_INT stack_realignment;
   unsigned funcdef_number;
   /* Dynamic realign argument pointer register.  */
@@ -410,8 +415,8 @@ current_fde (void)
   return fde_table_in_use ? &fde_table[fde_table_in_use - 1] : NULL;
 }
 
-/* A list of call frame insns for the CIE.  */
-static GTY(()) dw_cfi_ref cie_cfi_head;
+/* A vector of call frame insns for the CIE.  */
+static GTY(()) cfi_vec cie_cfi_vec;
 
 /* Some DWARF extensions (e.g., MIPS/SGI) implement a subprogram
    attribute that accelerates the lookup of the FDE associated
@@ -451,7 +456,7 @@ static GTY(()) section *cold_text_sectio
 static char *stripattributes (const char *);
 static const char *dwarf_cfi_name (unsigned);
 static dw_cfi_ref new_cfi (void);
-static void add_cfi (dw_cfi_ref *, dw_cfi_ref);
+static void add_cfi (cfi_vec *, dw_cfi_ref);
 static void add_fde_cfi (const char *, dw_cfi_ref);
 static void lookup_cfa_1 (dw_cfi_ref, dw_cfa_location *, dw_cfa_location *);
 static void lookup_cfa (dw_cfa_location *);
@@ -807,7 +812,6 @@ new_cfi (void)
 {
   dw_cfi_ref cfi = ggc_alloc_dw_cfi_node ();
 
-  cfi->dw_cfi_next = NULL;
   cfi->dw_cfi_oprnd1.dw_cfi_reg_num = 0;
   cfi->dw_cfi_oprnd2.dw_cfi_reg_num = 0;
 
@@ -817,9 +821,8 @@ new_cfi (void)
 /* Add a Call Frame Instruction to list of instructions.  */
 
 static inline void
-add_cfi (dw_cfi_ref *list_head, dw_cfi_ref cfi)
+add_cfi (cfi_vec *vec, dw_cfi_ref cfi)
 {
-  dw_cfi_ref *p;
   dw_fde_ref fde = current_fde ();
 
   /* When DRAP is used, CFA is defined with an expression.  Redefine
@@ -841,11 +844,7 @@ add_cfi (dw_cfi_ref *list_head, dw_cfi_r
           break;
       }
 
-  /* Find the end of the chain.  */
-  for (p = list_head; (*p) != NULL; p = &(*p)->dw_cfi_next)
-    ;
-
-  *p = cfi;
+  VEC_safe_push (dw_cfi_ref, gc, *vec, cfi);
 }
 
 /* Generate a new label for the CFI info to refer to.  FORCE is true
@@ -885,7 +884,12 @@ static bool any_cfis_emitted;
 static void
 add_fde_cfi (const char *label, dw_cfi_ref cfi)
 {
-  dw_cfi_ref *list_head;
+  cfi_vec *vec;
+
+  if (cie_cfi_vec == NULL)
+    cie_cfi_vec = VEC_alloc (dw_cfi_ref, gc, 20);
+
+  vec = &cie_cfi_vec;
 
   if (emit_cfa_remember)
     {
@@ -898,8 +902,6 @@ add_fde_cfi (const char *label, dw_cfi_r
       add_fde_cfi (label, cfi_remember);
     }
 
-  list_head = &cie_cfi_head;
-
   if (dwarf2out_do_cfi_asm ())
     {
       if (label)
@@ -957,7 +959,7 @@ add_fde_cfi (const char *label, dw_cfi_r
 
 	  output_cfi_directive (cfi);
 
-	  list_head = &fde->dw_fde_cfi;
+	  vec = &fde->dw_fde_cfi;
 	  any_cfis_emitted = true;
 	}
       /* ??? If this is a CFI for the CIE, we don't emit.  This
@@ -995,11 +997,11 @@ add_fde_cfi (const char *label, dw_cfi_r
 	  fde->dw_fde_current_label = label;
 	}
 
-      list_head = &fde->dw_fde_cfi;
+      vec = &fde->dw_fde_cfi;
       any_cfis_emitted = true;
     }
 
-  add_cfi (list_head, cfi);
+  add_cfi (vec, cfi);
 }
 
 /* Subroutine of lookup_cfa.  */
@@ -1046,6 +1048,7 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_loc
 static void
 lookup_cfa (dw_cfa_location *loc)
 {
+  int ix;
   dw_cfi_ref cfi;
   dw_fde_ref fde;
   dw_cfa_location remember;
@@ -1054,12 +1057,12 @@ lookup_cfa (dw_cfa_location *loc)
   loc->reg = INVALID_REGNUM;
   remember = *loc;
 
-  for (cfi = cie_cfi_head; cfi; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, cie_cfi_vec, ix, cfi)
     lookup_cfa_1 (cfi, loc, &remember);
 
   fde = current_fde ();
   if (fde)
-    for (cfi = fde->dw_fde_cfi; cfi; cfi = cfi->dw_cfi_next)
+    FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
       lookup_cfa_1 (cfi, loc, &remember);
 }
 
@@ -3430,169 +3433,183 @@ output_cfi_directive (dw_cfi_ref cfi)
     }
 }
 
-DEF_VEC_P (dw_cfi_ref);
-DEF_VEC_ALLOC_P (dw_cfi_ref, heap);
-
-/* Output CFIs to bring current FDE to the same state as after executing
-   CFIs in CFI chain.  DO_CFI_ASM is true if .cfi_* directives shall
-   be emitted, false otherwise.  If it is false, FDE and FOR_EH are the
-   other arguments to pass to output_cfi.  */
+/* Output CFIs from VEC, up to index UPTO, to bring current FDE to the
+   same state as after executing CFIs in CFI chain.  DO_CFI_ASM is
+   true if .cfi_* directives shall be emitted, false otherwise.  If it
+   is false, FDE and FOR_EH are the other arguments to pass to
+   output_cfi.  */
 
 static void
-output_cfis (dw_cfi_ref cfi, bool do_cfi_asm, dw_fde_ref fde, bool for_eh)
+output_cfis (cfi_vec vec, int upto, bool do_cfi_asm,
+	     dw_fde_ref fde, bool for_eh)
 {
+  int ix;
   struct dw_cfi_struct cfi_buf;
   dw_cfi_ref cfi2;
   dw_cfi_ref cfi_args_size = NULL, cfi_cfa = NULL, cfi_cfa_offset = NULL;
-  VEC (dw_cfi_ref, heap) *regs = VEC_alloc (dw_cfi_ref, heap, 32);
+  VEC(dw_cfi_ref, heap) *regs = VEC_alloc (dw_cfi_ref, heap, 32);
   unsigned int len, idx;
 
-  for (;; cfi = cfi->dw_cfi_next)
-    switch (cfi ? cfi->dw_cfi_opc : DW_CFA_nop)
-      {
-      case DW_CFA_advance_loc:
-      case DW_CFA_advance_loc1:
-      case DW_CFA_advance_loc2:
-      case DW_CFA_advance_loc4:
-      case DW_CFA_MIPS_advance_loc8:
-      case DW_CFA_set_loc:
-	/* All advances should be ignored.  */
-	break;
-      case DW_CFA_remember_state:
+  for (ix = 0; ix < upto + 1; ix++)
+    {
+      dw_cfi_ref cfi = ix < upto ? VEC_index (dw_cfi_ref, vec, ix) : NULL;
+      switch (cfi ? cfi->dw_cfi_opc : DW_CFA_nop)
 	{
-	  dw_cfi_ref args_size = cfi_args_size;
-
-	  /* Skip everything between .cfi_remember_state and
-	     .cfi_restore_state.  */
-	  for (cfi2 = cfi->dw_cfi_next; cfi2; cfi2 = cfi2->dw_cfi_next)
-	    if (cfi2->dw_cfi_opc == DW_CFA_restore_state)
-	      break;
-	    else if (cfi2->dw_cfi_opc == DW_CFA_GNU_args_size)
-	      args_size = cfi2;
-	    else
-	      gcc_assert (cfi2->dw_cfi_opc != DW_CFA_remember_state);
-
-	  if (cfi2 == NULL)
-	    goto flush_all;
-	  else
-	    {
-	      cfi = cfi2;
-	      cfi_args_size = args_size;
-	    }
+	case DW_CFA_advance_loc:
+	case DW_CFA_advance_loc1:
+	case DW_CFA_advance_loc2:
+	case DW_CFA_advance_loc4:
+	case DW_CFA_MIPS_advance_loc8:
+	case DW_CFA_set_loc:
+	  /* All advances should be ignored.  */
 	  break;
-	}
-      case DW_CFA_GNU_args_size:
-	cfi_args_size = cfi;
-	break;
-      case DW_CFA_GNU_window_save:
-	goto flush_all;
-      case DW_CFA_offset:
-      case DW_CFA_offset_extended:
-      case DW_CFA_offset_extended_sf:
-      case DW_CFA_restore:
-      case DW_CFA_restore_extended:
-      case DW_CFA_undefined:
-      case DW_CFA_same_value:
-      case DW_CFA_register:
-      case DW_CFA_val_offset:
-      case DW_CFA_val_offset_sf:
-      case DW_CFA_expression:
-      case DW_CFA_val_expression:
-      case DW_CFA_GNU_negative_offset_extended:
-	if (VEC_length (dw_cfi_ref, regs) <= cfi->dw_cfi_oprnd1.dw_cfi_reg_num)
-	  VEC_safe_grow_cleared (dw_cfi_ref, heap, regs,
-				 cfi->dw_cfi_oprnd1.dw_cfi_reg_num + 1);
-	VEC_replace (dw_cfi_ref, regs, cfi->dw_cfi_oprnd1.dw_cfi_reg_num, cfi);
-	break;
-      case DW_CFA_def_cfa:
-      case DW_CFA_def_cfa_sf:
-      case DW_CFA_def_cfa_expression:
-	cfi_cfa = cfi;
-	cfi_cfa_offset = cfi;
-	break;
-      case DW_CFA_def_cfa_register:
-	cfi_cfa = cfi;
-	break;
-      case DW_CFA_def_cfa_offset:
-      case DW_CFA_def_cfa_offset_sf:
-	cfi_cfa_offset = cfi;
-	break;
-      case DW_CFA_nop:
-	gcc_assert (cfi == NULL);
-      flush_all:
-	len = VEC_length (dw_cfi_ref, regs);
-	for (idx = 0; idx < len; idx++)
+	case DW_CFA_remember_state:
 	  {
-	    cfi2 = VEC_replace (dw_cfi_ref, regs, idx, NULL);
-	    if (cfi2 != NULL
-		&& cfi2->dw_cfi_opc != DW_CFA_restore
-		&& cfi2->dw_cfi_opc != DW_CFA_restore_extended)
+	    dw_cfi_ref args_size = cfi_args_size;
+
+	    /* Skip everything between .cfi_remember_state and
+	       .cfi_restore_state.  */
+	    ix++;
+	    if (ix == upto)
+	      goto flush_all;
+
+	    for (; ix < upto; ix++)
 	      {
-		if (do_cfi_asm)
-		  output_cfi_directive (cfi2);
+		cfi2 = VEC_index (dw_cfi_ref, vec, ix);
+		if (cfi2->dw_cfi_opc == DW_CFA_restore_state)
+		  break;
+		else if (cfi2->dw_cfi_opc == DW_CFA_GNU_args_size)
+		  args_size = cfi2;
 		else
-		  output_cfi (cfi2, fde, for_eh);
-	      }
-	  }
-	if (cfi_cfa && cfi_cfa_offset && cfi_cfa_offset != cfi_cfa)
-	  {
-	    gcc_assert (cfi_cfa->dw_cfi_opc != DW_CFA_def_cfa_expression);
-	    cfi_buf = *cfi_cfa;
-	    switch (cfi_cfa_offset->dw_cfi_opc)
-	      {
-	      case DW_CFA_def_cfa_offset:
-		cfi_buf.dw_cfi_opc = DW_CFA_def_cfa;
-		cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
-		break;
-	      case DW_CFA_def_cfa_offset_sf:
-		cfi_buf.dw_cfi_opc = DW_CFA_def_cfa_sf;
-		cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
-		break;
-	      case DW_CFA_def_cfa:
-	      case DW_CFA_def_cfa_sf:
-		cfi_buf.dw_cfi_opc = cfi_cfa_offset->dw_cfi_opc;
-		cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd2;
-		break;
-	      default:
-		gcc_unreachable ();
+		  gcc_assert (cfi2->dw_cfi_opc != DW_CFA_remember_state);
 	      }
-	    cfi_cfa = &cfi_buf;
-	  }
-	else if (cfi_cfa_offset)
-	  cfi_cfa = cfi_cfa_offset;
-	if (cfi_cfa)
-	  {
-	    if (do_cfi_asm)
-	      output_cfi_directive (cfi_cfa);
-	    else
-	      output_cfi (cfi_cfa, fde, for_eh);
-	  }
-	cfi_cfa = NULL;
-	cfi_cfa_offset = NULL;
-	if (cfi_args_size
-	    && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
-	  {
-	    if (do_cfi_asm)
-	      output_cfi_directive (cfi_args_size);
-	    else
-	      output_cfi (cfi_args_size, fde, for_eh);
-	  }
-	cfi_args_size = NULL;
-	if (cfi == NULL)
-	  {
-	    VEC_free (dw_cfi_ref, heap, regs);
-	    return;
+
+	    cfi_args_size = args_size;
+	    break;
 	  }
-	else if (do_cfi_asm)
-	  output_cfi_directive (cfi);
-	else
-	  output_cfi (cfi, fde, for_eh);
-	break;
-      default:
-	gcc_unreachable ();
+	case DW_CFA_GNU_args_size:
+	  cfi_args_size = cfi;
+	  break;
+	case DW_CFA_GNU_window_save:
+	  goto flush_all;
+	case DW_CFA_offset:
+	case DW_CFA_offset_extended:
+	case DW_CFA_offset_extended_sf:
+	case DW_CFA_restore:
+	case DW_CFA_restore_extended:
+	case DW_CFA_undefined:
+	case DW_CFA_same_value:
+	case DW_CFA_register:
+	case DW_CFA_val_offset:
+	case DW_CFA_val_offset_sf:
+	case DW_CFA_expression:
+	case DW_CFA_val_expression:
+	case DW_CFA_GNU_negative_offset_extended:
+	  if (VEC_length (dw_cfi_ref, regs)
+	      <= cfi->dw_cfi_oprnd1.dw_cfi_reg_num)
+	    VEC_safe_grow_cleared (dw_cfi_ref, heap, regs,
+				   cfi->dw_cfi_oprnd1.dw_cfi_reg_num + 1);
+	  VEC_replace (dw_cfi_ref, regs, cfi->dw_cfi_oprnd1.dw_cfi_reg_num,
+		       cfi);
+	  break;
+	case DW_CFA_def_cfa:
+	case DW_CFA_def_cfa_sf:
+	case DW_CFA_def_cfa_expression:
+	  cfi_cfa = cfi;
+	  cfi_cfa_offset = cfi;
+	  break;
+	case DW_CFA_def_cfa_register:
+	  cfi_cfa = cfi;
+	  break;
+	case DW_CFA_def_cfa_offset:
+	case DW_CFA_def_cfa_offset_sf:
+	  cfi_cfa_offset = cfi;
+	  break;
+	case DW_CFA_nop:
+	  gcc_assert (cfi == NULL);
+	flush_all:
+	  len = VEC_length (dw_cfi_ref, regs);
+	  for (idx = 0; idx < len; idx++)
+	    {
+	      cfi2 = VEC_replace (dw_cfi_ref, regs, idx, NULL);
+	      if (cfi2 != NULL
+		  && cfi2->dw_cfi_opc != DW_CFA_restore
+		  && cfi2->dw_cfi_opc != DW_CFA_restore_extended)
+		{
+		  if (do_cfi_asm)
+		    output_cfi_directive (cfi2);
+		  else
+		    output_cfi (cfi2, fde, for_eh);
+		}
+	    }
+	  if (cfi_cfa && cfi_cfa_offset && cfi_cfa_offset != cfi_cfa)
+	    {
+	      gcc_assert (cfi_cfa->dw_cfi_opc != DW_CFA_def_cfa_expression);
+	      cfi_buf = *cfi_cfa;
+	      switch (cfi_cfa_offset->dw_cfi_opc)
+		{
+		case DW_CFA_def_cfa_offset:
+		  cfi_buf.dw_cfi_opc = DW_CFA_def_cfa;
+		  cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
+		  break;
+		case DW_CFA_def_cfa_offset_sf:
+		  cfi_buf.dw_cfi_opc = DW_CFA_def_cfa_sf;
+		  cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd1;
+		  break;
+		case DW_CFA_def_cfa:
+		case DW_CFA_def_cfa_sf:
+		  cfi_buf.dw_cfi_opc = cfi_cfa_offset->dw_cfi_opc;
+		  cfi_buf.dw_cfi_oprnd2 = cfi_cfa_offset->dw_cfi_oprnd2;
+		  break;
+		default:
+		  gcc_unreachable ();
+		}
+	      cfi_cfa = &cfi_buf;
+	    }
+	  else if (cfi_cfa_offset)
+	    cfi_cfa = cfi_cfa_offset;
+	  if (cfi_cfa)
+	    {
+	      if (do_cfi_asm)
+		output_cfi_directive (cfi_cfa);
+	      else
+		output_cfi (cfi_cfa, fde, for_eh);
+	    }
+	  cfi_cfa = NULL;
+	  cfi_cfa_offset = NULL;
+	  if (cfi_args_size
+	      && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
+	    {
+	      if (do_cfi_asm)
+		output_cfi_directive (cfi_args_size);
+	      else
+		output_cfi (cfi_args_size, fde, for_eh);
+	    }
+	  cfi_args_size = NULL;
+	  if (cfi == NULL)
+	    {
+	      VEC_free (dw_cfi_ref, heap, regs);
+	      return;
+	    }
+	  else if (do_cfi_asm)
+	    output_cfi_directive (cfi);
+	  else
+	    output_cfi (cfi, fde, for_eh);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
     }
 }
 
+/* Like output_cfis, but emit all CFIs in the vector.  */
+static void
+output_all_cfis (cfi_vec vec, bool do_cfi_asm,
+		 dw_fde_ref fde, bool for_eh)
+{
+  output_cfis (vec, VEC_length (dw_cfi_ref, vec), do_cfi_asm, fde, for_eh);
+}
+
 /* Output one FDE.  */
 
 static void
@@ -3600,6 +3617,7 @@ output_fde (dw_fde_ref fde, bool for_eh,
 	    char *section_start_label, int fde_encoding, char *augmentation,
 	    bool any_lsda_needed, int lsda_encoding)
 {
+  int ix;
   const char *begin, *end;
   static unsigned int j;
   char l1[20], l2[20];
@@ -3687,31 +3705,31 @@ output_fde (dw_fde_ref fde, bool for_eh,
      this FDE.  */
   fde->dw_fde_current_label = begin;
   if (fde->dw_fde_second_begin == NULL)
-    for (cfi = fde->dw_fde_cfi; cfi != NULL; cfi = cfi->dw_cfi_next)
+    FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
       output_cfi (cfi, fde, for_eh);
   else if (!second)
     {
-      if (fde->dw_fde_switch_cfi)
-	for (cfi = fde->dw_fde_cfi; cfi != NULL; cfi = cfi->dw_cfi_next)
+      if (fde->dw_fde_switch_cfi_index > 0)
+	FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
 	  {
-	    output_cfi (cfi, fde, for_eh);
-	    if (cfi == fde->dw_fde_switch_cfi)
+	    if (ix == fde->dw_fde_switch_cfi_index)
 	      break;
+	    output_cfi (cfi, fde, for_eh);
 	  }
     }
   else
     {
-      dw_cfi_ref cfi_next = fde->dw_fde_cfi;
+      int i, from = 0;
+      int until = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
 
-      if (fde->dw_fde_switch_cfi)
+      if (fde->dw_fde_switch_cfi_index > 0)
 	{
-	  cfi_next = fde->dw_fde_switch_cfi->dw_cfi_next;
-	  fde->dw_fde_switch_cfi->dw_cfi_next = NULL;
-	  output_cfis (fde->dw_fde_cfi, false, fde, for_eh);
-	  fde->dw_fde_switch_cfi->dw_cfi_next = cfi_next;
+	  from = fde->dw_fde_switch_cfi_index;
+	  output_cfis (fde->dw_fde_cfi, from, false, fde, for_eh);
 	}
-      for (cfi = cfi_next; cfi != NULL; cfi = cfi->dw_cfi_next)
-	output_cfi (cfi, fde, for_eh);
+      for (i = from; i < until; i++)
+	output_cfi (VEC_index (dw_cfi_ref, fde->dw_fde_cfi, i),
+		    fde, for_eh);
     }
 
   /* If we are to emit a ref/link from function bodies to their frame tables,
@@ -3947,7 +3965,7 @@ output_call_frame_info (int for_eh)
 			     eh_data_format_name (fde_encoding));
     }
 
-  for (cfi = cie_cfi_head; cfi != NULL; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, cie_cfi_vec, i, cfi)
     output_cfi (cfi, NULL, for_eh);
 
   /* Pad the CIE out to an address sized boundary.  */
@@ -4089,8 +4107,8 @@ dwarf2out_begin_prologue (unsigned int l
   fde->dw_fde_second_end = NULL;
   fde->dw_fde_vms_end_prologue = NULL;
   fde->dw_fde_vms_begin_epilogue = NULL;
-  fde->dw_fde_cfi = NULL;
-  fde->dw_fde_switch_cfi = NULL;
+  fde->dw_fde_cfi = VEC_alloc (dw_cfi_ref, gc, 20);
+  fde->dw_fde_switch_cfi_index = 0;
   fde->funcdef_number = current_function_funcdef_no;
   fde->all_throwers_are_sibcalls = crtl->all_throwers_are_sibcalls;
   fde->uses_eh_lsda = crtl->uses_eh_lsda;
@@ -4251,7 +4269,6 @@ dwarf2out_switch_text_section (void)
 {
   section *sect;
   dw_fde_ref fde = current_fde ();
-  dw_cfi_ref cfi;
 
   gcc_assert (cfun && fde && fde->dw_fde_second_begin == NULL);
 
@@ -4293,13 +4310,9 @@ dwarf2out_switch_text_section (void)
       dwarf2out_do_cfi_startproc (true);
       /* As this is a different FDE, insert all current CFI instructions
 	 again.  */
-      output_cfis (fde->dw_fde_cfi, true, fde, true);
+      output_all_cfis (fde->dw_fde_cfi, true, fde, true);
     }
-  cfi = fde->dw_fde_cfi;
-  if (cfi)
-    while (cfi->dw_cfi_next != NULL)
-      cfi = cfi->dw_cfi_next;
-  fde->dw_fde_switch_cfi = cfi;
+  fde->dw_fde_switch_cfi_index = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
   var_location_switch_text_section ();
 
   set_cur_line_info_table (sect);
@@ -17152,6 +17165,7 @@ tree_add_const_value_attribute_for_decl 
 static dw_loc_list_ref
 convert_cfa_to_fb_loc_list (HOST_WIDE_INT offset)
 {
+  int ix;
   dw_fde_ref fde;
   dw_loc_list_ref list, *list_tail;
   dw_cfi_ref cfi;
@@ -17174,13 +17188,13 @@ convert_cfa_to_fb_loc_list (HOST_WIDE_IN
 
   /* ??? Bald assumption that the CIE opcode list does not contain
      advance opcodes.  */
-  for (cfi = cie_cfi_head; cfi; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, cie_cfi_vec, ix, cfi)
     lookup_cfa_1 (cfi, &next_cfa, &remember);
 
   last_cfa = next_cfa;
   last_label = start_label;
 
-  if (fde->dw_fde_second_begin && fde->dw_fde_switch_cfi == NULL)
+  if (fde->dw_fde_second_begin && fde->dw_fde_switch_cfi_index == 0)
     {
       /* If the first partition contained no CFI adjustments, the
 	 CIE opcodes apply to the whole first partition.  */
@@ -17190,7 +17204,7 @@ convert_cfa_to_fb_loc_list (HOST_WIDE_IN
       start_label = last_label = fde->dw_fde_second_begin;
     }
 
-  for (cfi = fde->dw_fde_cfi; cfi; cfi = cfi->dw_cfi_next)
+  FOR_EACH_VEC_ELT (dw_cfi_ref, fde->dw_fde_cfi, ix, cfi)
     {
       switch (cfi->dw_cfi_opc)
 	{
@@ -17218,7 +17232,7 @@ convert_cfa_to_fb_loc_list (HOST_WIDE_IN
 	  lookup_cfa_1 (cfi, &next_cfa, &remember);
 	  break;
 	}
-      if (cfi == fde->dw_fde_switch_cfi)
+      if (ix + 1 == fde->dw_fde_switch_cfi_index)
 	{
 	  if (!cfa_equal_p (&last_cfa, &next_cfa))
 	    {

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-05 21:59                 ` Bernd Schmidt
@ 2011-04-11 17:10                   ` Richard Henderson
  2011-04-13 14:16                     ` Bernd Schmidt
  2011-04-13 15:28                     ` Bernd Schmidt
  0 siblings, 2 replies; 73+ messages in thread
From: Richard Henderson @ 2011-04-11 17:10 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 04/05/2011 02:58 PM, Bernd Schmidt wrote:
> 	* dwarf2out.c (struct dw_cfi_struct): Remove member dw_cfi_next.
> 	(dw_cfi_ref): Add DEF_VEC_P and some DEF_VEC_ALLOC_Ps.
> 	(cfi_vec): New typedef.
> 	(struct dw_fde_struct): Make dw_fde_cfi a cfi_vec. Replace
> 	dw_fde_switch_cfi with an integer dw_fde_switch_cfi_index.
> 	(cie_cfi_vec): New static variable.
> 	(cie_cfi_head): Delete.
> 	(add_cfi): Accept a cfi_vec * as first argument. All callers and
> 	declaration changed. Use vector rather than list operations.
> 	(new_cfi): Don't initialize the dw_cfi_next field.
> 	(add_fde_cfi): Allocate cie_cfi_vec if necessary. Use vector
> 	rather than list operations.
> 	(lookup_cfa): Use vector rather than list operations.
> 	(output_cfis): New argument upto. Accept a cfi_vec rather than
> 	a dw_cfi_ref list head as argument. All callers changed.
> 	Iterate over the vector using upto as a maximum index.
> 	(output_all_cfis): New static function.
> 	(output_fde): Use vector rather than list operations. Use the
> 	new upto argument for output_cfis rather than manipulating a
> 	list.
> 	(dwarf2out_begin_prologue): Change initializations to match
> 	new struct members.
> 	(dwarf2out_switch_text_section): Initialize dw_fde_switch_cfi_index
> 	from the vector length rather than searching for the end of a list.
> 	Use output_all_cfis.
> 	(convert_cfa_to_fb_loc_list): Use vector rather than list operations.

Ok.


r~

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-11 17:10                   ` Richard Henderson
@ 2011-04-13 14:16                     ` Bernd Schmidt
  2011-04-13 15:14                       ` Bernd Schmidt
                                         ` (2 more replies)
  2011-04-13 15:28                     ` Bernd Schmidt
  1 sibling, 3 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-13 14:16 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

On 04/11/2011 07:10 PM, Richard Henderson wrote:
> Ok.

Did you receive my reply to this message from earlier today? It doesn't
seem to have made it to gcc-patches yet.


Bernd

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-13 15:28                     ` Bernd Schmidt
@ 2011-04-13 14:44                       ` Richard Henderson
  2011-04-13 14:54                         ` Jakub Jelinek
  2011-04-15 16:29                       ` Bernd Schmidt
  1 sibling, 1 reply; 73+ messages in thread
From: Richard Henderson @ 2011-04-13 14:44 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On 04/13/2011 05:38 AM, Bernd Schmidt wrote:
> This bootstraps and tests ok on i686-linux. However, there is work left
> to be done. Can I take you up on your offer to work with me on this?
> This still requires the i386 output_set_got which I think I can cope
> with, but the ia64 backend does a number of things with unwinding that I
> don't understand. Also, I'll be away the next two weeks - if you arrive
> at a complete version during that time it would be great if you could
> commit it.

Ok, I'll put this on my to-do list.

> One thing to note is that it seems surprisingly hard to make
> -freorder-blocks-and-partition do anything interesting. There's one C++
> testcase (partition2.C I think) which I used to debug this code, but
> other than that I haven't really found anything that actually generates
> two nonempty partitions.

Yeah, while I was working on dwarf line numbers recently, I found that
just about the only thing that would produce anything interesting was
a profiled bootstrap.


r~

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-13 14:44                       ` Richard Henderson
@ 2011-04-13 14:54                         ` Jakub Jelinek
  0 siblings, 0 replies; 73+ messages in thread
From: Jakub Jelinek @ 2011-04-13 14:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Bernd Schmidt, GCC Patches

On Wed, Apr 13, 2011 at 07:44:26AM -0700, Richard Henderson wrote:
> Yeah, while I was working on dwarf line numbers recently, I found that
> just about the only thing that would produce anything interesting was
> a profiled bootstrap.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48253#c1

is what I've been using when I touched dwarf2out recently.

	Jakub

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-13 14:16                     ` Bernd Schmidt
@ 2011-04-13 15:14                       ` Bernd Schmidt
  2011-04-13 15:16                       ` Bernd Schmidt
  2011-04-13 15:17                       ` Bernd Schmidt
  2 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-13 15:14 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4775 bytes --]

On 04/13/2011 04:14 PM, Bernd Schmidt wrote:
> On 04/11/2011 07:10 PM, Richard Henderson wrote:
>> Ok.
> 
> Did you receive my reply to this message from earlier today? It doesn't
> seem to have made it to gcc-patches yet.

Since gcc-patches appears to have dropped the message, I'll resend it in
three parts.

There are three patches here, but they must be applied together (things
will mostly work otherwise, but I expect -freorder-blocks-and-partition
is broken in the intermediate stages). Below is the ChangeLog for the
entire set, and the first of the patches. This is just a new version of
the previously posted 002-scanfirst patch, now changed to delete the CFI
notes afterwards in order to avoid -fcompare-debug failures.


Bernd

	* target.def (dwarf_handle_frame_unspec): Remove label argument.
	* doc/tm.texi: Regenerate.
	* tree.h (dwarf2out_cfi_label, dwarf2out_def_cfa,
	dwarf2out_window_save, dwarf2out_reg_save, dwarf2out_return_save,
	dwarf2out_return_reg, dwarf2out_reg_save_reg): Don't declare.
	* final.c (final_start_function): Call
	dwarf2out_frame_debug_after_prologue.
	(final_scan_insn): Don't call dwarf2out_frame_debug for anything.
	Handle NOTE_INSN_CFI and NOTE_INSN_CFI_LABEL.
	(final): Delete these notes.
	* insn-notes.def (CFI, CFI_LABEL): New.
	* jump.c (addr_vec_p): New function.
	* dwarf2out.c (cfi_insn): New static variable.
	(dwarf2out_cfi_label): Remove force argument. All callers changed.
	Only generate the label, don't emit it.
	(dwarf2out_maybe_emit_cfi_label): New function.
	(add_fde_cfi): Remove label argument.  All callers changed.  Remove
	most code; leave a condition to either emit a CFI insn, or add the
	CFI to the FDE CFI vector.
	(add_cie_cfi): New static function.
	(add_cfi): Remove function.
	(old_cfa): New static variable.
	(cfa_remember): Remove static variable.
	(dwarf2out_def_cfa): Replace label argument with a bool for_cie
	argument.  All callers changed.  Don't use lookup_cfa; use and
	update the global old_cfa variable.  Call add_fde_cfi or add_cie_cfi
	at the end.
	(reg_save): Replace label argument with a bool.  All callers changed.
	Call add_fde_cfi or add_cie_cfi at the end.
	(dwarf2out_reg_save, dwarf2out_return_save, dwarf2out_return_reg,
	dwarf2out_args_size, dwarf2out_stack_adjust, dwarf2out_reg_save_reg,
	dwarf2out_frame_debug_def_cfa, dwarf2out_frame_debug_cfa_offset,
	dwarf2out_frame_debug_cfa_register, dwarf2out_frame_debug_cfa_restore,
	dwarf2out_frame_debug_cfa_expression, dwarf2out_frame_debug_expr):
	Remove label argument.  All callers changed.
	(barrier_args_size): Remove variable.
	(compute_barrier_args_size_1, compute_barrier_args_size): Remove
	functions.
	(dwarf2out_notice_stack_adjust): Don't handle barriers.
	(last_reg_save_label): Remove variable.  All sets and uses removed.
	(cfi_label_required_p, add_cfis_to_fde): New static functions.
	(dwarf2out_frame_debug_restore_state): Simply add the new CFI.
	(dwarf2out_frame_debug): Set cfi_insn, and clear it.  Don't call
	dwarf2out_flush_queued_reg_saves at the top.
	(dwarf2out_frame_debug_init): Initialize old_cfa.
	(copy_cfi_vec_parts): New static function.
	(jump_target_info): New struct type.
	(dwarf2out_cfi_begin_epilogue): Remove.
	(save_point_p, record_current_state, maybe_record_jump_target,
	vec_is_prefix_of, append_extra_cfis, debug_cfi_vec, switch_note_p,
	scan_until_barrier, find_best_starting_point): New static functions.
	(dwarf2out_frame_debug_after_prologue): New function.
	(dwarf2out_emit_cfi): New function.
	(output_cfi_directive): New FILE argument.  All callers changed.
	Avoid some paths if it is not asm_out_file; otherwise print to it.
	(output_all_cfis): Remove function.
	(output_cfis): Remove do_cfi_asm arg.  All callers changed.  Never
	call output_cfi_directive.
	(dwarf2out_frame_init): Initialize old_cfa.
	(dwarf2out_switch_text_section): Don't initialize dw_fde_current_label.
	Don't call output_all_cfis.
	* dwarf2out.h (dwarf2out_cfi_label, dwarf2out_def_cfa,
	dwarf2out_window_save, dwarf2out_reg_save, dwarf2out_return_save,
	dwarf2out_return_reg, dwarf2out_reg_save_reg, dwarf2out_emit_cfi,
	dwarf2out_frame_debug_after_prologue): Declare.
	(dwarf2out_cfi_begin_epilogue, dwarf2out_frame_debug_restore_state):
	Don't declare.
	(struct dw_cfi_struct): Add forward declaration.
	* rtl.h (union rtunion_def): Add rt_cfi member.
	(XCFI, XCCFI, NOTE_CFI, NOTE_LABEL_NUMBER): New macros.
	(addr_vec_p): Declare.
	* config/sparc/sparc.c (sparc_dwarf_handle_frame_unspec): Remove
	label argument.
	* config/ia64/ia64.c (ia64_dwarf_handle_frame_unspec): Likewise.
	* config/arm/arm.c (thumb_pushpop): Use dwarf2out_maybe_emit_cfi_label
	rather than dwarf2out_cfi_label.
	(thumb1_output_function_prologue): Likewise.
	(arm_dwarf_handle_frame_unspec): Remove label argument.

[-- Attachment #2: 005-scanfirst.diff --]
[-- Type: text/plain, Size: 11200 bytes --]

    	* cfgcleanup.c (flow_find_head_matching_sequence): Ignore
    	epilogue notes.
    	* df-problems.c (can_move_insns_across): Don't stop at epilogue
    	notes.
    	* dwarf2out.c (dwarf2out_cfi_begin_epilogue): Also allow a
    	simplejump to end the block.

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -470,6 +470,8 @@ static void output_call_frame_info (int)
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_cfi_begin_epilogue (rtx);
+static void dwarf2out_frame_debug_restore_state (void);
 
 /* Support for complex CFA locations.  */
 static void output_cfa_loc (dw_cfi_ref, int);
@@ -847,6 +849,15 @@ add_cfi (cfi_vec *vec, dw_cfi_ref cfi)
   VEC_safe_push (dw_cfi_ref, gc, *vec, cfi);
 }
 
+/* The insn after which a new CFI note should be emitted.  */
+static rtx cfi_insn;
+
+/* True if remember_state should be emitted before following CFI directive.  */
+static bool emit_cfa_remember;
+
+/* True if any CFI directives were emitted at the current insn.  */
+static bool any_cfis_emitted;
+
 /* Generate a new label for the CFI info to refer to.  FORCE is true
    if a label needs to be output even when using .cfi_* directives.  */
 
@@ -866,18 +877,13 @@ dwarf2out_cfi_label (bool force)
     {
       int num = dwarf2out_cfi_label_num++;
       ASM_GENERATE_INTERNAL_LABEL (label, "LCFI", num);
-      ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LCFI", num);
+      cfi_insn = emit_note_after (NOTE_INSN_CFI_LABEL, cfi_insn);
+      NOTE_LABEL_NUMBER (cfi_insn) = num;
     }
 
   return label;
 }
 
-/* True if remember_state should be emitted before following CFI directive.  */
-static bool emit_cfa_remember;
-
-/* True if any CFI directives were emitted at the current insn.  */
-static bool any_cfis_emitted;
-
 /* Add CFI to the current fde at the PC value indicated by LABEL if specified,
    or to the CIE if LABEL is NULL.  */
 
@@ -957,7 +963,8 @@ add_fde_cfi (const char *label, dw_cfi_r
 	        }
 	    }
 
-	  output_cfi_directive (cfi);
+	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
+	  NOTE_CFI (cfi_insn) = cfi;
 
 	  vec = &fde->dw_fde_cfi;
 	  any_cfis_emitted = true;
@@ -2791,6 +2798,11 @@ dwarf2out_frame_debug (rtx insn, bool af
   rtx note, n;
   bool handled_one = false;
 
+  if (after_p)
+    cfi_insn = insn;
+  else
+    cfi_insn = PREV_INSN (insn);
+
   if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
 
@@ -2914,6 +2926,7 @@ void
 dwarf2out_frame_debug_init (void)
 {
   size_t i;
+  rtx insn;
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2940,12 +2953,64 @@ dwarf2out_frame_debug_init (void)
       XDELETEVEC (barrier_args_size);
       barrier_args_size = NULL;
     }
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      rtx pat;
+      if (BARRIER_P (insn))
+	{
+	  dwarf2out_frame_debug (insn, false);
+	  continue;
+	}
+      else if (NOTE_P (insn))
+	{
+	  switch (NOTE_KIND (insn))
+	    {
+	    case NOTE_INSN_EPILOGUE_BEG:
+#if defined (HAVE_epilogue)
+	      dwarf2out_cfi_begin_epilogue (insn);
+#endif
+	      break;
+	    case NOTE_INSN_CFA_RESTORE_STATE:
+	      cfi_insn = insn;
+	      dwarf2out_frame_debug_restore_state ();
+	      break;
+	    }
+	  continue;
+	}
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      pat = PATTERN (insn);
+      if (asm_noperands (pat) >= 0)
+	continue;
+      if (GET_CODE (pat) == SEQUENCE)
+	{
+	  int j;
+	  for (j = 1; j < XVECLEN (pat, 0); j++)
+	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
+	  insn = XVECEXP (pat, 0, 0);
+	}
+
+      if (CALL_P (insn) && dwarf2out_do_frame ())
+	dwarf2out_frame_debug (insn, false);
+      if (dwarf2out_do_frame ()
+#if !defined (HAVE_prologue)
+	  && !ACCUMULATE_OUTGOING_ARGS
+#endif
+	  )
+	dwarf2out_frame_debug (insn, true);
+    }
+}
+
+void
+dwarf2out_emit_cfi (dw_cfi_ref cfi)
+{
+  output_cfi_directive (cfi);
 }
 
-/* Determine if we need to save and restore CFI information around this
-   epilogue.  If SIBCALL is true, then this is a sibcall epilogue.  If
-   we do need to save/restore, then emit the save now, and insert a
-   NOTE_INSN_CFA_RESTORE_STATE at the appropriate place in the stream.  */
+/* Determine if we need to save and restore CFI information around
+   this epilogue.  If we do need to save/restore, then emit the save
+   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
+   place in the stream.  */
 
 void
 dwarf2out_cfi_begin_epilogue (rtx insn)
@@ -2960,8 +3025,10 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
       if (!INSN_P (i))
 	continue;
 
-      /* Look for both regular and sibcalls to end the block.  */
-      if (returnjump_p (i))
+      /* Look for both regular and sibcalls to end the block.  Various
+	 optimization passes may cause us to jump to a common epilogue
+	 tail, so we also accept simplejumps.  */
+      if (returnjump_p (i) || simplejump_p (i))
 	break;
       if (CALL_P (i) && SIBLING_CALL_P (i))
 	break;
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -18,11 +18,11 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+struct dw_cfi_struct;
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
 extern void dwarf2out_frame_debug_init (void);
-extern void dwarf2out_cfi_begin_epilogue (rtx);
-extern void dwarf2out_frame_debug_restore_state (void);
+extern void dwarf2out_emit_cfi (struct dw_cfi_struct *);
 extern void dwarf2out_flush_queued_reg_saves (void);
 
 extern void debug_dwarf (void);
Index: gcc/insn-notes.def
===================================================================
--- gcc.orig/insn-notes.def
+++ gcc/insn-notes.def
@@ -77,4 +77,12 @@ INSN_NOTE (SWITCH_TEXT_SECTIONS)
    when an epilogue appears in the middle of a function.  */
 INSN_NOTE (CFA_RESTORE_STATE)
 
+/* When emitting dwarf2 frame information, contains a directive that
+   should be emitted.  */
+INSN_NOTE (CFI)
+
+/* When emitting dwarf2 frame information, contains the number of a debug
+   label that should be emitted.  */
+INSN_NOTE (CFI_LABEL)
+
 #undef INSN_NOTE
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -180,6 +180,7 @@ union rtunion_def
   mem_attrs *rt_mem;
   reg_attrs *rt_reg;
   struct constant_descriptor_rtx *rt_constant;
+  struct dw_cfi_struct *rt_cfi;
 };
 typedef union rtunion_def rtunion;
 
@@ -708,6 +709,7 @@ extern void rtl_check_failed_flag (const
 #define XTREE(RTX, N)   (RTL_CHECK1 (RTX, N, 't').rt_tree)
 #define XBBDEF(RTX, N)	(RTL_CHECK1 (RTX, N, 'B').rt_bb)
 #define XTMPL(RTX, N)	(RTL_CHECK1 (RTX, N, 'T').rt_str)
+#define XCFI(RTX, N)	(RTL_CHECK1 (RTX, N, 'C').rt_cfi)
 
 #define XVECEXP(RTX, N, M)	RTVEC_ELT (XVEC (RTX, N), M)
 #define XVECLEN(RTX, N)		GET_NUM_ELEM (XVEC (RTX, N))
@@ -740,6 +742,7 @@ extern void rtl_check_failed_flag (const
 #define XCMODE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_type)
 #define XCTREE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_tree)
 #define XCBBDEF(RTX, N, C)    (RTL_CHECKC1 (RTX, N, C).rt_bb)
+#define XCCFI(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_cfi)
 #define XCCSELIB(RTX, N, C)   (RTL_CHECKC1 (RTX, N, C).rt_cselib)
 
 #define XCVECEXP(RTX, N, M, C)	RTVEC_ELT (XCVEC (RTX, N, C), M)
@@ -882,6 +885,8 @@ extern const char * const reg_note_name[
 #define NOTE_BLOCK(INSN)	XCTREE (INSN, 4, NOTE)
 #define NOTE_EH_HANDLER(INSN)	XCINT (INSN, 4, NOTE)
 #define NOTE_BASIC_BLOCK(INSN)	XCBBDEF (INSN, 4, NOTE)
+#define NOTE_CFI(INSN)		XCCFI (INSN, 4, NOTE)
+#define NOTE_LABEL_NUMBER(INSN)	XCINT (INSN, 4, NOTE)
 #define NOTE_VAR_LOCATION(INSN)	XCEXP (INSN, 4, NOTE)
 
 /* In a NOTE that is a line number, this is the line number.
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -1678,7 +1678,7 @@ final_end_function (void)
 void
 final (rtx first, FILE *file, int optimize_p)
 {
-  rtx insn;
+  rtx insn, next;
   int max_uid = 0;
   int seen = 0;
 
@@ -1723,6 +1723,15 @@ final (rtx first, FILE *file, int optimi
 
       insn = final_scan_insn (insn, file, optimize_p, 0, &seen);
     }
+
+  for (insn = first; insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+      if (NOTE_P (insn)
+	  && (NOTE_KIND (insn) == NOTE_INSN_CFI
+	      || NOTE_KIND (insn) == NOTE_INSN_CFI_LABEL))
+	delete_insn (insn);
+    }
 }
 \f
 const char *
@@ -1899,16 +1908,19 @@ final_scan_insn (rtx insn, FILE *file, i
 	  break;
 
 	case NOTE_INSN_EPILOGUE_BEG:
-#if defined (HAVE_epilogue)
-	  if (dwarf2out_do_frame ())
-	    dwarf2out_cfi_begin_epilogue (insn);
-#endif
 	  (*debug_hooks->begin_epilogue) (last_linenum, last_filename);
 	  targetm.asm_out.function_begin_epilogue (file);
 	  break;
 
 	case NOTE_INSN_CFA_RESTORE_STATE:
-	  dwarf2out_frame_debug_restore_state ();
+	  break;
+
+	case NOTE_INSN_CFI:
+	  dwarf2out_emit_cfi (NOTE_CFI (insn));
+	  break;
+
+	case NOTE_INSN_CFI_LABEL:
+	  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LCFI", NOTE_LABEL_NUMBER (insn));
 	  break;
 
 	case NOTE_INSN_FUNCTION_BEG:
@@ -2018,8 +2030,6 @@ final_scan_insn (rtx insn, FILE *file, i
       break;
 
     case BARRIER:
-      if (dwarf2out_do_frame ())
-	dwarf2out_frame_debug (insn, false);
       break;
 
     case CODE_LABEL:
@@ -2285,12 +2295,6 @@ final_scan_insn (rtx insn, FILE *file, i
 
 	    final_sequence = body;
 
-	    /* Record the delay slots' frame information before the branch.
-	       This is needed for delayed calls: see execute_cfa_program().  */
-	    if (dwarf2out_do_frame ())
-	      for (i = 1; i < XVECLEN (body, 0); i++)
-		dwarf2out_frame_debug (XVECEXP (body, 0, i), false);
-
 	    /* The first insn in this SEQUENCE might be a JUMP_INSN that will
 	       force the restoration of a comparison that was previously
 	       thought unnecessary.  If that happens, cancel this sequence
@@ -2604,9 +2608,6 @@ final_scan_insn (rtx insn, FILE *file, i
 
 	current_output_insn = debug_insn = insn;
 
-	if (CALL_P (insn) && dwarf2out_do_frame ())
-	  dwarf2out_frame_debug (insn, false);
-
 	/* Find the proper template for this insn.  */
 	templ = get_insn_template (insn_code_number, insn);
 
@@ -2686,16 +2687,6 @@ final_scan_insn (rtx insn, FILE *file, i
 	  targetm.asm_out.final_postscan_insn (file, insn, recog_data.operand,
 					       recog_data.n_operands);
 
-	/* If necessary, report the effect that the instruction has on
-	   the unwind info.   We've already done this for delay slots
-	   and call instructions.  */
-	if (final_sequence == 0
-#if !defined (HAVE_prologue)
-	    && !ACCUMULATE_OUTGOING_ARGS
-#endif
-	    && dwarf2out_do_frame ())
-	  dwarf2out_frame_debug (insn, true);
-
 	if (!targetm.asm_out.unwind_emit_before_insn
 	    && targetm.asm_out.unwind_emit)
 	  targetm.asm_out.unwind_emit (asm_out_file, insn);

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-13 14:16                     ` Bernd Schmidt
  2011-04-13 15:14                       ` Bernd Schmidt
@ 2011-04-13 15:16                       ` Bernd Schmidt
  2011-04-13 15:17                       ` Bernd Schmidt
  2 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-13 15:16 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 625 bytes --]

The second part is a new patch, which reduces the amount of different
code paths we can take in add_fde_cfi, as this was becoming
unmanageable. The concept is to first emit just the CFI notes, in all
cases. Later, after we're done producing the CFI insns we need, another
pass over the rtl adds the necessary labels and set_loc/advance_loc
CFIs. One consequence of this is that def_cfa_1 can no longer use
lookup_cfa, so it just compares to an old_cfa variable instead. This
also requires target-specific changes as some ports use
dwarf2out_cfi_label. An (untested) example of the necessary changes is
in config/arm.
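
The two-pass scheme described above (record plain CFI notes first, then let a later walk over the stream decide where labels and advance_loc CFIs are actually needed) can be sketched as a minimal, self-contained C model. Note this is only an illustration of the idea: the types and function names here (struct insn, emit_cfi_note_after, assign_labels) are invented for the example and do not correspond to GCC's real rtx, dw_cfi_ref, or add_cfis_to_fde interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified insn stream: real insns and CFI notes in one linked list.  */
enum kind { INSN, NOTE_CFI_K };

struct insn {
  enum kind kind;
  int label_num;        /* assigned in pass 2; -1 if no label is needed */
  struct insn *next;
};

/* Pass 1: attach a CFI note after POS without choosing any label yet.  */
static struct insn *
emit_cfi_note_after (struct insn *pos, struct insn *note)
{
  note->kind = NOTE_CFI_K;
  note->label_num = -1;
  note->next = pos->next;
  pos->next = note;
  return note;
}

/* Pass 2: walk the finished stream and assign a label only to the first
   CFI note after a real insn; a run of adjacent notes shares one label.
   Returns the number of labels emitted.  */
static int
assign_labels (struct insn *stream)
{
  int next_label = 0;
  int after_real_insn = 0;
  struct insn *i;

  for (i = stream; i; i = i->next)
    {
      if (i->kind == INSN)
	after_real_insn = 1;
      else if (after_real_insn)
	{
	  i->label_num = next_label++;
	  after_real_insn = 0;	/* one label covers the whole run */
	}
    }
  return next_label;
}
```

In this model, a stream of insn, note, insn, note, note ends up with two labels: one per group of notes, mirroring how the patch defers set_loc/advance_loc placement until all CFI insns exist.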


Bernd

[-- Attachment #2: 006-cfilabel.diff --]
[-- Type: text/plain, Size: 44197 bytes --]

---
 config/arm/arm.c     |    5 
 config/ia64/ia64.c   |    6 
 config/sparc/sparc.c |    7 
 config/vax/vax.c     |    2 
 dwarf2out.c          |  467 ++++++++++++++++++++++++---------------------------
 dwarf2out.h          |   32 +++
 final.c              |    5 
 target.def           |    2 
 tree.h               |   31 ---
 9 files changed, 270 insertions(+), 287 deletions(-)

Index: gcc/config/arm/arm.c
===================================================================
--- gcc.orig/config/arm/arm.c
+++ gcc/config/arm/arm.c
@@ -19977,18 +19977,19 @@ thumb_pushpop (FILE *f, unsigned long ma
 
   if (push && pushed_words && dwarf2out_do_frame ())
     {
-      char *l = dwarf2out_cfi_label (false);
       int pushed_mask = real_regs;
 
+      dwarf2out_maybe_emit_cfi_label ();
+
       *cfa_offset += pushed_words * 4;
-      dwarf2out_def_cfa (l, SP_REGNUM, *cfa_offset);
+      dwarf2out_def_cfa (SP_REGNUM, *cfa_offset);
 
       pushed_words = 0;
       pushed_mask = real_regs;
       for (regno = 0; regno <= 14; regno++, pushed_mask >>= 1)
 	{
 	  if (pushed_mask & 1)
-	    dwarf2out_reg_save (l, regno, 4 * pushed_words++ - *cfa_offset);
+	    dwarf2out_reg_save (regno, 4 * pushed_words++ - *cfa_offset);
 	}
     }
 }
@@ -20997,10 +20998,9 @@ thumb1_output_function_prologue (FILE *f
 	 the stack pointer.  */
       if (dwarf2out_do_frame ())
 	{
-	  char *l = dwarf2out_cfi_label (false);
-
+	  dwarf2out_maybe_emit_cfi_label ();
 	  cfa_offset = cfa_offset + crtl->args.pretend_args_size;
-	  dwarf2out_def_cfa (l, SP_REGNUM, cfa_offset);
+	  dwarf2out_def_cfa (SP_REGNUM, cfa_offset);
 	}
     }
 
@@ -21046,10 +21046,10 @@ thumb1_output_function_prologue (FILE *f
 
       if (dwarf2out_do_frame ())
 	{
-	  char *l = dwarf2out_cfi_label (false);
+	  dwarf2out_maybe_emit_cfi_label ();
 
 	  cfa_offset = cfa_offset + 16;
-	  dwarf2out_def_cfa (l, SP_REGNUM, cfa_offset);
+	  dwarf2out_def_cfa (SP_REGNUM, cfa_offset);
 	}
 
       if (l_mask)
@@ -22749,7 +22749,7 @@ arm_except_unwind_info (struct gcc_optio
    stack alignment.  */
 
 static void
-arm_dwarf_handle_frame_unspec (const char *label, rtx pattern, int index)
+arm_dwarf_handle_frame_unspec (rtx pattern, int index)
 {
   rtx unspec = SET_SRC (pattern);
   gcc_assert (GET_CODE (unspec) == UNSPEC);
@@ -22760,8 +22760,7 @@ arm_dwarf_handle_frame_unspec (const cha
       /* ??? We should set the CFA = (SP & ~7).  At this point we haven't
          put anything on the stack, so hopefully it won't matter.
          CFA = SP will be correct after alignment.  */
-      dwarf2out_reg_save_reg (label, stack_pointer_rtx,
-                              SET_DEST (pattern));
+      dwarf2out_reg_save_reg (stack_pointer_rtx, SET_DEST (pattern));
       break;
     default:
       gcc_unreachable ();
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc.orig/config/ia64/ia64.c
+++ gcc/config/ia64/ia64.c
@@ -330,7 +330,7 @@ static enum machine_mode ia64_promote_fu
 static void ia64_trampoline_init (rtx, tree, rtx);
 static void ia64_override_options_after_change (void);
 
-static void ia64_dwarf_handle_frame_unspec (const char *, rtx, int);
+static void ia64_dwarf_handle_frame_unspec (rtx, int);
 static tree ia64_builtin_decl (unsigned, bool);
 
 static reg_class_t ia64_preferred_reload_class (rtx, reg_class_t);
@@ -9710,9 +9710,7 @@ ia64_dwarf2out_def_steady_cfa (rtx insn,
    processing.  The real CFA definition is set up above.  */
 
 static void
-ia64_dwarf_handle_frame_unspec (const char * ARG_UNUSED (label),
-				rtx ARG_UNUSED (pattern),
-				int index)
+ia64_dwarf_handle_frame_unspec (rtx ARG_UNUSED (pattern), int index)
 {
   gcc_assert (index == UNSPECV_ALLOC);
 }
Index: gcc/config/sparc/sparc.c
===================================================================
--- gcc.orig/config/sparc/sparc.c
+++ gcc/config/sparc/sparc.c
@@ -454,7 +454,7 @@ static unsigned int sparc_function_arg_b
 						 const_tree);
 static int sparc_arg_partial_bytes (CUMULATIVE_ARGS *,
 				    enum machine_mode, tree, bool);
-static void sparc_dwarf_handle_frame_unspec (const char *, rtx, int);
+static void sparc_dwarf_handle_frame_unspec (rtx, int);
 static void sparc_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
 static void sparc_file_end (void);
 static bool sparc_frame_pointer_required (void);
@@ -9423,12 +9423,11 @@ get_some_local_dynamic_name_1 (rtx *px, 
    This is called from dwarf2out.c to emit call frame instructions
    for frame-related insns containing UNSPECs and UNSPEC_VOLATILEs. */
 static void
-sparc_dwarf_handle_frame_unspec (const char *label,
-				 rtx pattern ATTRIBUTE_UNUSED,
+sparc_dwarf_handle_frame_unspec (rtx pattern ATTRIBUTE_UNUSED,
 				 int index ATTRIBUTE_UNUSED)
 {
   gcc_assert (index == UNSPECV_SAVEW);
-  dwarf2out_window_save (label);
+  dwarf2out_window_save ();
 }
 
 /* This is called from dwarf2out.c via TARGET_ASM_OUTPUT_DWARF_DTPREL.
Index: gcc/config/vax/vax.c
===================================================================
--- gcc.orig/config/vax/vax.c
+++ gcc/config/vax/vax.c
@@ -163,17 +163,18 @@ vax_output_function_prologue (FILE * fil
 
   if (dwarf2out_do_frame ())
     {
-      const char *label = dwarf2out_cfi_label (false);
       int offset = 0;
 
+      dwarf2out_maybe_emit_cfi_label ();
+
       for (regno = FIRST_PSEUDO_REGISTER-1; regno >= 0; --regno)
 	if (df_regs_ever_live_p (regno) && !call_used_regs[regno])
-	  dwarf2out_reg_save (label, regno, offset -= 4);
+	  dwarf2out_reg_save (regno, offset -= 4);
 
-      dwarf2out_reg_save (label, PC_REGNUM, offset -= 4);
-      dwarf2out_reg_save (label, FRAME_POINTER_REGNUM, offset -= 4);
-      dwarf2out_reg_save (label, ARG_POINTER_REGNUM, offset -= 4);
-      dwarf2out_def_cfa (label, FRAME_POINTER_REGNUM, -(offset - 4));
+      dwarf2out_reg_save (PC_REGNUM, offset -= 4);
+      dwarf2out_reg_save (FRAME_POINTER_REGNUM, offset -= 4);
+      dwarf2out_reg_save (ARG_POINTER_REGNUM, offset -= 4);
+      dwarf2out_def_cfa (false, FRAME_POINTER_REGNUM, -(offset - 4));
     }
 
   size -= STARTING_FRAME_OFFSET;
Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -456,11 +456,11 @@ static GTY(()) section *cold_text_sectio
 static char *stripattributes (const char *);
 static const char *dwarf_cfi_name (unsigned);
 static dw_cfi_ref new_cfi (void);
-static void add_cfi (cfi_vec *, dw_cfi_ref);
-static void add_fde_cfi (const char *, dw_cfi_ref);
+static void add_fde_cfi (dw_cfi_ref);
+static void add_cie_cfi (dw_cfi_ref);
 static void lookup_cfa_1 (dw_cfi_ref, dw_cfa_location *, dw_cfa_location *);
 static void lookup_cfa (dw_cfa_location *);
-static void reg_save (const char *, unsigned, unsigned, HOST_WIDE_INT);
+static void reg_save (bool, unsigned, unsigned, HOST_WIDE_INT);
 static void initial_return_save (rtx);
 static HOST_WIDE_INT stack_adjust_offset (const_rtx, HOST_WIDE_INT,
 					  HOST_WIDE_INT);
@@ -469,7 +469,7 @@ static void output_cfi_directive (dw_cfi
 static void output_call_frame_info (int);
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
-static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_frame_debug_expr (rtx);
 static void dwarf2out_cfi_begin_epilogue (rtx);
 static void dwarf2out_frame_debug_restore_state (void);
 
@@ -482,7 +482,7 @@ static struct dw_loc_descr_struct *build
   (dw_cfa_location *, HOST_WIDE_INT);
 static struct dw_loc_descr_struct *build_cfa_aligned_loc
   (HOST_WIDE_INT, HOST_WIDE_INT);
-static void def_cfa_1 (const char *, dw_cfa_location *);
+static void def_cfa_1 (bool, dw_cfa_location *);
 static struct dw_loc_descr_struct *mem_loc_descriptor
   (rtx, enum machine_mode mode, enum var_init_status);
 
@@ -820,35 +820,6 @@ new_cfi (void)
   return cfi;
 }
 
-/* Add a Call Frame Instruction to list of instructions.  */
-
-static inline void
-add_cfi (cfi_vec *vec, dw_cfi_ref cfi)
-{
-  dw_fde_ref fde = current_fde ();
-
-  /* When DRAP is used, CFA is defined with an expression.  Redefine
-     CFA may lead to a different CFA value.   */
-  /* ??? Of course, this heuristic fails when we're annotating epilogues,
-     because of course we'll always want to redefine the CFA back to the
-     stack pointer on the way out.  Where should we move this check?  */
-  if (0 && fde && fde->drap_reg != INVALID_REGNUM)
-    switch (cfi->dw_cfi_opc)
-      {
-        case DW_CFA_def_cfa_register:
-        case DW_CFA_def_cfa_offset:
-        case DW_CFA_def_cfa_offset_sf:
-        case DW_CFA_def_cfa:
-        case DW_CFA_def_cfa_sf:
-	  gcc_unreachable ();
-
-        default:
-          break;
-      }
-
-  VEC_safe_push (dw_cfi_ref, gc, *vec, cfi);
-}
-
 /* The insn after which a new CFI note should be emitted.  */
 static rtx cfi_insn;
 
@@ -858,45 +829,51 @@ static bool emit_cfa_remember;
 /* True if any CFI directives were emitted at the current insn.  */
 static bool any_cfis_emitted;
 
-/* Generate a new label for the CFI info to refer to.  FORCE is true
-   if a label needs to be output even when using .cfi_* directives.  */
+/* Generate a new label for the CFI info to refer to.  */
 
-char *
-dwarf2out_cfi_label (bool force)
+static char *
+dwarf2out_cfi_label (void)
 {
   static char label[20];
 
-  if (!force && dwarf2out_do_cfi_asm ())
-    {
-      /* In this case, we will be emitting the asm directive instead of
-	 the label, so just return a placeholder to keep the rest of the
-	 interfaces happy.  */
-      strcpy (label, "<do not output>");
-    }
-  else
-    {
-      int num = dwarf2out_cfi_label_num++;
-      ASM_GENERATE_INTERNAL_LABEL (label, "LCFI", num);
-      cfi_insn = emit_note_after (NOTE_INSN_CFI_LABEL, cfi_insn);
-      NOTE_LABEL_NUMBER (cfi_insn) = num;
-    }
+  int num = dwarf2out_cfi_label_num++;
+  ASM_GENERATE_INTERNAL_LABEL (label, "LCFI", num);
 
   return label;
 }
 
+/* Called by target specific code if it wants to emit CFI insns in the text
+   prologue.  If necessary, emit a CFI label and an advance_loc CFI.  See
+   also cfi_label_required_p.  */
+void
+dwarf2out_maybe_emit_cfi_label (void)
+{
+  if ((dwarf_version == 2
+       && debug_info_level > DINFO_LEVEL_TERSE
+       && (write_symbols == DWARF2_DEBUG
+	   || write_symbols == VMS_AND_DWARF2_DEBUG))
+      || !dwarf2out_do_cfi_asm ())
+    {
+      const char *l;
+      dw_cfi_ref xcfi;
+
+      ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LCFI", dwarf2out_cfi_label_num);
+      l = dwarf2out_cfi_label ();
+      l = xstrdup (l);
+
+      xcfi = new_cfi ();
+      xcfi->dw_cfi_opc = DW_CFA_advance_loc4;
+      xcfi->dw_cfi_oprnd1.dw_cfi_addr = l;
+      add_fde_cfi (xcfi);
+    }
+}
+
 /* Add CFI to the current fde at the PC value indicated by LABEL if specified,
    or to the CIE if LABEL is NULL.  */
 
 static void
-add_fde_cfi (const char *label, dw_cfi_ref cfi)
+add_fde_cfi (dw_cfi_ref cfi)
 {
-  cfi_vec *vec;
-
-  if (cie_cfi_vec == NULL)
-    cie_cfi_vec = VEC_alloc (dw_cfi_ref, gc, 20);
-
-  vec = &cie_cfi_vec;
-
   if (emit_cfa_remember)
     {
       dw_cfi_ref cfi_remember;
@@ -905,110 +882,30 @@ add_fde_cfi (const char *label, dw_cfi_r
       emit_cfa_remember = false;
       cfi_remember = new_cfi ();
       cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
-      add_fde_cfi (label, cfi_remember);
+      add_fde_cfi (cfi_remember);
     }
 
-  if (dwarf2out_do_cfi_asm ())
+  any_cfis_emitted = true;
+  if (cfi_insn != NULL)
     {
-      if (label)
-	{
-	  dw_fde_ref fde = current_fde ();
-
-	  gcc_assert (fde != NULL);
-
-	  /* We still have to add the cfi to the list so that lookup_cfa
-	     works later on.  When -g2 and above we even need to force
-	     emitting of CFI labels and add to list a DW_CFA_set_loc for
-	     convert_cfa_to_fb_loc_list purposes.  If we're generating
-	     DWARF3 output we use DW_OP_call_frame_cfa and so don't use
-	     convert_cfa_to_fb_loc_list.  */
-	  if (dwarf_version == 2
-	      && debug_info_level > DINFO_LEVEL_TERSE
-	      && (write_symbols == DWARF2_DEBUG
-		  || write_symbols == VMS_AND_DWARF2_DEBUG))
-	    {
-	      switch (cfi->dw_cfi_opc)
-		{
-		case DW_CFA_def_cfa_offset:
-		case DW_CFA_def_cfa_offset_sf:
-		case DW_CFA_def_cfa_register:
-		case DW_CFA_def_cfa:
-		case DW_CFA_def_cfa_sf:
-		case DW_CFA_def_cfa_expression:
-		case DW_CFA_restore_state:
-		  if (*label == 0 || strcmp (label, "<do not output>") == 0)
-		    label = dwarf2out_cfi_label (true);
-
-		  if (fde->dw_fde_current_label == NULL
-		      || strcmp (label, fde->dw_fde_current_label) != 0)
-		    {
-		      dw_cfi_ref xcfi;
-
-		      label = xstrdup (label);
-
-		      /* Set the location counter to the new label.  */
-		      xcfi = new_cfi ();
-		      /* It doesn't metter whether DW_CFA_set_loc
-		         or DW_CFA_advance_loc4 is added here, those aren't
-		         emitted into assembly, only looked up by
-		         convert_cfa_to_fb_loc_list.  */
-		      xcfi->dw_cfi_opc = DW_CFA_set_loc;
-		      xcfi->dw_cfi_oprnd1.dw_cfi_addr = label;
-		      add_cfi (&fde->dw_fde_cfi, xcfi);
-		      fde->dw_fde_current_label = label;
-		    }
-		  break;
-		default:
-		  break;
-	        }
-	    }
-
-	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
-	  NOTE_CFI (cfi_insn) = cfi;
-
-	  vec = &fde->dw_fde_cfi;
-	  any_cfis_emitted = true;
-	}
-      /* ??? If this is a CFI for the CIE, we don't emit.  This
-	 assumes that the standard CIE contents that the assembler
-	 uses matches the standard CIE contents that the compiler
-	 uses.  This is probably a bad assumption.  I'm not quite
-	 sure how to address this for now.  */
+      cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
+      NOTE_CFI (cfi_insn) = cfi;
     }
-  else if (label)
+  else
     {
       dw_fde_ref fde = current_fde ();
-
-      gcc_assert (fde != NULL);
-
-      if (*label == 0)
-	label = dwarf2out_cfi_label (false);
-
-      if (fde->dw_fde_current_label == NULL
-	  || strcmp (label, fde->dw_fde_current_label) != 0)
-	{
-	  dw_cfi_ref xcfi;
-
-	  label = xstrdup (label);
-
-	  /* Set the location counter to the new label.  */
-	  xcfi = new_cfi ();
-	  /* If we have a current label, advance from there, otherwise
-	     set the location directly using set_loc.  */
-	  xcfi->dw_cfi_opc = fde->dw_fde_current_label
-			     ? DW_CFA_advance_loc4
-			     : DW_CFA_set_loc;
-	  xcfi->dw_cfi_oprnd1.dw_cfi_addr = label;
-	  add_cfi (&fde->dw_fde_cfi, xcfi);
-
-	  fde->dw_fde_current_label = label;
-	}
-
-      vec = &fde->dw_fde_cfi;
-      any_cfis_emitted = true;
+      VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, cfi);
+      dwarf2out_emit_cfi (cfi);
     }
+}
+
+static void
+add_cie_cfi (dw_cfi_ref cfi)
+{
+  if (cie_cfi_vec == NULL)
+    cie_cfi_vec = VEC_alloc (dw_cfi_ref, gc, 20);
 
-  add_cfi (vec, cfi);
+  VEC_safe_push (dw_cfi_ref, gc, cie_cfi_vec, cfi);
 }
 
 /* Subroutine of lookup_cfa.  */
@@ -1076,6 +973,9 @@ lookup_cfa (dw_cfa_location *loc)
 /* The current rule for calculating the DWARF2 canonical frame address.  */
 static dw_cfa_location cfa;
 
+/* A copy of CFA, for comparison purposes.  */
+static dw_cfa_location old_cfa;
+
 /* The register used for saving registers to the stack, and its offset
    from the CFA.  */
 static dw_cfa_location cfa_store;
@@ -1083,25 +983,27 @@ static dw_cfa_location cfa_store;
 /* The current save location around an epilogue.  */
 static dw_cfa_location cfa_remember;
 
+/* Like cfa_remember, but a copy of old_cfa.  */
+static dw_cfa_location old_cfa_remember;
+
 /* The running total of the size of arguments pushed onto the stack.  */
 static HOST_WIDE_INT args_size;
 
 /* The last args_size we actually output.  */
 static HOST_WIDE_INT old_args_size;
 
-/* Entry point to update the canonical frame address (CFA).
-   LABEL is passed to add_fde_cfi.  The value of CFA is now to be
-   calculated from REG+OFFSET.  */
+/* Entry point to update the canonical frame address (CFA).  The value
+   of CFA is now to be calculated from REG+OFFSET.  */
 
 void
-dwarf2out_def_cfa (const char *label, unsigned int reg, HOST_WIDE_INT offset)
+dwarf2out_def_cfa (bool for_cie, unsigned int reg, HOST_WIDE_INT offset)
 {
   dw_cfa_location loc;
   loc.indirect = 0;
   loc.base_offset = 0;
   loc.reg = reg;
   loc.offset = offset;
-  def_cfa_1 (label, &loc);
+  def_cfa_1 (for_cie, &loc);
 }
 
 /* Determine if two dw_cfa_location structures define the same data.  */
@@ -1120,10 +1022,10 @@ cfa_equal_p (const dw_cfa_location *loc1
    the dw_cfa_location structure.  */
 
 static void
-def_cfa_1 (const char *label, dw_cfa_location *loc_p)
+def_cfa_1 (bool for_cie, dw_cfa_location *loc_p)
 {
   dw_cfi_ref cfi;
-  dw_cfa_location old_cfa, loc;
+  dw_cfa_location loc;
 
   cfa = *loc_p;
   loc = *loc_p;
@@ -1132,7 +1034,6 @@ def_cfa_1 (const char *label, dw_cfa_loc
     cfa_store.offset = loc.offset;
 
   loc.reg = DWARF_FRAME_REGNUM (loc.reg);
-  lookup_cfa (&old_cfa);
 
   /* If nothing changed, no need to issue any call frame instructions.  */
   if (cfa_equal_p (&loc, &old_cfa))
@@ -1193,16 +1094,19 @@ def_cfa_1 (const char *label, dw_cfa_loc
       cfi->dw_cfi_oprnd1.dw_cfi_loc = loc_list;
     }
 
-  add_fde_cfi (label, cfi);
+  if (for_cie)
+    add_cie_cfi (cfi);
+  else
+    add_fde_cfi (cfi);
+  old_cfa = loc;
 }
 
 /* Add the CFI for saving a register.  REG is the CFA column number.
-   LABEL is passed to add_fde_cfi.
    If SREG is -1, the register is saved at OFFSET from the CFA;
    otherwise it is saved in SREG.  */
 
 static void
-reg_save (const char *label, unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset)
+reg_save (bool for_cie, unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset)
 {
   dw_cfi_ref cfi = new_cfi ();
   dw_fde_ref fde = current_fde ();
@@ -1238,10 +1142,13 @@ reg_save (const char *label, unsigned in
       cfi->dw_cfi_oprnd2.dw_cfi_reg_num = sreg;
     }
 
-  add_fde_cfi (label, cfi);
+  if (for_cie)
+    add_cie_cfi (cfi);
+  else
+    add_fde_cfi (cfi);
 }
 
-/* Add the CFI for saving a register window.  LABEL is passed to reg_save.
+/* Add the CFI for saving a register window.
    This CFI tells the unwinder that it needs to restore the window registers
    from the previous frame's window save area.
 
@@ -1249,39 +1156,39 @@ reg_save (const char *label, unsigned in
    assuming 0(cfa)) and what registers are in the window.  */
 
 void
-dwarf2out_window_save (const char *label)
+dwarf2out_window_save (void)
 {
   dw_cfi_ref cfi = new_cfi ();
 
   cfi->dw_cfi_opc = DW_CFA_GNU_window_save;
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 }
 
 /* Entry point for saving a register to the stack.  REG is the GCC register
    number.  LABEL and OFFSET are passed to reg_save.  */
 
 void
-dwarf2out_reg_save (const char *label, unsigned int reg, HOST_WIDE_INT offset)
+dwarf2out_reg_save (unsigned int reg, HOST_WIDE_INT offset)
 {
-  reg_save (label, DWARF_FRAME_REGNUM (reg), INVALID_REGNUM, offset);
+  reg_save (false, DWARF_FRAME_REGNUM (reg), INVALID_REGNUM, offset);
 }
 
 /* Entry point for saving the return address in the stack.
    LABEL and OFFSET are passed to reg_save.  */
 
 void
-dwarf2out_return_save (const char *label, HOST_WIDE_INT offset)
+dwarf2out_return_save (HOST_WIDE_INT offset)
 {
-  reg_save (label, DWARF_FRAME_RETURN_COLUMN, INVALID_REGNUM, offset);
+  reg_save (false, DWARF_FRAME_RETURN_COLUMN, INVALID_REGNUM, offset);
 }
 
 /* Entry point for saving the return address in a register.
    LABEL and SREG are passed to reg_save.  */
 
 void
-dwarf2out_return_reg (const char *label, unsigned int sreg)
+dwarf2out_return_reg (unsigned int sreg)
 {
-  reg_save (label, DWARF_FRAME_RETURN_COLUMN, DWARF_FRAME_REGNUM (sreg), 0);
+  reg_save (false, DWARF_FRAME_RETURN_COLUMN, DWARF_FRAME_REGNUM (sreg), 0);
 }
 
 /* Record the initial position of the return address.  RTL is
@@ -1339,7 +1246,7 @@ initial_return_save (rtx rtl)
     }
 
   if (reg != DWARF_FRAME_RETURN_COLUMN)
-    reg_save (NULL, DWARF_FRAME_RETURN_COLUMN, reg, offset - cfa.offset);
+    reg_save (true, DWARF_FRAME_RETURN_COLUMN, reg, offset - cfa.offset);
 }
 
 /* Given a SET, calculate the amount of stack adjustment it
@@ -1609,7 +1516,7 @@ compute_barrier_args_size (void)
    pushed onto the stack.  */
 
 static void
-dwarf2out_args_size (const char *label, HOST_WIDE_INT size)
+dwarf2out_args_size (HOST_WIDE_INT size)
 {
   dw_cfi_ref cfi;
 
@@ -1621,13 +1528,13 @@ dwarf2out_args_size (const char *label, 
   cfi = new_cfi ();
   cfi->dw_cfi_opc = DW_CFA_GNU_args_size;
   cfi->dw_cfi_oprnd1.dw_cfi_offset = size;
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 }
 
 /* Record a stack adjustment of OFFSET bytes.  */
 
 static void
-dwarf2out_stack_adjust (HOST_WIDE_INT offset, const char *label)
+dwarf2out_stack_adjust (HOST_WIDE_INT offset)
 {
   if (cfa.reg == STACK_POINTER_REGNUM)
     cfa.offset += offset;
@@ -1646,9 +1553,9 @@ dwarf2out_stack_adjust (HOST_WIDE_INT of
   if (args_size < 0)
     args_size = 0;
 
-  def_cfa_1 (label, &cfa);
+  def_cfa_1 (false, &cfa);
   if (flag_asynchronous_unwind_tables)
-    dwarf2out_args_size (label, args_size);
+    dwarf2out_args_size (args_size);
 }
 
 /* Check INSN to see if it looks like a push or a stack adjustment, and
@@ -1659,7 +1566,6 @@ static void
 dwarf2out_notice_stack_adjust (rtx insn, bool after_p)
 {
   HOST_WIDE_INT offset;
-  const char *label;
   int i;
 
   /* Don't handle epilogues at all.  Certainly it would be wrong to do so
@@ -1690,7 +1596,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
 	  if (GET_CODE (insn) == SET)
 	    insn = SET_SRC (insn);
 	  gcc_assert (GET_CODE (insn) == CALL);
-	  dwarf2out_args_size ("", INTVAL (XEXP (insn, 1)));
+	  dwarf2out_args_size (INTVAL (XEXP (insn, 1)));
 	}
       return;
     }
@@ -1698,7 +1604,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
   if (CALL_P (insn) && !after_p)
     {
       if (!flag_asynchronous_unwind_tables)
-	dwarf2out_args_size ("", args_size);
+	dwarf2out_args_size (args_size);
       return;
     }
   else if (BARRIER_P (insn))
@@ -1739,8 +1645,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
   if (offset == 0)
     return;
 
-  label = dwarf2out_cfi_label (false);
-  dwarf2out_stack_adjust (offset, label);
+  dwarf2out_stack_adjust (offset);
 }
 
 /* We delay emitting a register save until either (a) we reach the end
@@ -1769,13 +1674,11 @@ struct GTY(()) reg_saved_in_data {
 static GTY(()) struct reg_saved_in_data regs_saved_in_regs[4];
 static GTY(()) size_t num_regs_saved_in_regs;
 
-static const char *last_reg_save_label;
-
 /* Add an entry to QUEUED_REG_SAVES saying that REG is now saved at
    SREG, or if SREG is NULL then it is saved at OFFSET to the CFA.  */
 
 static void
-queue_reg_save (const char *label, rtx reg, rtx sreg, HOST_WIDE_INT offset)
+queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset)
 {
   struct queued_reg_save *q;
 
@@ -1796,8 +1699,6 @@ queue_reg_save (const char *label, rtx r
   q->reg = reg;
   q->cfa_offset = offset;
   q->saved_reg = sreg;
-
-  last_reg_save_label = label;
 }
 
 /* Output all the entries in QUEUED_REG_SAVES.  */
@@ -1831,11 +1732,10 @@ dwarf2out_flush_queued_reg_saves (void)
 	sreg = DWARF_FRAME_REGNUM (REGNO (q->saved_reg));
       else
 	sreg = INVALID_REGNUM;
-      reg_save (last_reg_save_label, reg, sreg, q->cfa_offset);
+      reg_save (false, reg, sreg, q->cfa_offset);
     }
 
   queued_reg_saves = NULL;
-  last_reg_save_label = NULL;
 }
 
 /* Does INSN clobber any register which QUEUED_REG_SAVES lists a saved
@@ -1865,7 +1765,7 @@ clobbers_queued_reg_save (const_rtx insn
 /* Entry point for saving the first register into the second.  */
 
 void
-dwarf2out_reg_save_reg (const char *label, rtx reg, rtx sreg)
+dwarf2out_reg_save_reg (rtx reg, rtx sreg)
 {
   size_t i;
   unsigned int regno, sregno;
@@ -1883,7 +1783,7 @@ dwarf2out_reg_save_reg (const char *labe
 
   regno = DWARF_FRAME_REGNUM (REGNO (reg));
   sregno = DWARF_FRAME_REGNUM (REGNO (sreg));
-  reg_save (label, regno, sregno, 0);
+  reg_save (false, regno, sregno, 0);
 }
 
 /* What register, if any, is currently saved in REG?  */
@@ -1916,7 +1816,7 @@ static dw_cfa_location cfa_temp;
 /* A subroutine of dwarf2out_frame_debug, process a REG_DEF_CFA note.  */
 
 static void
-dwarf2out_frame_debug_def_cfa (rtx pat, const char *label)
+dwarf2out_frame_debug_def_cfa (rtx pat)
 {
   memset (&cfa, 0, sizeof (cfa));
 
@@ -1947,13 +1847,13 @@ dwarf2out_frame_debug_def_cfa (rtx pat, 
       gcc_unreachable ();
     }
 
-  def_cfa_1 (label, &cfa);
+  def_cfa_1 (false, &cfa);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_ADJUST_CFA note.  */
 
 static void
-dwarf2out_frame_debug_adjust_cfa (rtx pat, const char *label)
+dwarf2out_frame_debug_adjust_cfa (rtx pat)
 {
   rtx src, dest;
 
@@ -1978,13 +1878,13 @@ dwarf2out_frame_debug_adjust_cfa (rtx pa
   cfa.reg = REGNO (dest);
   gcc_assert (cfa.indirect == 0);
 
-  def_cfa_1 (label, &cfa);
+  def_cfa_1 (false, &cfa);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_OFFSET note.  */
 
 static void
-dwarf2out_frame_debug_cfa_offset (rtx set, const char *label)
+dwarf2out_frame_debug_cfa_offset (rtx set)
 {
   HOST_WIDE_INT offset;
   rtx src, addr, span;
@@ -2014,7 +1914,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
   /* ??? We'd like to use queue_reg_save, but we need to come up with
      a different flushing heuristic for epilogues.  */
   if (!span)
-    reg_save (label, DWARF_FRAME_REGNUM (REGNO (src)), INVALID_REGNUM, offset);
+    reg_save (false, DWARF_FRAME_REGNUM (REGNO (src)), INVALID_REGNUM, offset);
   else
     {
       /* We have a PARALLEL describing where the contents of SRC live.
@@ -2030,7 +1930,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
 	{
 	  rtx elem = XVECEXP (span, 0, par_index);
 
-	  reg_save (label, DWARF_FRAME_REGNUM (REGNO (elem)),
+	  reg_save (false, DWARF_FRAME_REGNUM (REGNO (elem)),
 		    INVALID_REGNUM, span_offset);
 	  span_offset += GET_MODE_SIZE (GET_MODE (elem));
 	}
@@ -2040,7 +1940,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_REGISTER note.  */
 
 static void
-dwarf2out_frame_debug_cfa_register (rtx set, const char *label)
+dwarf2out_frame_debug_cfa_register (rtx set)
 {
   rtx src, dest;
   unsigned sregno, dregno;
@@ -2057,13 +1957,13 @@ dwarf2out_frame_debug_cfa_register (rtx 
 
   /* ??? We'd like to use queue_reg_save, but we need to come up with
      a different flushing heuristic for epilogues.  */
-  reg_save (label, sregno, dregno, 0);
+  reg_save (false, sregno, dregno, 0);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
 
 static void
-dwarf2out_frame_debug_cfa_expression (rtx set, const char *label)
+dwarf2out_frame_debug_cfa_expression (rtx set)
 {
   rtx src, dest, span;
   dw_cfi_ref cfi = new_cfi ();
@@ -2085,13 +1985,13 @@ dwarf2out_frame_debug_cfa_expression (rt
 
   /* ??? We'd like to use queue_reg_save, were the interface different,
      and, as above, we could manage flushing for epilogues.  */
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
-dwarf2out_frame_debug_cfa_restore (rtx reg, const char *label)
+dwarf2out_frame_debug_cfa_restore (rtx reg)
 {
   dw_cfi_ref cfi = new_cfi ();
   unsigned int regno = DWARF_FRAME_REGNUM (REGNO (reg));
@@ -2099,7 +1999,102 @@ dwarf2out_frame_debug_cfa_restore (rtx r
   cfi->dw_cfi_opc = (regno & ~0x3f ? DW_CFA_restore_extended : DW_CFA_restore);
   cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno;
 
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
+}
+
+/* Examine CFI and return true if a cfi label and set_loc are needed before
+   it.  Even when generating CFI assembler instructions, we still have to
+   add the cfi to the list so that lookup_cfa works later on.  When
+   -g2 and above we even need to force emitting of CFI labels and add
+   to list a DW_CFA_set_loc for convert_cfa_to_fb_loc_list purposes.
+   If we're generating DWARF3 output we use DW_OP_call_frame_cfa and
+   so don't use convert_cfa_to_fb_loc_list.  */
+
+static bool
+cfi_label_required_p (dw_cfi_ref cfi)
+{
+  if (!dwarf2out_do_cfi_asm ())
+    return true;
+
+  if (dwarf_version == 2
+      && debug_info_level > DINFO_LEVEL_TERSE
+      && (write_symbols == DWARF2_DEBUG
+	  || write_symbols == VMS_AND_DWARF2_DEBUG))
+    {
+      switch (cfi->dw_cfi_opc)
+	{
+	case DW_CFA_def_cfa_offset:
+	case DW_CFA_def_cfa_offset_sf:
+	case DW_CFA_def_cfa_register:
+	case DW_CFA_def_cfa:
+	case DW_CFA_def_cfa_sf:
+	case DW_CFA_def_cfa_expression:
+	case DW_CFA_restore_state:
+	  return true;
+	default:
+	  return false;
+	}
+    }
+  return false;
+}
+
+/* Walk the function, looking for NOTE_INSN_CFI notes.  Add the CFIs to the
+   function's FDE, adding CFI labels and set_loc/advance_loc opcodes as
+   necessary.  */
+static void
+add_cfis_to_fde (void)
+{
+  dw_fde_ref fde = current_fde ();
+  rtx insn, next;
+  /* We always start with a function_begin label.  */
+  bool first = false;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+	/* Don't attempt to advance_loc4 between labels in different
+	   sections.  */
+	first = true;
+
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_CFI)
+	{
+	  bool required = cfi_label_required_p (NOTE_CFI (insn));
+	  while (next && NOTE_P (next) && NOTE_KIND (next) == NOTE_INSN_CFI)
+	    {
+	      required |= cfi_label_required_p (NOTE_CFI (next));
+	      next = NEXT_INSN (next);
+	    }
+	  if (required)
+	    {
+	      int num = dwarf2out_cfi_label_num;
+	      const char *label = dwarf2out_cfi_label ();
+	      dw_cfi_ref xcfi;
+	      rtx tmp;
+
+	      label = xstrdup (label);
+
+	      /* Set the location counter to the new label.  */
+	      xcfi = new_cfi ();
+	      xcfi->dw_cfi_opc = (first ? DW_CFA_set_loc
+				  : DW_CFA_advance_loc4);
+	      xcfi->dw_cfi_oprnd1.dw_cfi_addr = label;
+	      VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, xcfi);
+
+	      tmp = emit_note_before (NOTE_INSN_CFI_LABEL, insn);
+	      NOTE_LABEL_NUMBER (tmp) = num;
+	    }
+
+	  do
+	    {
+	      VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, NOTE_CFI (insn));
+	      insn = NEXT_INSN (insn);
+	    }
+	  while (insn != next);
+	  first = false;
+	}
+    }
 }
 
 /* Record call frame debugging information for an expression EXPR,
@@ -2298,7 +2293,7 @@ dwarf2out_frame_debug_cfa_restore (rtx r
   	   cfa.reg == fde->drap_reg  */
 
 static void
-dwarf2out_frame_debug_expr (rtx expr, const char *label)
+dwarf2out_frame_debug_expr (rtx expr)
 {
   rtx src, dest, span;
   HOST_WIDE_INT offset;
@@ -2327,7 +2322,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	    if (GET_CODE (elem) == SET
 		&& MEM_P (SET_DEST (elem))
 		&& (RTX_FRAME_RELATED_P (elem) || par_index == 0))
-	      dwarf2out_frame_debug_expr (elem, label);
+	      dwarf2out_frame_debug_expr (elem);
 	  }
 
       for (par_index = 0; par_index < limit; par_index++)
@@ -2336,7 +2331,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	  if (GET_CODE (elem) == SET
 	      && (!MEM_P (SET_DEST (elem)) || GET_CODE (expr) == SEQUENCE)
 	      && (RTX_FRAME_RELATED_P (elem) || par_index == 0))
-	    dwarf2out_frame_debug_expr (elem, label);
+	    dwarf2out_frame_debug_expr (elem);
 	  else if (GET_CODE (elem) == SET
 		   && par_index != 0
 		   && !RTX_FRAME_RELATED_P (elem))
@@ -2346,7 +2341,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	      HOST_WIDE_INT offset = stack_adjust_offset (elem, args_size, 0);
 
 	      if (offset != 0)
-		dwarf2out_stack_adjust (offset, label);
+		dwarf2out_stack_adjust (offset);
 	    }
 	}
       return;
@@ -2406,7 +2401,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 			    && fde->drap_reg != INVALID_REGNUM
 			    && cfa.reg != REGNO (src));
 	      else
-		queue_reg_save (label, src, dest, 0);
+		queue_reg_save (src, dest, 0);
 	    }
 	  break;
 
@@ -2536,7 +2531,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	case UNSPEC:
 	case UNSPEC_VOLATILE:
 	  gcc_assert (targetm.dwarf_handle_frame_unspec);
-	  targetm.dwarf_handle_frame_unspec (label, expr, XINT (src, 1));
+	  targetm.dwarf_handle_frame_unspec (expr, XINT (src, 1));
 	  return;
 
 	  /* Rule 16 */
@@ -2565,7 +2560,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	  gcc_unreachable ();
 	}
 
-      def_cfa_1 (label, &cfa);
+      def_cfa_1 (false, &cfa);
       break;
 
     case MEM:
@@ -2721,15 +2716,15 @@ dwarf2out_frame_debug_expr (rtx expr, co
 
 		  fde->drap_reg_saved = 1;
 
-		  def_cfa_1 (label, &cfa_exp);
+		  def_cfa_1 (false, &cfa_exp);
 		  break;
                 }
 
 	      /* If the source register is exactly the CFA, assume
 		 we're saving SP like any other register; this happens
 		 on the ARM.  */
-	      def_cfa_1 (label, &cfa);
-	      queue_reg_save (label, stack_pointer_rtx, NULL_RTX, offset);
+	      def_cfa_1 (false, &cfa);
+	      queue_reg_save (stack_pointer_rtx, NULL_RTX, offset);
 	      break;
 	    }
 	  else
@@ -2745,17 +2740,17 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	      cfa.reg = REGNO (x);
 	      cfa.base_offset = offset;
 	      cfa.indirect = 1;
-	      def_cfa_1 (label, &cfa);
+	      def_cfa_1 (false, &cfa);
 	      break;
 	    }
 	}
 
-      def_cfa_1 (label, &cfa);
+      def_cfa_1 (false, &cfa);
       {
 	span = targetm.dwarf_register_span (src);
 
 	if (!span)
-	  queue_reg_save (label, src, NULL_RTX, offset);
+	  queue_reg_save (src, NULL_RTX, offset);
 	else
 	  {
 	    /* We have a PARALLEL describing where the contents of SRC
@@ -2772,7 +2767,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	      {
 		rtx elem = XVECEXP (span, 0, par_index);
 
-		queue_reg_save (label, elem, NULL_RTX, span_offset);
+		queue_reg_save (elem, NULL_RTX, span_offset);
 		span_offset += GET_MODE_SIZE (GET_MODE (elem));
 	      }
 	  }
@@ -2794,7 +2789,6 @@ dwarf2out_frame_debug_expr (rtx expr, co
 void
 dwarf2out_frame_debug (rtx insn, bool after_p)
 {
-  const char *label;
   rtx note, n;
   bool handled_one = false;
 
@@ -2813,10 +2807,10 @@ dwarf2out_frame_debug (rtx insn, bool af
 	 is still used to save registers.  */
       if (!ACCUMULATE_OUTGOING_ARGS)
 	dwarf2out_notice_stack_adjust (insn, after_p);
+      cfi_insn = NULL;
       return;
     }
 
-  label = dwarf2out_cfi_label (false);
   any_cfis_emitted = false;
 
   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
@@ -2827,7 +2821,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	goto found;
 
       case REG_CFA_DEF_CFA:
-	dwarf2out_frame_debug_def_cfa (XEXP (note, 0), label);
+	dwarf2out_frame_debug_def_cfa (XEXP (note, 0));
 	handled_one = true;
 	break;
 
@@ -2839,7 +2833,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	    if (GET_CODE (n) == PARALLEL)
 	      n = XVECEXP (n, 0, 0);
 	  }
-	dwarf2out_frame_debug_adjust_cfa (n, label);
+	dwarf2out_frame_debug_adjust_cfa (n);
 	handled_one = true;
 	break;
 
@@ -2847,7 +2841,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_offset (n, label);
+	dwarf2out_frame_debug_cfa_offset (n);
 	handled_one = true;
 	break;
 
@@ -2859,7 +2853,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	    if (GET_CODE (n) == PARALLEL)
 	      n = XVECEXP (n, 0, 0);
 	  }
-	dwarf2out_frame_debug_cfa_register (n, label);
+	dwarf2out_frame_debug_cfa_register (n);
 	handled_one = true;
 	break;
 
@@ -2867,7 +2861,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n, label);
+	dwarf2out_frame_debug_cfa_expression (n);
 	handled_one = true;
 	break;
 
@@ -2880,7 +2874,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	      n = XVECEXP (n, 0, 0);
 	    n = XEXP (n, 0);
 	  }
-	dwarf2out_frame_debug_cfa_restore (n, label);
+	dwarf2out_frame_debug_cfa_restore (n);
 	handled_one = true;
 	break;
 
@@ -2906,18 +2900,20 @@ dwarf2out_frame_debug (rtx insn, bool af
     {
       if (any_cfis_emitted)
 	dwarf2out_flush_queued_reg_saves ();
+      cfi_insn = NULL;
       return;
     }
 
   insn = PATTERN (insn);
  found:
-  dwarf2out_frame_debug_expr (insn, label);
+  dwarf2out_frame_debug_expr (insn);
 
   /* Check again.  A parallel can save and update the same register.
      We could probably check just once, here, but this is safer than
      removing the check above.  */
   if (any_cfis_emitted || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
+  cfi_insn = NULL;
 }
 
 /* Called once at the start of final to initialize some data for the
@@ -2926,7 +2922,6 @@ void
 dwarf2out_frame_debug_init (void)
 {
   size_t i;
-  rtx insn;
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2936,6 +2931,7 @@ dwarf2out_frame_debug_init (void)
   gcc_assert (cfa.reg
 	      == (unsigned long)DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM));
 
+  old_cfa = cfa;
   cfa.reg = STACK_POINTER_REGNUM;
   cfa_store = cfa;
   cfa_temp.reg = -1;
@@ -2947,7 +2943,15 @@ dwarf2out_frame_debug_init (void)
       regs_saved_in_regs[i].saved_in_reg = NULL_RTX;
     }
   num_regs_saved_in_regs = 0;
+}
+
+/* After the (optional) text prologue has been written, emit CFI insns
+   and update the FDE for frame-related instructions.  */
 
+void
+dwarf2out_frame_debug_after_prologue (void)
+{
+  rtx insn;
   if (barrier_args_size)
     {
       XDELETEVEC (barrier_args_size);
@@ -2973,6 +2977,7 @@ dwarf2out_frame_debug_init (void)
 	    case NOTE_INSN_CFA_RESTORE_STATE:
 	      cfi_insn = insn;
 	      dwarf2out_frame_debug_restore_state ();
+	      cfi_insn = NULL;
 	      break;
 	    }
 	  continue;
@@ -2999,12 +3004,15 @@ dwarf2out_frame_debug_init (void)
 	  )
 	dwarf2out_frame_debug (insn, true);
     }
+
+  add_cfis_to_fde ();
 }
 
 void
 dwarf2out_emit_cfi (dw_cfi_ref cfi)
 {
-  output_cfi_directive (cfi);
+  if (dwarf2out_do_cfi_asm ())
+    output_cfi_directive (cfi);
 }
 
 /* Determine if we need to save and restore CFI information around
@@ -3085,23 +3093,24 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
   /* And emulate the state save.  */
   gcc_assert (!cfa_remember.in_use);
   cfa_remember = cfa;
+  old_cfa_remember = old_cfa;
   cfa_remember.in_use = 1;
 }
 
 /* A "subroutine" of dwarf2out_cfi_begin_epilogue.  Emit the restore
    required.  */
 
-void
+static void
 dwarf2out_frame_debug_restore_state (void)
 {
   dw_cfi_ref cfi = new_cfi ();
-  const char *label = dwarf2out_cfi_label (false);
 
   cfi->dw_cfi_opc = DW_CFA_restore_state;
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 
   gcc_assert (cfa_remember.in_use);
   cfa = cfa_remember;
+  old_cfa = old_cfa_remember;
   cfa_remember.in_use = 0;
 }
 
@@ -4296,7 +4305,8 @@ dwarf2out_frame_init (void)
      sake of lookup_cfa.  */
 
   /* On entry, the Canonical Frame Address is at SP.  */
-  dwarf2out_def_cfa (NULL, STACK_POINTER_REGNUM, INCOMING_FRAME_SP_OFFSET);
+  old_cfa.reg = INVALID_REGNUM;
+  dwarf2out_def_cfa (true, STACK_POINTER_REGNUM, INCOMING_FRAME_SP_OFFSET);
 
   if (targetm.debug_unwind_info () == UI_DWARF2
       || targetm.except_unwind_info (&global_options) == UI_DWARF2)
@@ -4353,10 +4363,6 @@ dwarf2out_switch_text_section (void)
     }
   have_multiple_function_sections = true;
 
-  /* Reset the current label on switching text sections, so that we
-     don't attempt to advance_loc4 between labels in different sections.  */
-  fde->dw_fde_current_label = NULL;
-
   /* There is no need to mark used sections when not debugging.  */
   if (cold_text_section != NULL)
     dwarf2out_note_section_used ();
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -19,9 +19,41 @@ along with GCC; see the file COPYING3.  
 <http://www.gnu.org/licenses/>.  */
 
 struct dw_cfi_struct;
+/* In dwarf2out.c */
+/* Interface of the DWARF2 unwind info support.  */
+
+/* Generate a new label for the CFI info to refer to.  */
+
+extern void dwarf2out_maybe_emit_cfi_label (void);
+
+/* Entry point to update the canonical frame address (CFA).  */
+
+extern void dwarf2out_def_cfa (bool, unsigned, HOST_WIDE_INT);
+
+/* Add the CFI for saving a register window.  */
+
+extern void dwarf2out_window_save (void);
+
+/* Entry point for saving a register to the stack.  */
+
+extern void dwarf2out_reg_save (unsigned, HOST_WIDE_INT);
+
+/* Entry point for saving the return address in the stack.  */
+
+extern void dwarf2out_return_save (HOST_WIDE_INT);
+
+/* Entry point for saving the return address in a register.  */
+
+extern void dwarf2out_return_reg (unsigned);
+
+/* Entry point for saving the first register into the second.  */
+
+extern void dwarf2out_reg_save_reg (rtx, rtx);
+
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
 extern void dwarf2out_frame_debug_init (void);
+extern void dwarf2out_frame_debug_after_prologue (void);
 extern void dwarf2out_emit_cfi (struct dw_cfi_struct *);
 extern void dwarf2out_flush_queued_reg_saves (void);
 
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -1588,6 +1588,11 @@ final_start_function (rtx first ATTRIBUT
   /* First output the function prologue: code to set up the stack frame.  */
   targetm.asm_out.function_prologue (file, get_frame_size ());
 
+#if defined (HAVE_prologue)
+  if (dwarf2out_do_frame ())
+    dwarf2out_frame_debug_after_prologue ();
+#endif
+
   /* If the machine represents the prologue as RTL, the profiling code must
      be emitted when NOTE_INSN_PROLOGUE_END is scanned.  */
 #ifdef HAVE_prologue
Index: gcc/target.def
===================================================================
--- gcc.orig/target.def
+++ gcc/target.def
@@ -1792,7 +1792,7 @@ DEFHOOK
 DEFHOOK
 (dwarf_handle_frame_unspec,
  "",
- void, (const char *label, rtx pattern, int index), NULL)
+ void, (rtx pattern, int index), NULL)
 
 /* ??? Documenting this hook requires a GFDL license grant.  */
 DEFHOOK_UNDOC
Index: gcc/tree.h
===================================================================
--- gcc.orig/tree.h
+++ gcc/tree.h
@@ -5424,37 +5424,6 @@ extern tree tree_overlaps_hard_reg_set (
 #endif
 
 \f
-/* In dwarf2out.c */
-/* Interface of the DWARF2 unwind info support.  */
-
-/* Generate a new label for the CFI info to refer to.  */
-
-extern char *dwarf2out_cfi_label (bool);
-
-/* Entry point to update the canonical frame address (CFA).  */
-
-extern void dwarf2out_def_cfa (const char *, unsigned, HOST_WIDE_INT);
-
-/* Add the CFI for saving a register window.  */
-
-extern void dwarf2out_window_save (const char *);
-
-/* Entry point for saving a register to the stack.  */
-
-extern void dwarf2out_reg_save (const char *, unsigned, HOST_WIDE_INT);
-
-/* Entry point for saving the return address in the stack.  */
-
-extern void dwarf2out_return_save (const char *, HOST_WIDE_INT);
-
-/* Entry point for saving the return address in a register.  */
-
-extern void dwarf2out_return_reg (const char *, unsigned);
-
-/* Entry point for saving the first register into the second.  */
-
-extern void dwarf2out_reg_save_reg (const char *, rtx, rtx);
-
 /* In tree-inline.c  */
 
 /* The type of a set of already-visited pointers.  Functions for creating
Index: gcc/doc/tm.texi
===================================================================
--- gcc.orig/doc/tm.texi
+++ gcc/doc/tm.texi
@@ -3203,7 +3203,7 @@ someone decided it was a good idea to us
 terminate the stack backtrace.  New ports should avoid this.
 @end defmac
 
-@deftypefn {Target Hook} void TARGET_DWARF_HANDLE_FRAME_UNSPEC (const char *@var{label}, rtx @var{pattern}, int @var{index})
+@deftypefn {Target Hook} void TARGET_DWARF_HANDLE_FRAME_UNSPEC (rtx @var{pattern}, int @var{index})
 This target hook allows the backend to emit frame-related insns that
 contain UNSPECs or UNSPEC_VOLATILEs.  The DWARF 2 call frame debugging
 info engine will invoke it on insns of the form


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-13 14:16                     ` Bernd Schmidt
  2011-04-13 15:14                       ` Bernd Schmidt
  2011-04-13 15:16                       ` Bernd Schmidt
@ 2011-04-13 15:17                       ` Bernd Schmidt
  2 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-13 15:17 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

The final part is an updated version of the old 004-dw2cfg patch. It
does much better placement of remember/restore; in almost all cases the
code is identical to what we currently generate, modulo minor
differences around the PROLOGUE_END label. I've made it emit queued
register saves before PROLOGUE_END so that we can use the state there
for forced labels.


Bernd

[-- Attachment #2: 007-dw2cfg.diff --]
[-- Type: text/plain, Size: 42958 bytes --]

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -465,12 +465,11 @@ static void initial_return_save (rtx);
 static HOST_WIDE_INT stack_adjust_offset (const_rtx, HOST_WIDE_INT,
 					  HOST_WIDE_INT);
 static void output_cfi (dw_cfi_ref, dw_fde_ref, int);
-static void output_cfi_directive (dw_cfi_ref);
+static void output_cfi_directive (FILE *, dw_cfi_ref);
 static void output_call_frame_info (int);
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx);
-static void dwarf2out_cfi_begin_epilogue (rtx);
 static void dwarf2out_frame_debug_restore_state (void);
 
 /* Support for complex CFA locations.  */
@@ -823,9 +822,6 @@ new_cfi (void)
 /* The insn after which a new CFI note should be emitted.  */
 static rtx cfi_insn;
 
-/* True if remember_state should be emitted before following CFI directive.  */
-static bool emit_cfa_remember;
-
 /* True if any CFI directives were emitted at the current insn.  */
 static bool any_cfis_emitted;
 
@@ -868,28 +864,34 @@ dwarf2out_maybe_emit_cfi_label (void)
     }
 }
 
+static void
+add_cfa_remember (void)
+{
+  dw_cfi_ref cfi_remember;
+
+  /* Emit the state save.  */
+  cfi_remember = new_cfi ();
+  cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
+  add_fde_cfi (cfi_remember);
+}
+
+/* Nonnull if add_fde_cfi should not just emit a NOTE_INSN_CFI, but
+   also add the CFI to this vector.  */
+static cfi_vec *cfi_insn_vec;
+
 /* Add CFI to the current fde at the PC value indicated by LABEL if specified,
    or to the CIE if LABEL is NULL.  */
 
 static void
 add_fde_cfi (dw_cfi_ref cfi)
 {
-  if (emit_cfa_remember)
-    {
-      dw_cfi_ref cfi_remember;
-
-      /* Emit the state save.  */
-      emit_cfa_remember = false;
-      cfi_remember = new_cfi ();
-      cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
-      add_fde_cfi (cfi_remember);
-    }
-
   any_cfis_emitted = true;
   if (cfi_insn != NULL)
     {
       cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
       NOTE_CFI (cfi_insn) = cfi;
+      if (cfi_insn_vec != NULL)
+	VEC_safe_push (dw_cfi_ref, gc, *cfi_insn_vec, cfi);
     }
   else
     {
@@ -980,12 +982,6 @@ static dw_cfa_location old_cfa;
    from the CFA.  */
 static dw_cfa_location cfa_store;
 
-/* The current save location around an epilogue.  */
-static dw_cfa_location cfa_remember;
-
-/* Like cfa_remember, but a copy of old_cfa.  */
-static dw_cfa_location old_cfa_remember;
-
 /* The running total of the size of arguments pushed onto the stack.  */
 static HOST_WIDE_INT args_size;
 
@@ -1339,179 +1335,6 @@ stack_adjust_offset (const_rtx pattern, 
   return offset;
 }
 
-/* Precomputed args_size for CODE_LABELs and BARRIERs preceeding them,
-   indexed by INSN_UID.  */
-
-static HOST_WIDE_INT *barrier_args_size;
-
-/* Helper function for compute_barrier_args_size.  Handle one insn.  */
-
-static HOST_WIDE_INT
-compute_barrier_args_size_1 (rtx insn, HOST_WIDE_INT cur_args_size,
-			     VEC (rtx, heap) **next)
-{
-  HOST_WIDE_INT offset = 0;
-  int i;
-
-  if (! RTX_FRAME_RELATED_P (insn))
-    {
-      if (prologue_epilogue_contains (insn))
-	/* Nothing */;
-      else if (GET_CODE (PATTERN (insn)) == SET)
-	offset = stack_adjust_offset (PATTERN (insn), cur_args_size, 0);
-      else if (GET_CODE (PATTERN (insn)) == PARALLEL
-	       || GET_CODE (PATTERN (insn)) == SEQUENCE)
-	{
-	  /* There may be stack adjustments inside compound insns.  Search
-	     for them.  */
-	  for (i = XVECLEN (PATTERN (insn), 0) - 1; i >= 0; i--)
-	    if (GET_CODE (XVECEXP (PATTERN (insn), 0, i)) == SET)
-	      offset += stack_adjust_offset (XVECEXP (PATTERN (insn), 0, i),
-					     cur_args_size, offset);
-	}
-    }
-  else
-    {
-      rtx expr = find_reg_note (insn, REG_FRAME_RELATED_EXPR, NULL_RTX);
-
-      if (expr)
-	{
-	  expr = XEXP (expr, 0);
-	  if (GET_CODE (expr) == PARALLEL
-	      || GET_CODE (expr) == SEQUENCE)
-	    for (i = 1; i < XVECLEN (expr, 0); i++)
-	      {
-		rtx elem = XVECEXP (expr, 0, i);
-
-		if (GET_CODE (elem) == SET && !RTX_FRAME_RELATED_P (elem))
-		  offset += stack_adjust_offset (elem, cur_args_size, offset);
-	      }
-	}
-    }
-
-#ifndef STACK_GROWS_DOWNWARD
-  offset = -offset;
-#endif
-
-  cur_args_size += offset;
-  if (cur_args_size < 0)
-    cur_args_size = 0;
-
-  if (JUMP_P (insn))
-    {
-      rtx dest = JUMP_LABEL (insn);
-
-      if (dest)
-	{
-	  if (barrier_args_size [INSN_UID (dest)] < 0)
-	    {
-	      barrier_args_size [INSN_UID (dest)] = cur_args_size;
-	      VEC_safe_push (rtx, heap, *next, dest);
-	    }
-	}
-    }
-
-  return cur_args_size;
-}
-
-/* Walk the whole function and compute args_size on BARRIERs.  */
-
-static void
-compute_barrier_args_size (void)
-{
-  int max_uid = get_max_uid (), i;
-  rtx insn;
-  VEC (rtx, heap) *worklist, *next, *tmp;
-
-  barrier_args_size = XNEWVEC (HOST_WIDE_INT, max_uid);
-  for (i = 0; i < max_uid; i++)
-    barrier_args_size[i] = -1;
-
-  worklist = VEC_alloc (rtx, heap, 20);
-  next = VEC_alloc (rtx, heap, 20);
-  insn = get_insns ();
-  barrier_args_size[INSN_UID (insn)] = 0;
-  VEC_quick_push (rtx, worklist, insn);
-  for (;;)
-    {
-      while (!VEC_empty (rtx, worklist))
-	{
-	  rtx prev, body, first_insn;
-	  HOST_WIDE_INT cur_args_size;
-
-	  first_insn = insn = VEC_pop (rtx, worklist);
-	  cur_args_size = barrier_args_size[INSN_UID (insn)];
-	  prev = prev_nonnote_insn (insn);
-	  if (prev && BARRIER_P (prev))
-	    barrier_args_size[INSN_UID (prev)] = cur_args_size;
-
-	  for (; insn; insn = NEXT_INSN (insn))
-	    {
-	      if (INSN_DELETED_P (insn) || NOTE_P (insn))
-		continue;
-	      if (BARRIER_P (insn))
-		break;
-
-	      if (LABEL_P (insn))
-		{
-		  if (insn == first_insn)
-		    continue;
-		  else if (barrier_args_size[INSN_UID (insn)] < 0)
-		    {
-		      barrier_args_size[INSN_UID (insn)] = cur_args_size;
-		      continue;
-		    }
-		  else
-		    {
-		      /* The insns starting with this label have been
-			 already scanned or are in the worklist.  */
-		      break;
-		    }
-		}
-
-	      body = PATTERN (insn);
-	      if (GET_CODE (body) == SEQUENCE)
-		{
-		  HOST_WIDE_INT dest_args_size = cur_args_size;
-		  for (i = 1; i < XVECLEN (body, 0); i++)
-		    if (INSN_ANNULLED_BRANCH_P (XVECEXP (body, 0, 0))
-			&& INSN_FROM_TARGET_P (XVECEXP (body, 0, i)))
-		      dest_args_size
-			= compute_barrier_args_size_1 (XVECEXP (body, 0, i),
-						       dest_args_size, &next);
-		    else
-		      cur_args_size
-			= compute_barrier_args_size_1 (XVECEXP (body, 0, i),
-						       cur_args_size, &next);
-
-		  if (INSN_ANNULLED_BRANCH_P (XVECEXP (body, 0, 0)))
-		    compute_barrier_args_size_1 (XVECEXP (body, 0, 0),
-						 dest_args_size, &next);
-		  else
-		    cur_args_size
-		      = compute_barrier_args_size_1 (XVECEXP (body, 0, 0),
-						     cur_args_size, &next);
-		}
-	      else
-		cur_args_size
-		  = compute_barrier_args_size_1 (insn, cur_args_size, &next);
-	    }
-	}
-
-      if (VEC_empty (rtx, next))
-	break;
-
-      /* Swap WORKLIST with NEXT and truncate NEXT for next iteration.  */
-      tmp = next;
-      next = worklist;
-      worklist = tmp;
-      VEC_truncate (rtx, next, 0);
-    }
-
-  VEC_free (rtx, heap, worklist);
-  VEC_free (rtx, heap, next);
-}
-
 /* Add a CFI to update the running total of the size of arguments
    pushed onto the stack.  */
 
@@ -1608,25 +1431,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
       return;
     }
   else if (BARRIER_P (insn))
-    {
-      /* Don't call compute_barrier_args_size () if the only
-	 BARRIER is at the end of function.  */
-      if (barrier_args_size == NULL && next_nonnote_insn (insn))
-	compute_barrier_args_size ();
-      if (barrier_args_size == NULL)
-	offset = 0;
-      else
-	{
-	  offset = barrier_args_size[INSN_UID (insn)];
-	  if (offset < 0)
-	    offset = 0;
-	}
-
-      offset -= args_size;
-#ifndef STACK_GROWS_DOWNWARD
-      offset = -offset;
-#endif
-    }
+    return;
   else if (GET_CODE (PATTERN (insn)) == SET)
     offset = stack_adjust_offset (PATTERN (insn), args_size, 0);
   else if (GET_CODE (PATTERN (insn)) == PARALLEL
@@ -2054,9 +1859,12 @@ add_cfis_to_fde (void)
       next = NEXT_INSN (insn);
 
       if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
-	/* Don't attempt to advance_loc4 between labels in different
-	   sections.  */
-	first = true;
+	{
+	  fde->dw_fde_switch_cfi_index = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
+	  /* Don't attempt to advance_loc4 between labels in different
+	     sections.  */
+	  first = true;
+	}
 
       if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_CFI)
 	{
@@ -2097,6 +1905,17 @@ add_cfis_to_fde (void)
     }
 }
 
+/* A subroutine of dwarf2out_frame_debug_after_prologue; emit a
+   DW_CFA_restore_state.  */
+
+void
+dwarf2out_frame_debug_restore_state (void)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi->dw_cfi_opc = DW_CFA_restore_state;
+  add_fde_cfi (cfi);
+}
+
 /* Record call frame debugging information for an expression EXPR,
    which either sets SP or FP (adjusting how we calculate the frame
    address) or saves a register to the stack or another register.
@@ -2797,9 +2616,6 @@ dwarf2out_frame_debug (rtx insn, bool af
   else
     cfi_insn = PREV_INSN (insn);
 
-  if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
-    dwarf2out_flush_queued_reg_saves ();
-
   if (!RTX_FRAME_RELATED_P (insn))
     {
       /* ??? This should be done unconditionally since stack adjustments
@@ -2945,56 +2761,224 @@ dwarf2out_frame_debug_init (void)
   num_regs_saved_in_regs = 0;
 }
 
-/* After the (optional) text prologue has been written, emit CFI insns
-   and update the FDE for frame-related instructions.  */
+/* Copy a CFI vector, except for args_size opcodes.  */
+static cfi_vec
+copy_cfi_vec_parts (cfi_vec in_vec)
+{
+  int length = VEC_length (dw_cfi_ref, in_vec);
+  /* Ensure we always have a pointer to a vector, not just NULL.  */
+  cfi_vec new_vec = VEC_alloc (dw_cfi_ref, gc, length > 0 ? length : 1);
+  int i;
+  for (i = 0; i < length; i++)
+    {
+      dw_cfi_ref elt = VEC_index (dw_cfi_ref, in_vec, i);
+      if (elt->dw_cfi_opc == DW_CFA_GNU_args_size)
+	continue;
 
-void
-dwarf2out_frame_debug_after_prologue (void)
+      VEC_quick_push (dw_cfi_ref, new_vec, elt);
+    }
+  return new_vec;
+}
+
+/* Record the state of the CFI program at a point in the program.  */
+typedef struct
 {
-  rtx insn;
-  if (barrier_args_size)
+  /* The CFI instructions up to this point.  */
+  cfi_vec cfis;
+  /* Copies of the global variables with the same name.  */
+  dw_cfa_location cfa, cfa_store, old_cfa;
+  /* True if we have seen this point during a scan in scan_until_barrier.  */
+  bool visited;
+  /* True if this point was used as a starting point for such a scan.  */
+  bool used_as_start;
+  /* Other than CFI instructions and CFA state, the only thing necessary to
+     be tracked is the argument size.  */
+  int args_size;
+  /* Nonzero for states that must be remembered and restored.  If higher
+     than one, the first restores will be immediately followed by another
+     remember.  */
+  int n_restores;
+} jump_target_info;
+
+/* Return true if we'll want to save or restore CFI state at INSN.  This is
+   true for labels and barriers, and certain notes.  */
+static bool
+save_point_p (rtx insn)
+{
+  return (BARRIER_P (insn) || LABEL_P (insn)
+	  || (NOTE_P (insn)
+	      && (NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG
+		  || NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END
+		  || NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)));
+}
+
+/* Save the current state in INFO.  */
+
+static void
+record_current_state (jump_target_info *info)
+{
+  info->cfis = copy_cfi_vec_parts (*cfi_insn_vec);
+  info->args_size = args_size;
+  info->cfa = cfa;
+  info->old_cfa = old_cfa;
+  info->cfa_store = cfa_store;
+}
+
+/* LABEL is the target of a jump we encountered while scanning the
+   function.  Record it in START_POINTS as a potential new starting point
+   for the scan, unless we've visited it before.  UID_LUID gives a
+   mapping for uids used to index INFO, which holds the CFI
+   information for labels and barriers.  */
+static void
+maybe_record_jump_target (rtx label, VEC (rtx, heap) **start_points,
+			  int *uid_luid, jump_target_info *info)
+{
+  int uid;
+
+  if (GET_CODE (label) == LABEL_REF)
+    label = XEXP (label, 0);
+  gcc_assert (LABEL_P (label));
+  uid = INSN_UID (label);
+  info += uid_luid[uid];
+  if (info->visited || info->cfis)
+    return;
+
+  if (dump_file)
+    fprintf (dump_file, "recording label %d as possible jump target\n", uid);
+
+  VEC_safe_push (rtx, heap, *start_points, label);
+  record_current_state (info);
+}
+
+/* Return true if VEC1 and VEC2 are identical up to the length of VEC1.  */
+static bool
+vec_is_prefix_of (cfi_vec vec1, cfi_vec vec2)
+{
+  int i;
+  int len1 = VEC_length (dw_cfi_ref, vec1);
+  int len2 = VEC_length (dw_cfi_ref, vec2);
+  if (len1 > len2)
+    return false;
+  for (i = 0; i < len1; i++)
+    if (VEC_index (dw_cfi_ref, vec1, i) != VEC_index (dw_cfi_ref, vec2, i))
+      return false;
+  return true;
+}
+
+/* Append entries to FDE's cfi vector.  PREFIX and FULL are two
+   existing vectors, where PREFIX is contained in FULL as a prefix.  */
+
+static void
+append_extra_cfis (cfi_vec prefix, cfi_vec full)
+{
+  int i;
+  int len = VEC_length (dw_cfi_ref, full);
+  int prefix_len = VEC_length (dw_cfi_ref, prefix);
+
+  gcc_assert (prefix_len <= len);
+  for (i = 0; i < prefix_len; i++)
     {
-      XDELETEVEC (barrier_args_size);
-      barrier_args_size = NULL;
+      dw_cfi_ref elt, elt2;
+
+      elt = VEC_index (dw_cfi_ref, full, i);
+      elt2 = VEC_index (dw_cfi_ref, prefix, i);
+      gcc_assert (elt == elt2);
     }
-  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+  for (; i < len; i++)
     {
-      rtx pat;
-      if (BARRIER_P (insn))
-	{
-	  dwarf2out_frame_debug (insn, false);
-	  continue;
-	}
-      else if (NOTE_P (insn))
+      dw_cfi_ref elt = VEC_index (dw_cfi_ref, full, i);
+      add_fde_cfi (elt);
+    }
+}
+
+extern void debug_cfi_vec (FILE *, cfi_vec v);
+void debug_cfi_vec (FILE *f, cfi_vec v)
+{
+  int ix;
+  dw_cfi_ref cfi;
+
+  FOR_EACH_VEC_ELT (dw_cfi_ref, v, ix, cfi)
+    output_cfi_directive (f, cfi);
+}
+
+static bool
+switch_note_p (rtx insn)
+{
+  return NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS;
+}
+
+/* From the current starting point in INSN, scan forwards until we hit a
+   barrier, the end of the function, or a label we've previously used as
+   a starting point.
+   UID_LUID is a mapping to linear uids used to map an insn to an entry in
+   POINT_INFO, if save_point_p is true for a given insn.  */
+
+static void
+scan_until_barrier (rtx insn, jump_target_info *point_info, int *uid_luid,
+		    VEC (rtx, heap) **start_points)
+{
+  rtx next;
+  for (; insn != NULL_RTX; insn = next)
+    {
+      int uid = INSN_UID (insn);
+      rtx pat, note;
+
+      next = NEXT_INSN (insn);
+      if (save_point_p (insn))
 	{
-	  switch (NOTE_KIND (insn))
-	    {
-	    case NOTE_INSN_EPILOGUE_BEG:
-#if defined (HAVE_epilogue)
-	      dwarf2out_cfi_begin_epilogue (insn);
-#endif
+	  int luid = uid_luid[uid];
+	  jump_target_info *info = point_info + luid;
+	  if (info->used_as_start)
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Stopping scan at insn %d; previously reached\n",
+			 uid);
 	      break;
-	    case NOTE_INSN_CFA_RESTORE_STATE:
-	      cfi_insn = insn;
-	      dwarf2out_frame_debug_restore_state ();
-	      cfi_insn = NULL;
+	    }
+	  info->visited = true;
+	  if (BARRIER_P (insn))
+	    gcc_assert (info->cfis == NULL);
+	  if (switch_note_p (insn))
+	    {
+	      /* Don't record the state, it was set to a clean slate in
+		 the caller.  */
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Stopping scan at text section switch %d\n", uid);
+	      break;
+	    }
+	  record_current_state (info);
+	  if (BARRIER_P (insn))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Stopping scan at barrier %d\n", uid);
 	      break;
 	    }
-	  continue;
 	}
+
       if (!NONDEBUG_INSN_P (insn))
 	continue;
       pat = PATTERN (insn);
       if (asm_noperands (pat) >= 0)
 	continue;
+
       if (GET_CODE (pat) == SEQUENCE)
 	{
-	  int j;
-	  for (j = 1; j < XVECLEN (pat, 0); j++)
-	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
+	  int i;
+	  for (i = 1; i < XVECLEN (pat, 0); i++)
+	    dwarf2out_frame_debug (XVECEXP (pat, 0, i), false);
 	  insn = XVECEXP (pat, 0, 0);
 	}
 
+      if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn)
+	  || (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END))
+	{
+	  cfi_insn = PREV_INSN (insn);
+	  dwarf2out_flush_queued_reg_saves ();
+	  cfi_insn = NULL_RTX;
+	}
+
       if (CALL_P (insn) && dwarf2out_do_frame ())
 	dwarf2out_frame_debug (insn, false);
       if (dwarf2out_do_frame ()
@@ -3003,115 +2987,463 @@ dwarf2out_frame_debug_after_prologue (vo
 #endif
 	  )
 	dwarf2out_frame_debug (insn, true);
-    }
 
-  add_cfis_to_fde ();
+      if (JUMP_P (insn))
+	{
+	  rtx label = JUMP_LABEL (insn);
+	  if (label)
+	    {
+	      rtx next = next_real_insn (label);
+	      if (next != NULL_RTX && addr_vec_p (next))
+		{
+		  int i;
+		  rtx pat = PATTERN (next);
+		  int eltnum = GET_CODE (pat) == ADDR_DIFF_VEC ? 1 : 0;
+
+		  for (i = 0; i < XVECLEN (pat, eltnum); i++)
+		    maybe_record_jump_target (XVECEXP (pat, eltnum, i),
+					      start_points, uid_luid,
+					      point_info);
+		}
+	      else
+		maybe_record_jump_target (label, start_points, uid_luid,
+					  point_info);
+	    }
+	}
+      note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
+      if (note)
+	{
+	  eh_landing_pad lp;
+
+	  lp = get_eh_landing_pad_from_rtx (insn);
+	  if (lp)
+	    maybe_record_jump_target (lp->landing_pad, start_points,
+				      uid_luid, point_info);
+	}
+    }
 }
 
-void
-dwarf2out_emit_cfi (dw_cfi_ref cfi)
+/* A subroutine of dwarf2out_frame_debug_after_prologue.  Given the vector
+   of potential starting points in *START_POINTS, pick the best one to
+   use for the next scan.  Return NULL_RTX if there's nothing left to
+   scan.
+   UID_LUID and START_POINTS are as in scan_until_barrier.  */
+
+static rtx
+find_best_starting_point (jump_target_info *point_info, int *uid_luid,
+			  VEC (rtx, heap) **start_points)
 {
-  if (dwarf2out_do_cfi_asm ())
-    output_cfi_directive (cfi);
+  int i;
+  rtx insn;
+  int best_idx;
+  bool best_has_barrier;
+  jump_target_info *restart_info;
+
+  FOR_EACH_VEC_ELT_REVERSE (rtx, *start_points, i, insn)
+    {
+      restart_info = point_info + uid_luid[INSN_UID (insn)];
+      if (restart_info->visited)
+	VEC_ordered_remove (rtx, *start_points, i);
+    }
+
+  best_idx = -1;
+  best_has_barrier = false;
+  FOR_EACH_VEC_ELT (rtx, *start_points, i, insn)
+    {
+      rtx prev;
+      bool this_has_barrier;
+
+      restart_info = point_info + uid_luid[INSN_UID (insn)];
+      prev = prev_nonnote_nondebug_insn (insn);
+      this_has_barrier = (prev
+			  && (BARRIER_P (prev) || switch_note_p (prev)));
+      if (best_idx < 0
+	  || (!best_has_barrier && this_has_barrier))
+	{
+	  best_idx = i;
+	  best_has_barrier = this_has_barrier;
+	}
+    }
+
+  if (best_idx < 0)
+    {
+      rtx link;
+      for (link = forced_labels; link; link = XEXP (link, 1))
+	{
+	  insn = XEXP (link, 0);
+	  restart_info = point_info + uid_luid[INSN_UID (insn)];
+	  if (!restart_info->visited)
+	    return insn;
+	}
+      return NULL_RTX;
+    }
+  insn = VEC_index (rtx, *start_points, best_idx);
+  VEC_ordered_remove (rtx, *start_points, best_idx);
+  return insn;
 }
 
-/* Determine if we need to save and restore CFI information around
-   this epilogue.  If we do need to save/restore, then emit the save
-   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
-   place in the stream.  */
+/* After the (optional) text prologue has been written, emit CFI insns
+   and update the FDE for frame-related instructions.  */
 
 void
-dwarf2out_cfi_begin_epilogue (rtx insn)
+dwarf2out_frame_debug_after_prologue (void)
 {
-  bool saw_frp = false;
-  rtx i;
+  int max_uid = get_max_uid ();
+  int i, n_saves_restores, prologue_end_point, switch_note_point;
+  rtx insn, save_point;
+  VEC (rtx, heap) *start_points;
+  int n_points;
+  int *uid_luid;
+  bool remember_needed;
+  jump_target_info *point_info, *save_point_info;
+  cfi_vec current_vec;
+
+  n_points = 0;
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (save_point_p (insn))
+      n_points++;
+  uid_luid = XCNEWVEC (int, max_uid);
+  n_points = 0;
+  prologue_end_point = -1;
+  switch_note_point = -1;
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (save_point_p (insn))
+      {
+	if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END)
+	  prologue_end_point = n_points;
+	else if (switch_note_p (insn))
+	  switch_note_point = n_points;
+	uid_luid[INSN_UID (insn)] = n_points++;
+      }
+
+  point_info = XCNEWVEC (jump_target_info, n_points);
+  for (i = 0; i < n_points; i++)
+    point_info[i].args_size = -1;
+
+  start_points = VEC_alloc (rtx, heap, 20);
+  insn = get_insns ();
+  current_vec = VEC_alloc (dw_cfi_ref, gc, 10);
+
+  /* At a NOTE_INSN_SWITCH_TEXT_SECTIONS we'll emit a cfi_startproc.
+     Ensure the state at this note reflects that.  */
+  if (switch_note_point != -1)
+    {
+      cfi_insn_vec = &current_vec;
+      record_current_state (point_info + switch_note_point);
+      cfi_insn_vec = NULL;
+    }
+  args_size = old_args_size = 0;
 
-  /* Scan forward to the return insn, noticing if there are possible
-     frame related insns.  */
-  for (i = NEXT_INSN (insn); i ; i = NEXT_INSN (i))
+  for (;;)
     {
-      if (!INSN_P (i))
-	continue;
+      HOST_WIDE_INT offset;
+      jump_target_info *restart_info;
 
-      /* Look for both regular and sibcalls to end the block.  Various
-	 optimization passes may cause us to jump to a common epilogue
-	 tail, so we also accept simplejumps.  */
-      if (returnjump_p (i) || simplejump_p (i))
-	break;
-      if (CALL_P (i) && SIBLING_CALL_P (i))
+      /* Scan the insns and emit NOTE_CFIs where necessary.  */
+      cfi_insn_vec = &current_vec;
+      scan_until_barrier (insn, point_info, uid_luid, &start_points);
+      cfi_insn_vec = NULL;
+
+      insn = find_best_starting_point (point_info, uid_luid, &start_points);
+
+      if (insn == NULL_RTX)
 	break;
 
-      if (GET_CODE (PATTERN (i)) == SEQUENCE)
-	{
-	  int idx;
-	  rtx seq = PATTERN (i);
+      if (dump_file)
+	fprintf (dump_file, "restarting scan at label %d", INSN_UID (insn));
 
-	  if (returnjump_p (XVECEXP (seq, 0, 0)))
-	    break;
-	  if (CALL_P (XVECEXP (seq, 0, 0))
-	      && SIBLING_CALL_P (XVECEXP (seq, 0, 0)))
-	    break;
+      restart_info = point_info + uid_luid[INSN_UID (insn)];
+      restart_info->visited = true;
+      restart_info->used_as_start = true;
+      /* If find_best_starting_point returned a forced label, use the
+	 state at the NOTE_INSN_PROLOGUE_END note.  */
+      if (restart_info->cfis == NULL)
+	{
+	  cfi_vec *v = &restart_info->cfis;
+	  gcc_assert (prologue_end_point != -1);
+	  restart_info = point_info + prologue_end_point;
+	  *v = copy_cfi_vec_parts (restart_info->cfis);
+	}
+
+      gcc_assert (LABEL_P (insn));
+      current_vec = copy_cfi_vec_parts (restart_info->cfis);
+      cfa = restart_info->cfa;
+      old_cfa = restart_info->old_cfa;
+      cfa_store = restart_info->cfa_store;
+      offset = restart_info->args_size;
+      if (offset >= 0)
+	{
+	  if (dump_file && offset != args_size)
+	    fprintf (dump_file, ", args_size " HOST_WIDE_INT_PRINT_DEC
+		     "  -> " HOST_WIDE_INT_PRINT_DEC,
+		     args_size, offset);
 
-	  for (idx = 0; idx < XVECLEN (seq, 0); idx++)
-	    if (RTX_FRAME_RELATED_P (XVECEXP (seq, 0, idx)))
-	      saw_frp = true;
+	  offset -= args_size;
+#ifndef STACK_GROWS_DOWNWARD
+	  offset = -offset;
+#endif
+	  if (offset != 0)
+	    {
+	      cfi_insn = prev_nonnote_nondebug_insn (insn);
+	      dwarf2out_stack_adjust (offset);
+	      cfi_insn = NULL_RTX;
+	    }
+	}
+      if (dump_file)
+	{
+	  fprintf (dump_file, "\n");
+	  if (dump_flags & TDF_DETAILS)
+	    debug_cfi_vec (dump_file, current_vec);
 	}
 
-      if (RTX_FRAME_RELATED_P (i))
-	saw_frp = true;
+      insn = NEXT_INSN (insn);
     }
 
-  /* If the port doesn't emit epilogue unwind info, we don't need a
-     save/restore pair.  */
-  if (!saw_frp)
-    return;
+  VEC_free (rtx, heap, start_points);
 
-  /* Otherwise, search forward to see if the return insn was the last
-     basic block of the function.  If so, we don't need save/restore.  */
-  gcc_assert (i != NULL);
-  i = next_real_insn (i);
-  if (i == NULL)
-    return;
+  /* Now splice the various CFI fragments together into a coherent whole.  */
 
-  /* Insert the restore before that next real insn in the stream, and before
-     a potential NOTE_INSN_EPILOGUE_BEG -- we do need these notes to be
-     properly nested.  This should be after any label or alignment.  This
-     will be pushed into the CFI stream by the function below.  */
-  while (1)
+  /* First, discover discontinuities, and where necessary search for suitable
+     remember/restore points.  */
+  save_point = NULL_RTX;
+  save_point_info = NULL;
+  n_saves_restores = 0;
+  for (insn = get_last_insn (); insn; insn = PREV_INSN (insn))
     {
-      rtx p = PREV_INSN (i);
-      if (!NOTE_P (p))
-	break;
-      if (NOTE_KIND (p) == NOTE_INSN_BASIC_BLOCK)
-	break;
-      i = p;
+      jump_target_info *info, *barrier_info, *candidate_info;
+      rtx prev;
+
+      if (insn == save_point)
+	{
+	  save_point = NULL_RTX;
+	  save_point_info = NULL;
+	  info = point_info + uid_luid[INSN_UID (insn)];
+	  info->n_restores = n_saves_restores;
+	  n_saves_restores = 0;
+	  if (dump_file)
+	    fprintf (dump_file, "finalize save point %d\n", INSN_UID (insn));
+	}
+
+      /* Look for labels that were used as starting points and are
+	 preceded by a BARRIER.  */
+      if (!LABEL_P (insn))
+	continue;
+
+      info = point_info + uid_luid[INSN_UID (insn)];
+      if (!info->used_as_start)
+	continue;
+      barrier_info = NULL;
+      for (prev = PREV_INSN (insn); prev; prev = PREV_INSN (prev))
+	{
+	  if (!BARRIER_P (prev) && !LABEL_P (prev))
+	    continue;
+	  barrier_info = point_info + uid_luid[INSN_UID (prev)];
+	  /* Skip through barriers we haven't visited; they may occur
+	     for things like jump tables.  */
+	  if ((BARRIER_P (prev) && barrier_info->visited)
+	      || (LABEL_P (prev) && barrier_info->used_as_start)
+	      || switch_note_p (prev))
+	    break;
+	}
+      if (!BARRIER_P (prev))
+	continue;
+
+      if (dump_file)
+	fprintf (dump_file, "State transition at barrier %d, label %d ... ",
+		 INSN_UID (prev), INSN_UID (insn));
+
+      /* If the state at the barrier can easily be transformed into the state
+	 at the label, we don't need save/restore points.  */
+      if (vec_is_prefix_of (barrier_info->cfis, info->cfis))
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "prefix\n");
+	  continue;
+	}
+
+      /* A save/restore is necessary.  Walk backwards to find the best
+	 save point.  First see if we know a save point already and if
+	 it's suitable.  */
+      n_saves_restores++;
+      if (save_point)
+	{
+	  prev = save_point;
+	  if (vec_is_prefix_of (save_point_info->cfis, info->cfis))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "reuse save point\n");
+	      continue;
+	    }
+	}
+
+      for (;;)
+	{
+	  prev = PREV_INSN (prev);
+	  /* We should eventually encounter the NOTE_INSN_FUNCTION_BEG,
+	     which must be a suitable save point for anything.  */
+	  gcc_assert (prev != NULL_RTX);
+
+	  if (!save_point_p (prev))
+	    continue;
+
+	  candidate_info = point_info + uid_luid[INSN_UID (prev)];
+	  /* We don't necessarily get to see this note during
+	     scanning. Record an empty CFI vector for it so that it is
+	     usable as a restore point.  */
+	  if (switch_note_p (prev))
+	    {
+	      if (candidate_info->cfis == NULL)
+		candidate_info->cfis = VEC_alloc (dw_cfi_ref, gc, 1);
+	    }
+
+	  if (candidate_info->cfis != NULL
+	      && vec_is_prefix_of (candidate_info->cfis, info->cfis)
+	      && (save_point == NULL
+		  || vec_is_prefix_of (candidate_info->cfis,
+				       save_point_info->cfis)))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "save point %d\n", INSN_UID (prev));
+	      save_point = prev;
+	      save_point_info = candidate_info;
+	      break;
+	    }
+	}
     }
-  emit_note_before (NOTE_INSN_CFA_RESTORE_STATE, i);
 
-  emit_cfa_remember = true;
+  save_point = NULL_RTX;
+  save_point_info = NULL;
+  remember_needed = false;
+
+  /* This value is now used to distinguish between NOTE_CFI added up
+     to now and those added by the next loop.  */
+  max_uid = get_max_uid ();
 
-  /* And emulate the state save.  */
-  gcc_assert (!cfa_remember.in_use);
-  cfa_remember = cfa;
-  old_cfa_remember = old_cfa;
-  cfa_remember.in_use = 1;
-}
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      jump_target_info *info;
 
-/* A "subroutine" of dwarf2out_cfi_begin_epilogue.  Emit the restore
-   required.  */
+      if (INSN_UID (insn) < max_uid
+	  && NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_CFI
+	  && remember_needed)
+	{
+	  cfi_insn = PREV_INSN (insn);
+	  add_cfa_remember ();
+	  cfi_insn = NULL_RTX;
+	  remember_needed = false;
+	}
 
-static void
-dwarf2out_frame_debug_restore_state (void)
-{
-  dw_cfi_ref cfi = new_cfi ();
+      if (!save_point_p (insn))
+	continue;
 
-  cfi->dw_cfi_opc = DW_CFA_restore_state;
-  add_fde_cfi (cfi);
+      cfi_insn = insn;
+      info = point_info + uid_luid[INSN_UID (insn)];
+
+      if (info->n_restores > 0)
+	{
+	  gcc_assert (save_point_info == NULL);
+	  save_point_info = info;
+	  remember_needed = true;
+	}
+      if (switch_note_p (insn))
+	{
+	  jump_target_info *label_info;
+	  rtx next = insn;
+
+	  cfi_insn = insn;
+	  if (remember_needed)
+	    add_cfa_remember ();
+	  remember_needed = false;
+
+	  /* Find the next label, and emit extra CFIs as necessary to
+	     achieve the correct state.  */
+	  do
+	    {
+	      if (LABEL_P (next))
+		{
+		  label_info = point_info + uid_luid[INSN_UID (next)];
+		  if (label_info->used_as_start)
+		    break;
+		}
+	      insn = next;
+	      next = NEXT_INSN (next);
+	    }
+	  while (next != NULL_RTX);
+	  if (next == NULL_RTX)
+	    break;
+	  append_extra_cfis (NULL, label_info->cfis);
+	  cfi_insn = NULL_RTX;
+	}
+      else if (BARRIER_P (insn))
+	{
+	  jump_target_info *label_info;
+	  cfi_vec new_cfi_vec;
+	  cfi_vec barrier_cfi = info->cfis;
+	  rtx next = insn;
+
+	  /* Find the start of the next sequence we processed.  */
+	  do
+	    {
+	      if (LABEL_P (next))
+		{
+		  label_info = point_info + uid_luid[INSN_UID (next)];
+		  if (label_info->used_as_start)
+		    break;
+		}
+	      if (switch_note_p (next))
+		break;
+	      insn = next;
+	      next = NEXT_INSN (next);
+	    }
+	  while (next != NULL_RTX);
+	  if (next == NULL_RTX)
+	    break;
+	  if (!LABEL_P (next))
+	    continue;
 
-  gcc_assert (cfa_remember.in_use);
-  cfa = cfa_remember;
-  old_cfa = old_cfa_remember;
-  cfa_remember.in_use = 0;
+	  /* Emit extra CFIs as necessary to achieve the correct state.  */
+	  new_cfi_vec = label_info->cfis;
+	  cfi_insn = next;
+	  if (vec_is_prefix_of (barrier_cfi, new_cfi_vec))
+	    {
+	      if (VEC_length (dw_cfi_ref, barrier_cfi)
+		  != VEC_length (dw_cfi_ref, new_cfi_vec))
+		{
+		  /* If the barrier was a point needing a restore, we must
+		     add the remember here as we ignore the newly added
+		     CFI notes.  */
+		  if (info->n_restores > 0)
+		    add_cfa_remember ();
+		  remember_needed = false;
+		  append_extra_cfis (barrier_cfi, new_cfi_vec);
+		}
+	    }
+	  else
+	    {
+	      save_point_info->n_restores--;
+	      dwarf2out_frame_debug_restore_state ();
+
+	      if (save_point_info->n_restores > 0)
+		add_cfa_remember ();
+	      gcc_assert (!remember_needed);
+	      append_extra_cfis (save_point_info->cfis, new_cfi_vec);
+	      if (save_point_info->n_restores == 0)
+		save_point_info = NULL;
+	    }
+	  cfi_insn = NULL_RTX;
+	}
+    }
+  free (uid_luid);
+  free (point_info);
+
+  add_cfis_to_fde ();
+}
+
+void
+dwarf2out_emit_cfi (dw_cfi_ref cfi)
+{
+  if (dwarf2out_do_cfi_asm ())
+    output_cfi_directive (asm_out_file, cfi);
 }
 
 /* Describe for the GTY machinery what parts of dw_cfi_oprnd1 are used.  */
@@ -3411,7 +3743,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref f
 /* Similar, but do it via assembler directives instead.  */
 
 static void
-output_cfi_directive (dw_cfi_ref cfi)
+output_cfi_directive (FILE *f, dw_cfi_ref cfi)
 {
   unsigned long r, r2;
 
@@ -3426,82 +3758,96 @@ output_cfi_directive (dw_cfi_ref cfi)
       /* Should only be created by add_fde_cfi in a code path not
 	 followed when emitting via directives.  The assembler is
 	 going to take care of this for us.  */
-      gcc_unreachable ();
+      if (f == asm_out_file)
+	gcc_unreachable ();
+      fprintf (f, "\t.cfi_advance_loc\n");
+      break;
 
     case DW_CFA_offset:
     case DW_CFA_offset_extended:
     case DW_CFA_offset_extended_sf:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_offset %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
+      fprintf (f, "\t.cfi_offset %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
 	       r, cfi->dw_cfi_oprnd2.dw_cfi_offset);
       break;
 
     case DW_CFA_restore:
     case DW_CFA_restore_extended:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_restore %lu\n", r);
+      fprintf (f, "\t.cfi_restore %lu\n", r);
       break;
 
     case DW_CFA_undefined:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_undefined %lu\n", r);
+      fprintf (f, "\t.cfi_undefined %lu\n", r);
       break;
 
     case DW_CFA_same_value:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_same_value %lu\n", r);
+      fprintf (f, "\t.cfi_same_value %lu\n", r);
       break;
 
     case DW_CFA_def_cfa:
     case DW_CFA_def_cfa_sf:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_def_cfa %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
+      fprintf (f, "\t.cfi_def_cfa %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
 	       r, cfi->dw_cfi_oprnd2.dw_cfi_offset);
       break;
 
     case DW_CFA_def_cfa_register:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_def_cfa_register %lu\n", r);
+      fprintf (f, "\t.cfi_def_cfa_register %lu\n", r);
       break;
 
     case DW_CFA_register:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
       r2 = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd2.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_register %lu, %lu\n", r, r2);
+      fprintf (f, "\t.cfi_register %lu, %lu\n", r, r2);
       break;
 
     case DW_CFA_def_cfa_offset:
     case DW_CFA_def_cfa_offset_sf:
-      fprintf (asm_out_file, "\t.cfi_def_cfa_offset "
+      fprintf (f, "\t.cfi_def_cfa_offset "
 	       HOST_WIDE_INT_PRINT_DEC"\n",
 	       cfi->dw_cfi_oprnd1.dw_cfi_offset);
       break;
 
     case DW_CFA_remember_state:
-      fprintf (asm_out_file, "\t.cfi_remember_state\n");
+      fprintf (f, "\t.cfi_remember_state\n");
       break;
     case DW_CFA_restore_state:
-      fprintf (asm_out_file, "\t.cfi_restore_state\n");
+      fprintf (f, "\t.cfi_restore_state\n");
       break;
 
     case DW_CFA_GNU_args_size:
-      fprintf (asm_out_file, "\t.cfi_escape %#x,", DW_CFA_GNU_args_size);
+      if (f != asm_out_file)
+	{
+	  fprintf (f, "\t.cfi_GNU_args_size "HOST_WIDE_INT_PRINT_DEC"\n",
+		   cfi->dw_cfi_oprnd1.dw_cfi_offset);
+	  break;
+	}
+      fprintf (f, "\t.cfi_escape %#x,", DW_CFA_GNU_args_size);
       dw2_asm_output_data_uleb128_raw (cfi->dw_cfi_oprnd1.dw_cfi_offset);
       if (flag_debug_asm)
-	fprintf (asm_out_file, "\t%s args_size "HOST_WIDE_INT_PRINT_DEC,
+	fprintf (f, "\t%s args_size "HOST_WIDE_INT_PRINT_DEC,
 		 ASM_COMMENT_START, cfi->dw_cfi_oprnd1.dw_cfi_offset);
-      fputc ('\n', asm_out_file);
+      fputc ('\n', f);
       break;
 
     case DW_CFA_GNU_window_save:
-      fprintf (asm_out_file, "\t.cfi_window_save\n");
+      fprintf (f, "\t.cfi_window_save\n");
       break;
 
     case DW_CFA_def_cfa_expression:
     case DW_CFA_expression:
-      fprintf (asm_out_file, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
+      if (f != asm_out_file)
+	{
+	  fprintf (f, "\t.cfi_cfa_{def_,}expression\n");
+	  break;
+	}
+      fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
       output_cfa_loc_raw (cfi);
-      fputc ('\n', asm_out_file);
+      fputc ('\n', f);
       break;
 
     default:
@@ -3510,14 +3856,11 @@ output_cfi_directive (dw_cfi_ref cfi)
 }
 
 /* Output CFIs from VEC, up to index UPTO, to bring current FDE to the
-   same state as after executing CFIs in CFI chain.  DO_CFI_ASM is
-   true if .cfi_* directives shall be emitted, false otherwise.  If it
-   is false, FDE and FOR_EH are the other arguments to pass to
-   output_cfi.  */
+   same state as after executing CFIs in CFI chain.  FDE and FOR_EH
+   are the other arguments to pass to output_cfi.  */
 
 static void
-output_cfis (cfi_vec vec, int upto, bool do_cfi_asm,
-	     dw_fde_ref fde, bool for_eh)
+output_cfis (cfi_vec vec, int upto, dw_fde_ref fde, bool for_eh)
 {
   int ix;
   struct dw_cfi_struct cfi_buf;
@@ -3611,12 +3954,7 @@ output_cfis (cfi_vec vec, int upto, bool
 	      if (cfi2 != NULL
 		  && cfi2->dw_cfi_opc != DW_CFA_restore
 		  && cfi2->dw_cfi_opc != DW_CFA_restore_extended)
-		{
-		  if (do_cfi_asm)
-		    output_cfi_directive (cfi2);
-		  else
-		    output_cfi (cfi2, fde, for_eh);
-		}
+		output_cfi (cfi2, fde, for_eh);
 	    }
 	  if (cfi_cfa && cfi_cfa_offset && cfi_cfa_offset != cfi_cfa)
 	    {
@@ -3645,30 +3983,20 @@ output_cfis (cfi_vec vec, int upto, bool
 	  else if (cfi_cfa_offset)
 	    cfi_cfa = cfi_cfa_offset;
 	  if (cfi_cfa)
-	    {
-	      if (do_cfi_asm)
-		output_cfi_directive (cfi_cfa);
-	      else
-		output_cfi (cfi_cfa, fde, for_eh);
-	    }
+	    output_cfi (cfi_cfa, fde, for_eh);
+
 	  cfi_cfa = NULL;
 	  cfi_cfa_offset = NULL;
 	  if (cfi_args_size
 	      && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
-	    {
-	      if (do_cfi_asm)
-		output_cfi_directive (cfi_args_size);
-	      else
-		output_cfi (cfi_args_size, fde, for_eh);
-	    }
+	    output_cfi (cfi_args_size, fde, for_eh);
+
 	  cfi_args_size = NULL;
 	  if (cfi == NULL)
 	    {
 	      VEC_free (dw_cfi_ref, heap, regs);
 	      return;
 	    }
-	  else if (do_cfi_asm)
-	    output_cfi_directive (cfi);
 	  else
 	    output_cfi (cfi, fde, for_eh);
 	  break;
@@ -3678,14 +4006,6 @@ output_cfis (cfi_vec vec, int upto, bool
     }
 }
 
-/* Like output_cfis, but emit all CFIs in the vector.  */
-static void
-output_all_cfis (cfi_vec vec, bool do_cfi_asm,
-		 dw_fde_ref fde, bool for_eh)
-{
-  output_cfis (vec, VEC_length (dw_cfi_ref, vec), do_cfi_asm, fde, for_eh);
-}
-
 /* Output one FDE.  */
 
 static void
@@ -3801,7 +4121,7 @@ output_fde (dw_fde_ref fde, bool for_eh,
       if (fde->dw_fde_switch_cfi_index > 0)
 	{
 	  from = fde->dw_fde_switch_cfi_index;
-	  output_cfis (fde->dw_fde_cfi, from, false, fde, for_eh);
+	  output_cfis (fde->dw_fde_cfi, from, fde, for_eh);
 	}
       for (i = from; i < until; i++)
 	output_cfi (VEC_index (dw_cfi_ref, fde->dw_fde_cfi, i),
@@ -4379,13 +4699,8 @@ dwarf2out_switch_text_section (void)
        || (cold_text_section && sect == cold_text_section));
 
   if (dwarf2out_do_cfi_asm ())
-    {
-      dwarf2out_do_cfi_startproc (true);
-      /* As this is a different FDE, insert all current CFI instructions
-	 again.  */
-      output_all_cfis (fde->dw_fde_cfi, true, fde, true);
-    }
-  fde->dw_fde_switch_cfi_index = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
+    dwarf2out_do_cfi_startproc (true);
+
   var_location_switch_text_section ();
 
   set_cur_line_info_table (sect);
@@ -5490,7 +5805,7 @@ output_loc_operands_raw (dw_loc_descr_re
 	dw2_asm_output_data_uleb128_raw (r);
       }
       break;
-      
+
     case DW_OP_constu:
     case DW_OP_plus_uconst:
     case DW_OP_piece:
@@ -12472,7 +12787,7 @@ output_one_line_info_table (dw_line_info
 	  dw2_asm_output_data (1, DW_LNS_set_prologue_end,
 			       "set prologue end");
 	  break;
-	  
+
 	case LI_set_epilogue_begin:
 	  dw2_asm_output_data (1, DW_LNS_set_epilogue_begin,
 			       "set epilogue begin");
@@ -14799,7 +15114,7 @@ static bool
 decl_by_reference_p (tree decl)
 {
   return ((TREE_CODE (decl) == PARM_DECL || TREE_CODE (decl) == RESULT_DECL
-  	   || TREE_CODE (decl) == VAR_DECL)
+	   || TREE_CODE (decl) == VAR_DECL)
 	  && DECL_BY_REFERENCE (decl));
 }
 
@@ -20724,7 +21039,7 @@ gen_type_die_with_usage (tree type, dw_d
       if (DECL_CONTEXT (TYPE_NAME (type))
 	  && TREE_CODE (DECL_CONTEXT (TYPE_NAME (type))) == NAMESPACE_DECL)
 	context_die = get_context_die (DECL_CONTEXT (TYPE_NAME (type)));
-      
+
       gen_decl_die (TYPE_NAME (type), NULL, context_die);
       return;
     }
@@ -21954,7 +22269,7 @@ gen_scheduled_generic_parms_dies (void)
 
   if (generic_type_instances == NULL)
     return;
-  
+
   FOR_EACH_VEC_ELT (tree, generic_type_instances, i, t)
     gen_generic_params_dies (t);
 }
@@ -23863,7 +24178,7 @@ dwarf2out_finish (const char *filename)
   if (!VEC_empty (pubname_entry, pubtype_table))
     {
       bool empty = false;
-      
+
       if (flag_eliminate_unused_debug_types)
 	{
 	  /* The pubtypes table might be emptied by pruning unused items.  */
Index: gcc/jump.c
===================================================================
--- gcc.orig/jump.c
+++ gcc/jump.c
@@ -709,6 +709,15 @@ comparison_dominates_p (enum rtx_code co
   return 0;
 }
 \f
+/* Return true if INSN is an ADDR_VEC or ADDR_DIFF_VEC.  */
+bool
+addr_vec_p (const_rtx insn)
+{
+  return (JUMP_P (insn)
+	  && (GET_CODE (PATTERN (insn)) == ADDR_VEC
+	      || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC));
+}
+
 /* Return 1 if INSN is an unconditional jump and nothing else.  */
 
 int
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -2307,6 +2307,7 @@ extern int any_condjump_p (const_rtx);
 extern int any_uncondjump_p (const_rtx);
 extern rtx pc_set (const_rtx);
 extern rtx condjump_label (const_rtx);
+extern bool addr_vec_p (const_rtx);
 extern int simplejump_p (const_rtx);
 extern int returnjump_p (rtx);
 extern int eh_returnjump_p (rtx);

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-11 17:10                   ` Richard Henderson
  2011-04-13 14:16                     ` Bernd Schmidt
@ 2011-04-13 15:28                     ` Bernd Schmidt
  2011-04-13 14:44                       ` Richard Henderson
  2011-04-15 16:29                       ` Bernd Schmidt
  1 sibling, 2 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-13 15:28 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2428 bytes --]

On 04/11/2011 07:10 PM, Richard Henderson wrote:
> Ok.

Thanks, committed. And here's an update of the other patch set,
renumbered to start at 5, with a changelog to cover all three patches
together (which is how they'll need to be committed eventually).

005-scanfirst: Mostly identical to the previous version of the scanfirst
patch. Now deletes the CFI notes to avoid compare-debug
failures.

006-cfilabel: A new patch, which reduces the amount of different code
paths we can take in add_fde_cfi, as this was becoming unmanageable. The
concept is to first emit just the CFI notes, in all cases. Later, after
we're done producing the CFI insns we need, another pass over the rtl
adds the necessary labels and set_loc/advance_loc CFIs. One consequence
of this is that def_cfa_1 can no longer use lookup_cfa, so it just
compares to an old_cfa variable instead. This also requires
target-specific changes as some ports use dwarf2out_cfi_label. An
(untested) example of the necessary changes is in config/arm.
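To make the two-pass idea concrete, here's a minimal toy model in plain C. It is not GCC code — the types and function names (toy_insn, pass1_queue_note, pass2_emit) are invented for illustration. Pass 1 only queues CFI notes into the instruction stream; pass 2 walks the stream once and renders labels and directives in order:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy instruction stream: each slot is either a machine insn or a
   queued CFI note (hypothetical types, not GCC's NOTE_INSN_CFI).  */
enum toy_kind { TOY_INSN, TOY_NOTE_CFI, TOY_NOTE_CFI_LABEL };

struct toy_insn {
  enum toy_kind kind;
  const char *directive;   /* used when kind == TOY_NOTE_CFI */
  int label_num;           /* used when kind == TOY_NOTE_CFI_LABEL */
};

/* Pass 1: the frame-debug analysis just appends a note to the stream;
   nothing is printed at this stage.  Returns the new stream length.  */
static int
pass1_queue_note (struct toy_insn *stream, int n, const char *directive)
{
  stream[n].kind = TOY_NOTE_CFI;
  stream[n].directive = directive;
  return n + 1;
}

/* Pass 2: final output walks the stream once and emits labels and
   directives in the order they were queued.  */
static void
pass2_emit (const struct toy_insn *stream, int n, char *out, size_t outsz)
{
  out[0] = '\0';
  for (int i = 0; i < n; i++)
    switch (stream[i].kind)
      {
      case TOY_INSN:
	break;                  /* real insns emit assembly, not CFIs */
      case TOY_NOTE_CFI:
	strncat (out, stream[i].directive, outsz - strlen (out) - 1);
	strncat (out, "\n", outsz - strlen (out) - 1);
	break;
      case TOY_NOTE_CFI_LABEL:
	{
	  char buf[32];
	  snprintf (buf, sizeof buf, ".LCFI%d:\n", stream[i].label_num);
	  strncat (out, buf, outsz - strlen (out) - 1);
	}
	break;
      }
}
```

The point of the split is that by the time pass 2 runs, the full set of CFIs is known, so decisions like where a label or advance_loc is actually needed can be made with global information instead of on the fly.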

One thing that has disappeared is
-  /* ??? Of course, this heuristic fails when we're annotating epilogues,
-     because of course we'll always want to redefine the CFA back to the
-     stack pointer on the way out.  Where should we move this check?  */
because I didn't know what it was for, and it was "if (0)"ed anyway.

007-dw2cfg: A much extended version of the previous patch. Now does much
better placement of remember/restore; in almost all cases the code is
identical to what we currently generate, modulo minor differences around
the PROLOGUE_END label. I've made it emit queued register saves before
PROLOGUE_END so that we can use the state there for forced labels.
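As background on why remember/restore placement matters: DW_CFA_remember_state pushes the consumer's current unwind row state onto a stack and DW_CFA_restore_state pops it, so the generator has to pair them consistently along every path. A toy model of the consumer side, tracking only the CFA offset (real CFIs track much more; the names here are invented, not GCC's):

```c
#include <assert.h>

/* Toy model of the DWARF consumer's remember/restore stack.  The row
   state is reduced to a single CFA offset for illustration.  */
#define MAXDEPTH 8

struct cfi_state {
  int cfa_offset;
  int stack[MAXDEPTH];
  int depth;
};

/* DW_CFA_def_cfa_offset: change the current row's CFA offset.  */
static void
cfa_def_offset (struct cfi_state *s, int off)
{
  s->cfa_offset = off;
}

/* DW_CFA_remember_state: push a copy of the current row state.  */
static void
cfa_remember (struct cfi_state *s)
{
  assert (s->depth < MAXDEPTH);
  s->stack[s->depth++] = s->cfa_offset;
}

/* DW_CFA_restore_state: pop, discarding changes made since the push.  */
static void
cfa_restore_state (struct cfi_state *s)
{
  assert (s->depth > 0);
  s->cfa_offset = s->stack[--s->depth];
}
```

A sequence like "def_cfa_offset 16; remember; def_cfa_offset 8 (epilogue); restore" leaves the state at offset 16 again — which is exactly what lets code after an epilogue jump reuse the pre-epilogue frame description.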

This bootstraps and tests ok on i686-linux. However, there is work left
to be done. Can I take you up on your offer to work with me on this?
This still requires handling the i386 output_set_got, which I think I can
cope with, but the ia64 backend does a number of things with unwinding that I
don't understand. Also, I'll be away the next two weeks - if you arrive
at a complete version during that time it would be great if you could
commit it.

One thing to note is that it seems surprisingly hard to make
-freorder-blocks-and-partition do anything interesting. There's one C++
testcase (partition2.C I think) which I used to debug this code, but
other than that I haven't really found anything that actually generates
two nonempty partitions.


Bernd

[-- Attachment #2: dw2-cl --]
[-- Type: text/plain, Size: 4035 bytes --]

	* target.def (dwarf_handle_frame_unspec): Remove label argument.
	* doc/tm.texi: Regenerate.
	* tree.h (dwarf2out_cfi_label, dwarf2out_def_cfa,
	dwarf2out_window_save, dwarf2out_reg_save, dwarf2out_return_save,
	dwarf2out_return_reg, dwarf2out_reg_save_reg): Don't declare.
	* final.c (final_start_function): Call
	dwarf2out_frame_debug_after_prologue.
	(final_scan_insn): Don't call dwarf2out_frame_debug for anything.
	Handle NOTE_INSN_CFI and NOTE_INSN_CFI_LABEL.
	(final): Delete these notes.
	* insn-notes.def (CFI, CFI_LABEL): New.
	* jump.c (addr_vec_p): New function.
	* dwarf2out.c (cfi_insn): New static variable.
	(dwarf2out_cfi_label): Remove force argument. All callers changed.
	Only generate the label, don't emit it.
	(dwarf2out_maybe_emit_cfi_label): New function.
	(add_fde_cfi): Remove label argument.  All callers changed.  Remove
	most code; leave a condition to either emit a CFI insn, or add the
	CFI to the FDE CFI vector.
	(add_cie_cfi): New static function.
	(add_cfi): Remove function.
	(old_cfa): New static variable.
	(cfa_remember): Remove static variable.
	(dwarf2out_def_cfa): Replace label argument with a bool for_cie
	argument.  All callers changed.  Don't use lookup_cfa; use and
	update the global old_cfa variable.  Call add_fde_cfi or add_cie_cfi
	at the end.
	(reg_save): Replace label argument with a bool.  All callers changed.
	Call add_fde_cfi or add_cie_cfi at the end.
	(dwarf2out_reg_save, dwarf2out_return_save, dwarf2out_return_reg,
	dwarf2out_args_size, dwarf2out_stack_adjust, dwarf2out_reg_save_reg,
	dwarf2out_frame_debug_def_cfa, dwarf2out_frame_debug_cfa_offset,
	dwarf2out_frame_debug_cfa_register, dwarf2out_frame_debug_cfa_restore,
	dwarf2out_frame_debug_cfa_expression, dwarf2out_frame_debug_expr):
	Remove label argument.  All callers changed.
	(barrier_args_size): Remove variable.
	(compute_barrier_args_size_1, compute_barrier_args_size): Remove
	functions.
	(dwarf2out_notice_stack_adjust): Don't handle barriers.
	(last_reg_save_label): Remove variable.  All sets and uses removed.
	(cfi_label_required_p, add_cfis_to_fde): New static functions.
	(dwarf2out_frame_debug_restore_state): Simply add the new CFI.
	(dwarf2out_frame_debug): Set cfi_insn, and clear it.  Don't call
	dwarf2out_flush_queued_reg_saves at the top.
	(dwarf2out_frame_debug_init): Initialize old_cfa.
	(copy_cfi_vec_parts): New static function.
	(jump_target_info): New struct type.
	(dwarf2out_cfi_begin_epilogue): Remove.
	(save_point_p, record_current_state, maybe_record_jump_target,
	vec_is_prefix_of, append_extra_cfis, debug_cfi_vec, switch_note_p,
	scan_until_barrier, find_best_starting_point): New static functions.
	(dwarf2out_frame_debug_after_prologue): New function.
	(dwarf2out_emit_cfi): New function.
	(output_cfi_directive): New FILE argument.  All callers changed.
	Avoid some paths if it is not asm_out_file; otherwise print to it.
	(output_all_cfis): Remove function.
	(output_cfis): Remove do_cfi_asm arg.  All callers changed.  Never
	call output_cfi_directive.
	(dwarf2out_frame_init): Initialize old_cfa.
	(dwarf2out_switch_text_section): Don't initialize dw_fde_current_label.
	Don't call output_all_cfis.
	* dwarf2out.h (dwarf2out_cfi_label, dwarf2out_def_cfa,
	dwarf2out_window_save, dwarf2out_reg_save, dwarf2out_return_save,
	dwarf2out_return_reg, dwarf2out_reg_save_reg, dwarf2out_emit_cfi,
	dwarf2out_frame_debug_after_prologue): Declare.
	(dwarf2out_cfi_begin_epilogue, dwarf2out_frame_debug_restore_state):
	Don't declare.
	(struct dw_cfi_struct): Add forward declaration.
	* rtl.h (union rtunion_def): Add rt_cfi member.
	(XCFI, XCCFI, NOTE_CFI, NOTE_LABEL_NUMBER): New macros.
	(addr_vec_p): Declare.
	* config/sparc/sparc.c (sparc_dwarf_handle_frame_unspec): Remove
	label argument.
	* config/ia64/ia64.c (ia64_dwarf_handle_frame_unspec): Likewise.
	* config/arm/arm.c (thumb_pushpop): Use dwarf2out_maybe_emit_cfi_label
	rather than dwarf2out_cfi_label.
	(thumb1_output_function_prologue): Likewise.
	(arm_dwarf_handle_frame_unspec): Remove label argument.

[-- Attachment #3: 005-scanfirst.diff --]
[-- Type: text/plain, Size: 11200 bytes --]

	* cfgcleanup.c (flow_find_head_matching_sequence): Ignore
	epilogue notes.
	* df-problems.c (can_move_insns_across): Don't stop at epilogue
	notes.
	* dwarf2out.c (dwarf2out_cfi_begin_epilogue): Also allow a
	simplejump to end the block.

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -470,6 +470,8 @@ static void output_call_frame_info (int)
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_cfi_begin_epilogue (rtx);
+static void dwarf2out_frame_debug_restore_state (void);
 
 /* Support for complex CFA locations.  */
 static void output_cfa_loc (dw_cfi_ref, int);
@@ -847,6 +849,15 @@ add_cfi (cfi_vec *vec, dw_cfi_ref cfi)
   VEC_safe_push (dw_cfi_ref, gc, *vec, cfi);
 }
 
+/* The insn after which a new CFI note should be emitted.  */
+static rtx cfi_insn;
+
+/* True if remember_state should be emitted before following CFI directive.  */
+static bool emit_cfa_remember;
+
+/* True if any CFI directives were emitted at the current insn.  */
+static bool any_cfis_emitted;
+
 /* Generate a new label for the CFI info to refer to.  FORCE is true
    if a label needs to be output even when using .cfi_* directives.  */
 
@@ -866,18 +877,13 @@ dwarf2out_cfi_label (bool force)
     {
       int num = dwarf2out_cfi_label_num++;
       ASM_GENERATE_INTERNAL_LABEL (label, "LCFI", num);
-      ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LCFI", num);
+      cfi_insn = emit_note_after (NOTE_INSN_CFI_LABEL, cfi_insn);
+      NOTE_LABEL_NUMBER (cfi_insn) = num;
     }
 
   return label;
 }
 
-/* True if remember_state should be emitted before following CFI directive.  */
-static bool emit_cfa_remember;
-
-/* True if any CFI directives were emitted at the current insn.  */
-static bool any_cfis_emitted;
-
 /* Add CFI to the current fde at the PC value indicated by LABEL if specified,
    or to the CIE if LABEL is NULL.  */
 
@@ -957,7 +963,8 @@ add_fde_cfi (const char *label, dw_cfi_r
 	        }
 	    }
 
-	  output_cfi_directive (cfi);
+	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
+	  NOTE_CFI (cfi_insn) = cfi;
 
 	  vec = &fde->dw_fde_cfi;
 	  any_cfis_emitted = true;
@@ -2791,6 +2798,11 @@ dwarf2out_frame_debug (rtx insn, bool af
   rtx note, n;
   bool handled_one = false;
 
+  if (after_p)
+    cfi_insn = insn;
+  else
+    cfi_insn = PREV_INSN (insn);
+
   if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
 
@@ -2914,6 +2926,7 @@ void
 dwarf2out_frame_debug_init (void)
 {
   size_t i;
+  rtx insn;
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2940,12 +2953,64 @@ dwarf2out_frame_debug_init (void)
       XDELETEVEC (barrier_args_size);
       barrier_args_size = NULL;
     }
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      rtx pat;
+      if (BARRIER_P (insn))
+	{
+	  dwarf2out_frame_debug (insn, false);
+	  continue;
+	}
+      else if (NOTE_P (insn))
+	{
+	  switch (NOTE_KIND (insn))
+	    {
+	    case NOTE_INSN_EPILOGUE_BEG:
+#if defined (HAVE_epilogue)
+	      dwarf2out_cfi_begin_epilogue (insn);
+#endif
+	      break;
+	    case NOTE_INSN_CFA_RESTORE_STATE:
+	      cfi_insn = insn;
+	      dwarf2out_frame_debug_restore_state ();
+	      break;
+	    }
+	  continue;
+	}
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+      pat = PATTERN (insn);
+      if (asm_noperands (pat) >= 0)
+	continue;
+      if (GET_CODE (pat) == SEQUENCE)
+	{
+	  int j;
+	  for (j = 1; j < XVECLEN (pat, 0); j++)
+	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
+	  insn = XVECEXP (pat, 0, 0);
+	}
+
+      if (CALL_P (insn) && dwarf2out_do_frame ())
+	dwarf2out_frame_debug (insn, false);
+      if (dwarf2out_do_frame ()
+#if !defined (HAVE_prologue)
+	  && !ACCUMULATE_OUTGOING_ARGS
+#endif
+	  )
+	dwarf2out_frame_debug (insn, true);
+    }
+}
+
+void
+dwarf2out_emit_cfi (dw_cfi_ref cfi)
+{
+  output_cfi_directive (cfi);
 }
 
-/* Determine if we need to save and restore CFI information around this
-   epilogue.  If SIBCALL is true, then this is a sibcall epilogue.  If
-   we do need to save/restore, then emit the save now, and insert a
-   NOTE_INSN_CFA_RESTORE_STATE at the appropriate place in the stream.  */
+/* Determine if we need to save and restore CFI information around
+   this epilogue.  If we do need to save/restore, then emit the save
+   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
+   place in the stream.  */
 
 void
 dwarf2out_cfi_begin_epilogue (rtx insn)
@@ -2960,8 +3025,10 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
       if (!INSN_P (i))
 	continue;
 
-      /* Look for both regular and sibcalls to end the block.  */
-      if (returnjump_p (i))
+      /* Look for both regular and sibcalls to end the block.  Various
+	 optimization passes may cause us to jump to a common epilogue
+	 tail, so we also accept simplejumps.  */
+      if (returnjump_p (i) || simplejump_p (i))
 	break;
       if (CALL_P (i) && SIBLING_CALL_P (i))
 	break;
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -18,11 +18,11 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+struct dw_cfi_struct;
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
 extern void dwarf2out_frame_debug_init (void);
-extern void dwarf2out_cfi_begin_epilogue (rtx);
-extern void dwarf2out_frame_debug_restore_state (void);
+extern void dwarf2out_emit_cfi (struct dw_cfi_struct *);
 extern void dwarf2out_flush_queued_reg_saves (void);
 
 extern void debug_dwarf (void);
Index: gcc/insn-notes.def
===================================================================
--- gcc.orig/insn-notes.def
+++ gcc/insn-notes.def
@@ -77,4 +77,12 @@ INSN_NOTE (SWITCH_TEXT_SECTIONS)
    when an epilogue appears in the middle of a function.  */
 INSN_NOTE (CFA_RESTORE_STATE)
 
+/* When emitting dwarf2 frame information, contains a directive that
+   should be emitted.  */
+INSN_NOTE (CFI)
+
+/* When emitting dwarf2 frame information, contains the number of a debug
+   label that should be emitted.  */
+INSN_NOTE (CFI_LABEL)
+
 #undef INSN_NOTE
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -180,6 +180,7 @@ union rtunion_def
   mem_attrs *rt_mem;
   reg_attrs *rt_reg;
   struct constant_descriptor_rtx *rt_constant;
+  struct dw_cfi_struct *rt_cfi;
 };
 typedef union rtunion_def rtunion;
 
@@ -708,6 +709,7 @@ extern void rtl_check_failed_flag (const
 #define XTREE(RTX, N)   (RTL_CHECK1 (RTX, N, 't').rt_tree)
 #define XBBDEF(RTX, N)	(RTL_CHECK1 (RTX, N, 'B').rt_bb)
 #define XTMPL(RTX, N)	(RTL_CHECK1 (RTX, N, 'T').rt_str)
+#define XCFI(RTX, N)	(RTL_CHECK1 (RTX, N, 'C').rt_cfi)
 
 #define XVECEXP(RTX, N, M)	RTVEC_ELT (XVEC (RTX, N), M)
 #define XVECLEN(RTX, N)		GET_NUM_ELEM (XVEC (RTX, N))
@@ -740,6 +742,7 @@ extern void rtl_check_failed_flag (const
 #define XCMODE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_type)
 #define XCTREE(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_tree)
 #define XCBBDEF(RTX, N, C)    (RTL_CHECKC1 (RTX, N, C).rt_bb)
+#define XCCFI(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_cfi)
 #define XCCSELIB(RTX, N, C)   (RTL_CHECKC1 (RTX, N, C).rt_cselib)
 
 #define XCVECEXP(RTX, N, M, C)	RTVEC_ELT (XCVEC (RTX, N, C), M)
@@ -882,6 +885,8 @@ extern const char * const reg_note_name[
 #define NOTE_BLOCK(INSN)	XCTREE (INSN, 4, NOTE)
 #define NOTE_EH_HANDLER(INSN)	XCINT (INSN, 4, NOTE)
 #define NOTE_BASIC_BLOCK(INSN)	XCBBDEF (INSN, 4, NOTE)
+#define NOTE_CFI(INSN)		XCCFI (INSN, 4, NOTE)
+#define NOTE_LABEL_NUMBER(INSN)	XCINT (INSN, 4, NOTE)
 #define NOTE_VAR_LOCATION(INSN)	XCEXP (INSN, 4, NOTE)
 
 /* In a NOTE that is a line number, this is the line number.
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -1678,7 +1678,7 @@ final_end_function (void)
 void
 final (rtx first, FILE *file, int optimize_p)
 {
-  rtx insn;
+  rtx insn, next;
   int max_uid = 0;
   int seen = 0;
 
@@ -1723,6 +1723,15 @@ final (rtx first, FILE *file, int optimi
 
       insn = final_scan_insn (insn, file, optimize_p, 0, &seen);
     }
+
+  for (insn = first; insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+      if (NOTE_P (insn)
+	  && (NOTE_KIND (insn) == NOTE_INSN_CFI
+	      || NOTE_KIND (insn) == NOTE_INSN_CFI_LABEL))
+	delete_insn (insn);
+    }
 }
 \f
 const char *
@@ -1899,16 +1908,19 @@ final_scan_insn (rtx insn, FILE *file, i
 	  break;
 
 	case NOTE_INSN_EPILOGUE_BEG:
-#if defined (HAVE_epilogue)
-	  if (dwarf2out_do_frame ())
-	    dwarf2out_cfi_begin_epilogue (insn);
-#endif
 	  (*debug_hooks->begin_epilogue) (last_linenum, last_filename);
 	  targetm.asm_out.function_begin_epilogue (file);
 	  break;
 
 	case NOTE_INSN_CFA_RESTORE_STATE:
-	  dwarf2out_frame_debug_restore_state ();
+	  break;
+
+	case NOTE_INSN_CFI:
+	  dwarf2out_emit_cfi (NOTE_CFI (insn));
+	  break;
+
+	case NOTE_INSN_CFI_LABEL:
+	  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LCFI", NOTE_LABEL_NUMBER (insn));
 	  break;
 
 	case NOTE_INSN_FUNCTION_BEG:
@@ -2018,8 +2030,6 @@ final_scan_insn (rtx insn, FILE *file, i
       break;
 
     case BARRIER:
-      if (dwarf2out_do_frame ())
-	dwarf2out_frame_debug (insn, false);
       break;
 
     case CODE_LABEL:
@@ -2285,12 +2295,6 @@ final_scan_insn (rtx insn, FILE *file, i
 
 	    final_sequence = body;
 
-	    /* Record the delay slots' frame information before the branch.
-	       This is needed for delayed calls: see execute_cfa_program().  */
-	    if (dwarf2out_do_frame ())
-	      for (i = 1; i < XVECLEN (body, 0); i++)
-		dwarf2out_frame_debug (XVECEXP (body, 0, i), false);
-
 	    /* The first insn in this SEQUENCE might be a JUMP_INSN that will
 	       force the restoration of a comparison that was previously
 	       thought unnecessary.  If that happens, cancel this sequence
@@ -2604,9 +2608,6 @@ final_scan_insn (rtx insn, FILE *file, i
 
 	current_output_insn = debug_insn = insn;
 
-	if (CALL_P (insn) && dwarf2out_do_frame ())
-	  dwarf2out_frame_debug (insn, false);
-
 	/* Find the proper template for this insn.  */
 	templ = get_insn_template (insn_code_number, insn);
 
@@ -2686,16 +2687,6 @@ final_scan_insn (rtx insn, FILE *file, i
 	  targetm.asm_out.final_postscan_insn (file, insn, recog_data.operand,
 					       recog_data.n_operands);
 
-	/* If necessary, report the effect that the instruction has on
-	   the unwind info.   We've already done this for delay slots
-	   and call instructions.  */
-	if (final_sequence == 0
-#if !defined (HAVE_prologue)
-	    && !ACCUMULATE_OUTGOING_ARGS
-#endif
-	    && dwarf2out_do_frame ())
-	  dwarf2out_frame_debug (insn, true);
-
 	if (!targetm.asm_out.unwind_emit_before_insn
 	    && targetm.asm_out.unwind_emit)
 	  targetm.asm_out.unwind_emit (asm_out_file, insn);

[-- Attachment #4: 006-cfilabel.diff --]
[-- Type: text/plain, Size: 44197 bytes --]

---
 config/arm/arm.c     |    5 
 config/ia64/ia64.c   |    6 
 config/sparc/sparc.c |    7 
 config/vax/vax.c     |    2 
 dwarf2out.c          |  467 ++++++++++++++++++++++++---------------------------
 dwarf2out.h          |   32 +++
 final.c              |    5 
 target.def           |    2 
 tree.h               |   31 ---
 9 files changed, 270 insertions(+), 287 deletions(-)

Index: gcc/config/arm/arm.c
===================================================================
--- gcc.orig/config/arm/arm.c
+++ gcc/config/arm/arm.c
@@ -19977,18 +19977,19 @@ thumb_pushpop (FILE *f, unsigned long ma
 
   if (push && pushed_words && dwarf2out_do_frame ())
     {
-      char *l = dwarf2out_cfi_label (false);
       int pushed_mask = real_regs;
 
+      dwarf2out_maybe_emit_cfi_label ();
+
       *cfa_offset += pushed_words * 4;
-      dwarf2out_def_cfa (l, SP_REGNUM, *cfa_offset);
+      dwarf2out_def_cfa (SP_REGNUM, *cfa_offset);
 
       pushed_words = 0;
       pushed_mask = real_regs;
       for (regno = 0; regno <= 14; regno++, pushed_mask >>= 1)
 	{
 	  if (pushed_mask & 1)
-	    dwarf2out_reg_save (l, regno, 4 * pushed_words++ - *cfa_offset);
+	    dwarf2out_reg_save (regno, 4 * pushed_words++ - *cfa_offset);
 	}
     }
 }
@@ -20997,10 +20998,9 @@ thumb1_output_function_prologue (FILE *f
 	 the stack pointer.  */
       if (dwarf2out_do_frame ())
 	{
-	  char *l = dwarf2out_cfi_label (false);
-
+	  dwarf2out_maybe_emit_cfi_label ();
 	  cfa_offset = cfa_offset + crtl->args.pretend_args_size;
-	  dwarf2out_def_cfa (l, SP_REGNUM, cfa_offset);
+	  dwarf2out_def_cfa (false, SP_REGNUM, cfa_offset);
 	}
     }
 
@@ -21046,10 +21046,10 @@ thumb1_output_function_prologue (FILE *f
 
       if (dwarf2out_do_frame ())
 	{
-	  char *l = dwarf2out_cfi_label (false);
+	  dwarf2out_maybe_emit_cfi_label ();
 
 	  cfa_offset = cfa_offset + 16;
-	  dwarf2out_def_cfa (l, SP_REGNUM, cfa_offset);
+	  dwarf2out_def_cfa (SP_REGNUM, cfa_offset);
 	}
 
       if (l_mask)
@@ -22749,7 +22749,7 @@ arm_except_unwind_info (struct gcc_optio
    stack alignment.  */
 
 static void
-arm_dwarf_handle_frame_unspec (const char *label, rtx pattern, int index)
+arm_dwarf_handle_frame_unspec (rtx pattern, int index)
 {
   rtx unspec = SET_SRC (pattern);
   gcc_assert (GET_CODE (unspec) == UNSPEC);
@@ -22760,8 +22760,7 @@ arm_dwarf_handle_frame_unspec (const cha
       /* ??? We should set the CFA = (SP & ~7).  At this point we haven't
          put anything on the stack, so hopefully it won't matter.
          CFA = SP will be correct after alignment.  */
-      dwarf2out_reg_save_reg (label, stack_pointer_rtx,
-                              SET_DEST (pattern));
+      dwarf2out_reg_save_reg (stack_pointer_rtx, SET_DEST (pattern));
       break;
     default:
       gcc_unreachable ();
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc.orig/config/ia64/ia64.c
+++ gcc/config/ia64/ia64.c
@@ -330,7 +330,7 @@ static enum machine_mode ia64_promote_fu
 static void ia64_trampoline_init (rtx, tree, rtx);
 static void ia64_override_options_after_change (void);
 
-static void ia64_dwarf_handle_frame_unspec (const char *, rtx, int);
+static void ia64_dwarf_handle_frame_unspec (rtx, int);
 static tree ia64_builtin_decl (unsigned, bool);
 
 static reg_class_t ia64_preferred_reload_class (rtx, reg_class_t);
@@ -9710,9 +9710,7 @@ ia64_dwarf2out_def_steady_cfa (rtx insn,
    processing.  The real CFA definition is set up above.  */
 
 static void
-ia64_dwarf_handle_frame_unspec (const char * ARG_UNUSED (label),
-				rtx ARG_UNUSED (pattern),
-				int index)
+ia64_dwarf_handle_frame_unspec (rtx ARG_UNUSED (pattern), int index)
 {
   gcc_assert (index == UNSPECV_ALLOC);
 }
Index: gcc/config/sparc/sparc.c
===================================================================
--- gcc.orig/config/sparc/sparc.c
+++ gcc/config/sparc/sparc.c
@@ -454,7 +454,7 @@ static unsigned int sparc_function_arg_b
 						 const_tree);
 static int sparc_arg_partial_bytes (CUMULATIVE_ARGS *,
 				    enum machine_mode, tree, bool);
-static void sparc_dwarf_handle_frame_unspec (const char *, rtx, int);
+static void sparc_dwarf_handle_frame_unspec (rtx, int);
 static void sparc_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
 static void sparc_file_end (void);
 static bool sparc_frame_pointer_required (void);
@@ -9423,12 +9423,11 @@ get_some_local_dynamic_name_1 (rtx *px, 
    This is called from dwarf2out.c to emit call frame instructions
    for frame-related insns containing UNSPECs and UNSPEC_VOLATILEs. */
 static void
-sparc_dwarf_handle_frame_unspec (const char *label,
-				 rtx pattern ATTRIBUTE_UNUSED,
+sparc_dwarf_handle_frame_unspec (rtx pattern ATTRIBUTE_UNUSED,
 				 int index ATTRIBUTE_UNUSED)
 {
   gcc_assert (index == UNSPECV_SAVEW);
-  dwarf2out_window_save (label);
+  dwarf2out_window_save ();
 }
 
 /* This is called from dwarf2out.c via TARGET_ASM_OUTPUT_DWARF_DTPREL.
Index: gcc/config/vax/vax.c
===================================================================
--- gcc.orig/config/vax/vax.c
+++ gcc/config/vax/vax.c
@@ -163,17 +163,18 @@ vax_output_function_prologue (FILE * fil
 
   if (dwarf2out_do_frame ())
     {
-      const char *label = dwarf2out_cfi_label (false);
       int offset = 0;
 
+      dwarf2out_maybe_emit_cfi_label ();
+
       for (regno = FIRST_PSEUDO_REGISTER-1; regno >= 0; --regno)
 	if (df_regs_ever_live_p (regno) && !call_used_regs[regno])
-	  dwarf2out_reg_save (label, regno, offset -= 4);
+	  dwarf2out_reg_save (regno, offset -= 4);
 
-      dwarf2out_reg_save (label, PC_REGNUM, offset -= 4);
-      dwarf2out_reg_save (label, FRAME_POINTER_REGNUM, offset -= 4);
-      dwarf2out_reg_save (label, ARG_POINTER_REGNUM, offset -= 4);
-      dwarf2out_def_cfa (label, FRAME_POINTER_REGNUM, -(offset - 4));
+      dwarf2out_reg_save (PC_REGNUM, offset -= 4);
+      dwarf2out_reg_save (FRAME_POINTER_REGNUM, offset -= 4);
+      dwarf2out_reg_save (ARG_POINTER_REGNUM, offset -= 4);
+      dwarf2out_def_cfa (false, FRAME_POINTER_REGNUM, -(offset - 4));
     }
 
   size -= STARTING_FRAME_OFFSET;
Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -456,11 +456,11 @@ static GTY(()) section *cold_text_sectio
 static char *stripattributes (const char *);
 static const char *dwarf_cfi_name (unsigned);
 static dw_cfi_ref new_cfi (void);
-static void add_cfi (cfi_vec *, dw_cfi_ref);
-static void add_fde_cfi (const char *, dw_cfi_ref);
+static void add_fde_cfi (dw_cfi_ref);
+static void add_cie_cfi (dw_cfi_ref);
 static void lookup_cfa_1 (dw_cfi_ref, dw_cfa_location *, dw_cfa_location *);
 static void lookup_cfa (dw_cfa_location *);
-static void reg_save (const char *, unsigned, unsigned, HOST_WIDE_INT);
+static void reg_save (bool, unsigned, unsigned, HOST_WIDE_INT);
 static void initial_return_save (rtx);
 static HOST_WIDE_INT stack_adjust_offset (const_rtx, HOST_WIDE_INT,
 					  HOST_WIDE_INT);
@@ -469,7 +469,7 @@ static void output_cfi_directive (dw_cfi
 static void output_call_frame_info (int);
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
-static void dwarf2out_frame_debug_expr (rtx, const char *);
+static void dwarf2out_frame_debug_expr (rtx);
 static void dwarf2out_cfi_begin_epilogue (rtx);
 static void dwarf2out_frame_debug_restore_state (void);
 
@@ -482,7 +482,7 @@ static struct dw_loc_descr_struct *build
   (dw_cfa_location *, HOST_WIDE_INT);
 static struct dw_loc_descr_struct *build_cfa_aligned_loc
   (HOST_WIDE_INT, HOST_WIDE_INT);
-static void def_cfa_1 (const char *, dw_cfa_location *);
+static void def_cfa_1 (bool, dw_cfa_location *);
 static struct dw_loc_descr_struct *mem_loc_descriptor
   (rtx, enum machine_mode mode, enum var_init_status);
 
@@ -820,35 +820,6 @@ new_cfi (void)
   return cfi;
 }
 
-/* Add a Call Frame Instruction to list of instructions.  */
-
-static inline void
-add_cfi (cfi_vec *vec, dw_cfi_ref cfi)
-{
-  dw_fde_ref fde = current_fde ();
-
-  /* When DRAP is used, CFA is defined with an expression.  Redefine
-     CFA may lead to a different CFA value.   */
-  /* ??? Of course, this heuristic fails when we're annotating epilogues,
-     because of course we'll always want to redefine the CFA back to the
-     stack pointer on the way out.  Where should we move this check?  */
-  if (0 && fde && fde->drap_reg != INVALID_REGNUM)
-    switch (cfi->dw_cfi_opc)
-      {
-        case DW_CFA_def_cfa_register:
-        case DW_CFA_def_cfa_offset:
-        case DW_CFA_def_cfa_offset_sf:
-        case DW_CFA_def_cfa:
-        case DW_CFA_def_cfa_sf:
-	  gcc_unreachable ();
-
-        default:
-          break;
-      }
-
-  VEC_safe_push (dw_cfi_ref, gc, *vec, cfi);
-}
-
 /* The insn after which a new CFI note should be emitted.  */
 static rtx cfi_insn;
 
@@ -858,45 +829,51 @@ static bool emit_cfa_remember;
 /* True if any CFI directives were emitted at the current insn.  */
 static bool any_cfis_emitted;
 
-/* Generate a new label for the CFI info to refer to.  FORCE is true
-   if a label needs to be output even when using .cfi_* directives.  */
+/* Generate a new label for the CFI info to refer to.  */
 
-char *
-dwarf2out_cfi_label (bool force)
+static char *
+dwarf2out_cfi_label (void)
 {
   static char label[20];
 
-  if (!force && dwarf2out_do_cfi_asm ())
-    {
-      /* In this case, we will be emitting the asm directive instead of
-	 the label, so just return a placeholder to keep the rest of the
-	 interfaces happy.  */
-      strcpy (label, "<do not output>");
-    }
-  else
-    {
-      int num = dwarf2out_cfi_label_num++;
-      ASM_GENERATE_INTERNAL_LABEL (label, "LCFI", num);
-      cfi_insn = emit_note_after (NOTE_INSN_CFI_LABEL, cfi_insn);
-      NOTE_LABEL_NUMBER (cfi_insn) = num;
-    }
+  int num = dwarf2out_cfi_label_num++;
+  ASM_GENERATE_INTERNAL_LABEL (label, "LCFI", num);
 
   return label;
 }
 
+/* Called by target specific code if it wants to emit CFI insns in the text
+   prologue.  If necessary, emit a CFI label and an advance_loc CFI.  See
+   also cfi_label_required_p.  */
+void
+dwarf2out_maybe_emit_cfi_label (void)
+{
+  if ((dwarf_version == 2
+       && debug_info_level > DINFO_LEVEL_TERSE
+       && (write_symbols == DWARF2_DEBUG
+	   || write_symbols == VMS_AND_DWARF2_DEBUG))
+      || !dwarf2out_do_cfi_asm ())
+    {
+      const char *l;
+      dw_cfi_ref xcfi;
+
+      ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LCFI", dwarf2out_cfi_label_num);
+      l = dwarf2out_cfi_label ();
+      l = xstrdup (l);
+
+      xcfi = new_cfi ();
+      xcfi->dw_cfi_opc = DW_CFA_advance_loc4;
+      xcfi->dw_cfi_oprnd1.dw_cfi_addr = l;
+      add_fde_cfi (xcfi);
+    }
+}
+
 /* Add CFI to the current fde at the PC value indicated by LABEL if specified,
    or to the CIE if LABEL is NULL.  */
 
 static void
-add_fde_cfi (const char *label, dw_cfi_ref cfi)
+add_fde_cfi (dw_cfi_ref cfi)
 {
-  cfi_vec *vec;
-
-  if (cie_cfi_vec == NULL)
-    cie_cfi_vec = VEC_alloc (dw_cfi_ref, gc, 20);
-
-  vec = &cie_cfi_vec;
-
   if (emit_cfa_remember)
     {
       dw_cfi_ref cfi_remember;
@@ -905,110 +882,30 @@ add_fde_cfi (const char *label, dw_cfi_r
       emit_cfa_remember = false;
       cfi_remember = new_cfi ();
       cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
-      add_fde_cfi (label, cfi_remember);
+      add_fde_cfi (cfi_remember);
     }
 
-  if (dwarf2out_do_cfi_asm ())
+  any_cfis_emitted = true;
+  if (cfi_insn != NULL)
     {
-      if (label)
-	{
-	  dw_fde_ref fde = current_fde ();
-
-	  gcc_assert (fde != NULL);
-
-	  /* We still have to add the cfi to the list so that lookup_cfa
-	     works later on.  When -g2 and above we even need to force
-	     emitting of CFI labels and add to list a DW_CFA_set_loc for
-	     convert_cfa_to_fb_loc_list purposes.  If we're generating
-	     DWARF3 output we use DW_OP_call_frame_cfa and so don't use
-	     convert_cfa_to_fb_loc_list.  */
-	  if (dwarf_version == 2
-	      && debug_info_level > DINFO_LEVEL_TERSE
-	      && (write_symbols == DWARF2_DEBUG
-		  || write_symbols == VMS_AND_DWARF2_DEBUG))
-	    {
-	      switch (cfi->dw_cfi_opc)
-		{
-		case DW_CFA_def_cfa_offset:
-		case DW_CFA_def_cfa_offset_sf:
-		case DW_CFA_def_cfa_register:
-		case DW_CFA_def_cfa:
-		case DW_CFA_def_cfa_sf:
-		case DW_CFA_def_cfa_expression:
-		case DW_CFA_restore_state:
-		  if (*label == 0 || strcmp (label, "<do not output>") == 0)
-		    label = dwarf2out_cfi_label (true);
-
-		  if (fde->dw_fde_current_label == NULL
-		      || strcmp (label, fde->dw_fde_current_label) != 0)
-		    {
-		      dw_cfi_ref xcfi;
-
-		      label = xstrdup (label);
-
-		      /* Set the location counter to the new label.  */
-		      xcfi = new_cfi ();
-		      /* It doesn't metter whether DW_CFA_set_loc
-		         or DW_CFA_advance_loc4 is added here, those aren't
-		         emitted into assembly, only looked up by
-		         convert_cfa_to_fb_loc_list.  */
-		      xcfi->dw_cfi_opc = DW_CFA_set_loc;
-		      xcfi->dw_cfi_oprnd1.dw_cfi_addr = label;
-		      add_cfi (&fde->dw_fde_cfi, xcfi);
-		      fde->dw_fde_current_label = label;
-		    }
-		  break;
-		default:
-		  break;
-	        }
-	    }
-
-	  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
-	  NOTE_CFI (cfi_insn) = cfi;
-
-	  vec = &fde->dw_fde_cfi;
-	  any_cfis_emitted = true;
-	}
-      /* ??? If this is a CFI for the CIE, we don't emit.  This
-	 assumes that the standard CIE contents that the assembler
-	 uses matches the standard CIE contents that the compiler
-	 uses.  This is probably a bad assumption.  I'm not quite
-	 sure how to address this for now.  */
+      cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
+      NOTE_CFI (cfi_insn) = cfi;
     }
-  else if (label)
+  else
     {
       dw_fde_ref fde = current_fde ();
-
-      gcc_assert (fde != NULL);
-
-      if (*label == 0)
-	label = dwarf2out_cfi_label (false);
-
-      if (fde->dw_fde_current_label == NULL
-	  || strcmp (label, fde->dw_fde_current_label) != 0)
-	{
-	  dw_cfi_ref xcfi;
-
-	  label = xstrdup (label);
-
-	  /* Set the location counter to the new label.  */
-	  xcfi = new_cfi ();
-	  /* If we have a current label, advance from there, otherwise
-	     set the location directly using set_loc.  */
-	  xcfi->dw_cfi_opc = fde->dw_fde_current_label
-			     ? DW_CFA_advance_loc4
-			     : DW_CFA_set_loc;
-	  xcfi->dw_cfi_oprnd1.dw_cfi_addr = label;
-	  add_cfi (&fde->dw_fde_cfi, xcfi);
-
-	  fde->dw_fde_current_label = label;
-	}
-
-      vec = &fde->dw_fde_cfi;
-      any_cfis_emitted = true;
+      VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, cfi);
+      dwarf2out_emit_cfi (cfi);
     }
+}
+
+static void
+add_cie_cfi (dw_cfi_ref cfi)
+{
+  if (cie_cfi_vec == NULL)
+    cie_cfi_vec = VEC_alloc (dw_cfi_ref, gc, 20);
 
-  add_cfi (vec, cfi);
+  VEC_safe_push (dw_cfi_ref, gc, cie_cfi_vec, cfi);
 }
 
 /* Subroutine of lookup_cfa.  */
@@ -1076,6 +973,9 @@ lookup_cfa (dw_cfa_location *loc)
 /* The current rule for calculating the DWARF2 canonical frame address.  */
 static dw_cfa_location cfa;
 
+/* A copy of CFA, for comparison purposes.  */
+static dw_cfa_location old_cfa;
+
 /* The register used for saving registers to the stack, and its offset
    from the CFA.  */
 static dw_cfa_location cfa_store;
@@ -1083,25 +983,27 @@ static dw_cfa_location cfa_store;
 /* The current save location around an epilogue.  */
 static dw_cfa_location cfa_remember;
 
+/* Like cfa_remember, but a copy of old_cfa.  */
+static dw_cfa_location old_cfa_remember;
+
 /* The running total of the size of arguments pushed onto the stack.  */
 static HOST_WIDE_INT args_size;
 
 /* The last args_size we actually output.  */
 static HOST_WIDE_INT old_args_size;
 
-/* Entry point to update the canonical frame address (CFA).
-   LABEL is passed to add_fde_cfi.  The value of CFA is now to be
-   calculated from REG+OFFSET.  */
+/* Entry point to update the canonical frame address (CFA).  The value
+   of CFA is now to be calculated from REG+OFFSET.  */
 
 void
-dwarf2out_def_cfa (const char *label, unsigned int reg, HOST_WIDE_INT offset)
+dwarf2out_def_cfa (bool for_cie, unsigned int reg, HOST_WIDE_INT offset)
 {
   dw_cfa_location loc;
   loc.indirect = 0;
   loc.base_offset = 0;
   loc.reg = reg;
   loc.offset = offset;
-  def_cfa_1 (label, &loc);
+  def_cfa_1 (for_cie, &loc);
 }
 
 /* Determine if two dw_cfa_location structures define the same data.  */
@@ -1120,10 +1022,10 @@ cfa_equal_p (const dw_cfa_location *loc1
    the dw_cfa_location structure.  */
 
 static void
-def_cfa_1 (const char *label, dw_cfa_location *loc_p)
+def_cfa_1 (bool for_cie, dw_cfa_location *loc_p)
 {
   dw_cfi_ref cfi;
-  dw_cfa_location old_cfa, loc;
+  dw_cfa_location loc;
 
   cfa = *loc_p;
   loc = *loc_p;
@@ -1132,7 +1034,6 @@ def_cfa_1 (const char *label, dw_cfa_loc
     cfa_store.offset = loc.offset;
 
   loc.reg = DWARF_FRAME_REGNUM (loc.reg);
-  lookup_cfa (&old_cfa);
 
   /* If nothing changed, no need to issue any call frame instructions.  */
   if (cfa_equal_p (&loc, &old_cfa))
@@ -1193,16 +1094,19 @@ def_cfa_1 (const char *label, dw_cfa_loc
       cfi->dw_cfi_oprnd1.dw_cfi_loc = loc_list;
     }
 
-  add_fde_cfi (label, cfi);
+  if (for_cie)
+    add_cie_cfi (cfi);
+  else
+    add_fde_cfi (cfi);
+  old_cfa = loc;
 }
 
 /* Add the CFI for saving a register.  REG is the CFA column number.
-   LABEL is passed to add_fde_cfi.
    If SREG is -1, the register is saved at OFFSET from the CFA;
    otherwise it is saved in SREG.  */
 
 static void
-reg_save (const char *label, unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset)
+reg_save (bool for_cie, unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset)
 {
   dw_cfi_ref cfi = new_cfi ();
   dw_fde_ref fde = current_fde ();
@@ -1238,10 +1142,13 @@ reg_save (const char *label, unsigned in
       cfi->dw_cfi_oprnd2.dw_cfi_reg_num = sreg;
     }
 
-  add_fde_cfi (label, cfi);
+  if (for_cie)
+    add_cie_cfi (cfi);
+  else
+    add_fde_cfi (cfi);
 }
 
-/* Add the CFI for saving a register window.  LABEL is passed to reg_save.
+/* Add the CFI for saving a register window.
    This CFI tells the unwinder that it needs to restore the window registers
    from the previous frame's window save area.
 
@@ -1249,39 +1156,39 @@ reg_save (const char *label, unsigned in
    assuming 0(cfa)) and what registers are in the window.  */
 
 void
-dwarf2out_window_save (const char *label)
+dwarf2out_window_save (void)
 {
   dw_cfi_ref cfi = new_cfi ();
 
   cfi->dw_cfi_opc = DW_CFA_GNU_window_save;
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 }
 
 /* Entry point for saving a register to the stack.  REG is the GCC register
    number.  LABEL and OFFSET are passed to reg_save.  */
 
 void
-dwarf2out_reg_save (const char *label, unsigned int reg, HOST_WIDE_INT offset)
+dwarf2out_reg_save (unsigned int reg, HOST_WIDE_INT offset)
 {
-  reg_save (label, DWARF_FRAME_REGNUM (reg), INVALID_REGNUM, offset);
+  reg_save (false, DWARF_FRAME_REGNUM (reg), INVALID_REGNUM, offset);
 }
 
 /* Entry point for saving the return address in the stack.
    LABEL and OFFSET are passed to reg_save.  */
 
 void
-dwarf2out_return_save (const char *label, HOST_WIDE_INT offset)
+dwarf2out_return_save (HOST_WIDE_INT offset)
 {
-  reg_save (label, DWARF_FRAME_RETURN_COLUMN, INVALID_REGNUM, offset);
+  reg_save (false, DWARF_FRAME_RETURN_COLUMN, INVALID_REGNUM, offset);
 }
 
 /* Entry point for saving the return address in a register.
    LABEL and SREG are passed to reg_save.  */
 
 void
-dwarf2out_return_reg (const char *label, unsigned int sreg)
+dwarf2out_return_reg (unsigned int sreg)
 {
-  reg_save (label, DWARF_FRAME_RETURN_COLUMN, DWARF_FRAME_REGNUM (sreg), 0);
+  reg_save (false, DWARF_FRAME_RETURN_COLUMN, DWARF_FRAME_REGNUM (sreg), 0);
 }
 
 /* Record the initial position of the return address.  RTL is
@@ -1339,7 +1246,7 @@ initial_return_save (rtx rtl)
     }
 
   if (reg != DWARF_FRAME_RETURN_COLUMN)
-    reg_save (NULL, DWARF_FRAME_RETURN_COLUMN, reg, offset - cfa.offset);
+    reg_save (true, DWARF_FRAME_RETURN_COLUMN, reg, offset - cfa.offset);
 }
 
 /* Given a SET, calculate the amount of stack adjustment it
@@ -1609,7 +1516,7 @@ compute_barrier_args_size (void)
    pushed onto the stack.  */
 
 static void
-dwarf2out_args_size (const char *label, HOST_WIDE_INT size)
+dwarf2out_args_size (HOST_WIDE_INT size)
 {
   dw_cfi_ref cfi;
 
@@ -1621,13 +1528,13 @@ dwarf2out_args_size (const char *label, 
   cfi = new_cfi ();
   cfi->dw_cfi_opc = DW_CFA_GNU_args_size;
   cfi->dw_cfi_oprnd1.dw_cfi_offset = size;
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 }
 
 /* Record a stack adjustment of OFFSET bytes.  */
 
 static void
-dwarf2out_stack_adjust (HOST_WIDE_INT offset, const char *label)
+dwarf2out_stack_adjust (HOST_WIDE_INT offset)
 {
   if (cfa.reg == STACK_POINTER_REGNUM)
     cfa.offset += offset;
@@ -1646,9 +1553,9 @@ dwarf2out_stack_adjust (HOST_WIDE_INT of
   if (args_size < 0)
     args_size = 0;
 
-  def_cfa_1 (label, &cfa);
+  def_cfa_1 (false, &cfa);
   if (flag_asynchronous_unwind_tables)
-    dwarf2out_args_size (label, args_size);
+    dwarf2out_args_size (args_size);
 }
 
 /* Check INSN to see if it looks like a push or a stack adjustment, and
@@ -1659,7 +1566,6 @@ static void
 dwarf2out_notice_stack_adjust (rtx insn, bool after_p)
 {
   HOST_WIDE_INT offset;
-  const char *label;
   int i;
 
   /* Don't handle epilogues at all.  Certainly it would be wrong to do so
@@ -1690,7 +1596,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
 	  if (GET_CODE (insn) == SET)
 	    insn = SET_SRC (insn);
 	  gcc_assert (GET_CODE (insn) == CALL);
-	  dwarf2out_args_size ("", INTVAL (XEXP (insn, 1)));
+	  dwarf2out_args_size (INTVAL (XEXP (insn, 1)));
 	}
       return;
     }
@@ -1698,7 +1604,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
   if (CALL_P (insn) && !after_p)
     {
       if (!flag_asynchronous_unwind_tables)
-	dwarf2out_args_size ("", args_size);
+	dwarf2out_args_size (args_size);
       return;
     }
   else if (BARRIER_P (insn))
@@ -1739,8 +1645,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
   if (offset == 0)
     return;
 
-  label = dwarf2out_cfi_label (false);
-  dwarf2out_stack_adjust (offset, label);
+  dwarf2out_stack_adjust (offset);
 }
 
 /* We delay emitting a register save until either (a) we reach the end
@@ -1769,13 +1674,11 @@ struct GTY(()) reg_saved_in_data {
 static GTY(()) struct reg_saved_in_data regs_saved_in_regs[4];
 static GTY(()) size_t num_regs_saved_in_regs;
 
-static const char *last_reg_save_label;
-
 /* Add an entry to QUEUED_REG_SAVES saying that REG is now saved at
    SREG, or if SREG is NULL then it is saved at OFFSET to the CFA.  */
 
 static void
-queue_reg_save (const char *label, rtx reg, rtx sreg, HOST_WIDE_INT offset)
+queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset)
 {
   struct queued_reg_save *q;
 
@@ -1796,8 +1699,6 @@ queue_reg_save (const char *label, rtx r
   q->reg = reg;
   q->cfa_offset = offset;
   q->saved_reg = sreg;
-
-  last_reg_save_label = label;
 }
 
 /* Output all the entries in QUEUED_REG_SAVES.  */
@@ -1831,11 +1732,10 @@ dwarf2out_flush_queued_reg_saves (void)
 	sreg = DWARF_FRAME_REGNUM (REGNO (q->saved_reg));
       else
 	sreg = INVALID_REGNUM;
-      reg_save (last_reg_save_label, reg, sreg, q->cfa_offset);
+      reg_save (false, reg, sreg, q->cfa_offset);
     }
 
   queued_reg_saves = NULL;
-  last_reg_save_label = NULL;
 }
 
 /* Does INSN clobber any register which QUEUED_REG_SAVES lists a saved
@@ -1865,7 +1765,7 @@ clobbers_queued_reg_save (const_rtx insn
 /* Entry point for saving the first register into the second.  */
 
 void
-dwarf2out_reg_save_reg (const char *label, rtx reg, rtx sreg)
+dwarf2out_reg_save_reg (rtx reg, rtx sreg)
 {
   size_t i;
   unsigned int regno, sregno;
@@ -1883,7 +1783,7 @@ dwarf2out_reg_save_reg (const char *labe
 
   regno = DWARF_FRAME_REGNUM (REGNO (reg));
   sregno = DWARF_FRAME_REGNUM (REGNO (sreg));
-  reg_save (label, regno, sregno, 0);
+  reg_save (false, regno, sregno, 0);
 }
 
 /* What register, if any, is currently saved in REG?  */
@@ -1916,7 +1816,7 @@ static dw_cfa_location cfa_temp;
 /* A subroutine of dwarf2out_frame_debug, process a REG_DEF_CFA note.  */
 
 static void
-dwarf2out_frame_debug_def_cfa (rtx pat, const char *label)
+dwarf2out_frame_debug_def_cfa (rtx pat)
 {
   memset (&cfa, 0, sizeof (cfa));
 
@@ -1947,13 +1847,13 @@ dwarf2out_frame_debug_def_cfa (rtx pat, 
       gcc_unreachable ();
     }
 
-  def_cfa_1 (label, &cfa);
+  def_cfa_1 (false, &cfa);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_ADJUST_CFA note.  */
 
 static void
-dwarf2out_frame_debug_adjust_cfa (rtx pat, const char *label)
+dwarf2out_frame_debug_adjust_cfa (rtx pat)
 {
   rtx src, dest;
 
@@ -1978,13 +1878,13 @@ dwarf2out_frame_debug_adjust_cfa (rtx pa
   cfa.reg = REGNO (dest);
   gcc_assert (cfa.indirect == 0);
 
-  def_cfa_1 (label, &cfa);
+  def_cfa_1 (false, &cfa);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_OFFSET note.  */
 
 static void
-dwarf2out_frame_debug_cfa_offset (rtx set, const char *label)
+dwarf2out_frame_debug_cfa_offset (rtx set)
 {
   HOST_WIDE_INT offset;
   rtx src, addr, span;
@@ -2014,7 +1914,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
   /* ??? We'd like to use queue_reg_save, but we need to come up with
      a different flushing heuristic for epilogues.  */
   if (!span)
-    reg_save (label, DWARF_FRAME_REGNUM (REGNO (src)), INVALID_REGNUM, offset);
+    reg_save (false, DWARF_FRAME_REGNUM (REGNO (src)), INVALID_REGNUM, offset);
   else
     {
       /* We have a PARALLEL describing where the contents of SRC live.
@@ -2030,7 +1930,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
 	{
 	  rtx elem = XVECEXP (span, 0, par_index);
 
-	  reg_save (label, DWARF_FRAME_REGNUM (REGNO (elem)),
+	  reg_save (false, DWARF_FRAME_REGNUM (REGNO (elem)),
 		    INVALID_REGNUM, span_offset);
 	  span_offset += GET_MODE_SIZE (GET_MODE (elem));
 	}
@@ -2040,7 +1940,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_REGISTER note.  */
 
 static void
-dwarf2out_frame_debug_cfa_register (rtx set, const char *label)
+dwarf2out_frame_debug_cfa_register (rtx set)
 {
   rtx src, dest;
   unsigned sregno, dregno;
@@ -2057,13 +1957,13 @@ dwarf2out_frame_debug_cfa_register (rtx 
 
   /* ??? We'd like to use queue_reg_save, but we need to come up with
      a different flushing heuristic for epilogues.  */
-  reg_save (label, sregno, dregno, 0);
+  reg_save (false, sregno, dregno, 0);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_EXPRESSION note. */
 
 static void
-dwarf2out_frame_debug_cfa_expression (rtx set, const char *label)
+dwarf2out_frame_debug_cfa_expression (rtx set)
 {
   rtx src, dest, span;
   dw_cfi_ref cfi = new_cfi ();
@@ -2085,13 +1985,13 @@ dwarf2out_frame_debug_cfa_expression (rt
 
   /* ??? We'd like to use queue_reg_save, were the interface different,
      and, as above, we could manage flushing for epilogues.  */
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
 
 static void
-dwarf2out_frame_debug_cfa_restore (rtx reg, const char *label)
+dwarf2out_frame_debug_cfa_restore (rtx reg)
 {
   dw_cfi_ref cfi = new_cfi ();
   unsigned int regno = DWARF_FRAME_REGNUM (REGNO (reg));
@@ -2099,7 +1999,102 @@ dwarf2out_frame_debug_cfa_restore (rtx r
   cfi->dw_cfi_opc = (regno & ~0x3f ? DW_CFA_restore_extended : DW_CFA_restore);
   cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno;
 
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
+}
+
+/* Examine CFI and return true if a cfi label and set_loc are needed before
+   it.  Even when generating CFI assembler instructions, we still have to
+   add the cfi to the list so that lookup_cfa works later on.  When
+   -g2 and above we even need to force emitting of CFI labels and add
+   to list a DW_CFA_set_loc for convert_cfa_to_fb_loc_list purposes.
+   If we're generating DWARF3 output we use DW_OP_call_frame_cfa and
+   so don't use convert_cfa_to_fb_loc_list.  */
+
+static bool
+cfi_label_required_p (dw_cfi_ref cfi)
+{
+  if (!dwarf2out_do_cfi_asm ())
+    return true;
+
+  if (dwarf_version == 2
+      && debug_info_level > DINFO_LEVEL_TERSE
+      && (write_symbols == DWARF2_DEBUG
+	  || write_symbols == VMS_AND_DWARF2_DEBUG))
+    {
+      switch (cfi->dw_cfi_opc)
+	{
+	case DW_CFA_def_cfa_offset:
+	case DW_CFA_def_cfa_offset_sf:
+	case DW_CFA_def_cfa_register:
+	case DW_CFA_def_cfa:
+	case DW_CFA_def_cfa_sf:
+	case DW_CFA_def_cfa_expression:
+	case DW_CFA_restore_state:
+	  return true;
+	default:
+	  return false;
+	}
+    }
+  return false;
+}
+
+/* Walk the function, looking for NOTE_INSN_CFI notes.  Add the CFIs to the
+   function's FDE, adding CFI labels and set_loc/advance_loc opcodes as
+   necessary.  */
+static void
+add_cfis_to_fde (void)
+{
+  dw_fde_ref fde = current_fde ();
+  rtx insn, next;
+  /* We always start with a function_begin label.  */
+  bool first = false;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
+	/* Don't attempt to advance_loc4 between labels in different
+	   sections.  */
+	first = true;
+
+      if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_CFI)
+	{
+	  bool required = cfi_label_required_p (NOTE_CFI (insn));
+	  while (next && NOTE_P (next) && NOTE_KIND (next) == NOTE_INSN_CFI)
+	    {
+	      required |= cfi_label_required_p (NOTE_CFI (next));
+	      next = NEXT_INSN (next);
+	    }
+	  if (required)
+	    {
+	      int num = dwarf2out_cfi_label_num;
+	      const char *label = dwarf2out_cfi_label ();
+	      dw_cfi_ref xcfi;
+	      rtx tmp;
+
+	      label = xstrdup (label);
+
+	      /* Set the location counter to the new label.  */
+	      xcfi = new_cfi ();
+	      xcfi->dw_cfi_opc = (first ? DW_CFA_set_loc
+				  : DW_CFA_advance_loc4);
+	      xcfi->dw_cfi_oprnd1.dw_cfi_addr = label;
+	      VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, xcfi);
+
+	      tmp = emit_note_before (NOTE_INSN_CFI_LABEL, insn);
+	      NOTE_LABEL_NUMBER (tmp) = num;
+	    }
+
+	  do
+	    {
+	      VEC_safe_push (dw_cfi_ref, gc, fde->dw_fde_cfi, NOTE_CFI (insn));
+	      insn = NEXT_INSN (insn);
+	    }
+	  while (insn != next);
+	  first = false;
+	}
+    }
 }
 
 /* Record call frame debugging information for an expression EXPR,
@@ -2298,7 +2293,7 @@ dwarf2out_frame_debug_cfa_restore (rtx r
   	   cfa.reg == fde->drap_reg  */
 
 static void
-dwarf2out_frame_debug_expr (rtx expr, const char *label)
+dwarf2out_frame_debug_expr (rtx expr)
 {
   rtx src, dest, span;
   HOST_WIDE_INT offset;
@@ -2327,7 +2322,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	    if (GET_CODE (elem) == SET
 		&& MEM_P (SET_DEST (elem))
 		&& (RTX_FRAME_RELATED_P (elem) || par_index == 0))
-	      dwarf2out_frame_debug_expr (elem, label);
+	      dwarf2out_frame_debug_expr (elem);
 	  }
 
       for (par_index = 0; par_index < limit; par_index++)
@@ -2336,7 +2331,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	  if (GET_CODE (elem) == SET
 	      && (!MEM_P (SET_DEST (elem)) || GET_CODE (expr) == SEQUENCE)
 	      && (RTX_FRAME_RELATED_P (elem) || par_index == 0))
-	    dwarf2out_frame_debug_expr (elem, label);
+	    dwarf2out_frame_debug_expr (elem);
 	  else if (GET_CODE (elem) == SET
 		   && par_index != 0
 		   && !RTX_FRAME_RELATED_P (elem))
@@ -2346,7 +2341,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	      HOST_WIDE_INT offset = stack_adjust_offset (elem, args_size, 0);
 
 	      if (offset != 0)
-		dwarf2out_stack_adjust (offset, label);
+		dwarf2out_stack_adjust (offset);
 	    }
 	}
       return;
@@ -2406,7 +2401,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 			    && fde->drap_reg != INVALID_REGNUM
 			    && cfa.reg != REGNO (src));
 	      else
-		queue_reg_save (label, src, dest, 0);
+		queue_reg_save (src, dest, 0);
 	    }
 	  break;
 
@@ -2536,7 +2531,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	case UNSPEC:
 	case UNSPEC_VOLATILE:
 	  gcc_assert (targetm.dwarf_handle_frame_unspec);
-	  targetm.dwarf_handle_frame_unspec (label, expr, XINT (src, 1));
+	  targetm.dwarf_handle_frame_unspec (expr, XINT (src, 1));
 	  return;
 
 	  /* Rule 16 */
@@ -2565,7 +2560,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	  gcc_unreachable ();
 	}
 
-      def_cfa_1 (label, &cfa);
+      def_cfa_1 (false, &cfa);
       break;
 
     case MEM:
@@ -2721,15 +2716,15 @@ dwarf2out_frame_debug_expr (rtx expr, co
 
 		  fde->drap_reg_saved = 1;
 
-		  def_cfa_1 (label, &cfa_exp);
+		  def_cfa_1 (false, &cfa_exp);
 		  break;
                 }
 
 	      /* If the source register is exactly the CFA, assume
 		 we're saving SP like any other register; this happens
 		 on the ARM.  */
-	      def_cfa_1 (label, &cfa);
-	      queue_reg_save (label, stack_pointer_rtx, NULL_RTX, offset);
+	      def_cfa_1 (false, &cfa);
+	      queue_reg_save (stack_pointer_rtx, NULL_RTX, offset);
 	      break;
 	    }
 	  else
@@ -2745,17 +2740,17 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	      cfa.reg = REGNO (x);
 	      cfa.base_offset = offset;
 	      cfa.indirect = 1;
-	      def_cfa_1 (label, &cfa);
+	      def_cfa_1 (false, &cfa);
 	      break;
 	    }
 	}
 
-      def_cfa_1 (label, &cfa);
+      def_cfa_1 (false, &cfa);
       {
 	span = targetm.dwarf_register_span (src);
 
 	if (!span)
-	  queue_reg_save (label, src, NULL_RTX, offset);
+	  queue_reg_save (src, NULL_RTX, offset);
 	else
 	  {
 	    /* We have a PARALLEL describing where the contents of SRC
@@ -2772,7 +2767,7 @@ dwarf2out_frame_debug_expr (rtx expr, co
 	      {
 		rtx elem = XVECEXP (span, 0, par_index);
 
-		queue_reg_save (label, elem, NULL_RTX, span_offset);
+		queue_reg_save (elem, NULL_RTX, span_offset);
 		span_offset += GET_MODE_SIZE (GET_MODE (elem));
 	      }
 	  }
@@ -2794,7 +2789,6 @@ dwarf2out_frame_debug_expr (rtx expr, co
 void
 dwarf2out_frame_debug (rtx insn, bool after_p)
 {
-  const char *label;
   rtx note, n;
   bool handled_one = false;
 
@@ -2813,10 +2807,10 @@ dwarf2out_frame_debug (rtx insn, bool af
 	 is still used to save registers.  */
       if (!ACCUMULATE_OUTGOING_ARGS)
 	dwarf2out_notice_stack_adjust (insn, after_p);
+      cfi_insn = NULL;
       return;
     }
 
-  label = dwarf2out_cfi_label (false);
   any_cfis_emitted = false;
 
   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
@@ -2827,7 +2821,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	goto found;
 
       case REG_CFA_DEF_CFA:
-	dwarf2out_frame_debug_def_cfa (XEXP (note, 0), label);
+	dwarf2out_frame_debug_def_cfa (XEXP (note, 0));
 	handled_one = true;
 	break;
 
@@ -2839,7 +2833,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	    if (GET_CODE (n) == PARALLEL)
 	      n = XVECEXP (n, 0, 0);
 	  }
-	dwarf2out_frame_debug_adjust_cfa (n, label);
+	dwarf2out_frame_debug_adjust_cfa (n);
 	handled_one = true;
 	break;
 
@@ -2847,7 +2841,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_offset (n, label);
+	dwarf2out_frame_debug_cfa_offset (n);
 	handled_one = true;
 	break;
 
@@ -2859,7 +2853,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	    if (GET_CODE (n) == PARALLEL)
 	      n = XVECEXP (n, 0, 0);
 	  }
-	dwarf2out_frame_debug_cfa_register (n, label);
+	dwarf2out_frame_debug_cfa_register (n);
 	handled_one = true;
 	break;
 
@@ -2867,7 +2861,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	n = XEXP (note, 0);
 	if (n == NULL)
 	  n = single_set (insn);
-	dwarf2out_frame_debug_cfa_expression (n, label);
+	dwarf2out_frame_debug_cfa_expression (n);
 	handled_one = true;
 	break;
 
@@ -2880,7 +2874,7 @@ dwarf2out_frame_debug (rtx insn, bool af
 	      n = XVECEXP (n, 0, 0);
 	    n = XEXP (n, 0);
 	  }
-	dwarf2out_frame_debug_cfa_restore (n, label);
+	dwarf2out_frame_debug_cfa_restore (n);
 	handled_one = true;
 	break;
 
@@ -2906,18 +2900,20 @@ dwarf2out_frame_debug (rtx insn, bool af
     {
       if (any_cfis_emitted)
 	dwarf2out_flush_queued_reg_saves ();
+      cfi_insn = NULL;
       return;
     }
 
   insn = PATTERN (insn);
  found:
-  dwarf2out_frame_debug_expr (insn, label);
+  dwarf2out_frame_debug_expr (insn);
 
   /* Check again.  A parallel can save and update the same register.
      We could probably check just once, here, but this is safer than
      removing the check above.  */
   if (any_cfis_emitted || clobbers_queued_reg_save (insn))
     dwarf2out_flush_queued_reg_saves ();
+  cfi_insn = NULL;
 }
 
 /* Called once at the start of final to initialize some data for the
@@ -2926,7 +2922,6 @@ void
 dwarf2out_frame_debug_init (void)
 {
   size_t i;
-  rtx insn;
 
   /* Flush any queued register saves.  */
   dwarf2out_flush_queued_reg_saves ();
@@ -2936,6 +2931,7 @@ dwarf2out_frame_debug_init (void)
   gcc_assert (cfa.reg
 	      == (unsigned long)DWARF_FRAME_REGNUM (STACK_POINTER_REGNUM));
 
+  old_cfa = cfa;
   cfa.reg = STACK_POINTER_REGNUM;
   cfa_store = cfa;
   cfa_temp.reg = -1;
@@ -2947,7 +2943,15 @@ dwarf2out_frame_debug_init (void)
       regs_saved_in_regs[i].saved_in_reg = NULL_RTX;
     }
   num_regs_saved_in_regs = 0;
+}
+
+/* After the (optional) text prologue has been written, emit CFI insns
+   and update the FDE for frame-related instructions.  */
 
+void
+dwarf2out_frame_debug_after_prologue (void)
+{
+  rtx insn;
   if (barrier_args_size)
     {
       XDELETEVEC (barrier_args_size);
@@ -2973,6 +2977,7 @@ dwarf2out_frame_debug_init (void)
 	    case NOTE_INSN_CFA_RESTORE_STATE:
 	      cfi_insn = insn;
 	      dwarf2out_frame_debug_restore_state ();
+	      cfi_insn = NULL;
 	      break;
 	    }
 	  continue;
@@ -2999,12 +3004,15 @@ dwarf2out_frame_debug_init (void)
 	  )
 	dwarf2out_frame_debug (insn, true);
     }
+
+  add_cfis_to_fde ();
 }
 
 void
 dwarf2out_emit_cfi (dw_cfi_ref cfi)
 {
-  output_cfi_directive (cfi);
+  if (dwarf2out_do_cfi_asm ())
+    output_cfi_directive (cfi);
 }
 
 /* Determine if we need to save and restore CFI information around
@@ -3085,23 +3093,24 @@ dwarf2out_cfi_begin_epilogue (rtx insn)
   /* And emulate the state save.  */
   gcc_assert (!cfa_remember.in_use);
   cfa_remember = cfa;
+  old_cfa_remember = old_cfa;
   cfa_remember.in_use = 1;
 }
 
 /* A "subroutine" of dwarf2out_cfi_begin_epilogue.  Emit the restore
    required.  */
 
-void
+static void
 dwarf2out_frame_debug_restore_state (void)
 {
   dw_cfi_ref cfi = new_cfi ();
-  const char *label = dwarf2out_cfi_label (false);
 
   cfi->dw_cfi_opc = DW_CFA_restore_state;
-  add_fde_cfi (label, cfi);
+  add_fde_cfi (cfi);
 
   gcc_assert (cfa_remember.in_use);
   cfa = cfa_remember;
+  old_cfa = old_cfa_remember;
   cfa_remember.in_use = 0;
 }
 
@@ -4296,7 +4305,8 @@ dwarf2out_frame_init (void)
      sake of lookup_cfa.  */
 
   /* On entry, the Canonical Frame Address is at SP.  */
-  dwarf2out_def_cfa (NULL, STACK_POINTER_REGNUM, INCOMING_FRAME_SP_OFFSET);
+  old_cfa.reg = INVALID_REGNUM;
+  dwarf2out_def_cfa (true, STACK_POINTER_REGNUM, INCOMING_FRAME_SP_OFFSET);
 
   if (targetm.debug_unwind_info () == UI_DWARF2
       || targetm.except_unwind_info (&global_options) == UI_DWARF2)
@@ -4353,10 +4363,6 @@ dwarf2out_switch_text_section (void)
     }
   have_multiple_function_sections = true;
 
-  /* Reset the current label on switching text sections, so that we
-     don't attempt to advance_loc4 between labels in different sections.  */
-  fde->dw_fde_current_label = NULL;
-
   /* There is no need to mark used sections when not debugging.  */
   if (cold_text_section != NULL)
     dwarf2out_note_section_used ();
Index: gcc/dwarf2out.h
===================================================================
--- gcc.orig/dwarf2out.h
+++ gcc/dwarf2out.h
@@ -19,9 +19,41 @@ along with GCC; see the file COPYING3.  
 <http://www.gnu.org/licenses/>.  */
 
 struct dw_cfi_struct;
+/* In dwarf2out.c */
+/* Interface of the DWARF2 unwind info support.  */
+
+/* Generate a new label for the CFI info to refer to.  */
+
+extern void dwarf2out_maybe_emit_cfi_label (void);
+
+/* Entry point to update the canonical frame address (CFA).  */
+
+extern void dwarf2out_def_cfa (bool, unsigned, HOST_WIDE_INT);
+
+/* Add the CFI for saving a register window.  */
+
+extern void dwarf2out_window_save (void);
+
+/* Entry point for saving a register to the stack.  */
+
+extern void dwarf2out_reg_save (unsigned, HOST_WIDE_INT);
+
+/* Entry point for saving the return address in the stack.  */
+
+extern void dwarf2out_return_save (HOST_WIDE_INT);
+
+/* Entry point for saving the return address in a register.  */
+
+extern void dwarf2out_return_reg (unsigned);
+
+/* Entry point for saving the first register into the second.  */
+
+extern void dwarf2out_reg_save_reg (rtx, rtx);
+
 extern void dwarf2out_decl (tree);
 extern void dwarf2out_frame_debug (rtx, bool);
 extern void dwarf2out_frame_debug_init (void);
+extern void dwarf2out_frame_debug_after_prologue (void);
 extern void dwarf2out_emit_cfi (struct dw_cfi_struct *);
 extern void dwarf2out_flush_queued_reg_saves (void);
 
Index: gcc/final.c
===================================================================
--- gcc.orig/final.c
+++ gcc/final.c
@@ -1588,6 +1588,11 @@ final_start_function (rtx first ATTRIBUT
   /* First output the function prologue: code to set up the stack frame.  */
   targetm.asm_out.function_prologue (file, get_frame_size ());
 
+#if defined (HAVE_prologue)
+  if (dwarf2out_do_frame ())
+    dwarf2out_frame_debug_after_prologue ();
+#endif
+
   /* If the machine represents the prologue as RTL, the profiling code must
      be emitted when NOTE_INSN_PROLOGUE_END is scanned.  */
 #ifdef HAVE_prologue
Index: gcc/target.def
===================================================================
--- gcc.orig/target.def
+++ gcc/target.def
@@ -1792,7 +1792,7 @@ DEFHOOK
 DEFHOOK
 (dwarf_handle_frame_unspec,
  "",
- void, (const char *label, rtx pattern, int index), NULL)
+ void, (rtx pattern, int index), NULL)
 
 /* ??? Documenting this hook requires a GFDL license grant.  */
 DEFHOOK_UNDOC
Index: gcc/tree.h
===================================================================
--- gcc.orig/tree.h
+++ gcc/tree.h
@@ -5424,37 +5424,6 @@ extern tree tree_overlaps_hard_reg_set (
 #endif
 
 \f
-/* In dwarf2out.c */
-/* Interface of the DWARF2 unwind info support.  */
-
-/* Generate a new label for the CFI info to refer to.  */
-
-extern char *dwarf2out_cfi_label (bool);
-
-/* Entry point to update the canonical frame address (CFA).  */
-
-extern void dwarf2out_def_cfa (const char *, unsigned, HOST_WIDE_INT);
-
-/* Add the CFI for saving a register window.  */
-
-extern void dwarf2out_window_save (const char *);
-
-/* Entry point for saving a register to the stack.  */
-
-extern void dwarf2out_reg_save (const char *, unsigned, HOST_WIDE_INT);
-
-/* Entry point for saving the return address in the stack.  */
-
-extern void dwarf2out_return_save (const char *, HOST_WIDE_INT);
-
-/* Entry point for saving the return address in a register.  */
-
-extern void dwarf2out_return_reg (const char *, unsigned);
-
-/* Entry point for saving the first register into the second.  */
-
-extern void dwarf2out_reg_save_reg (const char *, rtx, rtx);
-
 /* In tree-inline.c  */
 
 /* The type of a set of already-visited pointers.  Functions for creating
Index: gcc/doc/tm.texi
===================================================================
--- gcc.orig/doc/tm.texi
+++ gcc/doc/tm.texi
@@ -3203,7 +3203,7 @@ someone decided it was a good idea to us
 terminate the stack backtrace.  New ports should avoid this.
 @end defmac
 
-@deftypefn {Target Hook} void TARGET_DWARF_HANDLE_FRAME_UNSPEC (const char *@var{label}, rtx @var{pattern}, int @var{index})
+@deftypefn {Target Hook} void TARGET_DWARF_HANDLE_FRAME_UNSPEC (rtx @var{pattern}, int @var{index})
 This target hook allows the backend to emit frame-related insns that
 contain UNSPECs or UNSPEC_VOLATILEs.  The DWARF 2 call frame debugging
 info engine will invoke it on insns of the form

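As an aside on the dw2cfg changes in the next attachment: the scan records a vector of CFI instructions at each save point, and later decides whether one recorded state can be turned into another by emitting only the missing tail of opcodes (`vec_is_prefix_of` / `append_extra_cfis`). A minimal standalone model of that idea, not GCC code -- plain ints stand in for `dw_cfi_ref` entries and (pointer, length) pairs for the GC vectors:

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if the first LEN1 entries of VEC2 match VEC1,
   i.e. VEC1 is a prefix of VEC2.  */
static int
cfi_vec_is_prefix_of (const int *vec1, size_t len1,
                      const int *vec2, size_t len2)
{
  size_t i;

  if (len1 > len2)
    return 0;
  for (i = 0; i < len1; i++)
    if (vec1[i] != vec2[i])
      return 0;
  return 1;
}

/* Number of trailing entries of FULL that must be re-emitted to
   advance a consumer from state PREFIX to state FULL; this is the
   work append_extra_cfis does via add_fde_cfi in the real patch.  */
static size_t
extra_cfi_count (const int *prefix, size_t plen,
                 const int *full, size_t flen)
{
  assert (cfi_vec_is_prefix_of (prefix, plen, full, flen));
  return flen - plen;
}
```

If the prefix check fails, the states are incompatible and the scan cannot simply continue from the recorded point with a few extra opcodes.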
[-- Attachment #5: 007-dw2cfg.diff --]
[-- Type: text/plain, Size: 42958 bytes --]

Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -465,12 +465,11 @@ static void initial_return_save (rtx);
 static HOST_WIDE_INT stack_adjust_offset (const_rtx, HOST_WIDE_INT,
 					  HOST_WIDE_INT);
 static void output_cfi (dw_cfi_ref, dw_fde_ref, int);
-static void output_cfi_directive (dw_cfi_ref);
+static void output_cfi_directive (FILE *, dw_cfi_ref);
 static void output_call_frame_info (int);
 static void dwarf2out_note_section_used (void);
 static bool clobbers_queued_reg_save (const_rtx);
 static void dwarf2out_frame_debug_expr (rtx);
-static void dwarf2out_cfi_begin_epilogue (rtx);
 static void dwarf2out_frame_debug_restore_state (void);
 
 /* Support for complex CFA locations.  */
@@ -823,9 +822,6 @@ new_cfi (void)
 /* The insn after which a new CFI note should be emitted.  */
 static rtx cfi_insn;
 
-/* True if remember_state should be emitted before following CFI directive.  */
-static bool emit_cfa_remember;
-
 /* True if any CFI directives were emitted at the current insn.  */
 static bool any_cfis_emitted;
 
@@ -868,28 +864,34 @@ dwarf2out_maybe_emit_cfi_label (void)
     }
 }
 
+static void
+add_cfa_remember (void)
+{
+  dw_cfi_ref cfi_remember;
+
+  /* Emit the state save.  */
+  cfi_remember = new_cfi ();
+  cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
+  add_fde_cfi (cfi_remember);
+}
+
+/* Nonnull if add_fde_cfi should not just emit a NOTE_INSN_CFI, but
+   also add the CFI to this vector.  */
+static cfi_vec *cfi_insn_vec;
+
-/* Add CFI to the current fde at the PC value indicated by LABEL if specified,
-   or to the CIE if LABEL is NULL.  */
+/* Add CFI to the current fde, emitting it as a NOTE_INSN_CFI after
+   CFI_INSN if that is set, or adding it to the CIE if it is NULL.  */
 
 static void
 add_fde_cfi (dw_cfi_ref cfi)
 {
-  if (emit_cfa_remember)
-    {
-      dw_cfi_ref cfi_remember;
-
-      /* Emit the state save.  */
-      emit_cfa_remember = false;
-      cfi_remember = new_cfi ();
-      cfi_remember->dw_cfi_opc = DW_CFA_remember_state;
-      add_fde_cfi (cfi_remember);
-    }
-
   any_cfis_emitted = true;
   if (cfi_insn != NULL)
     {
       cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
       NOTE_CFI (cfi_insn) = cfi;
+      if (cfi_insn_vec != NULL)
+	VEC_safe_push (dw_cfi_ref, gc, *cfi_insn_vec, cfi);
     }
   else
     {
@@ -980,12 +982,6 @@ static dw_cfa_location old_cfa;
    from the CFA.  */
 static dw_cfa_location cfa_store;
 
-/* The current save location around an epilogue.  */
-static dw_cfa_location cfa_remember;
-
-/* Like cfa_remember, but a copy of old_cfa.  */
-static dw_cfa_location old_cfa_remember;
-
 /* The running total of the size of arguments pushed onto the stack.  */
 static HOST_WIDE_INT args_size;
 
@@ -1339,179 +1335,6 @@ stack_adjust_offset (const_rtx pattern, 
   return offset;
 }
 
-/* Precomputed args_size for CODE_LABELs and BARRIERs preceeding them,
-   indexed by INSN_UID.  */
-
-static HOST_WIDE_INT *barrier_args_size;
-
-/* Helper function for compute_barrier_args_size.  Handle one insn.  */
-
-static HOST_WIDE_INT
-compute_barrier_args_size_1 (rtx insn, HOST_WIDE_INT cur_args_size,
-			     VEC (rtx, heap) **next)
-{
-  HOST_WIDE_INT offset = 0;
-  int i;
-
-  if (! RTX_FRAME_RELATED_P (insn))
-    {
-      if (prologue_epilogue_contains (insn))
-	/* Nothing */;
-      else if (GET_CODE (PATTERN (insn)) == SET)
-	offset = stack_adjust_offset (PATTERN (insn), cur_args_size, 0);
-      else if (GET_CODE (PATTERN (insn)) == PARALLEL
-	       || GET_CODE (PATTERN (insn)) == SEQUENCE)
-	{
-	  /* There may be stack adjustments inside compound insns.  Search
-	     for them.  */
-	  for (i = XVECLEN (PATTERN (insn), 0) - 1; i >= 0; i--)
-	    if (GET_CODE (XVECEXP (PATTERN (insn), 0, i)) == SET)
-	      offset += stack_adjust_offset (XVECEXP (PATTERN (insn), 0, i),
-					     cur_args_size, offset);
-	}
-    }
-  else
-    {
-      rtx expr = find_reg_note (insn, REG_FRAME_RELATED_EXPR, NULL_RTX);
-
-      if (expr)
-	{
-	  expr = XEXP (expr, 0);
-	  if (GET_CODE (expr) == PARALLEL
-	      || GET_CODE (expr) == SEQUENCE)
-	    for (i = 1; i < XVECLEN (expr, 0); i++)
-	      {
-		rtx elem = XVECEXP (expr, 0, i);
-
-		if (GET_CODE (elem) == SET && !RTX_FRAME_RELATED_P (elem))
-		  offset += stack_adjust_offset (elem, cur_args_size, offset);
-	      }
-	}
-    }
-
-#ifndef STACK_GROWS_DOWNWARD
-  offset = -offset;
-#endif
-
-  cur_args_size += offset;
-  if (cur_args_size < 0)
-    cur_args_size = 0;
-
-  if (JUMP_P (insn))
-    {
-      rtx dest = JUMP_LABEL (insn);
-
-      if (dest)
-	{
-	  if (barrier_args_size [INSN_UID (dest)] < 0)
-	    {
-	      barrier_args_size [INSN_UID (dest)] = cur_args_size;
-	      VEC_safe_push (rtx, heap, *next, dest);
-	    }
-	}
-    }
-
-  return cur_args_size;
-}
-
-/* Walk the whole function and compute args_size on BARRIERs.  */
-
-static void
-compute_barrier_args_size (void)
-{
-  int max_uid = get_max_uid (), i;
-  rtx insn;
-  VEC (rtx, heap) *worklist, *next, *tmp;
-
-  barrier_args_size = XNEWVEC (HOST_WIDE_INT, max_uid);
-  for (i = 0; i < max_uid; i++)
-    barrier_args_size[i] = -1;
-
-  worklist = VEC_alloc (rtx, heap, 20);
-  next = VEC_alloc (rtx, heap, 20);
-  insn = get_insns ();
-  barrier_args_size[INSN_UID (insn)] = 0;
-  VEC_quick_push (rtx, worklist, insn);
-  for (;;)
-    {
-      while (!VEC_empty (rtx, worklist))
-	{
-	  rtx prev, body, first_insn;
-	  HOST_WIDE_INT cur_args_size;
-
-	  first_insn = insn = VEC_pop (rtx, worklist);
-	  cur_args_size = barrier_args_size[INSN_UID (insn)];
-	  prev = prev_nonnote_insn (insn);
-	  if (prev && BARRIER_P (prev))
-	    barrier_args_size[INSN_UID (prev)] = cur_args_size;
-
-	  for (; insn; insn = NEXT_INSN (insn))
-	    {
-	      if (INSN_DELETED_P (insn) || NOTE_P (insn))
-		continue;
-	      if (BARRIER_P (insn))
-		break;
-
-	      if (LABEL_P (insn))
-		{
-		  if (insn == first_insn)
-		    continue;
-		  else if (barrier_args_size[INSN_UID (insn)] < 0)
-		    {
-		      barrier_args_size[INSN_UID (insn)] = cur_args_size;
-		      continue;
-		    }
-		  else
-		    {
-		      /* The insns starting with this label have been
-			 already scanned or are in the worklist.  */
-		      break;
-		    }
-		}
-
-	      body = PATTERN (insn);
-	      if (GET_CODE (body) == SEQUENCE)
-		{
-		  HOST_WIDE_INT dest_args_size = cur_args_size;
-		  for (i = 1; i < XVECLEN (body, 0); i++)
-		    if (INSN_ANNULLED_BRANCH_P (XVECEXP (body, 0, 0))
-			&& INSN_FROM_TARGET_P (XVECEXP (body, 0, i)))
-		      dest_args_size
-			= compute_barrier_args_size_1 (XVECEXP (body, 0, i),
-						       dest_args_size, &next);
-		    else
-		      cur_args_size
-			= compute_barrier_args_size_1 (XVECEXP (body, 0, i),
-						       cur_args_size, &next);
-
-		  if (INSN_ANNULLED_BRANCH_P (XVECEXP (body, 0, 0)))
-		    compute_barrier_args_size_1 (XVECEXP (body, 0, 0),
-						 dest_args_size, &next);
-		  else
-		    cur_args_size
-		      = compute_barrier_args_size_1 (XVECEXP (body, 0, 0),
-						     cur_args_size, &next);
-		}
-	      else
-		cur_args_size
-		  = compute_barrier_args_size_1 (insn, cur_args_size, &next);
-	    }
-	}
-
-      if (VEC_empty (rtx, next))
-	break;
-
-      /* Swap WORKLIST with NEXT and truncate NEXT for next iteration.  */
-      tmp = next;
-      next = worklist;
-      worklist = tmp;
-      VEC_truncate (rtx, next, 0);
-    }
-
-  VEC_free (rtx, heap, worklist);
-  VEC_free (rtx, heap, next);
-}
-
 /* Add a CFI to update the running total of the size of arguments
    pushed onto the stack.  */
 
@@ -1608,25 +1431,7 @@ dwarf2out_notice_stack_adjust (rtx insn,
       return;
     }
   else if (BARRIER_P (insn))
-    {
-      /* Don't call compute_barrier_args_size () if the only
-	 BARRIER is at the end of function.  */
-      if (barrier_args_size == NULL && next_nonnote_insn (insn))
-	compute_barrier_args_size ();
-      if (barrier_args_size == NULL)
-	offset = 0;
-      else
-	{
-	  offset = barrier_args_size[INSN_UID (insn)];
-	  if (offset < 0)
-	    offset = 0;
-	}
-
-      offset -= args_size;
-#ifndef STACK_GROWS_DOWNWARD
-      offset = -offset;
-#endif
-    }
+    return;
   else if (GET_CODE (PATTERN (insn)) == SET)
     offset = stack_adjust_offset (PATTERN (insn), args_size, 0);
   else if (GET_CODE (PATTERN (insn)) == PARALLEL
@@ -2054,9 +1859,12 @@ add_cfis_to_fde (void)
       next = NEXT_INSN (insn);
 
       if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)
-	/* Don't attempt to advance_loc4 between labels in different
-	   sections.  */
-	first = true;
+	{
+	  fde->dw_fde_switch_cfi_index = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
+	  /* Don't attempt to advance_loc4 between labels in different
+	     sections.  */
+	  first = true;
+	}
 
       if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_CFI)
 	{
@@ -2097,6 +1905,17 @@ add_cfis_to_fde (void)
     }
 }
 
+/* A subroutine of dwarf2out_frame_debug_init; emit a DW_CFA_restore_state.  */
+
+void
+dwarf2out_frame_debug_restore_state (void)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi->dw_cfi_opc = DW_CFA_restore_state;
+  add_fde_cfi (cfi);
+}
+
 /* Record call frame debugging information for an expression EXPR,
    which either sets SP or FP (adjusting how we calculate the frame
    address) or saves a register to the stack or another register.
@@ -2797,9 +2616,6 @@ dwarf2out_frame_debug (rtx insn, bool af
   else
     cfi_insn = PREV_INSN (insn);
 
-  if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn))
-    dwarf2out_flush_queued_reg_saves ();
-
   if (!RTX_FRAME_RELATED_P (insn))
     {
       /* ??? This should be done unconditionally since stack adjustments
@@ -2945,56 +2761,224 @@ dwarf2out_frame_debug_init (void)
   num_regs_saved_in_regs = 0;
 }
 
-/* After the (optional) text prologue has been written, emit CFI insns
-   and update the FDE for frame-related instructions.  */
+/* Copy a CFI vector, except for args_size opcodes.  */
+static cfi_vec
+copy_cfi_vec_parts (cfi_vec in_vec)
+{
+  int length = VEC_length (dw_cfi_ref, in_vec);
+  /* Ensure we always have a pointer to a vector, not just NULL.  */
+  cfi_vec new_vec = VEC_alloc (dw_cfi_ref, gc, length > 0 ? length : 1);
+  int i;
+  for (i = 0; i < length; i++)
+    {
+      dw_cfi_ref elt = VEC_index (dw_cfi_ref, in_vec, i);
+      if (elt->dw_cfi_opc == DW_CFA_GNU_args_size)
+	continue;
 
-void
-dwarf2out_frame_debug_after_prologue (void)
+      VEC_quick_push (dw_cfi_ref, new_vec, elt);
+    }
+  return new_vec;
+}
+
+/* Record the state of the CFI program at a point in the program.  */
+typedef struct
 {
-  rtx insn;
-  if (barrier_args_size)
+  /* The CFI instructions up to this point.  */
+  cfi_vec cfis;
+  /* Copies of the global variables with the same name.  */
+  dw_cfa_location cfa, cfa_store, old_cfa;
+  /* True if we have seen this point during a scan in scan_until_barrier.  */
+  bool visited;
+  /* True if this point was used as a starting point for such a scan.  */
+  bool used_as_start;
+  /* Other than CFI instructions and CFA state, the only thing necessary to
+     be tracked is the argument size.  */
+  int args_size;
+  /* Nonzero for states that must be remembered and restored.  If higher
+     than one, each restore but the last will be immediately followed by
+     another remember.  */
+  int n_restores;
+} jump_target_info;
+
+/* Return true if we'll want to save or restore CFI state at INSN.  This is
+   true for labels and barriers, and certain notes.  */
+static bool
+save_point_p (rtx insn)
+{
+  return (BARRIER_P (insn) || LABEL_P (insn)
+	  || (NOTE_P (insn)
+	      && (NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG
+		  || NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END
+		  || NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS)));
+}
+
+/* Save the current state in INFO.  */
+
+static void
+record_current_state (jump_target_info *info)
+{
+  info->cfis = copy_cfi_vec_parts (*cfi_insn_vec);
+  info->args_size = args_size;
+  info->cfa = cfa;
+  info->old_cfa = old_cfa;
+  info->cfa_store = cfa_store;
+}
+
+/* LABEL is the target of a jump we encountered while scanning the
+   function.  Record it in START_POINTS as a potential new starting point
+   for the scan, unless we've visited it before.  UID_LUID gives a
+   mapping for uids used to index INFO, which holds the CFI
+   information for labels and barriers.  */
+static void
+maybe_record_jump_target (rtx label, VEC (rtx, heap) **start_points,
+			  int *uid_luid, jump_target_info *info)
+{
+  int uid;
+
+  if (GET_CODE (label) == LABEL_REF)
+    label = XEXP (label, 0);
+  gcc_assert (LABEL_P (label));
+  uid = INSN_UID (label);
+  info += uid_luid[uid];
+  if (info->visited || info->cfis)
+    return;
+
+  if (dump_file)
+    fprintf (dump_file, "recording label %d as possible jump target\n", uid);
+
+  VEC_safe_push (rtx, heap, *start_points, label);
+  record_current_state (info);
+}
+
+/* Return true if VEC1 and VEC2 are identical up to the length of VEC1.  */
+static bool
+vec_is_prefix_of (cfi_vec vec1, cfi_vec vec2)
+{
+  int i;
+  int len1 = VEC_length (dw_cfi_ref, vec1);
+  int len2 = VEC_length (dw_cfi_ref, vec2);
+  if (len1 > len2)
+    return false;
+  for (i = 0; i < len1; i++)
+    if (VEC_index (dw_cfi_ref, vec1, i) != VEC_index (dw_cfi_ref, vec2, i))
+      return false;
+  return true;
+}
+
+/* Append entries to FDE's cfi vector.  PREFIX and FULL are two
+   existing vectors, where PREFIX is contained in FULL as a prefix.  */
+
+static void
+append_extra_cfis (cfi_vec prefix, cfi_vec full)
+{
+  int i;
+  int len = VEC_length (dw_cfi_ref, full);
+  int prefix_len = VEC_length (dw_cfi_ref, prefix);
+
+  gcc_assert (prefix_len <= len);
+  for (i = 0; i < prefix_len; i++)
     {
-      XDELETEVEC (barrier_args_size);
-      barrier_args_size = NULL;
+      dw_cfi_ref elt, elt2;
+
+      elt = VEC_index (dw_cfi_ref, full, i);
+      elt2 = VEC_index (dw_cfi_ref, prefix, i);
+      gcc_assert (elt == elt2);
     }
-  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+  for (; i < len; i++)
     {
-      rtx pat;
-      if (BARRIER_P (insn))
-	{
-	  dwarf2out_frame_debug (insn, false);
-	  continue;
-	}
-      else if (NOTE_P (insn))
+      dw_cfi_ref elt = VEC_index (dw_cfi_ref, full, i);
+      add_fde_cfi (elt);
+    }
+}
+
+extern void debug_cfi_vec (FILE *, cfi_vec v);
+void debug_cfi_vec (FILE *f, cfi_vec v)
+{
+  int ix;
+  dw_cfi_ref cfi;
+
+  FOR_EACH_VEC_ELT (dw_cfi_ref, v, ix, cfi)
+    output_cfi_directive (f, cfi);
+}
+
+static bool
+switch_note_p (rtx insn)
+{
+  return NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS;
+}
+
+/* From the current starting point in INSN, scan forwards until we hit a
+   barrier, the end of the function, or a label we've previously used as
+   a starting point.
+   UID_LUID is a mapping to linear uids used to map an insn to an entry in
+   POINT_INFO, if save_point_p is true for a given insn.  */
+
+static void
+scan_until_barrier (rtx insn, jump_target_info *point_info, int *uid_luid,
+		    VEC (rtx, heap) **start_points)
+{
+  rtx next;
+  for (; insn != NULL_RTX; insn = next)
+    {
+      int uid = INSN_UID (insn);
+      rtx pat, note;
+
+      next = NEXT_INSN (insn);
+      if (save_point_p (insn))
 	{
-	  switch (NOTE_KIND (insn))
-	    {
-	    case NOTE_INSN_EPILOGUE_BEG:
-#if defined (HAVE_epilogue)
-	      dwarf2out_cfi_begin_epilogue (insn);
-#endif
+	  int luid = uid_luid[uid];
+	  jump_target_info *info = point_info + luid;
+	  if (info->used_as_start)
+	    {
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Stopping scan at insn %d; previously reached\n",
+			 uid);
 	      break;
-	    case NOTE_INSN_CFA_RESTORE_STATE:
-	      cfi_insn = insn;
-	      dwarf2out_frame_debug_restore_state ();
-	      cfi_insn = NULL;
+	    }
+	  info->visited = true;
+	  if (BARRIER_P (insn))
+	    gcc_assert (info->cfis == NULL);
+	  if (switch_note_p (insn))
+	    {
+	      /* Don't record the state; it was set to a clean slate in
+		 the caller.  */
+	      if (dump_file)
+		fprintf (dump_file,
+			 "Stopping scan at text section switch %d\n", uid);
+	      break;
+	    }
+	  record_current_state (info);
+	  if (BARRIER_P (insn))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Stopping scan at barrier %d\n", uid);
 	      break;
 	    }
-	  continue;
 	}
+
       if (!NONDEBUG_INSN_P (insn))
 	continue;
       pat = PATTERN (insn);
       if (asm_noperands (pat) >= 0)
 	continue;
+
       if (GET_CODE (pat) == SEQUENCE)
 	{
-	  int j;
-	  for (j = 1; j < XVECLEN (pat, 0); j++)
-	    dwarf2out_frame_debug (XVECEXP (pat, 0, j), false);
+	  int i;
+	  for (i = 1; i < XVECLEN (pat, 0); i++)
+	    dwarf2out_frame_debug (XVECEXP (pat, 0, i), false);
 	  insn = XVECEXP (pat, 0, 0);
 	}
 
+      if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn)
+	  || (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END))
+	{
+	  cfi_insn = PREV_INSN (insn);
+	  dwarf2out_flush_queued_reg_saves ();
+	  cfi_insn = NULL_RTX;
+	}
+
       if (CALL_P (insn) && dwarf2out_do_frame ())
 	dwarf2out_frame_debug (insn, false);
       if (dwarf2out_do_frame ()
@@ -3003,115 +2987,463 @@ dwarf2out_frame_debug_after_prologue (vo
 #endif
 	  )
 	dwarf2out_frame_debug (insn, true);
-    }
 
-  add_cfis_to_fde ();
+      if (JUMP_P (insn))
+	{
+	  rtx label = JUMP_LABEL (insn);
+	  if (label)
+	    {
+	      rtx table_insn = next_real_insn (label);
+	      if (table_insn != NULL_RTX && addr_vec_p (table_insn))
+		{
+		  int i;
+		  rtx table = PATTERN (table_insn);
+		  int eltnum = GET_CODE (table) == ADDR_DIFF_VEC ? 1 : 0;
+
+		  for (i = 0; i < XVECLEN (table, eltnum); i++)
+		    maybe_record_jump_target (XVECEXP (table, eltnum, i),
+					      start_points, uid_luid,
+					      point_info);
+		}
+	      else
+		maybe_record_jump_target (label, start_points, uid_luid,
+					  point_info);
+	    }
+	}
+      note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
+      if (note)
+	{
+	  eh_landing_pad lp;
+
+	  lp = get_eh_landing_pad_from_rtx (insn);
+	  if (lp)
+	    maybe_record_jump_target (lp->landing_pad, start_points,
+				      uid_luid, point_info);
+	}
+    }
 }
 
-void
-dwarf2out_emit_cfi (dw_cfi_ref cfi)
+/* A subroutine of dwarf2out_frame_debug_after_prologue.  Given the vector
+   of potential starting points in *START_POINTS, pick the best one to
+   use for the next scan.  Return NULL_RTX if there's nothing left to
+   scan.
+   UID_LUID and START_POINTS are as in scan_until_barrier.  */
+
+static rtx
+find_best_starting_point (jump_target_info *point_info, int *uid_luid,
+			  VEC (rtx, heap) **start_points)
 {
-  if (dwarf2out_do_cfi_asm ())
-    output_cfi_directive (cfi);
+  int i;
+  rtx insn;
+  int best_idx;
+  bool best_has_barrier;
+  jump_target_info *restart_info;
+
+  FOR_EACH_VEC_ELT_REVERSE (rtx, *start_points, i, insn)
+    {
+      restart_info = point_info + uid_luid[INSN_UID (insn)];
+      if (restart_info->visited)
+	VEC_ordered_remove (rtx, *start_points, i);
+    }
+
+  best_idx = -1;
+  best_has_barrier = false;
+  FOR_EACH_VEC_ELT (rtx, *start_points, i, insn)
+    {
+      rtx prev;
+      bool this_has_barrier;
+
+      restart_info = point_info + uid_luid[INSN_UID (insn)];
+      prev = prev_nonnote_nondebug_insn (insn);
+      this_has_barrier = (prev
+			  && (BARRIER_P (prev) || switch_note_p (prev)));
+      if (best_idx < 0
+	  || (!best_has_barrier && this_has_barrier))
+	{
+	  best_idx = i;
+	  best_has_barrier = this_has_barrier;
+	}
+    }
+
+  if (best_idx < 0)
+    {
+      rtx link;
+      for (link = forced_labels; link; link = XEXP (link, 1))
+	{
+	  insn = XEXP (link, 0);
+	  restart_info = point_info + uid_luid[INSN_UID (insn)];
+	  if (!restart_info->visited)
+	    return insn;
+	}
+      return NULL_RTX;
+    }
+  insn = VEC_index (rtx, *start_points, best_idx);
+  VEC_ordered_remove (rtx, *start_points, best_idx);
+  return insn;
 }
 
-/* Determine if we need to save and restore CFI information around
-   this epilogue.  If we do need to save/restore, then emit the save
-   now, and insert a NOTE_INSN_CFA_RESTORE_STATE at the appropriate
-   place in the stream.  */
+/* After the (optional) text prologue has been written, emit CFI insns
+   and update the FDE for frame-related instructions.  */
 
 void
-dwarf2out_cfi_begin_epilogue (rtx insn)
+dwarf2out_frame_debug_after_prologue (void)
 {
-  bool saw_frp = false;
-  rtx i;
+  int max_uid = get_max_uid ();
+  int i, n_saves_restores, prologue_end_point, switch_note_point;
+  rtx insn, save_point;
+  VEC (rtx, heap) *start_points;
+  int n_points;
+  int *uid_luid;
+  bool remember_needed;
+  jump_target_info *point_info, *save_point_info;
+  cfi_vec current_vec;
+
+  n_points = 0;
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (save_point_p (insn))
+      n_points++;
+  uid_luid = XCNEWVEC (int, max_uid);
+  n_points = 0;
+  prologue_end_point = -1;
+  switch_note_point = -1;
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (save_point_p (insn))
+      {
+	if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END)
+	  prologue_end_point = n_points;
+	else if (switch_note_p (insn))
+	  switch_note_point = n_points;
+	uid_luid[INSN_UID (insn)] = n_points++;
+      }
+
+  point_info = XCNEWVEC (jump_target_info, n_points);
+  for (i = 0; i < n_points; i++)
+    point_info[i].args_size = -1;
+
+  start_points = VEC_alloc (rtx, heap, 20);
+  insn = get_insns ();
+  current_vec = VEC_alloc (dw_cfi_ref, gc, 10);
+
+  /* At a NOTE_INSN_SWITCH_TEXT_SECTIONS we'll emit a cfi_startproc.
+     Ensure the state at this note reflects that.  */
+  if (switch_note_point != -1)
+    {
+      cfi_insn_vec = &current_vec;
+      record_current_state (point_info + switch_note_point);
+      cfi_insn_vec = NULL;
+    }
+  args_size = old_args_size = 0;
 
-  /* Scan forward to the return insn, noticing if there are possible
-     frame related insns.  */
-  for (i = NEXT_INSN (insn); i ; i = NEXT_INSN (i))
+  for (;;)
     {
-      if (!INSN_P (i))
-	continue;
+      HOST_WIDE_INT offset;
+      jump_target_info *restart_info;
 
-      /* Look for both regular and sibcalls to end the block.  Various
-	 optimization passes may cause us to jump to a common epilogue
-	 tail, so we also accept simplejumps.  */
-      if (returnjump_p (i) || simplejump_p (i))
-	break;
-      if (CALL_P (i) && SIBLING_CALL_P (i))
+      /* Scan the insns and emit NOTE_CFIs where necessary.  */
+      cfi_insn_vec = &current_vec;
+      scan_until_barrier (insn, point_info, uid_luid, &start_points);
+      cfi_insn_vec = NULL;
+
+      insn = find_best_starting_point (point_info, uid_luid, &start_points);
+
+      if (insn == NULL_RTX)
 	break;
 
-      if (GET_CODE (PATTERN (i)) == SEQUENCE)
-	{
-	  int idx;
-	  rtx seq = PATTERN (i);
+      if (dump_file)
+	fprintf (dump_file, "restarting scan at label %d", INSN_UID (insn));
 
-	  if (returnjump_p (XVECEXP (seq, 0, 0)))
-	    break;
-	  if (CALL_P (XVECEXP (seq, 0, 0))
-	      && SIBLING_CALL_P (XVECEXP (seq, 0, 0)))
-	    break;
+      restart_info = point_info + uid_luid[INSN_UID (insn)];
+      restart_info->visited = true;
+      restart_info->used_as_start = true;
+      /* If find_best_starting_point returned a forced label, use the
+	 state at the NOTE_INSN_PROLOGUE_END note.  */
+      if (restart_info->cfis == NULL)
+	{
+	  cfi_vec *v = &restart_info->cfis;
+	  gcc_assert (prologue_end_point != -1);
+	  restart_info = point_info + prologue_end_point;
+	  *v = copy_cfi_vec_parts (restart_info->cfis);
+	}
+
+      gcc_assert (LABEL_P (insn));
+      current_vec = copy_cfi_vec_parts (restart_info->cfis);
+      cfa = restart_info->cfa;
+      old_cfa = restart_info->old_cfa;
+      cfa_store = restart_info->cfa_store;
+      offset = restart_info->args_size;
+      if (offset >= 0)
+	{
+	  if (dump_file && offset != args_size)
+	    fprintf (dump_file, ", args_size " HOST_WIDE_INT_PRINT_DEC
+		     "  -> " HOST_WIDE_INT_PRINT_DEC,
+		     args_size, offset);
 
-	  for (idx = 0; idx < XVECLEN (seq, 0); idx++)
-	    if (RTX_FRAME_RELATED_P (XVECEXP (seq, 0, idx)))
-	      saw_frp = true;
+	  offset -= args_size;
+#ifndef STACK_GROWS_DOWNWARD
+	  offset = -offset;
+#endif
+	  if (offset != 0)
+	    {
+	      cfi_insn = prev_nonnote_nondebug_insn (insn);
+	      dwarf2out_stack_adjust (offset);
+	      cfi_insn = NULL_RTX;
+	    }
+	}
+      if (dump_file)
+	{
+	  fprintf (dump_file, "\n");
+	  if (dump_flags & TDF_DETAILS)
+	    debug_cfi_vec (dump_file, current_vec);
 	}
 
-      if (RTX_FRAME_RELATED_P (i))
-	saw_frp = true;
+      insn = NEXT_INSN (insn);
     }
 
-  /* If the port doesn't emit epilogue unwind info, we don't need a
-     save/restore pair.  */
-  if (!saw_frp)
-    return;
+  VEC_free (rtx, heap, start_points);
 
-  /* Otherwise, search forward to see if the return insn was the last
-     basic block of the function.  If so, we don't need save/restore.  */
-  gcc_assert (i != NULL);
-  i = next_real_insn (i);
-  if (i == NULL)
-    return;
+  /* Now splice the various CFI fragments together into a coherent whole.  */
 
-  /* Insert the restore before that next real insn in the stream, and before
-     a potential NOTE_INSN_EPILOGUE_BEG -- we do need these notes to be
-     properly nested.  This should be after any label or alignment.  This
-     will be pushed into the CFI stream by the function below.  */
-  while (1)
+  /* First, discover discontinuities, and where necessary search for suitable
+     remember/restore points.  */
+  save_point = NULL_RTX;
+  save_point_info = NULL;
+  n_saves_restores = 0;
+  for (insn = get_last_insn (); insn; insn = PREV_INSN (insn))
     {
-      rtx p = PREV_INSN (i);
-      if (!NOTE_P (p))
-	break;
-      if (NOTE_KIND (p) == NOTE_INSN_BASIC_BLOCK)
-	break;
-      i = p;
+      jump_target_info *info, *barrier_info, *candidate_info;
+      rtx prev;
+
+      if (insn == save_point)
+	{
+	  save_point = NULL_RTX;
+	  save_point_info = NULL;
+	  info = point_info + uid_luid[INSN_UID (insn)];
+	  info->n_restores = n_saves_restores;
+	  n_saves_restores = 0;
+	  if (dump_file)
+	    fprintf (dump_file, "finalize save point %d\n", INSN_UID (insn));
+	}
+
+      /* Look for labels that were used as starting points and are
+	 preceded by a BARRIER.  */
+      if (!LABEL_P (insn))
+	continue;
+
+      info = point_info + uid_luid[INSN_UID (insn)];
+      if (!info->used_as_start)
+	continue;
+      barrier_info = NULL;
+      for (prev = PREV_INSN (insn); prev; prev = PREV_INSN (prev))
+	{
+	  if (!BARRIER_P (prev) && !LABEL_P (prev))
+	    continue;
+	  barrier_info = point_info + uid_luid[INSN_UID (prev)];
+	  /* Skip through barriers we haven't visited; they may occur
+	     for things like jump tables.  */
+	  if ((BARRIER_P (prev) && barrier_info->visited)
+	      || (LABEL_P (prev) && barrier_info->used_as_start)
+	      || switch_note_p (prev))
+	    break;
+	}
+      if (!BARRIER_P (prev))
+	continue;
+
+      if (dump_file)
+	fprintf (dump_file, "State transition at barrier %d, label %d ... ",
+		 INSN_UID (prev), INSN_UID (insn));
+
+      /* If the state at the barrier can easily be transformed into the state
+	 at the label, we don't need save/restore points.  */
+      if (vec_is_prefix_of (barrier_info->cfis, info->cfis))
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "prefix\n");
+	  continue;
+	}
+
+      /* A save/restore is necessary.  Walk backwards to find the best
+	 save point.  First see if we know a save point already and if
+	 it's suitable.  */
+      n_saves_restores++;
+      if (save_point)
+	{
+	  prev = save_point;
+	  if (vec_is_prefix_of (save_point_info->cfis, info->cfis))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "reuse save point\n");
+	      continue;
+	    }
+	}
+
+      for (;;)
+	{
+	  prev = PREV_INSN (prev);
+	  /* We should eventually encounter the NOTE_INSN_FUNCTION_BEG,
+	     which must be a suitable save point for anything.  */
+	  gcc_assert (prev != NULL_RTX);
+
+	  if (!save_point_p (prev))
+	    continue;
+
+	  candidate_info = point_info + uid_luid[INSN_UID (prev)];
+	  /* We don't necessarily get to see this note during
+	     scanning.  Record an empty CFI vector for it so that it is
+	     usable as a restore point.  */
+	  if (switch_note_p (prev))
+	    {
+	      if (candidate_info->cfis == NULL)
+		candidate_info->cfis = VEC_alloc (dw_cfi_ref, gc, 1);
+	    }
+
+	  if (candidate_info->cfis != NULL
+	      && vec_is_prefix_of (candidate_info->cfis, info->cfis)
+	      && (save_point == NULL
+		  || vec_is_prefix_of (candidate_info->cfis,
+				       save_point_info->cfis)))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "save point %d\n", INSN_UID (prev));
+	      save_point = prev;
+	      save_point_info = candidate_info;
+	      break;
+	    }
+	}
     }
-  emit_note_before (NOTE_INSN_CFA_RESTORE_STATE, i);
 
-  emit_cfa_remember = true;
+  save_point = NULL_RTX;
+  save_point_info = NULL;
+  remember_needed = false;
+
+  /* This value is now used to distinguish between NOTE_CFIs added up
+     to now and those added by the next loop.  */
+  max_uid = get_max_uid ();
 
-  /* And emulate the state save.  */
-  gcc_assert (!cfa_remember.in_use);
-  cfa_remember = cfa;
-  old_cfa_remember = old_cfa;
-  cfa_remember.in_use = 1;
-}
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      jump_target_info *info;
 
-/* A "subroutine" of dwarf2out_cfi_begin_epilogue.  Emit the restore
-   required.  */
+      if (INSN_UID (insn) < max_uid
+	  && NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_CFI
+	  && remember_needed)
+	{
+	  cfi_insn = PREV_INSN (insn);
+	  add_cfa_remember ();
+	  cfi_insn = NULL_RTX;
+	  remember_needed = false;
+	}
 
-static void
-dwarf2out_frame_debug_restore_state (void)
-{
-  dw_cfi_ref cfi = new_cfi ();
+      if (!save_point_p (insn))
+	continue;
 
-  cfi->dw_cfi_opc = DW_CFA_restore_state;
-  add_fde_cfi (cfi);
+      cfi_insn = insn;
+      info = point_info + uid_luid[INSN_UID (insn)];
+
+      if (info->n_restores > 0)
+	{
+	  gcc_assert (save_point_info == NULL);
+	  save_point_info = info;
+	  remember_needed = true;
+	}
+      if (switch_note_p (insn))
+	{
+	  jump_target_info *label_info;
+	  rtx next = insn;
+
+	  cfi_insn = insn;
+	  if (remember_needed)
+	    add_cfa_remember ();
+	  remember_needed = false;
+
+	  /* Find the next label, and emit extra CFIs as necessary to
+	     achieve the correct state.  */
+	  do
+	    {
+	      if (LABEL_P (next))
+		{
+		  label_info = point_info + uid_luid[INSN_UID (next)];
+		  if (label_info->used_as_start)
+		    break;
+		}
+	      insn = next;
+	      next = NEXT_INSN (next);
+	    }
+	  while (next != NULL_RTX);
+	  if (next == NULL_RTX)
+	    break;
+	  append_extra_cfis (NULL, label_info->cfis);
+	  cfi_insn = NULL_RTX;
+	}
+      else if (BARRIER_P (insn))
+	{
+	  jump_target_info *label_info;
+	  cfi_vec new_cfi_vec;
+	  cfi_vec barrier_cfi = info->cfis;
+	  rtx next = insn;
+
+	  /* Find the start of the next sequence we processed.  */
+	  do
+	    {
+	      if (LABEL_P (next))
+		{
+		  label_info = point_info + uid_luid[INSN_UID (next)];
+		  if (label_info->used_as_start)
+		    break;
+		}
+	      if (switch_note_p (next))
+		break;
+	      insn = next;
+	      next = NEXT_INSN (next);
+	    }
+	  while (next != NULL_RTX);
+	  if (next == NULL_RTX)
+	    break;
+	  if (!LABEL_P (next))
+	    continue;
 
-  gcc_assert (cfa_remember.in_use);
-  cfa = cfa_remember;
-  old_cfa = old_cfa_remember;
-  cfa_remember.in_use = 0;
+	  /* Emit extra CFIs as necessary to achieve the correct state.  */
+	  new_cfi_vec = label_info->cfis;
+	  cfi_insn = next;
+	  if (vec_is_prefix_of (barrier_cfi, new_cfi_vec))
+	    {
+	      if (VEC_length (dw_cfi_ref, barrier_cfi)
+		  != VEC_length (dw_cfi_ref, new_cfi_vec))
+		{
+		  /* If the barrier was a point needing a restore, we must
+		     add the remember here as we ignore the newly added
+		     CFI notes.  */
+		  if (info->n_restores > 0)
+		    add_cfa_remember ();
+		  remember_needed = false;
+		  append_extra_cfis (barrier_cfi, new_cfi_vec);
+		}
+	    }
+	  else
+	    {
+	      save_point_info->n_restores--;
+	      dwarf2out_frame_debug_restore_state ();
+
+	      if (save_point_info->n_restores > 0)
+		add_cfa_remember ();
+	      gcc_assert (!remember_needed);
+	      append_extra_cfis (save_point_info->cfis, new_cfi_vec);
+	      if (save_point_info->n_restores == 0)
+		save_point_info = NULL;
+	    }
+	  cfi_insn = NULL_RTX;
+	}
+    }
+  free (uid_luid);
+  free (point_info);
+
+  add_cfis_to_fde ();
+}
+
+void
+dwarf2out_emit_cfi (dw_cfi_ref cfi)
+{
+  if (dwarf2out_do_cfi_asm ())
+    output_cfi_directive (asm_out_file, cfi);
 }
 
 /* Describe for the GTY machinery what parts of dw_cfi_oprnd1 are used.  */
@@ -3411,7 +3743,7 @@ output_cfi (dw_cfi_ref cfi, dw_fde_ref f
 /* Similar, but do it via assembler directives instead.  */
 
 static void
-output_cfi_directive (dw_cfi_ref cfi)
+output_cfi_directive (FILE *f, dw_cfi_ref cfi)
 {
   unsigned long r, r2;
 
@@ -3426,82 +3758,96 @@ output_cfi_directive (dw_cfi_ref cfi)
       /* Should only be created by add_fde_cfi in a code path not
 	 followed when emitting via directives.  The assembler is
 	 going to take care of this for us.  */
-      gcc_unreachable ();
+      if (f == asm_out_file)
+	gcc_unreachable ();
+      fprintf (f, "\t.cfi_advance_loc\n");
+      break;
 
     case DW_CFA_offset:
     case DW_CFA_offset_extended:
     case DW_CFA_offset_extended_sf:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_offset %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
+      fprintf (f, "\t.cfi_offset %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
 	       r, cfi->dw_cfi_oprnd2.dw_cfi_offset);
       break;
 
     case DW_CFA_restore:
     case DW_CFA_restore_extended:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_restore %lu\n", r);
+      fprintf (f, "\t.cfi_restore %lu\n", r);
       break;
 
     case DW_CFA_undefined:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_undefined %lu\n", r);
+      fprintf (f, "\t.cfi_undefined %lu\n", r);
       break;
 
     case DW_CFA_same_value:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_same_value %lu\n", r);
+      fprintf (f, "\t.cfi_same_value %lu\n", r);
       break;
 
     case DW_CFA_def_cfa:
     case DW_CFA_def_cfa_sf:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_def_cfa %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
+      fprintf (f, "\t.cfi_def_cfa %lu, "HOST_WIDE_INT_PRINT_DEC"\n",
 	       r, cfi->dw_cfi_oprnd2.dw_cfi_offset);
       break;
 
     case DW_CFA_def_cfa_register:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_def_cfa_register %lu\n", r);
+      fprintf (f, "\t.cfi_def_cfa_register %lu\n", r);
       break;
 
     case DW_CFA_register:
       r = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd1.dw_cfi_reg_num, 1);
       r2 = DWARF2_FRAME_REG_OUT (cfi->dw_cfi_oprnd2.dw_cfi_reg_num, 1);
-      fprintf (asm_out_file, "\t.cfi_register %lu, %lu\n", r, r2);
+      fprintf (f, "\t.cfi_register %lu, %lu\n", r, r2);
       break;
 
     case DW_CFA_def_cfa_offset:
     case DW_CFA_def_cfa_offset_sf:
-      fprintf (asm_out_file, "\t.cfi_def_cfa_offset "
+      fprintf (f, "\t.cfi_def_cfa_offset "
 	       HOST_WIDE_INT_PRINT_DEC"\n",
 	       cfi->dw_cfi_oprnd1.dw_cfi_offset);
       break;
 
     case DW_CFA_remember_state:
-      fprintf (asm_out_file, "\t.cfi_remember_state\n");
+      fprintf (f, "\t.cfi_remember_state\n");
       break;
     case DW_CFA_restore_state:
-      fprintf (asm_out_file, "\t.cfi_restore_state\n");
+      fprintf (f, "\t.cfi_restore_state\n");
       break;
 
     case DW_CFA_GNU_args_size:
-      fprintf (asm_out_file, "\t.cfi_escape %#x,", DW_CFA_GNU_args_size);
+      if (f != asm_out_file)
+	{
+	  fprintf (f, "\t.cfi_GNU_args_size "HOST_WIDE_INT_PRINT_DEC"\n",
+		   cfi->dw_cfi_oprnd1.dw_cfi_offset);
+	  break;
+	}
+      fprintf (f, "\t.cfi_escape %#x,", DW_CFA_GNU_args_size);
       dw2_asm_output_data_uleb128_raw (cfi->dw_cfi_oprnd1.dw_cfi_offset);
       if (flag_debug_asm)
-	fprintf (asm_out_file, "\t%s args_size "HOST_WIDE_INT_PRINT_DEC,
+	fprintf (f, "\t%s args_size "HOST_WIDE_INT_PRINT_DEC,
 		 ASM_COMMENT_START, cfi->dw_cfi_oprnd1.dw_cfi_offset);
-      fputc ('\n', asm_out_file);
+      fputc ('\n', f);
       break;
 
     case DW_CFA_GNU_window_save:
-      fprintf (asm_out_file, "\t.cfi_window_save\n");
+      fprintf (f, "\t.cfi_window_save\n");
       break;
 
     case DW_CFA_def_cfa_expression:
     case DW_CFA_expression:
-      fprintf (asm_out_file, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
+      if (f != asm_out_file)
+	{
+	  fprintf (f, "\t.cfi_cfa_{def_,}expression\n");
+	  break;
+	}
+      fprintf (f, "\t.cfi_escape %#x,", cfi->dw_cfi_opc);
       output_cfa_loc_raw (cfi);
-      fputc ('\n', asm_out_file);
+      fputc ('\n', f);
       break;
 
     default:
@@ -3510,14 +3856,11 @@ output_cfi_directive (dw_cfi_ref cfi)
 }
 
 /* Output CFIs from VEC, up to index UPTO, to bring current FDE to the
-   same state as after executing CFIs in CFI chain.  DO_CFI_ASM is
-   true if .cfi_* directives shall be emitted, false otherwise.  If it
-   is false, FDE and FOR_EH are the other arguments to pass to
-   output_cfi.  */
+   same state as after executing CFIs in CFI chain.  FDE and FOR_EH
+   are the other arguments to pass to output_cfi.  */
 
 static void
-output_cfis (cfi_vec vec, int upto, bool do_cfi_asm,
-	     dw_fde_ref fde, bool for_eh)
+output_cfis (cfi_vec vec, int upto, dw_fde_ref fde, bool for_eh)
 {
   int ix;
   struct dw_cfi_struct cfi_buf;
@@ -3611,12 +3954,7 @@ output_cfis (cfi_vec vec, int upto, bool
 	      if (cfi2 != NULL
 		  && cfi2->dw_cfi_opc != DW_CFA_restore
 		  && cfi2->dw_cfi_opc != DW_CFA_restore_extended)
-		{
-		  if (do_cfi_asm)
-		    output_cfi_directive (cfi2);
-		  else
-		    output_cfi (cfi2, fde, for_eh);
-		}
+		output_cfi (cfi2, fde, for_eh);
 	    }
 	  if (cfi_cfa && cfi_cfa_offset && cfi_cfa_offset != cfi_cfa)
 	    {
@@ -3645,30 +3983,20 @@ output_cfis (cfi_vec vec, int upto, bool
 	  else if (cfi_cfa_offset)
 	    cfi_cfa = cfi_cfa_offset;
 	  if (cfi_cfa)
-	    {
-	      if (do_cfi_asm)
-		output_cfi_directive (cfi_cfa);
-	      else
-		output_cfi (cfi_cfa, fde, for_eh);
-	    }
+	    output_cfi (cfi_cfa, fde, for_eh);
+
 	  cfi_cfa = NULL;
 	  cfi_cfa_offset = NULL;
 	  if (cfi_args_size
 	      && cfi_args_size->dw_cfi_oprnd1.dw_cfi_offset)
-	    {
-	      if (do_cfi_asm)
-		output_cfi_directive (cfi_args_size);
-	      else
-		output_cfi (cfi_args_size, fde, for_eh);
-	    }
+	    output_cfi (cfi_args_size, fde, for_eh);
+
 	  cfi_args_size = NULL;
 	  if (cfi == NULL)
 	    {
 	      VEC_free (dw_cfi_ref, heap, regs);
 	      return;
 	    }
-	  else if (do_cfi_asm)
-	    output_cfi_directive (cfi);
 	  else
 	    output_cfi (cfi, fde, for_eh);
 	  break;
@@ -3678,14 +4006,6 @@ output_cfis (cfi_vec vec, int upto, bool
     }
 }
 
-/* Like output_cfis, but emit all CFIs in the vector.  */
-static void
-output_all_cfis (cfi_vec vec, bool do_cfi_asm,
-		 dw_fde_ref fde, bool for_eh)
-{
-  output_cfis (vec, VEC_length (dw_cfi_ref, vec), do_cfi_asm, fde, for_eh);
-}
-
 /* Output one FDE.  */
 
 static void
@@ -3801,7 +4121,7 @@ output_fde (dw_fde_ref fde, bool for_eh,
       if (fde->dw_fde_switch_cfi_index > 0)
 	{
 	  from = fde->dw_fde_switch_cfi_index;
-	  output_cfis (fde->dw_fde_cfi, from, false, fde, for_eh);
+	  output_cfis (fde->dw_fde_cfi, from, fde, for_eh);
 	}
       for (i = from; i < until; i++)
 	output_cfi (VEC_index (dw_cfi_ref, fde->dw_fde_cfi, i),
@@ -4379,13 +4699,8 @@ dwarf2out_switch_text_section (void)
        || (cold_text_section && sect == cold_text_section));
 
   if (dwarf2out_do_cfi_asm ())
-    {
-      dwarf2out_do_cfi_startproc (true);
-      /* As this is a different FDE, insert all current CFI instructions
-	 again.  */
-      output_all_cfis (fde->dw_fde_cfi, true, fde, true);
-    }
-  fde->dw_fde_switch_cfi_index = VEC_length (dw_cfi_ref, fde->dw_fde_cfi);
+    dwarf2out_do_cfi_startproc (true);
+
   var_location_switch_text_section ();
 
   set_cur_line_info_table (sect);
@@ -5490,7 +5805,7 @@ output_loc_operands_raw (dw_loc_descr_re
 	dw2_asm_output_data_uleb128_raw (r);
       }
       break;
-      
+
     case DW_OP_constu:
     case DW_OP_plus_uconst:
     case DW_OP_piece:
@@ -12472,7 +12787,7 @@ output_one_line_info_table (dw_line_info
 	  dw2_asm_output_data (1, DW_LNS_set_prologue_end,
 			       "set prologue end");
 	  break;
-	  
+
 	case LI_set_epilogue_begin:
 	  dw2_asm_output_data (1, DW_LNS_set_epilogue_begin,
 			       "set epilogue begin");
@@ -14799,7 +15114,7 @@ static bool
 decl_by_reference_p (tree decl)
 {
   return ((TREE_CODE (decl) == PARM_DECL || TREE_CODE (decl) == RESULT_DECL
-  	   || TREE_CODE (decl) == VAR_DECL)
+	   || TREE_CODE (decl) == VAR_DECL)
 	  && DECL_BY_REFERENCE (decl));
 }
 
@@ -20724,7 +21039,7 @@ gen_type_die_with_usage (tree type, dw_d
       if (DECL_CONTEXT (TYPE_NAME (type))
 	  && TREE_CODE (DECL_CONTEXT (TYPE_NAME (type))) == NAMESPACE_DECL)
 	context_die = get_context_die (DECL_CONTEXT (TYPE_NAME (type)));
-      
+
       gen_decl_die (TYPE_NAME (type), NULL, context_die);
       return;
     }
@@ -21954,7 +22269,7 @@ gen_scheduled_generic_parms_dies (void)
 
   if (generic_type_instances == NULL)
     return;
-  
+
   FOR_EACH_VEC_ELT (tree, generic_type_instances, i, t)
     gen_generic_params_dies (t);
 }
@@ -23863,7 +24178,7 @@ dwarf2out_finish (const char *filename)
   if (!VEC_empty (pubname_entry, pubtype_table))
     {
       bool empty = false;
-      
+
       if (flag_eliminate_unused_debug_types)
 	{
 	  /* The pubtypes table might be emptied by pruning unused items.  */
Index: gcc/jump.c
===================================================================
--- gcc.orig/jump.c
+++ gcc/jump.c
@@ -709,6 +709,15 @@ comparison_dominates_p (enum rtx_code co
   return 0;
 }
 \f
+/* Return true if INSN is an ADDR_VEC or ADDR_DIFF_VEC.  */
+bool
+addr_vec_p (const_rtx insn)
+{
+  return (JUMP_P (insn)
+	  && (GET_CODE (PATTERN (insn)) == ADDR_VEC
+	      || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC));
+}
+
 /* Return 1 if INSN is an unconditional jump and nothing else.  */
 
 int
Index: gcc/rtl.h
===================================================================
--- gcc.orig/rtl.h
+++ gcc/rtl.h
@@ -2307,6 +2307,7 @@ extern int any_condjump_p (const_rtx);
 extern int any_uncondjump_p (const_rtx);
 extern rtx pc_set (const_rtx);
 extern rtx condjump_label (const_rtx);
+extern bool addr_vec_p (const_rtx);
 extern int simplejump_p (const_rtx);
 extern int returnjump_p (rtx);
 extern int eh_returnjump_p (rtx);


* Re: [PATCH 3/6] Allow jumps in epilogues
  2011-04-13 15:28                     ` Bernd Schmidt
  2011-04-13 14:44                       ` Richard Henderson
@ 2011-04-15 16:29                       ` Bernd Schmidt
  1 sibling, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-04-15 16:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 559 bytes --]

On 04/13/2011 02:38 PM, Bernd Schmidt wrote:
> This still requires the i386 output_set_got which I think I can cope
> with [...]

The patch below is to be applied on top of all the others. Beyond the
standard (fairly useless) regression tests, it is only lightly tested so
far: I compared generated assembly before/after, for -fpic with
-march=pentium and -march=core2, on i686-linux and i686-apple-darwin10
(for TARGET_MACHO).

I've not found or managed to create a testcase for making MI thunks
generate a call to output_set_got. I think it should work, but it's not
tested.


Bernd

[-- Attachment #2: i386-got.diff --]
[-- Type: text/plain, Size: 8998 bytes --]

	* config/i386/i386.c (output_set_got): Don't call
	dwarf2out_flush_queued_reg_saves.
	(ix86_reorg): Split set_got patterns.
	(i386_dwarf_handle_frame_unspec,
	i386_dwarf_flush_queued_register_saves): New static functions.
	(TARGET_DWARF_HANDLE_FRAME_UNSPEC,
	TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES): New macros.
	* config/i386/i386.md (set_got_call, set_got_pop, set_got_add):
	New patterns.
	* dwarf2out.c (scan_until_barrier): Use the target's
	dwarf_flush_queued_register_saves hook.
	* target.def (dwarf_flush_queued_register_saves): New hook.
	* doc/tm.texi: Regenerate.
Index: gcc/config/i386/i386.c
===================================================================
--- gcc.orig/config/i386/i386.c
+++ gcc/config/i386/i386.c
@@ -8953,6 +8953,8 @@ output_set_got (rtx dest, rtx label ATTR
 	output_asm_insn ("mov%z0\t{%2, %0|%0, %2}", xops);
       else
 	{
+	  /* For normal functions, this pattern is split, but we can still
+	     get here for thunks.  */
 	  output_asm_insn ("call\t%a2", xops);
 #ifdef DWARF2_UNWIND_INFO
 	  /* The call to next label acts as a push.  */
@@ -9010,12 +9012,6 @@ output_set_got (rtx dest, rtx label ATTR
       get_pc_thunk_name (name, REGNO (dest));
       pic_labels_used |= 1 << REGNO (dest);
 
-#ifdef DWARF2_UNWIND_INFO
-      /* Ensure all queued register saves are flushed before the
-	 call.  */
-      if (dwarf2out_do_frame ())
-	dwarf2out_flush_queued_reg_saves ();
-#endif
       xops[2] = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (name));
       xops[2] = gen_rtx_MEM (QImode, xops[2]);
       output_asm_insn ("call\t%X2", xops);
@@ -30458,6 +30454,56 @@ ix86_reorg (void)
      with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
   compute_bb_for_insn ();
 
+  /* Split any set_got patterns so that we interact correctly with
+     dwarf2out.  */
+  if (!TARGET_64BIT && !TARGET_VXWORKS_RTP && !TARGET_DEEP_BRANCH_PREDICTION
+      && flag_pic)
+    {
+      rtx insn, next;
+      for (insn = get_insns (); insn; insn = next)
+	{
+	  rtx pat, label, dest, cst, gotsym, new_insn;
+	  int icode;
+
+	  next = NEXT_INSN (insn);
+	  if (!NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  icode = recog_memoized (insn);
+	  if (icode != CODE_FOR_set_got && icode != CODE_FOR_set_got_labelled)
+	    continue;
+
+	  extract_insn (insn);
+	  if (icode == CODE_FOR_set_got)
+	    {
+	      label = gen_label_rtx ();
+	      cst = const0_rtx;
+	    }
+	  else
+	    {
+	      label = recog_data.operand[1];
+	      cst = const1_rtx;
+	    }
+
+	  dest = recog_data.operand[0];
+	  pat = gen_set_got_call (label, cst);
+	  new_insn = emit_insn_before (pat, insn);
+	  RTX_FRAME_RELATED_P (new_insn) = 1;
+	  RTX_FRAME_RELATED_P (XVECEXP (PATTERN (new_insn), 0, 1)) = 1;
+	  gotsym = gen_rtx_SYMBOL_REF (Pmode, GOT_SYMBOL_NAME);
+	  pat = gen_set_got_pop (dest, label);
+	  new_insn = emit_insn_before (pat, insn);
+	  RTX_FRAME_RELATED_P (new_insn) = 1;
+	  RTX_FRAME_RELATED_P (XVECEXP (PATTERN (new_insn), 0, 1)) = 1;
+	  if (!TARGET_MACHO)
+	    {
+	      pat = gen_set_got_add (dest, gotsym, label);
+	      new_insn = emit_insn_before (pat, insn);
+	    }
+	  delete_insn (insn);
+	}
+    }
+
   if (optimize && optimize_function_for_speed_p (cfun))
     {
       if (TARGET_PAD_SHORT_FUNCTION)
@@ -30475,6 +30521,30 @@ ix86_reorg (void)
     move_or_delete_vzeroupper ();
 }
 
+/* Handle the TARGET_DWARF_HANDLE_FRAME_UNSPEC hook.
+   This is called from dwarf2out.c to emit call frame instructions
+   for frame-related insns containing UNSPECs and UNSPEC_VOLATILEs.  */
+static void
+i386_dwarf_handle_frame_unspec (rtx pattern ATTRIBUTE_UNUSED,
+				int index ATTRIBUTE_UNUSED)
+{
+  gcc_assert (index == UNSPEC_SET_GOT);
+}
+
+/* Handle the TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES hook.
+   This is called from dwarf2out.c to decide whether all queued
+   register saves should be emitted before INSN.  */
+static bool
+i386_dwarf_flush_queued_register_saves (rtx insn)
+{
+  if (!TARGET_VXWORKS_RTP || !flag_pic)
+    {
+      int icode = recog_memoized (insn);
+      return (icode == CODE_FOR_set_got || icode == CODE_FOR_set_got_labelled);
+    }
+  return false;
+}
+
 /* Return nonzero when QImode register that must be represented via REX prefix
    is used.  */
 bool
@@ -35321,6 +35391,13 @@ ix86_autovectorize_vector_sizes (void)
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL i386_output_dwarf_dtprel
 #endif
 
+#undef TARGET_DWARF_HANDLE_FRAME_UNSPEC
+#define TARGET_DWARF_HANDLE_FRAME_UNSPEC i386_dwarf_handle_frame_unspec
+
+#undef TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES
+#define TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES \
+  i386_dwarf_flush_queued_register_saves
+
 #ifdef SUBTARGET_INSERT_ATTRIBUTES
 #undef TARGET_INSERT_ATTRIBUTES
 #define TARGET_INSERT_ATTRIBUTES SUBTARGET_INSERT_ATTRIBUTES
Index: gcc/config/i386/i386.md
===================================================================
--- gcc.orig/config/i386/i386.md
+++ gcc/config/i386/i386.md
@@ -11797,6 +11797,52 @@
   ""
   "ix86_expand_prologue (); DONE;")
 
+(define_insn "set_got_call"
+  [(unspec [(label_ref (match_operand 0 "" ""))
+	    (match_operand:SI 1 "const_int_operand" "")] UNSPEC_SET_GOT)
+   (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int -4)))]
+  "!TARGET_64BIT && !TARGET_VXWORKS_RTP && !TARGET_DEEP_BRANCH_PREDICTION
+   && flag_pic"
+{
+  output_asm_insn ("call\t%l0", operands);
+#if TARGET_MACHO
+  /* If this was for an unlabelled set_got instruction, output the Mach-O
+     "canonical" label name ("Lxx$pb") here too.  This is what will be
+     referenced by the Mach-O PIC subsystem.  */
+  if (operands[1] == const0_rtx)
+    ASM_OUTPUT_LABEL (asm_out_file, MACHOPIC_FUNCTION_BASE_NAME);
+#endif
+  targetm.asm_out.internal_label (asm_out_file, "L",
+				  CODE_LABEL_NUMBER (operands[0]));
+  return "";
+}
+  [(set_attr "type" "call")
+   (set_attr "length" "5")])
+
+(define_insn "set_got_pop"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec [(label_ref (match_operand 1 "" ""))] UNSPEC_SET_GOT))
+   (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int 4)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_64BIT && !TARGET_VXWORKS_RTP && !TARGET_DEEP_BRANCH_PREDICTION
+   && flag_pic"
+  "pop%z0\t%0"
+  [(set_attr "type" "multi")
+   (set_attr "length" "7")])
+
+(define_insn "set_got_add"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec [(match_dup 0)
+		 (match_operand 1 "" "")
+		 (label_ref (match_operand 2 "" ""))] UNSPEC_SET_GOT))
+   (set (reg:SI SP_REG) (plus:SI (reg:SI SP_REG) (const_int 4)))
+   (clobber (reg:CC FLAGS_REG))]
+  "!TARGET_64BIT && !TARGET_VXWORKS_RTP && !TARGET_DEEP_BRANCH_PREDICTION
+   && flag_pic"
+  "add%z0\t{%1+[.-%l2], %0|%0, %1+(.-%l2)}"
+  [(set_attr "type" "multi")
+   (set_attr "length" "7")])
+
 (define_insn "set_got"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(unspec:SI [(const_int 0)] UNSPEC_SET_GOT))
Index: gcc/doc/tm.texi
===================================================================
--- gcc.orig/doc/tm.texi
+++ gcc/doc/tm.texi
@@ -3204,6 +3204,10 @@ terminate the stack backtrace.  New port
 @end defmac
 
 @deftypefn {Target Hook} void TARGET_DWARF_HANDLE_FRAME_UNSPEC (rtx @var{pattern}, int @var{index})
+
+@deftypefn {Target Hook} bool TARGET_DWARF_FLUSH_QUEUED_REGISTER_SAVES (rtx @var{insn})
+This target hook allows the backend to force dwarf2out to flush queued register saves before an insn when generating unwind information.  It is called with an insn as its argument and should return true if register saves must be flushed.
+@end deftypefn
 This target hook allows the backend to emit frame-related insns that
 contain UNSPECs or UNSPEC_VOLATILEs.  The DWARF 2 call frame debugging
 info engine will invoke it on insns of the form
Index: gcc/dwarf2out.c
===================================================================
--- gcc.orig/dwarf2out.c
+++ gcc/dwarf2out.c
@@ -2972,7 +2972,9 @@ scan_until_barrier (rtx insn, jump_targe
 	}
 
       if (!NONJUMP_INSN_P (insn) || clobbers_queued_reg_save (insn)
-	  || (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END))
+	  || (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_PROLOGUE_END)
+	  || (targetm.dwarf_flush_queued_register_saves != NULL
+	      && targetm.dwarf_flush_queued_register_saves (insn)))
 	{
 	  cfi_insn = PREV_INSN (insn);
 	  dwarf2out_flush_queued_reg_saves ();
Index: gcc/target.def
===================================================================
--- gcc.orig/target.def
+++ gcc/target.def
@@ -1794,6 +1794,14 @@ DEFHOOK
  "",
  void, (rtx pattern, int index), NULL)
 
+DEFHOOK
+(dwarf_flush_queued_register_saves,
+"This target hook allows the backend to force dwarf2out to flush queued\
+ register saves before an insn when generating unwind information.  It\
+ is called with an insn as its argument and should return true if\
+ register saves must be flushed.",
+ bool, (rtx insn), NULL)
+
 /* ??? Documenting this hook requires a GFDL license grant.  */
 DEFHOOK_UNDOC
 (stdarg_optimize_hook,

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 2/6] Unique return rtx
  2011-03-31 13:23   ` Jeff Law
@ 2011-05-03 11:54     ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-05-03 11:54 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]

On 03/31/2011 03:17 PM, Jeff Law wrote:
> On 03/23/11 08:47, Bernd Schmidt wrote:
>> We'll start putting "return" into JUMP_LABELS in a subsequent patch, so
>> I've decided to make it unique as a small cleanup.
> 
>> There's already another macro called "return_rtx", so the new one goes
>> by the name of "ret_rtx".
> OK.

Thanks. Now committed in this form, slightly updated. I've verified that
all ports modified in this patch still build the gcc/ directory
successfully.


Bernd

[-- Attachment #2: unique-ret2.diff --]
[-- Type: text/plain, Size: 13381 bytes --]

Index: gengenrtl.c
===================================================================
--- gengenrtl.c	(revision 173297)
+++ gengenrtl.c	(working copy)
@@ -128,6 +128,9 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "REG") == 0
 	  || strcmp (defs[idx].enumname, "SUBREG") == 0
 	  || strcmp (defs[idx].enumname, "MEM") == 0
+	  || strcmp (defs[idx].enumname, "PC") == 0
+	  || strcmp (defs[idx].enumname, "CC0") == 0
+	  || strcmp (defs[idx].enumname, "RETURN") == 0
 	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
 }
 
Index: genemit.c
===================================================================
--- genemit.c	(revision 173297)
+++ genemit.c	(working copy)
@@ -166,6 +166,9 @@ gen_exp (rtx x, enum rtx_code subroutine
     case PC:
       printf ("pc_rtx");
       return;
+    case RETURN:
+      printf ("ret_rtx");
+      return;
     case CLOBBER:
       if (REG_P (XEXP (x, 0)))
 	{
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 173297)
+++ ChangeLog	(working copy)
@@ -1,3 +1,29 @@
+2011-05-03  Bernd Schmidt  <bernds@codesourcery.com>
+
+	* gengenrtl.c (special_rtx): PC, CC0 and RETURN are special.
+	* genemit.c (gen_exp): Handle RETURN.
+	* emit-rtl.c (verify_rtx_sharing): Likewise.
+	(init_emit_regs): Create pc_rtx, ret_rtx and cc0_rtx specially.
+	* rtl.c (copy_rtx): RETURN is shared.
+	* rtl.h (enum global_rtl_index): Add GR_RETURN.
+	(ret_rtx): New.
+	* jump.c (redirect_exp_1): Don't use gen_rtx_RETURN.
+	* config/s390/s390.c (s390_emit_epilogue): Likewise.
+	* config/rx/rx.c (gen_rx_rtsd_vector): Likewise.
+	* config/cris/cris.c (cris_expand_return): Likewise.
+	* config/m68k/m68k.c (m68k_expand_epilogue): Likewise.
+	* config/rs6000/rs6000.c (rs6000_make_savres_rtx,
+	rs6000_emit_epilogue, rs6000_output_mi_thunk): Likewise.
+	* config/picochip/picochip.c (picochip_expand_epilogue): Likewise.
+	* config/h8300/h8300.c (h8300_push_pop, h8300_expand_epilogue):
+	Likewise.
+	* config/v850/v850.c (expand_epilogue): Likewise.
+	* config/bfin/bfin.c (bfin_expand_call): Likewise.
+	* config/arm/arm.md (epilogue): Likewise.
+	* config/mn10300/mn10300.c (mn10300_expand_epilogue): Likewise.
+	* config/sparc/sparc.c (sparc_struct_value_rtx): Rename ret_rtx
+	variable to ret_reg.
+
 2011-05-03  Richard Guenther  <rguenther@suse.de>
 
 	* c-decl.c (grokdeclarator): Instead of looking at
Index: jump.c
===================================================================
--- jump.c	(revision 173297)
+++ jump.c	(working copy)
@@ -1367,7 +1367,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
 	  if (nlabel)
 	    n = gen_rtx_LABEL_REF (Pmode, nlabel);
 	  else
-	    n = gen_rtx_RETURN (VOIDmode);
+	    n = ret_rtx;
 
 	  validate_change (insn, loc, n, 1);
 	  return;
@@ -1378,7 +1378,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
       if (nlabel)
 	x = gen_rtx_LABEL_REF (Pmode, nlabel);
       else
-	x = gen_rtx_RETURN (VOIDmode);
+	x = ret_rtx;
       if (loc == &PATTERN (insn))
 	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
       validate_change (insn, loc, x, 1);
@@ -1389,7 +1389,7 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
       && GET_CODE (SET_SRC (x)) == LABEL_REF
       && XEXP (SET_SRC (x), 0) == olabel)
     {
-      validate_change (insn, loc, gen_rtx_RETURN (VOIDmode), 1);
+      validate_change (insn, loc, ret_rtx, 1);
       return;
     }
 
Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 173297)
+++ emit-rtl.c	(working copy)
@@ -2450,6 +2450,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
     case SCRATCH:
       return;
       /* SCRATCH must be shared because they represent distinct values.  */
@@ -5416,8 +5417,9 @@ init_emit_regs (void)
   init_reg_modes_target ();
 
   /* Assign register numbers to the globally defined register rtx.  */
-  pc_rtx = gen_rtx_PC (VOIDmode);
-  cc0_rtx = gen_rtx_CC0 (VOIDmode);
+  pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
+  ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
+  cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
   stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
   frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);
   hard_frame_pointer_rtx = gen_raw_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
Index: rtl.c
===================================================================
--- rtl.c	(revision 173297)
+++ rtl.c	(working copy)
@@ -255,6 +255,7 @@ copy_rtx (rtx orig)
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return orig;
Index: rtl.h
===================================================================
--- rtl.h	(revision 173297)
+++ rtl.h	(working copy)
@@ -2045,6 +2045,7 @@ enum global_rtl_index
 {
   GR_PC,
   GR_CC0,
+  GR_RETURN,
   GR_STACK_POINTER,
   GR_FRAME_POINTER,
 /* For register elimination to work properly these hard_frame_pointer_rtx,
@@ -2134,6 +2135,7 @@ extern struct target_rtl *this_target_rt
 
 /* Standard pieces of rtx, to be substituted directly into things.  */
 #define pc_rtx                  (global_rtl[GR_PC])
+#define ret_rtx                 (global_rtl[GR_RETURN])
 #define cc0_rtx                 (global_rtl[GR_CC0])
 
 /* All references to certain hard regs, except those created
Index: config/s390/s390.c
===================================================================
--- config/s390/s390.c	(revision 173297)
+++ config/s390/s390.c	(working copy)
@@ -8485,7 +8485,7 @@ s390_emit_epilogue (bool sibcall)
 
       p = rtvec_alloc (2);
 
-      RTVEC_ELT (p, 0) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (p, 0) = ret_rtx;
       RTVEC_ELT (p, 1) = gen_rtx_USE (VOIDmode, return_reg);
       emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
     }
Index: config/sparc/sparc.c
===================================================================
--- config/sparc/sparc.c	(revision 173297)
+++ config/sparc/sparc.c	(working copy)
@@ -6063,7 +6063,7 @@ sparc_struct_value_rtx (tree fndecl, int
 	  /* We must check and adjust the return address, as it is
 	     optional as to whether the return object is really
 	     provided.  */
-	  rtx ret_rtx = gen_rtx_REG (Pmode, 31);
+	  rtx ret_reg = gen_rtx_REG (Pmode, 31);
 	  rtx scratch = gen_reg_rtx (SImode);
 	  rtx endlab = gen_label_rtx ();
 
@@ -6080,12 +6080,12 @@ sparc_struct_value_rtx (tree fndecl, int
 	     it's an unimp instruction (the most significant 10 bits
 	     will be zero).  */
 	  emit_move_insn (scratch, gen_rtx_MEM (SImode,
-						plus_constant (ret_rtx, 8)));
+						plus_constant (ret_reg, 8)));
 	  /* Assume the size is valid and pre-adjust */
-	  emit_insn (gen_add3_insn (ret_rtx, ret_rtx, GEN_INT (4)));
+	  emit_insn (gen_add3_insn (ret_reg, ret_reg, GEN_INT (4)));
 	  emit_cmp_and_jump_insns (scratch, size_rtx, EQ, const0_rtx, SImode,
 				   0, endlab);
-	  emit_insn (gen_sub3_insn (ret_rtx, ret_rtx, GEN_INT (4)));
+	  emit_insn (gen_sub3_insn (ret_reg, ret_reg, GEN_INT (4)));
 	  /* Write the address of the memory pointed to by temp_val into
 	     the memory pointed to by mem */
 	  emit_move_insn (mem, XEXP (temp_val, 0));
Index: config/rx/rx.c
===================================================================
--- config/rx/rx.c	(revision 173297)
+++ config/rx/rx.c	(working copy)
@@ -1567,7 +1567,7 @@ gen_rx_rtsd_vector (unsigned int adjust,
 				: plus_constant (stack_pointer_rtx,
 						 i * UNITS_PER_WORD)));
 
-  XVECEXP (vector, 0, count - 1) = gen_rtx_RETURN (VOIDmode);
+  XVECEXP (vector, 0, count - 1) = ret_rtx;
 
   return vector;
 }
Index: config/cris/cris.c
===================================================================
--- config/cris/cris.c	(revision 173297)
+++ config/cris/cris.c	(working copy)
@@ -1790,7 +1790,7 @@ cris_expand_return (bool on_stack)
      we do that until they're fixed.  Currently, all return insns in a
      function must be the same (not really a limiting factor) so we need
      to check that it doesn't change half-way through.  */
-  emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+  emit_jump_insn (ret_rtx);
 
   CRIS_ASSERT (cfun->machine->return_type != CRIS_RETINSN_RET || !on_stack);
   CRIS_ASSERT (cfun->machine->return_type != CRIS_RETINSN_JUMP || on_stack);
Index: config/mn10300/mn10300.c
===================================================================
--- config/mn10300/mn10300.c	(revision 173297)
+++ config/mn10300/mn10300.c	(working copy)
@@ -1255,7 +1255,7 @@ mn10300_expand_epilogue (void)
 
   /* Adjust the stack and restore callee-saved registers, if any.  */
   if (mn10300_can_use_rets_insn ())
-    emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+    emit_jump_insn (ret_rtx);
   else
     emit_jump_insn (gen_return_ret (GEN_INT (size + REG_SAVE_BYTES)));
 }
Index: config/m68k/m68k.c
===================================================================
--- config/m68k/m68k.c	(revision 173297)
+++ config/m68k/m68k.c	(working copy)
@@ -1308,7 +1308,7 @@ m68k_expand_epilogue (bool sibcall_p)
 			   EH_RETURN_STACKADJ_RTX));
 
   if (!sibcall_p)
-    emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+    emit_jump_insn (ret_rtx);
 }
 \f
 /* Return true if X is a valid comparison operator for the dbcc 
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 173297)
+++ config/rs6000/rs6000.c	(working copy)
@@ -20358,7 +20358,7 @@ rs6000_make_savres_rtx (rs6000_stack_t *
   p = rtvec_alloc ((lr ? 4 : 3) + n_regs);
 
   if (!savep && lr)
-    RTVEC_ELT (p, offset++) = gen_rtx_RETURN (VOIDmode);
+    RTVEC_ELT (p, offset++) = ret_rtx;
 
   RTVEC_ELT (p, offset++)
     = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65));
@@ -21350,7 +21350,7 @@ rs6000_emit_epilogue (int sibcall)
       alloc_rname = ggc_strdup (rname);
 
       j = 0;
-      RTVEC_ELT (p, j++) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (p, j++) = ret_rtx;
       RTVEC_ELT (p, j++) = gen_rtx_USE (VOIDmode,
 					gen_rtx_REG (Pmode,
 						     LR_REGNO));
@@ -21966,7 +21966,7 @@ rs6000_emit_epilogue (int sibcall)
       else
 	p = rtvec_alloc (2);
 
-      RTVEC_ELT (p, 0) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (p, 0) = ret_rtx;
       RTVEC_ELT (p, 1) = ((restoring_FPRs_inline || !lr)
 			  ? gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, 65))
 			  : gen_rtx_CLOBBER (VOIDmode,
@@ -22405,7 +22405,7 @@ rs6000_output_mi_thunk (FILE *file, tree
 			gen_rtx_USE (VOIDmode,
 				     gen_rtx_REG (SImode,
 						  LR_REGNO)),
-			gen_rtx_RETURN (VOIDmode))));
+			ret_rtx)));
   SIBLING_CALL_P (insn) = 1;
   emit_barrier ();
 
Index: config/picochip/picochip.c
===================================================================
--- config/picochip/picochip.c	(revision 173297)
+++ config/picochip/picochip.c	(working copy)
@@ -2273,7 +2273,7 @@ picochip_expand_epilogue (int is_sibling
     rtvec p;
     p = rtvec_alloc (2);
 
-    RTVEC_ELT (p, 0) = gen_rtx_RETURN (VOIDmode);
+    RTVEC_ELT (p, 0) = ret_rtx;
     RTVEC_ELT (p, 1) = gen_rtx_USE (VOIDmode,
 				    gen_rtx_REG (Pmode, LINK_REGNUM));
     emit_jump_insn (gen_rtx_PARALLEL (VOIDmode, p));
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 173297)
+++ config/arm/arm.md	(working copy)
@@ -10006,9 +10006,7 @@ (define_expand "epilogue"
       DONE;
     }
   emit_jump_insn (gen_rtx_UNSPEC_VOLATILE (VOIDmode,
-	gen_rtvec (1,
-		gen_rtx_RETURN (VOIDmode)),
-	VUNSPEC_EPILOGUE));
+	gen_rtvec (1, ret_rtx), VUNSPEC_EPILOGUE));
   DONE;
   "
 )
Index: config/h8300/h8300.c
===================================================================
--- config/h8300/h8300.c	(revision 173297)
+++ config/h8300/h8300.c	(working copy)
@@ -643,7 +643,7 @@ h8300_push_pop (int regno, int nregs, bo
   /* Add the return instruction.  */
   if (return_p)
     {
-      RTVEC_ELT (vec, i) = gen_rtx_RETURN (VOIDmode);
+      RTVEC_ELT (vec, i) = ret_rtx;
       i++;
     }
 
@@ -927,7 +927,7 @@ h8300_expand_epilogue (void)
     }
 
   if (!returned_p)
-    emit_jump_insn (gen_rtx_RETURN (VOIDmode));
+    emit_jump_insn (ret_rtx);
 }
 
 /* Return nonzero if the current function is an interrupt
Index: config/v850/v850.c
===================================================================
--- config/v850/v850.c	(revision 173297)
+++ config/v850/v850.c	(working copy)
@@ -1890,7 +1890,7 @@ expand_epilogue (void)
 	  int offset;
 	  restore_all = gen_rtx_PARALLEL (VOIDmode,
 					  rtvec_alloc (num_restore + 2));
-	  XVECEXP (restore_all, 0, 0) = gen_rtx_RETURN (VOIDmode);
+	  XVECEXP (restore_all, 0, 0) = ret_rtx;
 	  XVECEXP (restore_all, 0, 1)
 	    = gen_rtx_SET (VOIDmode, stack_pointer_rtx,
 			    gen_rtx_PLUS (Pmode,
Index: config/bfin/bfin.c
===================================================================
--- config/bfin/bfin.c	(revision 173297)
+++ config/bfin/bfin.c	(working copy)
@@ -2334,7 +2334,7 @@ bfin_expand_call (rtx retval, rtx fnaddr
     XVECEXP (pat, 0, n++) = gen_rtx_USE (VOIDmode, picreg);
   XVECEXP (pat, 0, n++) = gen_rtx_USE (VOIDmode, cookie);
   if (sibcall)
-    XVECEXP (pat, 0, n++) = gen_rtx_RETURN (VOIDmode);
+    XVECEXP (pat, 0, n++) = ret_rtx;
   else
     XVECEXP (pat, 0, n++) = gen_rtx_CLOBBER (VOIDmode, retsreg);
   call = emit_call_insn (pat);


* Re: [PATCH 4/6] Shrink-wrapping
  2011-03-23 14:56 ` [PATCH 4/6] Shrink-wrapping Bernd Schmidt
@ 2011-07-07 14:51   ` Richard Sandiford
  2011-07-07 15:40     ` Bernd Schmidt
                       ` (3 more replies)
  2011-07-07 21:41   ` Michael Hope
  1 sibling, 4 replies; 73+ messages in thread
From: Richard Sandiford @ 2011-07-07 14:51 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

Bernd Schmidt <bernds@codesourcery.com> writes:
> This adds the actual optimization, and reworks the JUMP_LABEL handling
> for return blocks. See the introduction mail or the new comment ahead of
> thread_prologue_and_epilogue_insns for more notes.

It seems a shame to have both (return) and (simple_return).  You said
that we need the distinction in order to cope with targets like ARM,
whose (return) instruction actually performs some of the epilogue too.
It feels like the load of the saved registers should really be expressed
in rtl, in parallel with the return.  I realise that'd prevent
conditional returns though.  Maybe there's no elegant way out...
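To make the distinction concrete, an ARM-like port might expose the two
patterns along these lines (a sketch only: the instruction bodies and the
condition are invented here, and just the pattern names come from the
patch series):

```
;; Full return: restores the saved registers and returns in one go.
(define_insn "return"
  [(return)]
  "USE_LDM_RETURN"              ; hypothetical condition
  "ldm\tsp, {r4-r11, pc}")

;; Used on shrink-wrapped paths where no frame was ever set up.
(define_insn "simple_return"
  [(simple_return)]
  ""
  "bx\tlr")
```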

With the hidden loads, it seems like we'll have a situation in which the
values of call-saved registers will appear to be different for different
"real" incoming edges to the exit block.

Is JUMP_LABEL ever null after this change?  (In fully-complete rtl
sequences, I mean.)  It looked like some of the null checks in the
patch might not be necessary any more.

JUMP_LABEL also seems somewhat misnamed after this change; maybe
JUMP_TARGET would be better?  I'm the last person who should be
recommending names though.

I know it's a pain, but it'd really help if you could split the
"JUMP_LABEL == a return rtx" stuff out.

I think it'd also be worth splitting the RETURN_ADDR_REGNUM bit out into
a separate patch, and handling other things in a more generic way.
E.g. the default INCOMING_RETURN_ADDR_RTX could then be:

  #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM)

and df.c:df_get_exit_block_use_set should include RETURN_ADDR_REGNUM
when epilogue_completed.

It'd be nice to handle cases in which all references to the stack pointer
are to the incoming arguments.  Maybe mention the fact that we don't as
another source of conservatism?

It'd also be nice to get rid of all these big blocks of code that are
conditional on preprocessor macros, but I realise you're just following
existing practice in the surrounding code, so again it can be left to
a future cleanup.

> @@ -1280,7 +1297,7 @@ force_nonfallthru_and_redirect (edge e, 
>  basic_block
>  force_nonfallthru (edge e)
>  {
> -  return force_nonfallthru_and_redirect (e, e->dest);
> +  return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX);
>  }

Maybe assert here that e->dest isn't the exit block?  I realise it
will be caught by the:

    gcc_assert (jump_label == simple_return_rtx);

check, but an assert here would make it more obvious what had gone wrong.
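Something like this, say (hypothetical placement, only showing where the
assert would go in the function quoted above):

```
basic_block
force_nonfallthru (edge e)
{
  gcc_assert (e->dest != EXIT_BLOCK_PTR);
  return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX);
}
```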

> -  if (GET_CODE (x) == RETURN)
> +  if (GET_CODE (x) == RETURN || GET_CODE (x) == SIMPLE_RETURN)

ANY_RETURN_P (x).  A few other cases.

> @@ -5654,6 +5658,7 @@ init_emit_regs (void)
>    /* Assign register numbers to the globally defined register rtx.  */
>    pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
>    ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
> +  simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
>    cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
>    stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
>    frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);

It'd be nice to s/ret_rtx/return_rtx/ for consistency, but that can
happen anytime.

> +/* Return true if INSN requires the stack frame to be set up.  */
> +static bool
> +requires_stack_frame_p (rtx insn)
> +{
> +  HARD_REG_SET hardregs;
> +  unsigned regno;
> +
> +  if (!INSN_P (insn) || DEBUG_INSN_P (insn))
> +    return false;
> +  if (CALL_P (insn))
> +    return !SIBLING_CALL_P (insn);
> +  if (for_each_rtx (&PATTERN (insn), frame_required_for_rtx, NULL))
> +    return true;
> +  CLEAR_HARD_REG_SET (hardregs);
> +  note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs);
> +  AND_COMPL_HARD_REG_SET (hardregs, call_used_reg_set);
> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (TEST_HARD_REG_BIT (hardregs, regno)
> +	&& df_regs_ever_live_p (regno))
> +      return true;

This can be done as a follow-up, but it looks like df should be using
a HARD_REG_SET here, and that we should be able to get at it directly.

> +	  FOR_EACH_EDGE (e, ei, bb->preds)
> +	    if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
> +	      {
> +		VEC_quick_push (basic_block, vec, e->src);
> +		bitmap_set_bit (&bb_on_list, e->src->index);
> +	      }

&& !bitmap_bit_p (&bb_on_list, e->src->index) ?

> +	}
> +      while (!VEC_empty (basic_block, vec))
> +	{
> +	  basic_block tmp_bb = VEC_pop (basic_block, vec);
> +	  edge e;
> +	  edge_iterator ei;
> +	  bool all_set = true;
> +
> +	  bitmap_clear_bit (&bb_on_list, tmp_bb->index);
> +	  FOR_EACH_EDGE (e, ei, tmp_bb->succs)
> +	    {
> +	      if (!bitmap_bit_p (&bb_antic_flags, e->dest->index))
> +		{
> +		  all_set = false;
> +		  break;
> +		}
> +	    }
> +	  if (all_set)
> +	    {
> +	      bitmap_set_bit (&bb_antic_flags, tmp_bb->index);
> +	      FOR_EACH_EDGE (e, ei, tmp_bb->preds)
> +		if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
> +		  {
> +		    VEC_quick_push (basic_block, vec, e->src);
> +		    bitmap_set_bit (&bb_on_list, e->src->index);
> +		  }

same here.

> +	    }
> +	}
> +      /* Find exactly one edge that leads to a block in ANTIC from
> +	 a block that isn't.  */
> +      if (!bitmap_bit_p (&bb_antic_flags, entry_edge->dest->index))
> +	FOR_EACH_BB (bb)
> +	  {
> +	    if (!bitmap_bit_p (&bb_antic_flags, bb->index))
> +	      continue;
> +	    FOR_EACH_EDGE (e, ei, bb->preds)
> +	      if (!bitmap_bit_p (&bb_antic_flags, e->src->index))
> +		{
> +		  if (entry_edge != orig_entry_edge)
> +		    {
> +		      entry_edge = orig_entry_edge;
> +		      goto fail_shrinkwrap;
> +		    }
> +		  entry_edge = e;
> +		}
> +	  }

AIUI, this prevents the optimisation for things like

  if (a) {
    switch (b) {
      case 1:
        ...stuff that requires a frame...
        break;
      case 2:
        ...stuff that requires a frame...
        break;
      default:
        ...stuff that doesn't require a frame...
        break;
    }
  }

The switch won't be in ANTIC, but it will have two successors that are.
Is that right?

Would it work to do something like:

      FOR_EACH_BB (bb)
	{
	  rtx insn;
	  FOR_BB_INSNS (bb, insn)
	    if (requires_stack_frame_p (insn))
	      {
		bitmap_set_bit (&bb_flags, bb->index);
		break;
	      }
	}
      if (bitmap_empty_p (bb_flags))
	...no frame needed...
      else
	{
	  calculate_dominance_info (CDI_DOMINATORS);
	  bb = nearest_common_dominator_for_set (CDI_DOMINATORS, bb_flags);
	  if (bb == ENTRY_BLOCK_PTR)
	    ...bleh...
	  else
	    ...insert prologue at the beginning of bb...
	}

?  Or (for a different trade-off) just use nearest_common_dominator
directly, and avoid the bitmap.

> @@ -5515,25 +5841,38 @@ thread_prologue_and_epilogue_insns (void
>        set_insn_locators (seq, epilogue_locator);
>  
>        seq = get_insns ();
> +      returnjump = get_last_insn ();
>        end_sequence ();
>  
> -      insert_insn_on_edge (seq, e);
> +      insert_insn_on_edge (seq, exit_fallthru_edge);
>        inserted = true;
> +      if (JUMP_P (returnjump))
> +	{
> +	  rtx pat = PATTERN (returnjump);
> +	  if (GET_CODE (pat) == PARALLEL)
> +	    pat = XVECEXP (pat, 0, 0);
> +	  if (ANY_RETURN_P (pat))
> +	    JUMP_LABEL (returnjump) = pat;
> +	  else
> +	    JUMP_LABEL (returnjump) = ret_rtx;
> +	}
> +      else
> +	returnjump = NULL_RTX;

Does the "JUMP_LABEL (returnjump) = ret_rtx;" handle targets that
use things like (set (pc) (reg RA)) as their return?  Probably worth
adding a comment if so.

I didn't review much after this, because it was hard to sort the
simple_return stuff out from the "JUMP_LABEL can be a return rtx" change.

Richard


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 14:51   ` Richard Sandiford
@ 2011-07-07 15:40     ` Bernd Schmidt
  2011-07-07 17:00       ` Paul Koning
  2011-07-07 15:57     ` [PATCH 4/6] Shrink-wrapping Richard Earnshaw
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-07 15:40 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford; +Cc: Paul Koning, ni1d

Whee! Thanks for reviewing (reviving?) this old thing.

I should be posting an up-to-date version of this, but for the moment it
has to wait until dwarf2out is sorted out, and I'm rather busy with
other stuff. I hope to squeeze this in in the not too distant future.

I'll try to answer some of the questions now...

On 07/07/11 16:34, Richard Sandiford wrote:
> Bernd Schmidt <bernds@codesourcery.com> writes:
>> This adds the actual optimization, and reworks the JUMP_LABEL handling
>> for return blocks. See the introduction mail or the new comment ahead of
>> thread_prologue_and_epilogue_insns for more notes.
> 
> It seems a shame to have both (return) and (simple_return).

Yes, but the distinction exists and must be represented somehow - you
can have both in the same function.

> You said
> that we need the distinction in order to cope with targets like ARM,
> whose (return) instruction actually performs some of the epilogue too.
> It feels like the load of the saved registers should really be expressed
> in rtl, in parallel with the return.  I realise that'd prevent
> conditional returns though.  Maybe there's no elegant way out...

It certainly would make it harder to transform branches to conditional
returns. It would also require examining every port to see if it needs
changes to its return patterns. It probably only affects ARM though, but
that target is important enough that we should support the feature (i.e.
conditional returns that pop registers).

If we described conditional returns only as COND_EXEC maybe... AFAICT
only ia64, arm, frv and c6x have conditional return. I'll have to think
about it.

Note that some interface changes will be necessary in any case - passing
NULL as a new jump label simply isn't informative enough when
redirecting a jump; we must be able to distinguish between the two forms
of return at this level. So the ret_rtx/simple_return_rtx may turn out
to be the simplest solution after all.

> With the hidden loads, it seems like we'll have a situation in which the
> values of call-saved registers will appear to be different for different
> "real" incoming edges to the exit block.

Probably true, but I doubt we have any code that would notice. Can you
imagine anything that would care?

> Is JUMP_LABEL ever null after this change?  (In fully-complete rtl
> sequences, I mean.)  It looked like some of the null checks in the
> patch might not be necessary any more.

It shouldn't be, and it's possible that a few of these tests survived
when they shouldn't have.

> JUMP_LABEL also seems somewhat misnamed after this change; maybe
> JUMP_TARGET would be better?

Maybe. I dread the renaming patch though.

> It'd also be nice to get rid of all these big blocks of code that are
> conditional on preprocessor macros, but I realise you're just following
> existing practice in the surrounding code, so again it can be left to
> a future cleanup.

Yeah, this function is quite horrid - so many different paths through it.

However, it looks like the only target without HAVE_prologue is actually
pdp11, so we're carrying some unnecessary baggage for purely
retrocomputing purposes. Paul, can you fix that?

>>    ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
>> +  simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
> 
> It'd be nice to s/ret_rtx/return_rtx/ for consistency, but that can
> happen anytime.

Unfortunately there's another macro called return_rtx.

>> +	&& df_regs_ever_live_p (regno))
>> +      return true;
> 
> This can be done as a follow-up, but it looks like df should be using
> a HARD_REG_SET here, and that we should be able to get at it directly.

For the df_regs_ever_live thing? Could change that, yes.

[...]
> AIUI, this prevents the optimisation for things like
> 
>   if (a) {
>     switch (b) {
>       case 1:
>         ...stuff that requires a frame...
>         break;
>       case 2:
>         ...stuff that requires a frame...
>         break;
>       default:
>         ...stuff that doesn't require a frame...
>         break;
>     }
>   }
> 
> The switch won't be in ANTIC, but it will have two successors that are.
> Is that right?
> 
> Would it work to do something like:
> 
[...]

IIRC the problem here is making sure to match up prologues and epilogues
- an epilogue must not occur on any path that didn't set up a prologue,
and vice versa. I think something more clever would break on e.g.

   if (c)
     goto label;
   if (a) {
     switch (b) {
       case 1:
         ...stuff that requires a frame...
         break;
       case 2:
         ...stuff that requires a frame...
         break;
       default:
         ...stuff that doesn't require a frame...
	label:
         ...more stuff that doesn't require a frame...
         break;
     }
   }

If you add a prologue before the switch, two paths join at label where
one needs a prologue and the other doesn't.

> Does the "JUMP_LABEL (returnjump) = ret_rtx;" handle targets that
> use things like (set (pc) (reg RA)) as their return?  Probably worth
> adding a comment if so.

It simply must be a JUMP_INSN, right? I think we can assume that all
returns are JUMP_INSNS and fix any ports that break.


Bernd


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 14:51   ` Richard Sandiford
  2011-07-07 15:40     ` Bernd Schmidt
@ 2011-07-07 15:57     ` Richard Earnshaw
  2011-07-07 20:19       ` Richard Sandiford
  2011-07-21  3:57     ` Bernd Schmidt
  2011-08-02  8:40     ` Bernd Schmidt
  3 siblings, 1 reply; 73+ messages in thread
From: Richard Earnshaw @ 2011-07-07 15:57 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches, richard.sandiford

On 07/07/11 15:34, Richard Sandiford wrote:
> It seems a shame to have both (return) and (simple_return).  You said
> that we need the distinction in order to cope with targets like ARM,
> whose (return) instruction actually performs some of the epilogue too.
> It feels like the load of the saved registers should really be expressed
> in rtl, in parallel with the return.  I realise that'd prevent
> conditional returns though.  Maybe there's no elegant way out...

You'd still need to deal with distinct returns for shrink-wrapped code
when the full (return) expands to

	ldm	sp, {regs..., pc}

The shrink wrapped version would always be
	bx	lr

There are also cases (eg on v4T) where the Thumb return sequence
sometimes has to pop into a lo register before branching to that return
address, eg

	pop	{r3}
	bx	r3

in order to get interworking.

R.


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 15:40     ` Bernd Schmidt
@ 2011-07-07 17:00       ` Paul Koning
  2011-07-07 17:02         ` Jeff Law
  0 siblings, 1 reply; 73+ messages in thread
From: Paul Koning @ 2011-07-07 17:00 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, Richard Sandiford


On Jul 7, 2011, at 11:38 AM, Bernd Schmidt wrote:

> ...
> 
>> It'd also be nice to get rid of all these big blocks of code that are
>> conditional on preprocessor macros, but I realise you're just following
>> existing practice in the surrounding code, so again it can be left to
>> a future cleanup.
> 
> Yeah, this function is quite horrid - so many different paths through it.
> 
> However, it looks like the only target without HAVE_prologue is actually
> pdp11, so we're carrying some unnecessary baggage for purely
> retrocomputing purposes. Paul, can you fix that?

Sure, but...  I searched for HAVE_prologue and I can't find any place that sets it.  There are tests for it, but I see nothing that defines it (other than df-scan.c, which defines it as zero if it's not defined; not sure what the point of that is).

I must be missing something...

	paul



* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 17:00       ` Paul Koning
@ 2011-07-07 17:02         ` Jeff Law
  2011-07-07 17:05           ` Paul Koning
  0 siblings, 1 reply; 73+ messages in thread
From: Jeff Law @ 2011-07-07 17:02 UTC (permalink / raw)
  To: Paul Koning; +Cc: Bernd Schmidt, GCC Patches, Richard Sandiford

On 07/07/11 10:58, Paul Koning wrote:
> 
> On Jul 7, 2011, at 11:38 AM, Bernd Schmidt wrote:
> 
>> ...
>>
>>> It'd also be nice to get rid of all these big blocks of code that are
>>> conditional on preprocessor macros, but I realise you're just following
>>> existing practice in the surrounding code, so again it can be left to
>>> a future cleanup.
>>
>> Yeah, this function is quite horrid - so many different paths through it.
>>
>> However, it looks like the only target without HAVE_prologue is actually
>> pdp11, so we're carrying some unnecessary baggage for purely
>> retrocomputing purposes. Paul, can you fix that?
> 
> Sure, but...  I searched for HAVE_prologue and I can't find any place that sets it.  There are tests for it, but I see nothing that defines it (other than df-scan.c, which defines it as zero if it's not defined; not sure what the point of that is).
> 
> I must be missing something...
Isn't it defined by the insn-foo generators based on the existence of a
prologue/epilogue insn in the MD file?

jeff


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 17:02         ` Jeff Law
@ 2011-07-07 17:05           ` Paul Koning
  2011-07-07 17:08             ` Jeff Law
                               ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Paul Koning @ 2011-07-07 17:05 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, GCC Patches, Richard Sandiford


On Jul 7, 2011, at 1:00 PM, Jeff Law wrote:

> On 07/07/11 10:58, Paul Koning wrote:
>> 
>> On Jul 7, 2011, at 11:38 AM, Bernd Schmidt wrote:
>> 
>>> ...
>>> 
>>>> It'd also be nice to get rid of all these big blocks of code that are
>>>> conditional on preprocessor macros, but I realise you're just following
>>>> existing practice in the surrounding code, so again it can be left to
>>>> a future cleanup.
>>> 
>>> Yeah, this function is quite horrid - so many different paths through it.
>>> 
>>> However, it looks like the only target without HAVE_prologue is actually
>>> pdp11, so we're carrying some unnecessary baggage for purely
>>> retrocomputing purposes. Paul, can you fix that?
>> 
>> Sure, but...  I searched for HAVE_prologue and I can't find any place that sets it.  There are tests for it, but I see nothing that defines it (other than df-scan.c which defines it as zero if it's not defined, not sure what the point of that is).
>> 
>> I must be missing something...
> Isn't it defined by the insn-foo generators based on the existence of a
> prologue/epilogue insn in the MD file?

Thanks, that must be what I was missing.  So someone is generating HAVE_%s, and that's why grep didn't find HAVE_prologue?

From a note by Richard Henderson (June 30, 2011) it sounds like rs6000 is the other platform that still generates asm prologues.  But yes, I said I would do this.  It sounds like doing it soon would help Bernd a lot.  Let me try to accelerate it.

	paul



* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 17:05           ` Paul Koning
@ 2011-07-07 17:08             ` Jeff Law
  2011-07-07 17:30             ` Bernd Schmidt
  2011-07-08 22:59             ` [pdp11] Emit prologue as rtl Richard Henderson
  2 siblings, 0 replies; 73+ messages in thread
From: Jeff Law @ 2011-07-07 17:08 UTC (permalink / raw)
  To: Paul Koning; +Cc: Bernd Schmidt, GCC Patches, Richard Sandiford

On 07/07/11 11:05, Paul Koning wrote:
> 
> On Jul 7, 2011, at 1:00 PM, Jeff Law wrote:
> 
>> On 07/07/11 10:58, Paul Koning wrote:
>>> 
>>> On Jul 7, 2011, at 11:38 AM, Bernd Schmidt wrote:
>>> 
>>>> ...
>>>> 
>>>>> It'd also be nice to get rid of all these big blocks of code
>>>>> that are conditional on preprocessor macros, but I realise
>>>>> you're just following existing practice in the surrounding
>>>>> code, so again it can be left to a future cleanup.
>>>> 
>>>> Yeah, this function is quite horrid - so many different paths
>>>> through it.
>>>> 
>>>> However, it looks like the only target without HAVE_prologue is
>>>> actually pdp11, so we're carrying some unnecessary baggage for
>>>> purely retrocomputing purposes. Paul, can you fix that?
>>> 
>>> Sure, but...  I searched for HAVE_prologue and I can't find any
>>> place that sets it.  There are tests for it, but I see nothing
>>> that defines it (other than df-scan.c which defines it as zero if
>>> it's not defined, not sure what the point of that is).
>>> 
>>> I must be missing something...
>> Isn't it defined by the insn-foo generators based on the existence
>> of a prologue/epilogue insn in the MD file?
> 
> Thanks, that must be what I was missing.  So someone is generating
> HAVE_%s, and that's why grep didn't find HAVE_prologue?
Yup.

Jeff


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 17:05           ` Paul Koning
  2011-07-07 17:08             ` Jeff Law
@ 2011-07-07 17:30             ` Bernd Schmidt
  2011-07-08 22:59             ` [pdp11] Emit prologue as rtl Richard Henderson
  2 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-07 17:30 UTC (permalink / raw)
  To: Paul Koning; +Cc: Jeff Law, GCC Patches, Richard Sandiford

On 07/07/11 19:05, Paul Koning wrote:
> From a note by Richard Henderson (June 30, 2011) it sounds like
> rs6000 is the other platform that still generates asm prologues.  But
> yes, I said I would do this.  It sounds like doing it soon would help
> Bernd a lot.  Let me try to accelerate it.

Maybe not a whole lot, but it would allow us to simplify some code.


Bernd


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 15:57     ` [PATCH 4/6] Shrink-wrapping Richard Earnshaw
@ 2011-07-07 20:19       ` Richard Sandiford
  2011-07-08  8:30         ` Richard Earnshaw
  2011-07-08 13:57         ` Bernd Schmidt
  0 siblings, 2 replies; 73+ messages in thread
From: Richard Sandiford @ 2011-07-07 20:19 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Bernd Schmidt, GCC Patches, richard.sandiford

Richard Earnshaw <rearnsha@arm.com> writes:
> On 07/07/11 15:34, Richard Sandiford wrote:
>> It seems a shame to have both (return) and (simple_return).  You said
>> that we need the distinction in order to cope with targets like ARM,
>> whose (return) instruction actually performs some of the epilogue too.
>> It feels like the load of the saved registers should really be expressed
>> in rtl, in parallel with the return.  I realise that'd prevent
>> conditional returns though.  Maybe there's no elegant way out...
>
> You'd still need to deal with distinct returns for shrink-wrapped code
> when the full (return) expands to
>
> 	ldm	sp, {regs..., pc}
>
> The shrink wrapped version would always be
> 	bx	lr

Sure, I understand that (return) does more than return on ARM.
What I meant was: we'd normally want that other stuff to be
expressed in rtl alongside the (return) rtx.  E.g. something like:

  (parallel
    [(return)
     (set (reg r4) (mem (plus (reg sp) (const_int ...))))
     (set (reg r5) (mem (plus (reg sp) (const_int ...))))
     (set (reg sp) (plus (reg sp) (const_int ...)))])

And what I meant was: the reason we can't do that is that it would make
conditional execution harder.  But the downside is that (return) and
(simple_return) will appear to do the same thing to register r4
(i.e. nothing).  I.e. we are to some extent going to be lying to
the rtl optimisers.

Richard


* Re: [PATCH 4/6] Shrink-wrapping
  2011-03-23 14:56 ` [PATCH 4/6] Shrink-wrapping Bernd Schmidt
  2011-07-07 14:51   ` Richard Sandiford
@ 2011-07-07 21:41   ` Michael Hope
  1 sibling, 0 replies; 73+ messages in thread
From: Michael Hope @ 2011-07-07 21:41 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Thu, Mar 24, 2011 at 3:53 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This adds the actual optimization, and reworks the JUMP_LABEL handling
> for return blocks. See the introduction mail or the new comment ahead of
> thread_prologue_and_epilogue_insns for more notes.

Hi Bernd.  Here's a list of issues we found with an earlier version of
the patch on a 4.5 based compiler:
 http://bit.ly/oBhuO7 [also below]

They should all contain test cases.  You might want to see if your
patch set covers all of them.  The missing stack cleanup on a switch
statement in LP: #757427 is particularly nasty.

-- Michael

https://bugs.launchpad.net/gcc-linaro/+bugs?field.status%3Alist=NEW&field.status%3Alist=TRIAGED&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=FIXRELEASED&field.tag=shrinkwrap


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 20:19       ` Richard Sandiford
@ 2011-07-08  8:30         ` Richard Earnshaw
  2011-07-08 13:57         ` Bernd Schmidt
  1 sibling, 0 replies; 73+ messages in thread
From: Richard Earnshaw @ 2011-07-08  8:30 UTC (permalink / raw)
  To: Richard Earnshaw, Bernd Schmidt, GCC Patches, richard.sandiford,
	rdsandiford

On 07/07/11 21:08, Richard Sandiford wrote:
> Richard Earnshaw <rearnsha@arm.com> writes:
>> On 07/07/11 15:34, Richard Sandiford wrote:
>>> It seems a shame to have both (return) and (simple_return).  You said
>>> that we need the distinction in order to cope with targets like ARM,
>>> whose (return) instruction actually performs some of the epilogue too.
>>> It feels like the load of the saved registers should really be expressed
>>> in rtl, in parallel with the return.  I realise that'd prevent
>>> conditional returns though.  Maybe there's no elegant way out...
>>
>> You'd still need to deal with distinct returns for shrink-wrapped code
>> when the full (return) expands to
>>
>> 	ldm	sp, {regs..., pc}
>>
>> The shrink wrapped version would always be
>> 	bx	lr
> 
> Sure, I understand that (return) does more than return on ARM.
> What I meant was: we'd normally want that other stuff to be
> expressed in rtl alongside the (return) rtx.  E.g. something like:
> 
>   (parallel
>     [(return)
>      (set (reg r4) (mem (plus (reg sp) (const_int ...))))
>      (set (reg r5) (mem (plus (reg sp) (const_int ...))))
>      (set (reg sp) (plus (reg sp) (const_int ...)))])
> 
> And what I meant was: the reason we can't do that is that it would make
> conditional execution harder.  But the downside is that (return) and
> (simple_return) will appear to do the same thing to register r4
> (i.e. nothing).  I.e. we are to some extent going to be lying to
> the rtl optimisers.
>

Hmm, yes, that would certainly help in terms of ensuring the compiler
knew the liveness correctly.  But as you say, that doesn't match a
simple-jump and that could lead to other problems.

R.


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 20:19       ` Richard Sandiford
  2011-07-08  8:30         ` Richard Earnshaw
@ 2011-07-08 13:57         ` Bernd Schmidt
  2011-07-11 11:24           ` Richard Sandiford
  1 sibling, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-08 13:57 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches, richard.sandiford, rdsandiford

On 07/07/11 22:08, Richard Sandiford wrote:
> Sure, I understand that (return) does more than return on ARM.
> What I meant was: we'd normally want that other stuff to be
> expressed in rtl alongside the (return) rtx.  E.g. something like:
> 
>   (parallel
>     [(return)
>      (set (reg r4) (mem (plus (reg sp) (const_int ...))))
>      (set (reg r5) (mem (plus (reg sp) (const_int ...))))
>      (set (reg sp) (plus (reg sp) (const_int ...)))])

I've thought about it some more. Isn't this just a question of
definitions? Much like we implicitly clobber call-used registers for a
CALL rtx, we might as well define RETURN to restore the intersection
between regs_ever_live and call-saved regs? This is what its current
usage implies, but I guess it's never been necessary to spell it out
explicitly since we don't optimize across branches to the exit block.


Bernd


* [pdp11] Emit prologue as rtl
  2011-07-07 17:05           ` Paul Koning
  2011-07-07 17:08             ` Jeff Law
  2011-07-07 17:30             ` Bernd Schmidt
@ 2011-07-08 22:59             ` Richard Henderson
  2011-07-09 13:46               ` Paul Koning
  2 siblings, 1 reply; 73+ messages in thread
From: Richard Henderson @ 2011-07-08 22:59 UTC (permalink / raw)
  To: Paul Koning; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

This appears to do the right thing.  I didn't bother with
markers for unwind info, since pdp11 is limited to a.out
and thus will never use dwarf2.

There are improvements that could be made.  I added some
comments to that effect but did not otherwise change the
code generation.

Note that the existence of the epilogue and return patterns
will by themselves cause changes.  In particular, basic block
re-ordering will likely move the epilogue into the middle of
the function, placing unlikely code paths later; something
that was not possible with a text-based epilogue.


r~

[-- Attachment #2: d-pdp11-1 --]
[-- Type: text/plain, Size: 16679 bytes --]

	* config/pdp11/pdp11.md (define_c_enum "unspecv"): New.
	(prologue, epilogue): New.
	(return, *rts): New.
	(blockage, setd, seti): New.
	* config/pdp11/pdp11.c (TARGET_ASM_FUNCTION_PROLOGUE): Remove.
	(TARGET_ASM_FUNCTION_EPILOGUE): Remove.
	(pdp11_saved_regno): New.
	(pdp11_expand_prologue): Rename from pdp11_output_function_prologue;
	generate rtl instead of text.
	(pdp11_expand_epilogue): Similarly from pdp11_output_function_epilogue.
	(pdp11_sp_frame_offset): Export.  Use pdp11_saved_regno.
	* config/pdp11/pdp11-protos.h: Update.


diff --git a/gcc/config/pdp11/pdp11-protos.h b/gcc/config/pdp11/pdp11-protos.h
index 56ad909..bb13309 100644
--- a/gcc/config/pdp11/pdp11-protos.h
+++ b/gcc/config/pdp11/pdp11-protos.h
@@ -38,6 +38,7 @@ typedef enum { no_action, dec_before, inc_after } pdp11_action;
 typedef enum { little, either, big } pdp11_partorder;
 extern bool pdp11_expand_operands (rtx *, rtx [][2], int, 
 				   pdp11_action *, pdp11_partorder);
+extern int pdp11_sp_frame_offset (void);
 extern int pdp11_initial_elimination_offset (int, int);
 extern enum reg_class pdp11_regno_reg_class (int);
 
@@ -45,3 +46,5 @@ extern enum reg_class pdp11_regno_reg_class (int);
 
 extern void output_ascii (FILE *, const char *, int);
 extern void pdp11_asm_output_var (FILE *, const char *, int, int, bool);
+extern void pdp11_expand_prologue (void);
+extern void pdp11_expand_epilogue (void);
diff --git a/gcc/config/pdp11/pdp11.c b/gcc/config/pdp11/pdp11.c
index 870b947..63e9986 100644
--- a/gcc/config/pdp11/pdp11.c
+++ b/gcc/config/pdp11/pdp11.c
@@ -141,8 +141,6 @@ decode_pdp11_d (const struct real_format *fmt ATTRIBUTE_UNUSED,
 
 static const char *singlemove_string (rtx *);
 static bool pdp11_assemble_integer (rtx, unsigned int, int);
-static void pdp11_output_function_prologue (FILE *, HOST_WIDE_INT);
-static void pdp11_output_function_epilogue (FILE *, HOST_WIDE_INT);
 static bool pdp11_rtx_costs (rtx, int, int, int *, bool);
 static bool pdp11_return_in_memory (const_tree, const_tree);
 static rtx pdp11_function_value (const_tree, const_tree, bool);
@@ -166,11 +164,6 @@ static bool pdp11_legitimate_constant_p (enum machine_mode, rtx);
 #undef TARGET_ASM_INTEGER
 #define TARGET_ASM_INTEGER pdp11_assemble_integer
 
-#undef TARGET_ASM_FUNCTION_PROLOGUE
-#define TARGET_ASM_FUNCTION_PROLOGUE pdp11_output_function_prologue
-#undef TARGET_ASM_FUNCTION_EPILOGUE
-#define TARGET_ASM_FUNCTION_EPILOGUE pdp11_output_function_epilogue
-
 #undef TARGET_ASM_OPEN_PAREN
 #define TARGET_ASM_OPEN_PAREN "["
 #undef TARGET_ASM_CLOSE_PAREN
@@ -227,95 +220,92 @@ static bool pdp11_legitimate_constant_p (enum machine_mode, rtx);
 #undef  TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P pdp11_legitimate_constant_p
 \f
-/*
-   stream is a stdio stream to output the code to.
-   size is an int: how many units of temporary storage to allocate.
-   Refer to the array `regs_ever_live' to determine which registers
-   to save; `regs_ever_live[I]' is nonzero if register number I
-   is ever used in the function.  This macro is responsible for
-   knowing which registers should not be saved even if used.  
-*/
+/* A helper function to determine if REGNO should be saved in the
+   current function's stack frame.  */
 
-static void
-pdp11_output_function_prologue (FILE *stream, HOST_WIDE_INT size)
-{							       
-    HOST_WIDE_INT fsize = ((size) + 1) & ~1;
-    int regno;
-    int via_ac = -1;
+static inline bool
+pdp11_saved_regno (unsigned regno)
+{
+  return !call_used_regs[regno] && df_regs_ever_live_p (regno);
+}
 
-    fprintf (stream,
-	     "\n\t;	/* function prologue %s*/\n",
-	     current_function_name ());
+/* Expand the function prologue.  */
 
-    /* if we are outputting code for main, 
-       the switch FPU to right mode if TARGET_FPU */
-    if (MAIN_NAME_P (DECL_NAME (current_function_decl)) && TARGET_FPU)
+void
+pdp11_expand_prologue (void)
+{							       
+  HOST_WIDE_INT fsize = get_frame_size ();
+  unsigned regno;
+  rtx x, via_ac = NULL;
+
+  /* If we are outputting code for main, switch the FPU to the
+     right mode if TARGET_FPU.  */
+  if (MAIN_NAME_P (DECL_NAME (current_function_decl)) && TARGET_FPU)
     {
-	fprintf(stream,
-		"\t;/* switch cpu to double float, single integer */\n");
-	fprintf(stream, "\tsetd\n");
-	fprintf(stream, "\tseti\n\n");
+      emit_insn (gen_setd ());
+      emit_insn (gen_seti ());
     }
     
-    if (frame_pointer_needed) 					
-    {								
-	fprintf(stream, "\tmov r5, -(sp)\n");			
-	fprintf(stream, "\tmov sp, r5\n");				
-    }								
-    else 								
+  if (frame_pointer_needed) 					
     {								
-	/* DON'T SAVE FP */
+      x = gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx);
+      x = gen_frame_mem (Pmode, x);
+      emit_move_insn (x, hard_frame_pointer_rtx);
+
+      emit_move_insn (hard_frame_pointer_rtx, stack_pointer_rtx);
     }								
 
-    /* make frame */
-    if (fsize)							
-	asm_fprintf (stream, "\tsub $%#wo, sp\n", fsize);
-
-    /* save CPU registers  */
-    for (regno = R0_REGNUM; regno <= PC_REGNUM; regno++)				
-      if (df_regs_ever_live_p (regno) && ! call_used_regs[regno])	
-	    if (! ((regno == FRAME_POINTER_REGNUM)			
-		   && frame_pointer_needed))				
-		fprintf (stream, "\tmov %s, -(sp)\n", reg_names[regno]);	
-    /* fpu regs saving */
-    
-    /* via_ac specifies the ac to use for saving ac4, ac5 */
-    via_ac = -1;
-    
-    for (regno = AC0_REGNUM; regno <= AC5_REGNUM ; regno++) 
+  /* Make frame.  */
+  if (fsize)
     {
-	/* ac0 - ac3 */						
-	if (LOAD_FPU_REG_P(regno)
-	    && df_regs_ever_live_p (regno) 
-	    && ! call_used_regs[regno])
-	{
-	    fprintf (stream, "\tstd %s, -(sp)\n", reg_names[regno]);
-	    via_ac = regno;
-	}
-	
-	/* maybe make ac4, ac5 call used regs?? */
-	/* ac4 - ac5 */
-	if (NO_LOAD_FPU_REG_P(regno)
-	    && df_regs_ever_live_p (regno)
-	    && ! call_used_regs[regno])
-	{
-	  gcc_assert (via_ac != -1);
-	  fprintf (stream, "\tldd %s, %s\n",
-		   reg_names[regno], reg_names[via_ac]);
-	  fprintf (stream, "\tstd %s, -(sp)\n", reg_names[via_ac]);
-	}
+      emit_insn (gen_addhi3 (stack_pointer_rtx, stack_pointer_rtx,
+			     GEN_INT (-fsize)));
+
+      /* Prevent frame references via the frame pointer from being
+	 scheduled before the frame is allocated.  */
+      if (frame_pointer_needed)
+	emit_insn (gen_blockage ());
     }
 
-    fprintf (stream, "\t;/* end of prologue */\n\n");		
+  /* Save CPU registers.  */
+  for (regno = R0_REGNUM; regno <= PC_REGNUM; regno++)
+    if (pdp11_saved_regno (regno)
+	&& (regno != HARD_FRAME_POINTER_REGNUM || !frame_pointer_needed))
+      {
+	x = gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx);
+	x = gen_frame_mem (Pmode, x);
+	emit_move_insn (x, gen_rtx_REG (Pmode, regno));
+      }
+
+  /* Save FPU registers.  */
+  for (regno = AC0_REGNUM; regno <= AC3_REGNUM; regno++) 
+    if (pdp11_saved_regno (regno))
+      {
+	x = gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx);
+	x = gen_frame_mem (DFmode, x);
+	via_ac = gen_rtx_REG (DFmode, regno);
+	emit_move_insn (x, via_ac);
+      }
+
+  /* ??? Maybe make ac4, ac5 call used regs?? */
+  for (regno = AC4_REGNUM; regno <= AC5_REGNUM; regno++)
+    if (pdp11_saved_regno (regno))
+      {
+	gcc_assert (via_ac != NULL);
+	emit_move_insn (via_ac, gen_rtx_REG (DFmode, regno));
+
+	x = gen_rtx_PRE_DEC (Pmode, stack_pointer_rtx);
+	x = gen_frame_mem (DFmode, x);
+	emit_move_insn (x, via_ac);
+      }
 }
 
-/*
-   The function epilogue should not depend on the current stack pointer!
+/* The function epilogue should not depend on the current stack pointer!
    It should use the frame pointer only.  This is mandatory because
    of alloca; we also take advantage of it to omit stack adjustments
    before returning.  */
 
-/* maybe we can make leaf functions faster by switching to the
+/* Maybe we can make leaf functions faster by switching to the
    second register file - this way we don't have to save regs!
    leaf functions are ~ 50% of all functions (dynamically!) 
 
@@ -328,109 +318,127 @@ pdp11_output_function_prologue (FILE *stream, HOST_WIDE_INT size)
 
    maybe as option if you want to generate code for kernel mode? */
 
-static void
-pdp11_output_function_epilogue (FILE *stream, HOST_WIDE_INT size)
+void
+pdp11_expand_epilogue (void)
 {								
-    HOST_WIDE_INT fsize = ((size) + 1) & ~1;
-    int i, j, k;
+  HOST_WIDE_INT fsize = get_frame_size ();
+  unsigned regno;
+  rtx x, reg, via_ac = NULL;
 
-    int via_ac;
-    
-    fprintf (stream, "\n\t;	/*function epilogue */\n");		
+  if (pdp11_saved_regno (AC4_REGNUM) || pdp11_saved_regno (AC5_REGNUM))
+    {
+      /* Find a temporary with which to restore AC4/5.  */
+      for (regno = AC0_REGNUM; regno <= AC3_REGNUM; regno++)
+	if (pdp11_saved_regno (regno))
+	  {
+	    via_ac = gen_rtx_REG (DFmode, regno);
+	    break;
+	  }
+    }
 
-    if (frame_pointer_needed)					
-    {								
-	/* hope this is safe - m68k does it also .... */		
-        df_set_regs_ever_live (FRAME_POINTER_REGNUM, false);
-								
-	for (i = PC_REGNUM, j = 0 ; i >= 0 ; i--)				
-	  if (df_regs_ever_live_p (i) && ! call_used_regs[i])		
-		j++;
-	
-	/* remember # of pushed bytes for CPU regs */
-	k = 2*j;
-	
-	/* change fp -> r5 due to the compile error on libgcc2.c */
-	for (i = PC_REGNUM ; i >= R0_REGNUM ; i--)					
-	  if (df_regs_ever_live_p (i) && ! call_used_regs[i])		
-		fprintf(stream, "\tmov %#" HOST_WIDE_INT_PRINT "o(r5), %s\n",
-			(-fsize-2*j--)&0xffff, reg_names[i]);
-
-	/* get ACs */						
-	via_ac = AC5_REGNUM;
-	
-	for (i = AC5_REGNUM; i >= AC0_REGNUM; i--)
-	  if (df_regs_ever_live_p (i) && ! call_used_regs[i])
-	    {
-		via_ac = i;
-		k += 8;
-	    }
-	
-	for (i = AC5_REGNUM; i >= AC0_REGNUM; i--)
-	{
-	    if (LOAD_FPU_REG_P(i)
-		&& df_regs_ever_live_p (i)
-		&& ! call_used_regs[i])
-	    {
-		fprintf(stream, "\tldd %#" HOST_WIDE_INT_PRINT "o(r5), %s\n",
-			(-fsize-k)&0xffff, reg_names[i]);
-		k -= 8;
-	    }
-	    
-	    if (NO_LOAD_FPU_REG_P(i)
-		&& df_regs_ever_live_p (i)
-		&& ! call_used_regs[i])
-	    {
-	        gcc_assert (LOAD_FPU_REG_P(via_ac));
-		    
-		fprintf(stream, "\tldd %#" HOST_WIDE_INT_PRINT "o(r5), %s\n",
-			(-fsize-k)&0xffff, reg_names[via_ac]);
-		fprintf(stream, "\tstd %s, %s\n", reg_names[via_ac], reg_names[i]);
-		k -= 8;
-	    }
-	}
-	
-	fprintf(stream, "\tmov r5, sp\n");				
-	fprintf (stream, "\tmov (sp)+, r5\n");     			
-    }								
-    else								
-    {		   
-      via_ac = AC5_REGNUM;
-	
-	/* get ACs */
-	for (i = AC5_REGNUM; i >= AC0_REGNUM; i--)
-	  if (df_regs_ever_live_p (i) && ! call_used_regs[i])
-		via_ac = i;
-	
-	for (i = AC5_REGNUM; i >= AC0_REGNUM; i--)
+  /* If possible, restore registers via pops.  */
+  if (!frame_pointer_needed || current_function_sp_is_unchanging)
+    {
+      /* Restore registers via pops.  */
+
+      for (regno = AC5_REGNUM; regno >= AC0_REGNUM; regno--)
+	if (pdp11_saved_regno (regno))
+	  {
+	    x = gen_rtx_POST_INC (Pmode, stack_pointer_rtx);
+	    x = gen_frame_mem (DFmode, x);
+	    reg = gen_rtx_REG (DFmode, regno);
+
+	    if (LOAD_FPU_REG_P (regno))
+	      emit_move_insn (reg, x);
+	    else
+	      {
+	        emit_move_insn (via_ac, x);
+		emit_move_insn (reg, via_ac);
+	      }
+	  }
+
+      for (regno = PC_REGNUM; regno >= R0_REGNUM + 2; regno--)
+	if (pdp11_saved_regno (regno)
+	    && (regno != HARD_FRAME_POINTER_REGNUM || !frame_pointer_needed))
+	  {
+	    x = gen_rtx_POST_INC (Pmode, stack_pointer_rtx);
+	    x = gen_frame_mem (Pmode, x);
+	    emit_move_insn (gen_rtx_REG (Pmode, regno), x);
+	  }
+    }
+  else
+    {
+      /* Restore registers via moves.  */
+      /* ??? If more than a few registers need to be restored, it's smaller
+	 to generate a pointer through which we can emit pops.  Consider
+	 that moves cost 2*NREG words and pops cost NREG+3 words.  This
+	 means that the crossover is NREG=3.
+
+	 Possible registers to use are:
+	  (1) The first call-saved general register.  This register will
+		be restored with the last pop.
+	  (2) R1, if it's not used as a return register.
+	  (3) FP itself.  This option may result in +4 words, since we
+		may need two add imm,rn instructions instead of just one.
+		This also has the downside that we're not representing
+		the unwind info in any way, so during the epilogue the
+		debugger may get lost.  */
+
+      HOST_WIDE_INT ofs = -pdp11_sp_frame_offset ();
+
+      for (regno = AC5_REGNUM; regno >= AC0_REGNUM; regno--)
+	if (pdp11_saved_regno (regno))
+	  {
+	    x = plus_constant (hard_frame_pointer_rtx, ofs);
+	    x = gen_frame_mem (DFmode, x);
+	    reg = gen_rtx_REG (DFmode, regno);
+
+	    if (LOAD_FPU_REG_P (regno))
+	      emit_move_insn (reg, x);
+	    else
+	      {
+	        emit_move_insn (via_ac, x);
+		emit_move_insn (reg, via_ac);
+	      }
+	    ofs += 8;
+	  }
+
+      for (regno = PC_REGNUM; regno >= R0_REGNUM + 2; regno--)
+	if (pdp11_saved_regno (regno)
+	    && (regno != HARD_FRAME_POINTER_REGNUM || !frame_pointer_needed))
+	  {
+	    x = plus_constant (hard_frame_pointer_rtx, ofs);
+	    x = gen_frame_mem (Pmode, x);
+	    emit_move_insn (gen_rtx_REG (Pmode, regno), x);
+	    ofs += 2;
+	  }
+    }
+
+  /* Deallocate the stack frame.  */
+  if (fsize)
+    {
+      /* Prevent frame references via any pointer from being
+	 scheduled after the frame is deallocated.  */
+      emit_insn (gen_blockage ());
+
+      if (frame_pointer_needed)
 	{
-	    if (LOAD_FPU_REG_P(i)
-		&& df_regs_ever_live_p (i)
-		&& ! call_used_regs[i])
-	      fprintf(stream, "\tldd (sp)+, %s\n", reg_names[i]);
-	    
-	    if (NO_LOAD_FPU_REG_P(i)
-		&& df_regs_ever_live_p (i)
-		&& ! call_used_regs[i])
-	    {
-	        gcc_assert (LOAD_FPU_REG_P(via_ac));
-		    
-		fprintf(stream, "\tldd (sp)+, %s\n", reg_names[via_ac]);
-		fprintf(stream, "\tstd %s, %s\n", reg_names[via_ac], reg_names[i]);
-	    }
+	  /* We can deallocate the frame with a single move.  */
+	  emit_move_insn (stack_pointer_rtx, hard_frame_pointer_rtx);
 	}
+      else
+	emit_insn (gen_addhi3 (stack_pointer_rtx, stack_pointer_rtx,
+			       GEN_INT (fsize)));
+    }
 
-	for (i = PC_REGNUM; i >= 0; i--)					
-	  if (df_regs_ever_live_p (i) && !call_used_regs[i])		
-		fprintf(stream, "\tmov (sp)+, %s\n", reg_names[i]);	
-								
-	if (fsize)						
-	    fprintf((stream), "\tadd $%#" HOST_WIDE_INT_PRINT "o, sp\n",
-		    (fsize)&0xffff);      		
-    }			
-					
-    fprintf (stream, "\trts pc\n");					
-    fprintf (stream, "\t;/* end of epilogue*/\n\n\n");		
+  if (frame_pointer_needed)
+    {
+      x = gen_rtx_POST_INC (Pmode, stack_pointer_rtx);
+      x = gen_frame_mem (Pmode, x);
+      emit_move_insn (hard_frame_pointer_rtx, x);
+    }
+
+  emit_jump_insn (gen_return ());
 }
 
 /* Return the best assembler insn template
@@ -1570,16 +1578,16 @@ pdp11_regno_reg_class (int regno)
 }
 
 
-static int
+int
 pdp11_sp_frame_offset (void)
 {
   int offset = 0, regno;
   offset = get_frame_size();
   for (regno = 0; regno <= PC_REGNUM; regno++)
-    if (df_regs_ever_live_p (regno) && ! call_used_regs[regno])
+    if (pdp11_saved_regno (regno))
       offset += 2;
   for (regno = AC0_REGNUM; regno <= AC5_REGNUM; regno++)
-    if (df_regs_ever_live_p (regno) && ! call_used_regs[regno])
+    if (pdp11_saved_regno (regno))
       offset += 8;
   
   return offset;
diff --git a/gcc/config/pdp11/pdp11.md b/gcc/config/pdp11/pdp11.md
index 1c65426..23a8665 100644
--- a/gcc/config/pdp11/pdp11.md
+++ b/gcc/config/pdp11/pdp11.md
@@ -22,6 +22,13 @@
 (include "predicates.md")
 (include "constraints.md")
 
+(define_c_enum "unspecv"
+  [
+    UNSPECV_BLOCKAGE
+    UNSPECV_SETD
+    UNSPECV_SETI
+  ])
+
 (define_constants
   [
    ;; Register numbers
@@ -104,6 +111,50 @@
 
 ;; define function units
 
+;; Prologue and epilogue support.
+
+(define_expand "prologue"
+  [(const_int 0)]
+  ""
+{
+  pdp11_expand_prologue ();
+  DONE;
+})
+
+(define_expand "epilogue"
+  [(const_int 0)]
+  ""
+{
+  pdp11_expand_epilogue ();
+  DONE;
+})
+
+(define_expand "return"
+  [(return)]
+  "reload_completed && !frame_pointer_needed && pdp11_sp_frame_offset () == 0"
+  "")
+
+(define_insn "*rts"
+  [(return)]
+  ""
+  "rts pc")
+
+(define_insn "blockage"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)]
+  ""
+  ""
+  [(set_attr "length" "0")])
+
+(define_insn "setd"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SETD)]
+  ""
+  "setd")
+
+(define_insn "seti"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SETI)]
+  ""
+  "seti")
+
 ;; arithmetic - values here immediately when next insn issued
 ;; or does it mean the number of cycles after this insn was issued?
 ;; how do I say that fpu insns use cpu also? (pre-interaction phase)


* Re: [pdp11] Emit prologue as rtl
  2011-07-08 22:59             ` [pdp11] Emit prologue as rtl Richard Henderson
@ 2011-07-09 13:46               ` Paul Koning
  2011-07-09 16:53                 ` Richard Henderson
  0 siblings, 1 reply; 73+ messages in thread
From: Paul Koning @ 2011-07-09 13:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches


On Jul 8, 2011, at 6:13 PM, Richard Henderson wrote:

> This appears to do the right thing.  I didn't bother with
> markers for unwind info, since pdp11 is limited to a.out
> and thus will never use dwarf2.
> 
> There are improvements that could be made.  I added some
> comments to that effect but did not otherwise change the
> code generation.
> 
> Note that the existence of the epilogue and return patterns
> will by themselves cause changes.  In particular, basic block
> re-ordering will likely move the epilogue into the middle of
> the function, placing unlikely code paths later; something
> that was not possible with a text-based epilogue.

Thanks, that looks good.

	paul


* Re: [pdp11] Emit prologue as rtl
  2011-07-09 13:46               ` Paul Koning
@ 2011-07-09 16:53                 ` Richard Henderson
  0 siblings, 0 replies; 73+ messages in thread
From: Richard Henderson @ 2011-07-09 16:53 UTC (permalink / raw)
  To: Paul Koning; +Cc: GCC Patches

On 07/09/2011 06:30 AM, Paul Koning wrote:
> 
> On Jul 8, 2011, at 6:13 PM, Richard Henderson wrote:
> 
>> This appears to do the right thing.  I didn't bother with
>> markers for unwind info, since pdp11 is limited to a.out
>> and thus will never use dwarf2.
>>
>> There are improvements that could be made.  I added some
>> comments to that effect but did not otherwise change the
>> code generation.
>>
>> Note that the existence of the epilogue and return patterns
>> will by themselves cause changes.  In particular, basic block
>> re-ordering will likely move the epilogue into the middle of
>> the function, placing unlikely code paths later; something
>> that was not possible with a text-based epilogue.
> 
> Thanks, that looks good.

Committed.  HAVE_prologue and HAVE_epilogue are now
universally true.  Yaye!


r~


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-08 13:57         ` Bernd Schmidt
@ 2011-07-11 11:24           ` Richard Sandiford
  2011-07-11 11:42             ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Richard Sandiford @ 2011-07-11 11:24 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Earnshaw, GCC Patches

Bernd Schmidt <bernds@codesourcery.com> writes:
> On 07/07/11 22:08, Richard Sandiford wrote:
>> Sure, I understand that return does more than return on ARM.
>> What I meant was: we'd normally want that other stuff to be
>> expressed in rtl alongside the (return) rtx.  E.g. something like:
>> 
>>   (parallel
>>     [(return)
>>      (set (reg r4) (mem (plus (reg sp) (const_int ...))))
>>      (set (reg r5) (mem (plus (reg sp) (const_int ...))))
>>      (set (reg sp) (plus (reg sp) (const_int ...)))])
>
> I've thought about it some more. Isn't this just a question of
> definitions? Much like we implicitly clobber call-used registers for a
> CALL rtx, we might as well define RETURN to restore the intersection
> between regs_ever_live and call-saved regs? This is what its current
> usage implies, but I guess it's never been necessary to spell it out
> explicitly since we don't optimize across branches to the exit block.

I don't think we could assume that for all targets.  On ARM, (return)
restores registers, but on many targets it's done separately.  I suppose
we could define some sort of hook though (if the need arose).

To be clear, the comment about return vs. simple_return was really just
musing about something that seemed a bit unclean, especially on targets
like MIPS where the two instructions actually do the same thing.  It also
makes it harder to have partial prologues in future (do we add a third
return code for that?).

I don't think it's a reason to reject the change though.

An alternative might be to add an rtx operand to return that gives a
prologue number (again with the hook mentioned above being defined
if we ever need it).  That'd be slightly more general, and would allow
targets to ignore it if it doesn't matter.
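For illustration, such an operand-carrying return might be written as follows (a hypothetical RTL fragment; today's (return) rtx takes no operands, which is exactly what this suggestion would change):

```
;; Hypothetical: (return) carrying a prologue index as an operand,
;; letting a target select among several partial epilogues.
;; Targets that only have one prologue could simply ignore it.
(return (const_int 1))
```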

Richard


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-11 11:24           ` Richard Sandiford
@ 2011-07-11 11:42             ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-11 11:42 UTC (permalink / raw)
  To: Richard Earnshaw, GCC Patches, richard.sandiford

On 07/11/11 13:08, Richard Sandiford wrote:
> Bernd Schmidt <bernds@codesourcery.com> writes:
>> On 07/07/11 22:08, Richard Sandiford wrote:
>>> Sure, I understand that return does more than return on ARM.
>>> What I meant was: we'd normally want that other stuff to be
>>> expressed in rtl alongside the (return) rtx.  E.g. something like:
>>>
>>>   (parallel
>>>     [(return)
>>>      (set (reg r4) (mem (plus (reg sp) (const_int ...))))
>>>      (set (reg r5) (mem (plus (reg sp) (const_int ...))))
>>>      (set (reg sp) (plus (reg sp) (const_int ...)))])
>>
>> I've thought about it some more. Isn't this just a question of
>> definitions? Much like we implicitly clobber call-used registers for a
>> CALL rtx, we might as well define RETURN to restore the intersection
>> between regs_ever_live and call-saved regs? This is what its current
>> usage implies, but I guess it's never been necessary to spell it out
>> explicitly since we don't optimize across branches to the exit block.
> 
> I don't think we could assume that for all targets.  On ARM, (return)
> restores registers, but on many targets it's done separately.

An instruction that does not do this should then use simple_return,
which has the appropriate definition (just return, nothing else).

For most ports I expect there is no difference, since HAVE_return tends
to have a guard that requires no epilogue (as the documentation suggests
should be the case).
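As a concrete illustration, a port's patterns under this scheme might look like the following (a hypothetical .md fragment; the guard predicate `epilogue_is_empty` is made up for the example and is not a real GCC function):

```
;; Full return: under the documented convention, only valid when
;; the guard guarantees no epilogue work (register restores, stack
;; adjustment) remains to be done.
(define_insn "return"
  [(return)]
  "reload_completed && epilogue_is_empty ()"
  "ret")

;; simple_return: just transfers control back to the caller,
;; never restores registers, so it needs no such guard.
(define_insn "simple_return"
  [(simple_return)]
  ""
  "ret")
```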


Bernd


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 14:51   ` Richard Sandiford
  2011-07-07 15:40     ` Bernd Schmidt
  2011-07-07 15:57     ` [PATCH 4/6] Shrink-wrapping Richard Earnshaw
@ 2011-07-21  3:57     ` Bernd Schmidt
  2011-07-21 11:25       ` Richard Sandiford
  2011-08-02  8:40     ` Bernd Schmidt
  3 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-21  3:57 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1046 bytes --]

On 07/07/11 16:34, Richard Sandiford wrote:
> Is JUMP_LABEL ever null after this change?  (In fully-complete rtl
> sequences, I mean.)  It looked like some of the null checks in the
> patch might not be necessary any more.

It turns out that computed jumps can have a NULL JUMP_LABEL, and so can
JUMP_INSNs holding ADDR_VECs.

> I know it's a pain, but it'd really help if you could split the
> "JUMP_LABEL == a return rtx" stuff out.

Done with this patch. I've looked at all the target code that uses
JUMP_LABELs and either convinced myself that they are safe, or changed
them, but I haven't tested them all. Bootstrapped and tested on
i686-linux, and also with a mips64-elf cross compiler (using the
workaround patch in PR49735). Also verified that there are no changes in
code generation for any of my collected .i files on mips64-elf. Ok?

This only deals with JUMP_LABELs, not (return) occurring in patterns -
another patch will be needed to change these kinds of tests to
ANY_RETURN_P to allow the introduction of (simple_return).


Bernd

[-- Attachment #2: jlabel0720.diff --]
[-- Type: text/plain, Size: 34371 bytes --]

	* rtlanal.c (tablejump_p): False for returns.
	* reorg.c (active_insn_after): New static function.
	(find_end_label): Set JUMP_LABEL for a new returnjump.
	(optimize_skip, get_jump_flags, rare_destination,
	mostly_true_jump, get_branch_condition,
	steal_delay_list_from_target, own_thread_p,
	fill_simple_delay_slots, follow_jumps, fill_slots_from_thread,
	fill_eager_delay_slots, relax_delay_slots, make_return_insns,
	dbr_schedule): Adjust to handle ret_rtx in JUMP_LABELs.
	* jump.c (delete_related_insns): Likewise.
	(redirect_target): New static function.
	(redirect_exp_1): Use it.  Adjust to handle ret_rtx in JUMP_LABELS.
	(redirect_jump_1): Assert that the new label is nonnull.
	(redirect_jump): Likewise.
	(redirect_jump_2): Check for ANY_RETURN_P rather than NULL labels.
	* ifcvt.c (find_if_case_1): Take care when redirecting jumps to the
	exit block.
	(dead_or_predicable): Change NEW_DEST arg to DEST_EDGE.  All callers
	changed.  Ensure that the right label is passed to redirect_jump.
	* function.c (emit_return_into_block,
	thread_prologue_and_epilogue_insns): Ensure new returnjumps have
	ret_rtx in their JUMP_LABEL.
	* print-rtl.c (print_rtx): Handle ret_rtx in a JUMP_LABEL.
	* emit-rtl.c (skip_consecutive_labels): Allow the caller to
	pass ret_rtx as label.
	* cfglayout.c (fixup_reorder_chain): Use
	force_nonfallthru_and_redirect rather than force_nonfallthru.
	(duplicate_insn_chain): Copy JUMP_LABELs for returns.
	* rtl.h (ANY_RETURN_P): New macro.
	* dwarf2cfi.c (compute_barrier_args_size_1): Check JUMP_LABEL
	for ret_rtx.
	(create_cfi_notes): Skip ADDR_VECs and ADDR_DIFF_VECs early.
	* resource.c (find_dead_or_set_registers): Handle ret_rtx in
	JUMP_LABELs.
	(mark_target_live_regs): Likewise.
	* basic-block.h (force_nonfallthru_and_redirect): Declare.
	* cfgrtl.c (force_nonfallthru_and_redirect): No longer static.
	* config/alpha/alpha.c (alpha_tablejump_addr_vec,
	alpha_tablejump_best_label): Remove functions.
	* config/alpha/alpha-protos.c (alpha_tablejump_addr_vec,
	alpha_tablejump_best_label): Remove declarations.
	* config/sh/sh.c (barrier_align, split_branches): Adjust for
	ret_rtx in JUMP_LABELs.
	* config/arm/arm.c (is_jump_table): Likewise.

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 176230)
+++ gcc/rtlanal.c	(working copy)
@@ -2660,8 +2660,11 @@ tablejump_p (const_rtx insn, rtx *labelp
 {
   rtx label, table;
 
-  if (JUMP_P (insn)
-      && (label = JUMP_LABEL (insn)) != NULL_RTX
+  if (!JUMP_P (insn))
+    return false;
+
+  label = JUMP_LABEL (insn);
+  if (label != NULL_RTX && !ANY_RETURN_P (label)
       && (table = next_active_insn (label)) != NULL_RTX
       && JUMP_TABLE_DATA_P (table))
     {
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c	(revision 176230)
+++ gcc/reorg.c	(working copy)
@@ -220,6 +220,17 @@ static void relax_delay_slots (rtx);
 static void make_return_insns (rtx);
 #endif
 \f
+/* A wrapper around next_active_insn which takes care to return ret_rtx
+   unchanged.  */
+
+static rtx
+active_insn_after (rtx insn)
+{
+  if (ANY_RETURN_P (insn))
+    return insn;
+  return next_active_insn (insn);
+}
+\f
 /* Return TRUE if this insn should stop the search for insn to fill delay
    slots.  LABELS_P indicates that labels should terminate the search.
    In all cases, jumps terminate the search.  */
@@ -437,6 +448,7 @@ find_end_label (void)
 	      /* The return we make may have delay slots too.  */
 	      rtx insn = gen_return ();
 	      insn = emit_jump_insn (insn);
+	      JUMP_LABEL (insn) = ret_rtx;
 	      emit_barrier ();
 	      if (num_delay_slots (insn) > 0)
 		obstack_ptr_grow (&unfilled_slots_obstack, insn);
@@ -824,7 +836,7 @@ optimize_skip (rtx insn)
 	      || GET_CODE (PATTERN (next_trial)) == RETURN))
 	{
 	  rtx target_label = JUMP_LABEL (next_trial);
-	  if (target_label == 0)
+	  if (ANY_RETURN_P (target_label))
 	    target_label = find_end_label ();
 
 	  if (target_label)
@@ -866,7 +878,7 @@ get_jump_flags (rtx insn, rtx label)
   if (JUMP_P (insn)
       && (condjump_p (insn) || condjump_in_parallel_p (insn))
       && INSN_UID (insn) <= max_uid
-      && label != 0
+      && !ANY_RETURN_P (label)
       && INSN_UID (label) <= max_uid)
     flags
       = (uid_to_ruid[INSN_UID (label)] > uid_to_ruid[INSN_UID (insn)])
@@ -921,7 +933,7 @@ rare_destination (rtx insn)
   int jump_count = 0;
   rtx next;
 
-  for (; insn; insn = next)
+  for (; insn && !ANY_RETURN_P (insn); insn = next)
     {
       if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
 	insn = XVECEXP (PATTERN (insn), 0, 0);
@@ -1017,7 +1029,7 @@ mostly_true_jump (rtx jump_insn, rtx con
   /* Predict backward branches usually take, forward branches usually not.  If
      we don't know whether this is forward or backward, assume the branch
      will be taken, since most are.  */
-  return (target_label == 0 || INSN_UID (jump_insn) > max_uid
+  return (ANY_RETURN_P (target_label) || INSN_UID (jump_insn) > max_uid
 	  || INSN_UID (target_label) > max_uid
 	  || (uid_to_ruid[INSN_UID (jump_insn)]
 	      > uid_to_ruid[INSN_UID (target_label)]));
@@ -1037,10 +1049,10 @@ get_branch_condition (rtx insn, rtx targ
   if (condjump_in_parallel_p (insn))
     pat = XVECEXP (pat, 0, 0);
 
-  if (GET_CODE (pat) == RETURN)
-    return target == 0 ? const_true_rtx : 0;
+  if (ANY_RETURN_P (pat))
+    return pat == target ? const_true_rtx : 0;
 
-  else if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
+  if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
     return 0;
 
   src = SET_SRC (pat);
@@ -1048,16 +1060,12 @@ get_branch_condition (rtx insn, rtx targ
     return const_true_rtx;
 
   else if (GET_CODE (src) == IF_THEN_ELSE
-	   && ((target == 0 && GET_CODE (XEXP (src, 1)) == RETURN)
-	       || (GET_CODE (XEXP (src, 1)) == LABEL_REF
-		   && XEXP (XEXP (src, 1), 0) == target))
+	   && XEXP (XEXP (src, 1), 0) == target
 	   && XEXP (src, 2) == pc_rtx)
     return XEXP (src, 0);
 
   else if (GET_CODE (src) == IF_THEN_ELSE
-	   && ((target == 0 && GET_CODE (XEXP (src, 2)) == RETURN)
-	       || (GET_CODE (XEXP (src, 2)) == LABEL_REF
-		   && XEXP (XEXP (src, 2), 0) == target))
+	   && XEXP (XEXP (src, 2), 0) == target
 	   && XEXP (src, 1) == pc_rtx)
     {
       enum rtx_code rev;
@@ -1318,7 +1326,7 @@ steal_delay_list_from_target (rtx insn,
     }
 
   /* Show the place to which we will be branching.  */
-  *pnew_thread = next_active_insn (JUMP_LABEL (XVECEXP (seq, 0, 0)));
+  *pnew_thread = active_insn_after (JUMP_LABEL (XVECEXP (seq, 0, 0)));
 
   /* Add any new insns to the delay list and update the count of the
      number of slots filled.  */
@@ -1827,7 +1835,7 @@ own_thread_p (rtx thread, rtx label, int
   rtx insn;
 
   /* We don't own the function end.  */
-  if (thread == 0)
+  if (thread == 0 || ANY_RETURN_P (thread))
     return 0;
 
   /* Get the first active insn, or THREAD, if it is an active insn.  */
@@ -2245,7 +2253,7 @@ fill_simple_delay_slots (int non_jumps_p
 	  && (!JUMP_P (insn)
 	      || ((condjump_p (insn) || condjump_in_parallel_p (insn))
 		  && ! simplejump_p (insn)
-		  && JUMP_LABEL (insn) != 0)))
+		  && !ANY_RETURN_P (JUMP_LABEL (insn)))))
 	{
 	  /* Invariant: If insn is a JUMP_INSN, the insn's jump
 	     label.  Otherwise, zero.  */
@@ -2270,7 +2278,7 @@ fill_simple_delay_slots (int non_jumps_p
 		target = JUMP_LABEL (insn);
 	    }
 
-	  if (target == 0)
+	  if (target == 0 || ANY_RETURN_P (target))
 	    for (trial = next_nonnote_insn (insn); !stop_search_p (trial, 1);
 		 trial = next_trial)
 	      {
@@ -2346,6 +2354,7 @@ fill_simple_delay_slots (int non_jumps_p
 	      && JUMP_P (trial)
 	      && simplejump_p (trial)
 	      && (target == 0 || JUMP_LABEL (trial) == target)
+	      && !ANY_RETURN_P (JUMP_LABEL (trial))
 	      && (next_trial = next_active_insn (JUMP_LABEL (trial))) != 0
 	      && ! (NONJUMP_INSN_P (next_trial)
 		    && GET_CODE (PATTERN (next_trial)) == SEQUENCE)
@@ -2500,7 +2509,7 @@ fill_simple_delay_slots (int non_jumps_p
 \f
 /* Follow any unconditional jump at LABEL;
    return the ultimate label reached by any such chain of jumps.
-   Return null if the chain ultimately leads to a return instruction.
+   Return ret_rtx if the chain ultimately leads to a return instruction.
    If LABEL is not followed by a jump, return LABEL.
    If the chain loops or we can't find end, return LABEL,
    since that tells caller to avoid changing the insn.  */
@@ -2513,29 +2522,34 @@ follow_jumps (rtx label)
   rtx value = label;
   int depth;
 
+  if (ANY_RETURN_P (label))
+    return label;
   for (depth = 0;
        (depth < 10
 	&& (insn = next_active_insn (value)) != 0
 	&& JUMP_P (insn)
-	&& ((JUMP_LABEL (insn) != 0 && any_uncondjump_p (insn)
-	     && onlyjump_p (insn))
+	&& JUMP_LABEL (insn) != NULL_RTX
+	&& ((any_uncondjump_p (insn) && onlyjump_p (insn))
 	    || GET_CODE (PATTERN (insn)) == RETURN)
 	&& (next = NEXT_INSN (insn))
 	&& BARRIER_P (next));
        depth++)
     {
+      rtx this_label = JUMP_LABEL (insn);
       rtx tem;
 
       /* If we have found a cycle, make the insn jump to itself.  */
-      if (JUMP_LABEL (insn) == label)
+      if (this_label == label)
 	return label;
-
-      tem = next_active_insn (JUMP_LABEL (insn));
-      if (tem && (GET_CODE (PATTERN (tem)) == ADDR_VEC
-		  || GET_CODE (PATTERN (tem)) == ADDR_DIFF_VEC))
+      if (ANY_RETURN_P (this_label))
+	return this_label;
+      tem = next_active_insn (this_label);
+      if (tem
+	  && (GET_CODE (PATTERN (tem)) == ADDR_VEC
+	      || GET_CODE (PATTERN (tem)) == ADDR_DIFF_VEC))
 	break;
 
-      value = JUMP_LABEL (insn);
+      value = this_label;
     }
   if (depth == 10)
     return label;
@@ -2587,7 +2601,7 @@ fill_slots_from_thread (rtx insn, rtx co
 
   /* If our thread is the end of subroutine, we can't get any delay
      insns from that.  */
-  if (thread == 0)
+  if (thread == NULL_RTX || ANY_RETURN_P (thread))
     return delay_list;
 
   /* If this is an unconditional branch, nothing is needed at the
@@ -2757,7 +2771,8 @@ fill_slots_from_thread (rtx insn, rtx co
 			      gcc_assert (REG_NOTE_KIND (note)
 					  == REG_LABEL_OPERAND);
 			  }
-		      if (JUMP_P (trial) && JUMP_LABEL (trial))
+		      if (JUMP_P (trial) && JUMP_LABEL (trial)
+			  && !ANY_RETURN_P (JUMP_LABEL (trial)))
 			LABEL_NUSES (JUMP_LABEL (trial))++;
 
 		      delete_related_insns (trial);
@@ -2776,7 +2791,8 @@ fill_slots_from_thread (rtx insn, rtx co
 			      gcc_assert (REG_NOTE_KIND (note)
 					  == REG_LABEL_OPERAND);
 			  }
-		      if (JUMP_P (trial) && JUMP_LABEL (trial))
+		      if (JUMP_P (trial) && JUMP_LABEL (trial)
+			  && !ANY_RETURN_P (JUMP_LABEL (trial)))
 			LABEL_NUSES (JUMP_LABEL (trial))--;
 		    }
 		  else
@@ -2897,7 +2913,8 @@ fill_slots_from_thread (rtx insn, rtx co
      depend on the destination register.  If so, try to place the opposite
      arithmetic insn after the jump insn and put the arithmetic insn in the
      delay slot.  If we can't do this, return.  */
-  if (delay_list == 0 && likely && new_thread
+  if (delay_list == 0 && likely
+      && new_thread && !ANY_RETURN_P (new_thread)
       && NONJUMP_INSN_P (new_thread)
       && GET_CODE (PATTERN (new_thread)) != ASM_INPUT
       && asm_noperands (PATTERN (new_thread)) < 0)
@@ -2990,7 +3007,7 @@ fill_slots_from_thread (rtx insn, rtx co
 					      delay_list))
 	new_thread = follow_jumps (JUMP_LABEL (new_thread));
 
-      if (new_thread == 0)
+      if (ANY_RETURN_P (new_thread))
 	label = find_end_label ();
       else if (LABEL_P (new_thread))
 	label = new_thread;
@@ -3063,7 +3080,7 @@ fill_eager_delay_slots (void)
 	 them.  Then see whether the branch is likely true.  We don't need
 	 to do a lot of this for unconditional branches.  */
 
-      insn_at_target = next_active_insn (target_label);
+      insn_at_target = active_insn_after (target_label);
       own_target = own_thread_p (target_label, target_label, 0);
 
       if (condition == const_true_rtx)
@@ -3098,7 +3115,7 @@ fill_eager_delay_slots (void)
 		 from the thread that was filled.  So we have to recompute
 		 the next insn at the target.  */
 	      target_label = JUMP_LABEL (insn);
-	      insn_at_target = next_active_insn (target_label);
+	      insn_at_target = active_insn_after (target_label);
 
 	      delay_list
 		= fill_slots_from_thread (insn, condition, fallthrough_insn,
@@ -3337,10 +3354,10 @@ relax_delay_slots (rtx first)
 	 group of consecutive labels.  */
       if (JUMP_P (insn)
 	  && (condjump_p (insn) || condjump_in_parallel_p (insn))
-	  && (target_label = JUMP_LABEL (insn)) != 0)
+	  && !ANY_RETURN_P (target_label = JUMP_LABEL (insn)))
 	{
 	  target_label = skip_consecutive_labels (follow_jumps (target_label));
-	  if (target_label == 0)
+	  if (ANY_RETURN_P (target_label))
 	    target_label = find_end_label ();
 
 	  if (target_label && next_active_insn (target_label) == next
@@ -3373,7 +3390,7 @@ relax_delay_slots (rtx first)
 		 invert_jump fails.  */
 
 	      ++LABEL_NUSES (target_label);
-	      if (label)
+	      if (!ANY_RETURN_P (label))
 		++LABEL_NUSES (label);
 
 	      if (invert_jump (insn, label, 1))
@@ -3382,7 +3399,7 @@ relax_delay_slots (rtx first)
 		  next = insn;
 		}
 
-	      if (label)
+	      if (!ANY_RETURN_P (label))
 		--LABEL_NUSES (label);
 
 	      if (--LABEL_NUSES (target_label) == 0)
@@ -3485,12 +3502,12 @@ relax_delay_slots (rtx first)
 
       target_label = JUMP_LABEL (delay_insn);
 
-      if (target_label)
+      if (!ANY_RETURN_P (target_label))
 	{
 	  /* If this jump goes to another unconditional jump, thread it, but
 	     don't convert a jump into a RETURN here.  */
 	  trial = skip_consecutive_labels (follow_jumps (target_label));
-	  if (trial == 0)
+	  if (ANY_RETURN_P (trial))
 	    trial = find_end_label ();
 
 	  if (trial && trial != target_label
@@ -3540,7 +3557,7 @@ relax_delay_slots (rtx first)
 	      && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0))
 	    {
 	      target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0));
-	      if (target_label == 0)
+	      if (ANY_RETURN_P (target_label))
 		target_label = find_end_label ();
 
 	      if (target_label
@@ -3627,7 +3644,7 @@ relax_delay_slots (rtx first)
 	  rtx label = JUMP_LABEL (next);
 	  rtx old_label = JUMP_LABEL (delay_insn);
 
-	  if (label == 0)
+	  if (ANY_RETURN_P (label))
 	    label = find_end_label ();
 
 	  /* find_end_label can generate a new label. Check this first.  */
@@ -3737,7 +3754,7 @@ make_return_insns (rtx first)
 
       /* If we can't make the jump into a RETURN, try to redirect it to the best
 	 RETURN and go on to the next insn.  */
-      if (! reorg_redirect_jump (jump_insn, NULL_RTX))
+      if (! reorg_redirect_jump (jump_insn, ret_rtx))
 	{
 	  /* Make sure redirecting the jump will not invalidate the delay
 	     slot insns.  */
@@ -3866,7 +3883,7 @@ dbr_schedule (rtx first)
       /* Ensure all jumps go to the last of a set of consecutive labels.  */
       if (JUMP_P (insn)
 	  && (condjump_p (insn) || condjump_in_parallel_p (insn))
-	  && JUMP_LABEL (insn) != 0
+	  && !ANY_RETURN_P (JUMP_LABEL (insn))
 	  && ((target = skip_consecutive_labels (JUMP_LABEL (insn)))
 	      != JUMP_LABEL (insn)))
 	redirect_jump (insn, target, 1);
Index: gcc/jump.c
===================================================================
--- gcc/jump.c	(revision 176230)
+++ gcc/jump.c	(working copy)
@@ -1217,7 +1217,7 @@ delete_related_insns (rtx insn)
   /* If deleting a jump, decrement the count of the label,
      and delete the label if it is now unused.  */
 
-  if (JUMP_P (insn) && JUMP_LABEL (insn))
+  if (JUMP_P (insn) && !ANY_RETURN_P (JUMP_LABEL (insn)))
     {
       rtx lab = JUMP_LABEL (insn), lab_next;
 
@@ -1348,6 +1348,18 @@ delete_for_peephole (rtx from, rtx to)
      is also an unconditional jump in that case.  */
 }
 \f
+/* A helper function for redirect_exp_1; examines its input X and returns
+   either a LABEL_REF around a label, or a RETURN if X was NULL.  */
+static rtx
+redirect_target (rtx x)
+{
+  if (x == NULL_RTX)
+    return ret_rtx;
+  if (!ANY_RETURN_P (x))
+    return gen_rtx_LABEL_REF (Pmode, x);
+  return x;
+}
+
 /* Throughout LOC, redirect OLABEL to NLABEL.  Treat null OLABEL or
    NLABEL as a return.  Accrue modifications into the change group.  */
 
@@ -1359,37 +1371,19 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
   int i;
   const char *fmt;
 
-  if (code == LABEL_REF)
-    {
-      if (XEXP (x, 0) == olabel)
-	{
-	  rtx n;
-	  if (nlabel)
-	    n = gen_rtx_LABEL_REF (Pmode, nlabel);
-	  else
-	    n = ret_rtx;
-
-	  validate_change (insn, loc, n, 1);
-	  return;
-	}
-    }
-  else if (code == RETURN && olabel == 0)
+  if ((code == LABEL_REF && XEXP (x, 0) == olabel)
+      || x == olabel)
     {
-      if (nlabel)
-	x = gen_rtx_LABEL_REF (Pmode, nlabel);
-      else
-	x = ret_rtx;
-      if (loc == &PATTERN (insn))
-	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
-      validate_change (insn, loc, x, 1);
+      validate_change (insn, loc, redirect_target (nlabel), 1);
       return;
     }
 
-  if (code == SET && nlabel == 0 && SET_DEST (x) == pc_rtx
+  if (code == SET && SET_DEST (x) == pc_rtx
+      && ANY_RETURN_P (nlabel)
       && GET_CODE (SET_SRC (x)) == LABEL_REF
       && XEXP (SET_SRC (x), 0) == olabel)
     {
-      validate_change (insn, loc, ret_rtx, 1);
+      validate_change (insn, loc, nlabel, 1);
       return;
     }
 
@@ -1426,6 +1420,7 @@ redirect_jump_1 (rtx jump, rtx nlabel)
   int ochanges = num_validated_changes ();
   rtx *loc, asmop;
 
+  gcc_assert (nlabel != NULL_RTX);
   asmop = extract_asm_operands (PATTERN (jump));
   if (asmop)
     {
@@ -1447,17 +1442,20 @@ redirect_jump_1 (rtx jump, rtx nlabel)
    jump target label is unused as a result, it and the code following
    it may be deleted.
 
-   If NLABEL is zero, we are to turn the jump into a (possibly conditional)
-   RETURN insn.
+   Normally, NLABEL will be a label, but it may also be a RETURN rtx;
+   in that case we are to turn the jump into a (possibly conditional)
+   return insn.
 
    The return value will be 1 if the change was made, 0 if it wasn't
-   (this can only occur for NLABEL == 0).  */
+   (this can only occur when trying to produce return insns).  */
 
 int
 redirect_jump (rtx jump, rtx nlabel, int delete_unused)
 {
   rtx olabel = JUMP_LABEL (jump);
 
+  gcc_assert (nlabel != NULL_RTX);
+
   if (nlabel == olabel)
     return 1;
 
@@ -1485,13 +1483,14 @@ redirect_jump_2 (rtx jump, rtx olabel, r
      about this.  */
   gcc_assert (delete_unused >= 0);
   JUMP_LABEL (jump) = nlabel;
-  if (nlabel)
+  if (!ANY_RETURN_P (nlabel))
     ++LABEL_NUSES (nlabel);
 
   /* Update labels in any REG_EQUAL note.  */
   if ((note = find_reg_note (jump, REG_EQUAL, NULL_RTX)) != NULL_RTX)
     {
-      if (!nlabel || (invert && !invert_exp_1 (XEXP (note, 0), jump)))
+      if (ANY_RETURN_P (nlabel)
+	  || (invert && !invert_exp_1 (XEXP (note, 0), jump)))
 	remove_note (jump, note);
       else
 	{
@@ -1500,7 +1499,8 @@ redirect_jump_2 (rtx jump, rtx olabel, r
 	}
     }
 
-  if (olabel && --LABEL_NUSES (olabel) == 0 && delete_unused > 0
+  if (!ANY_RETURN_P (olabel)
+      && --LABEL_NUSES (olabel) == 0 && delete_unused > 0
       /* Undefined labels will remain outside the insn stream.  */
       && INSN_UID (olabel))
     delete_related_insns (olabel);
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 176230)
+++ gcc/ifcvt.c	(working copy)
@@ -104,7 +104,7 @@ static int cond_exec_find_if_block (ce_i
 static int find_if_case_1 (basic_block, edge, edge);
 static int find_if_case_2 (basic_block, edge, edge);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
-			       basic_block, int);
+			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx block_has_only_trap (basic_block);
 \f
@@ -3846,7 +3846,7 @@ find_if_case_1 (basic_block test_bb, edg
 
   /* Registers set are dead, or are predicable.  */
   if (! dead_or_predicable (test_bb, then_bb, else_bb,
-			    single_succ (then_bb), 1))
+			    single_succ_edge (then_bb), 1))
     return FALSE;
 
   /* Conversion went ok, including moving the insns and fixing up the
@@ -3961,7 +3961,7 @@ find_if_case_2 (basic_block test_bb, edg
     return FALSE;
 
   /* Registers set are dead, or are predicable.  */
-  if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ->dest, 0))
+  if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ, 0))
     return FALSE;
 
   /* Conversion went ok, including moving the insns and fixing up the
@@ -3984,18 +3984,21 @@ find_if_case_2 (basic_block test_bb, edg
    Return TRUE if successful.
 
    TEST_BB is the block containing the conditional branch.  MERGE_BB
-   is the block containing the code to manipulate.  NEW_DEST is the
-   label TEST_BB should be branching to after the conversion.
+   is the block containing the code to manipulate.  DEST_EDGE is an
+   edge representing a jump to the join block; after the conversion,
+   TEST_BB should be branching to its destination.
    REVERSEP is true if the sense of the branch should be reversed.  */
 
 static int
 dead_or_predicable (basic_block test_bb, basic_block merge_bb,
-		    basic_block other_bb, basic_block new_dest, int reversep)
+		    basic_block other_bb, edge dest_edge, int reversep)
 {
-  rtx head, end, jump, earliest = NULL_RTX, old_dest, new_label = NULL_RTX;
+  basic_block new_dest = dest_edge->dest;
+  rtx head, end, jump, earliest = NULL_RTX, old_dest;
   bitmap merge_set = NULL;
   /* Number of pending changes.  */
   int n_validated_changes = 0;
+  rtx new_dest_label;
 
   jump = BB_END (test_bb);
 
@@ -4126,6 +4129,18 @@ dead_or_predicable (basic_block test_bb,
     }
 
  no_body:
+  if (JUMP_P (BB_END (dest_edge->src)))
+    new_dest_label = JUMP_LABEL (BB_END (dest_edge->src));
+  else if (other_bb != new_dest)
+    {
+      if (new_dest == EXIT_BLOCK_PTR)
+	new_dest_label = ret_rtx;
+      else
+	new_dest_label = block_label (new_dest);
+    }
+  else
+    new_dest_label = NULL_RTX;
+
   /* We don't want to use normal invert_jump or redirect_jump because
      we don't want to delete_insn called.  Also, we want to do our own
      change group management.  */
@@ -4133,10 +4148,9 @@ dead_or_predicable (basic_block test_bb,
   old_dest = JUMP_LABEL (jump);
   if (other_bb != new_dest)
     {
-      new_label = block_label (new_dest);
       if (reversep
-	  ? ! invert_jump_1 (jump, new_label)
-	  : ! redirect_jump_1 (jump, new_label))
+	  ? ! invert_jump_1 (jump, new_dest_label)
+	  : ! redirect_jump_1 (jump, new_dest_label))
 	goto cancel;
     }
 
@@ -4147,7 +4161,7 @@ dead_or_predicable (basic_block test_bb,
 
   if (other_bb != new_dest)
     {
-      redirect_jump_2 (jump, old_dest, new_label, 0, reversep);
+      redirect_jump_2 (jump, old_dest, new_dest_label, 0, reversep);
 
       redirect_edge_succ (BRANCH_EDGE (test_bb), new_dest);
       if (reversep)
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 176230)
+++ gcc/function.c	(working copy)
@@ -5309,7 +5309,8 @@ emit_use_return_register_into_block (bas
 static void
 emit_return_into_block (basic_block bb)
 {
-  emit_jump_insn_after (gen_return (), BB_END (bb));
+  rtx jump = emit_jump_insn_after (gen_return (), BB_END (bb));
+  JUMP_LABEL (jump) = ret_rtx;
 }
 #endif /* HAVE_return */
 
@@ -5468,7 +5469,7 @@ thread_prologue_and_epilogue_insns (void
 		 that with a conditional return instruction.  */
 	      else if (condjump_p (jump))
 		{
-		  if (! redirect_jump (jump, 0, 0))
+		  if (! redirect_jump (jump, ret_rtx, 0))
 		    {
 		      ei_next (&ei2);
 		      continue;
@@ -5551,6 +5552,8 @@ thread_prologue_and_epilogue_insns (void
 #ifdef HAVE_epilogue
   if (HAVE_epilogue)
     {
+      rtx returnjump;
+
       start_sequence ();
       epilogue_end = emit_note (NOTE_INSN_EPILOGUE_BEG);
       seq = gen_epilogue ();
@@ -5561,11 +5564,25 @@ thread_prologue_and_epilogue_insns (void
       record_insns (seq, NULL, &epilogue_insn_hash);
       set_insn_locators (seq, epilogue_locator);
 
+      returnjump = get_last_insn ();
       seq = get_insns ();
       end_sequence ();
 
       insert_insn_on_edge (seq, e);
       inserted = true;
+
+      if (JUMP_P (returnjump))
+	{
+	  rtx pat = PATTERN (returnjump);
+	  if (GET_CODE (pat) == PARALLEL)
+	    pat = XVECEXP (pat, 0, 0);
+	  if (ANY_RETURN_P (pat))
+	    JUMP_LABEL (returnjump) = pat;
+	  else
+	    JUMP_LABEL (returnjump) = ret_rtx;
+	}
+      else
+	returnjump = NULL_RTX;
     }
   else
 #endif
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	(revision 176230)
+++ gcc/print-rtl.c	(working copy)
@@ -323,9 +323,14 @@ print_rtx (const_rtx in_rtx)
 	      }
 	  }
 	else if (i == 8 && JUMP_P (in_rtx) && JUMP_LABEL (in_rtx) != NULL)
-	  /* Output the JUMP_LABEL reference.  */
-	  fprintf (outfile, "\n%s%*s -> %d", print_rtx_head, indent * 2, "",
-		   INSN_UID (JUMP_LABEL (in_rtx)));
+	  {
+	    /* Output the JUMP_LABEL reference.  */
+	    fprintf (outfile, "\n%s%*s -> ", print_rtx_head, indent * 2, "");
+	    if (GET_CODE (JUMP_LABEL (in_rtx)) == RETURN)
+	      fprintf (outfile, "return");
+	    else
+	      fprintf (outfile, "%d", INSN_UID (JUMP_LABEL (in_rtx)));
+	  }
 	else if (i == 0 && GET_CODE (in_rtx) == VALUE)
 	  {
 #ifndef GENERATOR_FILE
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 176230)
+++ gcc/emit-rtl.c	(working copy)
@@ -3265,14 +3265,17 @@ prev_label (rtx insn)
   return insn;
 }
 
-/* Return the last label to mark the same position as LABEL.  Return null
-   if LABEL itself is null.  */
+/* Return the last label to mark the same position as LABEL.  Return LABEL
+   itself if it is null or any return rtx.  */
 
 rtx
 skip_consecutive_labels (rtx label)
 {
   rtx insn;
 
+  if (label && ANY_RETURN_P (label))
+    return label;
+
   for (insn = label; insn != 0 && !INSN_P (insn); insn = NEXT_INSN (insn))
     if (LABEL_P (insn))
       label = insn;
Index: gcc/cfglayout.c
===================================================================
--- gcc/cfglayout.c	(revision 176230)
+++ gcc/cfglayout.c	(working copy)
@@ -899,7 +899,7 @@ fixup_reorder_chain (void)
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
       src_bb = e_fall->src;
-      nb = force_nonfallthru (e_fall);
+      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest);
       if (nb)
 	{
 	  nb->il.rtl->visited = 1;
@@ -1195,6 +1195,9 @@ duplicate_insn_chain (rtx from, rtx to)
 	      break;
 	    }
 	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
+	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
+	      && ANY_RETURN_P (JUMP_LABEL (insn)))
+	    JUMP_LABEL (copy) = JUMP_LABEL (insn);
           maybe_copy_prologue_epilogue_insn (insn, copy);
 	  break;
 
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 176230)
+++ gcc/rtl.h	(working copy)
@@ -413,6 +413,9 @@ struct GTY((variable_size)) rtvec_def {
   (JUMP_P (INSN) && (GET_CODE (PATTERN (INSN)) == ADDR_VEC || \
 		     GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC))
 
+/* Predicate yielding nonzero iff X is a return.  */
+#define ANY_RETURN_P(X) ((X) == ret_rtx)
+
 /* 1 if X is a unary operator.  */
 
 #define UNARY_P(X)   \
Index: gcc/dwarf2cfi.c
===================================================================
--- gcc/dwarf2cfi.c	(revision 176230)
+++ gcc/dwarf2cfi.c	(working copy)
@@ -678,7 +678,7 @@ compute_barrier_args_size_1 (rtx insn, H
     {
       rtx dest = JUMP_LABEL (insn);
 
-      if (dest)
+      if (dest != NULL_RTX && !ANY_RETURN_P (dest))
 	{
 	  if (barrier_args_size [INSN_UID (dest)] < 0)
 	    {
@@ -2294,6 +2294,8 @@ create_cfi_notes (void)
 	  dwarf2out_frame_debug (insn, false);
 	  continue;
 	}
+      if (GET_CODE (pat) == ADDR_VEC || GET_CODE (pat) == ADDR_DIFF_VEC)
+	continue;
 
       if (GET_CODE (pat) == SEQUENCE)
 	{
Index: gcc/resource.c
===================================================================
--- gcc/resource.c	(revision 176230)
+++ gcc/resource.c	(working copy)
@@ -495,6 +495,8 @@ find_dead_or_set_registers (rtx target,
 		  || GET_CODE (PATTERN (this_jump_insn)) == RETURN)
 		{
 		  next = JUMP_LABEL (this_jump_insn);
+		  if (ANY_RETURN_P (next))
+		    next = NULL_RTX;
 		  if (jump_insn == 0)
 		    {
 		      jump_insn = insn;
@@ -562,9 +564,10 @@ find_dead_or_set_registers (rtx target,
 		  AND_COMPL_HARD_REG_SET (scratch, needed.regs);
 		  AND_COMPL_HARD_REG_SET (fallthrough_res.regs, scratch);
 
-		  find_dead_or_set_registers (JUMP_LABEL (this_jump_insn),
-					      &target_res, 0, jump_count,
-					      target_set, needed);
+		  if (!ANY_RETURN_P (JUMP_LABEL (this_jump_insn)))
+		    find_dead_or_set_registers (JUMP_LABEL (this_jump_insn),
+						&target_res, 0, jump_count,
+						target_set, needed);
 		  find_dead_or_set_registers (next,
 					      &fallthrough_res, 0, jump_count,
 					      set, needed);
@@ -878,7 +881,7 @@ mark_target_live_regs (rtx insns, rtx ta
   struct resources set, needed;
 
   /* Handle end of function.  */
-  if (target == 0)
+  if (target == 0 || ANY_RETURN_P (target))
     {
       *res = end_of_function_needs;
       return;
@@ -1097,8 +1100,9 @@ mark_target_live_regs (rtx insns, rtx ta
       struct resources new_resources;
       rtx stop_insn = next_active_insn (jump_insn);
 
-      mark_target_live_regs (insns, next_active_insn (jump_target),
-			     &new_resources);
+      if (!ANY_RETURN_P (jump_target))
+	jump_target = next_active_insn (jump_target);
+      mark_target_live_regs (insns, jump_target, &new_resources);
       CLEAR_RESOURCE (&set);
       CLEAR_RESOURCE (&needed);
 
Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h	(revision 176230)
+++ gcc/basic-block.h	(working copy)
@@ -799,6 +799,7 @@ extern rtx block_label (basic_block);
 extern bool purge_all_dead_edges (void);
 extern bool purge_dead_edges (basic_block);
 extern bool fixup_abnormal_edges (void);
+extern basic_block force_nonfallthru_and_redirect (edge, basic_block);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: gcc/config/alpha/alpha.c
===================================================================
--- gcc/config/alpha/alpha.c	(revision 176230)
+++ gcc/config/alpha/alpha.c	(working copy)
@@ -571,59 +571,6 @@ direct_return (void)
 	  && crtl->args.pretend_args_size == 0);
 }
 
-/* Return the ADDR_VEC associated with a tablejump insn.  */
-
-rtx
-alpha_tablejump_addr_vec (rtx insn)
-{
-  rtx tmp;
-
-  tmp = JUMP_LABEL (insn);
-  if (!tmp)
-    return NULL_RTX;
-  tmp = NEXT_INSN (tmp);
-  if (!tmp)
-    return NULL_RTX;
-  if (JUMP_P (tmp)
-      && GET_CODE (PATTERN (tmp)) == ADDR_DIFF_VEC)
-    return PATTERN (tmp);
-  return NULL_RTX;
-}
-
-/* Return the label of the predicted edge, or CONST0_RTX if we don't know.  */
-
-rtx
-alpha_tablejump_best_label (rtx insn)
-{
-  rtx jump_table = alpha_tablejump_addr_vec (insn);
-  rtx best_label = NULL_RTX;
-
-  /* ??? Once the CFG doesn't keep getting completely rebuilt, look
-     there for edge frequency counts from profile data.  */
-
-  if (jump_table)
-    {
-      int n_labels = XVECLEN (jump_table, 1);
-      int best_count = -1;
-      int i, j;
-
-      for (i = 0; i < n_labels; i++)
-	{
-	  int count = 1;
-
-	  for (j = i + 1; j < n_labels; j++)
-	    if (XEXP (XVECEXP (jump_table, 1, i), 0)
-		== XEXP (XVECEXP (jump_table, 1, j), 0))
-	      count++;
-
-	  if (count > best_count)
-	    best_count = count, best_label = XVECEXP (jump_table, 1, i);
-	}
-    }
-
-  return best_label ? best_label : const0_rtx;
-}
-
 /* Return the TLS model to use for SYMBOL.  */
 
 static enum tls_model
Index: gcc/config/alpha/alpha-protos.h
===================================================================
--- gcc/config/alpha/alpha-protos.h	(revision 176230)
+++ gcc/config/alpha/alpha-protos.h	(working copy)
@@ -31,9 +31,6 @@ extern void alpha_expand_prologue (void)
 extern void alpha_expand_epilogue (void);
 extern void alpha_output_filename (FILE *, const char *);
 
-extern rtx alpha_tablejump_addr_vec (rtx);
-extern rtx alpha_tablejump_best_label (rtx);
-
 extern bool alpha_legitimate_constant_p (enum machine_mode, rtx);
 extern rtx alpha_legitimize_reload_address (rtx, enum machine_mode,
 					    int, int, int);
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 176230)
+++ gcc/config/sh/sh.c	(working copy)
@@ -5276,7 +5276,8 @@ barrier_align (rtx barrier_or_label)
 	}
       if (prev
 	  && JUMP_P (prev)
-	  && JUMP_LABEL (prev))
+	  && JUMP_LABEL (prev) != NULL_RTX
+	  && !ANY_RETURN_P (JUMP_LABEL (prev)))
 	{
 	  rtx x;
 	  if (jump_to_next
@@ -5975,7 +5976,7 @@ split_branches (rtx first)
 			JUMP_LABEL (insn) = far_label;
 			LABEL_NUSES (far_label)++;
 		      }
-		    redirect_jump (insn, NULL_RTX, 1);
+		    redirect_jump (insn, ret_rtx, 1);
 		    far_label = 0;
 		  }
 	      }
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 176230)
+++ gcc/config/arm/arm.c	(working copy)
@@ -11466,6 +11466,7 @@ is_jump_table (rtx insn)
 
   if (GET_CODE (insn) == JUMP_INSN
       && JUMP_LABEL (insn) != NULL
+      && !ANY_RETURN_P (JUMP_LABEL (insn))
       && ((table = next_real_insn (JUMP_LABEL (insn)))
 	  == next_real_insn (insn))
       && table != NULL
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	(revision 176230)
+++ gcc/cfgrtl.c	(working copy)
@@ -1119,7 +1119,7 @@ rtl_redirect_edge_and_branch (edge e, ba
 /* Like force_nonfallthru below, but additionally performs redirection
    Used by redirect_edge_and_branch_force.  */
 
-static basic_block
+basic_block
 force_nonfallthru_and_redirect (edge e, basic_block target)
 {
   basic_block jump_block, new_bb = NULL, src = e->src;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-21  3:57     ` Bernd Schmidt
@ 2011-07-21 11:25       ` Richard Sandiford
  2011-07-28 11:48         ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Richard Sandiford @ 2011-07-21 11:25 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

Bernd Schmidt <bernds@codesourcery.com> writes:
> On 07/07/11 16:34, Richard Sandiford wrote:
>> Is JUMP_LABEL ever null after this change?  (In fully-complete rtl
>> sequences, I mean.)  It looked like some of the null checks in the
>> patch might not be necessary any more.
>
> It turns out that computed jumps can have a NULL JUMP_LABEL, and so can
> JUMP_INSNs holding ADDR_VECs.

Bleh.  Thanks for checking.

> +/* A wrapper around next_active_insn which takes care to return ret_rtx
> +   unchanged.  */
> +
> +static rtx
> +active_insn_after (rtx insn)
> +{
> +  if (ANY_RETURN_P (insn))
> +    return insn;
> +  return next_active_insn (insn);
> +}

The name "active_insn_after" seems a bit too similar to "next_active_insn"
for the difference to be obvious.  How about something like
"first_active_target_insn" instead?

It wasn't clear to me whether this should return null instead of "insn"
for the ANY_RETURN_P code.  In things like:

     insn_at_target = active_insn_after (target_label);

it introduces a new "INSN_P or RETURN" rtx choice, rather than the
"label or RETURN" choice seen in JUMP_LABELs.  So it might seem at a
glance that PATTERN could be directly applied to a nonnull insn_at_target,
whereas you actually need to test ANY_RETURN_P first.

But the existing code seems inconsistent.  Sometimes it passes
JUMP_LABELs directly to functions like own_thread_p, whereas sometimes
it passes the first active insn instead.  So if you returned null here,
you'd probably have three-way "null or RETURN or LABEL" checks where you
otherwise wouldn't.

All in all, I agree it's probably better this way.

> @@ -921,7 +933,7 @@ rare_destination (rtx insn)
>    int jump_count = 0;
>    rtx next;
>  
> -  for (; insn; insn = next)
> +  for (; insn && !ANY_RETURN_P (insn); insn = next)
>      {
>        if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
>  	insn = XVECEXP (PATTERN (insn), 0, 0);

Since ANY_RETURN_P looks for patterns, while this loop iterates over insns,
I think it'd be more obvious to have:

  if (insn && ANY_RETURN_P (insn))
    return 1;

above the loop instead, as you did in follow_jumps and
skip_consecutive_labels.

> Index: gcc/jump.c
> ===================================================================
> --- gcc/jump.c	(revision 176230)
> +++ gcc/jump.c	(working copy)
> @@ -1217,7 +1217,7 @@ delete_related_insns (rtx insn)
>    /* If deleting a jump, decrement the count of the label,
>       and delete the label if it is now unused.  */
>  
> -  if (JUMP_P (insn) && JUMP_LABEL (insn))
> +  if (JUMP_P (insn) && !ANY_RETURN_P (JUMP_LABEL (insn)))
>      {
>        rtx lab = JUMP_LABEL (insn), lab_next;
>  

Given what you said above, and given that this is a public function,
I think we should keep the null check.

This pattern came up in reorg.c too, so maybe it would be worth having
a jump_to_label_p inline function somewhere, such as:

static bool
jump_to_label_p (rtx insn)
{
  return JUMP_P (insn) && JUMP_LABEL (insn) && LABEL_P (JUMP_LABEL (insn));
}

And maybe also:

static rtx
jump_target_insn (rtx insn)
{
  return jump_to_label_p (insn) ? JUMP_LABEL (insn) : NULL_RTX;
}

It might help avoid the sprinkling of ANY_RETURN_Ps.  Just a suggestion
though, not going to insist.

>  /* Throughout LOC, redirect OLABEL to NLABEL.  Treat null OLABEL or
>     NLABEL as a return.  Accrue modifications into the change group.  */
>  
> @@ -1359,37 +1371,19 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
>    int i;
>    const char *fmt;
>  
> -  if (code == LABEL_REF)
> -    {
> -      if (XEXP (x, 0) == olabel)
> -	{
> -	  rtx n;
> -	  if (nlabel)
> -	    n = gen_rtx_LABEL_REF (Pmode, nlabel);
> -	  else
> -	    n = ret_rtx;
> -
> -	  validate_change (insn, loc, n, 1);
> -	  return;
> -	}
> -    }
> -  else if (code == RETURN && olabel == 0)
> +  if ((code == LABEL_REF && XEXP (x, 0) == olabel)
> +      || x == olabel)
>      {
> -      if (nlabel)
> -	x = gen_rtx_LABEL_REF (Pmode, nlabel);
> -      else
> -	x = ret_rtx;
> -      if (loc == &PATTERN (insn))
> -	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
> -      validate_change (insn, loc, x, 1);
> +      validate_change (insn, loc, redirect_target (nlabel), 1);
>        return;

It looks like the old code tried to allow returns to be redirected
to a label -- (return) to (set (pc) (label_ref)) -- whereas the new
code doesn't.  (Then again, it looks like the old code would create
(set (pc) (return)) when "redirecting" a return to a return.
That doesn't seem like a good idea, and it ought to be dead
anyway with the olabel == nlabel shortcuts.)

How about:

      x = redirect_target (nlabel);
      if (GET_CODE (x) == LABEL_REF && loc == &PATTERN (insn))
	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
      validate_change (insn, loc, x, 1);

I realise this doesn't help for PARALLELs though (just as it didn't
for the old code).

> @@ -4126,6 +4129,18 @@ dead_or_predicable (basic_block test_bb,
>      }
>  
>   no_body:
> +  if (JUMP_P (BB_END (dest_edge->src)))
> +    new_dest_label = JUMP_LABEL (BB_END (dest_edge->src));
> +  else if (other_bb != new_dest)
> +    {
> +      if (new_dest == EXIT_BLOCK_PTR)
> +	new_dest_label = ret_rtx;
> +      else
> +	new_dest_label = block_label (new_dest);
> +    }
> +  else
> +    new_dest_label = NULL_RTX;
> +

I found the placement of this code a bit confusing as things stand.
new_dest_label is only meaningful if other_bb != new_dest, so it seemed
like something that should directly replace the existing new_label
assignment.  It's OK if it makes the shrink-wrap stuff easier though.

> @@ -1195,6 +1195,9 @@ duplicate_insn_chain (rtx from, rtx to)
>  	      break;
>  	    }
>  	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
> +	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
> +	      && ANY_RETURN_P (JUMP_LABEL (insn)))
> +	    JUMP_LABEL (copy) = JUMP_LABEL (insn);

I think this should go in emit_copy_of_insn_after instead.

> @@ -2294,6 +2294,8 @@ create_cfi_notes (void)
>  	  dwarf2out_frame_debug (insn, false);
>  	  continue;
>  	}
> +      if (GET_CODE (pat) == ADDR_VEC || GET_CODE (pat) == ADDR_DIFF_VEC)
> +	continue;
>  
>        if (GET_CODE (pat) == SEQUENCE)
>  	{

rth better approve this bit...

Looks good to me otherwise.

Richard


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-21 11:25       ` Richard Sandiford
@ 2011-07-28 11:48         ` Bernd Schmidt
  2011-07-28 12:45           ` Richard Sandiford
                             ` (2 more replies)
  0 siblings, 3 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-28 11:48 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 3001 bytes --]

On 07/21/11 11:52, Richard Sandiford wrote:
> The name "active_insn_after" seems a bit too similar to "next_active_insn"
> for the difference to be obvious.  How about something like
> "first_active_target_insn" instead?

Changed.
>> -  for (; insn; insn = next)
>> +  for (; insn && !ANY_RETURN_P (insn); insn = next)
>>      {
>>        if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
>>  	insn = XVECEXP (PATTERN (insn), 0, 0);
> 
> Since ANY_RETURN_P looks for patterns, while this loop iterates over insns,
> I think it'd be more obvious to have:
> 
>   if (insn && ANY_RETURN_P (insn))
>     return 1;
> 
> above the loop instead

That alone wouldn't work since we assign JUMP_LABELs to next. Left alone
for now.

>> --- gcc/jump.c	(revision 176230)
>> +++ gcc/jump.c	(working copy)
>> @@ -1217,7 +1217,7 @@ delete_related_insns (rtx insn)
> 
> Given what you said above, and given that this is a public function,
> I think we should keep the null check.

Changed.
> 
> This pattern came up in reorg.c too, so maybe it would be worth having
> a jump_to_label_p inline function somewhere, such as:

Done. Only has two uses for now though; reorg.c uses different patterns
mostly.

> It looks like the old code tried to allow returns to be redirected
> to a label -- (return) to (set (pc) (label_ref)) -- whereas the new
> code doesn't. [...]
> 
> How about:
> 
>       x = redirect_target (nlabel);
>       if (GET_CODE (x) == LABEL_REF && loc == &PATTERN (insn))
> 	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
>       validate_change (insn, loc, x, 1);

Changed, although this probably isn't a useful thing to allow (it will
just add one more unnecessary jump to the code?).

[ifcvt changes]
> I found the placement of this code a bit confusing as things stand.
> new_dest_label is only meaningful if other_bb != new_dest, so it seemed
> like something that should directly replace the existing new_label
> assignment.  It's OK if it makes the shrink-wrap stuff easier though.

Changed.

>> @@ -1195,6 +1195,9 @@ duplicate_insn_chain (rtx from, rtx to)
>>  	      break;
>>  	    }
>>  	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
>> +	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
>> +	      && ANY_RETURN_P (JUMP_LABEL (insn)))
>> +	    JUMP_LABEL (copy) = JUMP_LABEL (insn);
> 
> I think this should go in emit_copy_of_insn_after instead.

Here I'd like to avoid modifying the existing code in
emit_copy_of_insn_after if possible. Not sure why it's not copying
JUMP_LABELs, but that's something I'd prefer to investigate at some
other time rather than risk breaking things.

>> @@ -2294,6 +2294,8 @@ create_cfi_notes (void)
>>  	  dwarf2out_frame_debug (insn, false);
>>  	  continue;
>>  	}
>> +      if (GET_CODE (pat) == ADDR_VEC || GET_CODE (pat) == ADDR_DIFF_VEC)
>> +	continue;
>>  
>>        if (GET_CODE (pat) == SEQUENCE)
>>  	{
> 
> rth better approve this bit...

It went away.

New patch below. Retested on i686-linux and mips64-elf. Ok?


Bernd

[-- Attachment #2: jlabel0726b.diff --]
[-- Type: text/plain, Size: 34433 bytes --]

	* rtlanal.c (tablejump_p): False for returns.
	* reorg.c (first_active_target_insn): New static function.
	(find_end_label): Set JUMP_LABEL for a new returnjump.
	(optimize_skip, get_jump_flags, rare_destination,
	mostly_true_jump, get_branch_condition,
	steal_delay_list_from_target, own_thread_p,
	fill_simple_delay_slots, follow_jumps, fill_slots_from_thread,
	fill_eager_delay_slots, relax_delay_slots, make_return_insns,
	dbr_schedule): Adjust to handle ret_rtx in JUMP_LABELs.
	* jump.c (delete_related_insns): Likewise.
	(jump_to_label_p): New function.
	(redirect_target): New static function.
	(redirect_exp_1): Use it.  Adjust to handle ret_rtx in JUMP_LABELs.
	(redirect_jump_1): Assert that the new label is nonnull.
	(redirect_jump): Likewise.
	(redirect_jump_2): Check for ANY_RETURN_P rather than NULL labels.
	* ifcvt.c (find_if_case_1): Take care when redirecting jumps to the
	exit block.
	(dead_or_predicable): Change NEW_DEST arg to DEST_EDGE.  All callers
	changed.  Ensure that the right label is passed to redirect_jump.
	* function.c (emit_return_into_block,
	thread_prologue_and_epilogue_insns): Ensure new returnjumps have
	ret_rtx in their JUMP_LABEL.
	* print-rtl.c (print_rtx): Handle ret_rtx in a JUMP_LABEL.
	* emit-rtl.c (skip_consecutive_labels): Allow the caller to
	pass ret_rtx as label.
	* cfglayout.c (fixup_reorder_chain): Use
	force_nonfallthru_and_redirect rather than force_nonfallthru.
	(duplicate_insn_chain): Copy JUMP_LABELs for returns.
	* rtl.h (ANY_RETURN_P): New macro.
	(jump_to_label_p): Declare.
	* resource.c (find_dead_or_set_registers): Handle ret_rtx in
	JUMP_LABELs.
	(mark_target_live_regs): Likewise.
	* basic-block.h (force_nonfallthru_and_redirect): Declare.
	* cfgrtl.c (force_nonfallthru_and_redirect): No longer static.
	* config/alpha/alpha.c (alpha_tablejump_addr_vec,
	alpha_tablejump_best_label): Remove functions.
	* config/alpha/alpha-protos.h (alpha_tablejump_addr_vec,
	alpha_tablejump_best_label): Remove declarations.
	* config/sh/sh.c (barrier_align, split_branches): Adjust for
	ret_rtx in JUMP_LABELs.
	* config/arm/arm.c (is_jump_table): Likewise.

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	(revision 176838)
+++ gcc/rtlanal.c	(working copy)
@@ -2660,8 +2660,11 @@ tablejump_p (const_rtx insn, rtx *labelp
 {
   rtx label, table;
 
-  if (JUMP_P (insn)
-      && (label = JUMP_LABEL (insn)) != NULL_RTX
+  if (!JUMP_P (insn))
+    return false;
+
+  label = JUMP_LABEL (insn);
+  if (label != NULL_RTX && !ANY_RETURN_P (label)
       && (table = next_active_insn (label)) != NULL_RTX
       && JUMP_TABLE_DATA_P (table))
     {
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c	(revision 176838)
+++ gcc/reorg.c	(working copy)
@@ -220,6 +220,17 @@ static void relax_delay_slots (rtx);
 static void make_return_insns (rtx);
 #endif
 \f
+/* A wrapper around next_active_insn which takes care to return ret_rtx
+   unchanged.  */
+
+static rtx
+first_active_target_insn (rtx insn)
+{
+  if (ANY_RETURN_P (insn))
+    return insn;
+  return next_active_insn (insn);
+}
+\f
 /* Return TRUE if this insn should stop the search for insn to fill delay
    slots.  LABELS_P indicates that labels should terminate the search.
    In all cases, jumps terminate the search.  */
@@ -437,6 +448,7 @@ find_end_label (void)
 	      /* The return we make may have delay slots too.  */
 	      rtx insn = gen_return ();
 	      insn = emit_jump_insn (insn);
+	      JUMP_LABEL (insn) = ret_rtx;
 	      emit_barrier ();
 	      if (num_delay_slots (insn) > 0)
 		obstack_ptr_grow (&unfilled_slots_obstack, insn);
@@ -824,7 +836,7 @@ optimize_skip (rtx insn)
 	      || GET_CODE (PATTERN (next_trial)) == RETURN))
 	{
 	  rtx target_label = JUMP_LABEL (next_trial);
-	  if (target_label == 0)
+	  if (ANY_RETURN_P (target_label))
 	    target_label = find_end_label ();
 
 	  if (target_label)
@@ -861,12 +873,12 @@ get_jump_flags (rtx insn, rtx label)
      be INSNs, CALL_INSNs, or JUMP_INSNs.  Only JUMP_INSNs have branch
      direction information, and only if they are conditional jumps.
 
-     If LABEL is zero, then there is no way to determine the branch
+     If LABEL is a return, then there is no way to determine the branch
      direction.  */
   if (JUMP_P (insn)
       && (condjump_p (insn) || condjump_in_parallel_p (insn))
+      && !ANY_RETURN_P (label)
       && INSN_UID (insn) <= max_uid
-      && label != 0
       && INSN_UID (label) <= max_uid)
     flags
       = (uid_to_ruid[INSN_UID (label)] > uid_to_ruid[INSN_UID (insn)])
@@ -921,7 +933,7 @@ rare_destination (rtx insn)
   int jump_count = 0;
   rtx next;
 
-  for (; insn; insn = next)
+  for (; insn && !ANY_RETURN_P (insn); insn = next)
     {
       if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
 	insn = XVECEXP (PATTERN (insn), 0, 0);
@@ -1017,7 +1029,7 @@ mostly_true_jump (rtx jump_insn, rtx con
   /* Predict backward branches usually take, forward branches usually not.  If
      we don't know whether this is forward or backward, assume the branch
      will be taken, since most are.  */
-  return (target_label == 0 || INSN_UID (jump_insn) > max_uid
+  return (ANY_RETURN_P (target_label) || INSN_UID (jump_insn) > max_uid
 	  || INSN_UID (target_label) > max_uid
 	  || (uid_to_ruid[INSN_UID (jump_insn)]
 	      > uid_to_ruid[INSN_UID (target_label)]));
@@ -1037,10 +1049,10 @@ get_branch_condition (rtx insn, rtx targ
   if (condjump_in_parallel_p (insn))
     pat = XVECEXP (pat, 0, 0);
 
-  if (GET_CODE (pat) == RETURN)
-    return target == 0 ? const_true_rtx : 0;
+  if (ANY_RETURN_P (pat))
+    return pat == target ? const_true_rtx : 0;
 
-  else if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
+  if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx)
     return 0;
 
   src = SET_SRC (pat);
@@ -1048,16 +1060,12 @@ get_branch_condition (rtx insn, rtx targ
     return const_true_rtx;
 
   else if (GET_CODE (src) == IF_THEN_ELSE
-	   && ((target == 0 && GET_CODE (XEXP (src, 1)) == RETURN)
-	       || (GET_CODE (XEXP (src, 1)) == LABEL_REF
-		   && XEXP (XEXP (src, 1), 0) == target))
+	   && XEXP (XEXP (src, 1), 0) == target
 	   && XEXP (src, 2) == pc_rtx)
     return XEXP (src, 0);
 
   else if (GET_CODE (src) == IF_THEN_ELSE
-	   && ((target == 0 && GET_CODE (XEXP (src, 2)) == RETURN)
-	       || (GET_CODE (XEXP (src, 2)) == LABEL_REF
-		   && XEXP (XEXP (src, 2), 0) == target))
+	   && XEXP (XEXP (src, 2), 0) == target
 	   && XEXP (src, 1) == pc_rtx)
     {
       enum rtx_code rev;
@@ -1318,7 +1326,7 @@ steal_delay_list_from_target (rtx insn,
     }
 
   /* Show the place to which we will be branching.  */
-  *pnew_thread = next_active_insn (JUMP_LABEL (XVECEXP (seq, 0, 0)));
+  *pnew_thread = first_active_target_insn (JUMP_LABEL (XVECEXP (seq, 0, 0)));
 
   /* Add any new insns to the delay list and update the count of the
      number of slots filled.  */
@@ -1827,7 +1835,7 @@ own_thread_p (rtx thread, rtx label, int
   rtx insn;
 
   /* We don't own the function end.  */
-  if (thread == 0)
+  if (thread == 0 || ANY_RETURN_P (thread))
     return 0;
 
   /* Get the first active insn, or THREAD, if it is an active insn.  */
@@ -2245,7 +2253,7 @@ fill_simple_delay_slots (int non_jumps_p
 	  && (!JUMP_P (insn)
 	      || ((condjump_p (insn) || condjump_in_parallel_p (insn))
 		  && ! simplejump_p (insn)
-		  && JUMP_LABEL (insn) != 0)))
+		  && !ANY_RETURN_P (JUMP_LABEL (insn)))))
 	{
 	  /* Invariant: If insn is a JUMP_INSN, the insn's jump
 	     label.  Otherwise, zero.  */
@@ -2270,7 +2278,7 @@ fill_simple_delay_slots (int non_jumps_p
 		target = JUMP_LABEL (insn);
 	    }
 
-	  if (target == 0)
+	  if (target == 0 || ANY_RETURN_P (target))
 	    for (trial = next_nonnote_insn (insn); !stop_search_p (trial, 1);
 		 trial = next_trial)
 	      {
@@ -2343,7 +2351,7 @@ fill_simple_delay_slots (int non_jumps_p
 	     Don't do this if the insn at the branch target is a branch.  */
 	  if (slots_to_fill != slots_filled
 	      && trial
-	      && JUMP_P (trial)
+	      && jump_to_label_p (trial)
 	      && simplejump_p (trial)
 	      && (target == 0 || JUMP_LABEL (trial) == target)
 	      && (next_trial = next_active_insn (JUMP_LABEL (trial))) != 0
@@ -2500,7 +2508,7 @@ fill_simple_delay_slots (int non_jumps_p
 \f
 /* Follow any unconditional jump at LABEL;
    return the ultimate label reached by any such chain of jumps.
-   Return null if the chain ultimately leads to a return instruction.
+   Return ret_rtx if the chain ultimately leads to a return instruction.
    If LABEL is not followed by a jump, return LABEL.
    If the chain loops or we can't find end, return LABEL,
    since that tells caller to avoid changing the insn.  */
@@ -2513,29 +2521,34 @@ follow_jumps (rtx label)
   rtx value = label;
   int depth;
 
+  if (ANY_RETURN_P (label))
+    return label;
   for (depth = 0;
        (depth < 10
 	&& (insn = next_active_insn (value)) != 0
 	&& JUMP_P (insn)
-	&& ((JUMP_LABEL (insn) != 0 && any_uncondjump_p (insn)
-	     && onlyjump_p (insn))
+	&& JUMP_LABEL (insn) != NULL_RTX
+	&& ((any_uncondjump_p (insn) && onlyjump_p (insn))
 	    || GET_CODE (PATTERN (insn)) == RETURN)
 	&& (next = NEXT_INSN (insn))
 	&& BARRIER_P (next));
        depth++)
     {
+      rtx this_label = JUMP_LABEL (insn);
       rtx tem;
 
       /* If we have found a cycle, make the insn jump to itself.  */
-      if (JUMP_LABEL (insn) == label)
+      if (this_label == label)
 	return label;
-
-      tem = next_active_insn (JUMP_LABEL (insn));
-      if (tem && (GET_CODE (PATTERN (tem)) == ADDR_VEC
-		  || GET_CODE (PATTERN (tem)) == ADDR_DIFF_VEC))
+      if (ANY_RETURN_P (this_label))
+	return this_label;
+      tem = next_active_insn (this_label);
+      if (tem
+	  && (GET_CODE (PATTERN (tem)) == ADDR_VEC
+	      || GET_CODE (PATTERN (tem)) == ADDR_DIFF_VEC))
 	break;
 
-      value = JUMP_LABEL (insn);
+      value = this_label;
     }
   if (depth == 10)
     return label;
@@ -2587,7 +2600,7 @@ fill_slots_from_thread (rtx insn, rtx co
 
   /* If our thread is the end of subroutine, we can't get any delay
      insns from that.  */
-  if (thread == 0)
+  if (thread == NULL_RTX || ANY_RETURN_P (thread))
     return delay_list;
 
   /* If this is an unconditional branch, nothing is needed at the
@@ -2757,7 +2770,8 @@ fill_slots_from_thread (rtx insn, rtx co
 			      gcc_assert (REG_NOTE_KIND (note)
 					  == REG_LABEL_OPERAND);
 			  }
-		      if (JUMP_P (trial) && JUMP_LABEL (trial))
+		      if (JUMP_P (trial) && JUMP_LABEL (trial)
+			  && !ANY_RETURN_P (JUMP_LABEL (trial)))
 			LABEL_NUSES (JUMP_LABEL (trial))++;
 
 		      delete_related_insns (trial);
@@ -2776,7 +2790,8 @@ fill_slots_from_thread (rtx insn, rtx co
 			      gcc_assert (REG_NOTE_KIND (note)
 					  == REG_LABEL_OPERAND);
 			  }
-		      if (JUMP_P (trial) && JUMP_LABEL (trial))
+		      if (JUMP_P (trial) && JUMP_LABEL (trial)
+			  && !ANY_RETURN_P (JUMP_LABEL (trial)))
 			LABEL_NUSES (JUMP_LABEL (trial))--;
 		    }
 		  else
@@ -2897,7 +2912,8 @@ fill_slots_from_thread (rtx insn, rtx co
      depend on the destination register.  If so, try to place the opposite
      arithmetic insn after the jump insn and put the arithmetic insn in the
      delay slot.  If we can't do this, return.  */
-  if (delay_list == 0 && likely && new_thread
+  if (delay_list == 0 && likely
+      && new_thread && !ANY_RETURN_P (new_thread)
       && NONJUMP_INSN_P (new_thread)
       && GET_CODE (PATTERN (new_thread)) != ASM_INPUT
       && asm_noperands (PATTERN (new_thread)) < 0)
@@ -2990,7 +3006,7 @@ fill_slots_from_thread (rtx insn, rtx co
 					      delay_list))
 	new_thread = follow_jumps (JUMP_LABEL (new_thread));
 
-      if (new_thread == 0)
+      if (ANY_RETURN_P (new_thread))
 	label = find_end_label ();
       else if (LABEL_P (new_thread))
 	label = new_thread;
@@ -3063,7 +3079,7 @@ fill_eager_delay_slots (void)
 	 them.  Then see whether the branch is likely true.  We don't need
 	 to do a lot of this for unconditional branches.  */
 
-      insn_at_target = next_active_insn (target_label);
+      insn_at_target = first_active_target_insn (target_label);
       own_target = own_thread_p (target_label, target_label, 0);
 
       if (condition == const_true_rtx)
@@ -3098,7 +3114,7 @@ fill_eager_delay_slots (void)
 		 from the thread that was filled.  So we have to recompute
 		 the next insn at the target.  */
 	      target_label = JUMP_LABEL (insn);
-	      insn_at_target = next_active_insn (target_label);
+	      insn_at_target = first_active_target_insn (target_label);
 
 	      delay_list
 		= fill_slots_from_thread (insn, condition, fallthrough_insn,
@@ -3337,10 +3353,10 @@ relax_delay_slots (rtx first)
 	 group of consecutive labels.  */
       if (JUMP_P (insn)
 	  && (condjump_p (insn) || condjump_in_parallel_p (insn))
-	  && (target_label = JUMP_LABEL (insn)) != 0)
+	  && !ANY_RETURN_P (target_label = JUMP_LABEL (insn)))
 	{
 	  target_label = skip_consecutive_labels (follow_jumps (target_label));
-	  if (target_label == 0)
+	  if (ANY_RETURN_P (target_label))
 	    target_label = find_end_label ();
 
 	  if (target_label && next_active_insn (target_label) == next
@@ -3373,7 +3389,7 @@ relax_delay_slots (rtx first)
 		 invert_jump fails.  */
 
 	      ++LABEL_NUSES (target_label);
-	      if (label)
+	      if (!ANY_RETURN_P (label))
 		++LABEL_NUSES (label);
 
 	      if (invert_jump (insn, label, 1))
@@ -3382,7 +3398,7 @@ relax_delay_slots (rtx first)
 		  next = insn;
 		}
 
-	      if (label)
+	      if (!ANY_RETURN_P (label))
 		--LABEL_NUSES (label);
 
 	      if (--LABEL_NUSES (target_label) == 0)
@@ -3485,12 +3501,12 @@ relax_delay_slots (rtx first)
 
       target_label = JUMP_LABEL (delay_insn);
 
-      if (target_label)
+      if (!ANY_RETURN_P (target_label))
 	{
 	  /* If this jump goes to another unconditional jump, thread it, but
 	     don't convert a jump into a RETURN here.  */
 	  trial = skip_consecutive_labels (follow_jumps (target_label));
-	  if (trial == 0)
+	  if (ANY_RETURN_P (trial))
 	    trial = find_end_label ();
 
 	  if (trial && trial != target_label
@@ -3540,7 +3556,7 @@ relax_delay_slots (rtx first)
 	      && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0))
 	    {
 	      target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0));
-	      if (target_label == 0)
+	      if (ANY_RETURN_P (target_label))
 		target_label = find_end_label ();
 
 	      if (target_label
@@ -3627,7 +3643,7 @@ relax_delay_slots (rtx first)
 	  rtx label = JUMP_LABEL (next);
 	  rtx old_label = JUMP_LABEL (delay_insn);
 
-	  if (label == 0)
+	  if (ANY_RETURN_P (label))
 	    label = find_end_label ();
 
 	  /* find_end_label can generate a new label. Check this first.  */
@@ -3737,7 +3753,7 @@ make_return_insns (rtx first)
 
       /* If we can't make the jump into a RETURN, try to redirect it to the best
 	 RETURN and go on to the next insn.  */
-      if (! reorg_redirect_jump (jump_insn, NULL_RTX))
+      if (! reorg_redirect_jump (jump_insn, ret_rtx))
 	{
 	  /* Make sure redirecting the jump will not invalidate the delay
 	     slot insns.  */
@@ -3866,7 +3882,7 @@ dbr_schedule (rtx first)
       /* Ensure all jumps go to the last of a set of consecutive labels.  */
       if (JUMP_P (insn)
 	  && (condjump_p (insn) || condjump_in_parallel_p (insn))
-	  && JUMP_LABEL (insn) != 0
+	  && !ANY_RETURN_P (JUMP_LABEL (insn))
 	  && ((target = skip_consecutive_labels (JUMP_LABEL (insn)))
 	      != JUMP_LABEL (insn)))
 	redirect_jump (insn, target, 1);
Index: gcc/jump.c
===================================================================
--- gcc/jump.c	(revision 176838)
+++ gcc/jump.c	(working copy)
@@ -970,6 +970,15 @@ onlyjump_p (const_rtx insn)
   return 1;
 }
 
+/* Return true iff INSN is a jump and its JUMP_LABEL is a label, not
+   NULL or a return.  */
+bool
+jump_to_label_p (rtx insn)
+{
+  return (JUMP_P (insn)
+	  && JUMP_LABEL (insn) != NULL && !ANY_RETURN_P (JUMP_LABEL (insn)));
+}
+
 #ifdef HAVE_cc0
 
 /* Return nonzero if X is an RTX that only sets the condition codes
@@ -1233,7 +1242,7 @@ delete_related_insns (rtx insn)
   /* If deleting a jump, decrement the count of the label,
      and delete the label if it is now unused.  */
 
-  if (JUMP_P (insn) && JUMP_LABEL (insn))
+  if (jump_to_label_p (insn))
     {
       rtx lab = JUMP_LABEL (insn), lab_next;
 
@@ -1364,6 +1373,18 @@ delete_for_peephole (rtx from, rtx to)
      is also an unconditional jump in that case.  */
 }
 \f
+/* A helper function for redirect_exp_1; examines its input X and returns
+   either a LABEL_REF around a label, or a RETURN if X was NULL.  */
+static rtx
+redirect_target (rtx x)
+{
+  if (x == NULL_RTX)
+    return ret_rtx;
+  if (!ANY_RETURN_P (x))
+    return gen_rtx_LABEL_REF (Pmode, x);
+  return x;
+}
+
 /* Throughout LOC, redirect OLABEL to NLABEL.  Treat null OLABEL or
    NLABEL as a return.  Accrue modifications into the change group.  */
 
@@ -1375,37 +1396,22 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt
   int i;
   const char *fmt;
 
-  if (code == LABEL_REF)
-    {
-      if (XEXP (x, 0) == olabel)
-	{
-	  rtx n;
-	  if (nlabel)
-	    n = gen_rtx_LABEL_REF (Pmode, nlabel);
-	  else
-	    n = ret_rtx;
-
-	  validate_change (insn, loc, n, 1);
-	  return;
-	}
-    }
-  else if (code == RETURN && olabel == 0)
+      if ((code == LABEL_REF && XEXP (x, 0) == olabel)
+      || x == olabel)
     {
-      if (nlabel)
-	x = gen_rtx_LABEL_REF (Pmode, nlabel);
-      else
-	x = ret_rtx;
-      if (loc == &PATTERN (insn))
-	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
+      x = redirect_target (nlabel);
+      if (GET_CODE (x) == LABEL_REF && loc == &PATTERN (insn))
+ 	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
       validate_change (insn, loc, x, 1);
       return;
     }
 
-  if (code == SET && nlabel == 0 && SET_DEST (x) == pc_rtx
+  if (code == SET && SET_DEST (x) == pc_rtx
+      && ANY_RETURN_P (nlabel)
       && GET_CODE (SET_SRC (x)) == LABEL_REF
       && XEXP (SET_SRC (x), 0) == olabel)
     {
-      validate_change (insn, loc, ret_rtx, 1);
+      validate_change (insn, loc, nlabel, 1);
       return;
     }
 
@@ -1442,6 +1448,7 @@ redirect_jump_1 (rtx jump, rtx nlabel)
   int ochanges = num_validated_changes ();
   rtx *loc, asmop;
 
+  gcc_assert (nlabel != NULL_RTX);
   asmop = extract_asm_operands (PATTERN (jump));
   if (asmop)
     {
@@ -1463,17 +1470,20 @@ redirect_jump_1 (rtx jump, rtx nlabel)
    jump target label is unused as a result, it and the code following
    it may be deleted.
 
-   If NLABEL is zero, we are to turn the jump into a (possibly conditional)
-   RETURN insn.
+   Normally, NLABEL will be a label, but it may also be a RETURN rtx;
+   in that case we are to turn the jump into a (possibly conditional)
+   return insn.
 
    The return value will be 1 if the change was made, 0 if it wasn't
-   (this can only occur for NLABEL == 0).  */
+   (this can only occur when trying to produce return insns).  */
 
 int
 redirect_jump (rtx jump, rtx nlabel, int delete_unused)
 {
   rtx olabel = JUMP_LABEL (jump);
 
+  gcc_assert (nlabel != NULL_RTX);
+
   if (nlabel == olabel)
     return 1;
 
@@ -1501,13 +1511,14 @@ redirect_jump_2 (rtx jump, rtx olabel, r
      about this.  */
   gcc_assert (delete_unused >= 0);
   JUMP_LABEL (jump) = nlabel;
-  if (nlabel)
+  if (!ANY_RETURN_P (nlabel))
     ++LABEL_NUSES (nlabel);
 
   /* Update labels in any REG_EQUAL note.  */
   if ((note = find_reg_note (jump, REG_EQUAL, NULL_RTX)) != NULL_RTX)
     {
-      if (!nlabel || (invert && !invert_exp_1 (XEXP (note, 0), jump)))
+      if (ANY_RETURN_P (nlabel)
+	  || (invert && !invert_exp_1 (XEXP (note, 0), jump)))
 	remove_note (jump, note);
       else
 	{
@@ -1516,7 +1527,8 @@ redirect_jump_2 (rtx jump, rtx olabel, r
 	}
     }
 
-  if (olabel && --LABEL_NUSES (olabel) == 0 && delete_unused > 0
+  if (!ANY_RETURN_P (olabel)
+      && --LABEL_NUSES (olabel) == 0 && delete_unused > 0
       /* Undefined labels will remain outside the insn stream.  */
       && INSN_UID (olabel))
     delete_related_insns (olabel);
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 176838)
+++ gcc/ifcvt.c	(working copy)
@@ -104,7 +104,7 @@ static int cond_exec_find_if_block (ce_i
 static int find_if_case_1 (basic_block, edge, edge);
 static int find_if_case_2 (basic_block, edge, edge);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
-			       basic_block, int);
+			       edge, int);
 static void noce_emit_move_insn (rtx, rtx);
 static rtx block_has_only_trap (basic_block);
 \f
@@ -3847,7 +3847,7 @@ find_if_case_1 (basic_block test_bb, edg
 
   /* Registers set are dead, or are predicable.  */
   if (! dead_or_predicable (test_bb, then_bb, else_bb,
-			    single_succ (then_bb), 1))
+			    single_succ_edge (then_bb), 1))
     return FALSE;
 
   /* Conversion went ok, including moving the insns and fixing up the
@@ -3962,7 +3962,7 @@ find_if_case_2 (basic_block test_bb, edg
     return FALSE;
 
   /* Registers set are dead, or are predicable.  */
-  if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ->dest, 0))
+  if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ, 0))
     return FALSE;
 
   /* Conversion went ok, including moving the insns and fixing up the
@@ -3985,18 +3985,21 @@ find_if_case_2 (basic_block test_bb, edg
    Return TRUE if successful.
 
    TEST_BB is the block containing the conditional branch.  MERGE_BB
-   is the block containing the code to manipulate.  NEW_DEST is the
-   label TEST_BB should be branching to after the conversion.
+   is the block containing the code to manipulate.  DEST_EDGE is an
+   edge representing a jump to the join block; after the conversion,
+   TEST_BB should be branching to its destination.
    REVERSEP is true if the sense of the branch should be reversed.  */
 
 static int
 dead_or_predicable (basic_block test_bb, basic_block merge_bb,
-		    basic_block other_bb, basic_block new_dest, int reversep)
+		    basic_block other_bb, edge dest_edge, int reversep)
 {
-  rtx head, end, jump, earliest = NULL_RTX, old_dest, new_label = NULL_RTX;
+  basic_block new_dest = dest_edge->dest;
+  rtx head, end, jump, earliest = NULL_RTX, old_dest;
   bitmap merge_set = NULL;
   /* Number of pending changes.  */
   int n_validated_changes = 0;
+  rtx new_dest_label = NULL_RTX;
 
   jump = BB_END (test_bb);
 
@@ -4134,10 +4137,16 @@ dead_or_predicable (basic_block test_bb,
   old_dest = JUMP_LABEL (jump);
   if (other_bb != new_dest)
     {
-      new_label = block_label (new_dest);
+      if (JUMP_P (BB_END (dest_edge->src)))
+	new_dest_label = JUMP_LABEL (BB_END (dest_edge->src));
+      else if (new_dest == EXIT_BLOCK_PTR)
+	new_dest_label = ret_rtx;
+      else
+	new_dest_label = block_label (new_dest);
+
       if (reversep
-	  ? ! invert_jump_1 (jump, new_label)
-	  : ! redirect_jump_1 (jump, new_label))
+	  ? ! invert_jump_1 (jump, new_dest_label)
+	  : ! redirect_jump_1 (jump, new_dest_label))
 	goto cancel;
     }
 
@@ -4148,7 +4157,7 @@ dead_or_predicable (basic_block test_bb,
 
   if (other_bb != new_dest)
     {
-      redirect_jump_2 (jump, old_dest, new_label, 0, reversep);
+      redirect_jump_2 (jump, old_dest, new_dest_label, 0, reversep);
 
       redirect_edge_succ (BRANCH_EDGE (test_bb), new_dest);
       if (reversep)
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 176838)
+++ gcc/function.c	(working copy)
@@ -5305,7 +5305,8 @@ emit_use_return_register_into_block (bas
 static void
 emit_return_into_block (basic_block bb)
 {
-  emit_jump_insn_after (gen_return (), BB_END (bb));
+  rtx jump = emit_jump_insn_after (gen_return (), BB_END (bb));
+  JUMP_LABEL (jump) = ret_rtx;
 }
 #endif /* HAVE_return */
 
@@ -5464,7 +5465,7 @@ thread_prologue_and_epilogue_insns (void
 		 that with a conditional return instruction.  */
 	      else if (condjump_p (jump))
 		{
-		  if (! redirect_jump (jump, 0, 0))
+		  if (! redirect_jump (jump, ret_rtx, 0))
 		    {
 		      ei_next (&ei2);
 		      continue;
@@ -5547,6 +5548,8 @@ thread_prologue_and_epilogue_insns (void
 #ifdef HAVE_epilogue
   if (HAVE_epilogue)
     {
+      rtx returnjump;
+
       start_sequence ();
       epilogue_end = emit_note (NOTE_INSN_EPILOGUE_BEG);
       seq = gen_epilogue ();
@@ -5557,11 +5560,25 @@ thread_prologue_and_epilogue_insns (void
       record_insns (seq, NULL, &epilogue_insn_hash);
       set_insn_locators (seq, epilogue_locator);
 
+      returnjump = get_last_insn ();
       seq = get_insns ();
       end_sequence ();
 
       insert_insn_on_edge (seq, e);
       inserted = true;
+
+      if (JUMP_P (returnjump))
+	{
+	  rtx pat = PATTERN (returnjump);
+	  if (GET_CODE (pat) == PARALLEL)
+	    pat = XVECEXP (pat, 0, 0);
+	  if (ANY_RETURN_P (pat))
+	    JUMP_LABEL (returnjump) = pat;
+	  else
+	    JUMP_LABEL (returnjump) = ret_rtx;
+	}
+      else
+	returnjump = NULL_RTX;
     }
   else
 #endif
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	(revision 176838)
+++ gcc/print-rtl.c	(working copy)
@@ -323,9 +323,14 @@ print_rtx (const_rtx in_rtx)
 	      }
 	  }
 	else if (i == 8 && JUMP_P (in_rtx) && JUMP_LABEL (in_rtx) != NULL)
-	  /* Output the JUMP_LABEL reference.  */
-	  fprintf (outfile, "\n%s%*s -> %d", print_rtx_head, indent * 2, "",
-		   INSN_UID (JUMP_LABEL (in_rtx)));
+	  {
+	    /* Output the JUMP_LABEL reference.  */
+	    fprintf (outfile, "\n%s%*s -> ", print_rtx_head, indent * 2, "");
+	    if (GET_CODE (JUMP_LABEL (in_rtx)) == RETURN)
+	      fprintf (outfile, "return");
+	    else
+	      fprintf (outfile, "%d", INSN_UID (JUMP_LABEL (in_rtx)));
+	  }
 	else if (i == 0 && GET_CODE (in_rtx) == VALUE)
 	  {
 #ifndef GENERATOR_FILE
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 176838)
+++ gcc/emit-rtl.c	(working copy)
@@ -3322,14 +3322,17 @@ prev_label (rtx insn)
   return insn;
 }
 
-/* Return the last label to mark the same position as LABEL.  Return null
-   if LABEL itself is null.  */
+/* Return the last label to mark the same position as LABEL.  Return LABEL
+   itself if it is null or any return rtx.  */
 
 rtx
 skip_consecutive_labels (rtx label)
 {
   rtx insn;
 
+  if (label && ANY_RETURN_P (label))
+    return label;
+
   for (insn = label; insn != 0 && !INSN_P (insn); insn = NEXT_INSN (insn))
     if (LABEL_P (insn))
       label = insn;
Index: gcc/cfglayout.c
===================================================================
--- gcc/cfglayout.c	(revision 176838)
+++ gcc/cfglayout.c	(working copy)
@@ -899,7 +899,7 @@ fixup_reorder_chain (void)
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
       src_bb = e_fall->src;
-      nb = force_nonfallthru (e_fall);
+      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest);
       if (nb)
 	{
 	  nb->il.rtl->visited = 1;
@@ -1195,6 +1195,9 @@ duplicate_insn_chain (rtx from, rtx to)
 	      break;
 	    }
 	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
+	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
+	      && ANY_RETURN_P (JUMP_LABEL (insn)))
+	    JUMP_LABEL (copy) = JUMP_LABEL (insn);
           maybe_copy_prologue_epilogue_insn (insn, copy);
 	  break;
 
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 176838)
+++ gcc/rtl.h	(working copy)
@@ -432,6 +432,9 @@ struct GTY((variable_size)) rtvec_def {
   (JUMP_P (INSN) && (GET_CODE (PATTERN (INSN)) == ADDR_VEC || \
 		     GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC))
 
+/* Predicate yielding nonzero iff X is a return.  */
+#define ANY_RETURN_P(X) ((X) == ret_rtx)
+
 /* 1 if X is a unary operator.  */
 
 #define UNARY_P(X)   \
@@ -2341,6 +2344,7 @@ extern void check_for_inc_dec (rtx insn)
 
 /* In jump.c */
 extern int comparison_dominates_p (enum rtx_code, enum rtx_code);
+extern bool jump_to_label_p (rtx);
 extern int condjump_p (const_rtx);
 extern int any_condjump_p (const_rtx);
 extern int any_uncondjump_p (const_rtx);
Index: gcc/resource.c
===================================================================
--- gcc/resource.c	(revision 176838)
+++ gcc/resource.c	(working copy)
@@ -495,6 +495,8 @@ find_dead_or_set_registers (rtx target,
 		  || GET_CODE (PATTERN (this_jump_insn)) == RETURN)
 		{
 		  next = JUMP_LABEL (this_jump_insn);
+		  if (ANY_RETURN_P (next))
+		    next = NULL_RTX;
 		  if (jump_insn == 0)
 		    {
 		      jump_insn = insn;
@@ -562,9 +564,10 @@ find_dead_or_set_registers (rtx target,
 		  AND_COMPL_HARD_REG_SET (scratch, needed.regs);
 		  AND_COMPL_HARD_REG_SET (fallthrough_res.regs, scratch);
 
-		  find_dead_or_set_registers (JUMP_LABEL (this_jump_insn),
-					      &target_res, 0, jump_count,
-					      target_set, needed);
+		  if (!ANY_RETURN_P (JUMP_LABEL (this_jump_insn)))
+		    find_dead_or_set_registers (JUMP_LABEL (this_jump_insn),
+						&target_res, 0, jump_count,
+						target_set, needed);
 		  find_dead_or_set_registers (next,
 					      &fallthrough_res, 0, jump_count,
 					      set, needed);
@@ -878,7 +881,7 @@ mark_target_live_regs (rtx insns, rtx ta
   struct resources set, needed;
 
   /* Handle end of function.  */
-  if (target == 0)
+  if (target == 0 || ANY_RETURN_P (target))
     {
       *res = end_of_function_needs;
       return;
@@ -1097,8 +1100,9 @@ mark_target_live_regs (rtx insns, rtx ta
       struct resources new_resources;
       rtx stop_insn = next_active_insn (jump_insn);
 
-      mark_target_live_regs (insns, next_active_insn (jump_target),
-			     &new_resources);
+      if (!ANY_RETURN_P (jump_target))
+	jump_target = next_active_insn (jump_target);
+      mark_target_live_regs (insns, jump_target, &new_resources);
       CLEAR_RESOURCE (&set);
       CLEAR_RESOURCE (&needed);
 
Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h	(revision 176838)
+++ gcc/basic-block.h	(working copy)
@@ -804,6 +804,7 @@ extern rtx block_label (basic_block);
 extern bool purge_all_dead_edges (void);
 extern bool purge_dead_edges (basic_block);
 extern bool fixup_abnormal_edges (void);
+extern basic_block force_nonfallthru_and_redirect (edge, basic_block);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: gcc/config/alpha/alpha.c
===================================================================
--- gcc/config/alpha/alpha.c	(revision 176838)
+++ gcc/config/alpha/alpha.c	(working copy)
@@ -571,59 +571,6 @@ direct_return (void)
 	  && crtl->args.pretend_args_size == 0);
 }
 
-/* Return the ADDR_VEC associated with a tablejump insn.  */
-
-rtx
-alpha_tablejump_addr_vec (rtx insn)
-{
-  rtx tmp;
-
-  tmp = JUMP_LABEL (insn);
-  if (!tmp)
-    return NULL_RTX;
-  tmp = NEXT_INSN (tmp);
-  if (!tmp)
-    return NULL_RTX;
-  if (JUMP_P (tmp)
-      && GET_CODE (PATTERN (tmp)) == ADDR_DIFF_VEC)
-    return PATTERN (tmp);
-  return NULL_RTX;
-}
-
-/* Return the label of the predicted edge, or CONST0_RTX if we don't know.  */
-
-rtx
-alpha_tablejump_best_label (rtx insn)
-{
-  rtx jump_table = alpha_tablejump_addr_vec (insn);
-  rtx best_label = NULL_RTX;
-
-  /* ??? Once the CFG doesn't keep getting completely rebuilt, look
-     there for edge frequency counts from profile data.  */
-
-  if (jump_table)
-    {
-      int n_labels = XVECLEN (jump_table, 1);
-      int best_count = -1;
-      int i, j;
-
-      for (i = 0; i < n_labels; i++)
-	{
-	  int count = 1;
-
-	  for (j = i + 1; j < n_labels; j++)
-	    if (XEXP (XVECEXP (jump_table, 1, i), 0)
-		== XEXP (XVECEXP (jump_table, 1, j), 0))
-	      count++;
-
-	  if (count > best_count)
-	    best_count = count, best_label = XVECEXP (jump_table, 1, i);
-	}
-    }
-
-  return best_label ? best_label : const0_rtx;
-}
-
 /* Return the TLS model to use for SYMBOL.  */
 
 static enum tls_model
Index: gcc/config/alpha/alpha-protos.h
===================================================================
--- gcc/config/alpha/alpha-protos.h	(revision 176838)
+++ gcc/config/alpha/alpha-protos.h	(working copy)
@@ -31,9 +31,6 @@ extern void alpha_expand_prologue (void)
 extern void alpha_expand_epilogue (void);
 extern void alpha_output_filename (FILE *, const char *);
 
-extern rtx alpha_tablejump_addr_vec (rtx);
-extern rtx alpha_tablejump_best_label (rtx);
-
 extern bool alpha_legitimate_constant_p (enum machine_mode, rtx);
 extern rtx alpha_legitimize_reload_address (rtx, enum machine_mode,
 					    int, int, int);
Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c	(revision 176838)
+++ gcc/config/sh/sh.c	(working copy)
@@ -5276,7 +5276,8 @@ barrier_align (rtx barrier_or_label)
 	}
       if (prev
 	  && JUMP_P (prev)
-	  && JUMP_LABEL (prev))
+	  && JUMP_LABEL (prev) != NULL_RTX
+	  && !ANY_RETURN_P (JUMP_LABEL (prev)))
 	{
 	  rtx x;
 	  if (jump_to_next
@@ -5975,7 +5976,7 @@ split_branches (rtx first)
 			JUMP_LABEL (insn) = far_label;
 			LABEL_NUSES (far_label)++;
 		      }
-		    redirect_jump (insn, NULL_RTX, 1);
+		    redirect_jump (insn, ret_rtx, 1);
 		    far_label = 0;
 		  }
 	      }
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 176838)
+++ gcc/config/arm/arm.c	(working copy)
@@ -11479,6 +11479,7 @@ is_jump_table (rtx insn)
 
   if (GET_CODE (insn) == JUMP_INSN
       && JUMP_LABEL (insn) != NULL
+      && !ANY_RETURN_P (JUMP_LABEL (insn))
       && ((table = next_real_insn (JUMP_LABEL (insn)))
 	  == next_real_insn (insn))
       && table != NULL
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	(revision 176838)
+++ gcc/cfgrtl.c	(working copy)
@@ -1119,7 +1119,7 @@ rtl_redirect_edge_and_branch (edge e, ba
 /* Like force_nonfallthru below, but additionally performs redirection
    Used by redirect_edge_and_branch_force.  */
 
-static basic_block
+basic_block
 force_nonfallthru_and_redirect (edge e, basic_block target)
 {
   basic_block jump_block, new_bb = NULL, src = e->src;

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-28 11:48         ` Bernd Schmidt
@ 2011-07-28 12:45           ` Richard Sandiford
  2011-07-28 23:30           ` Richard Earnshaw
  2011-08-03 10:42           ` Alan Modra
  2 siblings, 0 replies; 73+ messages in thread
From: Richard Sandiford @ 2011-07-28 12:45 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

Bernd Schmidt <bernds@codesourcery.com> writes:
>>> -  for (; insn; insn = next)
>>> +  for (; insn && !ANY_RETURN_P (insn); insn = next)
>>>      {
>>>        if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
>>>  	insn = XVECEXP (PATTERN (insn), 0, 0);
>> 
>> Since ANY_RETURN looks for patterns, while this loop iterates over insns,
>> I think it'd be more obvious to have:
>> 
>>   if (insn && ANY_RETURN_P (insn))
>>     return 1;
>> 
>> above the loop instead
>
> That alone wouldn't work since we assign JUMP_LABELs to next.

Doh

> Left alone for now.

OK.
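
[Editorial note: the subtlety in the exchange above is that the loop variable
is reassigned from a JUMP_LABEL, which after this series may be ret_rtx rather
than an insn, so the return test has to sit in the loop condition and run on
every iteration. A stand-alone toy model of that shape, using plain C structs
as hypothetical stand-ins for GCC's rtx types:]

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: an "insn" either falls through to its successor or, if it
   is a jump, continues at its jump_label.  A return is modelled by a
   distinct sentinel object, mirroring GCC's shared ret_rtx.  */
struct insn {
  struct insn *next;        /* fallthrough successor */
  struct insn *jump_label;  /* non-NULL for jumps; may be the sentinel */
};

static struct insn ret_sentinel;               /* stands in for ret_rtx */
#define ANY_RETURN_P(X) ((X) == &ret_sentinel)

/* Count insns on a thread, stopping at NULL or a return.  The
   ANY_RETURN_P test must be part of the loop condition because the
   value assigned from jump_label below can itself be the sentinel,
   which is why a single check before the loop would not suffice.  */
static int
count_thread (struct insn *insn)
{
  int n = 0;
  for (; insn && !ANY_RETURN_P (insn);
       insn = insn->jump_label ? insn->jump_label : insn->next)
    n++;
  return n;
}
```

This is only a sketch of the control shape under discussion, not the reorg.c
code itself.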

>> This pattern came up in reorg.c too, so maybe it would be worth having
>> a jump_to_label_p inline function somewhere, such as:
>
> Done. Only has two uses for now though; reorg.c uses different patterns
> mostly.

There are a few other natural uses too though (below).

>>> @@ -1195,6 +1195,9 @@ duplicate_insn_chain (rtx from, rtx to)
>>>  	      break;
>>>  	    }
>>>  	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
>>> +	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
>>> +	      && ANY_RETURN_P (JUMP_LABEL (insn)))
>>> +	    JUMP_LABEL (copy) = JUMP_LABEL (insn);
>> 
>> I think this should go in emit_copy_of_insn_after instead.
>
> Here I'd like to avoid modifying the existing code in
> emit_copy_of_insn_after if possible. Not sure why it's not copying
> JUMP_LABELS, but that's something I'd prefer to investigate at some
> other time rather than risk breaking things.

OK.

> New patch below. Retested on i686-linux and mips64-elf. Ok?

Looks good to me, thanks.  OK with:

> @@ -2757,7 +2770,8 @@ fill_slots_from_thread (rtx insn, rtx co
>  			      gcc_assert (REG_NOTE_KIND (note)
>  					  == REG_LABEL_OPERAND);
>  			  }
> -		      if (JUMP_P (trial) && JUMP_LABEL (trial))
> +		      if (JUMP_P (trial) && JUMP_LABEL (trial)
> +			  && !ANY_RETURN_P (JUMP_LABEL (trial)))
>  			LABEL_NUSES (JUMP_LABEL (trial))++;

jump_to_label_p here.

> @@ -2776,7 +2790,8 @@ fill_slots_from_thread (rtx insn, rtx co
>  			      gcc_assert (REG_NOTE_KIND (note)
>  					  == REG_LABEL_OPERAND);
>  			  }
> -		      if (JUMP_P (trial) && JUMP_LABEL (trial))
> +		      if (JUMP_P (trial) && JUMP_LABEL (trial)
> +			  && !ANY_RETURN_P (JUMP_LABEL (trial)))
>  			LABEL_NUSES (JUMP_LABEL (trial))--;

and here.

> Index: gcc/config/sh/sh.c
> ===================================================================
> --- gcc/config/sh/sh.c	(revision 176838)
> +++ gcc/config/sh/sh.c	(working copy)
> @@ -5276,7 +5276,8 @@ barrier_align (rtx barrier_or_label)
>  	}
>        if (prev
>  	  && JUMP_P (prev)
> -	  && JUMP_LABEL (prev))
> +	  && JUMP_LABEL (prev) != NULL_RTX
> +	  && !ANY_RETURN_P (JUMP_LABEL (prev)))
>  	{
>  	  rtx x;
>  	  if (jump_to_next

and here.

> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c	(revision 176838)
> +++ gcc/config/arm/arm.c	(working copy)
> @@ -11479,6 +11479,7 @@ is_jump_table (rtx insn)
>  
>    if (GET_CODE (insn) == JUMP_INSN
>        && JUMP_LABEL (insn) != NULL
> +      && !ANY_RETURN_P (JUMP_LABEL (insn))
>        && ((table = next_real_insn (JUMP_LABEL (insn)))
>  	  == next_real_insn (insn))

and here.
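
[Editorial note: each of the spots flagged above repeats the same three-part
test that the new jump.c helper encapsulates. A stand-alone model of that
helper, with toy macros standing in for GCC's real JUMP_P/JUMP_LABEL
accessors:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's insn/rtx machinery. */
struct rtx { int is_jump; struct rtx *jump_label; };

static struct rtx ret_sentinel;                 /* models ret_rtx */
#define ANY_RETURN_P(X) ((X) == &ret_sentinel)
#define JUMP_P(I)       ((I)->is_jump)
#define JUMP_LABEL(I)   ((I)->jump_label)

/* Mirrors the new jump.c helper: true iff INSN is a jump whose
   JUMP_LABEL is an actual label, i.e. neither NULL nor a return.  */
static bool
jump_to_label_p (struct rtx *insn)
{
  return (JUMP_P (insn)
	  && JUMP_LABEL (insn) != NULL
	  && !ANY_RETURN_P (JUMP_LABEL (insn)));
}
```

The point of the helper is that callers no longer need to remember both the
NULL check and the return check once JUMP_LABEL can hold a return rtx.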

Richard


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-28 11:48         ` Bernd Schmidt
  2011-07-28 12:45           ` Richard Sandiford
@ 2011-07-28 23:30           ` Richard Earnshaw
  2011-07-29 12:40             ` Bernd Schmidt
  2011-08-03 10:42           ` Alan Modra
  2 siblings, 1 reply; 73+ messages in thread
From: Richard Earnshaw @ 2011-07-28 23:30 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, richard.sandiford

On 28/07/11 11:35, Bernd Schmidt wrote:
> On 07/21/11 11:52, Richard Sandiford wrote:
>> The name "active_insn_after" seems a bit too similar to "next_active_insn"
>> for the difference to be obvious.  How about something like
>> "first_active_target_insn" instead?
> 
> Changed.
>>> -  for (; insn; insn = next)
>>> +  for (; insn && !ANY_RETURN_P (insn); insn = next)
>>>      {
>>>        if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
>>>  	insn = XVECEXP (PATTERN (insn), 0, 0);
>>
>> Since ANY_RETURN looks for patterns, while this loop iterates over insns,
>> I think it'd be more obvious to have:
>>
>>   if (insn && ANY_RETURN_P (insn))
>>     return 1;
>>
>> above the loop instead
> 
> That alone wouldn't work since we assign JUMP_LABELs to next. Left alone
> for now.
> 
>>> --- gcc/jump.c	(revision 176230)
>>> +++ gcc/jump.c	(working copy)
>>> @@ -1217,7 +1217,7 @@ delete_related_insns (rtx insn)
>>
>> Given what you said above, and given that this is a public function,
>> I think we should keep the null check.
> 
> Changed.
>>
>> This pattern came up in reorg.c too, so maybe it would be worth having
>> a jump_to_label_p inline function somewhere, such as:
> 
> Done. Only has two uses for now though; reorg.c uses different patterns
> mostly.
> 
>> It looks like the old code tried to allow returns to be redirected
>> to a label -- (return) to (set (pc) (label_ref)) -- whereas the new
>> code doesn't. [...]
>>
>> How about:
>>
>>       x = redirect_target (nlabel);
>>       if (GET_CODE (x) == LABEL_REF && loc == &PATTERN (insn))
>> 	x = gen_rtx_SET (VOIDmode, pc_rtx, x);
>>       validate_change (insn, loc, x, 1);
> 
> Changed, although this probably isn't a useful thing to allow (it will
> just add one more unnecessary jump to the code?).
> 
> [ifcvt changes]
>> I found the placement of this code a bit confusing as things stand.
>> new_dest_label is only meaningful if other_bb != new_dest, so it seemed
>> like something that should directly replace the existing new_label
>> assignment.  It's OK if it makes the shrink-wrap stuff easier though.
> 
> Changed.
> 
>>> @@ -1195,6 +1195,9 @@ duplicate_insn_chain (rtx from, rtx to)
>>>  	      break;
>>>  	    }
>>>  	  copy = emit_copy_of_insn_after (insn, get_last_insn ());
>>> +	  if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX
>>> +	      && ANY_RETURN_P (JUMP_LABEL (insn)))
>>> +	    JUMP_LABEL (copy) = JUMP_LABEL (insn);
>>
>> I think this should go in emit_copy_of_insn_after instead.
> 
> Here I'd like to avoid modifying the existing code in
> emit_copy_of_insn_after if possible. Not sure why it's not copying
> JUMP_LABELS, but that's something I'd prefer to investigate at some
> other time rather than risk breaking things.
> 
>>> @@ -2294,6 +2294,8 @@ create_cfi_notes (void)
>>>  	  dwarf2out_frame_debug (insn, false);
>>>  	  continue;
>>>  	}
>>> +      if (GET_CODE (pat) == ADDR_VEC || GET_CODE (pat) == ADDR_DIFF_VEC)
>>> +	continue;
>>>  
>>>        if (GET_CODE (pat) == SEQUENCE)
>>>  	{
>>
>> rth better approve this bit...
> 
> It went away.
> 
> New patch below. Retested on i686-linux and mips64-elf. Ok?
> 
> 
> Bernd


This causes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49891

R.



* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-28 23:30           ` Richard Earnshaw
@ 2011-07-29 12:40             ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-07-29 12:40 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches, richard.sandiford


On 07/29/11 00:31, Richard Earnshaw wrote:

> This causes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49891

Fixed with this.


Bernd

[-- Attachment #2: 49891.diff --]

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 176904)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2011-07-29  Bernd Schmidt  <bernds@codesourcery.com>
+
+	PR rtl-optimization/49891
+	* cfgrtl.c (force_nonfallthru_and_redirect): Set JUMP_LABEL for
+	newly created returnjumps.
+
 2011-07-28  DJ Delorie  <dj@redhat.com>
 
 	* expr.c (expand_expr_addr_expr_1): Detect a user request for a
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	(revision 176881)
+++ gcc/cfgrtl.c	(working copy)
@@ -1254,6 +1254,7 @@ force_nonfallthru_and_redirect (edge e,
     {
 #ifdef HAVE_return
 	emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc);
+	JUMP_LABEL (BB_END (jump_block)) = ret_rtx;
 #else
 	gcc_unreachable ();
 #endif


* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-07 14:51   ` Richard Sandiford
                       ` (2 preceding siblings ...)
  2011-07-21  3:57     ` Bernd Schmidt
@ 2011-08-02  8:40     ` Bernd Schmidt
  2011-08-03 15:39       ` Richard Sandiford
  3 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-08-02  8:40 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford


On 07/07/11 16:34, Richard Sandiford wrote:
> I didn't review much after this, because it was hard to sort the
> simple_return stuff out from the "JUMP_LABEL can be a return rtx" change.

So, here's a second preliminary patch. Now that we have returns in
JUMP_LABELs, we can introduce SIMPLE_RETURN and distinguish between the two.

Admittedly this patch is somewhat poorly motivated when separated from
the rest of the shrink-wrapping stuff, but it is self-contained. In
order to have one user of simple_return, I've modified the mips epilogue
to generate it.

Bootstrapped and tested on i686-linux; I've also verified that I don't
see code generation changes with mips64-elf, sh-elf and sparc-linux
cross-compilers (with SIMPLE_RETURN placed last in rtl.def since
otherwise there are hashing differences).
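
[Editorial note: to make the return/simple_return distinction concrete, here
is a toy model, in plain C rather than GCC's rtl, of the two shared sentinel
rtxes and of ANY_RETURN_P as extended by the rtl.h change in the attached
ChangeLog. The object names are illustrative stand-ins:]

```c
#include <assert.h>
#include <stddef.h>

/* Toy sentinels modelling GCC's two shared return rtxes after this
   patch: ret_rtx (a full "return", which on targets like ARM may also
   pop registers) and simple_return_rtx (a plain branch to the
   epilogue, with no side effects).  */
static int ret_obj, simple_ret_obj;
static void *ret_rtx = &ret_obj;
static void *simple_return_rtx = &simple_ret_obj;

/* After the patch, ANY_RETURN_P accepts either kind; code that must
   distinguish the two compares against the specific sentinel.  */
#define ANY_RETURN_P(X) ((X) == ret_rtx || (X) == simple_return_rtx)
```

Because both sentinels are shared unique objects, pointer equality is enough;
that is the same reason the real macro can avoid looking at GET_CODE.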


Bernd

[-- Attachment #2: sret2.diff --]

	* doc/rtl.texi (simple_return): Document.
	(parallel, PATTERN): Here too.
	* gengenrtl.c (special_rtx): SIMPLE_RETURN is special.
	* final.c (final_scan_insn): Use ANY_RETURN_P on body.
	* reorg.c (function_return_label, function_simple_return_label):
	New static variables, replacing...
	(end_of_function_label): ... this.
	(simplejump_or_return_p): New static function.
	(optimize_skip, steal_delay_list_from_fallthrough,
	fill_slots_from_thread): Use it.
	(relax_delay_slots): Likewise.  Use ANY_RETURN_P on body.
	(rare_destination, follow_jumps): Use ANY_RETURN_P on body.
	(find_end_label): Take a new arg which is one of the two return
	rtxs.  Depending on which, set either function_return_label or
	function_simple_return_label.  All callers changed.
	(make_return_insns): Make both kinds.
	(dbr_schedule): Adjust for two kinds of end labels.
	* genemit.c (gen_exp): Handle SIMPLE_RETURN.
	(gen_expand, gen_split): Use ANY_RETURN_P.
	* df-scan.c (df_uses_record): Handle SIMPLE_RETURN.
	* rtl.def (SIMPLE_RETURN): New code.
	* ifcvt.c (find_if_case_1): Be more careful about
	redirecting jumps to the EXIT_BLOCK.
	* jump.c (condjump_p, condjump_in_parallel_p, any_condjump_p,
	returnjump_p_1): Handle SIMPLE_RETURNs.
	* print-rtl.c (print_rtx): Likewise.
	* rtl.c (copy_rtx): Likewise.
	* bt-load.c (compute_defs_uses_and_gen): Use ANY_RETURN_P.
	* combine.c (simplify_set): Likewise.
	* resource.c (find_dead_or_set_registers, mark_set_resources):
	Likewise.
	* emit-rtl.c (verify_rtx_sharing, classify_insn): Handle
	SIMPLE_RETURNs.
	(init_emit_regs): Initialize simple_return_rtx.
	* cfglayout.c (fixup_reorder_chain): Pass a JUMP_LABEL to
	force_nonfallthru_and_redirect.
	* rtl.h (ANY_RETURN_P): Allow SIMPLE_RETURN.
	(GR_SIMPLE_RETURN): New enum value.
	(simple_return_rtx): New macro.
	* basic-block.h (force_nonfallthru_and_redirect): Adjust
	declaration.
	* cfgrtl.c (force_nonfallthru_and_redirect): Take a new jump_label
	argument.  All callers changed.  Be careful about what kinds of
	returnjumps to generate.
	* config/i386/i386.c (ix86_pad_returns, ix86_count_insn_bb,
	ix86_pad_short_function): Likewise.
	* config/arm/arm.c (arm_final_prescan_insn): Handle both kinds
	of return.
	* config/mips/mips.md (simple_return, *simple_return,
	simple_return_internal): New patterns.
	* config/mips/mips.c (mips_expand_epilogue): Make the last insn
	a simple_return_internal.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	(revision 176879)
+++ gcc/doc/rtl.texi	(working copy)
@@ -2915,6 +2915,13 @@ placed in @code{pc} to return to the cal
 Note that an insn pattern of @code{(return)} is logically equivalent to
 @code{(set (pc) (return))}, but the latter form is never used.
 
+@findex simple_return
+@item (simple_return)
+Like @code{(return)}, but truly represents only a function return, while
+@code{(return)} may represent an insn that also performs other functions
+of the function epilogue.  Like @code{(return)}, this may also occur in
+conditional jumps.
+
 @findex call
 @item (call @var{function} @var{nargs})
 Represents a function call.  @var{function} is a @code{mem} expression
@@ -3044,7 +3051,7 @@ Represents several side effects performe
 brackets stand for a vector; the operand of @code{parallel} is a
 vector of expressions.  @var{x0}, @var{x1} and so on are individual
 side effect expressions---expressions of code @code{set}, @code{call},
-@code{return}, @code{clobber} or @code{use}.
+@code{return}, @code{simple_return}, @code{clobber} or @code{use}.
 
 ``In parallel'' means that first all the values used in the individual
 side-effects are computed, and second all the actual side-effects are
@@ -3683,14 +3690,16 @@ and @code{call_insn} insns:
 @table @code
 @findex PATTERN
 @item PATTERN (@var{i})
-An expression for the side effect performed by this insn.  This must be
-one of the following codes: @code{set}, @code{call}, @code{use},
-@code{clobber}, @code{return}, @code{asm_input}, @code{asm_output},
-@code{addr_vec}, @code{addr_diff_vec}, @code{trap_if}, @code{unspec},
-@code{unspec_volatile}, @code{parallel}, @code{cond_exec}, or @code{sequence}.  If it is a @code{parallel},
-each element of the @code{parallel} must be one these codes, except that
-@code{parallel} expressions cannot be nested and @code{addr_vec} and
-@code{addr_diff_vec} are not permitted inside a @code{parallel} expression.
+An expression for the side effect performed by this insn.  This must
+be one of the following codes: @code{set}, @code{call}, @code{use},
+@code{clobber}, @code{return}, @code{simple_return}, @code{asm_input},
+@code{asm_output}, @code{addr_vec}, @code{addr_diff_vec},
+@code{trap_if}, @code{unspec}, @code{unspec_volatile},
+@code{parallel}, @code{cond_exec}, or @code{sequence}.  If it is a
+@code{parallel}, each element of the @code{parallel} must be one these
+codes, except that @code{parallel} expressions cannot be nested and
+@code{addr_vec} and @code{addr_diff_vec} are not permitted inside a
+@code{parallel} expression.
 
 @findex INSN_CODE
 @item INSN_CODE (@var{i})
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	(revision 176879)
+++ gcc/gengenrtl.c	(working copy)
@@ -131,6 +131,7 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "PC") == 0
 	  || strcmp (defs[idx].enumname, "CC0") == 0
 	  || strcmp (defs[idx].enumname, "RETURN") == 0
+	  || strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
 	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
 }
 
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 176879)
+++ gcc/final.c	(working copy)
@@ -2492,7 +2492,7 @@ final_scan_insn (rtx insn, FILE *file, i
 	        delete_insn (insn);
 		break;
 	      }
-	    else if (GET_CODE (SET_SRC (body)) == RETURN)
+	    else if (ANY_RETURN_P (SET_SRC (body)))
 	      /* Replace (set (pc) (return)) with (return).  */
 	      PATTERN (insn) = body = SET_SRC (body);
 
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c	(revision 176881)
+++ gcc/reorg.c	(working copy)
@@ -161,8 +161,11 @@ static rtx *unfilled_firstobj;
 #define unfilled_slots_next	\
   ((rtx *) obstack_next_free (&unfilled_slots_obstack))
 
-/* Points to the label before the end of the function.  */
-static rtx end_of_function_label;
+/* Points to the label before the end of the function, or before a
+   return insn.  */
+static rtx function_return_label;
+/* Likewise for a simple_return.  */
+static rtx function_simple_return_label;
 
 /* Mapping between INSN_UID's and position in the code since INSN_UID's do
    not always monotonically increase.  */
@@ -175,7 +178,7 @@ static int stop_search_p (rtx, int);
 static int resource_conflicts_p (struct resources *, struct resources *);
 static int insn_references_resource_p (rtx, struct resources *, bool);
 static int insn_sets_resource_p (rtx, struct resources *, bool);
-static rtx find_end_label (void);
+static rtx find_end_label (rtx);
 static rtx emit_delay_sequence (rtx, rtx, int);
 static rtx add_to_delay_list (rtx, rtx);
 static rtx delete_from_delay_slot (rtx);
@@ -231,6 +234,15 @@ first_active_target_insn (rtx insn)
   return next_active_insn (insn);
 }
 \f
+/* Return true iff INSN is a simplejump, or any kind of return insn.  */
+
+static bool
+simplejump_or_return_p (rtx insn)
+{
+  return (JUMP_P (insn)
+	  && (simplejump_p (insn) || ANY_RETURN_P (PATTERN (insn))));
+}
+\f
 /* Return TRUE if this insn should stop the search for insn to fill delay
    slots.  LABELS_P indicates that labels should terminate the search.
    In all cases, jumps terminate the search.  */
@@ -346,23 +358,29 @@ insn_sets_resource_p (rtx insn, struct r
 
    ??? There may be a problem with the current implementation.  Suppose
    we start with a bare RETURN insn and call find_end_label.  It may set
-   end_of_function_label just before the RETURN.  Suppose the machinery
+   function_return_label just before the RETURN.  Suppose the machinery
    is able to fill the delay slot of the RETURN insn afterwards.  Then
-   end_of_function_label is no longer valid according to the property
+   function_return_label is no longer valid according to the property
    described above and find_end_label will still return it unmodified.
    Note that this is probably mitigated by the following observation:
-   once end_of_function_label is made, it is very likely the target of
+   once function_return_label is made, it is very likely the target of
    a jump, so filling the delay slot of the RETURN will be much more
    difficult.  */
 
 static rtx
-find_end_label (void)
+find_end_label (rtx kind)
 {
   rtx insn;
+  rtx *plabel;
+
+  if (kind == ret_rtx)
+    plabel = &function_return_label;
+  else
+    plabel = &function_simple_return_label;
 
   /* If we found one previously, return it.  */
-  if (end_of_function_label)
-    return end_of_function_label;
+  if (*plabel)
+    return *plabel;
 
   /* Otherwise, see if there is a label at the end of the function.  If there
      is, it must be that RETURN insns aren't needed, so that is our return
@@ -377,44 +395,44 @@ find_end_label (void)
 
   /* When a target threads its epilogue we might already have a
      suitable return insn.  If so put a label before it for the
-     end_of_function_label.  */
+     function_return_label.  */
   if (BARRIER_P (insn)
       && JUMP_P (PREV_INSN (insn))
-      && GET_CODE (PATTERN (PREV_INSN (insn))) == RETURN)
+      && PATTERN (PREV_INSN (insn)) == kind)
     {
       rtx temp = PREV_INSN (PREV_INSN (insn));
-      end_of_function_label = gen_label_rtx ();
-      LABEL_NUSES (end_of_function_label) = 0;
+      rtx label = gen_label_rtx ();
+      LABEL_NUSES (label) = 0;
 
       /* Put the label before an USE insns that may precede the RETURN insn.  */
       while (GET_CODE (temp) == USE)
 	temp = PREV_INSN (temp);
 
-      emit_label_after (end_of_function_label, temp);
+      emit_label_after (label, temp);
+      *plabel = label;
     }
 
   else if (LABEL_P (insn))
-    end_of_function_label = insn;
+    *plabel = insn;
   else
     {
-      end_of_function_label = gen_label_rtx ();
-      LABEL_NUSES (end_of_function_label) = 0;
+      rtx label = gen_label_rtx ();
+      LABEL_NUSES (label) = 0;
       /* If the basic block reorder pass moves the return insn to
 	 some other place try to locate it again and put our
-	 end_of_function_label there.  */
-      while (insn && ! (JUMP_P (insn)
-		        && (GET_CODE (PATTERN (insn)) == RETURN)))
+	 function_return_label there.  */
+      while (insn && ! (JUMP_P (insn) && (PATTERN (insn) == kind)))
 	insn = PREV_INSN (insn);
       if (insn)
 	{
 	  insn = PREV_INSN (insn);
 
-	  /* Put the label before an USE insns that may proceed the
+	  /* Put the label before an USE insns that may precede the
 	     RETURN insn.  */
 	  while (GET_CODE (insn) == USE)
 	    insn = PREV_INSN (insn);
 
-	  emit_label_after (end_of_function_label, insn);
+	  emit_label_after (label, insn);
 	}
       else
 	{
@@ -424,19 +442,16 @@ find_end_label (void)
 	      && ! HAVE_return
 #endif
 	      )
-	    {
-	      /* The RETURN insn has its delay slot filled so we cannot
-		 emit the label just before it.  Since we already have
-		 an epilogue and cannot emit a new RETURN, we cannot
-		 emit the label at all.  */
-	      end_of_function_label = NULL_RTX;
-	      return end_of_function_label;
-	    }
+	    /* The RETURN insn has its delay slot filled so we cannot
+	       emit the label just before it.  Since we already have
+	       an epilogue and cannot emit a new RETURN, we cannot
+	       emit the label at all.  */
+	    return NULL_RTX;
 #endif /* HAVE_epilogue */
 
 	  /* Otherwise, make a new label and emit a RETURN and BARRIER,
 	     if needed.  */
-	  emit_label (end_of_function_label);
+	  emit_label (label);
 #ifdef HAVE_return
 	  /* We don't bother trying to create a return insn if the
 	     epilogue has filled delay-slots; we would have to try and
@@ -455,13 +470,14 @@ find_end_label (void)
 	    }
 #endif
 	}
+      *plabel = label;
     }
 
   /* Show one additional use for this label so it won't go away until
      we are done.  */
-  ++LABEL_NUSES (end_of_function_label);
+  ++LABEL_NUSES (*plabel);
 
-  return end_of_function_label;
+  return *plabel;
 }
 \f
 /* Put INSN and LIST together in a SEQUENCE rtx of LENGTH, and replace
@@ -809,10 +825,8 @@ optimize_skip (rtx insn)
   if ((next_trial == next_active_insn (JUMP_LABEL (insn))
        && ! (next_trial == 0 && crtl->epilogue_delay_list != 0))
       || (next_trial != 0
-	  && JUMP_P (next_trial)
-	  && JUMP_LABEL (insn) == JUMP_LABEL (next_trial)
-	  && (simplejump_p (next_trial)
-	      || GET_CODE (PATTERN (next_trial)) == RETURN)))
+	  && simplejump_or_return_p (next_trial)
+	  && JUMP_LABEL (insn) == JUMP_LABEL (next_trial)))
     {
       if (eligible_for_annul_false (insn, 0, trial, flags))
 	{
@@ -831,13 +845,11 @@ optimize_skip (rtx insn)
 	 branch, thread our jump to the target of that branch.  Don't
 	 change this into a RETURN here, because it may not accept what
 	 we have in the delay slot.  We'll fix this up later.  */
-      if (next_trial && JUMP_P (next_trial)
-	  && (simplejump_p (next_trial)
-	      || GET_CODE (PATTERN (next_trial)) == RETURN))
+      if (next_trial && simplejump_or_return_p (next_trial))
 	{
 	  rtx target_label = JUMP_LABEL (next_trial);
 	  if (ANY_RETURN_P (target_label))
-	    target_label = find_end_label ();
+	    target_label = find_end_label (target_label);
 
 	  if (target_label)
 	    {
@@ -951,7 +963,7 @@ rare_destination (rtx insn)
 	     return.  */
 	  return 2;
 	case JUMP_INSN:
-	  if (GET_CODE (PATTERN (insn)) == RETURN)
+	  if (ANY_RETURN_P (PATTERN (insn)))
 	    return 1;
 	  else if (simplejump_p (insn)
 		   && jump_count++ < 10)
@@ -1366,8 +1378,7 @@ steal_delay_list_from_fallthrough (rtx i
   /* We can't do anything if SEQ's delay insn isn't an
      unconditional branch.  */
 
-  if (! simplejump_p (XVECEXP (seq, 0, 0))
-      && GET_CODE (PATTERN (XVECEXP (seq, 0, 0))) != RETURN)
+  if (! simplejump_or_return_p (XVECEXP (seq, 0, 0)))
     return delay_list;
 
   for (i = 1; i < XVECLEN (seq, 0); i++)
@@ -2376,7 +2387,7 @@ fill_simple_delay_slots (int non_jumps_p
 	      if (new_label != 0)
 		new_label = get_label_before (new_label);
 	      else
-		new_label = find_end_label ();
+		new_label = find_end_label (simple_return_rtx);
 
 	      if (new_label)
 	        {
@@ -2508,7 +2519,8 @@ fill_simple_delay_slots (int non_jumps_p
 \f
 /* Follow any unconditional jump at LABEL;
    return the ultimate label reached by any such chain of jumps.
-   Return ret_rtx if the chain ultimately leads to a return instruction.
+   Return a suitable return rtx if the chain ultimately leads to a
+   return instruction.
    If LABEL is not followed by a jump, return LABEL.
    If the chain loops or we can't find end, return LABEL,
    since that tells caller to avoid changing the insn.  */
@@ -2529,7 +2541,7 @@ follow_jumps (rtx label)
 	&& JUMP_P (insn)
 	&& JUMP_LABEL (insn) != NULL_RTX
 	&& ((any_uncondjump_p (insn) && onlyjump_p (insn))
-	    || GET_CODE (PATTERN (insn)) == RETURN)
+	    || ANY_RETURN_P (PATTERN (insn)))
 	&& (next = NEXT_INSN (insn))
 	&& BARRIER_P (next));
        depth++)
@@ -2996,16 +3008,14 @@ fill_slots_from_thread (rtx insn, rtx co
 
       gcc_assert (thread_if_true);
 
-      if (new_thread && JUMP_P (new_thread)
-	  && (simplejump_p (new_thread)
-	      || GET_CODE (PATTERN (new_thread)) == RETURN)
+      if (new_thread && simplejump_or_return_p (new_thread)
 	  && redirect_with_delay_list_safe_p (insn,
 					      JUMP_LABEL (new_thread),
 					      delay_list))
 	new_thread = follow_jumps (JUMP_LABEL (new_thread));
 
       if (ANY_RETURN_P (new_thread))
-	label = find_end_label ();
+	label = find_end_label (new_thread);
       else if (LABEL_P (new_thread))
 	label = new_thread;
       else
@@ -3355,7 +3365,7 @@ relax_delay_slots (rtx first)
 	{
 	  target_label = skip_consecutive_labels (follow_jumps (target_label));
 	  if (ANY_RETURN_P (target_label))
-	    target_label = find_end_label ();
+	    target_label = find_end_label (target_label);
 
 	  if (target_label && next_active_insn (target_label) == next
 	      && ! condjump_in_parallel_p (insn))
@@ -3370,9 +3380,8 @@ relax_delay_slots (rtx first)
 	  /* See if this jump conditionally branches around an unconditional
 	     jump.  If so, invert this jump and point it to the target of the
 	     second jump.  */
-	  if (next && JUMP_P (next)
+	  if (next && simplejump_or_return_p (next)
 	      && any_condjump_p (insn)
-	      && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN)
 	      && target_label
 	      && next_active_insn (target_label) == next_active_insn (next)
 	      && no_labels_between_p (insn, next))
@@ -3414,8 +3423,7 @@ relax_delay_slots (rtx first)
 	 Don't do this if we expect the conditional branch to be true, because
 	 we would then be making the more common case longer.  */
 
-      if (JUMP_P (insn)
-	  && (simplejump_p (insn) || GET_CODE (PATTERN (insn)) == RETURN)
+      if (simplejump_or_return_p (insn)
 	  && (other = prev_active_insn (insn)) != 0
 	  && any_condjump_p (other)
 	  && no_labels_between_p (other, insn)
@@ -3456,10 +3464,10 @@ relax_delay_slots (rtx first)
 	 Only do so if optimizing for size since this results in slower, but
 	 smaller code.  */
       if (optimize_function_for_size_p (cfun)
-	  && GET_CODE (PATTERN (delay_insn)) == RETURN
+	  && ANY_RETURN_P (PATTERN (delay_insn))
 	  && next
 	  && JUMP_P (next)
-	  && GET_CODE (PATTERN (next)) == RETURN)
+	  && PATTERN (next) == PATTERN (delay_insn))
 	{
 	  rtx after;
 	  int i;
@@ -3498,6 +3506,8 @@ relax_delay_slots (rtx first)
 	continue;
 
       target_label = JUMP_LABEL (delay_insn);
+      if (target_label && ANY_RETURN_P (target_label))
+	continue;
 
       if (!ANY_RETURN_P (target_label))
 	{
@@ -3505,7 +3515,7 @@ relax_delay_slots (rtx first)
 	     don't convert a jump into a RETURN here.  */
 	  trial = skip_consecutive_labels (follow_jumps (target_label));
 	  if (ANY_RETURN_P (trial))
-	    trial = find_end_label ();
+	    trial = find_end_label (trial);
 
 	  if (trial && trial != target_label
 	      && redirect_with_delay_slots_safe_p (delay_insn, trial, insn))
@@ -3528,7 +3538,7 @@ relax_delay_slots (rtx first)
 		 later incorrectly compute register live/death info.  */
 	      rtx tmp = next_active_insn (trial);
 	      if (tmp == 0)
-		tmp = find_end_label ();
+		tmp = find_end_label (simple_return_rtx);
 
 	      if (tmp)
 	        {
@@ -3549,13 +3559,12 @@ relax_delay_slots (rtx first)
 	  if (trial && GET_CODE (PATTERN (trial)) == SEQUENCE
 	      && XVECLEN (PATTERN (trial), 0) == 2
 	      && JUMP_P (XVECEXP (PATTERN (trial), 0, 0))
-	      && (simplejump_p (XVECEXP (PATTERN (trial), 0, 0))
-		  || GET_CODE (PATTERN (XVECEXP (PATTERN (trial), 0, 0))) == RETURN)
+	      && simplejump_or_return_p (XVECEXP (PATTERN (trial), 0, 0))
 	      && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0))
 	    {
 	      target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0));
 	      if (ANY_RETURN_P (target_label))
-		target_label = find_end_label ();
+		target_label = find_end_label (target_label);
 
 	      if (target_label
 	          && redirect_with_delay_slots_safe_p (delay_insn, target_label,
@@ -3633,8 +3642,7 @@ relax_delay_slots (rtx first)
 	 a RETURN here.  */
       if (! INSN_ANNULLED_BRANCH_P (delay_insn)
 	  && any_condjump_p (delay_insn)
-	  && next && JUMP_P (next)
-	  && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN)
+	  && next && simplejump_or_return_p (next)
 	  && next_active_insn (target_label) == next_active_insn (next)
 	  && no_labels_between_p (insn, next))
 	{
@@ -3642,7 +3650,7 @@ relax_delay_slots (rtx first)
 	  rtx old_label = JUMP_LABEL (delay_insn);
 
 	  if (ANY_RETURN_P (label))
-	    label = find_end_label ();
+	    label = find_end_label (label);
 
 	  /* find_end_label can generate a new label. Check this first.  */
 	  if (label
@@ -3703,7 +3711,8 @@ static void
 make_return_insns (rtx first)
 {
   rtx insn, jump_insn, pat;
-  rtx real_return_label = end_of_function_label;
+  rtx real_return_label = function_return_label;
+  rtx real_simple_return_label = function_simple_return_label;
   int slots, i;
 
 #ifdef DELAY_SLOTS_FOR_EPILOGUE
@@ -3721,15 +3730,22 @@ make_return_insns (rtx first)
      made for END_OF_FUNCTION_LABEL.  If so, set up anything we can't change
      into a RETURN to jump to it.  */
   for (insn = first; insn; insn = NEXT_INSN (insn))
-    if (JUMP_P (insn) && GET_CODE (PATTERN (insn)) == RETURN)
+    if (JUMP_P (insn) && ANY_RETURN_P (PATTERN (insn)))
       {
-	real_return_label = get_label_before (insn);
+	rtx t = get_label_before (insn);
+	if (PATTERN (insn) == ret_rtx)
+	  real_return_label = t;
+	else
+	  real_simple_return_label = t;
 	break;
       }
 
   /* Show an extra usage of REAL_RETURN_LABEL so it won't go away if it
      was equal to END_OF_FUNCTION_LABEL.  */
-  LABEL_NUSES (real_return_label)++;
+  if (real_return_label)
+    LABEL_NUSES (real_return_label)++;
+  if (real_simple_return_label)
+    LABEL_NUSES (real_simple_return_label)++;
 
   /* Clear the list of insns to fill so we can use it.  */
   obstack_free (&unfilled_slots_obstack, unfilled_firstobj);
@@ -3737,13 +3753,27 @@ make_return_insns (rtx first)
   for (insn = first; insn; insn = NEXT_INSN (insn))
     {
       int flags;
+      rtx kind, real_label;
 
       /* Only look at filled JUMP_INSNs that go to the end of function
 	 label.  */
       if (!NONJUMP_INSN_P (insn)
 	  || GET_CODE (PATTERN (insn)) != SEQUENCE
-	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0))
-	  || JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) != end_of_function_label)
+	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0)))
+	continue;
+
+      if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) == function_return_label)
+	{
+	  kind = ret_rtx;
+	  real_label = real_return_label;
+	}
+      else if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0))
+	       == function_simple_return_label)
+	{
+	  kind = simple_return_rtx;
+	  real_label = real_simple_return_label;
+	}
+      else
 	continue;
 
       pat = PATTERN (insn);
@@ -3751,14 +3781,12 @@ make_return_insns (rtx first)
 
       /* If we can't make the jump into a RETURN, try to redirect it to the best
 	 RETURN and go on to the next insn.  */
-      if (! reorg_redirect_jump (jump_insn, ret_rtx))
+      if (!reorg_redirect_jump (jump_insn, kind))
 	{
 	  /* Make sure redirecting the jump will not invalidate the delay
 	     slot insns.  */
-	  if (redirect_with_delay_slots_safe_p (jump_insn,
-						real_return_label,
-						insn))
-	    reorg_redirect_jump (jump_insn, real_return_label);
+	  if (redirect_with_delay_slots_safe_p (jump_insn, real_label, insn))
+	    reorg_redirect_jump (jump_insn, real_label);
 	  continue;
 	}
 
@@ -3798,7 +3826,7 @@ make_return_insns (rtx first)
 	 RETURN, delete the SEQUENCE and output the individual insns,
 	 followed by the RETURN.  Then set things up so we try to find
 	 insns for its delay slots, if it needs some.  */
-      if (GET_CODE (PATTERN (jump_insn)) == RETURN)
+      if (ANY_RETURN_P (PATTERN (jump_insn)))
 	{
 	  rtx prev = PREV_INSN (insn);
 
@@ -3815,13 +3843,16 @@ make_return_insns (rtx first)
       else
 	/* It is probably more efficient to keep this with its current
 	   delay slot as a branch to a RETURN.  */
-	reorg_redirect_jump (jump_insn, real_return_label);
+	reorg_redirect_jump (jump_insn, real_label);
     }
 
   /* Now delete REAL_RETURN_LABEL if we never used it.  Then try to fill any
      new delay slots we have created.  */
-  if (--LABEL_NUSES (real_return_label) == 0)
+  if (real_return_label != NULL_RTX && --LABEL_NUSES (real_return_label) == 0)
     delete_related_insns (real_return_label);
+  if (real_simple_return_label != NULL_RTX
+      && --LABEL_NUSES (real_simple_return_label) == 0)
+    delete_related_insns (real_simple_return_label);
 
   fill_simple_delay_slots (1);
   fill_simple_delay_slots (0);
@@ -3889,7 +3920,7 @@ dbr_schedule (rtx first)
   init_resource_info (epilogue_insn);
 
   /* Show we haven't computed an end-of-function label yet.  */
-  end_of_function_label = 0;
+  function_return_label = function_simple_return_label = NULL_RTX;
 
   /* Initialize the statistics for this function.  */
   memset (num_insns_needing_delays, 0, sizeof num_insns_needing_delays);
@@ -3911,11 +3942,23 @@ dbr_schedule (rtx first)
   /* If we made an end of function label, indicate that it is now
      safe to delete it by undoing our prior adjustment to LABEL_NUSES.
      If it is now unused, delete it.  */
-  if (end_of_function_label && --LABEL_NUSES (end_of_function_label) == 0)
-    delete_related_insns (end_of_function_label);
+  if (function_return_label && --LABEL_NUSES (function_return_label) == 0)
+    delete_related_insns (function_return_label);
+  if (function_simple_return_label
+      && --LABEL_NUSES (function_simple_return_label) == 0)
+    delete_related_insns (function_simple_return_label);
 
+#if defined HAVE_return || defined HAVE_simple_return
+  if (
 #ifdef HAVE_return
-  if (HAVE_return && end_of_function_label != 0)
+      (HAVE_return && function_return_label != 0)
+#else
+      0
+#endif
+#ifdef HAVE_simple_return
+      || (HAVE_simple_return && function_simple_return_label != 0)
+#endif
+      )
     make_return_insns (first);
 #endif
 
Index: gcc/genemit.c
===================================================================
--- gcc/genemit.c	(revision 176879)
+++ gcc/genemit.c	(working copy)
@@ -169,6 +169,9 @@ gen_exp (rtx x, enum rtx_code subroutine
     case RETURN:
       printf ("ret_rtx");
       return;
+    case SIMPLE_RETURN:
+      printf ("simple_return_rtx");
+      return;
     case CLOBBER:
       if (REG_P (XEXP (x, 0)))
 	{
@@ -489,8 +492,8 @@ gen_expand (rtx expand)
 	  || (GET_CODE (next) == PARALLEL
 	      && ((GET_CODE (XVECEXP (next, 0, 0)) == SET
 		   && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC)
-		  || GET_CODE (XVECEXP (next, 0, 0)) == RETURN))
-	  || GET_CODE (next) == RETURN)
+		  || ANY_RETURN_P (XVECEXP (next, 0, 0))))
+	  || ANY_RETURN_P (next))
 	printf ("  emit_jump_insn (");
       else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL)
 	       || GET_CODE (next) == CALL
@@ -607,7 +610,7 @@ gen_split (rtx split)
 	  || (GET_CODE (next) == PARALLEL
 	      && GET_CODE (XVECEXP (next, 0, 0)) == SET
 	      && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC)
-	  || GET_CODE (next) == RETURN)
+	  || ANY_RETURN_P (next))
 	printf ("  emit_jump_insn (");
       else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL)
 	       || GET_CODE (next) == CALL
Index: gcc/df-scan.c
===================================================================
--- gcc/df-scan.c	(revision 176879)
+++ gcc/df-scan.c	(working copy)
@@ -3181,6 +3181,7 @@ df_uses_record (struct df_collection_rec
       }
 
     case RETURN:
+    case SIMPLE_RETURN:
       break;
 
     case ASM_OPERANDS:
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	(revision 176879)
+++ gcc/rtl.def	(working copy)
@@ -731,6 +731,10 @@ DEF_RTL_EXPR(ENTRY_VALUE, "entry_value",
    been optimized away completely.  */
 DEF_RTL_EXPR(DEBUG_PARAMETER_REF, "debug_parameter_ref", "t", RTX_OBJ)
 
+/* A plain return, to be used on paths that are reached without going
+   through the function prologue.  */
+DEF_RTL_EXPR(SIMPLE_RETURN, "simple_return", "", RTX_EXTRA)
+
 /* All expressions from this point forward appear only in machine
    descriptions.  */
 #ifdef GENERATOR_FILE
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 176881)
+++ gcc/ifcvt.c	(working copy)
@@ -3796,6 +3796,7 @@ find_if_case_1 (basic_block test_bb, edg
   basic_block then_bb = then_edge->dest;
   basic_block else_bb = else_edge->dest;
   basic_block new_bb;
+  rtx else_target = NULL_RTX;
   int then_bb_index;
 
   /* If we are partitioning hot/cold basic blocks, we don't want to
@@ -3845,6 +3846,13 @@ find_if_case_1 (basic_block test_bb, edg
 				    predictable_edge_p (then_edge)))))
     return FALSE;
 
+  if (else_bb == EXIT_BLOCK_PTR)
+    {
+      rtx jump = BB_END (else_edge->src);
+      gcc_assert (JUMP_P (jump));
+      else_target = JUMP_LABEL (jump);
+    }
+
   /* Registers set are dead, or are predicable.  */
   if (! dead_or_predicable (test_bb, then_bb, else_bb,
 			    single_succ_edge (then_bb), 1))
@@ -3864,6 +3872,9 @@ find_if_case_1 (basic_block test_bb, edg
       redirect_edge_succ (FALLTHRU_EDGE (test_bb), else_bb);
       new_bb = 0;
     }
+  else if (else_bb == EXIT_BLOCK_PTR)
+    new_bb = force_nonfallthru_and_redirect (FALLTHRU_EDGE (test_bb),
+					     else_bb, else_target);
   else
     new_bb = redirect_edge_and_branch_force (FALLTHRU_EDGE (test_bb),
 					     else_bb);
Index: gcc/jump.c
===================================================================
--- gcc/jump.c	(revision 176881)
+++ gcc/jump.c	(working copy)
@@ -29,7 +29,8 @@ along with GCC; see the file COPYING3.
    JUMP_LABEL internal field.  With this we can detect labels that
    become unused because of the deletion of all the jumps that
    formerly used them.  The JUMP_LABEL info is sometimes looked
-   at by later passes.
+   at by later passes.  For return insns, it contains either a
+   RETURN or a SIMPLE_RETURN rtx.
 
    The subroutines redirect_jump and invert_jump are used
    from other passes as well.  */
@@ -775,10 +776,10 @@ condjump_p (const_rtx insn)
     return (GET_CODE (x) == IF_THEN_ELSE
 	    && ((GET_CODE (XEXP (x, 2)) == PC
 		 && (GET_CODE (XEXP (x, 1)) == LABEL_REF
-		     || GET_CODE (XEXP (x, 1)) == RETURN))
+		     || ANY_RETURN_P (XEXP (x, 1))))
 		|| (GET_CODE (XEXP (x, 1)) == PC
 		    && (GET_CODE (XEXP (x, 2)) == LABEL_REF
-			|| GET_CODE (XEXP (x, 2)) == RETURN))));
+			|| ANY_RETURN_P (XEXP (x, 2))))));
 }
 
 /* Return nonzero if INSN is a (possibly) conditional jump inside a
@@ -807,11 +808,11 @@ condjump_in_parallel_p (const_rtx insn)
     return 0;
   if (XEXP (SET_SRC (x), 2) == pc_rtx
       && (GET_CODE (XEXP (SET_SRC (x), 1)) == LABEL_REF
-	  || GET_CODE (XEXP (SET_SRC (x), 1)) == RETURN))
+	  || ANY_RETURN_P (XEXP (SET_SRC (x), 1))))
     return 1;
   if (XEXP (SET_SRC (x), 1) == pc_rtx
       && (GET_CODE (XEXP (SET_SRC (x), 2)) == LABEL_REF
-	  || GET_CODE (XEXP (SET_SRC (x), 2)) == RETURN))
+	  || ANY_RETURN_P (XEXP (SET_SRC (x), 2))))
     return 1;
   return 0;
 }
@@ -873,8 +874,9 @@ any_condjump_p (const_rtx insn)
   a = GET_CODE (XEXP (SET_SRC (x), 1));
   b = GET_CODE (XEXP (SET_SRC (x), 2));
 
-  return ((b == PC && (a == LABEL_REF || a == RETURN))
-	  || (a == PC && (b == LABEL_REF || b == RETURN)));
+  return ((b == PC && (a == LABEL_REF || a == RETURN || a == SIMPLE_RETURN))
+	  || (a == PC
+	      && (b == LABEL_REF || b == RETURN || b == SIMPLE_RETURN)));
 }
 
 /* Return the label of a conditional jump.  */
@@ -911,6 +913,7 @@ returnjump_p_1 (rtx *loc, void *data ATT
   switch (GET_CODE (x))
     {
     case RETURN:
+    case SIMPLE_RETURN:
     case EH_RETURN:
       return true;
 
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	(revision 176881)
+++ gcc/print-rtl.c	(working copy)
@@ -328,6 +328,8 @@ print_rtx (const_rtx in_rtx)
 	    fprintf (outfile, "\n%s%*s -> ", print_rtx_head, indent * 2, "");
 	    if (GET_CODE (JUMP_LABEL (in_rtx)) == RETURN)
 	      fprintf (outfile, "return");
+	    else if (GET_CODE (JUMP_LABEL (in_rtx)) == SIMPLE_RETURN)
+	      fprintf (outfile, "simple_return");
 	    else
 	      fprintf (outfile, "%d", INSN_UID (JUMP_LABEL (in_rtx)));
 	  }
Index: gcc/bt-load.c
===================================================================
--- gcc/bt-load.c	(revision 176879)
+++ gcc/bt-load.c	(working copy)
@@ -558,7 +558,7 @@ compute_defs_uses_and_gen (fibheap_t all
 		      /* Check for sibcall.  */
 		      if (GET_CODE (pat) == PARALLEL)
 			for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
-			  if (GET_CODE (XVECEXP (pat, 0, i)) == RETURN)
+			  if (ANY_RETURN_P (XVECEXP (pat, 0, i)))
 			    {
 			      COMPL_HARD_REG_SET (call_saved,
 						  call_used_reg_set);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 176881)
+++ gcc/emit-rtl.c	(working copy)
@@ -2518,6 +2518,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       return;
       /* SCRATCH must be shared because they represent distinct values.  */
@@ -5002,7 +5003,7 @@ classify_insn (rtx x)
     return CODE_LABEL;
   if (GET_CODE (x) == CALL)
     return CALL_INSN;
-  if (GET_CODE (x) == RETURN)
+  if (ANY_RETURN_P (x))
     return JUMP_INSN;
   if (GET_CODE (x) == SET)
     {
@@ -5514,6 +5515,7 @@ init_emit_regs (void)
   /* Assign register numbers to the globally defined register rtx.  */
   pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
   ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
+  simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
   cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
   stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
   frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);
Index: gcc/cfglayout.c
===================================================================
--- gcc/cfglayout.c	(revision 176881)
+++ gcc/cfglayout.c	(working copy)
@@ -767,6 +767,7 @@ fixup_reorder_chain (void)
     {
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
+      rtx ret_label = NULL_RTX;
       basic_block nb, src_bb;
       edge_iterator ei;
 
@@ -786,6 +787,7 @@ fixup_reorder_chain (void)
       bb_end_insn = BB_END (bb);
       if (JUMP_P (bb_end_insn))
 	{
+	  ret_label = JUMP_LABEL (bb_end_insn);
 	  if (any_condjump_p (bb_end_insn))
 	    {
 	      /* This might happen if the conditional jump has side
@@ -899,7 +901,7 @@ fixup_reorder_chain (void)
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
       src_bb = e_fall->src;
-      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest);
+      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
 	  nb->il.rtl->visited = 1;
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	(revision 176879)
+++ gcc/rtl.c	(working copy)
@@ -256,6 +256,7 @@ copy_rtx (rtx orig)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return orig;
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 176881)
+++ gcc/rtl.h	(working copy)
@@ -432,8 +432,9 @@ struct GTY((variable_size)) rtvec_def {
   (JUMP_P (INSN) && (GET_CODE (PATTERN (INSN)) == ADDR_VEC || \
 		     GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC))
 
-/* Predicate yielding nonzero iff X is a return.  */
-#define ANY_RETURN_P(X) ((X) == ret_rtx)
+/* Predicate yielding nonzero iff X is a return or simple_return.  */
+#define ANY_RETURN_P(X) \
+  (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)
 
 /* 1 if X is a unary operator.  */
 
@@ -2074,6 +2075,7 @@ enum global_rtl_index
   GR_PC,
   GR_CC0,
   GR_RETURN,
+  GR_SIMPLE_RETURN,
   GR_STACK_POINTER,
   GR_FRAME_POINTER,
 /* For register elimination to work properly these hard_frame_pointer_rtx,
@@ -2169,6 +2171,7 @@ extern struct target_rtl *this_target_rt
 /* Standard pieces of rtx, to be substituted directly into things.  */
 #define pc_rtx                  (global_rtl[GR_PC])
 #define ret_rtx                 (global_rtl[GR_RETURN])
+#define simple_return_rtx       (global_rtl[GR_SIMPLE_RETURN])
 #define cc0_rtx                 (global_rtl[GR_CC0])
 
 /* All references to certain hard regs, except those created
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 176879)
+++ gcc/combine.c	(working copy)
@@ -6303,7 +6303,7 @@ simplify_set (rtx x)
   rtx *cc_use;
 
   /* (set (pc) (return)) gets written as (return).  */
-  if (GET_CODE (dest) == PC && GET_CODE (src) == RETURN)
+  if (GET_CODE (dest) == PC && ANY_RETURN_P (src))
     return src;
 
   /* Now that we know for sure which bits of SRC we are using, see if we can
Index: gcc/resource.c
===================================================================
--- gcc/resource.c	(revision 176881)
+++ gcc/resource.c	(working copy)
@@ -492,7 +492,7 @@ find_dead_or_set_registers (rtx target,
 	  if (jump_count++ < 10)
 	    {
 	      if (any_uncondjump_p (this_jump_insn)
-		  || GET_CODE (PATTERN (this_jump_insn)) == RETURN)
+		  || ANY_RETURN_P (PATTERN (this_jump_insn)))
 		{
 		  next = JUMP_LABEL (this_jump_insn);
 		  if (ANY_RETURN_P (next))
@@ -821,7 +821,7 @@ mark_set_resources (rtx x, struct resour
 static bool
 return_insn_p (const_rtx insn)
 {
-  if (JUMP_P (insn) && GET_CODE (PATTERN (insn)) == RETURN)
+  if (JUMP_P (insn) && ANY_RETURN_P (PATTERN (insn)))
     return true;
 
   if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h	(revision 176881)
+++ gcc/basic-block.h	(working copy)
@@ -804,7 +804,7 @@ extern rtx block_label (basic_block);
 extern bool purge_all_dead_edges (void);
 extern bool purge_dead_edges (basic_block);
 extern bool fixup_abnormal_edges (void);
-extern basic_block force_nonfallthru_and_redirect (edge, basic_block);
+extern basic_block force_nonfallthru_and_redirect (edge, basic_block, rtx);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: gcc/sched-vis.c
===================================================================
--- gcc/sched-vis.c	(revision 176879)
+++ gcc/sched-vis.c	(working copy)
@@ -554,6 +554,9 @@ print_pattern (char *buf, const_rtx x, i
     case RETURN:
       sprintf (buf, "return");
       break;
+    case SIMPLE_RETURN:
+      sprintf (buf, "simple_return");
+      break;
     case CALL:
       print_exp (buf, x, verbose);
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 176879)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29890,7 +29890,7 @@ ix86_pad_returns (void)
       rtx prev;
       bool replace = false;
 
-      if (!JUMP_P (ret) || GET_CODE (PATTERN (ret)) != RETURN
+      if (!JUMP_P (ret) || !ANY_RETURN_P (PATTERN (ret))
 	  || optimize_bb_for_size_p (bb))
 	continue;
       for (prev = PREV_INSN (ret); prev; prev = PREV_INSN (prev))
@@ -29941,7 +29941,7 @@ ix86_count_insn_bb (basic_block bb)
     {
       /* Only happen in exit blocks.  */
       if (JUMP_P (insn)
-	  && GET_CODE (PATTERN (insn)) == RETURN)
+	  && ANY_RETURN_P (PATTERN (insn)))
 	break;
 
       if (NONDEBUG_INSN_P (insn)
@@ -30014,7 +30014,7 @@ ix86_pad_short_function (void)
   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
     {
       rtx ret = BB_END (e->src);
-      if (JUMP_P (ret) && GET_CODE (PATTERN (ret)) == RETURN)
+      if (JUMP_P (ret) && ANY_RETURN_P (PATTERN (ret)))
 	{
 	  int insn_count = ix86_count_insn (e->src);
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 176881)
+++ gcc/config/arm/arm.c	(working copy)
@@ -17382,6 +17382,7 @@ arm_final_prescan_insn (rtx insn)
 
   /* If we start with a return insn, we only succeed if we find another one.  */
   int seeking_return = 0;
+  enum rtx_code return_code = UNKNOWN;
 
   /* START_INSN will hold the insn from where we start looking.  This is the
      first insn after the following code_label if REVERSE is true.  */
@@ -17420,7 +17421,7 @@ arm_final_prescan_insn (rtx insn)
 	  else
 	    return;
 	}
-      else if (GET_CODE (body) == RETURN)
+      else if (ANY_RETURN_P (body))
         {
 	  start_insn = next_nonnote_insn (start_insn);
 	  if (GET_CODE (start_insn) == BARRIER)
@@ -17431,6 +17432,7 @@ arm_final_prescan_insn (rtx insn)
 	    {
 	      reverse = TRUE;
 	      seeking_return = 1;
+	      return_code = GET_CODE (body);
 	    }
 	  else
 	    return;
@@ -17471,11 +17473,15 @@ arm_final_prescan_insn (rtx insn)
 	  label = XEXP (XEXP (SET_SRC (body), 2), 0);
 	  then_not_else = FALSE;
 	}
-      else if (GET_CODE (XEXP (SET_SRC (body), 1)) == RETURN)
-	seeking_return = 1;
-      else if (GET_CODE (XEXP (SET_SRC (body), 2)) == RETURN)
+      else if (ANY_RETURN_P (XEXP (SET_SRC (body), 1)))
+	{
+	  seeking_return = 1;
+	  return_code = GET_CODE (XEXP (SET_SRC (body), 1));
+	}
+      else if (ANY_RETURN_P (XEXP (SET_SRC (body), 2)))
         {
 	  seeking_return = 1;
+	  return_code = GET_CODE (XEXP (SET_SRC (body), 2));
 	  then_not_else = FALSE;
         }
       else
@@ -17572,12 +17578,11 @@ arm_final_prescan_insn (rtx insn)
 		}
 	      /* Fail if a conditional return is undesirable (e.g. on a
 		 StrongARM), but still allow this if optimizing for size.  */
-	      else if (GET_CODE (scanbody) == RETURN
+	      else if (GET_CODE (scanbody) == return_code
 		       && !use_return_insn (TRUE, NULL)
 		       && !optimize_size)
 		fail = TRUE;
-	      else if (GET_CODE (scanbody) == RETURN
-		       && seeking_return)
+	      else if (GET_CODE (scanbody) == return_code)
 	        {
 		  arm_ccfsm_state = 2;
 		  succeed = TRUE;
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 176879)
+++ gcc/config/mips/mips.md	(working copy)
@@ -5724,6 +5724,18 @@ (define_insn "*return"
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")])
 
+(define_expand "simple_return"
+  [(simple_return)]
+  "!mips_can_use_return_insn ()"
+  { mips_expand_before_return (); })
+
+(define_insn "*simple_return"
+  [(simple_return)]
+  "!mips_can_use_return_insn ()"
+  "%*j\t$31%/"
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")])
+
 ;; Normal return.
 
 (define_insn "return_internal"
@@ -5731,6 +5743,14 @@ (define_insn "return_internal"
    (use (match_operand 0 "pmode_register_operand" ""))]
   ""
   "%*j\t%0%/"
+  [(set_attr "type"	"jump")
+   (set_attr "mode"	"none")])
+
+(define_insn "simple_return_internal"
+  [(simple_return)
+   (use (match_operand 0 "pmode_register_operand" ""))]
+  ""
+  "%*j\t%0%/"
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")])
 
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 176879)
+++ gcc/config/mips/mips.c	(working copy)
@@ -10452,7 +10452,8 @@ mips_expand_epilogue (bool sibcall_p)
 	    regno = GP_REG_FIRST + 7;
 	  else
 	    regno = RETURN_ADDR_REGNUM;
-	  emit_jump_insn (gen_return_internal (gen_rtx_REG (Pmode, regno)));
+	  emit_jump_insn (gen_simple_return_internal (gen_rtx_REG (Pmode,
+								   regno)));
 	}
     }
 
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	(revision 176905)
+++ gcc/cfgrtl.c	(working copy)
@@ -1117,10 +1117,13 @@ rtl_redirect_edge_and_branch (edge e, ba
 }
 
 /* Like force_nonfallthru below, but additionally performs redirection
-   Used by redirect_edge_and_branch_force.  */
+   Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
+   when redirecting to the EXIT_BLOCK, it is either ret_rtx or
+   simple_return_rtx, indicating which kind of returnjump to create.
+   It should be NULL otherwise.  */
 
 basic_block
-force_nonfallthru_and_redirect (edge e, basic_block target)
+force_nonfallthru_and_redirect (edge e, basic_block target, rtx jump_label)
 {
   basic_block jump_block, new_bb = NULL, src = e->src;
   rtx note;
@@ -1252,12 +1255,25 @@ force_nonfallthru_and_redirect (edge e,
   e->flags &= ~EDGE_FALLTHRU;
   if (target == EXIT_BLOCK_PTR)
     {
+      if (jump_label == ret_rtx)
+	{
 #ifdef HAVE_return
-	emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc);
-	JUMP_LABEL (BB_END (jump_block)) = ret_rtx;
+	  emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc);
 #else
-	gcc_unreachable ();
+	  gcc_unreachable ();
+#endif
+	}
+      else
+	{
+	  gcc_assert (jump_label == simple_return_rtx);
+#ifdef HAVE_simple_return
+	  emit_jump_insn_after_setloc (gen_simple_return (),
+				       BB_END (jump_block), loc);
+#else
+	  gcc_unreachable ();
 #endif
+	}
+      JUMP_LABEL (BB_END (jump_block)) = jump_label;
     }
   else
     {
@@ -1284,7 +1300,7 @@ force_nonfallthru_and_redirect (edge e,
 static basic_block
 rtl_force_nonfallthru (edge e)
 {
-  return force_nonfallthru_and_redirect (e, e->dest);
+  return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX);
 }
 
 /* Redirect edge even at the expense of creating new jump insn or
@@ -1301,7 +1317,7 @@ rtl_redirect_edge_and_branch_force (edge
   /* In case the edge redirection failed, try to force it to be non-fallthru
      and redirect newly created simplejump.  */
   df_set_bb_dirty (e->src);
-  return force_nonfallthru_and_redirect (e, target);
+  return force_nonfallthru_and_redirect (e, target, NULL_RTX);
 }
 
 /* The given edge should potentially be a fallthru edge.  If that is in

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-07-28 11:48         ` Bernd Schmidt
  2011-07-28 12:45           ` Richard Sandiford
  2011-07-28 23:30           ` Richard Earnshaw
@ 2011-08-03 10:42           ` Alan Modra
  2011-08-03 11:19             ` Bernd Schmidt
  2 siblings, 1 reply; 73+ messages in thread
From: Alan Modra @ 2011-08-03 10:42 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Thu, Jul 28, 2011 at 12:35:46PM +0200, Bernd Schmidt wrote:
[snip]
> 	* rtl.h (ANY_RETURN_P): New macro.
[snip]

This patch makes rebuild_jump_labels set JUMP_LABEL appropriately
for return jumps, and fixes sharing for RETURN.  Since ANY_RETURN_P(X)
is defined as ((X) == ret_rtx), RETURNs need to stay shared.
Bootstrapped and regression tested powerpc-linux and powerpc64-linux.
OK to apply?
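
For readers outside GCC, the sharing requirement follows directly from the
pointer-equality definition: a predicate that compares against one shared
object cannot recognize a copy. A minimal C sketch of that invariant (a toy
model with made-up names, not GCC code):

```c
#include <assert.h>
#include <string.h>

/* Toy model of an rtx node; "kind" stands in for GET_CODE.  */
struct node { int kind; };

/* The shared singleton, analogous to ret_rtx.  */
static struct node ret_singleton = { 1 };
static struct node *const ret_node = &ret_singleton;

/* Pointer-equality predicate, analogous to defining
   ANY_RETURN_P(X) as ((X) == ret_rtx).  */
int any_return_p (const struct node *x)
{
  return x == ret_node;
}

/* Show that copying defeats the predicate: returns 1 when the shared
   node is recognized but a bitwise copy of it is not.  */
int copy_defeats_predicate (void)
{
  struct node copy;
  memcpy (&copy, ret_node, sizeof copy);  /* same contents, new address */
  return any_return_p (ret_node) && !any_return_p (&copy);
}
```

This is why copy_rtx_if_shared_1 and copy_insn_1 must leave RETURN alone:
any pass that duplicated the rtx would silently stop matching it.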

	PR rtl-optimization/49941
	* jump.c (mark_jump_label): Comment.
	(mark_jump_label_1): Set JUMP_LABEL for return jumps.
	* emit-rtl.c (copy_rtx_if_shared_1, copy_insn_1): Leave RETURN shared.
	(mark_used_flags): Don't mark RETURN.

Index: gcc/jump.c
===================================================================
--- gcc/jump.c	(revision 177084)
+++ gcc/jump.c	(working copy)
@@ -1039,6 +1039,7 @@ sets_cc0_p (const_rtx x)
    notes.  If INSN is an INSN or a CALL_INSN or non-target operands of
    a JUMP_INSN, and there is at least one CODE_LABEL referenced in
    INSN, add a REG_LABEL_OPERAND note containing that label to INSN.
+   For returnjumps, the JUMP_LABEL will also be set as appropriate.
 
    Note that two labels separated by a loop-beginning note
    must be kept distinct if we have not yet done loop-optimization,
@@ -1081,6 +1082,14 @@ mark_jump_label_1 (rtx x, rtx insn, bool
     case CALL:
       return;
 
+    case RETURN:
+      if (is_target)
+	{
+	  gcc_assert (JUMP_LABEL (insn) == NULL || JUMP_LABEL (insn) == x);
+	  JUMP_LABEL (insn) = x;
+	}
+      return;
+
     case MEM:
       in_mem = true;
       break;
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 177084)
+++ gcc/emit-rtl.c	(working copy)
@@ -2724,6 +2724,7 @@ repeat:
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return;
@@ -2843,6 +2844,7 @@ repeat:
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
       return;
 
     case DEBUG_INSN:
@@ -5257,6 +5259,7 @@ copy_insn_1 (rtx orig)
     case CODE_LABEL:
     case PC:
     case CC0:
+    case RETURN:
       return orig;
     case CLOBBER:
       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER)

-- 
Alan Modra
Australia Development Lab, IBM


* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-03 10:42           ` Alan Modra
@ 2011-08-03 11:19             ` Bernd Schmidt
  0 siblings, 0 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-08-03 11:19 UTC (permalink / raw)
  To: GCC Patches, amodra

On 08/03/11 12:41, Alan Modra wrote:
> This patch makes rebuild_jump_labels set JUMP_LABEL appropriately
> for return jumps, and fixes sharing for RETURN.  Since ANY_RETURN_P(X)
> is defined as ((X) == ret_rtx), RETURNs need to stay shared.
> Bootstrapped and regression tested powerpc-linux and powerpc64-linux.
> OK to apply?
> 
> 	PR rtl-optimization/49941
> 	* jump.c (mark_jump_label): Comment.
> 	(mark_jump_label_1): Set JUMP_LABEL for return jumps.
> 	* emit-rtl.c (copy_rtx_if_shared_1, copy_insn_1): Leave RETURN shared.
> 	(mark_used_flags): Don't mark RETURN.

Ok, thanks.


Bernd


* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-02  8:40     ` Bernd Schmidt
@ 2011-08-03 15:39       ` Richard Sandiford
  2011-08-24 19:23         ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Richard Sandiford @ 2011-08-03 15:39 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

Bernd Schmidt <bernds@codesourcery.com> writes:
> +@findex simple_return
> +@item (simple_return)
> +Like @code{(return)}, but truly represents only a function return, while
> +@code{(return)} may represent an insn that also performs other functions
> +of the function epilogue.  Like @code{(return)}, this may also occur in
> +conditional jumps.

Sorry, I've forgotten the outcome of the discussion about what happens
on targets whose return expands to the same code as their simple_return.
Do the targets still need both "return" and "simple_return" rtxes?
Do they need both md patterns (but potentially using the same rtx
underneath)?

I ask because the rtl.def comment implies that those targets still
need both expanders and both rtxes.  If that's so, I think it needs
to be mentioned here too.  E.g. something like:

  Like @code{(return)}, but truly represents only a function return, while
  @code{(return)} may represent an insn that also performs other functions
  of the function epilogue.  @code{(return)} only occurs on paths that
  pass through the function prologue, while @code{(simple_return)}
  only occurs on paths that do not pass through the prologue.

  Like @code{(return)}, @code{(simple_return)} may also occur in
  conditional jumps.

You need to document the simple_return pattern in md.texi too.

> @@ -231,6 +234,15 @@ first_active_target_insn (rtx insn)
>    return next_active_insn (insn);
>  }
>  \f
> +/* Return true iff INSN is a simplejump, or any kind of return insn.  */
> +
> +static bool
> +simplejump_or_return_p (rtx insn)
> +{
> +  return (JUMP_P (insn)
> +	  && (simplejump_p (insn) || ANY_RETURN_P (PATTERN (insn))));
> +}

Maybe better in jump.c?  I'll leave it up to you though.

> @@ -346,23 +358,29 @@ insn_sets_resource_p (rtx insn, struct r
>  
>     ??? There may be a problem with the current implementation.  Suppose
>     we start with a bare RETURN insn and call find_end_label.  It may set
> -   end_of_function_label just before the RETURN.  Suppose the machinery
> +   function_return_label just before the RETURN.  Suppose the machinery
>     is able to fill the delay slot of the RETURN insn afterwards.  Then
> -   end_of_function_label is no longer valid according to the property
> +   function_return_label is no longer valid according to the property
>     described above and find_end_label will still return it unmodified.
>     Note that this is probably mitigated by the following observation:
> -   once end_of_function_label is made, it is very likely the target of
> +   once function_return_label is made, it is very likely the target of
>     a jump, so filling the delay slot of the RETURN will be much more
>     difficult.  */
>  
>  static rtx
> -find_end_label (void)
> +find_end_label (rtx kind)

Need to document the new parameter.

>  {
>    rtx insn;
> +  rtx *plabel;
> +
> +  if (kind == ret_rtx)
> +    plabel = &function_return_label;
> +  else
> +    plabel = &function_simple_return_label;

I think it'd be worth a gcc_checking_assert that kind == simple_return_rtx
in the other case.

> -	  /* Put the label before an USE insns that may proceed the
> +	  /* Put the label before an USE insns that may precede the
>  	     RETURN insn.  */

Might as well fix s/an USE/any USE/ too while you're there.

> @@ -3498,6 +3506,8 @@ relax_delay_slots (rtx first)
>  	continue;
>  
>        target_label = JUMP_LABEL (delay_insn);
> +      if (target_label && ANY_RETURN_P (target_label))
> +	continue;
>  
>        if (!ANY_RETURN_P (target_label))
>  	{

This doesn't look like a pure "handle return as well as simple return"
change.  Is the idea that every following test only makes sense for
labels, and that things like:

	  && prev_active_insn (target_label) == insn

(to pick just one example) are actively dangerous for returns?
If so, I think you should remove the immediately-following
"if (!ANY_RETURN_P (target_label))" condition and reindent the body.

> @@ -3737,13 +3753,27 @@ make_return_insns (rtx first)
>    for (insn = first; insn; insn = NEXT_INSN (insn))
>      {
>        int flags;
> +      rtx kind, real_label;
>  
>        /* Only look at filled JUMP_INSNs that go to the end of function
>  	 label.  */
>        if (!NONJUMP_INSN_P (insn)
>  	  || GET_CODE (PATTERN (insn)) != SEQUENCE
> -	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0))
> -	  || JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) != end_of_function_label)
> +	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0)))
> +	continue;
> +
> +      if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) == function_return_label)
> +	{
> +	  kind = ret_rtx;
> +	  real_label = real_return_label;
> +	}
> +      else if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0))
> +	       == function_simple_return_label)
> +	{
> +	  kind = simple_return_rtx;
> +	  real_label = real_simple_return_label;
> +	}
> +      else
>  	continue;
>  
>        pat = PATTERN (insn);

Given what you said about JUMP_LABEL sometimes being null,
I think we need either (a) to check whether each *_return_label
is null before comparing it with JUMP_LABEL, or (b) to ensure that
we're dealing with a jump to a label.  (b) seems neater IMO
(as a call to jump_to_label_p).

> +#if defined HAVE_return || defined HAVE_simple_return
> +  if (
>  #ifdef HAVE_return
> -  if (HAVE_return && end_of_function_label != 0)
> +      (HAVE_return && function_return_label != 0)
> +#else
> +      0
> +#endif
> +#ifdef HAVE_simple_return
> +      || (HAVE_simple_return && function_simple_return_label != 0)
> +#endif
> +      )
>      make_return_insns (first);
>  #endif

Eww.  Given that make_return_insns clears the *return_labels,
it's probably more readable just to have two conditional calls:

#ifdef HAVE_return
  if (HAVE_return && function_return_label != 0)
    make_return_insns (first);
#endif
#ifdef HAVE_simple_return
  if (HAVE_simple_return && function_simple_return_label != 0)
    make_return_insns (first);
#endif

I'll leave it up to you though.

> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c	(revision 176881)
> +++ gcc/emit-rtl.c	(working copy)
> @@ -2518,6 +2518,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
>      case PC:
>      case CC0:
>      case RETURN:
> +    case SIMPLE_RETURN:
>      case SCRATCH:
>        return;
>        /* SCRATCH must be shared because they represent distinct values.  */

Given Alan's patch, I suppose you also need cases for copy_rtx_if_shared_1,
copy_insn_1 and mark_used_flags.  (Sorry about being wise after the fact here.)

> Index: gcc/config/mips/mips.md
> ===================================================================
> --- gcc/config/mips/mips.md	(revision 176879)
> +++ gcc/config/mips/mips.md	(working copy)
> @@ -5724,6 +5724,18 @@ (define_insn "*return"
>    [(set_attr "type"	"jump")
>     (set_attr "mode"	"none")])
>  
> +(define_expand "simple_return"
> +  [(simple_return)]
> +  "!mips_can_use_return_insn ()"
> +  { mips_expand_before_return (); })
> +
> +(define_insn "*simple_return"
> +  [(simple_return)]
> +  "!mips_can_use_return_insn ()"
> +  "%*j\t$31%/"
> +  [(set_attr "type"	"jump")
> +   (set_attr "mode"	"none")])
> +
>  ;; Normal return.
>  
>  (define_insn "return_internal"
> @@ -5731,6 +5743,14 @@ (define_insn "return_internal"
>     (use (match_operand 0 "pmode_register_operand" ""))]
>    ""
>    "%*j\t%0%/"
> +  [(set_attr "type"	"jump")
> +   (set_attr "mode"	"none")])
> +
> +(define_insn "simple_return_internal"
> +  [(simple_return)
> +   (use (match_operand 0 "pmode_register_operand" ""))]
> +  ""
> +  "%*j\t%0%/"
>    [(set_attr "type"	"jump")
>     (set_attr "mode"	"none")])

Please add:

(define_code_iterator any_return [return simple_return])

and just change the appropriate returns to any_returns.

The rtl and MIPS bits look good to me otherwise.

Richard


* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-03 15:39       ` Richard Sandiford
@ 2011-08-24 19:23         ` Bernd Schmidt
  2011-08-24 20:48           ` Richard Sandiford
  2011-08-28 10:58           ` H.J. Lu
  0 siblings, 2 replies; 73+ messages in thread
From: Bernd Schmidt @ 2011-08-24 19:23 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 5261 bytes --]

On 08/03/11 17:38, Richard Sandiford wrote:
> Bernd Schmidt <bernds@codesourcery.com> writes:
>> +@findex simple_return
>> +@item (simple_return)
>> +Like @code{(return)}, but truly represents only a function return, while
>> +@code{(return)} may represent an insn that also performs other functions
>> +of the function epilogue.  Like @code{(return)}, this may also occur in
>> +conditional jumps.
> 
> Sorry, I've forgotten the outcome of the discussion about what happens
> on targets whose return expands to the same code as their simple_return.
> Do the targets still need both "return" and "simple_return" rtxes?

It's important to distinguish between these names as rtxes that can
occur in instruction patterns, and their use as standard pattern names.
When a "return" pattern is generated, it should either fail or expand to
something that performs both the epilogue and the return. A
"simple_return" expands to something that performs only the return.

Most targets allow "return" patterns only if the epilogue is empty. In
that case, "return" and "simple_return" can expand to the same insn; it
does not matter if that insn uses "simple_return" or "return", as they
are equivalent in the absence of an epilogue. It would be slightly nicer
to use "simple_return" in the patterns everywhere except ARM, but ports
don't need to be changed.

> Do they need both md patterns (but potentially using the same rtx
> underneath)?

The "return" standard pattern is needed for the existing optimizations
(inserting returns in-line rather than jumping to the end of the
function). Typically, it always fails if the function needs an epilogue,
except in the ARM case.
For shrink-wrapping to work, a port needs a "simple_return" pattern,
which the compiler can use even if parts of the function need an
epilogue. So yes, they have different purposes.
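
To make that division of labor concrete, here is a hedged sketch of what a
port might provide (illustrative only; the port name, condition function and
bodies are placeholders, not taken from any actual port):

```
;; "return": usable only on paths where no epilogue work remains,
;; so the expander is guarded by a port-specific predicate.
(define_expand "return"
  [(return)]
  "<port>_can_use_return_insn ()"
  "")

;; "simple_return": performs only the branch back to the caller, so it
;; is usable on any path, even when other paths need a full epilogue.
(define_expand "simple_return"
  [(simple_return)]
  ""
  "")
```

On a target whose epilogue is always empty, both expanders may legitimately
emit the same instruction, which is the equivalence described above.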

> I ask because the rtl.def comment implies that those targets still
> need both expanders and both rtxes.  If that's so, I think it needs
> to be mentioned here too.  E.g. something like:
> 
>   Like @code{(return)}, but truly represents only a function return, while
>   @code{(return)} may represent an insn that also performs other functions
>   of the function epilogue.  @code{(return)} only occurs on paths that
>   pass through the function prologue, while @code{(simple_return)}
>   only occurs on paths that do not pass through the prologue.

This is not accurate for the rtx code. It is mostly accurate for the
standard pattern name. A simple_return rtx may occur just after an
epilogue, i.e. on a path that has passed through the prologue.

Even for the simple_return pattern, I'm not sure reorg.c couldn't
introduce new expansions in a location after both prologue and epilogue.

>   Like @code{(return)}, @code{(simple_return)} may also occur in
>   conditional jumps.
> 
> You need to document the simple_return pattern in md.texi too.

I was trying to keep the documentation limited to the current state
after the patch. The thinking was that without shrink-wrapping, nothing
generates this pattern, so documenting it would be misleading.
However, with the mips changes in this version of the patch, reorg.c
does make use of this pattern, so I've added documentation.

>> @@ -3498,6 +3506,8 @@ relax_delay_slots (rtx first)
>>  	continue;
>>  
>>        target_label = JUMP_LABEL (delay_insn);
>> +      if (target_label && ANY_RETURN_P (target_label))
>> +	continue;
>>  
>>        if (!ANY_RETURN_P (target_label))
>>  	{
> 
> This doesn't look like a pure "handle return as well as simple return"
> change.  Is the idea that every following test only makes sense for
> labels, and that things like:
> 
> 	  && prev_active_insn (target_label) == insn
> 
> (to pick just one example) are actively dangerous for returns?

That probably was the idea. Looking at it again, there's one case at the
bottom of the loop which may be safe, but given that there were no code
generation differences with the patch on three targets with
define_delay, I've done:

> If so, I think you should remove the immediately-following
> "if (!ANY_RETURN_P (target_label))" condition and reindent the body.

this.

> Given what you said about JUMP_LABEL sometimes being null,
> I think we need either (a) to check whether each *_return_label
> is null before comparing it with JUMP_LABEL, or (b) to ensure that
> we're dealing with a jump to a label.  (b) seems neater IMO
> (as a call to jump_to_label_p).

Done.

> 
>> +#if defined HAVE_return || defined HAVE_simple_return
>> +  if (
>>  #ifdef HAVE_return
>> -  if (HAVE_return && end_of_function_label != 0)
>> +      (HAVE_return && function_return_label != 0)
>> +#else
>> +      0
>> +#endif
>> +#ifdef HAVE_simple_return
>> +      || (HAVE_simple_return && function_simple_return_label != 0)
>> +#endif
>> +      )
>>      make_return_insns (first);
>>  #endif
> 
> Eww.

Restructured.

> (define_code_iterator any_return [return simple_return])
> 
> and just change the appropriate returns to any_returns.

I've done this a bit differently - to show that it can be done, I've
changed mips to always emit simple_return rtxs, even for "return"
patterns (no code generation changes observed again).
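
The iterator approach can be sketched roughly as follows (paraphrasing the
idea from the ChangeLog below, not quoting the committed MIPS patterns
verbatim):

```
(define_code_iterator any_return [return simple_return])

;; One insn pattern now matches both kinds of return rtx; the
;; <optab> attribute expands to "return" or "simple_return".
(define_insn "*<optab>"
  [(any_return)]
  ""
  "%*j\t$31%/"
  [(set_attr "type" "jump")
   (set_attr "mode" "none")])
```

This lets a single template serve both rtx codes, which is what allows the
port to emit simple_return rtxs even from the "return" standard pattern.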

This version regression tested on mips64-elf, c/c++/objc.


Bernd

[-- Attachment #2: sw-part2.diff --]
[-- Type: text/plain, Size: 53015 bytes --]

	* doc/rtl.texi (simple_return): Document.
	(parallel, PATTERN): Here too.
	* doc/md.texi (return): Mention it's allowed to expand to simple_return
	in some cases.
	(simple_return): Document standard pattern.
	* gengenrtl.c (special_rtx): SIMPLE_RETURN is special.
	* final.c (final_scan_insn): Use ANY_RETURN_P on body.
	* reorg.c (function_return_label, function_simple_return_label):
	New static variables, replacing...
	(end_of_function_label): ... this.
	(simplejump_or_return_p): New static function.
	(optimize_skip, steal_delay_list_from_fallthrough,
	fill_slots_from_thread): Use it.
	(relax_delay_slots): Likewise.  Use ANY_RETURN_P on body.
	(rare_destination, follow_jumps): Use ANY_RETURN_P on body.
	(find_end_label): Take a new arg which is one of the two return
	rtxs.  Depending on which, set either function_return_label or
	function_simple_return_label.  All callers changed.
	(make_return_insns): Make both kinds.
	(dbr_schedule): Adjust for two kinds of end labels.
	* genemit.c (gen_exp): Handle SIMPLE_RETURN.
	(gen_expand, gen_split): Use ANY_RETURN_P.
	* df-scan.c (df_uses_record): Handle SIMPLE_RETURN.
	* rtl.def (SIMPLE_RETURN): New code.
	* ifcvt.c (find_if_case_1): Be more careful about
	redirecting jumps to the EXIT_BLOCK.
	* jump.c (condjump_p, condjump_in_parallel_p, any_condjump_p,
	returnjump_p_1): Handle SIMPLE_RETURNs.
	* print-rtl.c (print_rtx): Likewise.
	* rtl.c (copy_rtx): Likewise.
	* bt-load.c (compute_defs_uses_and_gen): Use ANY_RETURN_P.
	* combine.c (simplify_set): Likewise.
	* resource.c (find_dead_or_set_registers, mark_set_resources):
	Likewise.
	* emit-rtl.c (verify_rtx_sharing, classify_insn, copy_insn_1,
	copy_rtx_if_shared_1, mark_used_flags): Handle SIMPLE_RETURNs.
	(init_emit_regs): Initialize simple_return_rtx.
	* cfglayout.c (fixup_reorder_chain): Pass a JUMP_LABEL to
	force_nonfallthru_and_redirect.
	* rtl.h (ANY_RETURN_P): Allow SIMPLE_RETURN.
	(GR_SIMPLE_RETURN): New enum value.
	(simple_return_rtx): New macro.
	* basic-block.h (force_nonfallthru_and_redirect): Adjust
	declaration.
	* cfgrtl.c (force_nonfallthru_and_redirect): Take a new jump_label
	argument.  All callers changed.  Be careful about what kinds of
	returnjumps to generate.
	* config/i386/i386.c (ix86_pad_returns, ix86_count_insn_bb,
	ix86_pad_short_function): Likewise.
	* config/arm/arm.c (arm_final_prescan_insn): Handle both kinds
	of return.
	* config/mips/mips.md (any_return): New code_iterator.
	(optab): Add cases for return and simple_return.
	(return): Expand to a simple_return.
	(simple_return): New pattern.
	(*<optab>, *<optab>_internal for any_return): New patterns.
	(return_internal): Remove.
	* config/mips/mips.c (mips_expand_epilogue): Make the last insn
	a simple_return_internal.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	(revision 177999)
+++ gcc/doc/rtl.texi	(working copy)
@@ -2915,6 +2915,13 @@ placed in @code{pc} to return to the cal
 Note that an insn pattern of @code{(return)} is logically equivalent to
 @code{(set (pc) (return))}, but the latter form is never used.
 
+@findex simple_return
+@item (simple_return)
+Like @code{(return)}, but truly represents only a function return, while
+@code{(return)} may represent an insn that also performs other functions
+of the function epilogue.  Like @code{(return)}, this may also occur in
+conditional jumps.
+
 @findex call
 @item (call @var{function} @var{nargs})
 Represents a function call.  @var{function} is a @code{mem} expression
@@ -3044,7 +3051,7 @@ Represents several side effects performe
 brackets stand for a vector; the operand of @code{parallel} is a
 vector of expressions.  @var{x0}, @var{x1} and so on are individual
 side effect expressions---expressions of code @code{set}, @code{call},
-@code{return}, @code{clobber} or @code{use}.
+@code{return}, @code{simple_return}, @code{clobber} or @code{use}.
 
 ``In parallel'' means that first all the values used in the individual
 side-effects are computed, and second all the actual side-effects are
@@ -3683,14 +3690,16 @@ and @code{call_insn} insns:
 @table @code
 @findex PATTERN
 @item PATTERN (@var{i})
-An expression for the side effect performed by this insn.  This must be
-one of the following codes: @code{set}, @code{call}, @code{use},
-@code{clobber}, @code{return}, @code{asm_input}, @code{asm_output},
-@code{addr_vec}, @code{addr_diff_vec}, @code{trap_if}, @code{unspec},
-@code{unspec_volatile}, @code{parallel}, @code{cond_exec}, or @code{sequence}.  If it is a @code{parallel},
-each element of the @code{parallel} must be one these codes, except that
-@code{parallel} expressions cannot be nested and @code{addr_vec} and
-@code{addr_diff_vec} are not permitted inside a @code{parallel} expression.
+An expression for the side effect performed by this insn.  This must
+be one of the following codes: @code{set}, @code{call}, @code{use},
+@code{clobber}, @code{return}, @code{simple_return}, @code{asm_input},
+@code{asm_output}, @code{addr_vec}, @code{addr_diff_vec},
+@code{trap_if}, @code{unspec}, @code{unspec_volatile},
+@code{parallel}, @code{cond_exec}, or @code{sequence}.  If it is a
+@code{parallel}, each element of the @code{parallel} must be one these
+codes, except that @code{parallel} expressions cannot be nested and
+@code{addr_vec} and @code{addr_diff_vec} are not permitted inside a
+@code{parallel} expression.
 
 @findex INSN_CODE
 @item INSN_CODE (@var{i})
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	(revision 177999)
+++ gcc/gengenrtl.c	(working copy)
@@ -131,6 +131,7 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "PC") == 0
 	  || strcmp (defs[idx].enumname, "CC0") == 0
 	  || strcmp (defs[idx].enumname, "RETURN") == 0
+	  || strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
 	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
 }
 
Index: gcc/final.c
===================================================================
--- gcc/final.c	(revision 177999)
+++ gcc/final.c	(working copy)
@@ -2492,7 +2492,7 @@ final_scan_insn (rtx insn, FILE *file, i
 	        delete_insn (insn);
 		break;
 	      }
-	    else if (GET_CODE (SET_SRC (body)) == RETURN)
+	    else if (ANY_RETURN_P (SET_SRC (body)))
 	      /* Replace (set (pc) (return)) with (return).  */
 	      PATTERN (insn) = body = SET_SRC (body);
 
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c	(revision 177999)
+++ gcc/reorg.c	(working copy)
@@ -161,8 +161,11 @@ static rtx *unfilled_firstobj;
 #define unfilled_slots_next	\
   ((rtx *) obstack_next_free (&unfilled_slots_obstack))
 
-/* Points to the label before the end of the function.  */
-static rtx end_of_function_label;
+/* Points to the label before the end of the function, or before a
+   return insn.  */
+static rtx function_return_label;
+/* Likewise for a simple_return.  */
+static rtx function_simple_return_label;
 
 /* Mapping between INSN_UID's and position in the code since INSN_UID's do
    not always monotonically increase.  */
@@ -175,7 +178,7 @@ static int stop_search_p (rtx, int);
 static int resource_conflicts_p (struct resources *, struct resources *);
 static int insn_references_resource_p (rtx, struct resources *, bool);
 static int insn_sets_resource_p (rtx, struct resources *, bool);
-static rtx find_end_label (void);
+static rtx find_end_label (rtx);
 static rtx emit_delay_sequence (rtx, rtx, int);
 static rtx add_to_delay_list (rtx, rtx);
 static rtx delete_from_delay_slot (rtx);
@@ -231,6 +234,15 @@ first_active_target_insn (rtx insn)
   return next_active_insn (insn);
 }
 \f
+/* Return true iff INSN is a simplejump, or any kind of return insn.  */
+
+static bool
+simplejump_or_return_p (rtx insn)
+{
+  return (JUMP_P (insn)
+	  && (simplejump_p (insn) || ANY_RETURN_P (PATTERN (insn))));
+}
+\f
 /* Return TRUE if this insn should stop the search for insn to fill delay
    slots.  LABELS_P indicates that labels should terminate the search.
    In all cases, jumps terminate the search.  */
@@ -346,23 +358,34 @@ insn_sets_resource_p (rtx insn, struct r
 
    ??? There may be a problem with the current implementation.  Suppose
    we start with a bare RETURN insn and call find_end_label.  It may set
-   end_of_function_label just before the RETURN.  Suppose the machinery
+   function_return_label just before the RETURN.  Suppose the machinery
    is able to fill the delay slot of the RETURN insn afterwards.  Then
-   end_of_function_label is no longer valid according to the property
+   function_return_label is no longer valid according to the property
    described above and find_end_label will still return it unmodified.
    Note that this is probably mitigated by the following observation:
-   once end_of_function_label is made, it is very likely the target of
+   once function_return_label is made, it is very likely the target of
    a jump, so filling the delay slot of the RETURN will be much more
-   difficult.  */
+   difficult.
+   KIND is either simple_return_rtx or ret_rtx, indicating which type of
+   return we're looking for.  */
 
 static rtx
-find_end_label (void)
+find_end_label (rtx kind)
 {
   rtx insn;
+  rtx *plabel;
+
+  if (kind == ret_rtx)
+    plabel = &function_return_label;
+  else
+    {
+      gcc_assert (kind == simple_return_rtx);
+      plabel = &function_simple_return_label;
+    }
 
   /* If we found one previously, return it.  */
-  if (end_of_function_label)
-    return end_of_function_label;
+  if (*plabel)
+    return *plabel;
 
   /* Otherwise, see if there is a label at the end of the function.  If there
      is, it must be that RETURN insns aren't needed, so that is our return
@@ -377,44 +400,45 @@ find_end_label (void)
 
   /* When a target threads its epilogue we might already have a
      suitable return insn.  If so put a label before it for the
-     end_of_function_label.  */
+     function_return_label.  */
   if (BARRIER_P (insn)
       && JUMP_P (PREV_INSN (insn))
-      && GET_CODE (PATTERN (PREV_INSN (insn))) == RETURN)
+      && PATTERN (PREV_INSN (insn)) == kind)
     {
       rtx temp = PREV_INSN (PREV_INSN (insn));
-      end_of_function_label = gen_label_rtx ();
-      LABEL_NUSES (end_of_function_label) = 0;
+      rtx label = gen_label_rtx ();
+      LABEL_NUSES (label) = 0;
 
-      /* Put the label before an USE insns that may precede the RETURN insn.  */
+      /* Put the label before any USE insns that may precede the RETURN
+	 insn.  */
       while (GET_CODE (temp) == USE)
 	temp = PREV_INSN (temp);
 
-      emit_label_after (end_of_function_label, temp);
+      emit_label_after (label, temp);
+      *plabel = label;
     }
 
   else if (LABEL_P (insn))
-    end_of_function_label = insn;
+    *plabel = insn;
   else
     {
-      end_of_function_label = gen_label_rtx ();
-      LABEL_NUSES (end_of_function_label) = 0;
+      rtx label = gen_label_rtx ();
+      LABEL_NUSES (label) = 0;
       /* If the basic block reorder pass moves the return insn to
 	 some other place try to locate it again and put our
-	 end_of_function_label there.  */
-      while (insn && ! (JUMP_P (insn)
-		        && (GET_CODE (PATTERN (insn)) == RETURN)))
+	 function_return_label there.  */
+      while (insn && ! (JUMP_P (insn) && (PATTERN (insn) == kind)))
 	insn = PREV_INSN (insn);
       if (insn)
 	{
 	  insn = PREV_INSN (insn);
 
-	  /* Put the label before an USE insns that may proceed the
+	  /* Put the label before any USE insns that may precede the
 	     RETURN insn.  */
 	  while (GET_CODE (insn) == USE)
 	    insn = PREV_INSN (insn);
 
-	  emit_label_after (end_of_function_label, insn);
+	  emit_label_after (label, insn);
 	}
       else
 	{
@@ -424,19 +448,16 @@ find_end_label (void)
 	      && ! HAVE_return
 #endif
 	      )
-	    {
-	      /* The RETURN insn has its delay slot filled so we cannot
-		 emit the label just before it.  Since we already have
-		 an epilogue and cannot emit a new RETURN, we cannot
-		 emit the label at all.  */
-	      end_of_function_label = NULL_RTX;
-	      return end_of_function_label;
-	    }
+	    /* The RETURN insn has its delay slot filled so we cannot
+	       emit the label just before it.  Since we already have
+	       an epilogue and cannot emit a new RETURN, we cannot
+	       emit the label at all.  */
+	    return NULL_RTX;
 #endif /* HAVE_epilogue */
 
 	  /* Otherwise, make a new label and emit a RETURN and BARRIER,
 	     if needed.  */
-	  emit_label (end_of_function_label);
+	  emit_label (label);
 #ifdef HAVE_return
 	  /* We don't bother trying to create a return insn if the
 	     epilogue has filled delay-slots; we would have to try and
@@ -455,13 +476,14 @@ find_end_label (void)
 	    }
 #endif
 	}
+      *plabel = label;
     }
 
   /* Show one additional use for this label so it won't go away until
      we are done.  */
-  ++LABEL_NUSES (end_of_function_label);
+  ++LABEL_NUSES (*plabel);
 
-  return end_of_function_label;
+  return *plabel;
 }
 \f
 /* Put INSN and LIST together in a SEQUENCE rtx of LENGTH, and replace
@@ -809,10 +831,8 @@ optimize_skip (rtx insn)
   if ((next_trial == next_active_insn (JUMP_LABEL (insn))
        && ! (next_trial == 0 && crtl->epilogue_delay_list != 0))
       || (next_trial != 0
-	  && JUMP_P (next_trial)
-	  && JUMP_LABEL (insn) == JUMP_LABEL (next_trial)
-	  && (simplejump_p (next_trial)
-	      || GET_CODE (PATTERN (next_trial)) == RETURN)))
+	  && simplejump_or_return_p (next_trial)
+	  && JUMP_LABEL (insn) == JUMP_LABEL (next_trial)))
     {
       if (eligible_for_annul_false (insn, 0, trial, flags))
 	{
@@ -831,13 +851,11 @@ optimize_skip (rtx insn)
 	 branch, thread our jump to the target of that branch.  Don't
 	 change this into a RETURN here, because it may not accept what
 	 we have in the delay slot.  We'll fix this up later.  */
-      if (next_trial && JUMP_P (next_trial)
-	  && (simplejump_p (next_trial)
-	      || GET_CODE (PATTERN (next_trial)) == RETURN))
+      if (next_trial && simplejump_or_return_p (next_trial))
 	{
 	  rtx target_label = JUMP_LABEL (next_trial);
 	  if (ANY_RETURN_P (target_label))
-	    target_label = find_end_label ();
+	    target_label = find_end_label (target_label);
 
 	  if (target_label)
 	    {
@@ -951,7 +969,7 @@ rare_destination (rtx insn)
 	     return.  */
 	  return 2;
 	case JUMP_INSN:
-	  if (GET_CODE (PATTERN (insn)) == RETURN)
+	  if (ANY_RETURN_P (PATTERN (insn)))
 	    return 1;
 	  else if (simplejump_p (insn)
 		   && jump_count++ < 10)
@@ -1368,8 +1386,7 @@ steal_delay_list_from_fallthrough (rtx i
   /* We can't do anything if SEQ's delay insn isn't an
      unconditional branch.  */
 
-  if (! simplejump_p (XVECEXP (seq, 0, 0))
-      && GET_CODE (PATTERN (XVECEXP (seq, 0, 0))) != RETURN)
+  if (! simplejump_or_return_p (XVECEXP (seq, 0, 0)))
     return delay_list;
 
   for (i = 1; i < XVECLEN (seq, 0); i++)
@@ -2383,7 +2400,7 @@ fill_simple_delay_slots (int non_jumps_p
 	      if (new_label != 0)
 		new_label = get_label_before (new_label);
 	      else
-		new_label = find_end_label ();
+		new_label = find_end_label (simple_return_rtx);
 
 	      if (new_label)
 	        {
@@ -2515,7 +2532,8 @@ fill_simple_delay_slots (int non_jumps_p
 \f
 /* Follow any unconditional jump at LABEL;
    return the ultimate label reached by any such chain of jumps.
-   Return ret_rtx if the chain ultimately leads to a return instruction.
+   Return a suitable return rtx if the chain ultimately leads to a
+   return instruction.
    If LABEL is not followed by a jump, return LABEL.
    If the chain loops or we can't find end, return LABEL,
    since that tells caller to avoid changing the insn.  */
@@ -2536,7 +2554,7 @@ follow_jumps (rtx label)
 	&& JUMP_P (insn)
 	&& JUMP_LABEL (insn) != NULL_RTX
 	&& ((any_uncondjump_p (insn) && onlyjump_p (insn))
-	    || GET_CODE (PATTERN (insn)) == RETURN)
+	    || ANY_RETURN_P (PATTERN (insn)))
 	&& (next = NEXT_INSN (insn))
 	&& BARRIER_P (next));
        depth++)
@@ -3003,16 +3021,14 @@ fill_slots_from_thread (rtx insn, rtx co
 
       gcc_assert (thread_if_true);
 
-      if (new_thread && JUMP_P (new_thread)
-	  && (simplejump_p (new_thread)
-	      || GET_CODE (PATTERN (new_thread)) == RETURN)
+      if (new_thread && simplejump_or_return_p (new_thread)
 	  && redirect_with_delay_list_safe_p (insn,
 					      JUMP_LABEL (new_thread),
 					      delay_list))
 	new_thread = follow_jumps (JUMP_LABEL (new_thread));
 
       if (ANY_RETURN_P (new_thread))
-	label = find_end_label ();
+	label = find_end_label (new_thread);
       else if (LABEL_P (new_thread))
 	label = new_thread;
       else
@@ -3362,7 +3378,7 @@ relax_delay_slots (rtx first)
 	{
 	  target_label = skip_consecutive_labels (follow_jumps (target_label));
 	  if (ANY_RETURN_P (target_label))
-	    target_label = find_end_label ();
+	    target_label = find_end_label (target_label);
 
 	  if (target_label && next_active_insn (target_label) == next
 	      && ! condjump_in_parallel_p (insn))
@@ -3377,9 +3393,8 @@ relax_delay_slots (rtx first)
 	  /* See if this jump conditionally branches around an unconditional
 	     jump.  If so, invert this jump and point it to the target of the
 	     second jump.  */
-	  if (next && JUMP_P (next)
+	  if (next && simplejump_or_return_p (next)
 	      && any_condjump_p (insn)
-	      && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN)
 	      && target_label
 	      && next_active_insn (target_label) == next_active_insn (next)
 	      && no_labels_between_p (insn, next))
@@ -3421,8 +3436,7 @@ relax_delay_slots (rtx first)
 	 Don't do this if we expect the conditional branch to be true, because
 	 we would then be making the more common case longer.  */
 
-      if (JUMP_P (insn)
-	  && (simplejump_p (insn) || GET_CODE (PATTERN (insn)) == RETURN)
+      if (simplejump_or_return_p (insn)
 	  && (other = prev_active_insn (insn)) != 0
 	  && any_condjump_p (other)
 	  && no_labels_between_p (other, insn)
@@ -3463,10 +3477,10 @@ relax_delay_slots (rtx first)
 	 Only do so if optimizing for size since this results in slower, but
 	 smaller code.  */
       if (optimize_function_for_size_p (cfun)
-	  && GET_CODE (PATTERN (delay_insn)) == RETURN
+	  && ANY_RETURN_P (PATTERN (delay_insn))
 	  && next
 	  && JUMP_P (next)
-	  && GET_CODE (PATTERN (next)) == RETURN)
+	  && PATTERN (next) == PATTERN (delay_insn))
 	{
 	  rtx after;
 	  int i;
@@ -3505,73 +3519,71 @@ relax_delay_slots (rtx first)
 	continue;
 
       target_label = JUMP_LABEL (delay_insn);
+      if (target_label && ANY_RETURN_P (target_label))
+	continue;
 
-      if (!ANY_RETURN_P (target_label))
+      /* If this jump goes to another unconditional jump, thread it, but
+	 don't convert a jump into a RETURN here.  */
+      trial = skip_consecutive_labels (follow_jumps (target_label));
+      if (ANY_RETURN_P (trial))
+	trial = find_end_label (trial);
+
+      if (trial && trial != target_label
+	  && redirect_with_delay_slots_safe_p (delay_insn, trial, insn))
 	{
-	  /* If this jump goes to another unconditional jump, thread it, but
-	     don't convert a jump into a RETURN here.  */
-	  trial = skip_consecutive_labels (follow_jumps (target_label));
-	  if (ANY_RETURN_P (trial))
-	    trial = find_end_label ();
+	  reorg_redirect_jump (delay_insn, trial);
+	  target_label = trial;
+	}
 
-	  if (trial && trial != target_label
-	      && redirect_with_delay_slots_safe_p (delay_insn, trial, insn))
-	    {
-	      reorg_redirect_jump (delay_insn, trial);
-	      target_label = trial;
-	    }
+      /* If the first insn at TARGET_LABEL is redundant with a previous
+	 insn, redirect the jump to the following insn and process again.
+	 We use next_real_insn instead of next_active_insn so we
+	 don't skip USE-markers, or we'll end up with incorrect
+	 liveness info.  */
+      trial = next_real_insn (target_label);
+      if (trial && GET_CODE (PATTERN (trial)) != SEQUENCE
+	  && redundant_insn (trial, insn, 0)
+	  && ! can_throw_internal (trial))
+	{
+	  /* Figure out where to emit the special USE insn so we don't
+	     later incorrectly compute register live/death info.  */
+	  rtx tmp = next_active_insn (trial);
+	  if (tmp == 0)
+	    tmp = find_end_label (simple_return_rtx);
 
-	  /* If the first insn at TARGET_LABEL is redundant with a previous
-	     insn, redirect the jump to the following insn and process again.
-	     We use next_real_insn instead of next_active_insn so we
-	     don't skip USE-markers, or we'll end up with incorrect
-	     liveness info.  */
-	  trial = next_real_insn (target_label);
-	  if (trial && GET_CODE (PATTERN (trial)) != SEQUENCE
-	      && redundant_insn (trial, insn, 0)
-	      && ! can_throw_internal (trial))
+	  if (tmp)
 	    {
-	      /* Figure out where to emit the special USE insn so we don't
-		 later incorrectly compute register live/death info.  */
-	      rtx tmp = next_active_insn (trial);
-	      if (tmp == 0)
-		tmp = find_end_label ();
-
-	      if (tmp)
-	        {
-		  /* Insert the special USE insn and update dataflow info.  */
-		  update_block (trial, tmp);
-
-		  /* Now emit a label before the special USE insn, and
-		     redirect our jump to the new label.  */
-		  target_label = get_label_before (PREV_INSN (tmp));
-		  reorg_redirect_jump (delay_insn, target_label);
-		  next = insn;
-		  continue;
-		}
+	      /* Insert the special USE insn and update dataflow info.  */
+	      update_block (trial, tmp);
+	      
+	      /* Now emit a label before the special USE insn, and
+		 redirect our jump to the new label.  */
+	      target_label = get_label_before (PREV_INSN (tmp));
+	      reorg_redirect_jump (delay_insn, target_label);
+	      next = insn;
+	      continue;
 	    }
+	}
 
-	  /* Similarly, if it is an unconditional jump with one insn in its
-	     delay list and that insn is redundant, thread the jump.  */
-	  if (trial && GET_CODE (PATTERN (trial)) == SEQUENCE
-	      && XVECLEN (PATTERN (trial), 0) == 2
-	      && JUMP_P (XVECEXP (PATTERN (trial), 0, 0))
-	      && (simplejump_p (XVECEXP (PATTERN (trial), 0, 0))
-		  || GET_CODE (PATTERN (XVECEXP (PATTERN (trial), 0, 0))) == RETURN)
-	      && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0))
+      /* Similarly, if it is an unconditional jump with one insn in its
+	 delay list and that insn is redundant, thread the jump.  */
+      if (trial && GET_CODE (PATTERN (trial)) == SEQUENCE
+	  && XVECLEN (PATTERN (trial), 0) == 2
+	  && JUMP_P (XVECEXP (PATTERN (trial), 0, 0))
+	  && simplejump_or_return_p (XVECEXP (PATTERN (trial), 0, 0))
+	  && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0))
+	{
+	  target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0));
+	  if (ANY_RETURN_P (target_label))
+	    target_label = find_end_label (target_label);
+	  
+	  if (target_label
+	      && redirect_with_delay_slots_safe_p (delay_insn, target_label,
+						   insn))
 	    {
-	      target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0));
-	      if (ANY_RETURN_P (target_label))
-		target_label = find_end_label ();
-
-	      if (target_label
-	          && redirect_with_delay_slots_safe_p (delay_insn, target_label,
-						       insn))
-		{
-		  reorg_redirect_jump (delay_insn, target_label);
-		  next = insn;
-		  continue;
-		}
+	      reorg_redirect_jump (delay_insn, target_label);
+	      next = insn;
+	      continue;
 	    }
 	}
 
@@ -3640,8 +3652,7 @@ relax_delay_slots (rtx first)
 	 a RETURN here.  */
       if (! INSN_ANNULLED_BRANCH_P (delay_insn)
 	  && any_condjump_p (delay_insn)
-	  && next && JUMP_P (next)
-	  && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN)
+	  && next && simplejump_or_return_p (next)
 	  && next_active_insn (target_label) == next_active_insn (next)
 	  && no_labels_between_p (insn, next))
 	{
@@ -3649,7 +3660,7 @@ relax_delay_slots (rtx first)
 	  rtx old_label = JUMP_LABEL (delay_insn);
 
 	  if (ANY_RETURN_P (label))
-	    label = find_end_label ();
+	    label = find_end_label (label);
 
 	  /* find_end_label can generate a new label. Check this first.  */
 	  if (label
@@ -3710,7 +3721,8 @@ static void
 make_return_insns (rtx first)
 {
   rtx insn, jump_insn, pat;
-  rtx real_return_label = end_of_function_label;
+  rtx real_return_label = function_return_label;
+  rtx real_simple_return_label = function_simple_return_label;
   int slots, i;
 
 #ifdef DELAY_SLOTS_FOR_EPILOGUE
@@ -3728,15 +3740,22 @@ make_return_insns (rtx first)
      made for END_OF_FUNCTION_LABEL.  If so, set up anything we can't change
      into a RETURN to jump to it.  */
   for (insn = first; insn; insn = NEXT_INSN (insn))
-    if (JUMP_P (insn) && GET_CODE (PATTERN (insn)) == RETURN)
+    if (JUMP_P (insn) && ANY_RETURN_P (PATTERN (insn)))
       {
-	real_return_label = get_label_before (insn);
+	rtx t = get_label_before (insn);
+	if (PATTERN (insn) == ret_rtx)
+	  real_return_label = t;
+	else
+	  real_simple_return_label = t;
 	break;
       }
 
   /* Show an extra usage of REAL_RETURN_LABEL so it won't go away if it
      was equal to END_OF_FUNCTION_LABEL.  */
-  LABEL_NUSES (real_return_label)++;
+  if (real_return_label)
+    LABEL_NUSES (real_return_label)++;
+  if (real_simple_return_label)
+    LABEL_NUSES (real_simple_return_label)++;
 
   /* Clear the list of insns to fill so we can use it.  */
   obstack_free (&unfilled_slots_obstack, unfilled_firstobj);
@@ -3744,13 +3763,27 @@ make_return_insns (rtx first)
   for (insn = first; insn; insn = NEXT_INSN (insn))
     {
       int flags;
+      rtx kind, real_label;
 
       /* Only look at filled JUMP_INSNs that go to the end of function
 	 label.  */
       if (!NONJUMP_INSN_P (insn)
 	  || GET_CODE (PATTERN (insn)) != SEQUENCE
-	  || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0))
-	  || JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) != end_of_function_label)
+	  || !jump_to_label_p (XVECEXP (PATTERN (insn), 0, 0)))
+	continue;
+
+      if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) == function_return_label)
+	{
+	  kind = ret_rtx;
+	  real_label = real_return_label;
+	}
+      else if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0))
+	       == function_simple_return_label)
+	{
+	  kind = simple_return_rtx;
+	  real_label = real_simple_return_label;
+	}
+      else
 	continue;
 
       pat = PATTERN (insn);
@@ -3758,14 +3791,12 @@ make_return_insns (rtx first)
 
       /* If we can't make the jump into a RETURN, try to redirect it to the best
 	 RETURN and go on to the next insn.  */
-      if (! reorg_redirect_jump (jump_insn, ret_rtx))
+      if (!reorg_redirect_jump (jump_insn, kind))
 	{
 	  /* Make sure redirecting the jump will not invalidate the delay
 	     slot insns.  */
-	  if (redirect_with_delay_slots_safe_p (jump_insn,
-						real_return_label,
-						insn))
-	    reorg_redirect_jump (jump_insn, real_return_label);
+	  if (redirect_with_delay_slots_safe_p (jump_insn, real_label, insn))
+	    reorg_redirect_jump (jump_insn, real_label);
 	  continue;
 	}
 
@@ -3805,7 +3836,7 @@ make_return_insns (rtx first)
 	 RETURN, delete the SEQUENCE and output the individual insns,
 	 followed by the RETURN.  Then set things up so we try to find
 	 insns for its delay slots, if it needs some.  */
-      if (GET_CODE (PATTERN (jump_insn)) == RETURN)
+      if (ANY_RETURN_P (PATTERN (jump_insn)))
 	{
 	  rtx prev = PREV_INSN (insn);
 
@@ -3822,13 +3853,16 @@ make_return_insns (rtx first)
       else
 	/* It is probably more efficient to keep this with its current
 	   delay slot as a branch to a RETURN.  */
-	reorg_redirect_jump (jump_insn, real_return_label);
+	reorg_redirect_jump (jump_insn, real_label);
     }
 
   /* Now delete REAL_RETURN_LABEL if we never used it.  Then try to fill any
      new delay slots we have created.  */
-  if (--LABEL_NUSES (real_return_label) == 0)
+  if (real_return_label != NULL_RTX && --LABEL_NUSES (real_return_label) == 0)
     delete_related_insns (real_return_label);
+  if (real_simple_return_label != NULL_RTX
+      && --LABEL_NUSES (real_simple_return_label) == 0)
+    delete_related_insns (real_simple_return_label);
 
   fill_simple_delay_slots (1);
   fill_simple_delay_slots (0);
@@ -3842,6 +3876,7 @@ dbr_schedule (rtx first)
 {
   rtx insn, next, epilogue_insn = 0;
   int i;
+  bool need_return_insns;
 
   /* If the current function has no insns other than the prologue and
      epilogue, then do not try to fill any delay slots.  */
@@ -3897,7 +3932,7 @@ dbr_schedule (rtx first)
   init_resource_info (epilogue_insn);
 
   /* Show we haven't computed an end-of-function label yet.  */
-  end_of_function_label = 0;
+  function_return_label = function_simple_return_label = NULL_RTX;
 
   /* Initialize the statistics for this function.  */
   memset (num_insns_needing_delays, 0, sizeof num_insns_needing_delays);
@@ -3919,13 +3954,21 @@ dbr_schedule (rtx first)
   /* If we made an end of function label, indicate that it is now
      safe to delete it by undoing our prior adjustment to LABEL_NUSES.
      If it is now unused, delete it.  */
-  if (end_of_function_label && --LABEL_NUSES (end_of_function_label) == 0)
-    delete_related_insns (end_of_function_label);
+  if (function_return_label && --LABEL_NUSES (function_return_label) == 0)
+    delete_related_insns (function_return_label);
+  if (function_simple_return_label
+      && --LABEL_NUSES (function_simple_return_label) == 0)
+    delete_related_insns (function_simple_return_label);
 
+  need_return_insns = false;
 #ifdef HAVE_return
-  if (HAVE_return && end_of_function_label != 0)
-    make_return_insns (first);
+  need_return_insns |= HAVE_return && function_return_label != 0;
 #endif
+#ifdef HAVE_simple_return
+  need_return_insns |= HAVE_simple_return && function_simple_return_label != 0;
+#endif
+  if (need_return_insns)
+    make_return_insns (first);
 
   /* Delete any USE insns made by update_block; subsequent passes don't need
      them or know how to deal with them.  */
Index: gcc/genemit.c
===================================================================
--- gcc/genemit.c	(revision 177999)
+++ gcc/genemit.c	(working copy)
@@ -169,6 +169,9 @@ gen_exp (rtx x, enum rtx_code subroutine
     case RETURN:
       printf ("ret_rtx");
       return;
+    case SIMPLE_RETURN:
+      printf ("simple_return_rtx");
+      return;
     case CLOBBER:
       if (REG_P (XEXP (x, 0)))
 	{
@@ -489,8 +492,8 @@ gen_expand (rtx expand)
 	  || (GET_CODE (next) == PARALLEL
 	      && ((GET_CODE (XVECEXP (next, 0, 0)) == SET
 		   && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC)
-		  || GET_CODE (XVECEXP (next, 0, 0)) == RETURN))
-	  || GET_CODE (next) == RETURN)
+		  || ANY_RETURN_P (XVECEXP (next, 0, 0))))
+	  || ANY_RETURN_P (next))
 	printf ("  emit_jump_insn (");
       else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL)
 	       || GET_CODE (next) == CALL
@@ -607,7 +610,7 @@ gen_split (rtx split)
 	  || (GET_CODE (next) == PARALLEL
 	      && GET_CODE (XVECEXP (next, 0, 0)) == SET
 	      && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC)
-	  || GET_CODE (next) == RETURN)
+	  || ANY_RETURN_P (next))
 	printf ("  emit_jump_insn (");
       else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL)
 	       || GET_CODE (next) == CALL
Index: gcc/df-scan.c
===================================================================
--- gcc/df-scan.c	(revision 177999)
+++ gcc/df-scan.c	(working copy)
@@ -3181,6 +3181,7 @@ df_uses_record (struct df_collection_rec
       }
 
     case RETURN:
+    case SIMPLE_RETURN:
       break;
 
     case ASM_OPERANDS:
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	(revision 177999)
+++ gcc/rtl.def	(working copy)
@@ -731,6 +731,10 @@ DEF_RTL_EXPR(ENTRY_VALUE, "entry_value",
    been optimized away completely.  */
 DEF_RTL_EXPR(DEBUG_PARAMETER_REF, "debug_parameter_ref", "t", RTX_OBJ)
 
+/* A plain return, to be used on paths that are reached without going
+   through the function prologue.  */
+DEF_RTL_EXPR(SIMPLE_RETURN, "simple_return", "", RTX_EXTRA)
+
 /* All expressions from this point forward appear only in machine
    descriptions.  */
 #ifdef GENERATOR_FILE
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	(revision 177999)
+++ gcc/ifcvt.c	(working copy)
@@ -3796,6 +3796,7 @@ find_if_case_1 (basic_block test_bb, edg
   basic_block then_bb = then_edge->dest;
   basic_block else_bb = else_edge->dest;
   basic_block new_bb;
+  rtx else_target = NULL_RTX;
   int then_bb_index;
 
   /* If we are partitioning hot/cold basic blocks, we don't want to
@@ -3845,6 +3846,13 @@ find_if_case_1 (basic_block test_bb, edg
 				    predictable_edge_p (then_edge)))))
     return FALSE;
 
+  if (else_bb == EXIT_BLOCK_PTR)
+    {
+      rtx jump = BB_END (else_edge->src);
+      gcc_assert (JUMP_P (jump));
+      else_target = JUMP_LABEL (jump);
+    }
+
   /* Registers set are dead, or are predicable.  */
   if (! dead_or_predicable (test_bb, then_bb, else_bb,
 			    single_succ_edge (then_bb), 1))
@@ -3864,6 +3872,9 @@ find_if_case_1 (basic_block test_bb, edg
       redirect_edge_succ (FALLTHRU_EDGE (test_bb), else_bb);
       new_bb = 0;
     }
+  else if (else_bb == EXIT_BLOCK_PTR)
+    new_bb = force_nonfallthru_and_redirect (FALLTHRU_EDGE (test_bb),
+					     else_bb, else_target);
   else
     new_bb = redirect_edge_and_branch_force (FALLTHRU_EDGE (test_bb),
 					     else_bb);
Index: gcc/jump.c
===================================================================
--- gcc/jump.c	(revision 177999)
+++ gcc/jump.c	(working copy)
@@ -29,7 +29,8 @@ along with GCC; see the file COPYING3.
    JUMP_LABEL internal field.  With this we can detect labels that
    become unused because of the deletion of all the jumps that
    formerly used them.  The JUMP_LABEL info is sometimes looked
-   at by later passes.
+   at by later passes.  For return insns, it contains either a
+   RETURN or a SIMPLE_RETURN rtx.
 
    The subroutines redirect_jump and invert_jump are used
    from other passes as well.  */
@@ -775,10 +776,10 @@ condjump_p (const_rtx insn)
     return (GET_CODE (x) == IF_THEN_ELSE
 	    && ((GET_CODE (XEXP (x, 2)) == PC
 		 && (GET_CODE (XEXP (x, 1)) == LABEL_REF
-		     || GET_CODE (XEXP (x, 1)) == RETURN))
+		     || ANY_RETURN_P (XEXP (x, 1))))
 		|| (GET_CODE (XEXP (x, 1)) == PC
 		    && (GET_CODE (XEXP (x, 2)) == LABEL_REF
-			|| GET_CODE (XEXP (x, 2)) == RETURN))));
+			|| ANY_RETURN_P (XEXP (x, 2))))));
 }
 
 /* Return nonzero if INSN is a (possibly) conditional jump inside a
@@ -807,11 +808,11 @@ condjump_in_parallel_p (const_rtx insn)
     return 0;
   if (XEXP (SET_SRC (x), 2) == pc_rtx
       && (GET_CODE (XEXP (SET_SRC (x), 1)) == LABEL_REF
-	  || GET_CODE (XEXP (SET_SRC (x), 1)) == RETURN))
+	  || ANY_RETURN_P (XEXP (SET_SRC (x), 1))))
     return 1;
   if (XEXP (SET_SRC (x), 1) == pc_rtx
       && (GET_CODE (XEXP (SET_SRC (x), 2)) == LABEL_REF
-	  || GET_CODE (XEXP (SET_SRC (x), 2)) == RETURN))
+	  || ANY_RETURN_P (XEXP (SET_SRC (x), 2))))
     return 1;
   return 0;
 }
@@ -873,8 +874,9 @@ any_condjump_p (const_rtx insn)
   a = GET_CODE (XEXP (SET_SRC (x), 1));
   b = GET_CODE (XEXP (SET_SRC (x), 2));
 
-  return ((b == PC && (a == LABEL_REF || a == RETURN))
-	  || (a == PC && (b == LABEL_REF || b == RETURN)));
+  return ((b == PC && (a == LABEL_REF || a == RETURN || a == SIMPLE_RETURN))
+	  || (a == PC
+	      && (b == LABEL_REF || b == RETURN || b == SIMPLE_RETURN)));
 }
 
 /* Return the label of a conditional jump.  */
@@ -911,6 +913,7 @@ returnjump_p_1 (rtx *loc, void *data ATT
   switch (GET_CODE (x))
     {
     case RETURN:
+    case SIMPLE_RETURN:
     case EH_RETURN:
       return true;
 
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 177999)
+++ gcc/function.c	(working copy)
@@ -5306,7 +5306,11 @@ static void
 emit_return_into_block (basic_block bb)
 {
   rtx jump = emit_jump_insn_after (gen_return (), BB_END (bb));
-  JUMP_LABEL (jump) = ret_rtx;
+  rtx pat = PATTERN (jump);
+  if (GET_CODE (pat) == PARALLEL)
+    pat = XVECEXP (pat, 0, 0);
+  gcc_assert (ANY_RETURN_P (pat));
+  JUMP_LABEL (jump) = pat;
 }
 #endif /* HAVE_return */
 
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	(revision 177999)
+++ gcc/print-rtl.c	(working copy)
@@ -328,6 +328,8 @@ print_rtx (const_rtx in_rtx)
 	    fprintf (outfile, "\n%s%*s -> ", print_rtx_head, indent * 2, "");
 	    if (GET_CODE (JUMP_LABEL (in_rtx)) == RETURN)
 	      fprintf (outfile, "return");
+	    else if (GET_CODE (JUMP_LABEL (in_rtx)) == SIMPLE_RETURN)
+	      fprintf (outfile, "simple_return");
 	    else
 	      fprintf (outfile, "%d", INSN_UID (JUMP_LABEL (in_rtx)));
 	  }
Index: gcc/bt-load.c
===================================================================
--- gcc/bt-load.c	(revision 177999)
+++ gcc/bt-load.c	(working copy)
@@ -558,7 +558,7 @@ compute_defs_uses_and_gen (fibheap_t all
 		      /* Check for sibcall.  */
 		      if (GET_CODE (pat) == PARALLEL)
 			for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
-			  if (GET_CODE (XVECEXP (pat, 0, i)) == RETURN)
+			  if (ANY_RETURN_P (XVECEXP (pat, 0, i)))
 			    {
 			      COMPL_HARD_REG_SET (call_saved,
 						  call_used_reg_set);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	(revision 177999)
+++ gcc/emit-rtl.c	(working copy)
@@ -2518,6 +2518,7 @@ verify_rtx_sharing (rtx orig, rtx insn)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       return;
       /* SCRATCH must be shared because they represent distinct values.  */
@@ -2725,6 +2726,7 @@ repeat:
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return;
@@ -2845,6 +2847,7 @@ repeat:
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
       return;
 
     case DEBUG_INSN:
@@ -5008,7 +5011,7 @@ classify_insn (rtx x)
     return CODE_LABEL;
   if (GET_CODE (x) == CALL)
     return CALL_INSN;
-  if (GET_CODE (x) == RETURN)
+  if (ANY_RETURN_P (x))
     return JUMP_INSN;
   if (GET_CODE (x) == SET)
     {
@@ -5264,6 +5267,7 @@ copy_insn_1 (rtx orig)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
       return orig;
     case CLOBBER:
       if (REG_P (XEXP (orig, 0)) && REGNO (XEXP (orig, 0)) < FIRST_PSEUDO_REGISTER)
@@ -5521,6 +5525,7 @@ init_emit_regs (void)
   /* Assign register numbers to the globally defined register rtx.  */
   pc_rtx = gen_rtx_fmt_ (PC, VOIDmode);
   ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode);
+  simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
   cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
   stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM);
   frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM);
Index: gcc/cfglayout.c
===================================================================
--- gcc/cfglayout.c	(revision 177999)
+++ gcc/cfglayout.c	(working copy)
@@ -767,6 +767,7 @@ fixup_reorder_chain (void)
     {
       edge e_fall, e_taken, e;
       rtx bb_end_insn;
+      rtx ret_label = NULL_RTX;
       basic_block nb, src_bb;
       edge_iterator ei;
 
@@ -786,6 +787,7 @@ fixup_reorder_chain (void)
       bb_end_insn = BB_END (bb);
       if (JUMP_P (bb_end_insn))
 	{
+	  ret_label = JUMP_LABEL (bb_end_insn);
 	  if (any_condjump_p (bb_end_insn))
 	    {
 	      /* This might happen if the conditional jump has side
@@ -899,7 +901,7 @@ fixup_reorder_chain (void)
 	 Note force_nonfallthru can delete E_FALL and thus we have to
 	 save E_FALL->src prior to the call to force_nonfallthru.  */
       src_bb = e_fall->src;
-      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest);
+      nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label);
       if (nb)
 	{
 	  nb->il.rtl->visited = 1;
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	(revision 177999)
+++ gcc/rtl.c	(working copy)
@@ -256,6 +256,7 @@ copy_rtx (rtx orig)
     case PC:
     case CC0:
     case RETURN:
+    case SIMPLE_RETURN:
     case SCRATCH:
       /* SCRATCH must be shared because they represent distinct values.  */
       return orig;
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	(revision 177999)
+++ gcc/rtl.h	(working copy)
@@ -432,8 +432,9 @@ struct GTY((variable_size)) rtvec_def {
   (JUMP_P (INSN) && (GET_CODE (PATTERN (INSN)) == ADDR_VEC || \
 		     GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC))
 
-/* Predicate yielding nonzero iff X is a return.  */
-#define ANY_RETURN_P(X) ((X) == ret_rtx)
+/* Predicate yielding nonzero iff X is a return or simple_return.  */
+#define ANY_RETURN_P(X) \
+  (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN)
 
 /* 1 if X is a unary operator.  */
 
@@ -2111,6 +2112,7 @@ enum global_rtl_index
   GR_PC,
   GR_CC0,
   GR_RETURN,
+  GR_SIMPLE_RETURN,
   GR_STACK_POINTER,
   GR_FRAME_POINTER,
 /* For register elimination to work properly these hard_frame_pointer_rtx,
@@ -2206,6 +2208,7 @@ extern struct target_rtl *this_target_rt
 /* Standard pieces of rtx, to be substituted directly into things.  */
 #define pc_rtx                  (global_rtl[GR_PC])
 #define ret_rtx                 (global_rtl[GR_RETURN])
+#define simple_return_rtx       (global_rtl[GR_SIMPLE_RETURN])
 #define cc0_rtx                 (global_rtl[GR_CC0])
 
 /* All references to certain hard regs, except those created
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	(revision 177999)
+++ gcc/combine.c	(working copy)
@@ -6303,7 +6303,7 @@ simplify_set (rtx x)
   rtx *cc_use;
 
   /* (set (pc) (return)) gets written as (return).  */
-  if (GET_CODE (dest) == PC && GET_CODE (src) == RETURN)
+  if (GET_CODE (dest) == PC && ANY_RETURN_P (src))
     return src;
 
   /* Now that we know for sure which bits of SRC we are using, see if we can
Index: gcc/resource.c
===================================================================
--- gcc/resource.c	(revision 177999)
+++ gcc/resource.c	(working copy)
@@ -492,7 +492,7 @@ find_dead_or_set_registers (rtx target,
 	  if (jump_count++ < 10)
 	    {
 	      if (any_uncondjump_p (this_jump_insn)
-		  || GET_CODE (PATTERN (this_jump_insn)) == RETURN)
+		  || ANY_RETURN_P (PATTERN (this_jump_insn)))
 		{
 		  next = JUMP_LABEL (this_jump_insn);
 		  if (ANY_RETURN_P (next))
@@ -829,7 +829,7 @@ mark_set_resources (rtx x, struct resour
 static bool
 return_insn_p (const_rtx insn)
 {
-  if (JUMP_P (insn) && GET_CODE (PATTERN (insn)) == RETURN)
+  if (JUMP_P (insn) && ANY_RETURN_P (PATTERN (insn)))
     return true;
 
   if (NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SEQUENCE)
Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h	(revision 177999)
+++ gcc/basic-block.h	(working copy)
@@ -804,7 +804,7 @@ extern rtx block_label (basic_block);
 extern bool purge_all_dead_edges (void);
 extern bool purge_dead_edges (basic_block);
 extern bool fixup_abnormal_edges (void);
-extern basic_block force_nonfallthru_and_redirect (edge, basic_block);
+extern basic_block force_nonfallthru_and_redirect (edge, basic_block, rtx);
 
 /* In cfgbuild.c.  */
 extern void find_many_sub_basic_blocks (sbitmap);
Index: gcc/sched-vis.c
===================================================================
--- gcc/sched-vis.c	(revision 177999)
+++ gcc/sched-vis.c	(working copy)
@@ -554,6 +554,9 @@ print_pattern (char *buf, const_rtx x, i
     case RETURN:
       sprintf (buf, "return");
       break;
+    case SIMPLE_RETURN:
+      sprintf (buf, "simple_return");
+      break;
     case CALL:
       print_exp (buf, x, verbose);
       break;
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 177999)
+++ gcc/config/i386/i386.c	(working copy)
@@ -30545,7 +30545,7 @@ ix86_pad_returns (void)
       rtx prev;
       bool replace = false;
 
-      if (!JUMP_P (ret) || GET_CODE (PATTERN (ret)) != RETURN
+      if (!JUMP_P (ret) || !ANY_RETURN_P (PATTERN (ret))
 	  || optimize_bb_for_size_p (bb))
 	continue;
       for (prev = PREV_INSN (ret); prev; prev = PREV_INSN (prev))
@@ -30596,7 +30596,7 @@ ix86_count_insn_bb (basic_block bb)
     {
       /* Only happen in exit blocks.  */
       if (JUMP_P (insn)
-	  && GET_CODE (PATTERN (insn)) == RETURN)
+	  && ANY_RETURN_P (PATTERN (insn)))
 	break;
 
       if (NONDEBUG_INSN_P (insn)
@@ -30669,7 +30669,7 @@ ix86_pad_short_function (void)
   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds)
     {
       rtx ret = BB_END (e->src);
-      if (JUMP_P (ret) && GET_CODE (PATTERN (ret)) == RETURN)
+      if (JUMP_P (ret) && ANY_RETURN_P (PATTERN (ret)))
 	{
 	  int insn_count = ix86_count_insn (e->src);
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 177999)
+++ gcc/config/arm/arm.c	(working copy)
@@ -17723,6 +17723,7 @@ arm_final_prescan_insn (rtx insn)
 
   /* If we start with a return insn, we only succeed if we find another one.  */
   int seeking_return = 0;
+  enum rtx_code return_code = UNKNOWN;
 
   /* START_INSN will hold the insn from where we start looking.  This is the
      first insn after the following code_label if REVERSE is true.  */
@@ -17761,7 +17762,7 @@ arm_final_prescan_insn (rtx insn)
 	  else
 	    return;
 	}
-      else if (GET_CODE (body) == RETURN)
+      else if (ANY_RETURN_P (body))
         {
 	  start_insn = next_nonnote_insn (start_insn);
 	  if (GET_CODE (start_insn) == BARRIER)
@@ -17772,6 +17773,7 @@ arm_final_prescan_insn (rtx insn)
 	    {
 	      reverse = TRUE;
 	      seeking_return = 1;
+	      return_code = GET_CODE (body);
 	    }
 	  else
 	    return;
@@ -17812,11 +17814,15 @@ arm_final_prescan_insn (rtx insn)
 	  label = XEXP (XEXP (SET_SRC (body), 2), 0);
 	  then_not_else = FALSE;
 	}
-      else if (GET_CODE (XEXP (SET_SRC (body), 1)) == RETURN)
-	seeking_return = 1;
-      else if (GET_CODE (XEXP (SET_SRC (body), 2)) == RETURN)
+      else if (ANY_RETURN_P (XEXP (SET_SRC (body), 1)))
+	{
+	  seeking_return = 1;
+	  return_code = GET_CODE (XEXP (SET_SRC (body), 1));
+	}
+      else if (ANY_RETURN_P (XEXP (SET_SRC (body), 2)))
         {
 	  seeking_return = 1;
+	  return_code = GET_CODE (XEXP (SET_SRC (body), 2));
 	  then_not_else = FALSE;
         }
       else
@@ -17913,12 +17919,11 @@ arm_final_prescan_insn (rtx insn)
 		}
 	      /* Fail if a conditional return is undesirable (e.g. on a
 		 StrongARM), but still allow this if optimizing for size.  */
-	      else if (GET_CODE (scanbody) == RETURN
+	      else if (GET_CODE (scanbody) == return_code
 		       && !use_return_insn (TRUE, NULL)
 		       && !optimize_size)
 		fail = TRUE;
-	      else if (GET_CODE (scanbody) == RETURN
-		       && seeking_return)
+	      else if (GET_CODE (scanbody) == return_code)
 	        {
 		  arm_ccfsm_state = 2;
 		  succeed = TRUE;
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	(revision 177999)
+++ gcc/config/mips/mips.md	(working copy)
@@ -777,6 +777,8 @@ (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
 
+(define_code_iterator any_return [return simple_return])
+
 ;; <u> expands to an empty string when doing a signed operation and
 ;; "u" when doing an unsigned operation.
 (define_code_attr u [(sign_extend "") (zero_extend "u")
@@ -798,7 +800,9 @@ (define_code_attr optab [(ashift "ashl")
 			 (xor "xor")
 			 (and "and")
 			 (plus "add")
-			 (minus "sub")])
+			 (minus "sub")
+			 (return "return")
+			 (simple_return "simple_return")])
 
 ;; <insn> expands to the name of the insn that implements a particular code.
 (define_code_attr insn [(ashift "sll")
@@ -5713,21 +5717,26 @@ (define_expand "sibcall_epilogue"
 ;; allows jump optimizations to work better.
 
 (define_expand "return"
-  [(return)]
+  [(simple_return)]
   "mips_can_use_return_insn ()"
   { mips_expand_before_return (); })
 
-(define_insn "*return"
-  [(return)]
-  "mips_can_use_return_insn ()"
+(define_expand "simple_return"
+  [(simple_return)]
+  ""
+  { mips_expand_before_return (); })
+
+(define_insn "*<optab>"
+  [(any_return)]
+  ""
   "%*j\t$31%/"
   [(set_attr "type"	"jump")
    (set_attr "mode"	"none")])
 
 ;; Normal return.
 
-(define_insn "return_internal"
-  [(return)
+(define_insn "<optab>_internal"
+  [(any_return)
    (use (match_operand 0 "pmode_register_operand" ""))]
   ""
   "%*j\t%0%/"
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	(revision 177999)
+++ gcc/config/mips/mips.c	(working copy)
@@ -10453,7 +10453,8 @@ mips_expand_epilogue (bool sibcall_p)
 	    regno = GP_REG_FIRST + 7;
 	  else
 	    regno = RETURN_ADDR_REGNUM;
-	  emit_jump_insn (gen_return_internal (gen_rtx_REG (Pmode, regno)));
+	  emit_jump_insn (gen_simple_return_internal (gen_rtx_REG (Pmode,
+								   regno)));
 	}
     }
 
Index: gcc/cfgrtl.c
===================================================================
--- gcc/cfgrtl.c	(revision 177999)
+++ gcc/cfgrtl.c	(working copy)
@@ -1117,10 +1117,13 @@ rtl_redirect_edge_and_branch (edge e, ba
 }
 
 /* Like force_nonfallthru below, but additionally performs redirection
-   Used by redirect_edge_and_branch_force.  */
+   Used by redirect_edge_and_branch_force.  JUMP_LABEL is used only
+   when redirecting to the EXIT_BLOCK, it is either ret_rtx or
+   simple_return_rtx, indicating which kind of returnjump to create.
+   It should be NULL otherwise.  */
 
 basic_block
-force_nonfallthru_and_redirect (edge e, basic_block target)
+force_nonfallthru_and_redirect (edge e, basic_block target, rtx jump_label)
 {
   basic_block jump_block, new_bb = NULL, src = e->src;
   rtx note;
@@ -1252,12 +1255,25 @@ force_nonfallthru_and_redirect (edge e,
   e->flags &= ~EDGE_FALLTHRU;
   if (target == EXIT_BLOCK_PTR)
     {
+      if (jump_label == ret_rtx)
+	{
 #ifdef HAVE_return
-	emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc);
-	JUMP_LABEL (BB_END (jump_block)) = ret_rtx;
+	  emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc);
 #else
-	gcc_unreachable ();
+	  gcc_unreachable ();
+#endif
+	}
+      else
+	{
+	  gcc_assert (jump_label == simple_return_rtx);
+#ifdef HAVE_simple_return
+	  emit_jump_insn_after_setloc (gen_simple_return (),
+				       BB_END (jump_block), loc);
+#else
+	  gcc_unreachable ();
 #endif
+	}
+      JUMP_LABEL (BB_END (jump_block)) = jump_label;
     }
   else
     {
@@ -1284,7 +1300,7 @@ force_nonfallthru_and_redirect (edge e,
 static basic_block
 rtl_force_nonfallthru (edge e)
 {
-  return force_nonfallthru_and_redirect (e, e->dest);
+  return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX);
 }
 
 /* Redirect edge even at the expense of creating new jump insn or
@@ -1301,7 +1317,7 @@ rtl_redirect_edge_and_branch_force (edge
   /* In case the edge redirection failed, try to force it to be non-fallthru
      and redirect newly created simplejump.  */
   df_set_bb_dirty (e->src);
-  return force_nonfallthru_and_redirect (e, target);
+  return force_nonfallthru_and_redirect (e, target, NULL_RTX);
 }
 
 /* The given edge should potentially be a fallthru edge.  If that is in
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 177999)
+++ gcc/doc/md.texi	(working copy)
@@ -4992,6 +4992,20 @@ some class of functions only requires on
 return.  Normally, the applicable functions are those which do not need
 to save any registers or allocate stack space.
 
+It is valid for this pattern to expand to an instruction using
+@code{simple_return} if no epilogue is required.
+
+@cindex @code{simple_return} instruction pattern
+@item @samp{simple_return}
+Subroutine return instruction.  This instruction pattern name should be
+defined only if a single instruction can do all the work of returning
+from a function on a path where no epilogue is required.  This pattern
+is very similar to the @code{return} instruction pattern, but it is emitted
+only by the shrink-wrapping optimization on paths where the function
+prologue has not been executed, and a function return should occur without
+any of the effects of the epilogue.  Additional uses may be introduced on
+paths where both the prologue and the epilogue have executed.
+
 @findex reload_completed
 @findex leaf_function_p
 For such machines, the condition specified in this pattern should only

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-24 19:23         ` Bernd Schmidt
@ 2011-08-24 20:48           ` Richard Sandiford
  2011-08-24 20:55             ` Bernd Schmidt
  2011-08-28 10:58           ` H.J. Lu
  1 sibling, 1 reply; 73+ messages in thread
From: Richard Sandiford @ 2011-08-24 20:48 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

Bernd Schmidt <bernds@codesourcery.com> writes:
> On 08/03/11 17:38, Richard Sandiford wrote:
>> Bernd Schmidt <bernds@codesourcery.com> writes:
>>> +@findex simple_return
>>> +@item (simple_return)
>>> +Like @code{(return)}, but truly represents only a function return, while
>>> +@code{(return)} may represent an insn that also performs other functions
>>> +of the function epilogue.  Like @code{(return)}, this may also occur in
>>> +conditional jumps.
>> 
>> Sorry, I've forgotton the outcome of the discussion about what happens
>> on targets whose return expands to the same code as their simple_return.
>> Do the targets still need both "return" and "simple_return" rtxes?
>
> It's important to distinguish between these names as rtxes that can
> occur in instruction patterns, and a use as a standard pattern name.
> When a "return" pattern is generated, it should either fail or expand to
> something that performs both the epilogue and the return. A
> "simple_return" expands to something that performs only the return.
>
> Most targets allow "return" patterns only if the epilogue is empty. In
> that case, "return" and "simple_return" can expand to the same insn; it
> does not matter if that insn uses "simple_return" or "return", as they
> are equivalent in the absence of an epilogue. It would be slightly nicer
> to use "simple_return" in the patterns everywhere except ARM, but ports
> don't need to be changed.
>
>> Do they need both md patterns (but potentially using the same rtx
>> underneath)?
>
> The "return" standard pattern is needed for the existing optimizations
> (inserting returns in-line rather than jumping to the end of the
> function). Typically, it always fails if the function needs an epilogue,
> except in the ARM case.
> For shrink-wrapping to work, a port needs a "simple_return" pattern,
> which the compiler can use even if parts of the function need an
> epilogue. So yes, they have different purposes.
>
>> I ask because the rtl.def comment implies that those targets still
>> need both expanders and both rtxes.  If that's so, I think it needs
>> to be mentioned here too.  E.g. something like:
>> 
>>   Like @code{(return)}, but truly represents only a function return, while
>>   @code{(return)} may represent an insn that also performs other functions
>>   of the function epilogue.  @code{(return)} only occurs on paths that
>>   pass through the function prologue, while @code{(simple_return)}
>>   only occurs on paths that do not pass through the prologue.
>
> This is not accurate for the rtx code. It is mostly accurate for the
> standard pattern name. A simple_return rtx may occur just after an
> epilogue, i.e. on a path that has passed through the prologue.
>
> Even for the simple_return pattern, I'm not sure reorg.c couldn't
> introduce new expansions in a location after both prologue and epilogue.

Ah, OK, thanks.  That clears up my confusion, and the new md.texi
documentation looks nicely thorough.

One of the points I was trying to make (badly) was that rtl.def
and rtl.texi, which both document the rtx rather than the pattern,
didn't seem to agree on what the semantics were.  rtl.texi says:

@findex simple_return
@item (simple_return)
Like @code{(return)}, but truly represents only a function return, while
@code{(return)} may represent an insn that also performs other functions
of the function epilogue.  Like @code{(return)}, this may also occur in
conditional jumps.

which from what you say above seems to be accurate, but rtl.def says:

/* A plain return, to be used on paths that are reached without going
   through the function prologue.  */
DEF_RTL_EXPR(SIMPLE_RETURN, "simple_return", "", RTX_EXTRA)

which is slightly different, and seems to be describing the pattern
more than the rtx.  I think the rtl.def comment should be along the
same lines as the rtl.texi documentation.

OK with that change from a MIPS and rtl perspective.

Richard

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-24 20:48           ` Richard Sandiford
@ 2011-08-24 20:55             ` Bernd Schmidt
  2011-08-26 14:49               ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-08-24 20:55 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford, Richard Earnshaw, ramana.radhakrishnan

On 08/24/11 19:17, Richard Sandiford wrote:
> OK with that change from a MIPS and rtl perspective.

Thanks. What else is in there? Trivial x86 changes, and a slightly less
trivial but still tiny ARM bit, I suppose. Richard/Ramana?


Bernd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-24 20:55             ` Bernd Schmidt
@ 2011-08-26 14:49               ` Ramana Radhakrishnan
  2011-08-26 14:58                 ` Bernd Schmidt
  0 siblings, 1 reply; 73+ messages in thread
From: Ramana Radhakrishnan @ 2011-08-26 14:49 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: GCC Patches, richard.sandiford, Richard Earnshaw, ramana.radhakrishnan

On 24 August 2011 18:23, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 08/24/11 19:17, Richard Sandiford wrote:
>> OK with that change from a MIPS and rtl perspective.
>
> Thanks. What else is in there? Trivial x86 changes, and a slightly less
> trivial but still tiny ARM bit, I suppose. Richard/Ramana?

Sorry about the delayed review - I read through this for a bit this
afternoon and I must admit I was confused for a while about why the
arm.md changes and the other backend changes from the original patch
hadn't made it in here.

This is OK but please watch out for any fall-out next week.

cheers
Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-26 14:49               ` Ramana Radhakrishnan
@ 2011-08-26 14:58                 ` Bernd Schmidt
  2011-08-26 15:06                   ` Ramana Radhakrishnan
  0 siblings, 1 reply; 73+ messages in thread
From: Bernd Schmidt @ 2011-08-26 14:58 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: GCC Patches, richard.sandiford, Richard Earnshaw, ramana.radhakrishnan

On 08/26/11 16:32, Ramana Radhakrishnan wrote:
> On 24 August 2011 18:23, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> On 08/24/11 19:17, Richard Sandiford wrote:
>>> OK with that change from a MIPS and rtl perspective.
>>
>> Thanks. What else is in there? Trivial x86 changes, and a slightly less
>> trivial but still tiny ARM bit, I suppose. Richard/Ramana?
> 
> Sorry about the delayed review - I read through this for a bit this

Nothing delayed about it really :)

> afternoon and I must admit I was confused for a while about why the
> arm.md changes and the other backend changes from the original patch
> hadn't made it in here.

You mean the introduction of simple_return patterns for ARM? The patch
is split up further (this one is now piece 2/3 of the original patch
4/6) and I've postponed these until the final shrink-wrapping patch. In
this patch I've only made some MIPS changes in this area, more as a
proof-of-concept rather than because they gain anything yet.

> This is OK but please watch out for any fall-out next week.

Thanks!


Bernd

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-26 14:58                 ` Bernd Schmidt
@ 2011-08-26 15:06                   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 73+ messages in thread
From: Ramana Radhakrishnan @ 2011-08-26 15:06 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, richard.sandiford, Richard Earnshaw

On 26 August 2011 15:36, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 08/26/11 16:32, Ramana Radhakrishnan wrote:
>> On 24 August 2011 18:23, Bernd Schmidt <bernds@codesourcery.com> wrote:
>>> On 08/24/11 19:17, Richard Sandiford wrote:
>
> You mean the introduction of simple_return patterns for ARM? The patch
> is split up further (this one is now piece 2/3 of the original patch
> 4/6) and I've postponed these until the final shrink-wrapping patch. In
> this patch I've only made some MIPS changes in this area, more as a
> proof-of-concept rather than because they gain anything yet.


Yes, that's what I meant and figured out later. Thanks for making that
explicit. Richard Sandiford did point that out to me on IRC as I was
pretty much scratching my head about why some of the other changes
were missing. :)

cheers
Ramana

^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH 4/6] Shrink-wrapping
  2011-08-24 19:23         ` Bernd Schmidt
  2011-08-24 20:48           ` Richard Sandiford
@ 2011-08-28 10:58           ` H.J. Lu
  1 sibling, 0 replies; 73+ messages in thread
From: H.J. Lu @ 2011-08-28 10:58 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, richard.sandiford

On Wed, Aug 24, 2011 at 9:46 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 08/03/11 17:38, Richard Sandiford wrote:
>> Bernd Schmidt <bernds@codesourcery.com> writes:
>>> +@findex simple_return
>>> +@item (simple_return)
>>> +Like @code{(return)}, but truly represents only a function return, while
>>> +@code{(return)} may represent an insn that also performs other functions
>>> +of the function epilogue.  Like @code{(return)}, this may also occur in
>>> +conditional jumps.
>>
>> Sorry, I've forgotten the outcome of the discussion about what happens
>> on targets whose return expands to the same code as their simple_return.
>> Do the targets still need both "return" and "simple_return" rtxes?
>
> It's important to distinguish between these names as rtxes that can
> occur in instruction patterns, and a use as a standard pattern name.
> When a "return" pattern is generated, it should either fail or expand to
> something that performs both the epilogue and the return. A
> "simple_return" expands to something that performs only the return.
>
> Most targets allow "return" patterns only if the epilogue is empty. In
> that case, "return" and "simple_return" can expand to the same insn; it
> does not matter if that insn uses "simple_return" or "return", as they
> are equivalent in the absence of an epilogue. It would be slightly nicer
> to use "simple_return" in the patterns everywhere except ARM, but ports
> don't need to be changed.
>
>> Do they need both md patterns (but potentially using the same rtx
>> underneath)?
>
> The "return" standard pattern is needed for the existing optimizations
> (inserting returns in-line rather than jumping to the end of the
> function). Typically, it always fails if the function needs an epilogue,
> except in the ARM case.
> For shrink-wrapping to work, a port needs a "simple_return" pattern,
> which the compiler can use even if parts of the function need an
> epilogue. So yes, they have different purposes.
>
>> I ask because the rtl.def comment implies that those targets still
>> need both expanders and both rtxes.  If that's so, I think it needs
>> to be mentioned here too.  E.g. something like:
>>
>>   Like @code{(return)}, but truly represents only a function return, while
>>   @code{(return)} may represent an insn that also performs other functions
>>   of the function epilogue.  @code{(return)} only occurs on paths that
>>   pass through the function prologue, while @code{(simple_return)}
>>   only occurs on paths that do not pass through the prologue.
>
> This is not accurate for the rtx code. It is mostly accurate for the
> standard pattern name. A simple_return rtx may occur just after an
> epilogue, i.e. on a path that has passed through the prologue.
>
> Even for the simple_return pattern, I'm not sure reorg.c couldn't
> introduce new expansions in a location after both prologue and epilogue.
>
>>   Like @code{(return)}, @code{(simple_return)} may also occur in
>>   conditional jumps.
>>
>> You need to document the simple_return pattern in md.texi too.
>
> I was trying to make the documentation describe only the state that
> exists after the patch. The thinking was that without shrink-wrapping,
> nothing generates this pattern, so documenting it would be misleading.
> However, with the mips changes in this version of the patch, reorg.c
> does make use of this pattern, so I've added documentation.
>
>>> @@ -3498,6 +3506,8 @@ relax_delay_slots (rtx first)
>>>      continue;
>>>
>>>        target_label = JUMP_LABEL (delay_insn);
>>> +      if (target_label && ANY_RETURN_P (target_label))
>>> +    continue;
>>>
>>>        if (!ANY_RETURN_P (target_label))
>>>      {
>>
>> This doesn't look like a pure "handle return as well as simple return"
>> change.  Is the idea that every following test only makes sense for
>> labels, and that things like:
>>
>>         && prev_active_insn (target_label) == insn
>>
>> (to pick just one example) are actively dangerous for returns?
>
> That probably was the idea. Looking at it again, there's one case at the
> bottom of the loop which may be safe, but given that there were no code
> generation differences with the patch on three targets with
> define_delay, I've done:
>
>> If so, I think you should remove the immediately-following
>> "if (!ANY_RETURN_P (target_label))" condition and reindent the body.
>
> this.
>
>> Given what you said about JUMP_LABEL sometimes being null,
>> I think we need either (a) to check whether each *_return_label
>> is null before comparing it with JUMP_LABEL, or (b) to ensure that
>> we're dealing with a jump to a label.  (b) seems neater IMO
>> (as a call to jump_to_label_p).
>
> Done.
>
>>
>>> +#if defined HAVE_return || defined HAVE_simple_return
>>> +  if (
>>>  #ifdef HAVE_return
>>> -  if (HAVE_return && end_of_function_label != 0)
>>> +      (HAVE_return && function_return_label != 0)
>>> +#else
>>> +      0
>>> +#endif
>>> +#ifdef HAVE_simple_return
>>> +      || (HAVE_simple_return && function_simple_return_label != 0)
>>> +#endif
>>> +      )
>>>      make_return_insns (first);
>>>  #endif
>>
>> Eww.
>
> Restructured.
>
>> (define_code_iterator any_return [return simple_return])
>>
>> and just change the appropriate returns to any_returns.
>
> I've done this a bit differently - to show that it can be done, I've
> changed mips to always emit simple_return rtxs, even for "return"
> patterns (no code generation changes observed again).
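For comparison, the iterator approach suggested above would look roughly like this in a machine description (the pattern name and assembler output are illustrative, not the actual MIPS pattern):

```lisp
;; One code iterator covers both rtx codes, so a single pattern
;; matches return and simple_return alike.
(define_code_iterator any_return [return simple_return])

(define_insn "*return_insn"
  [(any_return)]
  ""
  "ret")
```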
>
> This version regression tested on mips64-elf, c/c++/objc.
>
>
> Bernd
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50215


-- 
H.J.

Thread overview: 73+ messages
2011-03-23 14:44 Shrink-wrapping: Introduction Bernd Schmidt
2011-03-23 14:46 ` [PATCH 1/6] Disallow predicating the prologue Bernd Schmidt
2011-03-31 13:20   ` Jeff Law
2011-04-01 18:59   ` H.J. Lu
2011-04-01 21:08     ` Bernd Schmidt
2011-03-23 14:48 ` [PATCH 2/6] Unique return rtx Bernd Schmidt
2011-03-31 13:23   ` Jeff Law
2011-05-03 11:54     ` Bernd Schmidt
2011-03-23 14:51 ` [PATCH 3/6] Allow jumps in epilogues Bernd Schmidt
2011-03-23 16:46   ` Richard Henderson
2011-03-23 16:49     ` Bernd Schmidt
2011-03-23 17:19       ` Richard Henderson
2011-03-23 17:24         ` Bernd Schmidt
2011-03-23 17:27           ` Richard Henderson
2011-03-24 10:30             ` Bernd Schmidt
2011-03-25 17:51         ` Bernd Schmidt
2011-03-26  5:33           ` Richard Henderson
2011-03-31 20:09             ` Bernd Schmidt
2011-03-31 21:51               ` Richard Henderson
2011-03-31 22:36                 ` Bernd Schmidt
2011-03-31 23:57                   ` Richard Henderson
2011-04-05 21:59                 ` Bernd Schmidt
2011-04-11 17:10                   ` Richard Henderson
2011-04-13 14:16                     ` Bernd Schmidt
2011-04-13 15:14                       ` Bernd Schmidt
2011-04-13 15:16                       ` Bernd Schmidt
2011-04-13 15:17                       ` Bernd Schmidt
2011-04-13 15:28                     ` Bernd Schmidt
2011-04-13 14:44                       ` Richard Henderson
2011-04-13 14:54                         ` Jakub Jelinek
2011-04-15 16:29                       ` Bernd Schmidt
2011-03-23 14:56 ` [PATCH 5/6] Generate more shrink-wrapping opportunities Bernd Schmidt
2011-03-23 15:03   ` Jeff Law
2011-03-23 15:05     ` Bernd Schmidt
2011-03-23 15:18       ` Jeff Law
2011-03-31 13:26   ` Jeff Law
2011-03-31 13:34     ` Bernd Schmidt
2011-03-23 14:56 ` [PATCH 4/6] Shrink-wrapping Bernd Schmidt
2011-07-07 14:51   ` Richard Sandiford
2011-07-07 15:40     ` Bernd Schmidt
2011-07-07 17:00       ` Paul Koning
2011-07-07 17:02         ` Jeff Law
2011-07-07 17:05           ` Paul Koning
2011-07-07 17:08             ` Jeff Law
2011-07-07 17:30             ` Bernd Schmidt
2011-07-08 22:59             ` [pdp11] Emit prologue as rtl Richard Henderson
2011-07-09 13:46               ` Paul Koning
2011-07-09 16:53                 ` Richard Henderson
2011-07-07 15:57     ` [PATCH 4/6] Shrink-wrapping Richard Earnshaw
2011-07-07 20:19       ` Richard Sandiford
2011-07-08  8:30         ` Richard Earnshaw
2011-07-08 13:57         ` Bernd Schmidt
2011-07-11 11:24           ` Richard Sandiford
2011-07-11 11:42             ` Bernd Schmidt
2011-07-21  3:57     ` Bernd Schmidt
2011-07-21 11:25       ` Richard Sandiford
2011-07-28 11:48         ` Bernd Schmidt
2011-07-28 12:45           ` Richard Sandiford
2011-07-28 23:30           ` Richard Earnshaw
2011-07-29 12:40             ` Bernd Schmidt
2011-08-03 10:42           ` Alan Modra
2011-08-03 11:19             ` Bernd Schmidt
2011-08-02  8:40     ` Bernd Schmidt
2011-08-03 15:39       ` Richard Sandiford
2011-08-24 19:23         ` Bernd Schmidt
2011-08-24 20:48           ` Richard Sandiford
2011-08-24 20:55             ` Bernd Schmidt
2011-08-26 14:49               ` Ramana Radhakrishnan
2011-08-26 14:58                 ` Bernd Schmidt
2011-08-26 15:06                   ` Ramana Radhakrishnan
2011-08-28 10:58           ` H.J. Lu
2011-07-07 21:41   ` Michael Hope
2011-03-23 14:57 ` [PATCH 6/6] A testcase Bernd Schmidt
