public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 3/9] dce: Don't dead-code delete separately wrapped restores
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
@ 2016-06-08  1:48 ` Segher Boessenkool
  2016-06-08  1:48 ` [PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc Segher Boessenkool
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

Deleting restores (before a noreturn) that are dead confuses dwarf2cfi.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* dce.c (delete_unmarked_insns): Don't delete instructions with
	a REG_CFA_RESTORE note.

---
 gcc/dce.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gcc/dce.c b/gcc/dce.c
index ea3fb00..d510287 100644
--- a/gcc/dce.c
+++ b/gcc/dce.c
@@ -587,6 +587,15 @@ delete_unmarked_insns (void)
 	  if (!dbg_cnt (dce))
 	    continue;
 
+	  if (crtl->shrink_wrapped_separate
+	      && find_reg_note (insn, REG_CFA_RESTORE, NULL))
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "DCE: NOT deleting insn %d, it's a "
+				    "callee-save restore\n", INSN_UID (insn));
+	      continue;
+	    }
+
 	  if (dump_file)
 	    fprintf (dump_file, "DCE: Deleting insn %d\n", INSN_UID (insn));
 
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 0/9] separate shrink-wrapping
@ 2016-06-08  1:48 Segher Boessenkool
  2016-06-08  1:48 ` [PATCH 3/9] dce: Don't dead-code delete separately wrapped restores Segher Boessenkool
                   ` (11 more replies)
  0 siblings, 12 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

This patch series introduces separate shrink-wrapping.

There are many things the prologue/epilogue of a function do, and most of
those things can be done independently.  For example, most of the time,
for many targets, the save of callee-saved registers can be done later
than the "main" prologue.

Doing so helps quite a bit because the prologue is expensive for functions
that do not need everything it does done for every path through the
function; often, the hot paths do not need much at all, e.g. not those
things the prologue needs to do for the function to call other functions.

The first patch creates a command-line flag, some hooks, a status flag
("is this function wrapped separately", used by later passes), and
documentation for these things.

The next six patches are to prevent later passes from mishandling the
epilogue instructions that now appear before the epilogue: mostly, you
cannot do much to instructions with a REG_CFA_RESTORE note without
confusing dwarf2cfi.  The cprop one is for prologue instructions.

Then, the main patch.  And finally a patch for PowerPC that implements
separate wrapping for GPRs and LR.

Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
powerpc64le-linux.  Previous versions of this series also tested on
x86_64-linux.

Is this okay for trunk?


Segher


Segher Boessenkool (9):
  separate shrink-wrap: New command-line flag, status flag, hooks, and doc
  cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
  dce: Don't dead-code delete separately wrapped restores
  regrename: Don't rename restores
  regrename: Don't run if function was separately shrink-wrapped
  sel-sched: Don't mess with register restores
  cprop: Leave RTX_FRAME_RELATED_P instructions alone
  shrink-wrap: shrink-wrapping for separate concerns
  rs6000: Separate shrink-wrapping

 gcc/cfgcleanup.c           |   5 +
 gcc/common.opt             |   4 +
 gcc/config/rs6000/rs6000.c | 257 ++++++++++++++++--
 gcc/dce.c                  |   9 +
 gcc/doc/invoke.texi        |  11 +-
 gcc/doc/tm.texi            |  53 ++++
 gcc/doc/tm.texi.in         |  29 ++
 gcc/emit-rtl.h             |   4 +
 gcc/function.c             |  15 +-
 gcc/regcprop.c             |   3 +
 gcc/regrename.c            |  12 +-
 gcc/sel-sched-ir.c         |   1 +
 gcc/shrink-wrap.c          | 647 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/shrink-wrap.h          |   1 +
 gcc/target.def             |  56 ++++
 15 files changed, 1088 insertions(+), 19 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
  2016-06-08  1:48 ` [PATCH 3/9] dce: Don't dead-code delete separately wrapped restores Segher Boessenkool
  2016-06-08  1:48 ` [PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc Segher Boessenkool
@ 2016-06-08  1:48 ` Segher Boessenkool
  2016-06-08  1:53 ` [PATCH 4/9] regrename: Don't rename restores Segher Boessenkool
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

cfgcleanup would try to join noreturn paths with different concerns handled.
This then fails in dwarf2cfi.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* cfgcleanup.c (outgoing_edges_match): Don't join noreturn calls
	if shrink_wrapped_separate.

---
 gcc/cfgcleanup.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 023b9d2..f08e4ee 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -1824,6 +1824,11 @@ outgoing_edges_match (int mode, basic_block bb1, basic_block bb2)
           || !find_reg_note (last1, REG_ARGS_SIZE, NULL)))
     return false;
 
+  /* If shrink-wrapping separate concerns, joining noreturn calls that
+     have different concerns set up will confuse dwarf2cfi.  */
+  if (!nonfakeedges && crtl->shrink_wrapped_separate)
+    return false;
+
   /* fallthru edges must be forwarded to the same destination.  */
   if (fallthru1)
     {
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
  2016-06-08  1:48 ` [PATCH 3/9] dce: Don't dead-code delete separately wrapped restores Segher Boessenkool
@ 2016-06-08  1:48 ` Segher Boessenkool
  2016-06-08  1:48 ` [PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate Segher Boessenkool
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

This patch adds a new command-line flag "-fshrink-wrap-separate", a status
flag "shrink_wrapped_separate", hooks for abstracting the target concerns,
and documentation for all those.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* common.opt (-fshrink-wrap-separate): New flag.
	* doc/invoke.texi: Document it.
	* doc/tm.texi.in (Shrink-wrapping separate concerns): New subsection.
	* doc/tm.texi: Regenerate.
	* emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate.
	* target.def (shrink_wrap): New hook vector.
	(get_separate_concerns, concerns_for_bb, disqualify_concerns,
	emit_prologue_concerns, emit_epilogue_concerns, set_handled_concerns):
	New hooks.

---
 gcc/common.opt      |  4 ++++
 gcc/doc/invoke.texi | 11 ++++++++++-
 gcc/doc/tm.texi     | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/doc/tm.texi.in  | 29 +++++++++++++++++++++++++++
 gcc/emit-rtl.h      |  4 ++++
 gcc/target.def      | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 2bb576c..b00f538 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2153,6 +2153,10 @@ Common Report Var(flag_shrink_wrap) Optimization
 Emit function prologues only before parts of the function that need it,
 rather than at the top of the function.
 
+fshrink-wrap-separate
+Common Report Var(flag_shrink_wrap_separate) Init(1) Optimization
+Shrink-wrap parts of the prologue and epilogue separately.
+
 fsignaling-nans
 Common Report Var(flag_signaling_nans) Optimization SetByCombined
 Disable optimizations observable by IEEE signaling NaNs.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e3a2399..21df5c5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -394,7 +394,8 @@ Objective-C and Objective-C++ Dialects}.
 -fschedule-insns -fschedule-insns2 -fsection-anchors @gol
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
--fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol
+-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate @gol
+-fsignaling-nans @gol
 -fsingle-precision-constant -fsplit-ivs-in-unroller @gol
 -fsplit-paths @gol
 -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol
@@ -6263,6 +6264,7 @@ compilation time.
 -fmove-loop-invariants @gol
 -freorder-blocks @gol
 -fshrink-wrap @gol
+-fshrink-wrap-separate @gol
 -fsplit-wide-types @gol
 -fssa-backprop @gol
 -fssa-phiopt @gol
@@ -7180,6 +7182,13 @@ Emit function prologues only before parts of the function that need it,
 rather than at the top of the function.  This flag is enabled by default at
 @option{-O} and higher.
 
+@item -fshrink-wrap-separate
+@opindex fshrink-wrap-separate
+Shrink-wrap separate parts of the prologue and epilogue separately, so that
+those parts are only executed when needed.
+This option is on by default, but has no effect unless @option{-fshrink-wrap}
+is also turned on.
+
 @item -fcaller-saves
 @opindex fcaller-saves
 Enable allocation of values to registers that are clobbered by
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b318615..c326599 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2924,6 +2924,7 @@ This describes the stack layout and calling conventions.
 * Function Entry::
 * Profiling::
 * Tail Calls::
+* Shrink-wrapping separate concerns::
 * Stack Smashing Protection::
 * Miscellaneous Register Hooks::
 @end menu
@@ -4852,6 +4853,58 @@ This hook should add additional registers that are computed by the prologue to t
 True if a function's return statements should be checked for matching the function's return type.  This includes checking for falling off the end of a non-void function.  Return false if no such check should be made.
 @end deftypefn
 
+@node Shrink-wrapping separate concerns
+@subsection Shrink-wrapping separate concerns
+@cindex shrink-wrapping separate concerns
+
+The prologue does a lot of separate things: save callee-saved registers,
+do whatever needs to be done to be able to call things (save the return
+address, align the stack, whatever; different for each target), set up a
+stack frame, do whatever needs to be done for the static chain (if anything),
+set up registers for PIC, etc.  Using the following hooks those concerns
+can be shrink-wrapped separately, so that the initialization (and possibly
+teardown) for those concerns is not done on execution paths where it is
+unnecessary.
+
+What exactly those concerns are is up to the target code; the generic
+code treats them abstractly.
+
+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS (void)
+This hook should return an @code{sbitmap} with the bits set for those
+concerns that can be separately shrink-wrapped in the current function.
+Return @code{NULL} if the current function should not get any separate
+shrink-wrapping.
+Don't define this hook if it would always return @code{NULL}.
+If it is defined, the other hooks in this group have to be defined as well.
+@end deftypefn
+
+@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_CONCERNS_FOR_BB (basic_block)
+This hook should return an @code{sbitmap} with the bits set for those
+concerns that the @code{basic_block} needs set up, does set up, or does
+otherwise touch.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS (sbitmap @var{concerns}, edge @var{e}, sbitmap @var{edge_concerns}, bool @var{is_prologue})
+This hook should clear the bits in the @var{concerns} bitmap for those
+concerns in @var{edge_concerns} that the target cannot handle on edge
+@var{e}, where @var{is_prologue} says if this is for a prologue or an
+epilogue instead.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS (sbitmap)
+Emit prologue insns for the concerns indicated by the parameter.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS (sbitmap)
+Emit epilogue insns for the concerns indicated by the parameter.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS (sbitmap)
+Mark the concerns in the parameter as handled, so that the @code{prologue}
+and @code{epilogue} named patterns know to ignore those concerns.  Don't
+hang on to the @code{sbitmap}, it will be deleted after this call.
+@end deftypefn
+
 @node Stack Smashing Protection
 @subsection Stack smashing protection
 @cindex stack smashing protection
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1e8423c..05bb50a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2530,6 +2530,7 @@ This describes the stack layout and calling conventions.
 * Function Entry::
 * Profiling::
 * Tail Calls::
+* Shrink-wrapping separate concerns::
 * Stack Smashing Protection::
 * Miscellaneous Register Hooks::
 @end menu
@@ -3789,6 +3790,34 @@ the function prologue.  Normally, the profiling code comes after.
 
 @hook TARGET_WARN_FUNC_RETURN
 
+@node Shrink-wrapping separate concerns
+@subsection Shrink-wrapping separate concerns
+@cindex shrink-wrapping separate concerns
+
+The prologue does a lot of separate things: save callee-saved registers,
+do whatever needs to be done to be able to call things (save the return
+address, align the stack, whatever; different for each target), set up a
+stack frame, do whatever needs to be done for the static chain (if anything),
+set up registers for PIC, etc.  Using the following hooks those concerns
+can be shrink-wrapped separately, so that the initialization (and possibly
+teardown) for those concerns is not done on execution paths where it is
+unnecessary.
+
+What exactly those concerns are is up to the target code; the generic
+code treats them abstractly.
+
+@hook TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS
+
+@hook TARGET_SHRINK_WRAP_CONCERNS_FOR_BB
+
+@hook TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS
+
+@hook TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS
+
+@hook TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS
+
+@hook TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS
+
 @node Stack Smashing Protection
 @subsection Stack smashing protection
 @cindex stack smashing protection
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 39dfce9..279e580 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -254,6 +254,10 @@ struct GTY(()) rtl_data {
   /* True if we performed shrink-wrapping for the current function.  */
   bool shrink_wrapped;
 
+  /* True if we performed shrink-wrapping for separate concerns for
+     the current function.  */
+  bool shrink_wrapped_separate;
+
   /* Nonzero if function being compiled doesn't modify the stack pointer
      (ignoring the prologue and epilogue).  This is only valid after
      pass_stack_ptr_mod has run.  */
diff --git a/gcc/target.def b/gcc/target.def
index a4df363..0c57cbf 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5742,6 +5742,62 @@ DEFHOOK
  bool, (tree),
  hook_bool_tree_true)
 
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_SHRINK_WRAP_"
+HOOK_VECTOR (TARGET_SHRINK_WRAP_HOOKS, shrink_wrap)
+
+DEFHOOK
+(get_separate_concerns,
+ "This hook should return an @code{sbitmap} with the bits set for those\n\
+concerns that can be separately shrink-wrapped in the current function.\n\
+Return @code{NULL} if the current function should not get any separate\n\
+shrink-wrapping.\n\
+Don't define this hook if it would always return @code{NULL}.\n\
+If it is defined, the other hooks in this group have to be defined as well.",
+ sbitmap, (void),
+ NULL)
+
+DEFHOOK
+(concerns_for_bb,
+ "This hook should return an @code{sbitmap} with the bits set for those\n\
+concerns that the @code{basic_block} needs set up, does set up, or does\n\
+otherwise touch.",
+ sbitmap, (basic_block),
+ NULL)
+
+DEFHOOK
+(disqualify_concerns,
+ "This hook should clear the bits in the @var{concerns} bitmap for those\n\
+concerns in @var{edge_concerns} that the target cannot handle on edge\n\
+@var{e}, where @var{is_prologue} says if this is for a prologue or an\n\
+epilogue instead.",
+ void, (sbitmap concerns, edge e, sbitmap edge_concerns, bool is_prologue),
+ NULL)
+
+DEFHOOK
+(emit_prologue_concerns,
+ "Emit prologue insns for the concerns indicated by the parameter.",
+ void, (sbitmap),
+ NULL)
+
+DEFHOOK
+(emit_epilogue_concerns,
+ "Emit epilogue insns for the concerns indicated by the parameter.",
+ void, (sbitmap),
+ NULL)
+
+DEFHOOK
+(set_handled_concerns,
+ "Mark the concerns in the parameter as handled, so that the @code{prologue}\n\
+and @code{epilogue} named patterns know to ignore those concerns.  Don't\n\
+hang on to the @code{sbitmap}, it will be deleted after this call.",
+ void, (sbitmap),
+ NULL)
+
+HOOK_VECTOR_END (shrink_wrap)
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_"
+
 /* Determine the type of unwind info to emit for debugging.  */
 DEFHOOK
 (debug_unwind_info,
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 6/9] sel-sched: Don't mess with register restores
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (3 preceding siblings ...)
  2016-06-08  1:53 ` [PATCH 4/9] regrename: Don't rename restores Segher Boessenkool
@ 2016-06-08  1:53 ` Segher Boessenkool
  2016-06-08  1:54 ` [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped Segher Boessenkool
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

If selective scheduling copies register restores it confuses dwarf2cfi.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* sel-sched-ir.c (init_global_and_expr_for_insn): Don't copy
	instructions with a REG_CFA_RESTORE note.

---
 gcc/sel-sched-ir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 83f813a..4a3984a 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -3015,6 +3015,7 @@ init_global_and_expr_for_insn (insn_t insn)
           /* TRAP_IF though have an INSN code is control_flow_insn_p ().  */
           || control_flow_insn_p (insn)
           || volatile_insn_p (PATTERN (insn))
+	  || find_reg_note (insn, REG_CFA_RESTORE, NULL)
           || (targetm.cannot_copy_insn_p
               && targetm.cannot_copy_insn_p (insn)))
         force_unique_p = true;
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 4/9] regrename: Don't rename restores
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (2 preceding siblings ...)
  2016-06-08  1:48 ` [PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate Segher Boessenkool
@ 2016-06-08  1:53 ` Segher Boessenkool
  2016-06-08  1:53 ` [PATCH 6/9] sel-sched: Don't mess with register restores Segher Boessenkool
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

A restore is supposed to restore some certain register.  Restoring it
into some other register will not work.  Don't.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* regrename.c (build_def_use): Invalidate chains that have a
	REG_CFA_RESTORE on some instruction.

---
 gcc/regrename.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 54c7768..00a5d02 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1867,6 +1867,12 @@ build_def_use (basic_block bb)
 		scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead,
 			  OP_IN);
 	      }
+
+	  /* Step 8: Kill the chains involving register restores.  Those
+	     should restore _that_ register.  */
+	  for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
+	    if (REG_NOTE_KIND (note) == REG_CFA_RESTORE)
+	      scan_rtx (insn, &XEXP (note, 0), NO_REGS, mark_all_read, OP_IN);
 	}
       else if (DEBUG_INSN_P (insn)
 	       && !VAR_LOC_UNKNOWN_P (INSN_VAR_LOCATION_LOC (insn)))
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (5 preceding siblings ...)
  2016-06-08  1:54 ` [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped Segher Boessenkool
@ 2016-06-08  1:54 ` Segher Boessenkool
  2016-06-08  2:03 ` [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns Segher Boessenkool
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

Doing cprop on frame-related instructions blows up spectacularly.
So don't.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* regcprop.c (copyprop_hardreg_forward_1): Don't change
	RTX_FRAME_RELATED_P instructions.

---
 gcc/regcprop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 933cc8a..0c01aab 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -829,6 +829,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 	    }
 	}
 
+      /* Don't change prologue instructions.  */
+      if (RTX_FRAME_RELATED_P (insn))
+	set = NULL;
+
       /* Special-case plain move instructions, since we may well
 	 be able to do the move from a different register class.  */
       if (set && REG_P (SET_SRC (set)))
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (4 preceding siblings ...)
  2016-06-08  1:53 ` [PATCH 6/9] sel-sched: Don't mess with register restores Segher Boessenkool
@ 2016-06-08  1:54 ` Segher Boessenkool
  2016-06-08  9:18   ` Bernd Schmidt
  2016-06-08  1:54 ` [PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone Segher Boessenkool
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  1:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

Unfortunately even after the previous patch there are still a few cases
where regrename creates invalid code when used together with separate
shrink-wrapping.  I haven't found the cause of those.  This patch disables
regrename for functions that were separately shrink-wrapped.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* regrename.c (gate): Disable regrename if shrink_wrapped_separate
	is set in crtl.

---
 gcc/regrename.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index 00a5d02..5740b1a 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1956,7 +1956,11 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
     {
-      return (optimize > 0 && (flag_rename_registers));
+      /* regrename creates wrong code for exception handling, if used together
+         with separate shrink-wrapping.  Disable for now, until we have
+	 figured out what exactly is going on.  */
+      return (optimize > 0 && flag_rename_registers
+	      && !crtl->shrink_wrapped_separate);
     }
 
   virtual unsigned int execute (function *) { return regrename_optimize (); }
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (6 preceding siblings ...)
  2016-06-08  1:54 ` [PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone Segher Boessenkool
@ 2016-06-08  2:03 ` Segher Boessenkool
  2016-07-15 12:42   ` Bernd Schmidt
  2016-06-08  2:04 ` [PATCH 9/9] rs6000: Separate shrink-wrapping Segher Boessenkool
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  2:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

This is the main substance of this patch series.  The big comment before
the new code says:

/* Separate shrink-wrapping

   Instead of putting all of the prologue and epilogue in one spot, we
   can put parts of it in places that are executed less frequently.  The
   following code does this, for concerns that can have more than one
   prologue and epilogue, and where those pro- and epilogues can be
   executed more than once.

   What exactly is a concern is target-dependent.  The more usual ones
   are simple saves of callee-saved registers to the frame.  This code
   treats it abstractly (as an sbitmap of concerns), letting the target
   handle all details.

   Prologue concerns are placed in such a way that they are executed as
   infrequently as possible.  Epilogue concerns are put everywhere where
   there is an edge from a bb dominated by such a prologue concern to a
   bb not dominated by one.

   Prologues and epilogues are preferably placed into a block, either at
   the beginning or end of it, if it is needed for all predecessor resp.
   successor edges; or placed on the edge otherwise.

   If the placement of any prologue/epilogue leads to a situation we cannot
   handle (for example, an abnormal edge would need to be split, or some
   targets want to use some specific registers that may not be available
   where we want to put them), separate shrink-wrapping for that concern
   is aborted.  */


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* function.c (thread_prologue_and_epilogue_insns): Recompute the
	live info.  Call try_shrink_wrapping_separate.  Compute the
	prologue_seq afterwards, if it has possibly changed.  Compute the
	split_prologue_seq and epilogue_seq later, too.
	* shrink-wrap.c: #include cfgbuild.h.
	(dump_concern): New function.
	(struct sw): New struct.
	(SW): New function.
	(init_separate_shrink_wrap): New function.
	(fini_separate_shrink_wrap): New function.
	(place_prologue_for_one_concern): New function.
	(spread_concerns): New function.
	(disqualify_problematic_concerns): New function.
	(do_common_heads_for_concerns): New function.
	(do_common_tails_for_concerns): New function.
	(insert_prologue_epilogue_for_concerns): New function.
	(try_shrink_wrapping_separate): New function.
	* shrink-wrap.h: Declare try_shrink_wrapping_separate.

---
 gcc/function.c    |  15 +-
 gcc/shrink-wrap.c | 647 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/shrink-wrap.h |   1 +
 3 files changed, 661 insertions(+), 2 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index c15d47d..cc5fa6b 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5909,6 +5909,12 @@ make_epilogue_seq (void)
 void
 thread_prologue_and_epilogue_insns (void)
 {
+  if (optimize > 1)
+    {
+      df_live_add_problem ();
+      df_live_set_all_dirty ();
+    }
+
   df_analyze ();
 
   /* Can't deal with multiple successors of the entry block at the
@@ -5919,9 +5925,7 @@ thread_prologue_and_epilogue_insns (void)
   edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   edge orig_entry_edge = entry_edge;
 
-  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
   rtx_insn *prologue_seq = make_prologue_seq ();
-  rtx_insn *epilogue_seq = make_epilogue_seq ();
 
   /* Try to perform a kind of shrink-wrapping, making sure the
      prologue/epilogue is emitted only around those parts of the
@@ -5929,6 +5933,13 @@ thread_prologue_and_epilogue_insns (void)
 
   try_shrink_wrapping (&entry_edge, prologue_seq);
 
+  df_analyze ();
+  try_shrink_wrapping_separate (entry_edge->dest);
+  if (crtl->shrink_wrapped_separate)
+    prologue_seq = make_prologue_seq ();
+
+  rtx_insn *split_prologue_seq = make_split_prologue_seq ();
+  rtx_insn *epilogue_seq = make_epilogue_seq ();
 
   rtl_profile_for_bb (EXIT_BLOCK_PTR_FOR_FN (cfun));
 
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b85b1c3..6e157d0 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "tree-pass.h"
 #include "cfgrtl.h"
+#include "cfgbuild.h"
 #include "params.h"
 #include "bb-reorder.h"
 #include "shrink-wrap.h"
@@ -1006,3 +1007,649 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq)
   BITMAP_FREE (bb_with);
   free_dominance_info (CDI_DOMINATORS);
 }
+\f
+/* Separate shrink-wrapping
+
+   Instead of putting all of the prologue and epilogue in one spot, we
+   can put parts of it in places that are executed less frequently.  The
+   following code does this, for concerns that can have more than one
+   prologue and epilogue, and where those pro- and epilogues can be
+   executed more than once.
+
+   What exactly is a concern is target-dependent.  The more usual ones
+   are simple saves of callee-saved registers to the frame.  This code
+   treats it abstractly (as an sbitmap of concerns), letting the target
+   handle all details.
+
+   Prologue concerns are placed in such a way that they are executed as
+   infrequently as possible.  Epilogue concerns are put everywhere where
+   there is an edge from a bb dominated by such a prologue concern to a
+   bb not dominated by one.
+
+   Prologues and epilogues are preferably placed into a block, either at
+   the beginning or end of it, if it is needed for all predecessor resp.
+   successor edges; or placed on the edge otherwise.
+
+   If the placement of any prologue/epilogue leads to a situation we cannot
+   handle (for example, an abnormal edge would need to be split, or some
+   targets want to use some specific registers that may not be available
+   where we want to put them), separate shrink-wrapping for that concern
+   is aborted.  */
+
+/* Print the sbitmap CONCERN to the DUMP_FILE if not empty, with the
+   label LABEL.  */
+static void
+dump_concern (const char *label, sbitmap concern)
+{
+  if (bitmap_empty_p (concern))
+    return;
+
+  fprintf (dump_file, " [%s", label);
+
+  for (unsigned int j = 0; j < concern->n_bits; j++)
+    if (bitmap_bit_p (concern, j))
+      fprintf (dump_file, " %u", j);
+
+  fprintf (dump_file, "]");
+}
+
+/* The data we collect for each bb.  */
+struct sw {
+  /* Which concerns does this BB have?  Before placing the prologues this
+     is the concerns the BB really needs; after placing prologues it is the
+     concerns it runs with.  */
+  sbitmap has_concern;
+
+  /* The concerns for which we want a prologue placed on this BB.  */
+  sbitmap pro_concern;
+
+  /* The concerns for which we placed code at the start of the BB.  */
+  sbitmap head_concern;
+
+  /* The concerns for which we placed code at the end of the BB.  */
+  sbitmap tail_concern;
+
+  /* The frequency of executing the prologue for this BB and all BBs
+     dominated by it.  */
+  gcov_type cost;
+};
+
+/* A helper function for accessing the pass-specific info.  */
+static inline struct sw *
+SW (basic_block bb)
+{
+  gcc_assert (bb->aux);
+  return (struct sw *) bb->aux;
+}
+
+/* Create the pass-specific data structures for separately shrink-wrapping
+   with concerns CONCERNS.  */
+static void
+init_separate_shrink_wrap (sbitmap concerns)
+{
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+    {
+      bb->aux = xcalloc (1, sizeof (struct sw));
+
+      SW (bb)->has_concern = targetm.shrink_wrap.concerns_for_bb (bb);
+
+      if (dump_file)
+	{
+	  fprintf (dump_file, "bb %d concerns:", bb->index);
+	  dump_concern ("has", SW (bb)->has_concern);
+	  fprintf (dump_file, "\n");
+	}
+
+      SW (bb)->pro_concern = sbitmap_alloc (SBITMAP_SIZE (concerns));
+      SW (bb)->head_concern = sbitmap_alloc (SBITMAP_SIZE (concerns));
+      SW (bb)->tail_concern = sbitmap_alloc (SBITMAP_SIZE (concerns));
+      bitmap_clear (SW (bb)->pro_concern);
+      bitmap_clear (SW (bb)->head_concern);
+      bitmap_clear (SW (bb)->tail_concern);
+    }
+}
+
+/* Destroy the pass-specific data.  */
+static void
+fini_separate_shrink_wrap (void)
+{
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+    if (bb->aux)
+      {
+	sbitmap_free (SW (bb)->has_concern);
+	sbitmap_free (SW (bb)->pro_concern);
+	sbitmap_free (SW (bb)->head_concern);
+	sbitmap_free (SW (bb)->tail_concern);
+	free (bb->aux);
+	bb->aux = 0;
+      }
+}
+
+/* Place the prologue for concern WHICH, in the basic blocks dominated
+   by HEAD.  Do a DFS over the dominator tree, and set PRO_CONCERN on a
+   block if either the block has HAS_CONCERN set or it is cheaper to place
+   the prologue here than in all dominator subtrees separately.  */
+static void
+place_prologue_for_one_concern (unsigned int which, basic_block head)
+{
+  basic_block bb = head;
+  bool first_visit = true;
+
+  for (;;)
+    {
+      /* First visit of a block: set cost to zero; if the block does not
+         have the concern itself, walk down.  */
+      if (first_visit)
+	{
+	  SW (bb)->cost = 0;
+
+	  if (!bitmap_bit_p (SW (bb)->has_concern, which)
+	      && first_dom_son (CDI_DOMINATORS, bb))
+	    {
+	      bb = first_dom_son (CDI_DOMINATORS, bb);
+	      continue;
+	    }
+	}
+
+      /* If this block does have the concern itself, or it is cheaper to
+         put the prologue here than in all the descendants that need it,
+	 mark it so.  If it is the same cost, put it here if there is no
+	 block reachable from this block that does not need the prologue.
+	 The actual test is a bit more stringent but catches most cases.  */
+      if (bitmap_bit_p (SW (bb)->has_concern, which)
+	  || SW (bb)->cost > bb->frequency)
+	{
+	  SW (bb)->cost = bb->frequency;
+	  bitmap_set_bit (SW (bb)->pro_concern, which);
+	}
+      else if (SW (bb)->cost == bb->frequency)
+	{
+	  basic_block kid = get_immediate_dominator (CDI_POST_DOMINATORS, bb);
+	  if (dominated_by_p (CDI_DOMINATORS, kid, bb)
+	      && bitmap_bit_p (SW (kid)->pro_concern, which))
+	    bitmap_set_bit (SW (bb)->pro_concern, which);
+	}
+
+      if (bb == head)
+	return;
+
+      basic_block parent = get_immediate_dominator (CDI_DOMINATORS, bb);
+      SW (parent)->cost += SW (bb)->cost;
+
+      /* Don't walk the tree down unless necessary.  */
+      if (next_dom_son (CDI_DOMINATORS, bb)
+          && SW (parent)->cost <= parent->frequency)
+	{
+	  bb = next_dom_son (CDI_DOMINATORS, bb);
+	  first_visit = true;
+	}
+      else
+	{
+	  bb = parent;
+	  first_visit = false;
+	}
+    }
+}
+
+/* Mark HAS_CONCERN for every block dominated by at least one block with
+   PRO_CONCERN set, starting at HEAD.  */
+static void
+spread_concerns (basic_block head)
+{
+  basic_block bb = head;
+  bool first_visit = true;
+  /* This keeps a tally of all concerns active.  */
+  sbitmap concern = SW (head)->has_concern;
+
+  for (;;)
+    {
+      if (first_visit)
+	{
+	  bitmap_ior (SW (bb)->has_concern, SW (bb)->pro_concern, concern);
+
+	  if (first_dom_son (CDI_DOMINATORS, bb))
+	    {
+	      concern = SW (bb)->has_concern;
+	      bb = first_dom_son (CDI_DOMINATORS, bb);
+	      continue;
+	    }
+	}
+
+      concern = SW (bb)->has_concern;
+
+      if (next_dom_son (CDI_DOMINATORS, bb))
+	{
+	  bb = next_dom_son (CDI_DOMINATORS, bb);
+	  basic_block parent = get_immediate_dominator (CDI_DOMINATORS, bb);
+	  concern = SW (parent)->has_concern;
+	  first_visit = true;
+	}
+      else
+	{
+	  if (bb == head)
+	    return;
+	  bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+	  first_visit = false;
+	}
+    }
+}
+
+/* If we cannot handle placing some concern's prologues or epilogues where
+   we decided we should place them, unmark that concern in CONCERNS so
+   that it is not wrapped separately.  */
+static void
+disqualify_problematic_concerns (sbitmap concerns)
+{
+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  bitmap_and_compl (epi, SW (e->src)->has_concern,
+			    SW (e->dest)->has_concern);
+	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
+			    SW (e->src)->has_concern);
+
+	  /* Ask the target what it thinks about things.  */
+	  if (!bitmap_empty_p (epi))
+	    targetm.shrink_wrap.disqualify_concerns (concerns, e, epi, false);
+	  if (!bitmap_empty_p (pro))
+	    targetm.shrink_wrap.disqualify_concerns (concerns, e, pro, true);
+
+	  /* If this edge doesn't need splitting, we're fine.  */
+	  if (single_pred_p (e->dest)
+	      && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
+	    continue;
+
+	  /* If the edge can be split, that is fine too.  */
+	  if ((e->flags & EDGE_ABNORMAL) == 0)
+	    continue;
+
+	  /* We also can handle sibcalls.  */
+	  if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+	    {
+	      gcc_assert (e->flags & EDGE_SIBCALL);
+	      continue;
+	    }
+
+	  /* Remove from consideration those concerns we would need
+	     pro/epilogues for on edges where we cannot insert them.  */
+	  bitmap_and_compl (concerns, concerns, epi);
+	  bitmap_and_compl (concerns, concerns, pro);
+
+	  if (dump_file && !bitmap_subset_p (epi, concerns))
+	    {
+	      fprintf (dump_file, "  BAD epi %d->%d", e->src->index,
+		       e->dest->index);
+	      if (e->flags & EDGE_EH)
+		fprintf (dump_file, " for EH");
+	      dump_concern ("epi", epi);
+	      fprintf (dump_file, "\n");
+	    }
+
+	  if (dump_file && !bitmap_subset_p (pro, concerns))
+	    {
+	      fprintf (dump_file, "  BAD pro %d->%d", e->src->index,
+		       e->dest->index);
+	      if (e->flags & EDGE_EH)
+		fprintf (dump_file, " for EH");
+	      dump_concern ("pro", pro);
+	      fprintf (dump_file, "\n");
+	    }
+	}
+    }
+
+  sbitmap_free (pro);
+  sbitmap_free (epi);
+}
+
+/* Place code for prologues and epilogues for CONCERNS where we can put
+   that code at the start of basic blocks.  */
+static void
+do_common_heads_for_concerns (sbitmap concerns)
+{
+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap tmp = sbitmap_alloc (SBITMAP_SIZE (concerns));
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      /* Find which prologues resp. epilogues are needed for all predecessor
+         edges to this block.  */
+      bitmap_ones (pro);
+      bitmap_ones (epi);
+
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	{
+	  if (e->flags & EDGE_ABNORMAL)
+	    {
+	      bitmap_clear (epi);
+	      bitmap_clear (pro);
+	      break;
+	    }
+
+	  bitmap_and_compl (tmp, SW (e->src)->has_concern,
+			    SW (e->dest)->has_concern);
+	  bitmap_and (epi, epi, tmp);
+
+	  bitmap_and_compl (tmp, SW (e->dest)->has_concern,
+			    SW (e->src)->has_concern);
+	  bitmap_and (pro, pro, tmp);
+	}
+
+      bitmap_and (epi, epi, concerns);
+      bitmap_and (pro, pro, concerns);
+
+      if (dump_file && !(bitmap_empty_p (epi) && bitmap_empty_p (pro)))
+	fprintf (dump_file, "  bb %d", bb->index);
+
+      if (dump_file && !bitmap_empty_p (epi))
+	dump_concern ("epi", epi);
+      if (dump_file && !bitmap_empty_p (pro))
+	dump_concern ("pro", pro);
+
+      if (dump_file && !(bitmap_empty_p (epi) && bitmap_empty_p (pro)))
+	fprintf (dump_file, "\n");
+
+      /* Place code after the BB note.  */
+      if (!bitmap_empty_p (pro))
+	{
+	  start_sequence ();
+	  targetm.shrink_wrap.emit_prologue_concerns (pro);
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+
+	  emit_insn_after (seq, bb_note (bb));
+
+	  bitmap_ior (SW (bb)->head_concern, SW (bb)->head_concern, pro);
+	}
+
+      if (!bitmap_empty_p (epi))
+	{
+	  start_sequence ();
+	  targetm.shrink_wrap.emit_epilogue_concerns (epi);
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+
+	  emit_insn_after (seq, bb_note (bb));
+
+	  bitmap_ior (SW (bb)->head_concern, SW (bb)->head_concern, epi);
+	}
+    }
+
+  sbitmap_free (pro);
+  sbitmap_free (epi);
+  sbitmap_free (tmp);
+}
+
+/* Place code for prologues and epilogues for CONCERNS where we can put
+   that code at the end of basic blocks.  */
+static void
+do_common_tails_for_concerns (sbitmap concerns)
+{
+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap tmp = sbitmap_alloc (SBITMAP_SIZE (concerns));
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      /* Find which prologues resp. epilogues are needed for all successor
+         edges from this block.  */
+      if (EDGE_COUNT (bb->succs) == 0)
+	continue;
+
+      bitmap_ones (pro);
+      bitmap_ones (epi);
+
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  if (e->flags & EDGE_ABNORMAL)
+	    {
+	      bitmap_clear (epi);
+	      bitmap_clear (pro);
+	      break;
+	    }
+
+	  bitmap_and_compl (tmp, SW (e->src)->has_concern,
+			    SW (e->dest)->has_concern);
+	  bitmap_and_compl (tmp, tmp, SW (e->dest)->head_concern);
+	  bitmap_and (epi, epi, tmp);
+
+	  bitmap_and_compl (tmp, SW (e->dest)->has_concern,
+			    SW (e->src)->has_concern);
+	  bitmap_and_compl (tmp, tmp, SW (e->dest)->head_concern);
+	  bitmap_and (pro, pro, tmp);
+	}
+
+      bitmap_and (epi, epi, concerns);
+      bitmap_and (pro, pro, concerns);
+
+      if (dump_file && !(bitmap_empty_p (epi) && bitmap_empty_p (pro)))
+	fprintf (dump_file, "  bb %d", bb->index);
+
+      if (dump_file && !bitmap_empty_p (epi))
+	dump_concern ("epi", epi);
+      if (dump_file && !bitmap_empty_p (pro))
+	dump_concern ("pro", pro);
+
+      if (dump_file && !(bitmap_empty_p (epi) && bitmap_empty_p (pro)))
+	fprintf (dump_file, "\n");
+
+      /* Put the code at the end of the BB, but before any final jump.  */
+      if (!bitmap_empty_p (epi))
+	{
+	  start_sequence ();
+	  targetm.shrink_wrap.emit_epilogue_concerns (epi);
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+
+	  rtx_insn *insn = BB_END (bb);
+	  if (control_flow_insn_p (insn))
+	    emit_insn_before (seq, insn);
+	  else
+	    emit_insn_after (seq, insn);
+
+	  bitmap_ior (SW (bb)->tail_concern, SW (bb)->tail_concern, epi);
+	}
+
+      if (!bitmap_empty_p (pro))
+	{
+	  start_sequence ();
+	  targetm.shrink_wrap.emit_prologue_concerns (pro);
+	  rtx_insn *seq = get_insns ();
+	  end_sequence ();
+
+	  rtx_insn *insn = BB_END (bb);
+	  if (JUMP_P (insn) || (CALL_P (insn) && SIBLING_CALL_P (insn)))
+	    emit_insn_before (seq, insn);
+	  else
+	    emit_insn_after (seq, insn);
+
+	  bitmap_ior (SW (bb)->tail_concern, SW (bb)->tail_concern, pro);
+	}
+    }
+
+  sbitmap_free (pro);
+  sbitmap_free (epi);
+  sbitmap_free (tmp);
+}
+
+/* Place prologues and epilogues for CONCERNS on edges, if we haven't already
+   placed them inside blocks directly.  */
+static void
+insert_prologue_epilogue_for_concerns (sbitmap concerns)
+{
+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      if (!bb->aux)
+	continue;
+
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  bitmap_and_compl (epi, SW (e->src)->has_concern,
+			    SW (e->dest)->has_concern);
+	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
+			    SW (e->src)->has_concern);
+	  bitmap_and (epi, epi, concerns);
+	  bitmap_and (pro, pro, concerns);
+	  bitmap_and_compl (epi, epi, SW (e->dest)->head_concern);
+	  bitmap_and_compl (pro, pro, SW (e->dest)->head_concern);
+	  bitmap_and_compl (epi, epi, SW (e->src)->tail_concern);
+	  bitmap_and_compl (pro, pro, SW (e->src)->tail_concern);
+
+	  if (!bitmap_empty_p (epi) || !bitmap_empty_p (pro))
+	    {
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "  %d->%d", e->src->index,
+			   e->dest->index);
+		  dump_concern ("epi", epi);
+		  dump_concern ("pro", pro);
+		  fprintf (dump_file, "\n");
+		}
+
+	      start_sequence ();
+	      targetm.shrink_wrap.emit_epilogue_concerns (epi);
+	      rtx_insn *seq = get_insns ();
+	      end_sequence ();
+
+	      if (e->flags & EDGE_SIBCALL)
+		{
+		  gcc_assert (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun));
+
+		  rtx_insn *insn = BB_END (e->src);
+		  gcc_assert (CALL_P (insn) && SIBLING_CALL_P (insn));
+		  emit_insn_before (seq, insn);
+		}
+	      else if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+		{
+		  gcc_assert (e->flags & EDGE_FALLTHRU);
+		  basic_block new_bb = split_edge (e);
+		  emit_insn_after (seq, BB_END (new_bb));
+		}
+	      else
+		insert_insn_on_edge (seq, e);
+
+	      start_sequence ();
+	      targetm.shrink_wrap.emit_prologue_concerns (pro);
+	      seq = get_insns ();
+	      end_sequence ();
+
+	      insert_insn_on_edge (seq, e);
+	    }
+	}
+    }
+
+  sbitmap_free (pro);
+  sbitmap_free (epi);
+
+  commit_edge_insertions ();
+}
+
+/* The main entry point to this subpass.  FIRST_BB is where the prologue
+   would be normally put.  */
+void
+try_shrink_wrapping_separate (basic_block first_bb)
+{
+  if (!(SHRINK_WRAPPING_ENABLED
+	&& flag_shrink_wrap_separate
+	&& optimize_function_for_speed_p (cfun)
+	&& targetm.shrink_wrap.get_separate_concerns))
+    return;
+
+  /* We need LIVE info.  */
+  if (!df_live)
+    return;
+
+  /* We don't handle "strange" functions.  */
+  if (cfun->calls_alloca
+      || cfun->calls_setjmp
+      || cfun->can_throw_non_call_exceptions
+      || crtl->calls_eh_return
+      || crtl->has_nonlocal_goto
+      || crtl->saves_all_registers)
+    return;
+
+  /* Ask the target what concerns there are.  If it returns NULL, don't
+     do anything.  */
+  sbitmap concerns = targetm.shrink_wrap.get_separate_concerns ();
+  if (!concerns)
+    return;
+
+  calculate_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_POST_DOMINATORS);
+
+  init_separate_shrink_wrap (concerns);
+
+  sbitmap_iterator sbi;
+  unsigned int j;
+  EXECUTE_IF_SET_IN_BITMAP (concerns, 0, j, sbi)
+    place_prologue_for_one_concern (j, first_bb);
+
+  spread_concerns (first_bb);
+
+  disqualify_problematic_concerns (concerns);
+
+  bitmap_and_compl (concerns, concerns, SW (first_bb)->has_concern);
+
+  if (bitmap_empty_p (concerns))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Not wrapping anything separately.\n");
+    }
+  else
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file, "The concerns we wrap separately are");
+	  dump_concern ("sep", concerns);
+	  fprintf (dump_file, "\n");
+
+	  fprintf (dump_file, "... Inserting common heads...\n");
+	}
+
+      do_common_heads_for_concerns (concerns);
+
+      if (dump_file)
+	fprintf (dump_file, "... Inserting common tails...\n");
+
+      do_common_tails_for_concerns (concerns);
+
+      if (dump_file)
+	fprintf (dump_file, "... Inserting the more difficult ones...\n");
+
+      insert_prologue_epilogue_for_concerns (concerns);
+
+      if (dump_file)
+	fprintf (dump_file, "... Done.\n");
+
+      targetm.shrink_wrap.set_handled_concerns (concerns);
+
+      crtl->shrink_wrapped_separate = true;
+    }
+
+  fini_separate_shrink_wrap ();
+
+  sbitmap_free (concerns);
+  free_dominance_info (CDI_DOMINATORS);
+  free_dominance_info (CDI_POST_DOMINATORS);
+}
diff --git a/gcc/shrink-wrap.h b/gcc/shrink-wrap.h
index e06ab37..05fcb41 100644
--- a/gcc/shrink-wrap.h
+++ b/gcc/shrink-wrap.h
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 /* In shrink-wrap.c.  */
 extern bool requires_stack_frame_p (rtx_insn *, HARD_REG_SET, HARD_REG_SET);
 extern void try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq);
+extern void try_shrink_wrapping_separate (basic_block first_bb);
 #define SHRINK_WRAPPING_ENABLED \
   (flag_shrink_wrap && targetm.have_simple_return ())
 
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 9/9] rs6000: Separate shrink-wrapping
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (7 preceding siblings ...)
  2016-06-08  2:03 ` [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns Segher Boessenkool
@ 2016-06-08  2:04 ` Segher Boessenkool
  2016-06-08 11:56 ` [PATCH 0/9] separate shrink-wrapping Bernd Schmidt
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08  2:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, Segher Boessenkool

This implements the hooks for separate shrink-wrapping for rs6000.
It handles GPRs and LR.  The GPRs get a concern number corresponding
to their register number; LR gets concern number 0.

This improves specint by 0.9%, specfp by 0.8%, some separate benchmarks
much more (on POWER8).  It improves the hot path in various interpreters,
and e.g. in glibc's malloc.


2016-06-07  Segher Boessenkool  <segher@kernel.crashing.org>

	* config/rs6000/rs6000.c (machine_function): Add new fields
	gpr_is_wrapped_separately and lr_is_wrapped_separately.
	(TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS,
	TARGET_SHRINK_WRAP_CONCERNS_FOR_BB,
	TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS,
	TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS,
	TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS,
	TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS): Define.
	(rs6000_get_separate_concerns): New function.
	(rs6000_concerns_for_bb): New function.
	(rs6000_disqualify_concerns): New function.
	(rs6000_emit_prologue_concerns): New function.
	(rs6000_emit_epilogue_concerns): New function.
	(rs6000_set_handled_concerns): New function.
	(rs6000_emit_prologue): Don't emit LR save if lr_is_wrapped_separately.
	Don't emit GPR saves if gpr_is_wrapped_separately for that register.
	(restore_saved_lr): Don't restore LR if lr_is_wrapped_separately.
	(rs6000_emit_epilogue): Don't emit GPR restores if
	gpr_is_wrapped_separately for that register.  Don't make a
	REG_CFA_RESTORE note for registers we did not restore, either.

---
 gcc/config/rs6000/rs6000.c | 257 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 242 insertions(+), 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c6b2b6a..af56d8e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -152,6 +152,10 @@ typedef struct GTY(()) machine_function
   bool split_stack_argp_used;
   /* Flag if r2 setup is needed with ELFv2 ABI.  */
   bool r2_setup_needed;
+  /* The concerns already handled by separate shrink-wrapping, which should
+     not be considered by the prologue and epilogue.  */
+  bool gpr_is_wrapped_separately[32];
+  bool lr_is_wrapped_separately;
 } machine_function;
 
 /* Support targetm.vectorize.builtin_mask_for_load.  */
@@ -1511,6 +1515,19 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue
 
+#undef TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS
+#define TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS rs6000_get_separate_concerns
+#undef TARGET_SHRINK_WRAP_CONCERNS_FOR_BB
+#define TARGET_SHRINK_WRAP_CONCERNS_FOR_BB rs6000_concerns_for_bb
+#undef TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS
+#define TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS rs6000_disqualify_concerns
+#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS
+#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS rs6000_emit_prologue_concerns
+#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS
+#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS rs6000_emit_epilogue_concerns
+#undef TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS
+#define TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS rs6000_set_handled_concerns
+
 #undef TARGET_EXTRA_LIVE_ON_ENTRY
 #define TARGET_EXTRA_LIVE_ON_ENTRY rs6000_live_on_entry
 
@@ -26111,6 +26128,201 @@ rs6000_global_entry_point_needed_p (void)
   return cfun->machine->r2_setup_needed;
 }
 
+/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS.  */
+static sbitmap
+rs6000_get_separate_concerns (void)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+
+  if (!(info->savres_strategy & SAVE_INLINE_GPRS)
+      || !(info->savres_strategy & REST_INLINE_GPRS)
+      || WORLD_SAVE_P (info))
+    return NULL;
+
+  sbitmap concerns = sbitmap_alloc (32);
+  bitmap_clear (concerns);
+
+  /* The GPRs we need saved to the frame.  */
+  int reg_size = TARGET_32BIT ? 4 : 8;
+  int offset = info->gp_save_offset;
+  if (info->push_p)
+    offset += info->total_size;
+
+  for (unsigned regno = info->first_gp_reg_save; regno < 32; regno++)
+    {
+      if (IN_RANGE (offset, -0x8000, 0x7fff)
+	  && rs6000_reg_live_or_pic_offset_p (regno))
+	bitmap_set_bit (concerns, regno);
+
+      offset += reg_size;
+    }
+
+  /* Don't mess with the hard frame pointer.  */
+  if (frame_pointer_needed)
+    bitmap_clear_bit (concerns, HARD_FRAME_POINTER_REGNUM);
+
+  /* Don't mess with the fixed TOC register.  */
+  if ((TARGET_TOC && TARGET_MINIMAL_TOC)
+      || (flag_pic == 1 && DEFAULT_ABI == ABI_V4)
+      || (flag_pic && DEFAULT_ABI == ABI_DARWIN))
+    bitmap_clear_bit (concerns, RS6000_PIC_OFFSET_TABLE_REGNUM);
+
+  /* Optimize LR save and restore if we can.  This is concern 0.  */
+  if (info->lr_save_p
+      && !(flag_pic && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)))
+    {
+      offset = info->lr_save_offset;
+      if (info->push_p)
+	offset += info->total_size;
+      if (IN_RANGE (offset, -0x8000, 0x7fff))
+	bitmap_set_bit (concerns, 0);
+    }
+
+  return concerns;
+}
+
+/* Implement TARGET_SHRINK_WRAP_CONCERNS_FOR_BB.  */
+static sbitmap
+rs6000_concerns_for_bb (basic_block bb)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+
+  bitmap in = DF_LIVE_IN (bb);
+  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
+  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
+
+  sbitmap concerns = sbitmap_alloc (32);
+  bitmap_clear (concerns);
+
+  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
+  for (unsigned regno = info->first_gp_reg_save; regno < 32; regno++)
+    if (bitmap_bit_p (in, regno)
+	|| bitmap_bit_p (gen, regno)
+	|| bitmap_bit_p (kill, regno))
+      bitmap_set_bit (concerns, regno);
+
+  /* LR needs to be saved around a bb if it is killed in that bb.  */
+  if (bitmap_bit_p (kill, LR_REGNO))
+    bitmap_set_bit (concerns, 0);
+
+  return concerns;
+}
+
+/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS.  */
+static void
+rs6000_disqualify_concerns (sbitmap concerns, edge e, sbitmap edge_concerns,
+			    bool /*is_prologue*/)
+{
+  /* Our LR pro/epilogue code moves LR via R0, so R0 had better not be
+     live where we want to place that code.  */
+  if (bitmap_bit_p (edge_concerns, 0)
+      && bitmap_bit_p (DF_LIVE_IN (e->dest), 0))
+    {
+      if (dump_file)
+	fprintf (dump_file, "Disqualifying LR because GPR0 is live "
+		 "on entry to bb %d\n", e->dest->index);
+      bitmap_clear_bit (concerns, 0);
+    }
+}
+
+/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS.  */
+static void
+rs6000_emit_prologue_concerns (sbitmap concerns)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+  rtx sp_reg_rtx = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  int reg_size = TARGET_32BIT ? 4 : 8;
+
+  /* Prologue for LR.  */
+  if (bitmap_bit_p (concerns, 0))
+    {
+      rtx reg = gen_rtx_REG (Pmode, 0);
+      emit_move_insn (reg, gen_rtx_REG (Pmode, LR_REGNO));
+      RTX_FRAME_RELATED_P (get_last_insn ()) = 1;
+
+      int offset = info->lr_save_offset;
+      if (info->push_p)
+	offset += info->total_size;
+
+      emit_insn (gen_frame_store (reg, sp_reg_rtx, offset));
+      RTX_FRAME_RELATED_P (get_last_insn ()) = 1;
+    }
+
+  /* Prologue for the GPRs.  */
+  int offset = info->gp_save_offset;
+  if (info->push_p)
+    offset += info->total_size;
+
+  for (int i = info->first_gp_reg_save; i < 32; i++)
+    {
+      if (bitmap_bit_p (concerns, i))
+	{
+	  rtx reg = gen_rtx_REG (Pmode, i);
+	  emit_insn (gen_frame_store (reg, sp_reg_rtx, offset));
+	  RTX_FRAME_RELATED_P (get_last_insn ()) = 1;
+	}
+
+      offset += reg_size;
+    }
+}
+
+/* Implement TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS.  */
+static void
+rs6000_emit_epilogue_concerns (sbitmap concerns)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+  rtx sp_reg_rtx = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  int reg_size = TARGET_32BIT ? 4 : 8;
+
+  /* Epilogue for the GPRs.  */
+  int offset = info->gp_save_offset;
+  if (info->push_p)
+    offset += info->total_size;
+
+  for (int i = info->first_gp_reg_save; i < 32; i++)
+    {
+      if (bitmap_bit_p (concerns, i))
+	{
+	  rtx reg = gen_rtx_REG (Pmode, i);
+	  emit_insn (gen_frame_load (reg, sp_reg_rtx, offset));
+	  RTX_FRAME_RELATED_P (get_last_insn ()) = 1;
+	  add_reg_note (get_last_insn (), REG_CFA_RESTORE, reg);
+	}
+
+      offset += reg_size;
+    }
+
+  /* Epilogue for LR.  */
+  if (bitmap_bit_p (concerns, 0))
+    {
+      int offset = info->lr_save_offset;
+      if (info->push_p)
+	offset += info->total_size;
+
+      rtx reg = gen_rtx_REG (Pmode, 0);
+      emit_insn (gen_frame_load (reg, sp_reg_rtx, offset));
+
+      rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
+      emit_move_insn (lr, reg);
+      RTX_FRAME_RELATED_P (get_last_insn ()) = 1;
+      add_reg_note (get_last_insn (), REG_CFA_RESTORE, lr);
+    }
+}
+
+/* Implement TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS.  */
+static void
+rs6000_set_handled_concerns (sbitmap concerns)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+
+  for (int i = info->first_gp_reg_save; i < 32; i++)
+    if (bitmap_bit_p (concerns, i))
+      cfun->machine->gpr_is_wrapped_separately[i] = true;
+
+  if (bitmap_bit_p (concerns, 0))
+    cfun->machine->lr_is_wrapped_separately = true;
+}
+
 /* Emit function prologue as insns.  */
 
 void
@@ -26368,7 +26580,8 @@ rs6000_emit_prologue (void)
     }
 
   /* If we use the link register, get it into r0.  */
-  if (!WORLD_SAVE_P (info) && info->lr_save_p)
+  if (!WORLD_SAVE_P (info) && info->lr_save_p
+      && !cfun->machine->lr_is_wrapped_separately)
     {
       rtx addr, reg, mem;
 
@@ -26596,13 +26809,16 @@ rs6000_emit_prologue (void)
     }
   else if (!WORLD_SAVE_P (info))
     {
-      int i;
-      for (i = 0; i < 32 - info->first_gp_reg_save; i++)
-	if (rs6000_reg_live_or_pic_offset_p (info->first_gp_reg_save + i))
-	  emit_frame_save (frame_reg_rtx, reg_mode,
-			   info->first_gp_reg_save + i,
-			   info->gp_save_offset + frame_off + reg_size * i,
-			   sp_off - frame_off);
+      int offset = info->gp_save_offset + frame_off;
+      for (int i = info->first_gp_reg_save; i < 32; i++)
+	{
+	  if (rs6000_reg_live_or_pic_offset_p (i)
+	      && !cfun->machine->gpr_is_wrapped_separately[i])
+	    emit_frame_save (frame_reg_rtx, reg_mode, i, offset,
+			     sp_off - frame_off);
+
+	  offset += reg_size;
+	}
     }
 
   if (crtl->calls_eh_return)
@@ -27407,6 +27623,9 @@ load_lr_save (int regno, rtx frame_reg_rtx, int offset)
 static void
 restore_saved_lr (int regno, bool exit_func)
 {
+  if (cfun->machine->lr_is_wrapped_separately)
+    return;
+
   rtx reg = gen_rtx_REG (Pmode, regno);
   rtx lr = gen_rtx_REG (Pmode, LR_REGNO);
   rtx_insn *insn = emit_move_insn (lr, reg);
@@ -28164,12 +28383,18 @@ rs6000_emit_epilogue (int sibcall)
     }
   else
     {
-      for (i = 0; i < 32 - info->first_gp_reg_save; i++)
-	if (rs6000_reg_live_or_pic_offset_p (info->first_gp_reg_save + i))
-	  emit_insn (gen_frame_load
-		     (gen_rtx_REG (reg_mode, info->first_gp_reg_save + i),
-		      frame_reg_rtx,
-		      info->gp_save_offset + frame_off + reg_size * i));
+      int offset = info->gp_save_offset + frame_off;
+      for (i = info->first_gp_reg_save; i < 32; i++)
+	{
+	  if (rs6000_reg_live_or_pic_offset_p (i)
+	      && !cfun->machine->gpr_is_wrapped_separately[i])
+	    {
+	      rtx reg = gen_rtx_REG (reg_mode, i);
+	      emit_insn (gen_frame_load (reg, frame_reg_rtx, offset));
+	    }
+
+	  offset += reg_size;
+	}
     }
 
   if (DEFAULT_ABI == ABI_V4 || flag_shrink_wrap)
@@ -28208,8 +28433,10 @@ rs6000_emit_epilogue (int sibcall)
 	    || using_load_multiple
 	    || rs6000_reg_live_or_pic_offset_p (i))
 	  {
-	    rtx reg = gen_rtx_REG (reg_mode, i);
+	    if (cfun->machine->gpr_is_wrapped_separately[i])
+	      continue;
 
+	    rtx reg = gen_rtx_REG (reg_mode, i);
 	    cfa_restores = alloc_reg_note (REG_CFA_RESTORE, reg, cfa_restores);
 	  }
     }
-- 
1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-06-08  1:54 ` [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped Segher Boessenkool
@ 2016-06-08  9:18   ` Bernd Schmidt
  2016-09-09 18:41     ` Jeff Law
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-06-08  9:18 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches; +Cc: dje.gcc

On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
> +      /* regrename creates wrong code for exception handling, if used together
> +         with separate shrink-wrapping.  Disable for now, until we have
> +	 figured out what exactly is going on.  */

That needs to be figured out now or it'll be there forever.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (8 preceding siblings ...)
  2016-06-08  2:04 ` [PATCH 9/9] rs6000: Separate shrink-wrapping Segher Boessenkool
@ 2016-06-08 11:56 ` Bernd Schmidt
  2016-06-08 12:45   ` Eric Botcazou
  2016-06-08 15:16   ` Segher Boessenkool
  2016-06-09 16:12 ` Jeff Law
  2016-06-28  0:22 ` PING " Segher Boessenkool
  11 siblings, 2 replies; 60+ messages in thread
From: Bernd Schmidt @ 2016-06-08 11:56 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches; +Cc: dje.gcc

On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
> This patch series introduces separate shrink-wrapping.
[...]
> The next six patches are to prevent later passes from mishandling the
> epilogue instructions that now appear before the epilogue: mostly, you
> cannot do much to instructions with a REG_CFA_RESTORE note without
> confusing dwarf2cfi.  The cprop one is for prologue instructions.

I'll need a while to sort out my thoughts about this. On the whole I 
like having the ability to do this, but I'm worried about the fragility 
it introduces in passes after shrink-wrapping. Ideally we'd need an ix86 
implementation for test coverage reasons.

Is the usage of the word "concern" here standard for this kind of thing? 
It seems odd somehow but maybe that's just me.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08 11:56 ` [PATCH 0/9] separate shrink-wrapping Bernd Schmidt
@ 2016-06-08 12:45   ` Eric Botcazou
  2016-06-08 15:16   ` Segher Boessenkool
  1 sibling, 0 replies; 60+ messages in thread
From: Eric Botcazou @ 2016-06-08 12:45 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, Segher Boessenkool, dje.gcc

> Is the usage of the word "concern" here standard for this kind of thing?
> It seems odd somehow but maybe that's just me.

No, I find it quite odd too.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08 11:56 ` [PATCH 0/9] separate shrink-wrapping Bernd Schmidt
  2016-06-08 12:45   ` Eric Botcazou
@ 2016-06-08 15:16   ` Segher Boessenkool
  2016-06-08 16:43     ` Bernd Schmidt
  1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08 15:16 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Jun 08, 2016 at 01:55:55PM +0200, Bernd Schmidt wrote:
> On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
> >This patch series introduces separate shrink-wrapping.
> [...]
> >The next six patches are to prevent later passes from mishandling the
> >epilogue instructions that now appear before the epilogue: mostly, you
> >cannot do much to instructions with a REG_CFA_RESTORE note without
> >confusing dwarf2cfi.  The cprop one is for prologue instructions.
> 
> I'll need a while to sort out my thoughts about this. On the whole I 
> like having the ability to do this, but I'm worried about the fragility 
> it introduces in passes after shrink-wrapping.

On the plus side I should have caught most of it now.  And the failures
are rarely silent, they show up during compilation already.

Most of the problems are code changes the later passes want to do that
are valid an sich but that dwarf2cfi does not like, like not restoring
a callee-saved register before a noreturn call.  Those later patches
already know not to touch epilogue instructions, but only for the single
epilogue, not for instructions scattered throughout the whole function.

> Ideally we'd need an ix86 implementation for test coverage reasons.

Yes, but someone who knows the x86 backend well will have to write that.

> Is the usage of the word "concern" here standard for this kind of thing? 
> It seems odd somehow but maybe that's just me.

There is no standard naming for this as far as I know.  I'll gladly
use a better name anyone comes up with.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08 15:16   ` Segher Boessenkool
@ 2016-06-08 16:43     ` Bernd Schmidt
  2016-06-08 17:26       ` Segher Boessenkool
  2016-06-14 21:24       ` Segher Boessenkool
  0 siblings, 2 replies; 60+ messages in thread
From: Bernd Schmidt @ 2016-06-08 16:43 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> On the plus side I should have caught most of it now.  And the failures
> are rarely silent, they show up during compilation already.

That does count as a plus. Aborts in dwarf2cfi, I assume.

> Most of the problems are code changes the later passes want to do that
> are valid an sich but that dwarf2cfi does not like, like not restoring
> a callee-saved register before a noreturn call.  Those later patches
> already know not to touch epilogue instructions, but only for the single
> epilogue, not for instructions scattered throughout the whole function.

Yeah, that's a problem though - having to disable otherwise valid 
transformations is always a source of errors.

Is there a strong reason to keep thread_p_e_insns at its current 
position in the compilation process, or could it be moved later to 
expose this problem to fewer passes?

> There is no standard naming for this as far as I know.  I'll gladly
> use a better name anyone comes up with.

Maybe just subpart?


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08 16:43     ` Bernd Schmidt
@ 2016-06-08 17:26       ` Segher Boessenkool
  2016-06-29 23:06         ` Bernd Schmidt
  2016-06-14 21:24       ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-08 17:26 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> >On the plus side I should have caught most of it now.  And the failures
> >are rarely silent, they show up during compilation already.
> 
> That does count as a plus. Aborts in dwarf2cfi, I assume.

Yeah.

> >Most of the problems are code changes the later passes want to do that
> >are valid an sich but that dwarf2cfi does not like, like not restoring
> >a callee-saved register before a noreturn call.  Those later patches
> >already know not to touch epilogue instructions, but only for the single
> >epilogue, not for instructions scattered throughout the whole function.
> 
> Yeah, that's a problem though - having to disable otherwise valid 
> transformations is always a source of errors.

Ideally we would be able to e.g. dead-code delete frame restores and not
have dwarf2cfi throw fits.  I'm not certain we can always express that
in the CFI though (two paths joining at the same block, having different
locations for the saved registers -- one restored and one not).  Maybe
we could just say a register is restored even when it's not, but that
seems very fragile.

One thing I should try is put a USE of the saved registers at such
exits, maybe that helps those passes that now delete frame restores
to not do that.

> Is there a strong reason to keep thread_p_e_insns at its current 
> position in the compilation process, or could it be moved later to 
> expose this problem to fewer passes?

peephole2, bbro, split4/5, sched2 should stay later.  It seems reasonable
we still want some dce/dse there.  rnreg needs to be late, too; maybe
not that late though.  Dunno about cprop_hardreg, but I guess it wants to
be after peep2.

> >There is no standard naming for this as far as I know.  I'll gladly
> >use a better name anyone comes up with.
> 
> Maybe just subpart?

That is maybe just a bit too generic.  Naming, such a hard problem :-)


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (9 preceding siblings ...)
  2016-06-08 11:56 ` [PATCH 0/9] separate shrink-wrapping Bernd Schmidt
@ 2016-06-09 16:12 ` Jeff Law
  2016-06-09 19:57   ` Segher Boessenkool
  2016-06-28  0:22 ` PING " Segher Boessenkool
  11 siblings, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-06-09 16:12 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches; +Cc: dje.gcc

On 06/07/2016 07:47 PM, Segher Boessenkool wrote:
> This patch series introduces separate shrink-wrapping.
>
> There are many things the prologue/epilogue of a function do, and most of
> those things can be done independently.  For example, most of the time,
> for many targets, the save of callee-saved registers can be done later
> than the "main" prologue.
>
> Doing so helps quite a bit because the prologue is expensive for functions
> that do not need everything it does done for every path through the
> function; often, the hot paths do not need much at all, e.g. not those
> things the prologue needs to do for the function to call other functions.
>
> The first patch creates a command-line flag, some hooks, a status flag
> ("is this function wrapped separately", used by later passes), and
> documentation for these things.
>
> The next six patches are to prevent later passes from mishandling the
> epilogue instructions that now appear before the epilogue: mostly, you
> cannot do much to instructions with a REG_CFA_RESTORE note without
> confusing dwarf2cfi.  The cprop one is for prologue instructions.
>
> Then, the main patch.  And finally a patch for PowerPC that implements
> separate wrapping for GPRs and LR.
>
> Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
> powerpc64le-linux.  Previous versions of this series also tested on
> x86_64-linux.
>
> Is this okay for trunk?
>
>
> Segher
>
>
> Segher Boessenkool (9):
>   separate shrink-wrap: New command-line flag, status flag, hooks, and doc
>   cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
>   dce: Don't dead-code delete separately wrapped restores
>   regrename: Don't rename restores
>   regrename: Don't run if function was separately shrink-wrapped
>   sel-sched: Don't mess with register restores
>   cprop: Leave RTX_FRAME_RELATED_P instructions alone
>   shrink-wrap: shrink-wrapping for separate concerns
>   rs6000: Separate shrink-wrapping
I'm going to largely let Bernd own the review on this.  Just a few comments.

I certainly like the concept.  My mental model is that parts of the 
prologue might sink further than other parts of the prologue.  It's not 
an exact match, but I think it's a "close enough" mental model.

It looks like the "concerns" are all target defined and its the target's 
responsibility to deal with dependencies between the "concerns".  Right? 
  (BTW, I think "concerns" is a particularly poor name, perhaps 
"reasons" would be better?).

Right now it looks like shrink_wrapped_separate essentially says "we did 
something special here" -- while I don't think it's necessary for this 
patch, describing what we did might be better.  Essentially different 
paths would have a set of prologue/epilogue attributes that apply to 
that path.

This would be useful for things like cfgcleanup where you could join 
noreturn calls more aggressively.   Though this may not matter in any 
significant way in practice.

There may be enough commonality that we can promote some "concerns" from 
target dependent to target independent -- I've often wondered if we 
could handle a fair amount of prologue/epilogue generation in target 
independent ways.    For simple targets, I wouldn't be terribly 
surprised if all prologue/epilogue generation could be handled 
generically and they'd get separate shrink-wrapping "for free".  But 
that's all blue-sky.

Jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-09 16:12 ` Jeff Law
@ 2016-06-09 19:57   ` Segher Boessenkool
  0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-09 19:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, dje.gcc

On Thu, Jun 09, 2016 at 10:12:53AM -0600, Jeff Law wrote:
> I'm going to largely let Bernd own the review on this.  Just a few comments.
> 
> I certainly like the concept.  My mental model is that parts of the 
> prologue might sink further than other parts of the prologue.  It's not 
> an exact match, but I think it's a "close enough" mental model.

The "normal", "main" shrink-wrapping sinks the prologue, duplicating
blocks that can be run both with and without prologue.  It always
keeps one prologue, and any path through the function will execute the
prologue at most once.

Separate shrink-wrapping can put the prologue pieces in more than one
spot (each), and also execute them more than once.  But it does not
copy blocks, that can explode the code size too much if there are many
concerns.  The standard example:

      |
      1
     / \
    2   3
     \ /
      4
     / \
    5   6
     \ /
      7
      |

where 3 and 6 have some concern.  If 3 and 6 together have a lower
execution frequency than 1 does, it is better to put prologues and
epilogues around 3 and 6; otherwise, it is better to have a prologue
before 1 and an epilogue after 7.

> It looks like the "concerns" are all target defined and its the target's 
> responsibility to deal with dependencies between the "concerns".  Right? 

Exactly.

There are not many dependencies in general (except for the obvious
"setting up a stack frame has to come before almost everything else").

I have found no reasonably simple way yet to have the target inform the
generic code about the dependencies, and nothing uses it so far anyway.
So I just punt the responsibility it to the target code, for now anyway.

>  (BTW, I think "concerns" is a particularly poor name, perhaps 
> "reasons" would be better?).

I think that's even worse.  "pieces", maybe?

> Right now it looks like shrink_wrapped_separate essentially says "we did 
> something special here" -- while I don't think it's necessary for this 
> patch, describing what we did might be better.  Essentially different 
> paths would have a set of prologue/epilogue attributes that apply to 
> that path.

It is used by a few later passes to say "don't touch some stuff, you do
not have all the info you need".  This is analogous to the existing
"shrink_wrapped" flag.

> This would be useful for things like cfgcleanup where you could join 
> noreturn calls more aggressively.   Though this may not matter in any 
> significant way in practice.

Oh, it already tried to join such paths more aggressively.  Which then
blows up big time in the usual dwarf2cfi spot.

> There may be enough commonality that we can promote some "concerns" from 
> target dependent to target independent -- I've often wondered if we 
> could handle a fair amount of prologue/epilogue generation in target 
> independent ways.    For simple targets, I wouldn't be terribly 
> surprised if all prologue/epilogue generation could be handled 
> generically and they'd get separate shrink-wrapping "for free".  But 
> that's all blue-sky.

I don't think that will work to well, unfortunately.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08 16:43     ` Bernd Schmidt
  2016-06-08 17:26       ` Segher Boessenkool
@ 2016-06-14 21:24       ` Segher Boessenkool
  2016-07-08 10:42         ` Bernd Schmidt
  1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-14 21:24 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> >There is no standard naming for this as far as I know.  I'll gladly
> >use a better name anyone comes up with.
> 
> Maybe just subpart?

How about "factor"?


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* PING  Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
                   ` (10 preceding siblings ...)
  2016-06-09 16:12 ` Jeff Law
@ 2016-06-28  0:22 ` Segher Boessenkool
  2016-07-07 10:16   ` PING x2 " Segher Boessenkool
  11 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-28  0:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc

Ping.

On Wed, Jun 08, 2016 at 01:47:31AM +0000, Segher Boessenkool wrote:
> This patch series introduces separate shrink-wrapping.
> 
> There are many things the prologue/epilogue of a function do, and most of
> those things can be done independently.  For example, most of the time,
> for many targets, the save of callee-saved registers can be done later
> than the "main" prologue.
> 
> Doing so helps quite a bit because the prologue is expensive for functions
> that do not need everything it does done for every path through the
> function; often, the hot paths do not need much at all, e.g. not those
> things the prologue needs to do for the function to call other functions.
> 
> The first patch creates a command-line flag, some hooks, a status flag
> ("is this function wrapped separately", used by later passes), and
> documentation for these things.
> 
> The next six patches are to prevent later passes from mishandling the
> epilogue instructions that now appear before the epilogue: mostly, you
> cannot do much to instructions with a REG_CFA_RESTORE note without
> confusing dwarf2cfi.  The cprop one is for prologue instructions.
> 
> Then, the main patch.  And finally a patch for PowerPC that implements
> separate wrapping for GPRs and LR.
> 
> Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
> powerpc64le-linux.  Previous versions of this series also tested on
> x86_64-linux.
> 
> Is this okay for trunk?
> 
> 
> Segher
> 
> 
> Segher Boessenkool (9):
>   separate shrink-wrap: New command-line flag, status flag, hooks, and doc
>   cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
>   dce: Don't dead-code delete separately wrapped restores
>   regrename: Don't rename restores
>   regrename: Don't run if function was separately shrink-wrapped
>   sel-sched: Don't mess with register restores
>   cprop: Leave RTX_FRAME_RELATED_P instructions alone
>   shrink-wrap: shrink-wrapping for separate concerns
>   rs6000: Separate shrink-wrapping
> 
>  gcc/cfgcleanup.c           |   5 +
>  gcc/common.opt             |   4 +
>  gcc/config/rs6000/rs6000.c | 257 ++++++++++++++++--
>  gcc/dce.c                  |   9 +
>  gcc/doc/invoke.texi        |  11 +-
>  gcc/doc/tm.texi            |  53 ++++
>  gcc/doc/tm.texi.in         |  29 ++
>  gcc/emit-rtl.h             |   4 +
>  gcc/function.c             |  15 +-
>  gcc/regcprop.c             |   3 +
>  gcc/regrename.c            |  12 +-
>  gcc/sel-sched-ir.c         |   1 +
>  gcc/shrink-wrap.c          | 647 +++++++++++++++++++++++++++++++++++++++++++++
>  gcc/shrink-wrap.h          |   1 +
>  gcc/target.def             |  56 ++++
>  15 files changed, 1088 insertions(+), 19 deletions(-)
> 
> -- 
> 1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-08 17:26       ` Segher Boessenkool
@ 2016-06-29 23:06         ` Bernd Schmidt
  2016-06-29 23:24           ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-06-29 23:06 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 06/08/2016 07:26 PM, Segher Boessenkool wrote:
> One thing I should try is put a USE of the saved registers at such
> exits, maybe that helps those passes that now delete frame restores
> to not do that.

Have you had a chance to try this?


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-29 23:06         ` Bernd Schmidt
@ 2016-06-29 23:24           ` Segher Boessenkool
  2016-07-04  8:57             ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-06-29 23:24 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Thu, Jun 30, 2016 at 01:03:17AM +0200, Bernd Schmidt wrote:
> On 06/08/2016 07:26 PM, Segher Boessenkool wrote:
> >One thing I should try is put a USE of the saved registers at such
> >exits, maybe that helps those passes that now delete frame restores
> >to not do that.
> 
> Have you had a chance to try this?

Not yet.  I have tried to get dwarf2cfi not to complain when one path
entering a block has a restore and some other patch doesn't (and mark
the register as unavailable).  That works great in most cases but it
seems sometimes this also then happens for exception handlers, which
is disastrous of course (and horrendous to debug).

The USE thing should be much easier, I might have results tomorrow,
if not, next week.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-29 23:24           ` Segher Boessenkool
@ 2016-07-04  8:57             ` Segher Boessenkool
  0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-04  8:57 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Jun 29, 2016 at 06:16:10PM -0500, Segher Boessenkool wrote:
> On Thu, Jun 30, 2016 at 01:03:17AM +0200, Bernd Schmidt wrote:
> > On 06/08/2016 07:26 PM, Segher Boessenkool wrote:
> > >One thing I should try is put a USE of the saved registers at such
> > >exits, maybe that helps those passes that now delete frame restores
> > >to not do that.
> > 
> > Have you had a chance to try this?
> 
> Not yet.  I have tried to get dwarf2cfi not to complain when one path
> entering a block has a restore and some other patch doesn't (and mark
> the register as unavailable).  That works great in most cases but it
> seems sometimes this also then happens for exception handlers, which
> is disastrous of course (and horrendous to debug).
> 
> The USE thing should be much easier, I might have results tomorrow,
> if not, next week.

Not so easy actually; but I have results now.

Putting USEs of all separately shrink-wrapped variables at each "strange"
exit (i.e. those blocks that have no successors) helps regrename (makes
the first regrename patch unnecessary), but not any of the other patches
(which makes sense if you look at what they do).  It does not help the
problem the second regrename patch is for though (the patch that disables
regrename altogether if separate shrink-wrapping is used).

So the problem shows up on PowerPC with hash_map_rand on -m32.  This is
a huge testcase, and randomised; sometimes it works, more often it corrupts
various data structures.

I tracked it down to regrename renaming one variable (r9, the first choice
for allocation) to some non-volatile register.  This however is a register
that is not separately shrink-wrapped!  So it seems the problem could
happen always, they just rarely do (one in a million or so testcases, and
almost nobody uses regrename anyway).

If at all blocks without successors I put a USE of every register that
*could* be separately shrink-wrapped, the problem goes away.  I do not
yet know why exactly; I don't understand what regrename does very well.

Doing this requires yet another hook (I hardcoded the PowerPC requirements
for now, adding hooks is painful, probably as it should be ;-) ).  It
feels wrong anyway.  I'll have to dig even deeper to find out if this
can or cannot happen without this patch series.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: PING x2  Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-28  0:22 ` PING " Segher Boessenkool
@ 2016-07-07 10:16   ` Segher Boessenkool
  0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-07 10:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc

Ping.

On Mon, Jun 27, 2016 at 06:51:01PM -0500, Segher Boessenkool wrote:
> Ping.
> 
> On Wed, Jun 08, 2016 at 01:47:31AM +0000, Segher Boessenkool wrote:
> > This patch series introduces separate shrink-wrapping.
> > 
> > There are many things the prologue/epilogue of a function do, and most of
> > those things can be done independently.  For example, most of the time,
> > for many targets, the save of callee-saved registers can be done later
> > than the "main" prologue.
> > 
> > Doing so helps quite a bit because the prologue is expensive for functions
> > that do not need everything it does done for every path through the
> > function; often, the hot paths do not need much at all, e.g. not those
> > things the prologue needs to do for the function to call other functions.
> > 
> > The first patch creates a command-line flag, some hooks, a status flag
> > ("is this function wrapped separately", used by later passes), and
> > documentation for these things.
> > 
> > The next six patches are to prevent later passes from mishandling the
> > epilogue instructions that now appear before the epilogue: mostly, you
> > cannot do much to instructions with a REG_CFA_RESTORE note without
> > confusing dwarf2cfi.  The cprop one is for prologue instructions.
> > 
> > Then, the main patch.  And finally a patch for PowerPC that implements
> > separate wrapping for GPRs and LR.
> > 
> > Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
> > powerpc64le-linux.  Previous versions of this series also tested on
> > x86_64-linux.
> > 
> > Is this okay for trunk?
> > 
> > 
> > Segher
> > 
> > 
> > Segher Boessenkool (9):
> >   separate shrink-wrap: New command-line flag, status flag, hooks, and doc
> >   cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
> >   dce: Don't dead-code delete separately wrapped restores
> >   regrename: Don't rename restores
> >   regrename: Don't run if function was separately shrink-wrapped
> >   sel-sched: Don't mess with register restores
> >   cprop: Leave RTX_FRAME_RELATED_P instructions alone
> >   shrink-wrap: shrink-wrapping for separate concerns
> >   rs6000: Separate shrink-wrapping
> > 
> >  gcc/cfgcleanup.c           |   5 +
> >  gcc/common.opt             |   4 +
> >  gcc/config/rs6000/rs6000.c | 257 ++++++++++++++++--
> >  gcc/dce.c                  |   9 +
> >  gcc/doc/invoke.texi        |  11 +-
> >  gcc/doc/tm.texi            |  53 ++++
> >  gcc/doc/tm.texi.in         |  29 ++
> >  gcc/emit-rtl.h             |   4 +
> >  gcc/function.c             |  15 +-
> >  gcc/regcprop.c             |   3 +
> >  gcc/regrename.c            |  12 +-
> >  gcc/sel-sched-ir.c         |   1 +
> >  gcc/shrink-wrap.c          | 647 +++++++++++++++++++++++++++++++++++++++++++++
> >  gcc/shrink-wrap.h          |   1 +
> >  gcc/target.def             |  56 ++++
> >  15 files changed, 1088 insertions(+), 19 deletions(-)
> > 
> > -- 
> > 1.9.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-06-14 21:24       ` Segher Boessenkool
@ 2016-07-08 10:42         ` Bernd Schmidt
  2016-07-08 12:11           ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-08 10:42 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 06/14/2016 11:24 PM, Segher Boessenkool wrote:
> On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
>> On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
>>> There is no standard naming for this as far as I know.  I'll gladly
>>> use a better name anyone comes up with.
>>
>> Maybe just subpart?
>
> How about "factor"?

Still sounds odd to me. "Component" maybe? Ideally a native speaker 
would help decide what sounds natural to them.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-07-08 10:42         ` Bernd Schmidt
@ 2016-07-08 12:11           ` Segher Boessenkool
  2016-07-08 13:16             ` David Malcolm
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-08 12:11 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Fri, Jul 08, 2016 at 12:42:34PM +0200, Bernd Schmidt wrote:
> On 06/14/2016 11:24 PM, Segher Boessenkool wrote:
> >On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> >>On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> >>>There is no standard naming for this as far as I know.  I'll gladly
> >>>use a better name anyone comes up with.
> >>
> >>Maybe just subpart?
> >
> >How about "factor"?
> 
> Still sounds odd to me. "Component" maybe? Ideally a native speaker 
> would help decide what sounds natural to them.

That does sound nice...  OTOH,

$ grep -i component *.c|wc -l
1081

but the opportunity for confusion is limited I think (and calling it
"shrink-wrapping component" where needed sounds natural too!)


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-07-08 12:11           ` Segher Boessenkool
@ 2016-07-08 13:16             ` David Malcolm
  2016-07-08 13:45               ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: David Malcolm @ 2016-07-08 13:16 UTC (permalink / raw)
  To: Segher Boessenkool, Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Fri, 2016-07-08 at 07:11 -0500, Segher Boessenkool wrote:
> On Fri, Jul 08, 2016 at 12:42:34PM +0200, Bernd Schmidt wrote:
> > On 06/14/2016 11:24 PM, Segher Boessenkool wrote:
> > > On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> > > > On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> > > > > There is no standard naming for this as far as I know.  I'll
> > > > > gladly
> > > > > use a better name anyone comes up with.
> > > > 
> > > > Maybe just subpart?
> > > 
> > > How about "factor"?
> > 
> > Still sounds odd to me. "Component" maybe? Ideally a native speaker
> > would help decide what sounds natural to them.
> 
> That does sound nice...  OTOH,
> 
> $ grep -i component *.c|wc -l
> 1081
> 
> but the opportunity for confusion is limited I think (and calling it
> "shrink-wrapping component" where needed sounds natural too!)

As far as I understand the idea, there are a number of target-specific
things that are to be done during a function call, and the optimization
tries to detect which of optimize each of these separately.

Some synonyms and near-synonyms for these "things":

  aspect
  component
  concern
  duty
  element
  facet
  factor
  item
  part
  piece
  portion
  responsibility

and I suppose "shrink_wrap_part" is shorter than
"shrink_wrap_component".

(Yeah, I'm bike-shedding; sorry)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-07-08 13:16             ` David Malcolm
@ 2016-07-08 13:45               ` Segher Boessenkool
  2016-07-08 14:35                 ` Bill Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-08 13:45 UTC (permalink / raw)
  To: David Malcolm; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Fri, Jul 08, 2016 at 09:16:03AM -0400, David Malcolm wrote:
> As far as I understand the idea, there are a number of target-specific
> things that are to be done during a function call, and the optimization
> tries to detect which of optimize each of these separately.
> 
> Some synonyms and near-synonyms for these "things":
> 
>   aspect
>   component
>   concern
>   duty
>   element
>   facet
>   factor
>   item
>   part
>   piece
>   portion
>   responsibility
> 
> and I suppose "shrink_wrap_part" is shorter than
> "shrink_wrap_component".

The reason I called it "concern" is that this isn't dealing with the
prologue/epilogue divided neatly into separate insns.  The generic code
only deals with what basic blocks will have what concerns the prologue
deals with, dealt with.  The target code then worries about what code
to write for that.  "concerns" does not map 1-1 to parts of the prologue,
in the general case.  (A very simple example: the arm load/store pair
instructions).

But component is abstract enough I think.

> (Yeah, I'm bike-shedding; sorry)

:-)


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/9] separate shrink-wrapping
  2016-07-08 13:45               ` Segher Boessenkool
@ 2016-07-08 14:35                 ` Bill Schmidt
  0 siblings, 0 replies; 60+ messages in thread
From: Bill Schmidt @ 2016-07-08 14:35 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: David Malcolm, Bernd Schmidt, GCC Patches, dje.gcc

Not that getting the terminology right isn't important, but it would be
nice if Segher could get a review for the rest of the content, too. :)

Bill

> On Jul 8, 2016, at 8:45 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Jul 08, 2016 at 09:16:03AM -0400, David Malcolm wrote:
>> As far as I understand the idea, there are a number of target-specific
>> things that are to be done during a function call, and the optimization
>> tries to detect which of optimize each of these separately.
>> 
>> Some synonyms and near-synonyms for these "things":
>> 
>>  aspect
>>  component
>>  concern
>>  duty
>>  element
>>  facet
>>  factor
>>  item
>>  part
>>  piece
>>  portion
>>  responsibility
>> 
>> and I suppose "shrink_wrap_part" is shorter than
>> "shrink_wrap_component".
> 
> The reason I called it "concern" is that this isn't dealing with the
> prologue/epilogue divided neatly into separate insns.  The generic code
> only deals with what basic blocks will have what concerns the prologue
> deals with, dealt with.  The target code then worries about what code
> to write for that.  "concerns" does not map 1-1 to parts of the prologue,
> in the general case.  (A very simple example: the arm load/store pair
> instructions).
> 
> But component is abstract enough I think.
> 
>> (Yeah, I'm bike-shedding; sorry)
> 
> :-)
> 
> 
> Segher
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-06-08  2:03 ` [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns Segher Boessenkool
@ 2016-07-15 12:42   ` Bernd Schmidt
  2016-07-18 16:34     ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-15 12:42 UTC (permalink / raw)
  To: Segher Boessenkool, gcc-patches; +Cc: dje.gcc

I still have misgivings about all the changes needed to the following 
passes, but I guess there's no choice but to live with it. So, I'm 
trying to look at this patch, but I'm finding it fairly impenetrable and 
underdocumented.

> +  /* The concerns for which we want a prologue placed on this BB.  */
> +  sbitmap pro_concern;
> +
> +  /* The concerns for which we placed code at the start of the BB.  */
> +  sbitmap head_concern;

What's the difference?

> +  /* The frequency of executing the prologue for this BB and all BBs
> +     dominated by it.  */
> +  gcov_type cost;

Is this frequency consideration the only thing that attempts to prevent 
placing prologue insns into loops?

> +
> +/* Destroy the pass-specific data.  */
> +static void
> +fini_separate_shrink_wrap (void)
> +{
> +  basic_block bb;
> +  FOR_ALL_BB_FN (bb, cfun)
> +    if (bb->aux)
> +      {
> +	sbitmap_free (SW (bb)->has_concern);
> +	sbitmap_free (SW (bb)->pro_concern);
> +	sbitmap_free (SW (bb)->head_concern);
> +	sbitmap_free (SW (bb)->tail_concern);
> +	free (bb->aux);
> +	bb->aux = 0;
> +      }
> +}

Almost makes me want to ask for an sbitmap variant allocated on obstacks.

> +      /* If this block does have the concern itself, or it is cheaper to
> +         put the prologue here than in all the descendants that need it,
> +	 mark it so.  If it is the same cost, put it here if there is no
> +	 block reachable from this block that does not need the prologue.
> +	 The actual test is a bit more stringent but catches most cases.  */

There's some oddness here with the leading whitespace.

> +/* Mark HAS_CONCERN for every block dominated by at least one block with
> +   PRO_CONCERN set, starting at HEAD.  */

I see a lot of code dealing with the placement of prologue 
parts/concerns/components, but very little dealing with how to place 
epilogues, leading me to wonder whether we could do better wrt the 
latter. Shouldn't there be some mirror symmetry, i.e. 
spread_concerns_backwards?

> +    {
> +      if (first_visit)
> +	{
> +	  bitmap_ior (SW (bb)->has_concern, SW (bb)->pro_concern, concern);
> +
> +	  if (first_dom_son (CDI_DOMINATORS, bb))
> +	    {
> +	      concern = SW (bb)->has_concern;
> +	      bb = first_dom_son (CDI_DOMINATORS, bb);
> +	      continue;
> +	    }

Calling first_dom_son twice with the same args? More importantly, this 
first_visit business seems very confusing. I'd try to find a way to 
merge this if with the places that set first_visit to true. Also - 
instead of having a "continue;" here it seems the code inside the if 
represents an inner loop that should be written explicitly. There are 
two loops with such a structure.

> +/* If we cannot handle placing some concern's prologues or epilogues where
> +   we decided we should place them, unmark that concern in CONCERNS so
> +   that it is not wrapped separately.  */
> +static void
> +disqualify_problematic_concerns (sbitmap concerns)
> +{
> +  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
> +  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      edge e;
> +      edge_iterator ei;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +	{
> +	  bitmap_and_compl (epi, SW (e->src)->has_concern,
> +			    SW (e->dest)->has_concern);
> +	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
> +			    SW (e->src)->has_concern);

What is the purpose of this?

> +/* Place code for prologues and epilogues for CONCERNS where we can put
> +   that code at the start of basic blocks.  */
> +static void
> +do_common_heads_for_concerns (sbitmap concerns)

The function name should probably have some combination of the strings 
emit_ and _at or _into to make it clear what it's doing. This and the 
following function have some logical operations on the bitmaps which are 
not explained anywhere. In general a much better overview of the 
intended operation of this pass is needed.

> +	{
> +	  bitmap_and_compl (epi, SW (e->src)->has_concern,
> +			    SW (e->dest)->has_concern);
> +	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
> +			    SW (e->src)->has_concern);
> +	  bitmap_and (epi, epi, concerns);
> +	  bitmap_and (pro, pro, concerns);
> +	  bitmap_and_compl (epi, epi, SW (e->dest)->head_concern);
> +	  bitmap_and_compl (pro, pro, SW (e->dest)->head_concern);
> +	  bitmap_and_compl (epi, epi, SW (e->src)->tail_concern);
> +	  bitmap_and_compl (pro, pro, SW (e->src)->tail_concern);

Likewise here.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-15 12:42   ` Bernd Schmidt
@ 2016-07-18 16:34     ` Segher Boessenkool
  2016-07-18 17:03       ` Bernd Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-18 16:34 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

Hi Bernd,

Thanks for the review.


On Fri, Jul 15, 2016 at 02:42:24PM +0200, Bernd Schmidt wrote:
> I still have misgivings about all the changes needed to the following 
> passes, but I guess there's no choice but to live with it. So, I'm 
> trying to look at this patch, but I'm finding it fairly impenetrable and 
> underdocumented.

I'll add some more comments with a fresh eye.

> >+  /* The concerns for which we want a prologue placed on this BB.  */
> >+  sbitmap pro_concern;
> >+
> >+  /* The concerns for which we placed code at the start of the BB.  */
> >+  sbitmap head_concern;
> 
> What's the difference?

Concerns in head_concern have the prologue code placed at the start
of the bb; concerns in pro_concern have that code placed before the
existing code in this bb, but after the code in any predecessor bb.
It will be inserted on an edge, unless it can be a head_concern or a
tail_concern.

head_concern and tail_concern reduce code size, in the (quite frequent)
cases where cross jumping does a sub-par job.  Originally I didn't have
these; it seems I didn't document them well enough.

> >+  /* The frequency of executing the prologue for this BB and all BBs
> >+     dominated by it.  */
> >+  gcov_type cost;
> 
> Is this frequency consideration the only thing that attempts to prevent 
> placing prologue insns into loops?

Yes.  The algorithm makes sure the prologues are executed as infrequently
as possible.  If a block that would get a prologue has the same frequency
as a predecessor does, and that predecessor always has that first block as
eventual successor, the prologue is moved to the earlier block (this
handles the case where both have a frequency of zero, and other cases
where the range of freq is too limited).

> >+/* Destroy the pass-specific data.  */
> >+static void
> >+fini_separate_shrink_wrap (void)
> >+{
> >+  basic_block bb;
> >+  FOR_ALL_BB_FN (bb, cfun)
> >+    if (bb->aux)
> >+      {
> >+	sbitmap_free (SW (bb)->has_concern);
> >+	sbitmap_free (SW (bb)->pro_concern);
> >+	sbitmap_free (SW (bb)->head_concern);
> >+	sbitmap_free (SW (bb)->tail_concern);
> >+	free (bb->aux);
> >+	bb->aux = 0;
> >+      }
> >+}
> 
> Almost makes me want to ask for an sbitmap variant allocated on obstacks.

Heh, yes.  I'll have a look.

> >+      /* If this block does have the concern itself, or it is cheaper to
> >+         put the prologue here than in all the descendants that need it,
> >+	 mark it so.  If it is the same cost, put it here if there is no
> >+	 block reachable from this block that does not need the prologue.
> >+	 The actual test is a bit more stringent but catches most cases.  */
> 
> There's some oddness here with the leading whitespace.

Will fix.

> >+/* Mark HAS_CONCERN for every block dominated by at least one block with
> >+   PRO_CONCERN set, starting at HEAD.  */
> 
> I see a lot of code dealing with the placement of prologue 
> parts/concerns/components, but very little dealing with how to place 
> epilogues, leading me to wonder whether we could do better wrt the 
> latter. Shouldn't there be some mirror symmetry, i.e. 
> spread_concerns_backwards?

That is unfortunately harder to do (the "global" prologue block dominates
everywhere we could put a prologue component, but this is not true for
epilogues -- there can be more exits).

It is also true the epilogues already are sort of optimal in execution
cost: the epilogues are executed at most as often as the prologues,
which are placed optimally by construction.  The win from placing the
epilogues better is from infinite loops and non-returning abnormal
exits; but also you can get somewhat smaller code.

So I left this as a future improvement.

> >+    {
> >+      if (first_visit)
> >+	{
> >+	  bitmap_ior (SW (bb)->has_concern, SW (bb)->pro_concern, concern);
> >+
> >+	  if (first_dom_son (CDI_DOMINATORS, bb))
> >+	    {
> >+	      concern = SW (bb)->has_concern;
> >+	      bb = first_dom_son (CDI_DOMINATORS, bb);
> >+	      continue;
> >+	    }
> 
> Calling first_dom_son twice with the same args?

I thought it was more readable this way.  It's two derefs and an add or
two.  Maybe we should make it an inline function?

> More importantly, this 
> first_visit business seems very confusing. I'd try to find a way to 
> merge this if with the places that set first_visit to true.

That will break the early-out optimisation I think?  I'll have to look
again.  All the loops here are O(n) (with n the number of edges, or
blocks); but the place_prologues loop is called once for every component,
as well.  So the early-out helps quite a lot here.

> Also - 
> instead of having a "continue;" here it seems the code inside the if 
> represents an inner loop that should be written explicitly. There are 
> two loops with such a structure.

This "loop" is just a non-recursive tree traversal.  The complicated
part is doing the early-out at just the right time.

I'll document it better.

> >+/* If we cannot handle placing some concern's prologues or epilogues where
> >+   we decided we should place them, unmark that concern in CONCERNS so
> >+   that it is not wrapped separately.  */
> >+static void
> >+disqualify_problematic_concerns (sbitmap concerns)
> >+{
> >+  sbitmap pro = sbitmap_alloc (SBITMAP_SIZE (concerns));
> >+  sbitmap epi = sbitmap_alloc (SBITMAP_SIZE (concerns));
> >+
> >+  basic_block bb;
> >+  FOR_EACH_BB_FN (bb, cfun)
> >+    {
> >+      edge e;
> >+      edge_iterator ei;
> >+      FOR_EACH_EDGE (e, ei, bb->succs)
> >+	{
> >+	  bitmap_and_compl (epi, SW (e->src)->has_concern,
> >+			    SW (e->dest)->has_concern);
> >+	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
> >+			    SW (e->src)->has_concern);
> 
> What is the purpose of this?

Of the and_compl's?  epi is those components that need an epilogue on
this edge, pro is those that need a prologue on this edge.  I.e. epi
is those components that are at src but not at dest, and pro is the
other way around.

> >+/* Place code for prologues and epilogues for CONCERNS where we can put
> >+   that code at the start of basic blocks.  */
> >+static void
> >+do_common_heads_for_concerns (sbitmap concerns)
> 
> The function name should probably have some combination of the strings 
> emit_ and _at or _into to make it clear what it's doing. This and the 
> following function have some logical operations on the bitmaps which are 
> not explained anywhere. In general a much better overview of the 
> intended operation of this pass is needed.

It's hard to think of good names.  I'll try harder.

> >+	{
> >+	  bitmap_and_compl (epi, SW (e->src)->has_concern,
> >+			    SW (e->dest)->has_concern);
> >+	  bitmap_and_compl (pro, SW (e->dest)->has_concern,
> >+			    SW (e->src)->has_concern);

epi/pro are those concerns that need an epilogue resp. prologue ...

> >+	  bitmap_and (epi, epi, concerns);
> >+	  bitmap_and (pro, pro, concerns);

... but only handle those concerns we still consider (i.e. those that
are not disqualified one way or another) ...

> >+	  bitmap_and_compl (epi, epi, SW (e->dest)->head_concern);
> >+	  bitmap_and_compl (pro, pro, SW (e->dest)->head_concern);

... don't put the *logues we already put in a head on the edge as well ...

> >+	  bitmap_and_compl (epi, epi, SW (e->src)->tail_concern);
> >+	  bitmap_and_compl (pro, pro, SW (e->src)->tail_concern);

... and likewise for the tail.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-18 16:34     ` Segher Boessenkool
@ 2016-07-18 17:03       ` Bernd Schmidt
  2016-07-19 14:46         ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-18 17:03 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 07/18/2016 06:34 PM, Segher Boessenkool wrote:
>
>>> +  /* The frequency of executing the prologue for this BB and all BBs
>>> +     dominated by it.  */
>>> +  gcov_type cost;
>>
>> Is this frequency consideration the only thing that attempts to prevent
>> placing prologue insns into loops?
>
> Yes.  The algorithm makes sure the prologues are executed as infrequently
> as possible.  If a block that would get a prologue has the same frequency
> as a predecessor does, and that predecessor always has that first block as
> eventual successor, the prologue is moved to the earlier block (this
> handles the case where both have a frequency of zero, and other cases
> where the range of freq is too limited).

Ugh, that is really scaring me. I'd much prefer a classification of 
valid blocks based on cfg structure alone - I'll need serious convincing 
that the frequency data is reliable enough for what you are trying to do.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-18 17:03       ` Bernd Schmidt
@ 2016-07-19 14:46         ` Segher Boessenkool
  2016-07-19 14:49           ` Bernd Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-19 14:46 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Mon, Jul 18, 2016 at 07:03:04PM +0200, Bernd Schmidt wrote:
> >>>+  /* The frequency of executing the prologue for this BB and all BBs
> >>>+     dominated by it.  */
> >>>+  gcov_type cost;
> >>
> >>Is this frequency consideration the only thing that attempts to prevent
> >>placing prologue insns into loops?
> >
> >Yes.  The algorithm makes sure the prologues are executed as infrequently
> >as possible.  If a block that would get a prologue has the same frequency
> >as a predecessor does, and that predecessor always has that first block as
> >eventual successor, the prologue is moved to the earlier block (this
> >handles the case where both have a frequency of zero, and other cases
> >where the range of freq is too limited).
> 
> Ugh, that is really scaring me. I'd much prefer a classification of 
> valid blocks based on cfg structure alone - I'll need serious convincing 
> that the frequency data is reliable enough for what you are trying to do.

But you need the profile to make even reasonably good decisions.

The standard example:

   1
  / \
 2   3
  \ /
   4
  / \
 5   6
  \ /
   7

where 3 and 6 need some prologue, the rest do not.
If freq(3) + freq(6) > freq(1), it is better to put the prologue at 1;
if not, it is better to place it at 3 and 6.

If you do not use the profile, you cannot do better than the status quo,
i.e. always place it at 1.

In the general case, you have the choice between putting the prologue at
some basic block X, or at certain blocks dominated by X.  This algorithm
chooses the case that has the prologue executed the least often in total,
and that is really all there is to it.


Yes, our profile data sometimes is, uh, less than optimal.  But:
- All our other passes use it, too;
- What matters most here is comparing the execution frequency locally,
  and that is not usually messed up so badly;
- All our other passes use it, too;
- The important cases (loops, exceptional cases) normally have a pretty
  reasonable profile;
- All our other passes use it, too;
- Benchmarking shows big wins with this patch.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-19 14:46         ` Segher Boessenkool
@ 2016-07-19 14:49           ` Bernd Schmidt
  2016-07-19 15:35             ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-19 14:49 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 07/19/2016 04:46 PM, Segher Boessenkool wrote:

> But you need the profile to make even reasonably good decisions.

I'm not worried about making cost decisions: as far as I'm concerned 
it's perfectly fine for that. I'm worried about correctness - you can't 
validly save registers inside a loop. So IMO there needs to be an 
additional cfg-based check that verifies whether the bb where we want to 
place parts of the prologue is guaranteed to be executed at most once.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-19 14:49           ` Bernd Schmidt
@ 2016-07-19 15:35             ` Segher Boessenkool
  2016-07-20 11:23               ` Bernd Schmidt
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-19 15:35 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Tue, Jul 19, 2016 at 04:49:26PM +0200, Bernd Schmidt wrote:
> >But you need the profile to make even reasonably good decisions.
> 
> I'm not worried about making cost decisions: as far as I'm concerned 
> it's perfectly fine for that. I'm worried about correctness - you can't 
> validly save registers inside a loop.

Of course you can.  It needs to be paired with a restore; and we do
that just fine.

Pretty much *all* implementations in the literature do this, fwiw.

> So IMO there needs to be an 
> additional cfg-based check that verifies whether the bb where we want to 
> place parts of the prologue is guaranteed to be executed at most once.

That is equivalent to not doing this optimisation *at all*.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-19 15:35             ` Segher Boessenkool
@ 2016-07-20 11:23               ` Bernd Schmidt
  2016-07-20 15:06                 ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-07-20 11:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc



On 07/19/2016 05:35 PM, Segher Boessenkool wrote:
> On Tue, Jul 19, 2016 at 04:49:26PM +0200, Bernd Schmidt wrote:
>>> But you need the profile to make even reasonably good decisions.
>>
>> I'm not worried about making cost decisions: as far as I'm concerned
>> it's perfectly fine for that. I'm worried about correctness - you can't
>> validly save registers inside a loop.
>
> Of course you can.  It needs to be paired with a restore; and we do
> that just fine.
 > Pretty much *all* implementations in the literature do this, fwiw.

I, however, fail to see where this happens. If you have references to 
somewhere where this algorithm is described, that would be helpful, 
because at this stage I think I really don't understand what you're 
trying to achieve. The submission lacks examples.

So I could see things could work if you place an epilogue part in the 
last block of a loop if the start of the loop contains a corresponding 
part of the prologue, but taking just the comment in the code:
    Prologue concerns are placed in such a way that they are executed as
    infrequently as possible.  Epilogue concerns are put everywhere where
    there is an edge from a bb dominated by such a prologue concern to a
    bb not dominated by one.

this describes no mechanism by which such a thing would happen. And I 
fail to see how moving parts of the prologue into a loop would be 
beneficial as an optimization.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
  2016-07-20 11:23               ` Bernd Schmidt
@ 2016-07-20 15:06                 ` Segher Boessenkool
  0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-07-20 15:06 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Jul 20, 2016 at 01:23:44PM +0200, Bernd Schmidt wrote:
> >>>But you need the profile to make even reasonably good decisions.
> >>
> >>I'm not worried about making cost decisions: as far as I'm concerned
> >>it's perfectly fine for that. I'm worried about correctness - you can't
> >>validly save registers inside a loop.
> >
> >Of course you can.  It needs to be paired with a restore; and we do
> >that just fine.
> > Pretty much *all* implementations in the literature do this, fwiw.

> I, however, fail to see where this happens.

See for example one of the better papers on shrink-wrapping, "Post Register
Allocation Spill Code Optimization", by Lupo and Wilken.

See the problem definition (section 2), figure 1 for a figure clearly
showing multiple save/restore (and executed more than once). section 4.2
for why we don't need to look at loops.

[ In this paper prologue/epilogue pairs are only placed around SESE
regions, which we do not have many in GCC that late in RTL (often the
CFG isn't even reducible); there is no reason to restrict to SESE regions
though ].

> If you have references to 
> somewhere where this algorithm is described, that would be helpful, 

No, of course not, because I just made this up, as should be clear.

The problem definition is simple: we have a CFG, and some of the blocks
in that CFG need some things done by the prologue done before they
execute.  We don't want to run that prologue code more often than
necessary, because it can be expensive (compared to the parts of the
function that are executed at all).  Considering all possible combinations
of blocks (or edges) where we can place a prologue is not computationally
feasible.  But there is a shortcut: if a block X gets a prologue, all
blocks dominated by it will for zero cost have that prologue established
as well (by simply not doing an epilogue until they are reached).  So
this algo does the obvious thing, simply walking the dom tree (which is
O(n)).  Then, from the prologue placement, we compute which blocks will
execute with that concern "active"; and we insert prologue/epilogue code
to make that assignment true (a prologue or epilogue for every edge that
crosses from "does not have" to "does have", or the other way around; and
then there is the head/tail thing because cross-jumping fails to unify
many of those *logues, so we take care of it manually).

> because at this stage I think I really don't understand what you're 
> trying to achieve. The submission lacks examples.

It says what it does right at the start of the head comment:

"""
   Instead of putting all of the prologue and epilogue in one spot, we
   can put parts of it in places that are executed less frequently.  The
   following code does this, for concerns that can have more than one
   prologue and epilogue, and where those pro- and epilogues can be
   executed more than once.
"""

followed by a bunch of detail.

> So I could see things could work if you place an epilogue part in the 
> last block of a loop if the start of the loop contains a corresponding 
> part of the prologue, but taking just the comment in the code:
>    Prologue concerns are placed in such a way that they are executed as
>    infrequently as possible.  Epilogue concerns are put everywhere where
>    there is an edge from a bb dominated by such a prologue concern to a
>    bb not dominated by one.
> 
> this describes no mechanism by which such a thing would happen.

Sure it does.  The edge leaving the loop, for example.

You can have a prologue/epilogue pair within a loop (which is unusual,
but *can* happen, e.g. as part of a conditional that executes almost
never -- this is quite frequent btw, assertions, on-the-run initialisation,
etc.)

The situation you describe has all the blocks in the loop needing the
prologue (but, say, nothing outside the loop).  Then of course the prologue
is placed on the entry (edge) into the loop, and the epilogue on the exit
edge(s).

> And I 
> fail to see how moving parts of the prologue into a loop would be 
> beneficial as an optimization.

for (i = 0; i < 10; i++)
	if (less_than_one_in_ten_times)
		do_something_that_needs_a_prologue;

or

for (i = 0; i < 10; i++)
	if (whatever)
		do_something_that_needs_a_prologue_and_does_not_return;

or whatever other situation.  We do not have natural loops, often.  The
algorithm places prologues so that their dynamic execution frequency is
optimal, which results in their dynamic execution being optimal, whatever
the CFG looks like.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-06-08  9:18   ` Bernd Schmidt
@ 2016-09-09 18:41     ` Jeff Law
  2016-09-09 20:56       ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-09-09 18:41 UTC (permalink / raw)
  To: Bernd Schmidt, Segher Boessenkool, gcc-patches; +Cc: dje.gcc

On 06/08/2016 03:18 AM, Bernd Schmidt wrote:
> On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
>> +      /* regrename creates wrong code for exception handling, if used
>> together
>> +         with separate shrink-wrapping.  Disable for now, until we have
>> +     figured out what exactly is going on.  */
>
> That needs to be figured out now or it'll be there forever.
I suspect it's related to liveness computations getting out-of-wack with 
separate shrink wrapping.  If that's the case, then the question in my 
mind is how painful is this going to be to fix in the df scanning code.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-09 18:41     ` Jeff Law
@ 2016-09-09 20:56       ` Segher Boessenkool
  2016-09-09 23:12         ` Jeff Law
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-09 20:56 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Fri, Sep 09, 2016 at 12:31:31PM -0600, Jeff Law wrote:
> On 06/08/2016 03:18 AM, Bernd Schmidt wrote:
> >On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
> >>+      /* regrename creates wrong code for exception handling, if used
> >>together
> >>+         with separate shrink-wrapping.  Disable for now, until we have
> >>+     figured out what exactly is going on.  */
> >
> >That needs to be figured out now or it'll be there forever.
> I suspect it's related to liveness computations getting out-of-wack with 
> separate shrink wrapping.  If that's the case, then the question in my 
> mind is how painful is this going to be to fix in the df scanning code.

I haven't been able to pin-point the failure.  It happens for just a few
huge testcases.  So I am hoping someone who understands regrename will
figure it out.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-09 20:56       ` Segher Boessenkool
@ 2016-09-09 23:12         ` Jeff Law
  2016-09-10  6:59           ` Segher Boessenkool
  0 siblings, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-09-09 23:12 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/09/2016 02:41 PM, Segher Boessenkool wrote:
> On Fri, Sep 09, 2016 at 12:31:31PM -0600, Jeff Law wrote:
>> On 06/08/2016 03:18 AM, Bernd Schmidt wrote:
>>> On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
>>>> +      /* regrename creates wrong code for exception handling, if used
>>>> together
>>>> +         with separate shrink-wrapping.  Disable for now, until we have
>>>> +     figured out what exactly is going on.  */
>>>
>>> That needs to be figured out now or it'll be there forever.
>> I suspect it's related to liveness computations getting out-of-wack with
>> separate shrink wrapping.  If that's the case, then the question in my
>> mind is how painful is this going to be to fix in the df scanning code.
>
> I haven't been able to pin-point the failure.  It happens for just a few
> huge testcases.  So I am hoping someone who understands regrename will
> figure it out.
I think that's likely going to fall onto you :-)  We don't generally 
allow passes to just get disabled because some other pass causes them to 
generate the wrong code.

Though I'm curious how you triggered this -- regrename isn't enabled by 
default (except for that brief window earlier this year...).

Jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-09 23:12         ` Jeff Law
@ 2016-09-10  6:59           ` Segher Boessenkool
  2016-09-12 16:36             ` Jeff Law
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-10  6:59 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Fri, Sep 09, 2016 at 05:01:19PM -0600, Jeff Law wrote:
> On 09/09/2016 02:41 PM, Segher Boessenkool wrote:
> >On Fri, Sep 09, 2016 at 12:31:31PM -0600, Jeff Law wrote:
> >>On 06/08/2016 03:18 AM, Bernd Schmidt wrote:
> >>>On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
> >>>>+      /* regrename creates wrong code for exception handling, if used
> >>>>together
> >>>>+         with separate shrink-wrapping.  Disable for now, until we have
> >>>>+     figured out what exactly is going on.  */
> >>>
> >>>That needs to be figured out now or it'll be there forever.
> >>I suspect it's related to liveness computations getting out-of-wack with
> >>separate shrink wrapping.  If that's the case, then the question in my
> >>mind is how painful is this going to be to fix in the df scanning code.
> >
> >I haven't been able to pin-point the failure.  It happens for just a few
> >huge testcases.  So I am hoping someone who understands regrename will
> >figure it out.
> I think that's likely going to fall onto you :-)  We don't generally 
> allow passes to just get disabled because some other pass causes them to 
> generate the wrong code.

We can instead declare that anyone enabling regrename is on his own?
I like that plan better.

I can also make the compiler error out if you try to have both separate
shrink-wrapping and regrename on at the same time.

There are no happy answers :-(

> Though I'm curious how you triggered this -- regrename isn't enabled by 
> default (except for that brief window earlier this year...).

Yes, these patches are that old now already.  I also enabled regrename
by default for quite a while, for testing purposes, since at the time
it looked like regrename would be on by default for most architectures.
Separate shrink-wrapping is supposed to be useful for all ports, not
just for Power.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-10  6:59           ` Segher Boessenkool
@ 2016-09-12 16:36             ` Jeff Law
       [not found]               ` <CAGWvny=fHHZtKF4_D2098+3PTPPzxtg3EjKDWHyJwUxz8g_tEA@mail.gmail.com>
  2016-09-14 13:08               ` Segher Boessenkool
  0 siblings, 2 replies; 60+ messages in thread
From: Jeff Law @ 2016-09-12 16:36 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/10/2016 12:40 AM, Segher Boessenkool wrote:
> On Fri, Sep 09, 2016 at 05:01:19PM -0600, Jeff Law wrote:
>> On 09/09/2016 02:41 PM, Segher Boessenkool wrote:
>>> On Fri, Sep 09, 2016 at 12:31:31PM -0600, Jeff Law wrote:
>>>> On 06/08/2016 03:18 AM, Bernd Schmidt wrote:
>>>>> On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
>>>>>> +      /* regrename creates wrong code for exception handling, if used
>>>>>> together
>>>>>> +         with separate shrink-wrapping.  Disable for now, until we have
>>>>>> +     figured out what exactly is going on.  */
>>>>>
>>>>> That needs to be figured out now or it'll be there forever.
>>>> I suspect it's related to liveness computations getting out-of-wack with
>>>> separate shrink wrapping.  If that's the case, then the question in my
>>>> mind is how painful is this going to be to fix in the df scanning code.
>>>
>>> I haven't been able to pin-point the failure.  It happens for just a few
>>> huge testcases.  So I am hoping someone who understands regrename will
>>> figure it out.
>> I think that's likely going to fall onto you :-)  We don't generally
>> allow passes to just get disabled because some other pass causes them to
>> generate the wrong code.
>
> We can instead declare that anyone enabling regrename is on his own?
> I like that plan better.
No.  I won't ack that.  It seems totally backwards.

>
> I can also make the compiler error out if you try to have both separate
> shrink-wrapping and regrename on at the same time.
>
> There are no happy answers :-(
The answer is to debug the problem, and yes it may be painful.  I'd 
probably start by fixing the dataflow issues and see if that fixes the 
regrename thing as a side effect.

>
>> Though I'm curious how you triggered this -- regrename isn't enabled by
>> default (except for that brief window earlier this year...).
>
> Yes, these patches are that old now already.  I also enabled regrename
> by default for quite a while, for testing purposes, since at the time
> it looked like regrename would be on by default for most architectures.
> Separate shrink-wrapping is supposed to be useful for all ports, not
> just for Power.
I think enabling regrename by default is the right thing to do long term.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
       [not found]                   ` <CAGWvnykQ3oz0UpcF6U1WYivbJww65h2EH5n3FocQ8JGY9hrOrA@mail.gmail.com>
@ 2016-09-12 17:04                     ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-09-12 17:04 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Segher Boessenkool, GCC Patches, Bernd Schmidt

On 09/12/2016 10:54 AM, David Edelsohn wrote:
> On Sep 12, 2016 17:33, "Jeff Law" <law@redhat.com
> <mailto:law@redhat.com>> wrote:
>>
>> On 09/10/2016 12:40 AM, Segher Boessenkool wrote:
>>>
>>> On Fri, Sep 09, 2016 at 05:01:19PM -0600, Jeff Law wrote:
>>>>
>>>> On 09/09/2016 02:41 PM, Segher Boessenkool wrote:
>>>>>
>>>>> On Fri, Sep 09, 2016 at 12:31:31PM -0600, Jeff Law wrote:
>>>>>>
>>>>>> On 06/08/2016 03:18 AM, Bernd Schmidt wrote:
>>>>>>>
>>>>>>> On 06/08/2016 03:47 AM, Segher Boessenkool wrote:
>>>>>>>>
>>>>>>>> +      /* regrename creates wrong code for exception handling,
> if used
>>>>>>>> together
>>>>>>>> +         with separate shrink-wrapping.  Disable for now, until
> we have
>>>>>>>> +     figured out what exactly is going on.  */
>>>>>>>
>>>>>>>
>>>>>>> That needs to be figured out now or it'll be there forever.
>>>>>>
>>>>>> I suspect it's related to liveness computations getting
> out-of-wack with
>>>>>> separate shrink wrapping.  If that's the case, then the question in my
>>>>>> mind is how painful is this going to be to fix in the df scanning
> code.
>>>>>
>>>>>
>>>>> I haven't been able to pin-point the failure.  It happens for just
> a few
>>>>> huge testcases.  So I am hoping someone who understands regrename will
>>>>> figure it out.
>>>>
>>>> I think that's likely going to fall onto you :-)  We don't generally
>>>> allow passes to just get disabled because some other pass causes them to
>>>> generate the wrong code.
>>>
>>>
>>> We can instead declare that anyone enabling regrename is on his own?
>>> I like that plan better.
>>
>> No.  I won't ack that.  It seems totally backwards.
>>
>>
>>>
>>> I can also make the compiler error out if you try to have both separate
>>> shrink-wrapping and regrename on at the same time.
>>>
>>> There are no happy answers :-(
>>
>> The answer is to debug the problem, and yes it may be painful.  I'd
> probably start by fixing the dataflow issues and see if that fixes the
> regrename thing as a side effect.
>>
>>
>>>
>>>> Though I'm curious how you triggered this -- regrename isn't enabled by
>>>> default (except for that brief window earlier this year...).
>>>
>>>
>>> Yes, these patches are that old now already.  I also enabled regrename
>>> by default for quite a while, for testing purposes, since at the time
>>> it looked like regrename would be on by default for most architectures.
>>> Separate shrink-wrapping is supposed to be useful for all ports, not
>>> just for Power.
>>
>> I think enabling regrename by default is the right thing to do long term.
>
> Jeff,
>
> The current regrename pass is horrible. Badly designed, badly written
> and ineffective. You are the only person who likes it.
?!?  I don't like or dislike it.  I think it solves a well known problem 
for processors with particular pipeline characteristics.  No more, no less.


>
> There are plenty of gcc passes that don't work well together. If segher
> can work around it easily, okay. This series of patches should not be
> held up because of a bug in a badly written pass that is not enabled by
> default and only found by segher being extra diligent.
It's not a question of not working well together.   It's a case where 
the two in combination generate incorrect code and we don't know if it's 
a regrename issue or an issue with Segher's code.  I have some 
suspicions here, but until Segher does the analysis, I can't ack the patch.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-12 16:36             ` Jeff Law
       [not found]               ` <CAGWvny=fHHZtKF4_D2098+3PTPPzxtg3EjKDWHyJwUxz8g_tEA@mail.gmail.com>
@ 2016-09-14 13:08               ` Segher Boessenkool
  2016-09-14 13:18                 ` Bernd Schmidt
  2016-09-14 18:21                 ` Jeff Law
  1 sibling, 2 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 13:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Mon, Sep 12, 2016 at 10:33:22AM -0600, Jeff Law wrote:
> The answer is to debug the problem, and yes it may be painful.

Yes, the three weeks pretty much full-time I spent on that already attest
to that.

> I'd 
> probably start by fixing the dataflow issues and see if that fixes the 
> regrename thing as a side effect.

Have you seen https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00091.html ?

> I think enabling regrename by default is the right thing to do long term.

Then rs6000 (and many other ports probably) will just turn it off again.
It is actively harmful.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 13:08               ` Segher Boessenkool
@ 2016-09-14 13:18                 ` Bernd Schmidt
  2016-09-14 14:01                   ` Segher Boessenkool
  2016-09-14 18:21                 ` Jeff Law
  1 sibling, 1 reply; 60+ messages in thread
From: Bernd Schmidt @ 2016-09-14 13:18 UTC (permalink / raw)
  To: Segher Boessenkool, Jeff Law; +Cc: gcc-patches, dje.gcc

On 09/14/2016 03:04 PM, Segher Boessenkool wrote:
> Then rs6000 (and many other ports probably) will just turn it off again.
> It is actively harmful.

The data that was posted showed a code size decrease on a number of 
targets. I'm really not sure where this irrational hate for regrename 
comes from.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 13:18                 ` Bernd Schmidt
@ 2016-09-14 14:01                   ` Segher Boessenkool
  2016-09-14 14:54                     ` Bernd Schmidt
  2016-09-14 17:55                     ` Jeff Law
  0 siblings, 2 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 14:01 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, gcc-patches, dje.gcc

On Wed, Sep 14, 2016 at 03:08:21PM +0200, Bernd Schmidt wrote:
> On 09/14/2016 03:04 PM, Segher Boessenkool wrote:
> >Then rs6000 (and many other ports probably) will just turn it off again.
> >It is actively harmful.
> 
> The data that was posted showed a code size decrease on a number of 
> targets. I'm really not sure where this irrational hate for regrename 
> comes from.

It increases the number of active, "young" registers per thread.

There is no irrational hate.  Regrename is simply a de-optimisation on
some (heavily) out-of-order targets.  Not by much, but not a win either.
We would rather do without it, on current CPUs at least (and I doubt this
will change soon, but we'll see).

(It also makes the generated code much harder to read, but you know that).


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 14:01                   ` Segher Boessenkool
@ 2016-09-14 14:54                     ` Bernd Schmidt
  2016-09-14 16:33                       ` Segher Boessenkool
  2016-09-14 19:10                       ` Jeff Law
  2016-09-14 17:55                     ` Jeff Law
  1 sibling, 2 replies; 60+ messages in thread
From: Bernd Schmidt @ 2016-09-14 14:54 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, gcc-patches, dje.gcc

On 09/14/2016 03:55 PM, Segher Boessenkool wrote:
> On Wed, Sep 14, 2016 at 03:08:21PM +0200, Bernd Schmidt wrote:
>> The data that was posted showed a code size decrease on a number of
>> targets. I'm really not sure where this irrational hate for regrename
>> comes from.
>
> It increases the number of active, "young" registers per thread.
>
> There is no irrational hate.  Regrename is simply a de-optimisation on
> some (heavily) out-of-order targets.

Can you point me at a processor manual for such a chip that explains why 
this would be the case?

> (It also makes the generated code much harder to read, but you know that).

Can't say I do, really. I imagine it could be the case if it enables 
more aggressive scheduling but that's kind of one of the intended effects.


Bernd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 14:54                     ` Bernd Schmidt
@ 2016-09-14 16:33                       ` Segher Boessenkool
  2016-09-14 19:10                       ` Jeff Law
  1 sibling, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 16:33 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, gcc-patches, dje.gcc

Hi Bernd,

On Wed, Sep 14, 2016 at 04:01:34PM +0200, Bernd Schmidt wrote:
> On 09/14/2016 03:55 PM, Segher Boessenkool wrote:
> >On Wed, Sep 14, 2016 at 03:08:21PM +0200, Bernd Schmidt wrote:
> >>The data that was posted showed a code size decrease on a number of
> >>targets. I'm really not sure where this irrational hate for regrename
> >>comes from.
> >
> >It increases the number of active, "young" registers per thread.
> >
> >There is no irrational hate.  Regrename is simply a de-optimisation on
> >some (heavily) out-of-order targets.
> 
> Can you point me at a processor manual for such a chip that explains why 
> this would be the case?

The POWER8 UM explains it a bit.  I don't have an URL, everything is behind
a registration wall it seems.

> >(It also makes the generated code much harder to read, but you know that).
> 
> Can't say I do, really. I imagine it could be the case if it enables 
> more aggressive scheduling but that's kind of one of the intended effects.

We have 32 int regs, 32 float regs, 32 vec regs, 8 cond regs.  It becomes
really hard to read if things are pretty much randomly jumbled about, as
regrename does.

The dump files are almost impossible to read btw, regrename prints the
assembler name of registers only (not the RTL name), and the assembler name
of all of GPRn, FPRn, VRn, CRn is "n".  All (?) other passes print the RTL
register number (or both).


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 14:01                   ` Segher Boessenkool
  2016-09-14 14:54                     ` Bernd Schmidt
@ 2016-09-14 17:55                     ` Jeff Law
  2016-09-14 19:13                       ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-09-14 17:55 UTC (permalink / raw)
  To: Segher Boessenkool, Bernd Schmidt; +Cc: gcc-patches, dje.gcc

On 09/14/2016 07:55 AM, Segher Boessenkool wrote:
> On Wed, Sep 14, 2016 at 03:08:21PM +0200, Bernd Schmidt wrote:
>> On 09/14/2016 03:04 PM, Segher Boessenkool wrote:
>>> Then rs6000 (and many other ports probably) will just turn it off again.
>>> It is actively harmful.
>>
>> The data that was posted showed a code size decrease on a number of
>> targets. I'm really not sure where this irrational hate for regrename
>> comes from.
>
> It increases the number of active, "young" registers per thread.
Yea, it'll certainly do that and I can imagine a design which would have 
that property.  And there's other designs which benefit from spreading 
across the register file as much as possible.

Which argues there needs to be a way to tune or disable the pass.

Jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 13:08               ` Segher Boessenkool
  2016-09-14 13:18                 ` Bernd Schmidt
@ 2016-09-14 18:21                 ` Jeff Law
  2016-09-14 19:13                   ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-09-14 18:21 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/14/2016 07:04 AM, Segher Boessenkool wrote:
>> I'd
>> probably start by fixing the dataflow issues and see if that fixes the
>> regrename thing as a side effect.
>
> Have you seen https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00091.html ?
I missed it.  My interpretation....

The uses at each "strange" exit fixing the first issue with 
shrink-wrapping definitely sounds like we've got a dataflow problem of 
some sort.

If you think about it, conceptually we want the return insn to make the 
callee saved registers "used" so that DCE, regrename and friends don't 
muck with them.  The fact that we don't is as much never having to care 
all that much until recently.

I continue to wonder if we could add something similar to 
CALL_INSN_FUNCTION_USAGE where we attach uses for all the call-saved 
registers onto a return insn.  We would attach them at the end of 
prologue/epilogue generation and only attach those where were live 
somewhere in the code.

By deferring that step until after prologue/epilogue generation we 
shouldn't cause unnecessary register saves/restores.

I pondered just doing it for the separately wrapped components on that 
particular path, but I've yet to convince myself that's actually correct.

Bernd knows the regrename code better than anyone.  Is there any way the 
two of you could work together to try and track down what's going on in 
the hash_map_rand case?  Even throwing in some more debug stuff might 
help narrow things down since it's renaming something to a non-volatile, 
non-separately shrink wrapped register that's causing problems.

>> I think enabling regrename by default is the right thing to do long term.
>
> Then rs6000 (and many other ports probably) will just turn it off again.
> It is actively harmful.
I think you're over stating things here.

Can we agree that there's a set of targets that will improve and a set 
that are harmed? And that to enable regrename by default we need to 
either better describe the pipeline characteristics we're optimizing for 
or a well defined way for targets to turn it off?


jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 14:54                     ` Bernd Schmidt
  2016-09-14 16:33                       ` Segher Boessenkool
@ 2016-09-14 19:10                       ` Jeff Law
  1 sibling, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-09-14 19:10 UTC (permalink / raw)
  To: Bernd Schmidt, Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 09/14/2016 08:01 AM, Bernd Schmidt wrote:
> On 09/14/2016 03:55 PM, Segher Boessenkool wrote:
>> On Wed, Sep 14, 2016 at 03:08:21PM +0200, Bernd Schmidt wrote:
>>> The data that was posted showed a code size decrease on a number of
>>> targets. I'm really not sure where this irrational hate for regrename
>>> comes from.
>>
>> It increases the number of active, "young" registers per thread.
>>
>> There is no irrational hate.  Regrename is simply a de-optimisation on
>> some (heavily) out-of-order targets.
>
> Can you point me at a processor manual for such a chip that explains why
> this would be the case?
Ponder a processor where the cost to access a register is non-uniform 
and related to how long ago a particular register was used.   This can 
happen when the actual hardware register file is much larger than the 
exposed register file (to support hardware renaming, hyperthreading, 
partitioning, etc).

>  I imagine it could be the case if it enables
> more aggressive scheduling but that's kind of one of the intended effects.
Exactly.


jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 17:55                     ` Jeff Law
@ 2016-09-14 19:13                       ` Segher Boessenkool
  2016-09-14 19:36                         ` Jeff Law
  0 siblings, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 19:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Wed, Sep 14, 2016 at 11:52:02AM -0600, Jeff Law wrote:
> Yea, it'll certainly do that and I can imagine a design which would have 
> that property.  And there's other designs which benefit from spreading 
> across the register file as much as possible.
> 
> Which argues there needs to be a way to tune or disable the pass.

Yes, some targets want it, and some don't.  It seems to me that targets
that use sched1 have much less benefit from regrename.  Of course enabling
sched1 has its own problems.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 18:21                 ` Jeff Law
@ 2016-09-14 19:13                   ` Segher Boessenkool
  2016-09-14 19:38                     ` Jeff Law
  2016-09-14 20:04                     ` Jeff Law
  0 siblings, 2 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 19:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Wed, Sep 14, 2016 at 12:11:50PM -0600, Jeff Law wrote:
> On 09/14/2016 07:04 AM, Segher Boessenkool wrote:
> >>I'd
> >>probably start by fixing the dataflow issues and see if that fixes the
> >>regrename thing as a side effect.
> >
> >Have you seen https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00091.html ?
> I missed it.

Yeah thought so, too much stuff in flight here.

> My interpretation....
> 
> The uses at each "strange" exit fixing the first issue with 
> shrink-wrapping definitely sounds like we've got a dataflow problem of 
> some sort.
> 
> If you think about it, conceptually we want the return insn to make the 
> callee saved registers "used" so that DCE, regrename and friends don't 
> muck with them.  The fact that we don't is as much never having to care 
> all that much until recently.

(There is no return insn at those exits; these are exits *without*
successor block, not the exit block).

It is puzzling to me why adding USEs for just the registers that *are*
separately shrink-wrapped does not work; only all those that *could* be
shrink-wrapped does.  Do you have any idea about that?

> I continue to wonder if we could add something similar to 
> CALL_INSN_FUNCTION_USAGE where we attach uses for all the call-saved 
> registers onto a return insn.  We would attach them at the end of 
> prologue/epilogue generation and only attach those where were live 
> somewhere in the code.

Maybe adding the new insns to the {pro,epi}logue_insn_hash will help
something.  Or maybe it will blow up spectacularly.  Will know in a
bit.

> By deferring that step until after prologue/epilogue generation we 
> shouldn't cause unnecessary register saves/restores.

Hrm.  I'll see about that as well.

> I pondered just doing it for the separately wrapped components on that 
> particular path, but I've yet to convince myself that's actually correct.

If that is not correct, how is the status quo correct?  That is what
puzzles me above, too.

> Bernd knows the regrename code better than anyone.  Is there any way the 
> two of you could work together to try and track down what's going on in 
> the hash_map_rand case?  Even throwing in some more debug stuff might 
> help narrow things down since it's renaming something to a non-volatile, 
> non-separately shrink wrapped register that's causing problems.

Okay with me, I could certainly use his help.  I'll try the above things
first though, so not before friday.

> Can we agree that there's a set of targets that will improve and a set 
> that are harmed? And that to enable regrename by default we need to 
> either better describe the pipeline characteristics we're optimizing for 
> or a well defined way for targets to turn it off?

There is a well-defined way to turn it off, via common/config/*/*-common.c ,
TARGET_OPTION_OPTIMIZATION_TABLE.  We disagree on whether most targets will
want it enabled, i.e. whether it should (eventually) be enabled by default.


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 19:13                       ` Segher Boessenkool
@ 2016-09-14 19:36                         ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-09-14 19:36 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/14/2016 01:10 PM, Segher Boessenkool wrote:
> On Wed, Sep 14, 2016 at 11:52:02AM -0600, Jeff Law wrote:
>> Yea, it'll certainly do that and I can imagine a design which would have
>> that property.  And there's other designs which benefit from spreading
>> across the register file as much as possible.
>>
>> Which argues there needs to be a way to tune or disable the pass.
>
> Yes, some targets want it, and some don't.  It seems to me that targets
> that use sched1 have much less benefit from regrename.  Of course enabling
> sched1 has its own problems.
I think that sched1 would decrease the benefit of reg renaming, but 
wouldn't totally eliminate the desire to avoid anti dependencies created 
by register allocation.

jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 19:13                   ` Segher Boessenkool
@ 2016-09-14 19:38                     ` Jeff Law
  2016-09-14 22:34                       ` Segher Boessenkool
  2016-09-19 17:11                       ` Segher Boessenkool
  2016-09-14 20:04                     ` Jeff Law
  1 sibling, 2 replies; 60+ messages in thread
From: Jeff Law @ 2016-09-14 19:38 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/14/2016 01:03 PM, Segher Boessenkool wrote:
>> If you think about it, conceptually we want the return insn to make the
>> callee saved registers "used" so that DCE, regrename and friends don't
>> muck with them.  The fact that we don't is as much never having to care
>> all that much until recently.
>
> (There is no return insn at those exits; these are exits *without*
> successor block, not the exit block).
Ugh.  Anywhere we could attach this stuff in the insn chain?  If not, 
the DF side of this problem gets uglier.

>
> It is puzzling to me why adding USEs for just the registers that *are*
> separately shrink-wrapped does not work; only all those that *could* be
> shrink-wrapped does.  Do you have any idea about that?
Nope.  No clue on that one.  It almost seems like a book-keeping error 
somewhere.

>
>> I continue to wonder if we could add something similar to
>> CALL_INSN_FUNCTION_USAGE where we attach uses for all the call-saved
>> registers onto a return insn.  We would attach them at the end of
>> prologue/epilogue generation and only attach those where were live
>> somewhere in the code.
>
> Maybe adding the new insns to the {pro,epi}logue_insn_hash will help
> something.  Or maybe it will blow up spectacularly.  Will know in a
> bit.
I think that'll resolve the sel-sched issue, but I doubt the rest.

>
>> I pondered just doing it for the separately wrapped components on that
>> particular path, but I've yet to convince myself that's actually correct.
>
> If that is not correct, how is the status quo correct?  That is what
> puzzles me above, too.
I couldn't convince myself that we could trivially find all the 
separately wrapped components on a particular path -- ie, we may have 
had components saved/restored in some sub-graph.  If an exit point from 
the function is reachable from that sub-graph, then we need to make sure 
any components from the subgraph are marked as live in addition to those 
which are restored on the exit path.

While it is just a reachability problem, I don't think we need to solve 
it if we mark anything that was separately shrink wrapped as live at all 
the exit points.


>
> There is a well-defined way to turn it off, via common/config/*/*-common.c ,
> TARGET_OPTION_OPTIMIZATION_TABLE.  We disagree on whether most targets will
> want it enabled, i.e. whether it should (eventually) be enabled by default.
I'm aware of TARGET_OPTION_OPTIMIZATION_TABLE -- I was thinking more 
along the lines of trying to describe the conditions under which it is 
not profitable and expose that.  TARGET_OPTION_OPTIMIZATION_TABLE is a 
lame way to attack this problem.

Jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 19:13                   ` Segher Boessenkool
  2016-09-14 19:38                     ` Jeff Law
@ 2016-09-14 20:04                     ` Jeff Law
  2016-09-14 22:51                       ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Jeff Law @ 2016-09-14 20:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/14/2016 01:03 PM, Segher Boessenkool wrote:
>
> (There is no return insn at those exits; these are exits *without*
> successor block, not the exit block).
Hmm, I thought these were return blocks, but you're saying they're 
no-return blocks?  I missed that.

In that case, aren't the restores dead because no callers can observe 
their value changing unexpectedly?


Jeff


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 19:38                     ` Jeff Law
@ 2016-09-14 22:34                       ` Segher Boessenkool
  2016-09-15 17:28                         ` Jeff Law
  2016-09-19 17:11                       ` Segher Boessenkool
  1 sibling, 1 reply; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 22:34 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Wed, Sep 14, 2016 at 01:33:04PM -0600, Jeff Law wrote:
> On 09/14/2016 01:03 PM, Segher Boessenkool wrote:
> >>If you think about it, conceptually we want the return insn to make the
> >>callee saved registers "used" so that DCE, regrename and friends don't
> >>muck with them.  The fact that we don't is as much never having to care
> >>all that much until recently.
> >
> >(There is no return insn at those exits; these are exits *without*
> >successor block, not the exit block).
> Ugh.  Anywhere we could attach this stuff in the insn chain?  If not, 
> the DF side of this problem gets uglier.

I put the USEs at the start of that noreturn basic block.

> >It is puzzling to me why adding USEs for just the registers that *are*
> >separately shrink-wrapped does not work; only all those that *could* be
> >shrink-wrapped does.  Do you have any idea about that?
> Nope.  No clue on that one.  It almost seems like a book-keeping error 
> somewhere.

Right.

> >>I pondered just doing it for the separately wrapped components on that
> >>particular path, but I've yet to convince myself that's actually correct.
> >
> >If that is not correct, how is the status quo correct?  That is what
> >puzzles me above, too.
> I couldn't convince myself that we could trivially find all the 
> separately wrapped components on a particular path -- ie, we may have 
> had components saved/restored in some sub-graph.  If an exit point from 
> the function is reachable from that sub-graph, then we need to make sure 
> any components from the subgraph are marked as live in addition to those 
> which are restored on the exit path.
> 
> While it is just a reachability problem, I don't think we need to solve 
> it if we mark anything that was separately shrink wrapped as live at all 
> the exit points.

Agreed, but why does it work if not separately shrink-wrapping anything?
And why does it break on things that are *not* separately wrapped *anywhere*?

> >There is a well-defined way to turn it off, via common/config/*/*-common.c 
> >,
> >TARGET_OPTION_OPTIMIZATION_TABLE.  We disagree on whether most targets will
> >want it enabled, i.e. whether it should (eventually) be enabled by default.
> I'm aware of TARGET_OPTION_OPTIMIZATION_TABLE -- I was thinking more 
> along the lines of trying to describe the conditions under which it is 
> not profitable and expose that.  TARGET_OPTION_OPTIMIZATION_TABLE is a 
> lame way to attack this problem.

Better than a target macro ;-)


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 20:04                     ` Jeff Law
@ 2016-09-14 22:51                       ` Segher Boessenkool
  0 siblings, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-14 22:51 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On Wed, Sep 14, 2016 at 01:35:57PM -0600, Jeff Law wrote:
> On 09/14/2016 01:03 PM, Segher Boessenkool wrote:
> >
> >(There is no return insn at those exits; these are exits *without*
> >successor block, not the exit block).
> Hmm, I thought these were return blocks, but you're saying they're 
> no-return blocks?  I missed that.
> 
> In that case, aren't the restores dead because no callers can observe 
> their value changing unexpectedly?

Yes, but DCE can not remove the insns because dwarf2cfi would throw a
fit (and for good reason).  That is why adding all these USEs makes
the DCE patch unnecessary (that patch simply disallows deleting insns
with a REG_CFA_RESTORE note, much simpler than all that USE surgery,
and perfectly correct and pretty much the best we can do).

The problem is that regrename decides to rename a (volatile) register to
some non-volatile register that is *not* separately shrink-wrapped (and
not actually dead there).  Why would it do that, why would it not do that
if not separately shrink-wrapping at all?

(I did run this with checking=yes,rtl btw, it is not simple corruption).


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 22:34                       ` Segher Boessenkool
@ 2016-09-15 17:28                         ` Jeff Law
  0 siblings, 0 replies; 60+ messages in thread
From: Jeff Law @ 2016-09-15 17:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

On 09/14/2016 04:11 PM, Segher Boessenkool wrote:
> On Wed, Sep 14, 2016 at 01:33:04PM -0600, Jeff Law wrote:
>> On 09/14/2016 01:03 PM, Segher Boessenkool wrote:
>>>> If you think about it, conceptually we want the return insn to make the
>>>> callee saved registers "used" so that DCE, regrename and friends don't
>>>> muck with them.  The fact that we don't is as much never having to care
>>>> all that much until recently.
>>>
>>> (There is no return insn at those exits; these are exits *without*
>>> successor block, not the exit block).
>> Ugh.  Anywhere we could attach this stuff in the insn chain?  If not,
>> the DF side of this problem gets uglier.
>
> I put the USEs at the start of that noreturn basic block.
Just naked USEs of the REG?  For some reason I was uneasy about this, 
but I can't recall why, maybe I just latched onto 
CALL_INSN_FUNCTION_USAGE and wanted to use the same model.

Seems like we should just go with the naked USE of the REGs.


>> While it is just a reachability problem, I don't think we need to solve
>> it if we mark anything that was separately shrink wrapped as live at all
>> the exit points.
>
> Agreed, but why does it work if not separately shrink-wrapping anything?
> And why does it break on things that are *not* separately wrapped *anywhere*?
That I don't know...  It's a hell of a mystery.

Jeff

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
  2016-09-14 19:38                     ` Jeff Law
  2016-09-14 22:34                       ` Segher Boessenkool
@ 2016-09-19 17:11                       ` Segher Boessenkool
  1 sibling, 0 replies; 60+ messages in thread
From: Segher Boessenkool @ 2016-09-19 17:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches, dje.gcc

Hi Jeff,

On Wed, Sep 14, 2016 at 01:33:04PM -0600, Jeff Law wrote:
> On 09/14/2016 01:03 PM, Segher Boessenkool wrote:
> >>If you think about it, conceptually we want the return insn to make the
> >>callee saved registers "used" so that DCE, regrename and friends don't
> >>muck with them.  The fact that we don't is as much never having to care
> >>all that much until recently.
> >
> >(There is no return insn at those exits; these are exits *without*
> >successor block, not the exit block).
> Ugh.  Anywhere we could attach this stuff in the insn chain?  If not, 
> the DF side of this problem gets uglier.

> >>I continue to wonder if we could add something similar to
> >>CALL_INSN_FUNCTION_USAGE where we attach uses for all the call-saved
> >>registers onto a return insn.  We would attach them at the end of
> >>prologue/epilogue generation and only attach those where were live
> >>somewhere in the code.
> >
> >Maybe adding the new insns to the {pro,epi}logue_insn_hash will help
> >something.  Or maybe it will blow up spectacularly.  Will know in a
> >bit.
> I think that'll resolve the sel-sched issue, but I doubt the rest.

Adding all the separately shrink-wrapped *logues to the hashes does not
help at all.

Trying to fix DF did not work either.

> >>I pondered just doing it for the separately wrapped components on that
> >>particular path, but I've yet to convince myself that's actually correct.
> >
> >If that is not correct, how is the status quo correct?  That is what
> >puzzles me above, too.
> I couldn't convince myself that we could trivially find all the 
> separately wrapped components on a particular path -- ie, we may have 
> had components saved/restored in some sub-graph.  If an exit point from 
> the function is reachable from that sub-graph, then we need to make sure 
> any components from the subgraph are marked as live in addition to those 
> which are restored on the exit path.
> 
> While it is just a reachability problem, I don't think we need to solve 
> it if we mark anything that was separately shrink wrapped as live at all 
> the exit points.

If I mark every block with no successors as needing all components, various
patches are no longer needed.  It also should not hurt performance much
(with the possible exception of sibling calls, which can be executed often;
still have to do performance runs).

The dce and cprop patches and the first regrename patch are still needed.
Those three are simple and obviously correct patches though.

I'll walk through all your comments again, updating the patches.  Will
send a v3 soon.

Thanks for all the reviews and helpful comments,


Segher

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2016-09-19 16:56 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-08  1:48 [PATCH 0/9] separate shrink-wrapping Segher Boessenkool
2016-06-08  1:48 ` [PATCH 3/9] dce: Don't dead-code delete separately wrapped restores Segher Boessenkool
2016-06-08  1:48 ` [PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc Segher Boessenkool
2016-06-08  1:48 ` [PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate Segher Boessenkool
2016-06-08  1:53 ` [PATCH 4/9] regrename: Don't rename restores Segher Boessenkool
2016-06-08  1:53 ` [PATCH 6/9] sel-sched: Don't mess with register restores Segher Boessenkool
2016-06-08  1:54 ` [PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped Segher Boessenkool
2016-06-08  9:18   ` Bernd Schmidt
2016-09-09 18:41     ` Jeff Law
2016-09-09 20:56       ` Segher Boessenkool
2016-09-09 23:12         ` Jeff Law
2016-09-10  6:59           ` Segher Boessenkool
2016-09-12 16:36             ` Jeff Law
     [not found]               ` <CAGWvny=fHHZtKF4_D2098+3PTPPzxtg3EjKDWHyJwUxz8g_tEA@mail.gmail.com>
     [not found]                 ` <CAGWvnymZVg_FR_PHqhwkgrAkHDntVMEiG4shfst_GA9OnZKvWg@mail.gmail.com>
     [not found]                   ` <CAGWvnykQ3oz0UpcF6U1WYivbJww65h2EH5n3FocQ8JGY9hrOrA@mail.gmail.com>
2016-09-12 17:04                     ` Jeff Law
2016-09-14 13:08               ` Segher Boessenkool
2016-09-14 13:18                 ` Bernd Schmidt
2016-09-14 14:01                   ` Segher Boessenkool
2016-09-14 14:54                     ` Bernd Schmidt
2016-09-14 16:33                       ` Segher Boessenkool
2016-09-14 19:10                       ` Jeff Law
2016-09-14 17:55                     ` Jeff Law
2016-09-14 19:13                       ` Segher Boessenkool
2016-09-14 19:36                         ` Jeff Law
2016-09-14 18:21                 ` Jeff Law
2016-09-14 19:13                   ` Segher Boessenkool
2016-09-14 19:38                     ` Jeff Law
2016-09-14 22:34                       ` Segher Boessenkool
2016-09-15 17:28                         ` Jeff Law
2016-09-19 17:11                       ` Segher Boessenkool
2016-09-14 20:04                     ` Jeff Law
2016-09-14 22:51                       ` Segher Boessenkool
2016-06-08  1:54 ` [PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone Segher Boessenkool
2016-06-08  2:03 ` [PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns Segher Boessenkool
2016-07-15 12:42   ` Bernd Schmidt
2016-07-18 16:34     ` Segher Boessenkool
2016-07-18 17:03       ` Bernd Schmidt
2016-07-19 14:46         ` Segher Boessenkool
2016-07-19 14:49           ` Bernd Schmidt
2016-07-19 15:35             ` Segher Boessenkool
2016-07-20 11:23               ` Bernd Schmidt
2016-07-20 15:06                 ` Segher Boessenkool
2016-06-08  2:04 ` [PATCH 9/9] rs6000: Separate shrink-wrapping Segher Boessenkool
2016-06-08 11:56 ` [PATCH 0/9] separate shrink-wrapping Bernd Schmidt
2016-06-08 12:45   ` Eric Botcazou
2016-06-08 15:16   ` Segher Boessenkool
2016-06-08 16:43     ` Bernd Schmidt
2016-06-08 17:26       ` Segher Boessenkool
2016-06-29 23:06         ` Bernd Schmidt
2016-06-29 23:24           ` Segher Boessenkool
2016-07-04  8:57             ` Segher Boessenkool
2016-06-14 21:24       ` Segher Boessenkool
2016-07-08 10:42         ` Bernd Schmidt
2016-07-08 12:11           ` Segher Boessenkool
2016-07-08 13:16             ` David Malcolm
2016-07-08 13:45               ` Segher Boessenkool
2016-07-08 14:35                 ` Bill Schmidt
2016-06-09 16:12 ` Jeff Law
2016-06-09 19:57   ` Segher Boessenkool
2016-06-28  0:22 ` PING " Segher Boessenkool
2016-07-07 10:16   ` PING x2 " Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).