public inbox for gcc-patches@gcc.gnu.org
* Re: Inliner heuristics facelift
@ 2011-04-17 12:07 Dominique Dhumieres
  2011-04-17 13:35 ` Jan Hubicka
  2011-04-17 14:29 ` Jan Hubicka
  0 siblings, 2 replies; 9+ messages in thread
From: Dominique Dhumieres @ 2011-04-17 12:07 UTC (permalink / raw)
  To: hubicka; +Cc: richard.guenther, gcc-patches

Honza,

Your patch seems to make --param max-inline-insns-auto= ineffective:

gfc -Ofast -funroll-loops -ftree-loop-linear -fomit-frame-pointer --param max-inline-insns-auto=4000 -fwhole-program -fstack-arrays ac.f90
8.105u 0.005s 0:08.11 99.8%	0+0k 0+5io 0pf+0w

while the timing was ~2.7s with a 125 threshold before the patch.

Cheers,

Dominique

* Inliner heuristics facelift
@ 2011-04-17 10:54 Jan Hubicka
  2011-04-19 18:37 ` Xinliang David Li
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2011-04-17 10:54 UTC (permalink / raw)
  To: gcc-patches

Hi,
The cgraph inliner implementation is 8 years old now.  It has grown from a
simple C-only inliner preventing the unbounded code size explosion seen with
earlier tree inlining heuristics, through "let's solve the abstraction penalty
problem in tramp3d", into "let's do something useful on Mozilla".

In this patch I am attempting to restore some design to the code; the
original design got lost in tons of various additions and in migrations
between different cgraph implementations.

While I think the overall scheme is OK (a way to transform the cgraph and
function bodies kept separate from the heuristics, and the heuristics
implemented as a greedy algorithm doing as much useful inlining as allowed by
the limits), the implementation got clumsy.

The main idea of the patch is to move the various checks scattered across the
code into a can_inline_edge_p predicate that tests whether inlining is
possible at all.  In addition, each of the inliner heuristics (always inline,
flattening, early inlining, inlining of small functions, recursive inlining
and inlining of functions called once) has slightly different rules on when it
thinks inlining is a good idea.  For always inline and flattening this is
trivial.  For the others there are now separate want_* predicates doing all
the checks in one place.

The patch also cleans up the dump files and makes function naming more
regular, especially by dropping the nonsensical cgraph prefixes coming from
the dark times when the cgraph and the inliner were in a single source file.

While working on the patch I corrected a number of minor issues, since that
was easier than imitating the previous behaviour and adding FIXMEs.

In particular

  1) The long standing TODO in estimate_growth concerning miscomputation of
     self recursive functions is gone.  Fixing it has been trivial ever since
     the cgraph code was revamped to implement inline plans with clones.
  2) The small function inliner no longer inserts only functions passing
     cgraph_default_inline_p into the queue.  The feasibility of small
     function inlining is edge specific now and accounts the estimated
     benefits against the --param max-inline-insns-auto and -single limits.
  3) There was a bug in cgraph_check_inline_limits concerning stack usage
     computation for nested functions, probably introduced by my previous
     patch.  It is fixed now.
  4) cgraph_check_inline_limits was specially allowing inlining of a large
     function into a small one by setting the base of the limit to the bigger
     of the caller and callee self sizes.  This is very important since large
     functions often should be inlined into tiny wrappers and benefit from the
     extra context.

     It is however also short sighted: once a large function gets inlined into
     a tiny wrapper, we might later prevent inlining small functions into the
     large function, since in the next iteration the size will be the maximum
     of the tiny wrapper and the small function.

     Fixed by looking for the largest function in the inline stack, and
     symmetrically in the case of the stack size limits.
  5) All of the recursive inlining tests were implemented before callgraph
     edge frequencies were added, and were not updated since then.  We now
     compute the decision based on both counts and frequencies.  It is still
     not doing as well as I would like; I will try to reconsider this later.
  6) Inlining of functions called once was missing opportunities, because at
     inliner time we keep offline copies of extern inline functions and some
     virtual functions in the callgraph for future inlining.  Calls from these
     functions prevent the inliner from noticing that a function is called
     once.  Fixed by removing unreachable functions before functions called
     once are processed.

Bootstrapped/regtested x86_64-linux, also tested on tramp3d to check that it
behaves sanely, and on the nightly tester (frescobaldi).
Will commit it later today.

Honza

	* lto-symtab.c (lto_cgraph_replace_node): When call statement is
	present, also set gimple_call_set_cannot_inline.
	* ipa-inline.c: Update toplevel comment.
	(MAX_TIME): Remove.
	(cgraph_clone_inlined_nodes): Fix linebreaks.
	(cgraph_check_inline_limits): Restructure to ...
	(caller_growth_limits): ... this one; be more tolerant
	on growth in nested inline chains; add explanatory comment;
	fix stack accounting thinko introduced by previous patch.
	(cgraph_default_inline_p): Remove.
	(report_inline_failed_reason): New function.
	(can_inline_edge_p): New function.
	(can_early_inline_edge_p): New function.
	(leaf_node_p): Move upwards in file.
	(want_early_inline_function_p): New function.
	(want_inline_small_function_p): New function.
	(want_inline_self_recursive_call_p): New function.
	(cgraph_edge_badness): Rename to ...
	(edge_badness) ... this one; fix linebreaks.
	(update_edge_key): Update call of edge_badness; add
	detailed dump about queue updates.
	(update_caller_keys): Use can_inline_edge_p and
	want_inline_small_function_p.
	(cgraph_decide_recursive_inlining): Rename to...
	(recursive_inlining): Use can_inline_edge_p and
	want_inline_self_recursive_call_p; simplify and
	remove no longer valid FIXME.
	(cgraph_set_inline_failed): Remove.
	(add_new_edges_to_heap): Use can_inline_edge_p and
	want_inline_small_function_p.
	(cgraph_decide_inlining_of_small_functions): Rename to ...
	(inline_small_functions): ... this one; cleanup; use
	can/want predicates; cleanup debug output; work edges
	till fibheap is exhausted and do not stop once unit
	growth is reached; remove later loop processing remaining
	edges.
	(cgraph_flatten): Rename to ...
	(flatten_function): ... this one; use can_inline_edge_p
	and can_early_inline_edge_p predicates.
	(cgraph_decide_inlining): Rename to ...
	(ipa_inline): ... this one; remove unreachable nodes before
	inlining functions called once; simplify the pass.
	(cgraph_perform_always_inlining): Rename to ...
	(inline_always_inline_functions): ... this one; use
	DECL_DISREGARD_INLINE_LIMITS; use can_inline_edge_p
	predicate
	(cgraph_decide_inlining_incrementally): Rename to ...
	(early_inline_small_functions): ... this one; simplify
	using new predicates; cleanup; make dumps prettier.
	(cgraph_early_inlining): Rename to ...
	(early_inliner): never inline regular functions into always-inlines;
	fix updating of call stmt summaries.
	(pass_early_inline): Update for new names.
	(inline_transform): Fix formatting.
	(gate_cgraph_decide_inlining): Rename to ...
	(pass_ipa_inline): ... this one.
	* ipa-inline.h (inline_summary): Remove disregard_inline_limits.
	* ipa-inline-analysis.c (dump_inline_summary): Update.
	(compute_inline_parameters): Do not compute disregard_inline_limits.
	(estimate_growth): Fix handling of non-trivial self recursion.
	(inline_read_summary): Do not read info->disregard_inline_limits.
	(inline_write_summary): Do not write info->disregard_inline_limits.

	* gcc.dg/winline-5.c: Update testcase.

Index: testsuite/gcc.dg/winline-5.c
===================================================================
*** testsuite/gcc.dg/winline-5.c	(revision 172597)
--- testsuite/gcc.dg/winline-5.c	(working copy)
*************** inline int q(void) /* { dg-warning "inli
*** 15,29 ****
  	big();
  	big();
  }
- inline int q1(void)
- {
- 	big();
- 	big();
- 	big();
- }
  int t (void)
  {
-  /* We allow one inlining over limit.  */
- 	q1();
  	return q ();		 /* { dg-warning "called from here" } */
  }
--- 15,21 ----
Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 172597)
--- ipa-inline.c	(working copy)
*************** along with GCC; see the file COPYING3.  
*** 21,93 ****
  
  /*  Inlining decision heuristics
  
!     We separate inlining decisions from the inliner itself and store it
!     inside callgraph as so called inline plan.  Refer to cgraph.c
!     documentation about particular representation of inline plans in the
!     callgraph.
! 
!     There are three major parts of this file:
! 
!     cgraph_mark_inline_edge implementation
! 
!       This function allows to mark given call inline and performs necessary
!       modifications of cgraph (production of the clones and updating overall
!       statistics)
  
      inlining heuristics limits
  
!       These functions allow to check that particular inlining is allowed
!       by the limits specified by user (allowed function growth, overall unit
!       growth and so on).
  
      inlining heuristics
  
!       This is implementation of IPA pass aiming to get as much of benefit
!       from inlining obeying the limits checked above.
  
!       The implementation of particular heuristics is separated from
!       the rest of code to make it easier to replace it with more complicated
!       implementation in the future.  The rest of inlining code acts as a
!       library aimed to modify the callgraph and verify that the parameters
!       on code size growth fits.
! 
!       To mark given call inline, use cgraph_mark_inline function, the
!       verification is performed by cgraph_default_inline_p and
!       cgraph_check_inline_limits.
! 
!       The heuristics implements simple knapsack style algorithm ordering
!       all functions by their "profitability" (estimated by code size growth)
!       and inlining them in priority order.
! 
!       cgraph_decide_inlining implements heuristics taking whole callgraph
!       into account, while cgraph_decide_inlining_incrementally considers
!       only one function at a time and is used by early inliner.
! 
!    The inliner itself is split into two passes:
! 
!    pass_early_inlining
! 
!      Simple local inlining pass inlining callees into current function.  This
!      pass makes no global whole compilation unit analysis and this when allowed
!      to do inlining expanding code size it might result in unbounded growth of
!      whole unit.
! 
!      The pass is run during conversion into SSA form.  Only functions already
!      converted into SSA form are inlined, so the conversion must happen in
!      topological order on the callgraph (that is maintained by pass manager).
!      The functions after inlining are early optimized so the early inliner sees
!      unoptimized function itself, but all considered callees are already
!      optimized allowing it to unfold abstraction penalty on C++ effectively and
!      cheaply.
! 
!    pass_ipa_inline
! 
!      This is the main pass implementing simple greedy algorithm to do inlining
!      of small functions that results in overall growth of compilation unit and
!      inlining of functions called once.  The pass compute just so called inline
!      plan (representation of inlining to be done in callgraph) and unlike early
!      inlining it is not performing the inlining itself.
!  */
  
  #include "config.h"
  #include "system.h"
--- 21,106 ----
  
  /*  Inlining decision heuristics
  
!     The implementation of inliner is organized as follows:
! 
!     Transformation of callgraph to represent inlining decisions.
! 
!       The inline decisions are stored in callgraph in "inline plan" and
!       all applied later.
! 
!       To mark given call inline, use cgraph_mark_inline function.
!       The function marks the edge inlinable and, if necessary, produces
!       virtual clone in the callgraph representing the new copy of callee's
!       function body.
! 
!       The inline plan is applied on given function body by inline_transform. 
  
      inlining heuristics limits
  
!       can_inline_edge_p allows checking that a particular inlining is
!       allowed by the limits specified by the user (allowed function growth,
!       overall unit growth and so on).
! 
!       Functions are inlined when it is obvious the result is profitable (such
!       as functions called once or when inlining reduces code size).
!       In addition to that we perform inlining of small functions and recursive
!       inlining.
  
      inlining heuristics
  
!        The inliner itself is split into two passes:
! 
!        pass_early_inlining
  
! 	 Simple local inlining pass inlining callees into current function.
! 	 This pass makes no use of whole unit analysis and thus it can do only
! 	 very simple decisions based on local properties.
! 
! 	 The strength of the pass is that it is run in topological order
! 	 (reverse postorder) on the callgraph. Functions are converted into SSA
! 	 form just before this pass and optimized subsequently. As a result, the
! 	 callees of the function seen by the early inliner were already
! 	 optimized, and the results of early inlining add a lot of optimization
! 	 opportunities for the local optimization.
! 
! 	 The pass handles the obvious inlining decisions within the compilation
! 	 unit: inlining auto inline functions, inlining for size and
! 	 flattening.
! 
! 	 The main strength of the pass is the ability to eliminate abstraction
! 	 penalty in C++ code (via combination of inlining and early
! 	 optimization) and thus improve quality of analysis done by real IPA
! 	 optimizers.
! 
! 	 Because of lack of whole unit knowledge, the pass can not really make
! 	 good code size/performance tradeoffs.  It however does very simple
! 	 speculative inlining allowing code size to grow by
! 	 EARLY_INLINING_INSNS when the callee is a leaf function.  In this case
! 	 the optimizations performed later are very likely to eliminate the cost.
! 
!        pass_ipa_inline
! 
! 	 This is the real inliner able to handle inlining with whole program
! 	 knowledge.  It performs the following steps:
! 
! 	 1) inlining of small functions.  This is implemented by a greedy
! 	 algorithm ordering all inlinable cgraph edges by their badness and
! 	 inlining them in this order as long as the inline limits allow doing so.
! 
! 	 This heuristic is not very good at inlining recursive calls.  Recursive
! 	 calls can be inlined with results similar to loop unrolling.  To do so,
! 	 a special purpose recursive inliner is executed on the function when a
! 	 recursive edge is met as a viable candidate.
! 
! 	 2) Unreachable functions are removed from callgraph.  Inlining leads
! 	 to devirtualization and other modification of callgraph so functions
! 	 may become unreachable during the process. Also functions declared as
! 	 extern inline or virtual functions are removed, since after inlining
! 	 we no longer need the offline bodies.
! 
! 	 3) Functions called once and not exported from the unit are inlined.
! 	 This should almost always lead to reduction of code size by eliminating
! 	 the need for offline copy of the function.  */
  
  #include "config.h"
  #include "system.h"
*************** along with GCC; see the file COPYING3.  
*** 114,122 ****
  #include "except.h"
  #include "ipa-inline.h"
  
- #define MAX_TIME 1000000000
- 
- 
  /* Statistics we collect about inlining algorithm.  */
  static int ncalls_inlined;
  static int nfunctions_inlined;
--- 127,132 ----
*************** cgraph_clone_inlined_nodes (struct cgrap
*** 163,173 ****
        /* We may eliminate the need for out-of-line copy to be output.
  	 In that case just go ahead and re-use it.  */
        if (!e->callee->callers->next_caller
! 	  /* Recursive inlining never wants the master clone to be overwritten.  */
  	  && update_original
! 	  /* FIXME: When address is taken of DECL_EXTERNAL function we still can remove its
! 	     offline copy, but we would need to keep unanalyzed node in the callgraph so
! 	     references can point to it.  */
  	  && !e->callee->address_taken
  	  && cgraph_can_remove_if_no_direct_calls_p (e->callee)
  	  /* Inlining might enable more devirtualizing, so we want to remove
--- 173,184 ----
        /* We may eliminate the need for out-of-line copy to be output.
  	 In that case just go ahead and re-use it.  */
        if (!e->callee->callers->next_caller
! 	  /* Recursive inlining never wants the master clone to
! 	     be overwritten.  */
  	  && update_original
! 	  /* FIXME: When address is taken of DECL_EXTERNAL function we still
! 	     can remove its offline copy, but we would need to keep unanalyzed
! 	     node in the callgraph so references can point to it.  */
  	  && !e->callee->address_taken
  	  && cgraph_can_remove_if_no_direct_calls_p (e->callee)
  	  /* Inlining might enable more devirtualizing, so we want to remove
*************** cgraph_clone_inlined_nodes (struct cgrap
*** 175,181 ****
  	     Lacking may edges in callgraph we just preserve them post
  	     inlining.  */
  	  && (!DECL_VIRTUAL_P (e->callee->decl)
! 	      || (!DECL_COMDAT (e->callee->decl) && !DECL_EXTERNAL (e->callee->decl)))
  	  /* Don't reuse if more than one function shares a comdat group.
  	     If the other function(s) are needed, we need to emit even
  	     this function out of line.  */
--- 186,193 ----
  	     Lacking may edges in callgraph we just preserve them post
  	     inlining.  */
  	  && (!DECL_VIRTUAL_P (e->callee->decl)
! 	      || (!DECL_COMDAT (e->callee->decl)
! 		  && !DECL_EXTERNAL (e->callee->decl)))
  	  /* Don't reuse if more than one function shares a comdat group.
  	     If the other function(s) are needed, we need to emit even
  	     this function out of line.  */
*************** cgraph_clone_inlined_nodes (struct cgrap
*** 214,220 ****
        + caller_info->estimated_self_stack_size;
    peak = callee_info->stack_frame_offset
        + callee_info->estimated_self_stack_size;
!   if (inline_summary (e->callee->global.inlined_to)->estimated_stack_size < peak)
      inline_summary (e->callee->global.inlined_to)->estimated_stack_size = peak;
    cgraph_propagate_frequency (e->callee);
  
--- 226,233 ----
        + caller_info->estimated_self_stack_size;
    peak = callee_info->stack_frame_offset
        + callee_info->estimated_self_stack_size;
!   if (inline_summary (e->callee->global.inlined_to)->estimated_stack_size
!       < peak)
      inline_summary (e->callee->global.inlined_to)->estimated_stack_size = peak;
    cgraph_propagate_frequency (e->callee);
  
*************** cgraph_mark_inline_edge (struct cgraph_e
*** 272,304 ****
      return false;
  }
  
! /* Return false when inlining edge E is not good idea
!    as it would cause too large growth of the callers function body
!    or stack frame size.  *REASON if non-NULL is updated if the
!    inlining is not a good idea.  */
  
  static bool
! cgraph_check_inline_limits (struct cgraph_edge *e,
! 			    cgraph_inline_failed_t *reason)
  {
    struct cgraph_node *to = e->caller;
    struct cgraph_node *what = e->callee;
    int newsize;
!   int limit;
!   HOST_WIDE_INT stack_size_limit, inlined_stack;
!   struct inline_summary *info, *what_info;
! 
!   if (to->global.inlined_to)
!     to = to->global.inlined_to;
  
-   info = inline_summary (to);
    what_info = inline_summary (what);
  
!   /* When inlining large function body called once into small function,
!      take the inlined function as base for limiting the growth.  */
!   if (info->self_size > what_info->self_size)
!     limit = info->self_size;
!   else
      limit = what_info->self_size;
  
    limit += limit * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH) / 100;
--- 285,336 ----
      return false;
  }
  
! /* Return false when inlining edge E would lead to violating
!    limits on function unit growth or stack usage growth.  
! 
!    The relative function body growth limit is present generally
!    to avoid problems with non-linear behaviour of the compiler.
!    To allow inlining huge functions into tiny wrapper, the limit
!    is always based on the bigger of the two functions considered.
! 
!    For stack growth limits we always base the growth on the stack usage
!    of the callers.  We want to prevent applications from segfaulting
!    on stack overflow when functions with huge stack frames get
!    inlined.  */
  
  static bool
! caller_growth_limits (struct cgraph_edge *e)
  {
    struct cgraph_node *to = e->caller;
    struct cgraph_node *what = e->callee;
    int newsize;
!   int limit = 0;
!   HOST_WIDE_INT stack_size_limit = 0, inlined_stack;
!   struct inline_summary *info, *what_info, *outer_info = inline_summary (to);
! 
!   /* Look for the function e->caller is inlined into.  While
!      doing so, work out the largest function body on the way.  As
!      described above, we want to base our function growth
!      limits on that; not on the self size of the outer
!      function, nor on the self size of the inline code we
!      immediately inline into.  This is the most relaxed
!      interpretation of the rule "do not grow large functions
!      too much in order to prevent the compiler from exploding".  */
!   do
!     {
!       info = inline_summary (to);
!       if (limit < info->self_size)
! 	limit = info->self_size;
!       if (stack_size_limit < info->estimated_self_stack_size)
! 	stack_size_limit = info->estimated_self_stack_size;
!       if (to->global.inlined_to)
!         to = to->callers->caller;
!     }
!   while (to->global.inlined_to);
  
    what_info = inline_summary (what);
  
!   if (limit < what_info->self_size)
      limit = what_info->self_size;
  
    limit += limit * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH) / 100;
*************** cgraph_check_inline_limits (struct cgrap
*** 310,388 ****
        && newsize > PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS)
        && newsize > limit)
      {
!       if (reason)
!         *reason = CIF_LARGE_FUNCTION_GROWTH_LIMIT;
        return false;
      }
  
!   stack_size_limit = info->estimated_self_stack_size;
! 
!   stack_size_limit += stack_size_limit * PARAM_VALUE (PARAM_STACK_FRAME_GROWTH) / 100;
  
!   inlined_stack = (info->stack_frame_offset
! 		   + info->estimated_self_stack_size
  		   + what_info->estimated_stack_size);
!   if (inlined_stack  > stack_size_limit
        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
      {
!       if (reason)
!         *reason = CIF_LARGE_STACK_FRAME_GROWTH_LIMIT;
        return false;
      }
    return true;
  }
  
! /* Return true when function N is small enough to be inlined.  */
  
  static bool
! cgraph_default_inline_p (struct cgraph_node *n, cgraph_inline_failed_t *reason)
  {
!   tree decl = n->decl;
!   struct inline_summary *info = inline_summary (n);
  
!   if (info->disregard_inline_limits)
!     return true;
  
!   if (!flag_inline_small_functions && !DECL_DECLARED_INLINE_P (decl))
      {
!       if (reason)
! 	*reason = CIF_FUNCTION_NOT_INLINE_CANDIDATE;
        return false;
      }
!   if (!n->analyzed)
      {
!       if (reason)
! 	*reason = CIF_BODY_NOT_AVAILABLE;
        return false;
      }
!   if (cgraph_function_body_availability (n) <= AVAIL_OVERWRITABLE)
      {
!       if (reason)
! 	*reason = CIF_OVERWRITABLE;
        return false;
      }
  
  
!   if (DECL_DECLARED_INLINE_P (decl))
      {
!       if (info->size >= MAX_INLINE_INSNS_SINGLE)
  	{
! 	  if (reason)
! 	    *reason = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
! 	  return false;
  	}
      }
    else
      {
!       if (info->size >= MAX_INLINE_INSNS_AUTO)
  	{
! 	  if (reason)
! 	    *reason = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
! 	  return false;
  	}
      }
  
!   return true;
  }
  
  /* A cost model driving the inlining heuristics in a way so the edges with
--- 342,711 ----
        && newsize > PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS)
        && newsize > limit)
      {
!       e->inline_failed = CIF_LARGE_FUNCTION_GROWTH_LIMIT;
        return false;
      }
  
!   /* FIXME: The stack size limit often prevents inlining in Fortran programs
!      due to the large I/O data structures used by the Fortran frontend.
!      We ought to ignore this limit when we know that the edge is executed
!      on every invocation of the caller (i.e. its call statement dominates
!      the exit block).  We do not track this information, yet.  */
!   stack_size_limit += (stack_size_limit
! 		       * PARAM_VALUE (PARAM_STACK_FRAME_GROWTH) / 100);
  
!   inlined_stack = (outer_info->stack_frame_offset
! 		   + outer_info->estimated_self_stack_size
  		   + what_info->estimated_stack_size);
!   /* Check new stack consumption with stack consumption at the place
!      stack is used.  */
!   if (inlined_stack > stack_size_limit
!       /* If the function already has large stack usage from a sibling
! 	 inline call, we can inline, too.
! 	 This overoptimistically assumes that we are good at stack
! 	 packing.  */
!       && inlined_stack > info->estimated_stack_size
        && inlined_stack > PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
      {
!       e->inline_failed = CIF_LARGE_STACK_FRAME_GROWTH_LIMIT;
        return false;
      }
    return true;
  }
  
! /* Dump info about why inlining has failed.  */
! 
! static void
! report_inline_failed_reason (struct cgraph_edge *e)
! {
!   if (dump_file)
!     {
!       fprintf (dump_file, "  not inlinable: %s/%i -> %s/%i, %s\n",
! 	       cgraph_node_name (e->caller), e->caller->uid,
! 	       cgraph_node_name (e->callee), e->callee->uid,
! 	       cgraph_inline_failed_string (e->inline_failed));
!     }
! }
! 
! /* Decide if we can inline the edge and possibly update
!    inline_failed reason.  
!    We check whether inlining is possible at all and whether
!    caller growth limits allow doing so.  
! 
!    If REPORT is true, output the reason to the dump file.  */
  
  static bool
! can_inline_edge_p (struct cgraph_edge *e, bool report)
  {
!   bool inlinable = true;
  
!   gcc_assert (e->inline_failed);
  
!   if (!e->callee->analyzed)
!     {
!       e->inline_failed = CIF_BODY_NOT_AVAILABLE;
!       inlinable = false;
!     }
!   else if (!inline_summary (e->callee)->inlinable)
!     {
!       e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
!       inlinable = false;
!     }
!   else if (cgraph_function_body_availability (e->callee) <= AVAIL_OVERWRITABLE)
      {
!       e->inline_failed = CIF_OVERWRITABLE;
        return false;
      }
!   else if (e->call_stmt_cannot_inline_p)
!     {
!       e->inline_failed = CIF_MISMATCHED_ARGUMENTS;
!       inlinable = false;
!     }
!   else if (!tree_can_inline_p (e))
!     inlinable = false;
!   else if (!DECL_DISREGARD_INLINE_LIMITS (e->callee->decl)
!            && !caller_growth_limits (e))
!     inlinable = false;
! 
!   /* Be sure that the cannot_inline_p flag is up to date.  */
!   gcc_checking_assert (!e->call_stmt
! 		       || (gimple_call_cannot_inline_p (e->call_stmt)
! 		           == e->call_stmt_cannot_inline_p)
! 		       /* In -flto-partition=none mode we really keep things out of
! 			  sync because call_stmt_cannot_inline_p is set at cgraph
! 			  merging when function bodies are not there yet.  */
! 		       || (in_lto_p && !gimple_call_cannot_inline_p (e->call_stmt)));
!   if (!inlinable && report)
!     report_inline_failed_reason (e);
!   return inlinable;
! }
! 
! 
! /* Return true if the edge E is inlinable during early inlining.  */
! 
! static bool
! can_early_inline_edge_p (struct cgraph_edge *e)
! {
!   /* Early inliner might get called at WPA stage when IPA pass adds new
!      function.  In this case we can not really do any of early inlining
!      because function bodies are missing.  */
!   if (!gimple_has_body_p (e->callee->decl))
      {
!       e->inline_failed = CIF_BODY_NOT_AVAILABLE;
        return false;
      }
!   /* In early inliner some of callees may not be in SSA form yet
!      (i.e. the callgraph is cyclic and we did not process
!      the callee by early inliner, yet).  We don't have CIF code for this
!      case; later we will re-do the decision in the real inliner.  */
!   if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
!       || !gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->callee->decl)))
      {
!       if (dump_file)
! 	fprintf (dump_file, "  edge not inlinable: not in SSA form\n");
        return false;
      }
+   if (!can_inline_edge_p (e, true))
+     return false;
+   return true;
+ }
+ 
  
+ /* Return true when N is leaf function.  Accept cheap builtins
+    in leaf functions.  */
+ 
+ static bool
+ leaf_node_p (struct cgraph_node *n)
+ {
+   struct cgraph_edge *e;
+   for (e = n->callees; e; e = e->next_callee)
+     if (!is_inexpensive_builtin (e->callee->decl))
+       return false;
+   return true;
+ }
  
! 
! /* Return true if we are interested in inlining the small function early.  */
! 
! static bool
! want_early_inline_function_p (struct cgraph_edge *e)
! {
!   bool want_inline = true;
! 
!   if (DECL_DISREGARD_INLINE_LIMITS (e->callee->decl))
!     ;
!   else if (!DECL_DECLARED_INLINE_P (e->callee->decl)
! 	   && !flag_inline_small_functions)
      {
!       e->inline_failed = CIF_FUNCTION_NOT_INLINE_CANDIDATE;
!       report_inline_failed_reason (e);
!       want_inline = false;
!     }
!   else
!     {
!       int growth = estimate_edge_growth (e);
!       if (growth <= 0)
! 	;
!       else if (!cgraph_maybe_hot_edge_p (e)
! 	       && growth > 0)
  	{
! 	  if (dump_file)
! 	    fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
! 		     "call is cold and code would grow by %i\n",
! 		     cgraph_node_name (e->caller), e->caller->uid,
! 		     cgraph_node_name (e->callee), e->callee->uid,
! 		     growth);
! 	  want_inline = false;
  	}
+       else if (!leaf_node_p (e->callee)
+ 	       && growth > 0)
+ 	{
+ 	  if (dump_file)
+ 	    fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
+ 		     "callee is not leaf and code would grow by %i\n",
+ 		     cgraph_node_name (e->caller), e->caller->uid,
+ 		     cgraph_node_name (e->callee), e->callee->uid,
+ 		     growth);
+ 	  want_inline = false;
+ 	}
+       else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+ 	{
+ 	  if (dump_file)
+ 	    fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
+ 		     "growth %i exceeds --param early-inlining-insns\n",
+ 		     cgraph_node_name (e->caller), e->caller->uid,
+ 		     cgraph_node_name (e->callee), e->callee->uid,
+ 		     growth);
+ 	  want_inline = false;
+ 	}
+     }
+   return want_inline;
+ }
+ 
+ /* Return true if we are interested in inlining small function.
+    When REPORT is true, report reason to dump file.  */
+ 
+ static bool
+ want_inline_small_function_p (struct cgraph_edge *e, bool report)
+ {
+   bool want_inline = true;
+ 
+   if (DECL_DISREGARD_INLINE_LIMITS (e->callee->decl))
+     ;
+   else if (!DECL_DECLARED_INLINE_P (e->callee->decl)
+ 	   && !flag_inline_small_functions)
+     {
+       e->inline_failed = CIF_FUNCTION_NOT_INLINE_CANDIDATE;
+       want_inline = false;
      }
    else
      {
!       int growth = estimate_edge_growth (e);
! 
!       if (growth <= 0)
! 	;
!       else if (DECL_DECLARED_INLINE_P (e->callee->decl)
! 	       && growth >= MAX_INLINE_INSNS_SINGLE)
  	{
!           e->inline_failed = CIF_MAX_INLINE_INSNS_SINGLE_LIMIT;
! 	  want_inline = false;
! 	}
!       else if (!DECL_DECLARED_INLINE_P (e->callee->decl)
! 	       && !flag_inline_functions)
! 	{
!           e->inline_failed = CIF_NOT_DECLARED_INLINED;
! 	  want_inline = false;
! 	}
!       else if (!DECL_DECLARED_INLINE_P (e->callee->decl)
! 	       && growth >= MAX_INLINE_INSNS_AUTO)
! 	{
!           e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
! 	  want_inline = false;
! 	}
!       else if (!cgraph_maybe_hot_edge_p (e)
! 	       && estimate_growth (e->callee) > 0)
! 	{
!           e->inline_failed = CIF_UNLIKELY_CALL;
! 	  want_inline = false;
  	}
      }
+   if (!want_inline && report)
+     report_inline_failed_reason (e);
+   return want_inline;
+ }
  
! /* EDGE is a self recursive edge.
!    We handle two cases - when function A is inlined into itself
!    or when function A is being inlined into another inlined copy of function
!    A within function B.
! 
!    In the first case OUTER_NODE points to the toplevel copy of A, while
!    in the second case OUTER_NODE points to the outermost copy of A in B.
! 
!    In both cases we want to be extra selective since
!    inlining the call will just cause new recursive calls to appear.  */
! static bool
! want_inline_self_recursive_call_p (struct cgraph_edge *edge,
! 				   struct cgraph_node *outer_node,
! 				   bool peeling,
! 				   int depth)
! {
!   char const *reason = NULL;
!   bool want_inline = true;
!   int caller_freq = CGRAPH_FREQ_BASE;
!   int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
! 
!   if (DECL_DECLARED_INLINE_P (edge->callee->decl))
!     max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH);
! 
!   if (!cgraph_maybe_hot_edge_p (edge))
!     {
!       reason = "recursive call is cold";
!       want_inline = false;
!     }
!   else if (max_count && !outer_node->count)
!     {
!       reason = "not executed in profile";
!       want_inline = false;
!     }
!   else if (depth > max_depth)
!     {
!       reason = "--param max-inline-recursive-depth exceeded.";
!       want_inline = false;
!     }
! 
!   if (outer_node->global.inlined_to)
!     caller_freq = outer_node->callers->frequency;
! 
!   if (!want_inline)
!     ;
!   /* Inlining of a self recursive function into a copy of itself within
!      another function is a transformation similar to loop peeling.
! 
!      Peeling is profitable if we can inline enough copies to make the
!      probability of an actual call to the self recursive function very small.
!      Be sure that the probability of recursion is small.
! 
!      We ensure that the frequency of recursing is at most 1 - (1/max_depth).
!      This way the expected number of recursions is at most max_depth.  */
!   else if (peeling)
!     {
!       int max_prob = CGRAPH_FREQ_BASE - ((CGRAPH_FREQ_BASE + max_depth - 1)
! 					 / max_depth);
!       int i;
!       for (i = 1; i < depth; i++)
! 	max_prob = max_prob * max_prob / CGRAPH_FREQ_BASE;
!       if (max_count
! 	  && (edge->count * CGRAPH_FREQ_BASE / outer_node->count
! 	      >= max_prob))
! 	{
! 	  reason = "profile of recursive call is too large";
! 	  want_inline = false;
! 	}
!       if (!max_count
! 	  && (edge->frequency * CGRAPH_FREQ_BASE / caller_freq
! 	      >= max_prob))
! 	{
! 	  reason = "frequency of recursive call is too large";
! 	  want_inline = false;
! 	}
!     }
!   /* Recursive inlining, i.e. the equivalent of unrolling, is profitable if
!      the recursion depth is large.  We reduce function call overhead and
!      increase the chance that things fit in the hardware return predictor.
! 
!      Recursive inlining might, however, increase the cost of stack frame setup,
!      actually slowing down functions whose recursion tree is wide rather than
!      deep.
! 
!      Deciding reliably on when to do recursive inlining without profile feedback
!      is tricky.  For now we disable recursive inlining when the probability of
!      self recursion is low.
! 
!      Recursive inlining of a self recursive call within a loop also results in
!      large loop depths that generally optimize badly.  We may want to throttle
!      down inlining in those cases.  In particular this seems to happen in one
!      of the libstdc++ rb tree methods.  */
!   else
!     {
!       if (max_count
! 	  && (edge->count * 100 / outer_node->count
! 	      <= PARAM_VALUE (PARAM_MIN_INLINE_RECURSIVE_PROBABILITY)))
! 	{
! 	  reason = "profile of recursive call is too small";
! 	  want_inline = false;
! 	}
!       else if (!max_count
! 	       && (edge->frequency * 100 / caller_freq
! 	           <= PARAM_VALUE (PARAM_MIN_INLINE_RECURSIVE_PROBABILITY)))
! 	{
! 	  reason = "frequency of recursive call is too small";
! 	  want_inline = false;
! 	}
!     }
!   if (!want_inline && dump_file)
!     fprintf (dump_file, "   not inlining recursively: %s\n", reason);
!   return want_inline;
  }
  
  /* A cost model driving the inlining heuristics in a way so the edges with
*************** cgraph_default_inline_p (struct cgraph_n
*** 392,404 ****
     of the function or function body size.  */
  
  static int
! cgraph_edge_badness (struct cgraph_edge *edge, bool dump)
  {
    gcov_type badness;
    int growth;
    struct inline_summary *callee_info = inline_summary (edge->callee);
  
!   if (callee_info->disregard_inline_limits)
      return INT_MIN;
  
    growth = estimate_edge_growth (edge);
--- 715,727 ----
     of the function or function body size.  */
  
  static int
! edge_badness (struct cgraph_edge *edge, bool dump)
  {
    gcov_type badness;
    int growth;
    struct inline_summary *callee_info = inline_summary (edge->callee);
  
!   if (DECL_DISREGARD_INLINE_LIMITS (edge->callee->decl))
      return INT_MIN;
  
    growth = estimate_edge_growth (edge);
*************** cgraph_edge_badness (struct cgraph_edge 
*** 488,494 ****
  	  fprintf (dump_file,
  		   "      %i: guessed profile. frequency %i, overall growth %i,"
  		   " benefit %i%%, divisor %i\n",
! 		   (int) badness, edge->frequency, growth_for_all, benefitperc, div);
  	}
      }
    /* When function local profile is not available or it does not give
--- 811,818 ----
  	  fprintf (dump_file,
  		   "      %i: guessed profile. frequency %i, overall growth %i,"
  		   " benefit %i%%, divisor %i\n",
! 		   (int) badness, edge->frequency, growth_for_all,
! 		   benefitperc, div);
  	}
      }
    /* When function local profile is not available or it does not give
*************** cgraph_edge_badness (struct cgraph_edge 
*** 523,532 ****
  }
  
  /* Recompute badness of EDGE and update its key in HEAP if needed.  */
! static void
  update_edge_key (fibheap_t heap, struct cgraph_edge *edge)
  {
!   int badness = cgraph_edge_badness (edge, false);
    if (edge->aux)
      {
        fibnode_t n = (fibnode_t) edge->aux;
--- 847,856 ----
  }
  
  /* Recompute badness of EDGE and update its key in HEAP if needed.  */
! static inline void
  update_edge_key (fibheap_t heap, struct cgraph_edge *edge)
  {
!   int badness = edge_badness (edge, false);
    if (edge->aux)
      {
        fibnode_t n = (fibnode_t) edge->aux;
*************** update_edge_key (fibheap_t heap, struct 
*** 539,549 ****
        if (badness < n->key)
  	{
  	  fibheap_replace_key (heap, n, badness);
  	  gcc_checking_assert (n->key == badness);
  	}
      }
    else
!     edge->aux = fibheap_insert (heap, badness, edge);
  }
  
  /* Recompute heap nodes for each of caller edge.  */
--- 863,892 ----
        if (badness < n->key)
  	{
  	  fibheap_replace_key (heap, n, badness);
+ 	  if (dump_file && (dump_flags & TDF_DETAILS))
+ 	    {
+ 	      fprintf (dump_file,
+ 		       "  decreasing badness %s/%i -> %s/%i, %i to %i\n",
+ 		       cgraph_node_name (edge->caller), edge->caller->uid,
+ 		       cgraph_node_name (edge->callee), edge->callee->uid,
+ 		       (int)n->key,
+ 		       badness);
+ 	    }
  	  gcc_checking_assert (n->key == badness);
  	}
      }
    else
!     {
!        if (dump_file && (dump_flags & TDF_DETAILS))
! 	 {
! 	   fprintf (dump_file,
! 		    "  enqueuing call %s/%i -> %s/%i, badness %i\n",
! 		    cgraph_node_name (edge->caller), edge->caller->uid,
! 		    cgraph_node_name (edge->callee), edge->callee->uid,
! 		    badness);
! 	 }
!       edge->aux = fibheap_insert (heap, badness, edge);
!     }
  }
  
  /* Recompute heap nodes for each of caller edge.  */
*************** update_caller_keys (fibheap_t heap, stru
*** 553,559 ****
  		    bitmap updated_nodes)
  {
    struct cgraph_edge *edge;
-   cgraph_inline_failed_t failed_reason;
  
    if (!inline_summary (node)->inlinable
        || cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE
--- 896,901 ----
*************** update_caller_keys (fibheap_t heap, stru
*** 569,591 ****
        break;
    if (!edge)
      return;
!   /* Prune out edges we won't inline into anymore.  */
!   if (!cgraph_default_inline_p (node, &failed_reason))
!     {
!       for (; edge; edge = edge->next_caller)
! 	if (edge->aux)
  	  {
  	    fibheap_delete_node (heap, (fibnode_t) edge->aux);
  	    edge->aux = NULL;
- 	    if (edge->inline_failed)
- 	      edge->inline_failed = failed_reason;
  	  }
!       return;
!     }
! 
!   for (; edge; edge = edge->next_caller)
!     if (edge->inline_failed)
!       update_edge_key (heap, edge);
  }
  
  /* Recompute heap nodes for each uninlined call.
--- 911,930 ----
        break;
    if (!edge)
      return;
! 
!   for (; edge; edge = edge->next_caller)
!     if (edge->inline_failed)
!       {
! 	if (can_inline_edge_p (edge, false)
! 	    && want_inline_small_function_p (edge, false))
!           update_edge_key (heap, edge);
! 	else if (edge->aux)
  	  {
+ 	    report_inline_failed_reason (edge);
  	    fibheap_delete_node (heap, (fibnode_t) edge->aux);
  	    edge->aux = NULL;
  	  }
!       }
  }
  
  /* Recompute heap nodes for each uninlined call.
*************** update_callee_keys (fibheap_t heap, stru
*** 613,624 ****
  	    && !bitmap_bit_p (updated_nodes, e->callee->uid))
  	  {
  	    inline_summary (node)->estimated_growth = INT_MIN;
! 	    /* If function becomes uninlinable, we need to remove it from the heap.  */
! 	    if (!cgraph_default_inline_p (e->callee, &e->inline_failed))
! 	      update_caller_keys (heap, e->callee, updated_nodes);
! 	    else
! 	    /* Otherwise update just edge E.  */
! 	      update_edge_key (heap, e);
  	  }
  	if (e->next_callee)
  	  e = e->next_callee;
--- 952,958 ----
  	    && !bitmap_bit_p (updated_nodes, e->callee->uid))
  	  {
  	    inline_summary (node)->estimated_growth = INT_MIN;
! 	    update_edge_key (heap, e);
  	  }
  	if (e->next_callee)
  	  e = e->next_callee;
*************** lookup_recursive_calls (struct cgraph_no
*** 702,717 ****
     is NULL.  */
  
  static bool
! cgraph_decide_recursive_inlining (struct cgraph_edge *edge,
! 				  VEC (cgraph_edge_p, heap) **new_edges)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
-   int max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH_AUTO);
-   int probability = PARAM_VALUE (PARAM_MIN_INLINE_RECURSIVE_PROBABILITY);
    fibheap_t heap;
    struct cgraph_node *node;
    struct cgraph_edge *e;
!   struct cgraph_node *master_clone, *next;
    int depth = 0;
    int n = 0;
  
--- 1036,1049 ----
     is NULL.  */
  
  static bool
! recursive_inlining (struct cgraph_edge *edge,
! 		    VEC (cgraph_edge_p, heap) **new_edges)
  {
    int limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE_AUTO);
    fibheap_t heap;
    struct cgraph_node *node;
    struct cgraph_edge *e;
!   struct cgraph_node *master_clone = NULL, *next;
    int depth = 0;
    int n = 0;
  
*************** cgraph_decide_recursive_inlining (struct
*** 719,743 ****
    if (node->global.inlined_to)
      node = node->global.inlined_to;
  
-   /* It does not make sense to recursively inline always-inline functions
-      as we are going to sorry() on the remaining calls anyway.  */
-   if (inline_summary (node)->disregard_inline_limits
-       && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl)))
-     return false;
- 
-   if (optimize_function_for_size_p (DECL_STRUCT_FUNCTION (node->decl))
-       || (!flag_inline_functions && !DECL_DECLARED_INLINE_P (node->decl)))
-     return false;
- 
    if (DECL_DECLARED_INLINE_P (node->decl))
!     {
!       limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE);
!       max_depth = PARAM_VALUE (PARAM_MAX_INLINE_RECURSIVE_DEPTH);
!     }
  
    /* Make sure that function is small enough to be considered for inlining.  */
!   if (!max_depth
!       || estimate_size_after_inlining (node, edge)  >= limit)
      return false;
    heap = fibheap_new ();
    lookup_recursive_calls (node, node, heap);
--- 1051,1061 ----
    if (node->global.inlined_to)
      node = node->global.inlined_to;
  
    if (DECL_DECLARED_INLINE_P (node->decl))
!     limit = PARAM_VALUE (PARAM_MAX_INLINE_INSNS_RECURSIVE);
  
    /* Make sure that function is small enough to be considered for inlining.  */
!   if (estimate_size_after_inlining (node, edge)  >= limit)
      return false;
    heap = fibheap_new ();
    lookup_recursive_calls (node, node, heap);
*************** cgraph_decide_recursive_inlining (struct
*** 752,765 ****
  	     "  Performing recursive inlining on %s\n",
  	     cgraph_node_name (node));
  
-   /* We need original clone to copy around.  */
-   master_clone = cgraph_clone_node (node, node->decl,
- 				    node->count, CGRAPH_FREQ_BASE, 1,
-   				    false, NULL);
-   for (e = master_clone->callees; e; e = e->next_callee)
-     if (!e->inline_failed)
-       cgraph_clone_inlined_nodes (e, true, false);
- 
    /* Do the inlining and update list of recursive call during process.  */
    while (!fibheap_empty (heap))
      {
--- 1070,1075 ----
*************** cgraph_decide_recursive_inlining (struct
*** 770,804 ****
        if (estimate_size_after_inlining (node, curr) > limit)
  	break;
  
        depth = 1;
        for (cnode = curr->caller;
  	   cnode->global.inlined_to; cnode = cnode->callers->caller)
  	if (node->decl == curr->callee->decl)
  	  depth++;
-       if (depth > max_depth)
- 	{
-           if (dump_file)
- 	    fprintf (dump_file,
- 		     "   maximal depth reached\n");
- 	  continue;
- 	}
  
!       if (max_count)
! 	{
!           if (!cgraph_maybe_hot_edge_p (curr))
! 	    {
! 	      if (dump_file)
! 		fprintf (dump_file, "   Not inlining cold call\n");
! 	      continue;
! 	    }
!           if (curr->count * 100 / node->count < probability)
! 	    {
! 	      if (dump_file)
! 		fprintf (dump_file,
! 			 "   Probability of edge is too small\n");
! 	      continue;
! 	    }
! 	}
  
        if (dump_file)
  	{
--- 1080,1096 ----
        if (estimate_size_after_inlining (node, curr) > limit)
  	break;
  
+       if (!can_inline_edge_p (curr, true))
+ 	continue;
+ 
        depth = 1;
        for (cnode = curr->caller;
  	   cnode->global.inlined_to; cnode = cnode->callers->caller)
  	if (node->decl == curr->callee->decl)
  	  depth++;
  
!       if (!want_inline_self_recursive_call_p (curr, node, false, depth))
! 	continue;
  
        if (dump_file)
  	{
*************** cgraph_decide_recursive_inlining (struct
*** 811,828 ****
  	    }
  	  fprintf (dump_file, "\n");
  	}
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr, false, new_edges);
        lookup_recursive_calls (node, curr->callee, heap);
        n++;
      }
    if (!fibheap_empty (heap) && dump_file)
      fprintf (dump_file, "    Recursive inlining growth limit met.\n");
- 
    fibheap_delete (heap);
    if (dump_file)
      fprintf (dump_file,
! 	     "\n   Inlined %i times, body grown from size %i to %i, time %i to %i\n", n,
  	     inline_summary (master_clone)->size, inline_summary (node)->size,
  	     inline_summary (master_clone)->time, inline_summary (node)->time);
  
--- 1103,1136 ----
  	    }
  	  fprintf (dump_file, "\n");
  	}
+       if (!master_clone)
+ 	{
+ 	  /* We need original clone to copy around.  */
+ 	  master_clone = cgraph_clone_node (node, node->decl,
+ 					    node->count, CGRAPH_FREQ_BASE, 1,
+ 					    false, NULL);
+ 	  for (e = master_clone->callees; e; e = e->next_callee)
+ 	    if (!e->inline_failed)
+ 	      cgraph_clone_inlined_nodes (e, true, false);
+ 	}
+ 
        cgraph_redirect_edge_callee (curr, master_clone);
        cgraph_mark_inline_edge (curr, false, new_edges);
        lookup_recursive_calls (node, curr->callee, heap);
        n++;
      }
+ 
    if (!fibheap_empty (heap) && dump_file)
      fprintf (dump_file, "    Recursive inlining growth limit met.\n");
    fibheap_delete (heap);
+ 
+   if (!master_clone)
+     return false;
+ 
    if (dump_file)
      fprintf (dump_file,
! 	     "\n   Inlined %i times, "
! 	     "body grown from size %i to %i, time %i to %i\n", n,
  	     inline_summary (master_clone)->size, inline_summary (node)->size,
  	     inline_summary (master_clone)->time, inline_summary (node)->time);
  
*************** cgraph_decide_recursive_inlining (struct
*** 837,863 ****
  	cgraph_remove_node (node);
      }
    cgraph_remove_node (master_clone);
!   /* FIXME: Recursive inlining actually reduces number of calls of the
!      function.  At this place we should probably walk the function and
!      inline clones and compensate the counts accordingly.  This probably
!      doesn't matter much in practice.  */
!   return n > 0;
! }
! 
! /* Set inline_failed for all callers of given function to REASON.  */
! 
! static void
! cgraph_set_inline_failed (struct cgraph_node *node,
! 			  cgraph_inline_failed_t reason)
! {
!   struct cgraph_edge *e;
! 
!   if (dump_file)
!     fprintf (dump_file, "Inlining failed: %s\n",
! 	     cgraph_inline_failed_string (reason));
!   for (e = node->callers; e; e = e->next_caller)
!     if (e->inline_failed)
!       e->inline_failed = reason;
  }
  
  /* Given whole compilation unit estimate of INSNS, compute how large we can
--- 1145,1151 ----
  	cgraph_remove_node (node);
      }
    cgraph_remove_node (master_clone);
!   return true;
  }
  
  /* Given whole compilation unit estimate of INSNS, compute how large we can
*************** add_new_edges_to_heap (fibheap_t heap, V
*** 884,891 ****
        gcc_assert (!edge->aux);
        if (inline_summary (edge->callee)->inlinable
  	  && edge->inline_failed
! 	  && cgraph_default_inline_p (edge->callee, &edge->inline_failed))
!         edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge, false), edge);
      }
  }
  
--- 1172,1180 ----
        gcc_assert (!edge->aux);
        if (inline_summary (edge->callee)->inlinable
  	  && edge->inline_failed
! 	  && can_inline_edge_p (edge, true)
! 	  && want_inline_small_function_p (edge, true))
!         edge->aux = fibheap_insert (heap, edge_badness (edge, false), edge);
      }
  }
  
*************** add_new_edges_to_heap (fibheap_t heap, V
*** 898,908 ****
     to be passed to cgraph_inlined_into and cgraph_inlined_callees.  */
  
  static void
! cgraph_decide_inlining_of_small_functions (void)
  {
    struct cgraph_node *node;
    struct cgraph_edge *edge;
-   cgraph_inline_failed_t failed_reason;
    fibheap_t heap = fibheap_new ();
    bitmap updated_nodes = BITMAP_ALLOC (NULL);
    int min_size, max_size;
--- 1187,1196 ----
     to be passed to cgraph_inlined_into and cgraph_inlined_callees.  */
  
  static void
! inline_small_functions (void)
  {
    struct cgraph_node *node;
    struct cgraph_edge *edge;
    fibheap_t heap = fibheap_new ();
    bitmap updated_nodes = BITMAP_ALLOC (NULL);
    int min_size, max_size;
*************** cgraph_decide_inlining_of_small_function
*** 921,951 ****
        {
  	struct inline_summary *info = inline_summary (node);
  
- 	if (!info->inlinable || !node->callers)
- 	  {
- 	    struct cgraph_edge *e;
- 	    for (e = node->callers; e; e = e->next_caller)
- 	      {
- 		gcc_assert (e->inline_failed);
- 		e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
- 	      }
- 	    continue;
- 	  }
  	if (dump_file)
! 	  fprintf (dump_file, "Considering inline candidate %s.\n", cgraph_node_name (node));
  
  	info->estimated_growth = INT_MIN;
- 	if (!cgraph_default_inline_p (node, &failed_reason))
- 	  {
- 	    cgraph_set_inline_failed (node, failed_reason);
- 	    continue;
- 	  }
  
  	for (edge = node->callers; edge; edge = edge->next_caller)
! 	  if (edge->inline_failed)
  	    {
  	      gcc_assert (!edge->aux);
! 	      edge->aux = fibheap_insert (heap, cgraph_edge_badness (edge, false), edge);
  	    }
        }
  
--- 1209,1228 ----
        {
  	struct inline_summary *info = inline_summary (node);
  
  	if (dump_file)
! 	  fprintf (dump_file, "Enqueueing calls of %s/%i.\n",
! 		   cgraph_node_name (node), node->uid);
  
  	info->estimated_growth = INT_MIN;
  
  	for (edge = node->callers; edge; edge = edge->next_caller)
! 	  if (edge->inline_failed
! 	      && can_inline_edge_p (edge, true)
! 	      && want_inline_small_function_p (edge, true))
  	    {
  	      gcc_assert (!edge->aux);
! 	      update_edge_key (heap, edge);
  	    }
        }
  
*************** cgraph_decide_inlining_of_small_function
*** 960,966 ****
        int badness = fibheap_min_key (heap);
        int current_badness;
        int growth;
-       cgraph_inline_failed_t not_good = CIF_OK;
  
        edge = (struct cgraph_edge *) fibheap_extract_min (heap);
        gcc_assert (edge->aux);
--- 1237,1242 ----
*************** cgraph_decide_inlining_of_small_function
*** 971,983 ****
        /* When updating the edge costs, we only decrease badness in the keys.
  	 When the badness increase, we keep the heap as it is and re-insert
  	 key now.  */
!       current_badness = cgraph_edge_badness (edge, false);
        gcc_assert (current_badness >= badness);
        if (current_badness != badness)
  	{
  	  edge->aux = fibheap_insert (heap, current_badness, edge);
  	  continue;
  	}
        
        callee = edge->callee;
        growth = estimate_edge_growth (edge);
--- 1247,1262 ----
        /* When updating the edge costs, we only decrease badness in the keys.
  	 When the badness increase, we keep the heap as it is and re-insert
  	 key now.  */
!       current_badness = edge_badness (edge, false);
        gcc_assert (current_badness >= badness);
        if (current_badness != badness)
  	{
  	  edge->aux = fibheap_insert (heap, current_badness, edge);
  	  continue;
  	}
+ 
+       if (!can_inline_edge_p (edge, true))
+ 	continue;
        
        callee = edge->callee;
        growth = estimate_edge_growth (edge);
*************** cgraph_decide_inlining_of_small_function
*** 989,1069 ****
  		   inline_summary (edge->callee)->size);
  	  fprintf (dump_file,
  		   " to be inlined into %s in %s:%i\n"
! 		   " Estimated growth after inlined into all callees is %+i insns.\n"
  		   " Estimated badness is %i, frequency %.2f.\n",
  		   cgraph_node_name (edge->caller),
  		   flag_wpa ? "unknown"
  		   : gimple_filename ((const_gimple) edge->call_stmt),
! 		   flag_wpa ? -1 : gimple_lineno ((const_gimple) edge->call_stmt),
  		   estimate_growth (edge->callee),
  		   badness,
  		   edge->frequency / (double)CGRAPH_FREQ_BASE);
  	  if (edge->count)
! 	    fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n", edge->count);
  	  if (dump_flags & TDF_DETAILS)
! 	    cgraph_edge_badness (edge, true);
  	}
  
!       /* When not having profile info ready we don't weight by any way the
!          position of call in procedure itself.  This means if call of
! 	 function A from function B seems profitable to inline, the recursive
! 	 call of function A in inline copy of A in B will look profitable too
! 	 and we end up inlining until reaching maximal function growth.  This
! 	 is not good idea so prohibit the recursive inlining.
! 
! 	 ??? When the frequencies are taken into account we might not need this
! 	 restriction.
! 
! 	 We need to be careful here, in some testcases, e.g. directives.c in
! 	 libcpp, we can estimate self recursive function to have negative growth
! 	 for inlining completely.
! 	 */
!       if (!edge->count)
  	{
! 	  where = edge->caller;
! 	  while (where->global.inlined_to)
! 	    {
! 	      if (where->decl == edge->callee->decl)
! 		break;
! 	      where = where->callers->caller;
! 	    }
! 	  if (where->global.inlined_to)
! 	    {
! 	      edge->inline_failed
! 		= (inline_summary (edge->callee)->disregard_inline_limits
! 		   ? CIF_RECURSIVE_INLINING : CIF_UNSPECIFIED);
! 	      if (dump_file)
! 		fprintf (dump_file, " inline_failed:Recursive inlining performed only for function itself.\n");
! 	      continue;
! 	    }
! 	}
! 
!       if (inline_summary (edge->callee)->disregard_inline_limits)
! 	;
!       else if (!cgraph_maybe_hot_edge_p (edge))
!  	not_good = CIF_UNLIKELY_CALL;
!       else if (!flag_inline_functions
! 	  && !DECL_DECLARED_INLINE_P (edge->callee->decl))
!  	not_good = CIF_NOT_DECLARED_INLINED;
!       else if (optimize_function_for_size_p (DECL_STRUCT_FUNCTION(edge->caller->decl)))
!  	not_good = CIF_OPTIMIZING_FOR_SIZE;
!       if (not_good && growth > 0 && estimate_growth (edge->callee) > 0)
! 	{
! 	  edge->inline_failed = not_good;
! 	  if (dump_file)
! 	    fprintf (dump_file, " inline_failed:%s.\n",
! 		     cgraph_inline_failed_string (edge->inline_failed));
  	  continue;
  	}
!       if (!cgraph_default_inline_p (edge->callee, &edge->inline_failed))
! 	{
! 	  if (dump_file)
! 	    fprintf (dump_file, " inline_failed:%s.\n",
! 		     cgraph_inline_failed_string (edge->inline_failed));
! 	  continue;
! 	}
!       if (!tree_can_inline_p (edge)
! 	  || edge->call_stmt_cannot_inline_p)
  	{
  	  if (dump_file)
  	    fprintf (dump_file, " inline_failed:%s.\n",
--- 1268,1299 ----
  		   inline_summary (edge->callee)->size);
  	  fprintf (dump_file,
  		   " to be inlined into %s in %s:%i\n"
! 		   " Estimated growth after inlined into all is %+i insns.\n"
  		   " Estimated badness is %i, frequency %.2f.\n",
  		   cgraph_node_name (edge->caller),
  		   flag_wpa ? "unknown"
  		   : gimple_filename ((const_gimple) edge->call_stmt),
! 		   flag_wpa ? -1
! 		   : gimple_lineno ((const_gimple) edge->call_stmt),
  		   estimate_growth (edge->callee),
  		   badness,
  		   edge->frequency / (double)CGRAPH_FREQ_BASE);
  	  if (edge->count)
! 	    fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n",
! 		     edge->count);
  	  if (dump_flags & TDF_DETAILS)
! 	    edge_badness (edge, true);
  	}
  
!       if (overall_size + growth > max_size
! 	  && !DECL_DISREGARD_INLINE_LIMITS (edge->callee->decl))
  	{
! 	  edge->inline_failed = CIF_INLINE_UNIT_GROWTH_LIMIT;
! 	  report_inline_failed_reason (edge);
  	  continue;
  	}
! 
!       if (!want_inline_small_function_p (edge, true))
  	{
  	  if (dump_file)
  	    fprintf (dump_file, " inline_failed:%s.\n",
*************** cgraph_decide_inlining_of_small_function
*** 1075,1083 ****
  	  where = edge->caller;
  	  if (where->global.inlined_to)
  	    where = where->global.inlined_to;
! 	  if (!cgraph_decide_recursive_inlining (edge,
! 						 flag_indirect_inlining
! 						 ? &new_indirect_edges : NULL))
  	    {
  	      edge->inline_failed = CIF_RECURSIVE_INLINING;
  	      continue;
--- 1305,1313 ----
  	  where = edge->caller;
  	  if (where->global.inlined_to)
  	    where = where->global.inlined_to;
! 	  if (!recursive_inlining (edge,
! 				   flag_indirect_inlining
! 				   ? &new_indirect_edges : NULL))
  	    {
  	      edge->inline_failed = CIF_RECURSIVE_INLINING;
  	      continue;
*************** cgraph_decide_inlining_of_small_function
*** 1089,1102 ****
        else
  	{
  	  struct cgraph_node *callee;
! 	  if (!cgraph_check_inline_limits (edge, &edge->inline_failed))
  	    {
! 	      if (dump_file)
! 		fprintf (dump_file, " Not inlining into %s:%s.\n",
! 			 cgraph_node_name (edge->caller),
! 			 cgraph_inline_failed_string (edge->inline_failed));
  	      continue;
  	    }
  	  callee = edge->callee;
  	  gcc_checking_assert (!callee->global.inlined_to);
  	  cgraph_mark_inline_edge (edge, true, &new_indirect_edges);
--- 1319,1351 ----
        else
  	{
  	  struct cgraph_node *callee;
! 	  struct cgraph_node *outer_node = NULL;
! 	  int depth = 0;
! 
! 	  /* Consider the case where a self recursive function A is inlined
! 	     into B.  This is a desired optimization in some cases, since it
! 	     leads to an effect similar to loop peeling and we might completely
! 	     optimize out the recursive call.  However we must be extra selective.  */
! 
! 	  where = edge->caller;
! 	  while (where->global.inlined_to)
  	    {
! 	      if (where->decl == edge->callee->decl)
! 		outer_node = where, depth++;
! 	      where = where->callers->caller;
! 	    }
! 	  if (outer_node
! 	      && !want_inline_self_recursive_call_p (edge, outer_node,
! 						     true, depth))
! 	    {
! 	      edge->inline_failed
! 		= (DECL_DISREGARD_INLINE_LIMITS (edge->callee->decl)
! 		   ? CIF_RECURSIVE_INLINING : CIF_UNSPECIFIED);
  	      continue;
  	    }
+ 	  else if (depth && dump_file)
+ 	    fprintf (dump_file, " Peeling recursion with depth %i\n", depth);
+ 
  	  callee = edge->callee;
  	  gcc_checking_assert (!callee->global.inlined_to);
  	  cgraph_mark_inline_edge (edge, true, &new_indirect_edges);
*************** cgraph_decide_inlining_of_small_function
*** 1148,1190 ****
  	    fprintf (dump_file, "New minimal size reached: %i\n", min_size);
  	}
      }
-   while (!fibheap_empty (heap))
-     {
-       int badness = fibheap_min_key (heap);
- 
-       edge = (struct cgraph_edge *) fibheap_extract_min (heap);
-       gcc_assert (edge->aux);
-       edge->aux = NULL;
-       if (!edge->inline_failed)
- 	continue;
- #ifdef ENABLE_CHECKING
-       gcc_assert (cgraph_edge_badness (edge, false) >= badness);
- #endif
-       if (dump_file)
- 	{
- 	  fprintf (dump_file,
- 		   "\nSkipping %s with %i size\n",
- 		   cgraph_node_name (edge->callee),
- 		   inline_summary (edge->callee)->size);
- 	  fprintf (dump_file,
- 		   " called by %s in %s:%i\n"
- 		   " Estimated growth after inlined into all callees is %+i insns.\n"
- 		   " Estimated badness is %i, frequency %.2f.\n",
- 		   cgraph_node_name (edge->caller),
- 		   flag_wpa ? "unknown"
- 		   : gimple_filename ((const_gimple) edge->call_stmt),
- 		   flag_wpa ? -1 : gimple_lineno ((const_gimple) edge->call_stmt),
- 		   estimate_growth (edge->callee),
- 		   badness,
- 		   edge->frequency / (double)CGRAPH_FREQ_BASE);
- 	  if (edge->count)
- 	    fprintf (dump_file," Called "HOST_WIDEST_INT_PRINT_DEC"x\n", edge->count);
- 	  if (dump_flags & TDF_DETAILS)
- 	    cgraph_edge_badness (edge, true);
- 	}
-       if (!inline_summary (edge->callee)->disregard_inline_limits && edge->inline_failed)
- 	edge->inline_failed = CIF_INLINE_UNIT_GROWTH_LIMIT;
-     }
  
    if (new_indirect_edges)
      VEC_free (cgraph_edge_p, heap, new_indirect_edges);
--- 1397,1402 ----
*************** cgraph_decide_inlining_of_small_function
*** 1195,1201 ****
  /* Flatten NODE from the IPA inliner.  */
  
  static void
! cgraph_flatten (struct cgraph_node *node)
  {
    struct cgraph_edge *e;
  
--- 1407,1413 ----
  /* Flatten NODE from the IPA inliner.  */
  
  static void
! flatten_function (struct cgraph_node *node)
  {
    struct cgraph_edge *e;
  
*************** cgraph_flatten (struct cgraph_node *node
*** 1208,1229 ****
      {
        struct cgraph_node *orig_callee;
  
-       if (e->call_stmt_cannot_inline_p)
- 	{
- 	  if (dump_file)
- 	    fprintf (dump_file, "Not inlining: %s",
- 		     cgraph_inline_failed_string (e->inline_failed));
- 	  continue;
- 	}
- 
-       if (!e->callee->analyzed)
- 	{
- 	  if (dump_file)
- 	    fprintf (dump_file,
- 		     "Not inlining: Function body not available.\n");
- 	  continue;
- 	}
- 
        /* We've hit cycle?  It is time to give up.  */
        if (e->callee->aux)
  	{
--- 1420,1425 ----
*************** cgraph_flatten (struct cgraph_node *node
*** 1240,1249 ****
  	 it in order to fully flatten the leaves.  */
        if (!e->inline_failed)
  	{
! 	  cgraph_flatten (e->callee);
  	  continue;
  	}
  
        if (cgraph_edge_recursive_p (e))
  	{
  	  if (dump_file)
--- 1436,1453 ----
  	 it in order to fully flatten the leaves.  */
        if (!e->inline_failed)
  	{
! 	  flatten_function (e->callee);
  	  continue;
  	}
  
+       /* The flatten attribute needs to be processed during late inlining.
+ 	 For extra code quality, however, we also do flattening during early
+ 	 optimization.  */
+       if (cgraph_state != CGRAPH_STATE_IPA_SSA
+ 	  ? !can_inline_edge_p (e, true)
+ 	  : !can_early_inline_edge_p (e))
+ 	continue;
+ 
        if (cgraph_edge_recursive_p (e))
  	{
  	  if (dump_file)
*************** cgraph_flatten (struct cgraph_node *node
*** 1251,1264 ****
  	  continue;
  	}
  
-       if (!tree_can_inline_p (e))
- 	{
- 	  if (dump_file)
- 	    fprintf (dump_file, "Not inlining: %s",
- 		     cgraph_inline_failed_string (e->inline_failed));
- 	  continue;
- 	}
- 
        if (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (node->decl))
  	  != gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->callee->decl)))
  	{
--- 1455,1460 ----
*************** cgraph_flatten (struct cgraph_node *node
*** 1277,1283 ****
        cgraph_mark_inline_edge (e, true, NULL);
        if (e->callee != orig_callee)
  	orig_callee->aux = (void *) node;
!       cgraph_flatten (e->callee);
        if (e->callee != orig_callee)
  	orig_callee->aux = NULL;
      }
--- 1473,1479 ----
        cgraph_mark_inline_edge (e, true, NULL);
        if (e->callee != orig_callee)
  	orig_callee->aux = (void *) node;
!       flatten_function (e->callee);
        if (e->callee != orig_callee)
  	orig_callee->aux = NULL;
      }
*************** cgraph_flatten (struct cgraph_node *node
*** 1289,1295 ****
     expenses on updating data structures.  */
  
  static unsigned int
! cgraph_decide_inlining (void)
  {
    struct cgraph_node *node;
    int nnodes;
--- 1485,1491 ----
     expenses on updating data structures.  */
  
  static unsigned int
! ipa_inline (void)
  {
    struct cgraph_node *node;
    int nnodes;
*************** cgraph_decide_inlining (void)
*** 1366,1406 ****
  	  if (dump_file)
  	    fprintf (dump_file,
  		     "Flattening %s\n", cgraph_node_name (node));
! 	  cgraph_flatten (node);
  	}
      }
  
!   cgraph_decide_inlining_of_small_functions ();
  
    if (flag_inline_functions_called_once)
      {
        if (dump_file)
  	fprintf (dump_file, "\nDeciding on functions called once:\n");
  
        /* And finally decide what functions are called once.  */
!       for (i = nnodes - 1; i >= 0; i--)
  	{
- 	  node = order[i];
- 
  	  if (node->callers
  	      && !node->callers->next_caller
  	      && !node->global.inlined_to
- 	      && cgraph_will_be_removed_from_program_if_no_direct_calls (node)
- 	      && inline_summary (node)->inlinable
- 	      && cgraph_function_body_availability (node) >= AVAIL_AVAILABLE
  	      && node->callers->inline_failed
  	      && node->callers->caller != node
  	      && node->callers->caller->global.inlined_to != node
! 	      && !node->callers->call_stmt_cannot_inline_p
! 	      && tree_can_inline_p (node->callers)
! 	      && !DECL_EXTERNAL (node->decl))
  	    {
! 	      cgraph_inline_failed_t reason;
  	      old_size = overall_size;
  	      if (dump_file)
  		{
  		  fprintf (dump_file,
! 			   "\nConsidering %s size %i.\n",
  			   cgraph_node_name (node), inline_summary (node)->size);
  		  fprintf (dump_file,
  			   " Called once from %s %i insns.\n",
--- 1562,1605 ----
  	  if (dump_file)
  	    fprintf (dump_file,
  		     "Flattening %s\n", cgraph_node_name (node));
! 	  flatten_function (node);
  	}
      }
  
!   inline_small_functions ();
!   cgraph_remove_unreachable_nodes (true, dump_file);
!   free (order);
  
+   /* We already perform some inlining of functions called once while
+      inlining small functions above.  After unreachable nodes are removed,
+      we can still do a quick check that nothing new has appeared.  */
    if (flag_inline_functions_called_once)
      {
        if (dump_file)
  	fprintf (dump_file, "\nDeciding on functions called once:\n");
  
        /* And finally decide what functions are called once.  */
!       for (node = cgraph_nodes; node; node = node->next)
  	{
  	  if (node->callers
  	      && !node->callers->next_caller
  	      && !node->global.inlined_to
  	      && node->callers->inline_failed
  	      && node->callers->caller != node
  	      && node->callers->caller->global.inlined_to != node
! 	      && cgraph_will_be_removed_from_program_if_no_direct_calls (node)
! 	      && inline_summary (node)->inlinable
! 	      && cgraph_function_body_availability (node) >= AVAIL_AVAILABLE
! 	      && !DECL_EXTERNAL (node->decl)
! 	      && can_inline_edge_p (node->callers, true))
  	    {
! 	      struct cgraph_node *caller = node->callers->caller;
! 
  	      old_size = overall_size;
  	      if (dump_file)
  		{
  		  fprintf (dump_file,
! 			   "\nInlining %s size %i.\n",
  			   cgraph_node_name (node), inline_summary (node)->size);
  		  fprintf (dump_file,
  			   " Called once from %s %i insns.\n",
*************** cgraph_decide_inlining (void)
*** 1408,1432 ****
  			   inline_summary (node->callers->caller)->size);
  		}
  
! 	      if (cgraph_check_inline_limits (node->callers, &reason))
! 		{
! 		  struct cgraph_node *caller = node->callers->caller;
! 		  cgraph_mark_inline_edge (node->callers, true, NULL);
! 		  if (dump_file)
! 		    fprintf (dump_file,
! 			     " Inlined into %s which now has %i size"
! 			     " for a net change of %+i size.\n",
! 			     cgraph_node_name (caller),
! 			     inline_summary (caller)->size,
! 			     overall_size - old_size);
! 		}
! 	      else
! 		{
! 		  if (dump_file)
! 		    fprintf (dump_file,
! 			     " Not inlining: %s.\n",
!                              cgraph_inline_failed_string (reason));
! 		}
  	    }
  	}
      }
--- 1607,1620 ----
  			   inline_summary (node->callers->caller)->size);
  		}
  
! 	      cgraph_mark_inline_edge (node->callers, true, NULL);
! 	      if (dump_file)
! 		fprintf (dump_file,
! 			 " Inlined into %s which now has %i size"
! 			 " for a net change of %+i size.\n",
! 			 cgraph_node_name (caller),
! 			 inline_summary (caller)->size,
! 			 overall_size - old_size);
  	    }
  	}
      }
*************** cgraph_decide_inlining (void)
*** 1441,1532 ****
  	     "size %i turned to %i size.\n\n",
  	     ncalls_inlined, nfunctions_inlined, initial_size,
  	     overall_size);
-   free (order);
    /* In WPA we use inline summaries for partitioning process.  */
    if (!flag_wpa)
      inline_free_summary ();
    return 0;
  }
  
- /* Return true when N is leaf function.  Accept cheap builtins
-    in leaf functions.  */
- 
- static bool
- leaf_node_p (struct cgraph_node *n)
- {
-   struct cgraph_edge *e;
-   for (e = n->callees; e; e = e->next_callee)
-     if (!is_inexpensive_builtin (e->callee->decl))
-       return false;
-   return true;
- }
- 
- /* Return true if the edge E is inlinable during early inlining.  */
- 
- static bool
- cgraph_edge_early_inlinable_p (struct cgraph_edge *e, FILE *file)
- {
-   if (!inline_summary (e->callee)->inlinable)
-     {
-       if (file)
- 	fprintf (file, "Not inlining: Function not inlinable.\n");
-       return false;
-     }
-   if (!e->callee->analyzed)
-     {
-       if (file)
- 	fprintf (file, "Not inlining: Function body not available.\n");
-       return false;
-     }
-   if (!tree_can_inline_p (e)
-       || e->call_stmt_cannot_inline_p)
-     {
-       if (file)
- 	fprintf (file, "Not inlining: %s.\n",
- 		 cgraph_inline_failed_string (e->inline_failed));
-       return false;
-     }
-   if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
-       || !gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->callee->decl)))
-     {
-       if (file)
- 	fprintf (file, "Not inlining: not in SSA form.\n");
-       return false;
-     }
-   return true;
- }
- 
  /* Inline always-inline function calls in NODE.  */
  
  static bool
! cgraph_perform_always_inlining (struct cgraph_node *node)
  {
    struct cgraph_edge *e;
    bool inlined = false;
  
    for (e = node->callees; e; e = e->next_callee)
      {
!       if (!inline_summary (e->callee)->disregard_inline_limits)
  	continue;
  
-       if (dump_file)
- 	fprintf (dump_file,
- 		 "Considering always-inline candidate %s.\n",
- 		 cgraph_node_name (e->callee));
- 
        if (cgraph_edge_recursive_p (e))
  	{
  	  if (dump_file)
! 	    fprintf (dump_file, "Not inlining: recursive call.\n");
  	  e->inline_failed = CIF_RECURSIVE_INLINING;
  	  continue;
  	}
  
!       if (!cgraph_edge_early_inlinable_p (e, dump_file))
  	continue;
  
        if (dump_file)
! 	fprintf (dump_file, " Inlining %s into %s.\n",
  		 cgraph_node_name (e->callee),
  		 cgraph_node_name (e->caller));
        cgraph_mark_inline_edge (e, true, NULL);
--- 1629,1667 ----
  	     "size %i turned to %i size.\n\n",
  	     ncalls_inlined, nfunctions_inlined, initial_size,
  	     overall_size);
    /* In WPA we use inline summaries for partitioning process.  */
    if (!flag_wpa)
      inline_free_summary ();
    return 0;
  }
  
  /* Inline always-inline function calls in NODE.  */
  
  static bool
! inline_always_inline_functions (struct cgraph_node *node)
  {
    struct cgraph_edge *e;
    bool inlined = false;
  
    for (e = node->callees; e; e = e->next_callee)
      {
!       if (!DECL_DISREGARD_INLINE_LIMITS (e->callee->decl))
  	continue;
  
        if (cgraph_edge_recursive_p (e))
  	{
  	  if (dump_file)
! 	    fprintf (dump_file, "  Not inlining recursive call to %s.\n",
! 		     cgraph_node_name (e->callee));
  	  e->inline_failed = CIF_RECURSIVE_INLINING;
  	  continue;
  	}
  
!       if (!can_early_inline_edge_p (e))
  	continue;
  
        if (dump_file)
! 	fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
  		 cgraph_node_name (e->callee),
  		 cgraph_node_name (e->caller));
        cgraph_mark_inline_edge (e, true, NULL);
*************** cgraph_perform_always_inlining (struct c
*** 1540,1563 ****
     expenses on updating data structures.  */
  
  static bool
! cgraph_decide_inlining_incrementally (struct cgraph_node *node)
  {
    struct cgraph_edge *e;
    bool inlined = false;
-   cgraph_inline_failed_t failed_reason;
- 
-   /* Never inline regular functions into always-inline functions
-      during incremental inlining.  */
-   if (inline_summary (node)->disregard_inline_limits)
-     return false;
  
    for (e = node->callees; e; e = e->next_callee)
      {
-       int allowed_growth = 0;
- 
        if (!inline_summary (e->callee)->inlinable
! 	  || !e->inline_failed
! 	  || inline_summary (e->callee)->disregard_inline_limits)
  	continue;
  
        /* Do not consider functions not declared inline.  */
--- 1675,1689 ----
     expenses on updating data structures.  */
  
  static bool
! early_inline_small_functions (struct cgraph_node *node)
  {
    struct cgraph_edge *e;
    bool inlined = false;
  
    for (e = node->callees; e; e = e->next_callee)
      {
        if (!inline_summary (e->callee)->inlinable
! 	  || !e->inline_failed)
  	continue;
  
        /* Do not consider functions not declared inline.  */
*************** cgraph_decide_inlining_incrementally (st
*** 1570,1616 ****
  	fprintf (dump_file, "Considering inline candidate %s.\n",
  		 cgraph_node_name (e->callee));
  
        if (cgraph_edge_recursive_p (e))
  	{
  	  if (dump_file)
! 	    fprintf (dump_file, "Not inlining: recursive call.\n");
  	  continue;
  	}
  
!       if (!cgraph_edge_early_inlinable_p (e, dump_file))
  	continue;
  
!       if (cgraph_maybe_hot_edge_p (e) && leaf_node_p (e->callee)
! 	  && optimize_function_for_speed_p (cfun))
! 	allowed_growth = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
! 
!       /* When the function body would grow and inlining the function
! 	 won't eliminate the need for offline copy of the function,
! 	 don't inline.  */
!       if (estimate_edge_growth (e) > allowed_growth)
! 	{
! 	  if (dump_file)
! 	    fprintf (dump_file,
! 		     "Not inlining: code size would grow by %i.\n",
! 		     estimate_edge_growth (e));
! 	  continue;
! 	}
!       if (!cgraph_check_inline_limits (e, &e->inline_failed))
! 	{
! 	  if (dump_file)
! 	    fprintf (dump_file, "Not inlining: %s.\n",
! 		     cgraph_inline_failed_string (e->inline_failed));
! 	  continue;
! 	}
!       if (cgraph_default_inline_p (e->callee, &failed_reason))
! 	{
! 	  if (dump_file)
! 	    fprintf (dump_file, " Inlining %s into %s.\n",
! 		     cgraph_node_name (e->callee),
! 		     cgraph_node_name (e->caller));
! 	  cgraph_mark_inline_edge (e, true, NULL);
! 	  inlined = true;
! 	}
      }
  
    return inlined;
--- 1696,1720 ----
  	fprintf (dump_file, "Considering inline candidate %s.\n",
  		 cgraph_node_name (e->callee));
  
+       if (!can_early_inline_edge_p (e))
+ 	continue;
+ 
        if (cgraph_edge_recursive_p (e))
  	{
  	  if (dump_file)
! 	    fprintf (dump_file, "  Not inlining: recursive call.\n");
  	  continue;
  	}
  
!       if (!want_early_inline_function_p (e))
  	continue;
  
!       if (dump_file)
! 	fprintf (dump_file, " Inlining %s into %s.\n",
! 		 cgraph_node_name (e->callee),
! 		 cgraph_node_name (e->caller));
!       cgraph_mark_inline_edge (e, true, NULL);
!       inlined = true;
      }
  
    return inlined;
*************** static GTY ((length ("nnodes"))) struct 
*** 1626,1632 ****
     passes to be somewhat more effective and avoids some code duplication in
     later real inlining pass for testcases with very many function calls.  */
  static unsigned int
! cgraph_early_inlining (void)
  {
    struct cgraph_node *node = cgraph_get_node (current_function_decl);
    struct cgraph_edge *edge;
--- 1730,1736 ----
     passes to be somewhat more effective and avoids some code duplication in
     later real inlining pass for testcases with very many function calls.  */
  static unsigned int
! early_inliner (void)
  {
    struct cgraph_node *node = cgraph_get_node (current_function_decl);
    struct cgraph_edge *edge;
*************** cgraph_early_inlining (void)
*** 1643,1653 ****
  
    /* Even when not optimizing or not inlining inline always-inline
       functions.  */
!   inlined = cgraph_perform_always_inlining (node);
  
    if (!optimize
        || flag_no_inline
!       || !flag_early_inlining)
      ;
    else if (lookup_attribute ("flatten",
  			     DECL_ATTRIBUTES (node->decl)) != NULL)
--- 1747,1766 ----
  
    /* Even when not optimizing or not inlining inline always-inline
       functions.  */
!   inlined = inline_always_inline_functions (node);
  
    if (!optimize
        || flag_no_inline
!       || !flag_early_inlining
!       /* Never inline regular functions into always-inline functions
! 	 during incremental inlining.  This sucks as functions calling
! 	 always inline functions will get less optimized, but at the
! 	 same time inlining of functions calling always inline
! 	 functions into an always inline function might introduce
! 	 cycles of edges to be always inlined in the callgraph.
! 
! 	 We might want to be smarter and just avoid this type of inlining.  */
!       || DECL_DISREGARD_INLINE_LIMITS (node->decl))
      ;
    else if (lookup_attribute ("flatten",
  			     DECL_ATTRIBUTES (node->decl)) != NULL)
*************** cgraph_early_inlining (void)
*** 1657,1663 ****
        if (dump_file)
  	fprintf (dump_file,
  		 "Flattening %s\n", cgraph_node_name (node));
!       cgraph_flatten (node);
        inlined = true;
      }
    else
--- 1770,1776 ----
        if (dump_file)
  	fprintf (dump_file,
  		 "Flattening %s\n", cgraph_node_name (node));
!       flatten_function (node);
        inlined = true;
      }
    else
*************** cgraph_early_inlining (void)
*** 1665,1674 ****
        /* We iterate incremental inlining to get trivial cases of indirect
  	 inlining.  */
        while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
! 	     && cgraph_decide_inlining_incrementally (node))
  	{
  	  timevar_push (TV_INTEGRATION);
  	  todo |= optimize_inline_calls (current_function_decl);
  	  timevar_pop (TV_INTEGRATION);
  	  iterations++;
  	  inlined = false;
--- 1778,1799 ----
        /* We iterate incremental inlining to get trivial cases of indirect
  	 inlining.  */
        while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
! 	     && early_inline_small_functions (node))
  	{
  	  timevar_push (TV_INTEGRATION);
  	  todo |= optimize_inline_calls (current_function_decl);
+ 
+ 	  /* Technically we ought to recompute inline parameters so the new
+ 	     iteration of the early inliner works as expected.  We however
+ 	     have the values approximately right and thus only need to update
+ 	     edge info that might be cleared out for newly discovered edges.  */
+ 	  for (edge = node->callees; edge; edge = edge->next_callee)
+ 	    {
+ 	      edge->call_stmt_size
+ 		= estimate_num_insns (edge->call_stmt, &eni_size_weights);
+ 	      edge->call_stmt_time
+ 		= estimate_num_insns (edge->call_stmt, &eni_time_weights);
+ 	    }
  	  timevar_pop (TV_INTEGRATION);
  	  iterations++;
  	  inlined = false;
*************** cgraph_early_inlining (void)
*** 1681,1699 ****
      {
        timevar_push (TV_INTEGRATION);
        todo |= optimize_inline_calls (current_function_decl);
- 
-       /* Technically we ought to recompute inline parameters so the new iteration of
- 	 early inliner works as expected.  We however have values approximately right
- 	 and thus we only need to update edge info that might be cleared out for
- 	 newly discovered edges.  */
-       for (edge = node->callees; edge; edge = edge->next_callee)
- 	{
- 	  edge->call_stmt_size
- 	    = estimate_num_insns (edge->call_stmt, &eni_size_weights);
- 	  edge->call_stmt_time
- 	    = estimate_num_insns (edge->call_stmt, &eni_time_weights);
- 	}
- 
        timevar_pop (TV_INTEGRATION);
      }
  
--- 1806,1811 ----
*************** struct gimple_opt_pass pass_early_inline
*** 1708,1714 ****
    GIMPLE_PASS,
    "einline",	 			/* name */
    NULL,					/* gate */
!   cgraph_early_inlining,		/* execute */
    NULL,					/* sub */
    NULL,					/* next */
    0,					/* static_pass_number */
--- 1820,1826 ----
    GIMPLE_PASS,
    "einline",	 			/* name */
    NULL,					/* gate */
!   early_inliner,			/* execute */
    NULL,					/* sub */
    NULL,					/* next */
    0,					/* static_pass_number */
*************** inline_transform (struct cgraph_node *no
*** 1730,1737 ****
    struct cgraph_edge *e;
    bool inline_p = false;
  
!   /* FIXME: Currently the pass manager is adding inline transform more than once to some
!      clones.  This needs revisiting after WPA cleanups.  */
    if (cfun->after_inlining)
      return 0;
  
--- 1842,1849 ----
    struct cgraph_edge *e;
    bool inline_p = false;
  
!   /* FIXME: Currently the pass manager is adding inline transform more than
!      once to some clones.  This needs revisiting after WPA cleanups.  */
    if (cfun->after_inlining)
      return 0;
  
*************** inline_transform (struct cgraph_node *no
*** 1762,1768 ****
     happens during early inlining.  */
  
  static bool
! gate_cgraph_decide_inlining (void)
  {
    /* ???  We'd like to skip this if not optimizing or not inlining as
       all always-inline functions have been processed by early
--- 1874,1880 ----
     happens during early inlining.  */
  
  static bool
! gate_ipa_inline (void)
  {
    /* ???  We'd like to skip this if not optimizing or not inlining as
       all always-inline functions have been processed by early
*************** struct ipa_opt_pass_d pass_ipa_inline =
*** 1777,1784 ****
   {
    IPA_PASS,
    "inline",				/* name */
!   gate_cgraph_decide_inlining,		/* gate */
!   cgraph_decide_inlining,		/* execute */
    NULL,					/* sub */
    NULL,					/* next */
    0,					/* static_pass_number */
--- 1889,1896 ----
   {
    IPA_PASS,
    "inline",				/* name */
!   gate_ipa_inline,			/* gate */
!   ipa_inline,				/* execute */
    NULL,					/* sub */
    NULL,					/* next */
    0,					/* static_pass_number */
Index: ipa-inline.h
===================================================================
*** ipa-inline.h	(revision 172597)
--- ipa-inline.h	(working copy)
*************** struct inline_summary
*** 41,48 ****
    /* False when there something makes versioning impossible.
       Currently computed and used only by ipa-cp.  */
    unsigned versionable : 1;
-   /* True when function should be inlined independently on its size.  */
-   unsigned disregard_inline_limits : 1;
  
    /* Information about function that will result after applying all the
       inline decisions present in the callgraph.  Generally kept up to
--- 41,46 ----
Index: ipa-inline-analysis.c
===================================================================
*** ipa-inline-analysis.c	(revision 172597)
--- ipa-inline-analysis.c	(working copy)
*************** dump_inline_summary (FILE *f, struct cgr
*** 131,137 ****
        struct inline_summary *s = inline_summary (node);
        fprintf (f, "Inline summary for %s/%i", cgraph_node_name (node),
  	       node->uid);
!       if (s->disregard_inline_limits)
  	fprintf (f, " always_inline");
        if (s->inlinable)
  	fprintf (f, " inlinable");
--- 131,137 ----
        struct inline_summary *s = inline_summary (node);
        fprintf (f, "Inline summary for %s/%i", cgraph_node_name (node),
  	       node->uid);
!       if (DECL_DISREGARD_INLINE_LIMITS (node->decl))
  	fprintf (f, " always_inline");
        if (s->inlinable)
  	fprintf (f, " inlinable");
*************** dump_inline_summary (FILE *f, struct cgr
*** 142,148 ****
        fprintf (f, "  global time:     %i\n", s->time);
        fprintf (f, "  self size:       %i, benefit: %i\n",
  	       s->self_size, s->size_inlining_benefit);
!       fprintf (f, "  global size:     %i", s->size);
        fprintf (f, "  self stack:      %i\n",
  	       (int)s->estimated_self_stack_size);
        fprintf (f, "  global stack:    %i\n\n",
--- 142,148 ----
        fprintf (f, "  global time:     %i\n", s->time);
        fprintf (f, "  self size:       %i, benefit: %i\n",
  	       s->self_size, s->size_inlining_benefit);
!       fprintf (f, "  global size:     %i\n", s->size);
        fprintf (f, "  self stack:      %i\n",
  	       (int)s->estimated_self_stack_size);
        fprintf (f, "  global stack:    %i\n\n",
*************** compute_inline_parameters (struct cgraph
*** 364,371 ****
  
    /* Can this function be inlined at all?  */
    info->inlinable = tree_inlinable_function_p (node->decl);
-   if (!info->inlinable)
-     info->disregard_inline_limits = 0;
  
    /* Inlinable functions always can change signature.  */
    if (info->inlinable)
--- 364,369 ----
*************** compute_inline_parameters (struct cgraph
*** 388,395 ****
    info->estimated_growth = INT_MIN;
    info->stack_frame_offset = 0;
    info->estimated_stack_size = info->estimated_self_stack_size;
-   info->disregard_inline_limits
-     = DECL_DISREGARD_INLINE_LIMITS (node->decl);
  }
  
  
--- 386,391 ----
*************** estimate_growth (struct cgraph_node *nod
*** 483,507 ****
  
    for (e = node->callers; e; e = e->next_caller)
      {
!       if (e->caller == node)
          self_recursive = true;
!       if (e->inline_failed)
! 	growth += estimate_edge_growth (e);
      }
  
!   /* ??? Wrong for non-trivially self recursive functions or cases where
!      we decide to not inline for different reasons, but it is not big deal
!      as in that case we will keep the body around, but we will also avoid
!      some inlining.  */
!   if (cgraph_will_be_removed_from_program_if_no_direct_calls (node)
!       && !DECL_EXTERNAL (node->decl) && !self_recursive)
!     growth -= info->size;
!   /* COMDAT functions are very often not shared across multiple units since they
!      come from various template instantiations.  Take this into account.  */
!   else  if (DECL_COMDAT (node->decl) && !self_recursive
! 	    && cgraph_can_remove_if_no_direct_calls_p (node))
!     growth -= (info->size
! 	       * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY)) + 50) / 100;
  
    info->estimated_growth = growth;
    return growth;
--- 479,512 ----
  
    for (e = node->callers; e; e = e->next_caller)
      {
!       gcc_checking_assert (e->inline_failed);
! 
!       if (e->caller == node
! 	  || (e->caller->global.inlined_to
! 	      && e->caller->global.inlined_to == node))
          self_recursive = true;
!       growth += estimate_edge_growth (e);
      }
+      
  
!   /* For self-recursive functions the growth estimate really should be
!      infinity.  We don't want to return very large values because the growth
!      plays various roles in badness computation fractions.  Be sure not to
!      return a zero or negative growth.  */
!   if (self_recursive)
!     growth = growth < info->size ? info->size : growth;
!   else
!     {
!       if (cgraph_will_be_removed_from_program_if_no_direct_calls (node)
! 	  && !DECL_EXTERNAL (node->decl))
! 	growth -= info->size;
!       /* COMDAT functions are very often not shared across multiple units since they
! 	 come from various template instantiations.  Take this into account.  */
!       else  if (DECL_COMDAT (node->decl)
! 		&& cgraph_can_remove_if_no_direct_calls_p (node))
! 	growth -= (info->size
! 		   * (100 - PARAM_VALUE (PARAM_COMDAT_SHARING_PROBABILITY)) + 50) / 100;
!     }
  
    info->estimated_growth = growth;
    return growth;
*************** inline_read_summary (void)
*** 621,627 ****
  	      bp = lto_input_bitpack (ib);
  	      info->inlinable = bp_unpack_value (&bp, 1);
  	      info->versionable = bp_unpack_value (&bp, 1);
- 	      info->disregard_inline_limits = bp_unpack_value (&bp, 1);
  	    }
  
  	  lto_destroy_simple_input_block (file_data,
--- 626,631 ----
*************** inline_write_summary (cgraph_node_set se
*** 688,694 ****
  	  bp = bitpack_create (ob->main_stream);
  	  bp_pack_value (&bp, info->inlinable, 1);
  	  bp_pack_value (&bp, info->versionable, 1);
- 	  bp_pack_value (&bp, info->disregard_inline_limits, 1);
  	  lto_output_bitpack (&bp);
  	}
      }
--- 692,697 ----


Thread overview: 9+ messages
2011-04-17 12:07 Inliner heuristics facelift Dominique Dhumieres
2011-04-17 13:35 ` Jan Hubicka
2011-04-17 14:58   ` Dominique Dhumieres
2011-04-17 15:29     ` Jan Hubicka
2011-04-17 14:29 ` Jan Hubicka
2011-04-19 13:40   ` H.J. Lu
  -- strict thread matches above, loose matches on Subject: below --
2011-04-17 10:54 Jan Hubicka
2011-04-19 18:37 ` Xinliang David Li
2011-04-19 19:24   ` H.J. Lu
