From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-226128-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 13520 invoked by alias); 28 Aug 2008 20:33:58 -0000
Received: (qmail 13405 invoked by uid 22791); 28 Aug 2008 20:33:49 -0000
X-Spam-Check-By: sourceware.org
Received: from atrey.karlin.mff.cuni.cz (HELO atrey.karlin.mff.cuni.cz) (195.113.26.193)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 28 Aug 2008 20:32:52 +0000
Received: by atrey.karlin.mff.cuni.cz (Postfix, from userid 4018) 	id 2AD73F0172; Thu, 28 Aug 2008 22:32:49 +0200 (CEST)
Date: Fri, 29 Aug 2008 22:15:00 -0000
From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Guenther <richard.guenther@gmail.com>
Cc: Jan Hubicka <hubicka@ucw.cz>, gcc-patches@gcc.gnu.org
Subject: Re: Patch ping...
Message-ID: <20080828203249.GH15545@atrey.karlin.mff.cuni.cz>
References: <20080405162606.GA22594@atrey.karlin.mff.cuni.cz> <84fc9c000804050953o429fde26jb3938827ff9dc5a@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <84fc9c000804050953o429fde26jb3938827ff9dc5a@mail.gmail.com>
User-Agent: Mutt/1.5.13 (2006-08-11)
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2008-08/txt/msg02311.txt.bz2

> On Sat, Apr 5, 2008 at 6:26 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> > Hi,
> >  I would like to ping the BRANCH_COST patch
> >  http://gcc.gnu.org/ml/gcc/2008-03/msg00137.html
> >
> >  I hope to proceed with updating GCC to optimize cold blocks the same way
> >  as -Os, and explicitly marked hot functions in -Os code for speed.
> >  For this I need to populate the RTL cost interfaces with the profile info
> >  and teach expansion about it.
> >  This has been going on for quite some years now, and I realize it might
> >  not be clear what I am precisely shooting for, so I will also add a wiki page.
> 
> I think the patch makes sense (BRANCH_COST is special anyway compared to
> other insn costs), but I'd like to see the bigger picture as well here.  In
> particular, why BRANCH_COST (hot, predictable)?  Why isn't that simply
> BRANCH_COST (optimize_size_p, predictable), matching what I would
> expect for the other cost interfaces (insn_cost (optimize_size_p, rtx))?

Hi,
with the optimize_*_for_speed_p predicates, this patch is now cleaner.
I would also like to update the other costs in a similar way, so we can
avoid the current practice of switching the optimize_size global variable.

Bootstrapped/regtested i686-linux, OK?

	* optabs.c (expand_abs_nojump): Update BRANCH_COST call.
	* fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT, fold_truthop): Likewise.
	* dojump.c (do_jump): Likewise.
	* ifcvt.c (MAX_CONDITIONAL_EXECUTE): Likewise.
	(struct noce_if_info): Add branch_cost field.
	(noce_find_if_block): Compute it.
	(noce_try_store_flag_constants, noce_try_addcc, noce_try_store_flag_mask,
	noce_try_cmove_arith): Use it.
	(find_if_case_1, find_if_case_2): Update BRANCH_COST call.
	* expr.h (BRANCH_COST): Update default.
	* predict.c (predictable_edge_p): New function.
	* expmed.c (expand_smod_pow2, expand_sdiv_pow2, emit_store_flag):
	Update BRANCH_COST call.
	* basic-block.h (predictable_edge_p): Declare.
	* config/alpha/alpha.h (BRANCH_COST): Update.
	* config/frv/frv.h (BRANCH_COST): Update.
	* config/s390/s390.h (BRANCH_COST): Update.
	* config/spu/spu.h (BRANCH_COST): Update.
	* config/sparc/sparc.h (BRANCH_COST): Update.
	* config/m32r/m32r.h (BRANCH_COST): Update.
	* config/i386/i386.h (BRANCH_COST): Update.
	* config/i386/i386.c (ix86_expand_int_movcc): Update use of BRANCH_COST.
	* config/sh/sh.h (BRANCH_COST): Update.
	* config/pdp11/pdp11.h (BRANCH_COST): Update.
	* config/avr/avr.h (BRANCH_COST): Update.
	* config/crx/crx.h (BRANCH_COST): Update.
	* config/xtensa/xtensa.h (BRANCH_COST): Update.
	* config/stormy16/stormy16.h (BRANCH_COST): Update.
	* config/m68hc11/m68hc11.h (BRANCH_COST): Update.
	* config/iq2000/iq2000.h (BRANCH_COST): Update.
	* config/ia64/ia64.h (BRANCH_COST): Update.
	* config/rs6000/rs6000.h (BRANCH_COST): Update.
	* config/arc/arc.h (BRANCH_COST): Update.
	* config/score/score.h (BRANCH_COST): Update.
	* config/arm/arm.h (BRANCH_COST): Update.
	* config/pa/pa.h (BRANCH_COST): Update.
	* config/mips/mips.h (BRANCH_COST): Update.
	* config/vax/vax.h (BRANCH_COST): Update.
	* config/h8300/h8300.h (BRANCH_COST): Update.
	* params.def (PARAM_PREDICTABLE_BRANCH_OUTCOME): New.
	* doc/invoke.texi (predictable-branch-cost-outcome): Document.
	* doc/tm.texi (BRANCH_COST): Update.
Index: doc/tm.texi
===================================================================
*** doc/tm.texi	(revision 139737)
--- doc/tm.texi	(working copy)
*************** value to the result of that function.  T
*** 5874,5882 ****
  are the same as to this macro.
  @end defmac
  
! @defmac BRANCH_COST
! A C expression for the cost of a branch instruction.  A value of 1 is
! the default; other values are interpreted relative to that.
  @end defmac
  
  Here are additional macros which do not specify precise relative costs,
--- 5874,5887 ----
  are the same as to this macro.
  @end defmac
  
! @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
! A C expression for the cost of a branch instruction.  A value of 1 is the
! default; other values are interpreted relative to that.  Parameter
! @var{speed_p} is true when the branch in question should be optimized for
! speed.  When it is false, @code{BRANCH_COST} should return a value optimal
! for code size rather than performance.  @var{predictable_p} is true for
! well-predictable branches; on many architectures the @code{BRANCH_COST}
! can be reduced for those.
  @end defmac
  
  Here are additional macros which do not specify precise relative costs,
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi	(revision 139737)
--- doc/invoke.texi	(working copy)
*************** to the hottest structure frequency in th
*** 6905,6910 ****
--- 6905,6914 ----
  parameter, then structure reorganization is not applied to this structure.
  The default is 10.
  
+ @item predictable-branch-cost-outcome
+ A branch taken with probability of at most this threshold (in percent), or of
+ at least 100 minus it, is considered well predictable.  The default is 10.
+ 
  @item max-crossjump-edges
  The maximum number of incoming edges to consider for crossjumping.
  The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
Index: optabs.c
===================================================================
*** optabs.c	(revision 139737)
--- optabs.c	(working copy)
*************** expand_abs_nojump (enum machine_mode mod
*** 3443,3449 ****
       value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)),
       where W is the width of MODE.  */
  
!   if (GET_MODE_CLASS (mode) == MODE_INT && BRANCH_COST >= 2)
      {
        rtx extended = expand_shift (RSHIFT_EXPR, mode, op0,
  				   size_int (GET_MODE_BITSIZE (mode) - 1),
--- 3443,3451 ----
       value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)),
       where W is the width of MODE.  */
  
!   if (GET_MODE_CLASS (mode) == MODE_INT
!       && BRANCH_COST (optimize_insn_for_speed_p (),
! 	      	      false) >= 2)
      {
        rtx extended = expand_shift (RSHIFT_EXPR, mode, op0,
  				   size_int (GET_MODE_BITSIZE (mode) - 1),
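For reference, here is a minimal C sketch of the identity quoted in the expand_abs_nojump comment. The patch now emits that sequence only when BRANCH_COST (speed_p, false) is at least 2. The function name abs_nojump is hypothetical. The sketch assumes a 32-bit int32_t and an arithmetic right shift of negative values; GCC provides that shift, although ISO C leaves it implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: |x| computed as ((x >> (W-1)) ^ x) - (x >> (W-1))
   with W = 32, using no conditional branch.  */
static int32_t
abs_nojump (int32_t x)
{
  /* -INT32_MIN is not representable, just as for abs ().  */
  assert (x != INT32_MIN);

  int32_t sign = x >> 31;       /* 0 if x >= 0, -1 (all ones) if x < 0.  */
  return (sign ^ x) - sign;     /* x itself, or ~x + 1 == -x.  */
}
```

Three straight-line operations replace a compare and a conditional branch. That trade is what the BRANCH_COST >= 2 threshold is meant to guard.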
Index: fold-const.c
===================================================================
*** fold-const.c	(revision 139737)
--- fold-const.c	(working copy)
*************** fold_cond_expr_with_comparison (tree typ
*** 5109,5115 ****
  
  
  #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
! #define LOGICAL_OP_NON_SHORT_CIRCUIT (BRANCH_COST >= 2)
  #endif
  
  /* EXP is some logical combination of boolean tests.  See if we can
--- 5109,5117 ----
  
  
  #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT
! #define LOGICAL_OP_NON_SHORT_CIRCUIT \
!   (BRANCH_COST (!cfun || optimize_function_for_speed_p (cfun), \
! 		false) >= 2)
  #endif
  
  /* EXP is some logical combination of boolean tests.  See if we can
*************** fold_truthop (enum tree_code code, tree 
*** 5357,5363 ****
       that can be merged.  Avoid doing this if the RHS is a floating-point
       comparison since those can trap.  */
  
!   if (BRANCH_COST >= 2
        && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
        && simple_operand_p (rl_arg)
        && simple_operand_p (rr_arg))
--- 5359,5366 ----
       that can be merged.  Avoid doing this if the RHS is a floating-point
       comparison since those can trap.  */
  
!   if (BRANCH_COST (!cfun || optimize_function_for_speed_p (cfun),
! 		   false) >= 2
        && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
        && simple_operand_p (rl_arg)
        && simple_operand_p (rr_arg))
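The BRANCH_COST (..., false) >= 2 test in fold_truthop lets `a && b` be evaluated without short-circuiting. This applies only when both operands are simple and cannot trap. The C functions below are hypothetical and only illustrate the two semantically equivalent forms. An optimizing compiler is free to emit the same code for both.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the default LOGICAL_OP_NON_SHORT_CIRCUIT test.  */
static bool
logical_op_non_short_circuit (int branch_cost)
{
  return branch_cost >= 2;
}

/* Short-circuit form: the second comparison sits behind a conditional
   branch.  */
static bool
both_nonzero_andif (int a, int b)
{
  return a != 0 && b != 0;
}

/* Non-short-circuit form: both comparisons are always evaluated and
   combined with a bitwise AND.  This is valid only because neither operand
   can trap or has side effects.  */
static bool
both_nonzero_and (int a, int b)
{
  return (a != 0) & (b != 0);
}
```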
Index: dojump.c
===================================================================
*** dojump.c	(revision 139737)
--- dojump.c	(working copy)
*************** do_jump (tree exp, rtx if_false_label, r
*** 510,516 ****
        /* High branch cost, expand as the bitwise AND of the conditions.
  	 Do the same if the RHS has side effects, because we're effectively
  	 turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR.  */
!       if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	goto normal;
  
      case TRUTH_ANDIF_EXPR:
--- 510,518 ----
        /* High branch cost, expand as the bitwise AND of the conditions.
  	 Do the same if the RHS has side effects, because we're effectively
  	 turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR.  */
!       if (BRANCH_COST (optimize_insn_for_speed_p (),
! 		       false) >= 4
! 	  || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	goto normal;
  
      case TRUTH_ANDIF_EXPR:
*************** do_jump (tree exp, rtx if_false_label, r
*** 531,537 ****
        /* High branch cost, expand as the bitwise OR of the conditions.
  	 Do the same if the RHS has side effects, because we're effectively
  	 turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR.  */
!       if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	goto normal;
  
      case TRUTH_ORIF_EXPR:
--- 533,540 ----
        /* High branch cost, expand as the bitwise OR of the conditions.
  	 Do the same if the RHS has side effects, because we're effectively
  	 turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR.  */
!       if (BRANCH_COST (optimize_insn_for_speed_p (), false) >= 4
! 	  || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1)))
  	goto normal;
  
      case TRUTH_ORIF_EXPR:
Index: ifcvt.c
===================================================================
*** ifcvt.c	(revision 139737)
--- ifcvt.c	(working copy)
***************
*** 67,73 ****
  #endif
  
  #ifndef MAX_CONDITIONAL_EXECUTE
! #define MAX_CONDITIONAL_EXECUTE   (BRANCH_COST + 1)
  #endif
  
  #define IFCVT_MULTIPLE_DUMPS 1
--- 67,75 ----
  #endif
  
  #ifndef MAX_CONDITIONAL_EXECUTE
! #define MAX_CONDITIONAL_EXECUTE \
!   (BRANCH_COST (optimize_function_for_speed_p (cfun), false) \
!    + 1)
  #endif
  
  #define IFCVT_MULTIPLE_DUMPS 1
*************** struct noce_if_info
*** 626,631 ****
--- 628,636 ----
       from TEST_BB.  For the noce transformations, we allow the symmetric
       form as well.  */
    bool then_else_reversed;
+ 
+   /* Estimated cost of the particular branch instruction.  */
+   int branch_cost;
  };
  
  static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
*************** noce_try_store_flag_constants (struct no
*** 963,982 ****
  	normalize = 0;
        else if (ifalse == 0 && exact_log2 (itrue) >= 0
  	       && (STORE_FLAG_VALUE == 1
! 		   || BRANCH_COST >= 2))
  	normalize = 1;
        else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
! 	       && (STORE_FLAG_VALUE == 1 || BRANCH_COST >= 2))
  	normalize = 1, reversep = 1;
        else if (itrue == -1
  	       && (STORE_FLAG_VALUE == -1
! 		   || BRANCH_COST >= 2))
  	normalize = -1;
        else if (ifalse == -1 && can_reverse
! 	       && (STORE_FLAG_VALUE == -1 || BRANCH_COST >= 2))
  	normalize = -1, reversep = 1;
!       else if ((BRANCH_COST >= 2 && STORE_FLAG_VALUE == -1)
! 	       || BRANCH_COST >= 3)
  	normalize = -1;
        else
  	return FALSE;
--- 968,987 ----
  	normalize = 0;
        else if (ifalse == 0 && exact_log2 (itrue) >= 0
  	       && (STORE_FLAG_VALUE == 1
! 		   || if_info->branch_cost >= 2))
  	normalize = 1;
        else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
! 	       && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
  	normalize = 1, reversep = 1;
        else if (itrue == -1
  	       && (STORE_FLAG_VALUE == -1
! 		   || if_info->branch_cost >= 2))
  	normalize = -1;
        else if (ifalse == -1 && can_reverse
! 	       && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
  	normalize = -1, reversep = 1;
!       else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1)
! 	       || if_info->branch_cost >= 3)
  	normalize = -1;
        else
  	return FALSE;
*************** noce_try_addcc (struct noce_if_info *if_
*** 1107,1113 ****
  
        /* If that fails, construct conditional increment or decrement using
  	 setcc.  */
!       if (BRANCH_COST >= 2
  	  && (XEXP (if_info->a, 1) == const1_rtx
  	      || XEXP (if_info->a, 1) == constm1_rtx))
          {
--- 1112,1118 ----
  
        /* If that fails, construct conditional increment or decrement using
  	 setcc.  */
!       if (if_info->branch_cost >= 2
  	  && (XEXP (if_info->a, 1) == const1_rtx
  	      || XEXP (if_info->a, 1) == constm1_rtx))
          {
*************** noce_try_store_flag_mask (struct noce_if
*** 1158,1164 ****
    int reversep;
  
    reversep = 0;
!   if ((BRANCH_COST >= 2
         || STORE_FLAG_VALUE == -1)
        && ((if_info->a == const0_rtx
  	   && rtx_equal_p (if_info->b, if_info->x))
--- 1163,1169 ----
    int reversep;
  
    reversep = 0;
!   if ((if_info->branch_cost >= 2
         || STORE_FLAG_VALUE == -1)
        && ((if_info->a == const0_rtx
  	   && rtx_equal_p (if_info->b, if_info->x))
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1317,1323 ****
    /* ??? FIXME: Magic number 5.  */
    if (cse_not_expected
        && MEM_P (a) && MEM_P (b)
!       && BRANCH_COST >= 5)
      {
        a = XEXP (a, 0);
        b = XEXP (b, 0);
--- 1322,1328 ----
    /* ??? FIXME: Magic number 5.  */
    if (cse_not_expected
        && MEM_P (a) && MEM_P (b)
!       && if_info->branch_cost >= 5)
      {
        a = XEXP (a, 0);
        b = XEXP (b, 0);
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1347,1353 ****
    if (insn_a)
      {
        insn_cost = insn_rtx_cost (PATTERN (insn_a));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST))
  	return FALSE;
      }
    else
--- 1352,1358 ----
    if (insn_a)
      {
        insn_cost = insn_rtx_cost (PATTERN (insn_a));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
  	return FALSE;
      }
    else
*************** noce_try_cmove_arith (struct noce_if_inf
*** 1356,1362 ****
    if (insn_b)
      {
        insn_cost += insn_rtx_cost (PATTERN (insn_b));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST))
          return FALSE;
      }
  
--- 1361,1367 ----
    if (insn_b)
      {
        insn_cost += insn_rtx_cost (PATTERN (insn_b));
!       if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost))
          return FALSE;
      }
  
*************** noce_find_if_block (basic_block test_bb,
*** 2831,2836 ****
--- 2836,2843 ----
    if_info.cond_earliest = cond_earliest;
    if_info.jump = jump;
    if_info.then_else_reversed = then_else_reversed;
+   if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
+ 				     predictable_edge_p (then_edge));
  
    /* Do the real work.  */
  
*************** find_if_case_1 (basic_block test_bb, edg
*** 3597,3603 ****
  	     test_bb->index, then_bb->index);
  
    /* THEN is small.  */
!   if (! cheap_bb_rtx_cost_p (then_bb, COSTS_N_INSNS (BRANCH_COST)))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
--- 3604,3612 ----
  	     test_bb->index, then_bb->index);
  
    /* THEN is small.  */
!   if (! cheap_bb_rtx_cost_p (then_bb,
! 	COSTS_N_INSNS (BRANCH_COST (optimize_bb_for_speed_p (then_edge->src),
! 				    predictable_edge_p (then_edge)))))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
*************** find_if_case_2 (basic_block test_bb, edg
*** 3711,3717 ****
  	     test_bb->index, else_bb->index);
  
    /* ELSE is small.  */
!   if (! cheap_bb_rtx_cost_p (else_bb, COSTS_N_INSNS (BRANCH_COST)))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
--- 3720,3728 ----
  	     test_bb->index, else_bb->index);
  
    /* ELSE is small.  */
!   if (! cheap_bb_rtx_cost_p (else_bb,
! 	COSTS_N_INSNS (BRANCH_COST (optimize_bb_for_speed_p (else_edge->src),
! 				    predictable_edge_p (else_edge)))))
      return FALSE;
  
    /* Registers set are dead, or are predicable.  */
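The ifcvt.c changes compute the branch cost once per IF block (if_info->branch_cost) and compare instruction costs against COSTS_N_INSNS of it. Below is a hypothetical stand-alone model of the budget check in noce_try_cmove_arith. It assumes COSTS_N_INSNS (N) expands to (N) * 4, as in GCC's rtl.h, and that a cost of 0 means "unknown".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match GCC's rtl.h definition.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical model: the insns hoisted out of the THEN (a) and ELSE (b)
   arms must together cost no more than the branch they replace.  A zero
   cost is treated as unknown and rejects the transformation.  */
static bool
cmove_arith_within_budget (bool has_a, int cost_a,
                           bool has_b, int cost_b, int branch_cost)
{
  assert (branch_cost >= 0);
  int budget = COSTS_N_INSNS (branch_cost);
  int total = 0;

  if (has_a)
    {
      total = cost_a;
      if (total == 0 || total > budget)
        return false;
    }
  if (has_b)
    {
      total += cost_b;
      if (total == 0 || total > budget)
        return false;
    }
  return true;
}
```

With the i386 definition in this patch, a hot, well-predicted branch gets cost 0. The budget is then 0, so any arm that needs an instruction keeps its branch.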
Index: expr.h
===================================================================
*** expr.h	(revision 139737)
--- expr.h	(working copy)
*************** along with GCC; see the file COPYING3.  
*** 36,42 ****
  
  /* The default branch cost is 1.  */
  #ifndef BRANCH_COST
! #define BRANCH_COST 1
  #endif
  
  /* This is the 4th arg to `expand_expr'.
--- 36,42 ----
  
  /* The default branch cost is 1.  */
  #ifndef BRANCH_COST
! #define BRANCH_COST(speed_p, predictable_p) 1
  #endif
  
  /* This is the 4th arg to `expand_expr'.
Index: predict.c
===================================================================
*** predict.c	(revision 139737)
--- predict.c	(working copy)
*************** optimize_insn_for_speed_p (void)
*** 245,250 ****
--- 245,267 ----
    return !optimize_insn_for_size_p ();
  }
  
+ /* Return true when edge E is likely to be well predictable by branch
+    predictor.  */
+ 
+ bool
+ predictable_edge_p (edge e)
+ {
+   if (profile_status == PROFILE_ABSENT)
+     return false;
+   if ((e->probability
+        <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100)
+       || (REG_BR_PROB_BASE - e->probability
+           <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100))
+     return true;
+   return false;
+ }
+ 
+ 
  /* Set RTL expansion for BB profile.  */
  
  void
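This is a hypothetical stand-alone version of predictable_edge_p that makes its threshold arithmetic concrete. It takes the probability, the parameter value, and profile availability as plain arguments. It assumes REG_BR_PROB_BASE is 10000, as in GCC. With the documented default of 10 percent, an edge counts as predictable when its probability is at most 1000 or at least 9000.

```c
#include <assert.h>
#include <stdbool.h>

/* Edge probabilities are scaled to this base, assumed to match GCC.  */
#define REG_BR_PROB_BASE 10000

/* Hypothetical model of predictable_edge_p.  PROBABILITY is in
   REG_BR_PROB_BASE units, THRESHOLD_PERCENT stands in for
   PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME), and HAVE_PROFILE
   replaces the profile_status != PROFILE_ABSENT test.  */
static bool
predictable_probability_p (int probability, int threshold_percent,
                           bool have_profile)
{
  int limit = threshold_percent * REG_BR_PROB_BASE / 100;

  assert (probability >= 0 && probability <= REG_BR_PROB_BASE);
  if (!have_profile)
    return false;
  /* Almost never taken, or almost always taken.  */
  return probability <= limit
         || REG_BR_PROB_BASE - probability <= limit;
}
```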
Index: expmed.c
===================================================================
*** expmed.c	(revision 139737)
--- expmed.c	(working copy)
*************** expand_smod_pow2 (enum machine_mode mode
*** 3492,3498 ****
    result = gen_reg_rtx (mode);
  
    /* Avoid conditional branches when they're expensive.  */
!   if (BRANCH_COST >= 2
        && optimize_insn_for_speed_p ())
      {
        rtx signmask = emit_store_flag (result, LT, op0, const0_rtx,
--- 3492,3498 ----
    result = gen_reg_rtx (mode);
  
    /* Avoid conditional branches when they're expensive.  */
!   if (BRANCH_COST (optimize_insn_for_speed_p (), false) >= 2
        && optimize_insn_for_speed_p ())
      {
        rtx signmask = emit_store_flag (result, LT, op0, const0_rtx,
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3592,3598 ****
    logd = floor_log2 (d);
    shift = build_int_cst (NULL_TREE, logd);
  
!   if (d == 2 && BRANCH_COST >= 1)
      {
        temp = gen_reg_rtx (mode);
        temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1);
--- 3592,3600 ----
    logd = floor_log2 (d);
    shift = build_int_cst (NULL_TREE, logd);
  
!   if (d == 2
!       && BRANCH_COST (optimize_insn_for_speed_p (),
! 		      false) >= 1)
      {
        temp = gen_reg_rtx (mode);
        temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1);
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3602,3608 ****
      }
  
  #ifdef HAVE_conditional_move
!   if (BRANCH_COST >= 2)
      {
        rtx temp2;
  
--- 3604,3611 ----
      }
  
  #ifdef HAVE_conditional_move
!   if (BRANCH_COST (optimize_insn_for_speed_p (), false)
!       >= 2)
      {
        rtx temp2;
  
*************** expand_sdiv_pow2 (enum machine_mode mode
*** 3631,3637 ****
      }
  #endif
  
!   if (BRANCH_COST >= 2)
      {
        int ushift = GET_MODE_BITSIZE (mode) - logd;
  
--- 3634,3641 ----
      }
  #endif
  
!   if (BRANCH_COST (optimize_insn_for_speed_p (),
! 		   false) >= 2)
      {
        int ushift = GET_MODE_BITSIZE (mode) - logd;
  
*************** emit_store_flag (rtx target, enum rtx_co
*** 5345,5351 ****
       comparison with zero.  Don't do any of these cases if branches are
       very cheap.  */
  
!   if (BRANCH_COST > 0
        && GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE)
        && op1 != const0_rtx)
      {
--- 5349,5356 ----
       comparison with zero.  Don't do any of these cases if branches are
       very cheap.  */
  
!   if (BRANCH_COST (optimize_insn_for_speed_p (),
! 		   false) > 0
        && GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE)
        && op1 != const0_rtx)
      {
*************** emit_store_flag (rtx target, enum rtx_co
*** 5368,5377 ****
       do LE and GT if branches are expensive since they are expensive on
       2-operand machines.  */
  
!   if (BRANCH_COST == 0
        || GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx
        || (code != EQ && code != NE
! 	  && (BRANCH_COST <= 1 || (code != LE && code != GT))))
      return 0;
  
    /* See what we need to return.  We can only return a 1, -1, or the
--- 5373,5384 ----
       do LE and GT if branches are expensive since they are expensive on
       2-operand machines.  */
  
!   if (BRANCH_COST (optimize_insn_for_speed_p (),
! 		   false) == 0
        || GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx
        || (code != EQ && code != NE
! 	  && (BRANCH_COST (optimize_insn_for_speed_p (),
! 			   false) <= 1 || (code != LE && code != GT))))
      return 0;
  
    /* See what we need to return.  We can only return a 1, -1, or the
*************** emit_store_flag (rtx target, enum rtx_co
*** 5467,5473 ****
  	 that "or", which is an extra insn, so we only handle EQ if branches
  	 are expensive.  */
  
!       if (tem == 0 && (code == NE || BRANCH_COST > 1))
  	{
  	  if (rtx_equal_p (subtarget, op0))
  	    subtarget = 0;
--- 5474,5483 ----
  	 that "or", which is an extra insn, so we only handle EQ if branches
  	 are expensive.  */
  
!       if (tem == 0
! 	  && (code == NE
! 	      || BRANCH_COST (optimize_insn_for_speed_p (),
! 		      	      false) > 1))
  	{
  	  if (rtx_equal_p (subtarget, op0))
  	    subtarget = 0;
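The expand_sdiv_pow2 paths gated on BRANCH_COST >= 2 avoid a conditional branch. They bias negative dividends before an arithmetic shift (ushift = W - logd in the code above). The C sketch below mirrors that sequence. The name sdiv_pow2 is hypothetical, and the sketch assumes 32-bit operands and GCC's arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of branch-free signed division by 2**LOGD,
   rounding toward zero as C division does.  */
static int32_t
sdiv_pow2 (int32_t x, int logd)
{
  assert (logd >= 1 && logd <= 31);

  /* All ones when x < 0, zero otherwise (the "store flag" mask).  */
  uint32_t signmask = (uint32_t) (x >> 31);
  /* d - 1 for negative x, 0 otherwise; the shift count is W - logd.  */
  int32_t bias = (int32_t) (signmask >> (32 - logd));
  /* Without the bias, the shift would round negative results down.  */
  return (x + bias) >> logd;
}
```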
Index: basic-block.h
===================================================================
*** basic-block.h	(revision 139737)
--- basic-block.h	(working copy)
*************** extern void guess_outgoing_edge_probabil
*** 848,853 ****
--- 848,854 ----
  extern void remove_predictions_associated_with_edge (edge);
  extern bool edge_probability_reliable_p (const_edge);
  extern bool br_prob_note_reliable_p (const_rtx);
+ extern bool predictable_edge_p (edge);
  
  /* In cfg.c  */
  extern void dump_regset (regset, FILE *);
Index: config/alpha/alpha.h
===================================================================
*** config/alpha/alpha.h	(revision 139737)
--- config/alpha/alpha.h	(working copy)
*************** extern int alpha_memory_latency;
*** 640,646 ****
  #define MEMORY_MOVE_COST(MODE,CLASS,IN)  (2*alpha_memory_latency)
  
  /* Provide the cost of a branch.  Exact meaning under development.  */
! #define BRANCH_COST 5
  
  /* Stack layout; function entry, exit and calling.  */
  
--- 640,646 ----
  #define MEMORY_MOVE_COST(MODE,CLASS,IN)  (2*alpha_memory_latency)
  
  /* Provide the cost of a branch.  Exact meaning under development.  */
! #define BRANCH_COST(speed_p, predictable_p) 5
  
  /* Stack layout; function entry, exit and calling.  */
  
Index: config/frv/frv.h
===================================================================
*** config/frv/frv.h	(revision 139737)
--- config/frv/frv.h	(working copy)
*************** do {							\
*** 2193,2199 ****
  
  /* A C expression for the cost of a branch instruction.  A value of 1 is the
     default; other values are interpreted relative to that.  */
! #define BRANCH_COST frv_branch_cost_int
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
--- 2193,2199 ----
  
  /* A C expression for the cost of a branch instruction.  A value of 1 is the
     default; other values are interpreted relative to that.  */
! #define BRANCH_COST(speed_p, predictable_p) frv_branch_cost_int
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
Index: config/s390/s390.h
===================================================================
*** config/s390/s390.h	(revision 139737)
--- config/s390/s390.h	(working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 828,834 ****
  
  /* A C expression for the cost of a branch instruction.  A value of 1
     is the default; other values are interpreted relative to that.  */
! #define BRANCH_COST 1
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 1
--- 828,834 ----
  
  /* A C expression for the cost of a branch instruction.  A value of 1
     is the default; other values are interpreted relative to that.  */
! #define BRANCH_COST(speed_p, predictable_p) 1
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS 1
Index: config/spu/spu.h
===================================================================
*** config/spu/spu.h	(revision 139737)
--- config/spu/spu.h	(working copy)
*************** targetm.resolve_overloaded_builtin = spu
*** 434,440 ****
  
  /* Costs */
  
! #define BRANCH_COST spu_branch_cost
  
  #define SLOW_BYTE_ACCESS 0
  
--- 434,440 ----
  
  /* Costs */
  
! #define BRANCH_COST(speed_p, predictable_p) spu_branch_cost
  
  #define SLOW_BYTE_ACCESS 0
  
Index: config/sparc/sparc.h
===================================================================
*** config/sparc/sparc.h	(revision 139737)
--- config/sparc/sparc.h	(working copy)
*************** do {                                    
*** 2196,2202 ****
     On Niagara-2, a not-taken branch costs 1 cycle whereas a taken
     branch costs 6 cycles.  */
  
! #define BRANCH_COST \
  	((sparc_cpu == PROCESSOR_V9 \
  	  || sparc_cpu == PROCESSOR_ULTRASPARC) \
  	 ? 7 \
--- 2196,2202 ----
     On Niagara-2, a not-taken branch costs 1 cycle whereas a taken
     branch costs 6 cycles.  */
  
! #define BRANCH_COST(speed_p, predictable_p) \
  	((sparc_cpu == PROCESSOR_V9 \
  	  || sparc_cpu == PROCESSOR_ULTRASPARC) \
  	 ? 7 \
Index: config/m32r/m32r.h
===================================================================
*** config/m32r/m32r.h	(revision 139737)
--- config/m32r/m32r.h	(working copy)
*************** L2:     .word STATIC
*** 1224,1230 ****
  /* A value of 2 here causes GCC to avoid using branches in comparisons like
     while (a < N && a).  Branches aren't that expensive on the M32R so
     we define this as 1.  Defining it as 2 had a heavy hit in fp-bit.c.  */
! #define BRANCH_COST ((TARGET_BRANCH_COST) ? 2 : 1)
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
--- 1224,1230 ----
  /* A value of 2 here causes GCC to avoid using branches in comparisons like
     while (a < N && a).  Branches aren't that expensive on the M32R so
     we define this as 1.  Defining it as 2 had a heavy hit in fp-bit.c.  */
! #define BRANCH_COST(speed_p, predictable_p) ((TARGET_BRANCH_COST) ? 2 : 1)
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
Index: config/i386/i386.h
===================================================================
*** config/i386/i386.h	(revision 139737)
--- config/i386/i386.h	(working copy)
*************** do {							\
*** 1975,1981 ****
  /* A C expression for the cost of a branch instruction.  A value of 1
     is the default; other values are interpreted relative to that.  */
  
! #define BRANCH_COST ix86_branch_cost
  
  /* Define this macro as a C expression which is nonzero if accessing
     less than a word of memory (i.e. a `char' or a `short') is no
--- 1975,1982 ----
  /* A C expression for the cost of a branch instruction.  A value of 1
     is the default; other values are interpreted relative to that.  */
  
! #define BRANCH_COST(speed_p, predictable_p) \
!   (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)
  
  /* Define this macro as a C expression which is nonzero if accessing
     less than a word of memory (i.e. a `char' or a `short') is no
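The new i386 definition ignores the tuned cost when a branch is not optimized for speed. It treats well-predictable hot branches as free. Below it is written out as a hypothetical C function, where the parameter tuned_cost stands in for the ix86_branch_cost variable used by the macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical function form of
   (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost).  */
static int
i386_branch_cost_model (bool speed_p, bool predictable_p, int tuned_cost)
{
  if (!speed_p)
    return 2;              /* Optimizing for size: fixed, tuning ignored.  */
  if (predictable_p)
    return 0;              /* Hot and well predicted: treated as free.  */
  return tuned_cost;       /* Hot and hard to predict: tuned cost.  */
}
```

A block optimized for size therefore sees cost 2 whatever CPU is tuned for. That value still satisfies the BRANCH_COST (...) >= 2 tests used elsewhere in this patch.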
Index: config/i386/i386.c
===================================================================
*** config/i386/i386.c	(revision 139737)
--- config/i386/i386.c	(working copy)
*************** ix86_expand_int_movcc (rtx operands[])
*** 14636,14642 ****
         */
  
        if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL))
! 	  && BRANCH_COST >= 2)
  	{
  	  if (cf == 0)
  	    {
--- 14636,14643 ----
         */
  
        if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL))
! 	  && BRANCH_COST (optimize_insn_for_speed_p (),
! 		  	  false) >= 2)
  	{
  	  if (cf == 0)
  	    {
*************** ix86_expand_int_movcc (rtx operands[])
*** 14721,14727 ****
        optab op;
        rtx var, orig_out, out, tmp;
  
!       if (BRANCH_COST <= 2)
  	return 0; /* FAIL */
  
        /* If one of the two operands is an interesting constant, load a
--- 14722,14728 ----
        optab op;
        rtx var, orig_out, out, tmp;
  
!       if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
  	return 0; /* FAIL */
  
        /* If one of the two operands is an interesting constant, load a
Index: config/sh/sh.h
===================================================================
*** config/sh/sh.h	(revision 139737)
--- config/sh/sh.h	(working copy)
*************** struct sh_args {
*** 2847,2853 ****
     The SH1 does not have delay slots, hence we get a pipeline stall
     at every branch.  The SH4 is superscalar, so the single delay slot
     is not sufficient to keep both pipelines filled.  */
! #define BRANCH_COST (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
  
  /* Assembler output control.  */
  
--- 2847,2854 ----
     The SH1 does not have delay slots, hence we get a pipeline stall
     at every branch.  The SH4 is superscalar, so the single delay slot
     is not sufficient to keep both pipelines filled.  */
! #define BRANCH_COST(speed_p, predictable_p) \
! 	(TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
  
  /* Assembler output control.  */
  
Index: config/pdp11/pdp11.h
===================================================================
*** config/pdp11/pdp11.h	(revision 139737)
--- config/pdp11/pdp11.h	(working copy)
*************** JMP	FUNCTION	0x0058  0x0000 <- FUNCTION
*** 1057,1063 ****
  /* there is no point in avoiding branches on a pdp, 
     since branches are really cheap - I just want to find out
     how much difference the BRANCH_COST macro makes in code */
! #define BRANCH_COST (TARGET_BRANCH_CHEAP ? 0 : 1)
  
  
  #define COMPARE_FLAG_MODE HImode
--- 1057,1063 ----
  /* there is no point in avoiding branches on a pdp, 
     since branches are really cheap - I just want to find out
     how much difference the BRANCH_COST macro makes in code */
! #define BRANCH_COST(speed_p, predictable_p) (TARGET_BRANCH_CHEAP ? 0 : 1)
  
  
  #define COMPARE_FLAG_MODE HImode
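
The ports converted in this patch accept the new speed_p and predictable_p arguments but ignore them. Below is a hypothetical target definition showing what the arguments make possible. The macro name, the costs 1 and 3, and the helper are invented for the example and do not come from any real port. The helper mirrors the `BRANCH_COST (...) <= 2` test in the conditional-move expander hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target, not a real port: an unpredictable branch is
   assumed to cost 3 and a predictable one 1.  When optimizing for
   size, every branch is treated as cheap (1), because branchless
   sequences are usually longer.  */
#define HYPOTHETICAL_BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 1 : (predictable_p) ? 1 : 3)

/* Compile-time sanity check: optimizing for size never makes a
   branch look more expensive than optimizing for speed does.  */
static_assert (HYPOTHETICAL_BRANCH_COST (false, false)
               <= HYPOTHETICAL_BRANCH_COST (true, false),
               "size cost must not exceed speed cost");

/* Mirrors the expander test in the patch: try the branchless
   sequence only when a branch costs more than 2.  */
static bool
try_branchless_p (bool speed_p, bool predictable_p)
{
  return HYPOTHETICAL_BRANCH_COST (speed_p, predictable_p) > 2;
}
```

With this definition, only unpredictable branches in code optimized for speed qualify. Cold or -Os code and well-predicted branches keep the branch.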
Index: config/avr/avr.h
===================================================================
*** config/avr/avr.h	(revision 139737)
--- config/avr/avr.h	(working copy)
*************** do {									    \
*** 511,517 ****
  					 (MODE)==SImode ? 8 :	\
  					 (MODE)==SFmode ? 8 : 16)
  
! #define BRANCH_COST 0
  
  #define SLOW_BYTE_ACCESS 0
  
--- 511,517 ----
  					 (MODE)==SImode ? 8 :	\
  					 (MODE)==SFmode ? 8 : 16)
  
! #define BRANCH_COST(speed_p, predictable_p) 0
  
  #define SLOW_BYTE_ACCESS 0
  
Index: config/crx/crx.h
===================================================================
*** config/crx/crx.h	(revision 139737)
--- config/crx/crx.h	(working copy)
*************** struct cumulative_args
*** 420,426 ****
  /* Moving to processor register flushes pipeline - thus asymmetric */
  #define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
  /* Assume best case (branch predicted) */
! #define BRANCH_COST 2
  
  #define SLOW_BYTE_ACCESS  1
  
--- 420,426 ----
  /* Moving to processor register flushes pipeline - thus asymmetric */
  #define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
  /* Assume best case (branch predicted) */
! #define BRANCH_COST(speed_p, predictable_p) 2
  
  #define SLOW_BYTE_ACCESS  1
  
Index: config/xtensa/xtensa.h
===================================================================
*** config/xtensa/xtensa.h	(revision 139737)
--- config/xtensa/xtensa.h	(working copy)
*************** typedef struct xtensa_args
*** 882,888 ****
  
  #define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
  
! #define BRANCH_COST 3
  
  /* How to refer to registers in assembler output.
     This sequence is indexed by compiler's hard-register-number (see above).  */
--- 882,888 ----
  
  #define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
  
! #define BRANCH_COST(speed_p, predictable_p) 3
  
  /* How to refer to registers in assembler output.
     This sequence is indexed by compiler's hard-register-number (see above).  */
Index: config/stormy16/stormy16.h
===================================================================
*** config/stormy16/stormy16.h	(revision 139737)
--- config/stormy16/stormy16.h	(working copy)
*************** do {							\
*** 587,593 ****
  
  #define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
  
! #define BRANCH_COST 5
  
  #define SLOW_BYTE_ACCESS 0
  
--- 587,593 ----
  
  #define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
  
! #define BRANCH_COST(speed_p, predictable_p) 5
  
  #define SLOW_BYTE_ACCESS 0
  
Index: config/m68hc11/m68hc11.h
===================================================================
*** config/m68hc11/m68hc11.h	(revision 139737)
--- config/m68hc11/m68hc11.h	(working copy)
*************** extern unsigned char m68hc11_reg_valid_f
*** 1266,1272 ****
  
     Pretend branches are cheap because GCC generates sub-optimal code
     for the default value.  */
! #define BRANCH_COST 0
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS	0
--- 1266,1272 ----
  
     Pretend branches are cheap because GCC generates sub-optimal code
     for the default value.  */
! #define BRANCH_COST(speed_p, predictable_p) 0
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS	0
Index: config/iq2000/iq2000.h
===================================================================
*** config/iq2000/iq2000.h	(revision 139737)
--- config/iq2000/iq2000.h	(working copy)
*************** typedef struct iq2000_args
*** 624,630 ****
  #define MEMORY_MOVE_COST(MODE,CLASS,TO_P)	\
    (TO_P ? 2 : 16)
  
! #define BRANCH_COST 2
  
  #define SLOW_BYTE_ACCESS 1
  
--- 624,630 ----
  #define MEMORY_MOVE_COST(MODE,CLASS,TO_P)	\
    (TO_P ? 2 : 16)
  
! #define BRANCH_COST(speed_p, predictable_p) 2
  
  #define SLOW_BYTE_ACCESS 1
  
Index: config/ia64/ia64.h
===================================================================
*** config/ia64/ia64.h	(revision 139737)
--- config/ia64/ia64.h	(working copy)
*************** do {									\
*** 1384,1390 ****
     many additional insn groups we run into, vs how good the dynamic
     branch predictor is.  */
  
! #define BRANCH_COST 6
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
--- 1384,1390 ----
     many additional insn groups we run into, vs how good the dynamic
     branch predictor is.  */
  
! #define BRANCH_COST(speed_p, predictable_p) 6
  
  /* Define this macro as a C expression which is nonzero if accessing less than
     a word of memory (i.e. a `char' or a `short') is no faster than accessing a
Index: config/rs6000/rs6000.h
===================================================================
*** config/rs6000/rs6000.h	(revision 139737)
--- config/rs6000/rs6000.h	(working copy)
*************** extern enum rs6000_nop_insertion rs6000_
*** 967,973 ****
     Set this to 3 on the RS/6000 since that is roughly the average cost of an
     unscheduled conditional branch.  */
  
! #define BRANCH_COST 3
  
  /* Override BRANCH_COST heuristic which empirically produces worse
     performance for removing short circuiting from the logical ops.  */
--- 967,973 ----
     Set this to 3 on the RS/6000 since that is roughly the average cost of an
     unscheduled conditional branch.  */
  
! #define BRANCH_COST(speed_p, predictable_p) 3
  
  /* Override BRANCH_COST heuristic which empirically produces worse
     performance for removing short circuiting from the logical ops.  */
Index: config/arc/arc.h
===================================================================
*** config/arc/arc.h	(revision 139737)
--- config/arc/arc.h	(working copy)
*************** arc_select_cc_mode (OP, X, Y)
*** 824,830 ****
  /* The cost of a branch insn.  */
  /* ??? What's the right value here?  Branches are certainly more
     expensive than reg->reg moves.  */
! #define BRANCH_COST 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
--- 824,830 ----
  /* The cost of a branch insn.  */
  /* ??? What's the right value here?  Branches are certainly more
     expensive than reg->reg moves.  */
! #define BRANCH_COST(speed_p, predictable_p) 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.
     For RISC chips, it means that access to memory by bytes is no
Index: config/score/score.h
===================================================================
*** config/score/score.h	(revision 139737)
--- config/score/score.h	(working copy)
*************** typedef struct score_args
*** 793,799 ****
    (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
  
  /* Try to generate sequences that don't involve branches.  */
! #define BRANCH_COST                     2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS                1
--- 793,799 ----
    (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
  
  /* Try to generate sequences that don't involve branches.  */
! #define BRANCH_COST(speed_p, predictable_p) 2
  
  /* Nonzero if access to memory by bytes is slow and undesirable.  */
  #define SLOW_BYTE_ACCESS                1
Index: config/arm/arm.h
===================================================================
*** config/arm/arm.h	(revision 139737)
--- config/arm/arm.h	(working copy)
*************** do {							\
*** 2297,2303 ****
  
  /* Try to generate sequences that don't involve branches, we can then use
     conditional instructions */
! #define BRANCH_COST \
    (TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
  
  /* Position Independent Code.  */
--- 2297,2303 ----
  
  /* Try to generate sequences that don't involve branches, we can then use
     conditional instructions */
! #define BRANCH_COST(speed_p, predictable_p) \
    (TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
  
  /* Position Independent Code.  */
Index: config/pa/pa.h
===================================================================
*** config/pa/pa.h	(revision 139737)
--- config/pa/pa.h	(working copy)
*************** do { 									\
*** 1570,1576 ****
    : 2)
  
  /* Adjust the cost of branches.  */
! #define BRANCH_COST (pa_cpu == PROCESSOR_8000 ? 2 : 1)
  
  /* Handling the special cases is going to get too complicated for a macro,
     just call `pa_adjust_insn_length' to do the real work.  */
--- 1570,1576 ----
    : 2)
  
  /* Adjust the cost of branches.  */
! #define BRANCH_COST(speed_p, predictable_p) (pa_cpu == PROCESSOR_8000 ? 2 : 1)
  
  /* Handling the special cases is going to get too complicated for a macro,
     just call `pa_adjust_insn_length' to do the real work.  */
Index: config/mips/mips.h
===================================================================
*** config/mips/mips.h	(revision 139737)
--- config/mips/mips.h	(working copy)
*************** typedef struct mips_args {
*** 2551,2557 ****
  /* A C expression for the cost of a branch instruction.  A value of
     1 is the default; other values are interpreted relative to that.  */
  
! #define BRANCH_COST mips_branch_cost
  #define LOGICAL_OP_NON_SHORT_CIRCUIT 0
  
  /* If defined, modifies the length assigned to instruction INSN as a
--- 2551,2557 ----
  /* A C expression for the cost of a branch instruction.  A value of
     1 is the default; other values are interpreted relative to that.  */
  
! #define BRANCH_COST(speed_p, predictable_p) mips_branch_cost
  #define LOGICAL_OP_NON_SHORT_CIRCUIT 0
  
  /* If defined, modifies the length assigned to instruction INSN as a
Index: config/vax/vax.h
===================================================================
*** config/vax/vax.h	(revision 139737)
--- config/vax/vax.h	(working copy)
*************** enum reg_class { NO_REGS, ALL_REGS, LIM_
*** 648,654 ****
     Branches are extremely cheap on the VAX while the shift insns often
     used to replace branches can be expensive.  */
  
! #define BRANCH_COST 0
  
  /* Tell final.c how to eliminate redundant test instructions.  */
  
--- 648,654 ----
     Branches are extremely cheap on the VAX while the shift insns often
     used to replace branches can be expensive.  */
  
! #define BRANCH_COST(speed_p, predictable_p) 0
  
  /* Tell final.c how to eliminate redundant test instructions.  */
  
Index: config/h8300/h8300.h
===================================================================
*** config/h8300/h8300.h	(revision 139737)
--- config/h8300/h8300.h	(working copy)
*************** struct cum_arg
*** 1004,1010 ****
  #define DELAY_SLOT_LENGTH(JUMP) \
    (NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2)
  
! #define BRANCH_COST 0
  
  /* Tell final.c how to eliminate redundant test instructions.  */
  
--- 1004,1010 ----
  #define DELAY_SLOT_LENGTH(JUMP) \
    (NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2)
  
! #define BRANCH_COST(speed_p, predictable_p) 0
  
  /* Tell final.c how to eliminate redundant test instructions.  */
  
Index: params.def
===================================================================
*** params.def	(revision 139737)
--- params.def	(working copy)
*************** DEFPARAM (PARAM_STRUCT_REORG_COLD_STRUCT
*** 78,83 ****
--- 78,90 ----
  	  "The threshold ratio between current and hottest structure counts",
  	  10, 0, 100)
  
+ /* When a branch is predicted to be taken with probability lower than this
+    threshold (in percent), it is considered well predictable.  */
+ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
+ 	  "predictable-branch-outcome",
+ 	  "Maximal estimated outcome of a branch considered predictable",
+ 	  2, 0, 50)
+ 
  /* The single function inlining limit. This is the maximum size
     of a function counted in internal gcc instructions (not in
     real machine instructions) that is eligible for inlining
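
The following is a minimal sketch of how the predictable-branch-outcome threshold might be turned into the predictable_p argument of BRANCH_COST. The helper name is hypothetical, and probabilities are assumed to be whole percents. The sketch also treats a branch taken with probability above 100 minus the threshold as equally predictable. That is an assumption; the params.def comment above mentions only the low end.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether a branch taken TAKEN_PERCENT
   of the time is well predictable under THRESHOLD, the value of the
   predictable-branch-outcome parameter.  Assumes the check is
   symmetric, so a branch almost always taken is as predictable as
   one almost never taken.  */
static bool
hypothetical_predictable_branch_p (int taken_percent, int threshold)
{
  /* params.def limits the parameter to the range 0..50.  */
  assert (threshold >= 0 && threshold <= 50);
  assert (taken_percent >= 0 && taken_percent <= 100);

  /* "Lower than this threshold", per the params.def comment.  */
  return taken_percent < threshold
         || taken_percent > 100 - threshold;
}
```

With the default of 2, a branch taken 1% or 99% of the time counts as predictable. A branch taken 50% of the time does not.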