From: "Richard Guenther"
To: "Jan Hubicka"
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Patch ping...
Date: Sat, 30 Aug 2008 19:06:00 -0000
Message-ID: <84fc9c000808290552w34f7548ft5b89b659aa5eb6ce@mail.gmail.com>
In-Reply-To: <20080828203249.GH15545@atrey.karlin.mff.cuni.cz>
References: <20080405162606.GA22594@atrey.karlin.mff.cuni.cz> <84fc9c000804050953o429fde26jb3938827ff9dc5a@mail.gmail.com> <20080828203249.GH15545@atrey.karlin.mff.cuni.cz>
X-SW-Source: 2008-08/txt/msg02377.txt.bz2

On Thu, Aug 28, 2008 at 10:32 PM, Jan Hubicka wrote:
>> On Sat, Apr 5, 2008 at 6:26 PM, Jan Hubicka wrote:
>> > Hi,
>> > I would like to ping the BRANCH_COST patch
>> > http://gcc.gnu.org/ml/gcc/2008-03/msg00137.html
>> >
>> > I hope to proceed with updating GCC to optimize cold blocks in the
>> > same way as -Os, and explicitly marked hot functions in -Os code for
>> > speed.  For this I need to populate the RTL cost interfaces with the
>> > profile info and teach expansion about it. 
>> > This is taking quite some years now; I realize it might not be clear
>> > what I am precisely shooting for, so I will also add a wiki page.
>>
>> I think the patch makes sense (BRANCH_COST is special anyway compared
>> to other insn costs), but I'd like to see the bigger picture as well
>> here.  In particular, BRANCH_COST (hot, predictable): why isn't that
>> simply BRANCH_COST (optimize_size_p, predictable), matching what I
>> would expect for the other cost interface
>> (insn_cost (optimize_size_p, rtx))?
>
> Hi,
> with the optimize_*_for_speed_p predicates, this patch becomes cleaner
> now.  I would also like to update the other costs in a similar way so
> that we can avoid the current practice of switching the optimize_size
> global variable.
>
> Bootstrapped/regtested i686-linux, OK?

It looks ok, but I think that PARAM_PREDICTABLE_BRANCH_OUTCOME should
be a target macro and not a param.

Ok with that change, but please wait 24h to let others comment.

Thanks,
Richard.

> 	* optabs.c (expand_abs_nojump): Update BRANCH_COST call.
> 	* fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT, fold_truthop):
> 	Likewise.
> 	* dojump.c (do_jump): Likewise.
> 	* ifcvt.c (MAX_CONDITIONAL_EXECUTE): Likewise.
> 	(struct noce_if_info): Add branch_cost.
> 	(noce_try_store_flag_constants, noce_try_addcc,
> 	noce_try_store_flag_mask, noce_try_cmove_arith,
> 	noce_find_if_block, find_if_case_1, find_if_case_2): Use computed
> 	branch cost.
> 	* expr.h (BRANCH_COST): Update default.
> 	* predict.c (predictable_edge_p): New function.
> 	* expmed.c (expand_smod_pow2, expand_sdiv_pow2, emit_store_flag):
> 	Update BRANCH_COST call.
> 	* basic-block.h (predictable_edge_p): Declare.
> 	* config/alpha/alpha.h (BRANCH_COST): Update.
> 	* config/frv/frv.h (BRANCH_COST): Update.
> 	* config/s390/s390.h (BRANCH_COST): Update.
> 	* config/spu/spu.h (BRANCH_COST): Update.
> 	* config/sparc/sparc.h (BRANCH_COST): Update.
> 	* config/m32r/m32r.h (BRANCH_COST): Update.
> 	* config/i386/i386.h (BRANCH_COST): Update. 
> * config/i386/i386.c (ix86_expand_int_movcc): Update use of BRANCH_COST. > * config/sh/sh.h (BRANCH_COST): Update. > * config/pdp11/pdp11.h (BRANCH_COST): Update. > * config/avr/avr.h (BRANCH_COST): Update. > * config/crx/crx.h (BRANCH_COST): Update. > * config/xtensa/xtensa.h (BRANCH_COST): Update. > * config/stormy16/stormy16.h (BRANCH_COST): Update. > * config/m68hc11/m68hc11.h (BRANCH_COST): Update. > * config/iq2000/iq2000.h (BRANCH_COST): Update. > * config/ia64/ia64.h (BRANCH_COST): Update. > * config/rs6000/rs6000.h (BRANCH_COST): Update. > * config/arc/arc.h (BRANCH_COST): Update. > * config/score/score.h (BRANCH_COST): Update. > * config/arm/arm.h (BRANCH_COST): Update. > * config/pa/pa.h (BRANCH_COST): Update. > * config/mips/mips.h (BRANCH_COST): Update. > * config/vax/vax.h (BRANCH_COST): Update. > * config/h8300/h8300.h (BRANCH_COST): Update. > * params.def (PARAM_PREDICTABLE_BRANCH_OUTCOME): New. > * doc/invoke.texi (predictable-branch-cost-outcome): Document. > * doc/tm.texi (BRANCH_COST): Update. > Index: doc/tm.texi > =================================================================== > *** doc/tm.texi (revision 139737) > --- doc/tm.texi (working copy) > *************** value to the result of that function. T > *** 5874,5882 **** > are the same as to this macro. > @end defmac > > ! @defmac BRANCH_COST > ! A C expression for the cost of a branch instruction. A value of 1 is > ! the default; other values are interpreted relative to that. > @end defmac > > Here are additional macros which do not specify precise relative costs, > --- 5874,5887 ---- > are the same as to this macro. > @end defmac > > ! @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p}) > ! A C expression for the cost of a branch instruction. A value of 1 is the > ! default; other values are interpreted relative to that. Parameter @var{speed_p} > ! is true when the branch in question should be optimized for speed. When > ! 
it is false, @code{BRANCH_COST} should return a value optimal for code size > ! rather than for performance. @var{predictable_p} is true for well > ! predictable branches. On many architectures the @code{BRANCH_COST} can then > ! be reduced. > @end defmac > > Here are additional macros which do not specify precise relative costs, > Index: doc/invoke.texi > =================================================================== > *** doc/invoke.texi (revision 139737) > --- doc/invoke.texi (working copy) > *************** to the hottest structure frequency in th > *** 6905,6910 **** > --- 6905,6914 ---- > parameter, then structure reorganization is not applied to this structure. > The default is 10. > > + @item predictable-branch-cost-outcome > + When a branch is predicted to be taken with a probability lower than this > + threshold (in percent), it is considered well predictable. The default is 10. > + > @item max-crossjump-edges > The maximum number of incoming edges to consider for crossjumping. > The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in > Index: optabs.c > =================================================================== > *** optabs.c (revision 139737) > --- optabs.c (working copy) > *************** expand_abs_nojump (enum machine_mode mod > *** 3443,3449 **** > value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)), > where W is the width of MODE. */ > > ! if (GET_MODE_CLASS (mode) == MODE_INT && BRANCH_COST >= 2) > { > rtx extended = expand_shift (RSHIFT_EXPR, mode, op0, > size_int (GET_MODE_BITSIZE (mode) - 1), > --- 3443,3451 ---- > value of X as (((signed) x >> (W-1)) ^ x) - ((signed) x >> (W-1)), > where W is the width of MODE. */ > > ! if (GET_MODE_CLASS (mode) == MODE_INT > ! && BRANCH_COST (optimize_insn_for_speed_p (), > ! 
false) >= 2) > { > rtx extended = expand_shift (RSHIFT_EXPR, mode, op0, > size_int (GET_MODE_BITSIZE (mode) - 1), > Index: fold-const.c > =================================================================== > *** fold-const.c (revision 139737) > --- fold-const.c (working copy) > *************** fold_cond_expr_with_comparison (tree typ > *** 5109,5115 **** > > > #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT > ! #define LOGICAL_OP_NON_SHORT_CIRCUIT (BRANCH_COST >= 2) > #endif > > /* EXP is some logical combination of boolean tests. See if we can > --- 5109,5117 ---- > > > #ifndef LOGICAL_OP_NON_SHORT_CIRCUIT > ! #define LOGICAL_OP_NON_SHORT_CIRCUIT \ > ! (BRANCH_COST (!cfun || optimize_function_for_speed_p (cfun), \ > ! false) >= 2) > #endif > > /* EXP is some logical combination of boolean tests. See if we can > *************** fold_truthop (enum tree_code code, tree > *** 5357,5363 **** > that can be merged. Avoid doing this if the RHS is a floating-point > comparison since those can trap. */ > > ! if (BRANCH_COST >= 2 > && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg)) > && simple_operand_p (rl_arg) > && simple_operand_p (rr_arg)) > --- 5359,5366 ---- > that can be merged. Avoid doing this if the RHS is a floating-point > comparison since those can trap. */ > > ! if (BRANCH_COST (!cfun || optimize_function_for_speed_p (cfun), > ! false) >= 2 > && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg)) > && simple_operand_p (rl_arg) > && simple_operand_p (rr_arg)) > Index: dojump.c > =================================================================== > *** dojump.c (revision 139737) > --- dojump.c (working copy) > *************** do_jump (tree exp, rtx if_false_label, r > *** 510,516 **** > /* High branch cost, expand as the bitwise AND of the conditions. > Do the same if the RHS has side effects, because we're effectively > turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR. */ > ! 
if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1))) > goto normal; > > case TRUTH_ANDIF_EXPR: > --- 510,518 ---- > /* High branch cost, expand as the bitwise AND of the conditions. > Do the same if the RHS has side effects, because we're effectively > turning a TRUTH_AND_EXPR into a TRUTH_ANDIF_EXPR. */ > ! if (BRANCH_COST (optimize_insn_for_speed_p (), > ! false) >= 4 > ! || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1))) > goto normal; > > case TRUTH_ANDIF_EXPR: > *************** do_jump (tree exp, rtx if_false_label, r > *** 531,537 **** > /* High branch cost, expand as the bitwise OR of the conditions. > Do the same if the RHS has side effects, because we're effectively > turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR. */ > ! if (BRANCH_COST >= 4 || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1))) > goto normal; > > case TRUTH_ORIF_EXPR: > --- 533,540 ---- > /* High branch cost, expand as the bitwise OR of the conditions. > Do the same if the RHS has side effects, because we're effectively > turning a TRUTH_OR_EXPR into a TRUTH_ORIF_EXPR. */ > ! if (BRANCH_COST (optimize_insn_for_speed_p (), false) >= 4 > ! || TREE_SIDE_EFFECTS (TREE_OPERAND (exp, 1))) > goto normal; > > case TRUTH_ORIF_EXPR: > Index: ifcvt.c > =================================================================== > *** ifcvt.c (revision 139737) > --- ifcvt.c (working copy) > *************** > *** 67,73 **** > #endif > > #ifndef MAX_CONDITIONAL_EXECUTE > ! #define MAX_CONDITIONAL_EXECUTE (BRANCH_COST + 1) > #endif > > #define IFCVT_MULTIPLE_DUMPS 1 > --- 67,75 ---- > #endif > > #ifndef MAX_CONDITIONAL_EXECUTE > ! #define MAX_CONDITIONAL_EXECUTE \ > ! (BRANCH_COST (optimize_function_for_speed_p (cfun), false) \ > ! + 1) > #endif > > #define IFCVT_MULTIPLE_DUMPS 1 > *************** struct noce_if_info > *** 626,631 **** > --- 628,636 ---- > from TEST_BB. For the noce transformations, we allow the symmetric > form as well. 
*/ > bool then_else_reversed; > + > + /* Estimated cost of the particular branch instruction. */ > + int branch_cost; > }; > > static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int); > *************** noce_try_store_flag_constants (struct no > *** 963,982 **** > normalize = 0; > else if (ifalse == 0 && exact_log2 (itrue) >= 0 > && (STORE_FLAG_VALUE == 1 > ! || BRANCH_COST >= 2)) > normalize = 1; > else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse > ! && (STORE_FLAG_VALUE == 1 || BRANCH_COST >= 2)) > normalize = 1, reversep = 1; > else if (itrue == -1 > && (STORE_FLAG_VALUE == -1 > ! || BRANCH_COST >= 2)) > normalize = -1; > else if (ifalse == -1 && can_reverse > ! && (STORE_FLAG_VALUE == -1 || BRANCH_COST >= 2)) > normalize = -1, reversep = 1; > ! else if ((BRANCH_COST >= 2 && STORE_FLAG_VALUE == -1) > ! || BRANCH_COST >= 3) > normalize = -1; > else > return FALSE; > --- 968,987 ---- > normalize = 0; > else if (ifalse == 0 && exact_log2 (itrue) >= 0 > && (STORE_FLAG_VALUE == 1 > ! || if_info->branch_cost >= 2)) > normalize = 1; > else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse > ! && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2)) > normalize = 1, reversep = 1; > else if (itrue == -1 > && (STORE_FLAG_VALUE == -1 > ! || if_info->branch_cost >= 2)) > normalize = -1; > else if (ifalse == -1 && can_reverse > ! && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2)) > normalize = -1, reversep = 1; > ! else if ((if_info->branch_cost >= 2 && STORE_FLAG_VALUE == -1) > ! || if_info->branch_cost >= 3) > normalize = -1; > else > return FALSE; > *************** noce_try_addcc (struct noce_if_info *if_ > *** 1107,1113 **** > > /* If that fails, construct conditional increment or decrement using > setcc. */ > ! if (BRANCH_COST >= 2 > && (XEXP (if_info->a, 1) == const1_rtx > || XEXP (if_info->a, 1) == constm1_rtx)) > { > --- 1112,1118 ---- > > /* If that fails, construct conditional increment or decrement using > setcc. */ > ! 
if (if_info->branch_cost >= 2 > && (XEXP (if_info->a, 1) == const1_rtx > || XEXP (if_info->a, 1) == constm1_rtx)) > { > *************** noce_try_store_flag_mask (struct noce_if > *** 1158,1164 **** > int reversep; > > reversep = 0; > ! if ((BRANCH_COST >= 2 > || STORE_FLAG_VALUE == -1) > && ((if_info->a == const0_rtx > && rtx_equal_p (if_info->b, if_info->x)) > --- 1163,1169 ---- > int reversep; > > reversep = 0; > ! if ((if_info->branch_cost >= 2 > || STORE_FLAG_VALUE == -1) > && ((if_info->a == const0_rtx > && rtx_equal_p (if_info->b, if_info->x)) > *************** noce_try_cmove_arith (struct noce_if_inf > *** 1317,1323 **** > /* ??? FIXME: Magic number 5. */ > if (cse_not_expected > && MEM_P (a) && MEM_P (b) > ! && BRANCH_COST >= 5) > { > a = XEXP (a, 0); > b = XEXP (b, 0); > --- 1322,1328 ---- > /* ??? FIXME: Magic number 5. */ > if (cse_not_expected > && MEM_P (a) && MEM_P (b) > ! && if_info->branch_cost >= 5) > { > a = XEXP (a, 0); > b = XEXP (b, 0); > *************** noce_try_cmove_arith (struct noce_if_inf > *** 1347,1353 **** > if (insn_a) > { > insn_cost = insn_rtx_cost (PATTERN (insn_a)); > ! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST)) > return FALSE; > } > else > --- 1352,1358 ---- > if (insn_a) > { > insn_cost = insn_rtx_cost (PATTERN (insn_a)); > ! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost)) > return FALSE; > } > else > *************** noce_try_cmove_arith (struct noce_if_inf > *** 1356,1362 **** > if (insn_b) > { > insn_cost += insn_rtx_cost (PATTERN (insn_b)); > ! if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (BRANCH_COST)) > return FALSE; > } > > --- 1361,1367 ---- > if (insn_b) > { > insn_cost += insn_rtx_cost (PATTERN (insn_b)); > ! 
if (insn_cost == 0 || insn_cost > COSTS_N_INSNS (if_info->branch_cost)) > return FALSE; > } > > *************** noce_find_if_block (basic_block test_bb, > *** 2831,2836 **** > --- 2836,2843 ---- > if_info.cond_earliest = cond_earliest; > if_info.jump = jump; > if_info.then_else_reversed = then_else_reversed; > + if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb), > + predictable_edge_p (then_edge)); > > /* Do the real work. */ > > *************** find_if_case_1 (basic_block test_bb, edg > *** 3597,3603 **** > test_bb->index, then_bb->index); > > /* THEN is small. */ > ! if (! cheap_bb_rtx_cost_p (then_bb, COSTS_N_INSNS (BRANCH_COST))) > return FALSE; > > /* Registers set are dead, or are predicable. */ > --- 3604,3612 ---- > test_bb->index, then_bb->index); > > /* THEN is small. */ > ! if (! cheap_bb_rtx_cost_p (then_bb, > ! COSTS_N_INSNS (BRANCH_COST (optimize_bb_for_speed_p (then_edge->src), > ! predictable_edge_p (then_edge))))) > return FALSE; > > /* Registers set are dead, or are predicable. */ > *************** find_if_case_2 (basic_block test_bb, edg > *** 3711,3717 **** > test_bb->index, else_bb->index); > > /* ELSE is small. */ > ! if (! cheap_bb_rtx_cost_p (else_bb, COSTS_N_INSNS (BRANCH_COST))) > return FALSE; > > /* Registers set are dead, or are predicable. */ > --- 3720,3728 ---- > test_bb->index, else_bb->index); > > /* ELSE is small. */ > ! if (! cheap_bb_rtx_cost_p (else_bb, > ! COSTS_N_INSNS (BRANCH_COST (optimize_bb_for_speed_p (else_edge->src), > ! predictable_edge_p (else_edge))))) > return FALSE; > > /* Registers set are dead, or are predicable. */ > Index: expr.h > =================================================================== > *** expr.h (revision 139737) > --- expr.h (working copy) > *************** along with GCC; see the file COPYING3. > *** 36,42 **** > > /* The default branch cost is 1. */ > #ifndef BRANCH_COST > ! #define BRANCH_COST 1 > #endif > > /* This is the 4th arg to `expand_expr'. 
> --- 36,42 ---- > > /* The default branch cost is 1. */ > #ifndef BRANCH_COST > ! #define BRANCH_COST(speed_p, predictable_p) 1 > #endif > > /* This is the 4th arg to `expand_expr'. > Index: predict.c > =================================================================== > *** predict.c (revision 139737) > --- predict.c (working copy) > *************** optimize_insn_for_speed_p (void) > *** 245,250 **** > --- 245,267 ---- > return !optimize_insn_for_size_p (); > } > > + /* Return true when edge E is likely to be well predictable by branch > + predictor. */ > + > + bool > + predictable_edge_p (edge e) > + { > + if (profile_status == PROFILE_ABSENT) > + return false; > + if ((e->probability > + <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100) > + || (REG_BR_PROB_BASE - e->probability > + <= PARAM_VALUE (PARAM_PREDICTABLE_BRANCH_OUTCOME) * REG_BR_PROB_BASE / 100)) > + return true; > + return false; > + } > + > + > /* Set RTL expansion for BB profile. */ > > void > Index: expmed.c > =================================================================== > *** expmed.c (revision 139737) > --- expmed.c (working copy) > *************** expand_smod_pow2 (enum machine_mode mode > *** 3492,3498 **** > result = gen_reg_rtx (mode); > > /* Avoid conditional branches when they're expensive. */ > ! if (BRANCH_COST >= 2 > && optimize_insn_for_speed_p ()) > { > rtx signmask = emit_store_flag (result, LT, op0, const0_rtx, > --- 3492,3498 ---- > result = gen_reg_rtx (mode); > > /* Avoid conditional branches when they're expensive. */ > ! if (BRANCH_COST (optimize_insn_for_speed_p (), false) >= 2 > && optimize_insn_for_speed_p ()) > { > rtx signmask = emit_store_flag (result, LT, op0, const0_rtx, > *************** expand_sdiv_pow2 (enum machine_mode mode > *** 3592,3598 **** > logd = floor_log2 (d); > shift = build_int_cst (NULL_TREE, logd); > > ! 
if (d == 2 && BRANCH_COST >= 1) > { > temp = gen_reg_rtx (mode); > temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1); > --- 3592,3600 ---- > logd = floor_log2 (d); > shift = build_int_cst (NULL_TREE, logd); > > ! if (d == 2 > ! && BRANCH_COST (optimize_insn_for_speed_p (), > ! false) >= 1) > { > temp = gen_reg_rtx (mode); > temp = emit_store_flag (temp, LT, op0, const0_rtx, mode, 0, 1); > *************** expand_sdiv_pow2 (enum machine_mode mode > *** 3602,3608 **** > } > > #ifdef HAVE_conditional_move > ! if (BRANCH_COST >= 2) > { > rtx temp2; > > --- 3604,3611 ---- > } > > #ifdef HAVE_conditional_move > ! if (BRANCH_COST (optimize_insn_for_speed_p (), false) > ! >= 2) > { > rtx temp2; > > *************** expand_sdiv_pow2 (enum machine_mode mode > *** 3631,3637 **** > } > #endif > > ! if (BRANCH_COST >= 2) > { > int ushift = GET_MODE_BITSIZE (mode) - logd; > > --- 3634,3641 ---- > } > #endif > > ! if (BRANCH_COST (optimize_insn_for_speed_p (), > ! false) >= 2) > { > int ushift = GET_MODE_BITSIZE (mode) - logd; > > *************** emit_store_flag (rtx target, enum rtx_co > *** 5345,5351 **** > comparison with zero. Don't do any of these cases if branches are > very cheap. */ > > ! if (BRANCH_COST > 0 > && GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE) > && op1 != const0_rtx) > { > --- 5349,5356 ---- > comparison with zero. Don't do any of these cases if branches are > very cheap. */ > > ! if (BRANCH_COST (optimize_insn_for_speed_p (), > ! false) > 0 > && GET_MODE_CLASS (mode) == MODE_INT && (code == EQ || code == NE) > && op1 != const0_rtx) > { > *************** emit_store_flag (rtx target, enum rtx_co > *** 5368,5377 **** > do LE and GT if branches are expensive since they are expensive on > 2-operand machines. */ > > ! if (BRANCH_COST == 0 > || GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx > || (code != EQ && code != NE > ! && (BRANCH_COST <= 1 || (code != LE && code != GT)))) > return 0; > > /* See what we need to return. 
We can only return a 1, -1, or the > --- 5373,5384 ---- > do LE and GT if branches are expensive since they are expensive on > 2-operand machines. */ > > ! if (BRANCH_COST (optimize_insn_for_speed_p (), > ! false) == 0 > || GET_MODE_CLASS (mode) != MODE_INT || op1 != const0_rtx > || (code != EQ && code != NE > ! && (BRANCH_COST (optimize_insn_for_speed_p (), > ! false) <= 1 || (code != LE && code != GT)))) > return 0; > > /* See what we need to return. We can only return a 1, -1, or the > *************** emit_store_flag (rtx target, enum rtx_co > *** 5467,5473 **** > that "or", which is an extra insn, so we only handle EQ if branches > are expensive. */ > > ! if (tem == 0 && (code == NE || BRANCH_COST > 1)) > { > if (rtx_equal_p (subtarget, op0)) > subtarget = 0; > --- 5474,5483 ---- > that "or", which is an extra insn, so we only handle EQ if branches > are expensive. */ > > ! if (tem == 0 > ! && (code == NE > ! || BRANCH_COST (optimize_insn_for_speed_p (), > ! false) > 1)) > { > if (rtx_equal_p (subtarget, op0)) > subtarget = 0; > Index: basic-block.h > =================================================================== > *** basic-block.h (revision 139737) > --- basic-block.h (working copy) > *************** extern void guess_outgoing_edge_probabil > *** 848,853 **** > --- 848,854 ---- > extern void remove_predictions_associated_with_edge (edge); > extern bool edge_probability_reliable_p (const_edge); > extern bool br_prob_note_reliable_p (const_rtx); > + extern bool predictable_edge_p (edge); > > /* In cfg.c */ > extern void dump_regset (regset, FILE *); > Index: config/alpha/alpha.h > =================================================================== > *** config/alpha/alpha.h (revision 139737) > --- config/alpha/alpha.h (working copy) > *************** extern int alpha_memory_latency; > *** 640,646 **** > #define MEMORY_MOVE_COST(MODE,CLASS,IN) (2*alpha_memory_latency) > > /* Provide the cost of a branch. Exact meaning under development. */ > ! 
#define BRANCH_COST 5 > > /* Stack layout; function entry, exit and calling. */ > > --- 640,646 ---- > #define MEMORY_MOVE_COST(MODE,CLASS,IN) (2*alpha_memory_latency) > > /* Provide the cost of a branch. Exact meaning under development. */ > ! #define BRANCH_COST(speed_p, predictable_p) 5 > > /* Stack layout; function entry, exit and calling. */ > > Index: config/frv/frv.h > =================================================================== > *** config/frv/frv.h (revision 139737) > --- config/frv/frv.h (working copy) > *************** do { \ > *** 2193,2199 **** > > /* A C expression for the cost of a branch instruction. A value of 1 is the > default; other values are interpreted relative to that. */ > ! #define BRANCH_COST frv_branch_cost_int > > /* Define this macro as a C expression which is nonzero if accessing less than > a word of memory (i.e. a `char' or a `short') is no faster than accessing a > --- 2193,2199 ---- > > /* A C expression for the cost of a branch instruction. A value of 1 is the > default; other values are interpreted relative to that. */ > ! #define BRANCH_COST(speed_p, predictable_p) frv_branch_cost_int > > /* Define this macro as a C expression which is nonzero if accessing less than > a word of memory (i.e. a `char' or a `short') is no faster than accessing a > Index: config/s390/s390.h > =================================================================== > *** config/s390/s390.h (revision 139737) > --- config/s390/s390.h (working copy) > *************** extern struct rtx_def *s390_compare_op0, > *** 828,834 **** > > /* A C expression for the cost of a branch instruction. A value of 1 > is the default; other values are interpreted relative to that. */ > ! #define BRANCH_COST 1 > > /* Nonzero if access to memory by bytes is slow and undesirable. */ > #define SLOW_BYTE_ACCESS 1 > --- 828,834 ---- > > /* A C expression for the cost of a branch instruction. A value of 1 > is the default; other values are interpreted relative to that. */ > ! 
#define BRANCH_COST(speed_p, predictable_p) 1 > > /* Nonzero if access to memory by bytes is slow and undesirable. */ > #define SLOW_BYTE_ACCESS 1 > Index: config/spu/spu.h > =================================================================== > *** config/spu/spu.h (revision 139737) > --- config/spu/spu.h (working copy) > *************** targetm.resolve_overloaded_builtin = spu > *** 434,440 **** > > /* Costs */ > > ! #define BRANCH_COST spu_branch_cost > > #define SLOW_BYTE_ACCESS 0 > > --- 434,440 ---- > > /* Costs */ > > ! #define BRANCH_COST(speed_p, predictable_p) spu_branch_cost > > #define SLOW_BYTE_ACCESS 0 > > Index: config/sparc/sparc.h > =================================================================== > *** config/sparc/sparc.h (revision 139737) > --- config/sparc/sparc.h (working copy) > *************** do { > *** 2196,2202 **** > On Niagara-2, a not-taken branch costs 1 cycle whereas a taken > branch costs 6 cycles. */ > > ! #define BRANCH_COST \ > ((sparc_cpu == PROCESSOR_V9 \ > || sparc_cpu == PROCESSOR_ULTRASPARC) \ > ? 7 \ > --- 2196,2202 ---- > On Niagara-2, a not-taken branch costs 1 cycle whereas a taken > branch costs 6 cycles. */ > > ! #define BRANCH_COST(speed_p, predictable_p) \ > ((sparc_cpu == PROCESSOR_V9 \ > || sparc_cpu == PROCESSOR_ULTRASPARC) \ > ? 7 \ > Index: config/m32r/m32r.h > =================================================================== > *** config/m32r/m32r.h (revision 139737) > --- config/m32r/m32r.h (working copy) > *************** L2: .word STATIC > *** 1224,1230 **** > /* A value of 2 here causes GCC to avoid using branches in comparisons like > while (a < N && a). Branches aren't that expensive on the M32R so > we define this as 1. Defining it as 2 had a heavy hit in fp-bit.c. */ > ! #define BRANCH_COST ((TARGET_BRANCH_COST) ? 2 : 1) > > /* Nonzero if access to memory by bytes is slow and undesirable. 
> For RISC chips, it means that access to memory by bytes is no > --- 1224,1230 ---- > /* A value of 2 here causes GCC to avoid using branches in comparisons like > while (a < N && a). Branches aren't that expensive on the M32R so > we define this as 1. Defining it as 2 had a heavy hit in fp-bit.c. */ > ! #define BRANCH_COST(speed_p, predictable_p) ((TARGET_BRANCH_COST) ? 2 : 1) > > /* Nonzero if access to memory by bytes is slow and undesirable. > For RISC chips, it means that access to memory by bytes is no > Index: config/i386/i386.h > =================================================================== > *** config/i386/i386.h (revision 139737) > --- config/i386/i386.h (working copy) > *************** do { \ > *** 1975,1981 **** > /* A C expression for the cost of a branch instruction. A value of 1 > is the default; other values are interpreted relative to that. */ > > ! #define BRANCH_COST ix86_branch_cost > > /* Define this macro as a C expression which is nonzero if accessing > less than a word of memory (i.e. a `char' or a `short') is no > --- 1975,1982 ---- > /* A C expression for the cost of a branch instruction. A value of 1 > is the default; other values are interpreted relative to that. */ > > ! #define BRANCH_COST(speed_p, predictable_p) \ > ! (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost) > > /* Define this macro as a C expression which is nonzero if accessing > less than a word of memory (i.e. a `char' or a `short') is no > Index: config/i386/i386.c > =================================================================== > *** config/i386/i386.c (revision 139737) > --- config/i386/i386.c (working copy) > *************** ix86_expand_int_movcc (rtx operands[]) > *** 14636,14642 **** > */ > > if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL)) > ! && BRANCH_COST >= 2) > { > if (cf == 0) > { > --- 14636,14643 ---- > */ > > if ((!TARGET_CMOVE || (mode == QImode && TARGET_PARTIAL_REG_STALL)) > ! 
> && BRANCH_COST (optimize_insn_for_speed_p (),
> ! false) >= 2)
> {
> if (cf == 0)
> {
> *************** ix86_expand_int_movcc (rtx operands[])
> *** 14721,14727 ****
> optab op;
> rtx var, orig_out, out, tmp;
>
> ! if (BRANCH_COST <= 2)
> return 0; /* FAIL */
>
> /* If one of the two operands is an interesting constant, load a
> --- 14722,14728 ----
> optab op;
> rtx var, orig_out, out, tmp;
>
> ! if (BRANCH_COST (optimize_insn_for_speed_p (), false) <= 2)
> return 0; /* FAIL */
>
> /* If one of the two operands is an interesting constant, load a
> Index: config/sh/sh.h
> ===================================================================
> *** config/sh/sh.h (revision 139737)
> --- config/sh/sh.h (working copy)
> *************** struct sh_args {
> *** 2847,2853 ****
> The SH1 does not have delay slots, hence we get a pipeline stall
> at every branch. The SH4 is superscalar, so the single delay slot
> is not sufficient to keep both pipelines filled. */
> ! #define BRANCH_COST (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
>
> /* Assembler output control. */
>
> --- 2847,2854 ----
> The SH1 does not have delay slots, hence we get a pipeline stall
> at every branch. The SH4 is superscalar, so the single delay slot
> is not sufficient to keep both pipelines filled. */
> ! #define BRANCH_COST(speed_p, predictable_p) \
> ! (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
>
> /* Assembler output control. */
>
> Index: config/pdp11/pdp11.h
> ===================================================================
> *** config/pdp11/pdp11.h (revision 139737)
> --- config/pdp11/pdp11.h (working copy)
> *************** JMP FUNCTION 0x0058 0x0000 <- FUNCTION
> *** 1057,1063 ****
> /* there is no point in avoiding branches on a pdp,
> since branches are really cheap - I just want to find out
> how much difference the BRANCH_COST macro makes in code */
> ! #define BRANCH_COST (TARGET_BRANCH_CHEAP ? 0 : 1)
>
>
> #define COMPARE_FLAG_MODE HImode
> --- 1057,1063 ----
> /* there is no point in avoiding branches on a pdp,
> since branches are really cheap - I just want to find out
> how much difference the BRANCH_COST macro makes in code */
> ! #define BRANCH_COST(speed_p, predictable_p) (TARGET_BRANCH_CHEAP ? 0 : 1)
>
>
> #define COMPARE_FLAG_MODE HImode
> Index: config/avr/avr.h
> ===================================================================
> *** config/avr/avr.h (revision 139737)
> --- config/avr/avr.h (working copy)
> *************** do { \
> *** 511,517 ****
> (MODE)==SImode ? 8 : \
> (MODE)==SFmode ? 8 : 16)
>
> ! #define BRANCH_COST 0
>
> #define SLOW_BYTE_ACCESS 0
>
> --- 511,517 ----
> (MODE)==SImode ? 8 : \
> (MODE)==SFmode ? 8 : 16)
>
> ! #define BRANCH_COST(speed_p, predictable_p) 0
>
> #define SLOW_BYTE_ACCESS 0
>
> Index: config/crx/crx.h
> ===================================================================
> *** config/crx/crx.h (revision 139737)
> --- config/crx/crx.h (working copy)
> *************** struct cumulative_args
> *** 420,426 ****
> /* Moving to processor register flushes pipeline - thus asymmetric */
> #define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
> /* Assume best case (branch predicted) */
> ! #define BRANCH_COST 2
>
> #define SLOW_BYTE_ACCESS 1
>
> --- 420,426 ----
> /* Moving to processor register flushes pipeline - thus asymmetric */
> #define REGISTER_MOVE_COST(MODE, FROM, TO) ((TO != GENERAL_REGS) ? 8 : 2)
> /* Assume best case (branch predicted) */
> ! #define BRANCH_COST(speed_p, predictable_p) 2
>
> #define SLOW_BYTE_ACCESS 1
>
> Index: config/xtensa/xtensa.h
> ===================================================================
> *** config/xtensa/xtensa.h (revision 139737)
> --- config/xtensa/xtensa.h (working copy)
> *************** typedef struct xtensa_args
> *** 882,888 ****
>
> #define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
>
> ! #define BRANCH_COST 3
>
> /* How to refer to registers in assembler output.
> This sequence is indexed by compiler's hard-register-number (see above). */
> --- 882,888 ----
>
> #define MEMORY_MOVE_COST(MODE, CLASS, IN) 4
>
> ! #define BRANCH_COST(speed_p, predictable_p) 3
>
> /* How to refer to registers in assembler output.
> This sequence is indexed by compiler's hard-register-number (see above). */
> Index: config/stormy16/stormy16.h
> ===================================================================
> *** config/stormy16/stormy16.h (revision 139737)
> --- config/stormy16/stormy16.h (working copy)
> *************** do { \
> *** 587,593 ****
>
> #define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
>
> ! #define BRANCH_COST 5
>
> #define SLOW_BYTE_ACCESS 0
>
> --- 587,593 ----
>
> #define MEMORY_MOVE_COST(M,C,I) (5 + memory_move_secondary_cost (M, C, I))
>
> ! #define BRANCH_COST(speed_p, predictable_p) 5
>
> #define SLOW_BYTE_ACCESS 0
>
> Index: config/m68hc11/m68hc11.h
> ===================================================================
> *** config/m68hc11/m68hc11.h (revision 139737)
> --- config/m68hc11/m68hc11.h (working copy)
> *************** extern unsigned char m68hc11_reg_valid_f
> *** 1266,1272 ****
>
> Pretend branches are cheap because GCC generates sub-optimal code
> for the default value. */
> ! #define BRANCH_COST 0
>
> /* Nonzero if access to memory by bytes is slow and undesirable. */
> #define SLOW_BYTE_ACCESS 0
> --- 1266,1272 ----
>
> Pretend branches are cheap because GCC generates sub-optimal code
> for the default value. */
> ! #define BRANCH_COST(speed_p, predictable_p) 0
>
> /* Nonzero if access to memory by bytes is slow and undesirable. */
> #define SLOW_BYTE_ACCESS 0
> Index: config/iq2000/iq2000.h
> ===================================================================
> *** config/iq2000/iq2000.h (revision 139737)
> --- config/iq2000/iq2000.h (working copy)
> *************** typedef struct iq2000_args
> *** 624,630 ****
> #define MEMORY_MOVE_COST(MODE,CLASS,TO_P) \
> (TO_P ? 2 : 16)
>
> ! #define BRANCH_COST 2
>
> #define SLOW_BYTE_ACCESS 1
>
> --- 624,630 ----
> #define MEMORY_MOVE_COST(MODE,CLASS,TO_P) \
> (TO_P ? 2 : 16)
>
> ! #define BRANCH_COST(speed_p, predictable_p) 2
>
> #define SLOW_BYTE_ACCESS 1
>
> Index: config/ia64/ia64.h
> ===================================================================
> *** config/ia64/ia64.h (revision 139737)
> --- config/ia64/ia64.h (working copy)
> *************** do { \
> *** 1384,1390 ****
> many additional insn groups we run into, vs how good the dynamic
> branch predictor is. */
>
> ! #define BRANCH_COST 6
>
> /* Define this macro as a C expression which is nonzero if accessing less than
> a word of memory (i.e. a `char' or a `short') is no faster than accessing a
> --- 1384,1390 ----
> many additional insn groups we run into, vs how good the dynamic
> branch predictor is. */
>
> ! #define BRANCH_COST(speed_p, predictable_p) 6
>
> /* Define this macro as a C expression which is nonzero if accessing less than
> a word of memory (i.e. a `char' or a `short') is no faster than accessing a
> Index: config/rs6000/rs6000.h
> ===================================================================
> *** config/rs6000/rs6000.h (revision 139737)
> --- config/rs6000/rs6000.h (working copy)
> *************** extern enum rs6000_nop_insertion rs6000_
> *** 967,973 ****
> Set this to 3 on the RS/6000 since that is roughly the average cost of an
> unscheduled conditional branch. */
>
> ! #define BRANCH_COST 3
>
> /* Override BRANCH_COST heuristic which empirically produces worse
> performance for removing short circuiting from the logical ops. */
> --- 967,973 ----
> Set this to 3 on the RS/6000 since that is roughly the average cost of an
> unscheduled conditional branch. */
>
> ! #define BRANCH_COST(speed_p, predictable_p) 3
>
> /* Override BRANCH_COST heuristic which empirically produces worse
> performance for removing short circuiting from the logical ops. */
> Index: config/arc/arc.h
> ===================================================================
> *** config/arc/arc.h (revision 139737)
> --- config/arc/arc.h (working copy)
> *************** arc_select_cc_mode (OP, X, Y)
> *** 824,830 ****
> /* The cost of a branch insn. */
> /* ??? What's the right value here? Branches are certainly more
> expensive than reg->reg moves. */
> ! #define BRANCH_COST 2
>
> /* Nonzero if access to memory by bytes is slow and undesirable.
> For RISC chips, it means that access to memory by bytes is no
> --- 824,830 ----
> /* The cost of a branch insn. */
> /* ??? What's the right value here? Branches are certainly more
> expensive than reg->reg moves. */
> ! #define BRANCH_COST(speed_p, predictable_p) 2
>
> /* Nonzero if access to memory by bytes is slow and undesirable.
> For RISC chips, it means that access to memory by bytes is no
> Index: config/score/score.h
> ===================================================================
> *** config/score/score.h (revision 139737)
> --- config/score/score.h (working copy)
> *************** typedef struct score_args
> *** 793,799 ****
> (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
>
> /* Try to generate sequences that don't involve branches. */
> ! #define BRANCH_COST 2
>
> /* Nonzero if access to memory by bytes is slow and undesirable. */
> #define SLOW_BYTE_ACCESS 1
> --- 793,799 ----
> (4 + memory_move_secondary_cost ((MODE), (CLASS), (TO_P)))
>
> /* Try to generate sequences that don't involve branches. */
> ! #define BRANCH_COST(speed_p, predictable_p) 2
>
> /* Nonzero if access to memory by bytes is slow and undesirable. */
> #define SLOW_BYTE_ACCESS 1
> Index: config/arm/arm.h
> ===================================================================
> *** config/arm/arm.h (revision 139737)
> --- config/arm/arm.h (working copy)
> *************** do { \
> *** 2297,2303 ****
>
> /* Try to generate sequences that don't involve branches, we can then use
> conditional instructions */
> ! #define BRANCH_COST \
> (TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
>
> /* Position Independent Code. */
> --- 2297,2303 ----
>
> /* Try to generate sequences that don't involve branches, we can then use
> conditional instructions */
> ! #define BRANCH_COST(speed_p, predictable_p) \
> (TARGET_32BIT ? 4 : (optimize > 0 ? 2 : 0))
>
> /* Position Independent Code. */
> Index: config/pa/pa.h
> ===================================================================
> *** config/pa/pa.h (revision 139737)
> --- config/pa/pa.h (working copy)
> *************** do { \
> *** 1570,1576 ****
> : 2)
>
> /* Adjust the cost of branches. */
> ! #define BRANCH_COST (pa_cpu == PROCESSOR_8000 ? 2 : 1)
>
> /* Handling the special cases is going to get too complicated for a macro,
> just call `pa_adjust_insn_length' to do the real work. */
> --- 1570,1576 ----
> : 2)
>
> /* Adjust the cost of branches. */
> ! #define BRANCH_COST(speed_p, predictable_p) (pa_cpu == PROCESSOR_8000 ? 2 : 1)
>
> /* Handling the special cases is going to get too complicated for a macro,
> just call `pa_adjust_insn_length' to do the real work. */
> Index: config/mips/mips.h
> ===================================================================
> *** config/mips/mips.h (revision 139737)
> --- config/mips/mips.h (working copy)
> *************** typedef struct mips_args {
> *** 2551,2557 ****
> /* A C expression for the cost of a branch instruction. A value of
> 1 is the default; other values are interpreted relative to that. */
>
> ! #define BRANCH_COST mips_branch_cost
> #define LOGICAL_OP_NON_SHORT_CIRCUIT 0
>
> /* If defined, modifies the length assigned to instruction INSN as a
> --- 2551,2557 ----
> /* A C expression for the cost of a branch instruction. A value of
> 1 is the default; other values are interpreted relative to that. */
>
> ! #define BRANCH_COST(speed_p, predictable_p) mips_branch_cost
> #define LOGICAL_OP_NON_SHORT_CIRCUIT 0
>
> /* If defined, modifies the length assigned to instruction INSN as a
> Index: config/vax/vax.h
> ===================================================================
> *** config/vax/vax.h (revision 139737)
> --- config/vax/vax.h (working copy)
> *************** enum reg_class { NO_REGS, ALL_REGS, LIM_
> *** 648,654 ****
> Branches are extremely cheap on the VAX while the shift insns often
> used to replace branches can be expensive. */
>
> ! #define BRANCH_COST 0
>
> /* Tell final.c how to eliminate redundant test instructions. */
>
> --- 648,654 ----
> Branches are extremely cheap on the VAX while the shift insns often
> used to replace branches can be expensive. */
>
> ! #define BRANCH_COST(speed_p, predictable_p) 0
>
> /* Tell final.c how to eliminate redundant test instructions. */
>
> Index: config/h8300/h8300.h
> ===================================================================
> *** config/h8300/h8300.h (revision 139737)
> --- config/h8300/h8300.h (working copy)
> *************** struct cum_arg
> *** 1004,1010 ****
> #define DELAY_SLOT_LENGTH(JUMP) \
> (NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2)
>
> ! #define BRANCH_COST 0
>
> /* Tell final.c how to eliminate redundant test instructions. */
>
> --- 1004,1010 ----
> #define DELAY_SLOT_LENGTH(JUMP) \
> (NEXT_INSN (PREV_INSN (JUMP)) == JUMP ? 0 : 2)
>
> ! #define BRANCH_COST(speed_p, predictable_p) 0
>
> /* Tell final.c how to eliminate redundant test instructions. */
>
> Index: params.def
> ===================================================================
> *** params.def (revision 139737)
> --- params.def (working copy)
> *************** DEFPARAM (PARAM_STRUCT_REORG_COLD_STRUCT
> *** 78,83 ****
> --- 78,90 ----
> "The threshold ratio between current and hottest structure counts",
> 10, 0, 100)
>
> + /* When a branch is predicted to be taken with probability lower than this
> + threshold (in percent), it is considered well predictable. */
> + DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
> + "predictable-branch-outcome",
> + "Maximal estimated outcome of branch considered predictable",
> + 2, 0, 50)
> +
> /* The single function inlining limit. This is the maximum size
> of a function counted in internal gcc instructions (not in
> real machine instructions) that is eligible for inlining
>