Date: Thu, 03 Nov 2005 21:23:00 -0000
Message-ID: <20051103212315.19642.qmail@sourceware.org>
Subject: [Bug rtl-optimization/23490] [3.4/4.0/4.1 Regression] Long compile time for array initializer with inlined constructor
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "hubicka at ucw dot cz"

------- Comment #11 from hubicka at ucw dot cz  2005-11-03 21:23 -------
Subject: Re:  [3.4/4.0/4.1 Regression] Long compile time for array initializer with inlined constructor

Hmm, OK, this adds the necessary knobs so you can throttle the parameters
even further.  Sadly, these parameters are quite difficult to tune, since
reducing them considerably might easily ruin code quality, and I am unsure
how we want to proceed (i.e. keep them high and change your makefiles, or
try to reduce them to something more scalable).

Honza

2005-11-03  Jan Hubicka

	* doc/invoke.texi (max-predicted-iterations, max-cse-insns,
	max-flow-memory-locations): Document.
	* flow.c: Include params.h.
	(MAX_MEM_SET_LIST_LEN): Kill.
	(add_to_mem_set_list): Use new param.
	* cse.c (cse_basic_block): Replace 1000 by new param.
	* params.def (PARAM_MAX_PREDICTED_ITERATIONS, PARAM_MAX_CSE_INSNS,
	PARAM_MAX_FLOW_MEMORY_LOCATIONS): New.
	* predict.c (predict_loops): Use new param.
	* predict.def (MAX_PRED_LOOP_ITERATIONS): Remove.
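To illustrate what turning a hard-coded limit into a named knob buys, here is a minimal standalone C sketch. It assumes a simplified parameter table modeled loosely on params.def; `struct param_info`, `set_param`, `count_duplicate_names` and `count_flushes` are hypothetical names, not GCC's real params.c API. It shows a value looked up by option name replacing the literal 1000 in the CSE flush counter, and why every option name in the table has to be unique.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the table params.def generates;
   not GCC's real data structures.  */
struct param_info
{
  const char *option;   /* Name a user would pass as --param NAME=VALUE.  */
  int value;            /* Current value, initialized to the default.  */
};

enum param_index
{
  MAX_CSE_INSNS,
  MAX_FLOW_MEMORY_LOCATIONS,
  MAX_PREDICTED_ITERATIONS,
  LAST_PARAM
};

/* Defaults mirror the values chosen in the patch.  */
static struct param_info params[LAST_PARAM] = {
  { "max-cse-insns", 1000 },
  { "max-flow-memory-locations", 100 },
  { "max-predicted-iterations", 100 },
};

#define PARAM_VALUE(I) (params[(I)].value)

/* Set the parameter named OPTION to VALUE.  Returns 1 on success, 0 if
   no parameter has that name.  Only the first matching entry is updated,
   so a second entry reusing the same name could never be set.  */
static int
set_param (const char *option, int value)
{
  assert (value >= 0);  /* Negative limits are rejected (sketch assumption).  */
  for (int i = 0; i < LAST_PARAM; i++)
    if (strcmp (params[i].option, option) == 0)
      {
        params[i].value = value;
        return 1;
      }
  return 0;
}

/* Count pairs of table entries that share an option name.  */
static int
count_duplicate_names (void)
{
  int dups = 0;
  for (int i = 0; i < LAST_PARAM; i++)
    for (int j = i + 1; j < LAST_PARAM; j++)
      if (strcmp (params[i].option, params[j].option) == 0)
        dups++;
  return dups;
}

/* Mimic the cse_basic_block counter: how many times the hash table would
   be flushed while scanning N_INSNS non-note insns, using the tunable
   limit instead of a literal 1000.  */
static int
count_flushes (int n_insns)
{
  int num_insns = 0, flushes = 0;
  for (int i = 0; i < n_insns; i++)
    if (num_insns++ > PARAM_VALUE (MAX_CSE_INSNS))
      {
        flushes++;      /* Stands in for flush_hash_table ().  */
        num_insns = 0;
      }
  return flushes;
}
```

Because the comparison uses the post-incremented counter, a limit of 10 flushes on the 12th insn after each reset, so `count_flushes (24)` reports two flushes once `set_param ("max-cse-insns", 10)` has been called.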
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 106422)
+++ doc/invoke.texi	(working copy)
@@ -5934,6 +5938,13 @@ given basic block needs to have to be co
 Select fraction of the maximal frequency of executions of basic block in
 function given basic block needs to have to be considered hot
 
+@item max-predicted-iterations
+The maximum number of loop iterations we predict statically.  This is useful
+in cases where a function contains a single loop with a known bound and
+another loop with an unknown bound.  We predict the known number of iterations
+correctly, while unknown iteration counts average to roughly 10.  The loop
+without bounds would therefore appear artificially cold relative to the other.
+
 @item tracer-dynamic-coverage
 @itemx tracer-dynamic-coverage-feedback
 
@@ -5971,6 +5982,9 @@ order to make tracer effective.
 
 Maximum number of basic blocks on path that cse considers.  The default is 10.
 
+@item max-cse-insns
+The maximum number of instructions CSE processes before flushing.  The default is 1000.
+
 @item global-var-threshold
 
 Counts the number of function calls (@var{n}) and the number of
@@ -6031,6 +6045,10 @@ value is 100.
 The maximum number of memory locations cselib should take into acount.
 Increasing values mean more aggressive optimization, making the compile time
 increase with probably slightly better performance.  The default value is 500.
+
+@item max-flow-memory-locations
+Similar to @option{max-cselib-memory-locations} but for dataflow liveness.
+The default value is 100.
 
 @item reorder-blocks-duplicate
 @itemx reorder-blocks-duplicate-feedback
Index: flow.c
===================================================================
--- flow.c	(revision 106422)
+++ flow.c	(working copy)
@@ -141,6 +141,7 @@ Software Foundation, 51 Franklin Street,
 #include "obstack.h"
 #include "splay-tree.h"
 #include "tree-pass.h"
+#include "params.h"
 
 #ifndef HAVE_epilogue
 #define HAVE_epilogue 0
@@ -283,10 +284,6 @@ static int ndead;
 
 static int *reg_deaths;
 
-/* Maximum length of pbi->mem_set_list before we start dropping
-   new elements on the floor.  */
-#define MAX_MEM_SET_LIST_LEN 100
-
 /* Forward declarations */
 static int verify_wide_reg_1 (rtx *, void *);
 static void verify_wide_reg (int, basic_block);
@@ -630,7 +627,7 @@ update_life_info (sbitmap blocks, enum u
 
           /* We repeat regardless of what cleanup_cfg says.  If there were
              instructions deleted above, that might have been only a
-             partial improvement (see MAX_MEM_SET_LIST_LEN usage).
+             partial improvement (see PARAM_MAX_FLOW_MEMORY_LOCATIONS usage).
              Further improvement may be possible.  */
           cleanup_cfg (CLEANUP_EXPENSIVE);
 
@@ -2515,7 +2512,7 @@ add_to_mem_set_list (struct propagate_bl
         }
     }
 
-  if (pbi->mem_set_list_len < MAX_MEM_SET_LIST_LEN)
+  if (pbi->mem_set_list_len < PARAM_VALUE (PARAM_MAX_FLOW_MEMORY_LOCATIONS))
     {
 #ifdef AUTO_INC_DEC
       /* Store a copy of mem, otherwise the address may be
Index: cse.c
===================================================================
--- cse.c	(revision 106422)
+++ cse.c	(working copy)
@@ -6890,7 +6890,7 @@ cse_basic_block (rtx from, rtx to, struc
 
          ??? This is a real kludge and needs to be done some other way.
          Perhaps for 2.9.  */
-      if (code != NOTE && num_insns++ > 1000)
+      if (code != NOTE && num_insns++ > PARAM_VALUE (PARAM_MAX_CSE_INSNS))
         {
           flush_hash_table ();
           num_insns = 0;
Index: predict.c
===================================================================
--- predict.c	(revision 106422)
+++ predict.c	(working copy)
@@ -624,8 +624,9 @@ predict_loops (struct loops *loops_info,
               niter = desc.niter + 1;
               if (niter == 0)        /* We might overflow here.  */
                 niter = desc.niter;
-              if (niter > MAX_PRED_LOOP_ITERATIONS)
-                niter = MAX_PRED_LOOP_ITERATIONS;
+              if (niter
+                  > (unsigned int) PARAM_VALUE (PARAM_MAX_PREDICTED_ITERATIONS))
+                niter = PARAM_VALUE (PARAM_MAX_PREDICTED_ITERATIONS);
 
               prob = (REG_BR_PROB_BASE
                       - (REG_BR_PROB_BASE + niter /2) / niter);
@@ -653,19 +654,17 @@ predict_loops (struct loops *loops_info,
               if (TREE_CODE (niter) == INTEGER_CST)
                 {
                   int probability;
+                  int max = PARAM_VALUE (PARAM_MAX_PREDICTED_ITERATIONS);
                   if (host_integerp (niter, 1)
                       && tree_int_cst_lt (niter,
-                                          build_int_cstu (NULL_TREE,
-                                                          MAX_PRED_LOOP_ITERATIONS - 1)))
+                                          build_int_cstu (NULL_TREE, max - 1)))
                     {
                       HOST_WIDE_INT nitercst = tree_low_cst (niter, 1) + 1;
                       probability = ((REG_BR_PROB_BASE + nitercst / 2)
                                      / nitercst);
                     }
                   else
-                    probability = ((REG_BR_PROB_BASE
-                                    + MAX_PRED_LOOP_ITERATIONS / 2)
-                                   / MAX_PRED_LOOP_ITERATIONS);
+                    probability = ((REG_BR_PROB_BASE + max / 2) / max);
 
                   predict_edge (exits[j], PRED_LOOP_ITERATIONS, probability);
                 }
Index: params.def
===================================================================
--- params.def	(revision 106422)
+++ params.def	(working copy)
@@ -309,6 +309,22 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
          "hot-bb-frequency-fraction",
          "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
          1000, 0, 0)
+
+/* For guessed profiles, loops with an unknown number of iterations
+   are predicted to iterate relatively few (10) times on average.
+   For functions containing one loop with a large known number of iterations
+   and other loops with unknown bounds, we would end up predicting all
+   the other loops as cold, which is not usually the case.  So we need to
+   artificially flatten the profile.
+
+   We need to cut the maximal predicted iterations to a value large enough
+   that the loop appears important, but safely within the HOT_BB_COUNT_FRACTION
+   range.  */
+
+DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS,
+         "max-predicted-iterations",
+         "The maximum number of loop iterations we predict statically",
+         100, 0, 0)
 DEFPARAM(TRACER_DYNAMIC_COVERAGE_FEEDBACK,
          "tracer-dynamic-coverage-feedback",
          "The percentage of function, weighted by execution frequency, that must be covered by trace formation. Used when profile feedback is available",
@@ -363,6 +379,10 @@ DEFPARAM(PARAM_MAX_CSE_PATH_LENGTH,
          "max-cse-path-length",
          "The maximum length of path considered in cse",
          10, 0, 0)
+DEFPARAM(PARAM_MAX_CSE_INSNS,
+         "max-cse-insns",
+         "The maximum number of instructions CSE processes before flushing",
+         1000, 0, 0)
 
 /* The cost of expression in loop invariant motion that is considered
    expensive.  */
@@ -417,6 +437,10 @@ DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIO
          "max-cselib-memory-locations",
          "The maximum memory locations recorded by cselib",
          500, 0, 0)
+DEFPARAM(PARAM_MAX_FLOW_MEMORY_LOCATIONS,
+         "max-flow-memory-locations",
+         "The maximum memory locations recorded by flow",
+         100, 0, 0)
 
 #ifdef ENABLE_GC_ALWAYS_COLLECT
 # define GGC_MIN_EXPAND_DEFAULT 0
Index: predict.def
===================================================================
--- predict.def	(revision 106422)
+++ predict.def	(working copy)
@@ -58,18 +58,6 @@ DEF_PREDICTOR (PRED_UNCONDITIONAL, "unco
 DEF_PREDICTOR (PRED_LOOP_ITERATIONS, "loop iterations", PROB_ALWAYS,
                PRED_FLAG_FIRST_MATCH)
 
-/* For guessed profiles, the loops having unknown number of iterations
-   are predicted to iterate relatively few (10) times at average.
-   For functions containing one loop with large known number of iterations
-   and other loops having unbounded loops we would end up predicting all
-   the other loops cold that is not usually the case.  So we need to artifically
-   flatten the profile.
-
-   We need to cut the maximal predicted iterations to large enought iterations
-   so the loop appears important, but safely within HOT_BB_COUNT_FRACTION
-   range.  */
-#define MAX_PRED_LOOP_ITERATIONS 100
-
 /* Hints dropped by user via __builtin_expect feature.  */
 DEF_PREDICTOR (PRED_BUILTIN_EXPECT, "__builtin_expect", PROB_VERY_LIKELY,
                PRED_FLAG_FIRST_MATCH)


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23490
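The effect of the predict.c clamp can be checked in isolation. Below is a minimal standalone C sketch of the RTL-level formula. It assumes REG_BR_PROB_BASE is 10000 and passes the max-predicted-iterations value in as an argument. `continue_probability` and `expected_iterations` are hypothetical helpers, not GCC functions.

```c
#include <assert.h>

/* Assumed value of GCC's branch-probability scale.  */
#define REG_BR_PROB_BASE 10000

/* Probability, in REG_BR_PROB_BASE units, that a loop predicted to run
   NITER times stays in the loop, after clamping NITER to MAX (the
   max-predicted-iterations value).  Same arithmetic as the first
   predict_loops hunk.  */
static int
continue_probability (unsigned int niter, unsigned int max)
{
  assert (niter > 0 && max > 0);
  if (niter > max)
    niter = max;
  return REG_BR_PROB_BASE - (REG_BR_PROB_BASE + niter / 2) / niter;
}

/* Expected trip count implied by a stay-in-loop probability P, treating
   each iteration as an independent trial: 1 / (1 - p).  */
static int
expected_iterations (int p)
{
  assert (p >= 0 && p < REG_BR_PROB_BASE);
  return REG_BR_PROB_BASE / (REG_BR_PROB_BASE - p);
}
```

With the default of 100, a loop known to run a million times gets a probability of 9900, i.e. about 100 expected iterations. That makes it only about ten times hotter than an unbounded loop predicted at roughly 10 iterations, so the unbounded loop no longer looks artificially cold. Raising the parameter to 1000 widens that gap tenfold, which is the trade-off the knob exposes.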