* Re: [PATCH][RFA] Split passes from gcse_main into separate passes
@ 2009-04-26 22:50 Bradley Lucier
2009-04-27 5:53 ` Steven Bosscher
From: Bradley Lucier @ 2009-04-26 22:50 UTC
To: Steven Bosscher; +Cc: Bradley Lucier, gcc-patches
With regard to:
> After this split, we no longer have a reason to *not* alter jumps or
> bypass jumps in CPROP. Therefore, the separate jump bypassing pass is
> removed and CPROP now always alters jumps and bypasses jumps (this is
> extremely cheap anyway).
Would it be OK to remove this code in a separate patch, either before
or after the addition of this new code?
I've been trying to figure out how to analyze r118475, which removed
"old" optimizations at the same time it added "new" optimizations,
and it's hard to separate the two things and figure out what has caused
any slow-downs that one might see.
Brad
* Re: [PATCH][RFA] Split passes from gcse_main into separate passes
2009-04-26 22:50 [PATCH][RFA] Split passes from gcse_main into separate passes Bradley Lucier
@ 2009-04-27 5:53 ` Steven Bosscher
From: Steven Bosscher @ 2009-04-27 5:53 UTC
To: Bradley Lucier; +Cc: gcc-patches
On Mon, Apr 27, 2009 at 12:21 AM, Bradley Lucier <lucier@math.purdue.edu> wrote:
> With regard to:
>
>> After this split, we no longer have a reason to *not* alter jumps or
>> bypass jumps in CPROP. Therefore, the separate jump bypassing pass is
>> removed and CPROP now always alters jumps and bypasses jumps (this is
>> extremely cheap anyway).
>
>
> Would it be OK to remove this code in a separate patch, either before or
> after the addition of this new code?
That would be possible, but I'd have to add another rtl_opt_pass. So
I prefer not to do this.
> I've been trying to figure out how to analyze r118475, which removed "old"
> optimizations at the same time it added "new" optimizations, and it's hard
> to separate the two things and figure out what has caused any slow-downs that
> one might see.
This isn't really a new optimization. In the old code, the jump_bypass
pass was just CPROP3 with alter jumps and bypass jumps. CPROP2 also
ran with alter jumps and bypass jumps. The only "new" bit is that we
now do it in CPROP1 as well. If something was slow here before,
you would have seen it already.
Ciao!
Steven
* Re: [PATCH][RFA] Split passes from gcse_main into separate passes
2009-04-26 21:27 Steven Bosscher
2009-04-26 21:37 ` Steven Bosscher
2009-04-27 8:45 ` Richard Guenther
@ 2010-12-03 15:05 ` H.J. Lu
From: H.J. Lu @ 2010-12-03 15:05 UTC
To: Steven Bosscher; +Cc: GCC Patches, Richard Guenther
On Sun, Apr 26, 2009 at 2:23 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hi,
>
> This is the next step in the big gcse cleanup. This patch splits the
> uber-pass gcse_main() into individual passes: CPROP, PRE, HOIST,
> STORE_MOTION, and local CSE.
>
> After this split, we no longer have a reason to *not* alter jumps or
> bypass jumps in CPROP. Therefore, the separate jump bypassing pass is
> removed and CPROP now always alters jumps and bypasses jumps (this is
> extremely cheap anyway).
>
> Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
> x86_64-unknown-linux-gnu with store motion enabled.
> Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
> ia64-unknown-linux-gnu with store motion enabled.
> Bootstrap&test of all languages ongoing on ia64-unknown-linux-gnu.
>
> OK for trunk if everything passes?
>
> Ciao!
> Steven
>
>
>
>
> * dbgcnt.def (cprop1, cprop2, gcse, jump_bypass): Remove.
> (cprop, hoist, pre, store_motion): New debug counters.
> * tree-pass.h (pass_tracer): Move to list of gimple passes, it
> is not an RTL pass anymore.
> (pass_profiling): Remove extern decl for pass removed in 2005.
> (pass_gcse, pass_jump_bypass): Remove.
> * final.c (rest_of_clean_state): Set flag_rerun_cse_after_global_opts
> to 0 for clean state.
> * toplev.h (flag_rerun_cse_after_global_opts): Add extern declaration.
> * cse.c (gate_handle_cse_after_global_opts,
> rest_of_handle_cse_after_global_opts): New functions.
> (pass_cse_after_global_opts): New pass, does local CSE.
> * timevar.def (TV_GCSE, TV_CPROP1, TV_CPROP2, TV_BYPASS): Remove.
> (TV_CPROP): New timevar.
> * gcse.c (flag_rerun_cse_after_global_opts): New global variable.
> (run_jump_opt_after_gcse, max_gcse_regno): Remove global vars.
> (gcse_main, recompute_all_luids): Remove.
> (compute_hash_table_work): Call max_reg_num instead of reading
> max_gcse_regno.
> (cprop_jump): Don't set run_jump_opt_after_gcse.
> (constprop_register): Always allow to alter jumps.
> (cprop_insn): Likewise.
> (do_local_cprop): Likewise.
> (local_cprop_pass): Likewise. Return non-zero if something changed.
> (cprop): Remove function, fold interesting bits into one_cprop_pass.
> (find_implicit_sets): Add note about missed optimization opportunity.
> (one_cprop_pass): Rewrite to be "the" CPROP pass, called from the
> pass_rtl_cprop execute function.
> Don't bother tracking the pass number, each pass gets its own dumpfile
> now anyway.
> Always allow to alter jumps and bypass jumps.
> (bypass_block): Don't ignore regno >= max_gcse_regno, find_bypass_set
> will just find no suitable set.
> (pre_edge_insert): Fix dumping, this function is for PRE only.
> (one_pre_gcse_pass): Rewrite to be "the" PRE pass, called from the
> pass_rtl_pre execute function.
> (hoist_code): Return non-zero if something changed. Keep track of
> substitutions and insertions for statistics gathering similar to PRE.
> (one_code_hoisting_pass): Rewrite to be "the" code hoisting pass,
> called from the pass_rtl_hoist execute function. Show pass statistics.
> (compute_store_table): Use max_reg_num directly instead of using the
> formerly global max_gcse_regno.
> (build_store_vectors): Likewise.
> (replace_store_insn): Fix dumping.
> (store_motion): Rename to ...
> (one_store_motion_pass): ... this. Rewrite to be "the" STORE_MOTION
> pass, called from the pass_rtl_store_motion execute function. Keep
> track of substitutions and insertions for statistics gathering similar
> to PRE.
> (bypass_jumps): Remove, fold interesting bits into ...
> (one_cprop_pass): ... this. Rewrite to be "the" CPROP pass, called
> from the pass_rtl_cprop execute function.
> (gate_handle_jump_bypass, rest_of_handle_jump_bypass,
> pass_jump_bypass): Remove.
> (gate_handle_gcse, rest_of_handle_gcse): Remove.
> (gate_rtl_cprop, execute_rtl_cprop, pass_rtl_cprop): New.
> (gate_rtl_pre, execute_rtl_pre, pass_rtl_pre): New.
> (gate_rtl_hoist, execute_rtl_hoist, pass_rtl_hoist): New.
> (gate_rtl_store_motion, execute_rtl_store_motion,
> pass_rtl_store_motion): New.
> * common.opt: Remove flag_cse_skip_blocks, adjust documentation to
> make it clear that -fcse-skip-blocks is a no-op for backward compat.
> * passes.c (init_optimization_passes): Remove pass_gcse and
> pass_jump_bypass. Schedule cprop, pre, hoist, cprop, store_motion,
> and cse_after_global_opts in place of pass_gcse. Schedule cprop
> instead of pass_jump_bypass.
>
This caused:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46777
H.J.
* Re: [PATCH][RFA] Split passes from gcse_main into separate passes
2009-04-26 21:27 Steven Bosscher
2009-04-26 21:37 ` Steven Bosscher
@ 2009-04-27 8:45 ` Richard Guenther
2010-12-03 15:05 ` H.J. Lu
From: Richard Guenther @ 2009-04-27 8:45 UTC
To: Steven Bosscher; +Cc: GCC Patches
On Sun, Apr 26, 2009 at 11:23 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hi,
>
> This is the next step in the big gcse cleanup. This patch splits the
> uber-pass gcse_main() into individual passes: CPROP, PRE, HOIST,
> STORE_MOTION, and local CSE.
>
> After this split, we no longer have a reason to *not* alter jumps or
> bypass jumps in CPROP. Therefore, the separate jump bypassing pass is
> removed and CPROP now always alters jumps and bypasses jumps (this is
> extremely cheap anyway).
>
> Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
> x86_64-unknown-linux-gnu with store motion enabled.
> Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
> ia64-unknown-linux-gnu with store motion enabled.
> Bootstrap&test of all languages ongoing on ia64-unknown-linux-gnu.
>
> OK for trunk if everything passes?
Ok.
Thanks,
Richard.
> Ciao!
> Steven
>
>
>
>
> * dbgcnt.def (cprop1, cprop2, gcse, jump_bypass): Remove.
> (cprop, hoist, pre, store_motion): New debug counters.
> * tree-pass.h (pass_tracer): Move to list of gimple passes, it
> is not an RTL pass anymore.
> (pass_profiling): Remove extern decl for pass removed in 2005.
> (pass_gcse, pass_jump_bypass): Remove.
> * final.c (rest_of_clean_state): Set flag_rerun_cse_after_global_opts
> to 0 for clean state.
> * toplev.h (flag_rerun_cse_after_global_opts): Add extern declaration.
> * cse.c (gate_handle_cse_after_global_opts,
> rest_of_handle_cse_after_global_opts): New functions.
> (pass_cse_after_global_opts): New pass, does local CSE.
> * timevar.def (TV_GCSE, TV_CPROP1, TV_CPROP2, TV_BYPASS): Remove.
> (TV_CPROP): New timevar.
> * gcse.c (flag_rerun_cse_after_global_opts): New global variable.
> (run_jump_opt_after_gcse, max_gcse_regno): Remove global vars.
> (gcse_main, recompute_all_luids): Remove.
> (compute_hash_table_work): Call max_reg_num instead of reading
> max_gcse_regno.
> (cprop_jump): Don't set run_jump_opt_after_gcse.
> (constprop_register): Always allow to alter jumps.
> (cprop_insn): Likewise.
> (do_local_cprop): Likewise.
> (local_cprop_pass): Likewise. Return non-zero if something changed.
> (cprop): Remove function, fold interesting bits into one_cprop_pass.
> (find_implicit_sets): Add note about missed optimization opportunity.
> (one_cprop_pass): Rewrite to be "the" CPROP pass, called from the
> pass_rtl_cprop execute function.
> Don't bother tracking the pass number, each pass gets its own dumpfile
> now anyway.
> Always allow to alter jumps and bypass jumps.
> (bypass_block): Don't ignore regno >= max_gcse_regno, find_bypass_set
> will just find no suitable set.
> (pre_edge_insert): Fix dumping, this function is for PRE only.
> (one_pre_gcse_pass): Rewrite to be "the" PRE pass, called from the
> pass_rtl_pre execute function.
> (hoist_code): Return non-zero if something changed. Keep track of
> substitutions and insertions for statistics gathering similar to PRE.
> (one_code_hoisting_pass): Rewrite to be "the" code hoisting pass,
> called from the pass_rtl_hoist execute function. Show pass statistics.
> (compute_store_table): Use max_reg_num directly instead of using the
> formerly global max_gcse_regno.
> (build_store_vectors): Likewise.
> (replace_store_insn): Fix dumping.
> (store_motion): Rename to ...
> (one_store_motion_pass): ... this. Rewrite to be "the" STORE_MOTION
> pass, called from the pass_rtl_store_motion execute function. Keep
> track of substitutions and insertions for statistics gathering similar
> to PRE.
> (bypass_jumps): Remove, fold interesting bits into ...
> (one_cprop_pass): ... this. Rewrite to be "the" CPROP pass, called
> from the pass_rtl_cprop execute function.
> (gate_handle_jump_bypass, rest_of_handle_jump_bypass,
> pass_jump_bypass): Remove.
> (gate_handle_gcse, rest_of_handle_gcse): Remove.
> (gate_rtl_cprop, execute_rtl_cprop, pass_rtl_cprop): New.
> (gate_rtl_pre, execute_rtl_pre, pass_rtl_pre): New.
> (gate_rtl_hoist, execute_rtl_hoist, pass_rtl_hoist): New.
> (gate_rtl_store_motion, execute_rtl_store_motion,
> pass_rtl_store_motion): New.
> * common.opt: Remove flag_cse_skip_blocks, adjust documentation to
> make it clear that -fcse-skip-blocks is a no-op for backward compat.
> * passes.c (init_optimization_passes): Remove pass_gcse and
> pass_jump_bypass. Schedule cprop, pre, hoist, cprop, store_motion,
> and cse_after_global_opts in place of pass_gcse. Schedule cprop
> instead of pass_jump_bypass.
>
> Index: dbgcnt.def
> ===================================================================
> --- dbgcnt.def (revision 146798)
> +++ dbgcnt.def (working copy)
> @@ -145,8 +145,7 @@ DEBUG_COUNTER (auto_inc_dec)
> DEBUG_COUNTER (ccp)
> DEBUG_COUNTER (cfg_cleanup)
> DEBUG_COUNTER (cse2_move2add)
> -DEBUG_COUNTER (cprop1)
> -DEBUG_COUNTER (cprop2)
> +DEBUG_COUNTER (cprop)
> DEBUG_COUNTER (dce)
> DEBUG_COUNTER (dce_fast)
> DEBUG_COUNTER (dce_ud)
> @@ -155,17 +154,17 @@ DEBUG_COUNTER (df_byte_scan)
> DEBUG_COUNTER (dse)
> DEBUG_COUNTER (dse1)
> DEBUG_COUNTER (dse2)
> -DEBUG_COUNTER (gcse)
> DEBUG_COUNTER (gcse2_delete)
> DEBUG_COUNTER (global_alloc_at_func)
> DEBUG_COUNTER (global_alloc_at_reg)
> +DEBUG_COUNTER (hoist)
> DEBUG_COUNTER (ia64_sched2)
> DEBUG_COUNTER (if_conversion)
> DEBUG_COUNTER (if_after_combine)
> DEBUG_COUNTER (if_after_reload)
> -DEBUG_COUNTER (jump_bypass)
> DEBUG_COUNTER (local_alloc_for_sched)
> DEBUG_COUNTER (postreload_cse)
> +DEBUG_COUNTER (pre)
> DEBUG_COUNTER (pre_insn)
> DEBUG_COUNTER (treepre_insert)
> DEBUG_COUNTER (sched2_func)
> @@ -177,5 +176,6 @@ DEBUG_COUNTER (sel_sched_cnt)
> DEBUG_COUNTER (sel_sched_region_cnt)
> DEBUG_COUNTER (sel_sched_insn_cnt)
> DEBUG_COUNTER (sms_sched_loop)
> +DEBUG_COUNTER (store_motion)
> DEBUG_COUNTER (split_for_sched2)
> DEBUG_COUNTER (tail_call)
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 146798)
> +++ tree-pass.h (working copy)
> @@ -396,6 +396,7 @@ extern struct gimple_opt_pass pass_rebui
> extern struct gimple_opt_pass pass_remove_cgraph_callee_edges;
> extern struct gimple_opt_pass pass_build_cgraph_edges;
> extern struct gimple_opt_pass pass_local_pure_const;
> +extern struct gimple_opt_pass pass_tracer;
>
> /* IPA Passes */
> extern struct ipa_opt_pass pass_ipa_inline;
> @@ -437,11 +438,12 @@ extern struct rtl_opt_pass pass_rtl_dce;
> extern struct rtl_opt_pass pass_rtl_dse1;
> extern struct rtl_opt_pass pass_rtl_dse2;
> extern struct rtl_opt_pass pass_rtl_dse3;
> -extern struct rtl_opt_pass pass_gcse;
> -extern struct rtl_opt_pass pass_jump_bypass;
> -extern struct rtl_opt_pass pass_profiling;
> +extern struct rtl_opt_pass pass_rtl_cprop;
> +extern struct rtl_opt_pass pass_rtl_pre;
> +extern struct rtl_opt_pass pass_rtl_hoist;
> +extern struct rtl_opt_pass pass_rtl_store_motion;
> +extern struct rtl_opt_pass pass_cse_after_global_opts;
> extern struct rtl_opt_pass pass_rtl_ifcvt;
> -extern struct gimple_opt_pass pass_tracer;
>
> extern struct rtl_opt_pass pass_into_cfg_layout_mode;
> extern struct rtl_opt_pass pass_outof_cfg_layout_mode;
> Index: final.c
> ===================================================================
> --- final.c (revision 146798)
> +++ final.c (working copy)
> @@ -4298,6 +4298,7 @@ rest_of_clean_state (void)
> sdbout_types (NULL_TREE);
> #endif
>
> + flag_rerun_cse_after_global_opts = 0;
> reload_completed = 0;
> epilogue_completed = 0;
> #ifdef STACK_REGS
> Index: toplev.h
> ===================================================================
> --- toplev.h (revision 146798)
> +++ toplev.h (working copy)
> @@ -132,6 +132,7 @@ extern int flag_if_conversion;
> extern int flag_if_conversion2;
> extern int flag_keep_static_consts;
> extern int flag_peel_loops;
> +extern int flag_rerun_cse_after_global_opts;
> extern int flag_rerun_cse_after_loop;
> extern int flag_thread_jumps;
> extern int flag_tracer;
> Index: cse.c
> ===================================================================
> --- cse.c (revision 146798)
> +++ cse.c (working copy)
> @@ -6997,3 +6997,65 @@ struct rtl_opt_pass pass_cse2 =
> TODO_verify_flow /* todo_flags_finish */
> }
> };
> +
> +static bool
> +gate_handle_cse_after_global_opts (void)
> +{
> + return optimize > 0 && flag_rerun_cse_after_global_opts;
> +}
> +
> +/* Run second CSE pass after loop optimizations. */
> +static unsigned int
> +rest_of_handle_cse_after_global_opts (void)
> +{
> + int save_cfj;
> + int tem;
> +
> + /* We only want to do local CSE, so don't follow jumps. */
> + save_cfj = flag_cse_follow_jumps;
> + flag_cse_follow_jumps = 0;
> +
> + rebuild_jump_labels (get_insns ());
> + tem = cse_main (get_insns (), max_reg_num ());
> + purge_all_dead_edges ();
> + delete_trivially_dead_insns (get_insns (), max_reg_num ());
> +
> + cse_not_expected = !flag_rerun_cse_after_loop;
> +
> + /* If cse altered any jumps, rerun jump opts to clean things up. */
> + if (tem == 2)
> + {
> + timevar_push (TV_JUMP);
> + rebuild_jump_labels (get_insns ());
> + cleanup_cfg (0);
> + timevar_pop (TV_JUMP);
> + }
> + else if (tem == 1)
> + cleanup_cfg (0);
> +
> + flag_cse_follow_jumps = save_cfj;
> + return 0;
> +}
> +
> +struct rtl_opt_pass pass_cse_after_global_opts =
> +{
> + {
> + RTL_PASS,
> + "cse_local", /* name */
> + gate_handle_cse_after_global_opts, /* gate */
> + rest_of_handle_cse_after_global_opts, /* execute */
> + NULL, /* sub */
> + NULL, /* next */
> + 0, /* static_pass_number */
> + TV_CSE, /* tv_id */
> + 0, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_df_finish | TODO_verify_rtl_sharing |
> + TODO_dump_func |
> + TODO_ggc_collect |
> + TODO_verify_flow /* todo_flags_finish */
> + }
> +};
> +
> Index: timevar.def
> ===================================================================
> --- timevar.def (revision 146798)
> +++ timevar.def (working copy)
> @@ -154,13 +154,10 @@ DEFTIMEVAR (TV_DCE , "
> DEFTIMEVAR (TV_DSE1 , "dead store elim1")
> DEFTIMEVAR (TV_DSE2 , "dead store elim2")
> DEFTIMEVAR (TV_LOOP , "loop analysis")
> -DEFTIMEVAR (TV_GCSE , "global CSE")
> -DEFTIMEVAR (TV_CPROP1 , "CPROP 1")
> +DEFTIMEVAR (TV_CPROP , "CPROP")
> DEFTIMEVAR (TV_PRE , "PRE")
> DEFTIMEVAR (TV_HOIST , "code hoisting")
> -DEFTIMEVAR (TV_CPROP2 , "CPROP 2")
> DEFTIMEVAR (TV_LSM , "LSM")
> -DEFTIMEVAR (TV_BYPASS , "bypass jumps")
> DEFTIMEVAR (TV_TRACER , "tracer")
> DEFTIMEVAR (TV_WEB , "web")
> DEFTIMEVAR (TV_AUTO_INC_DEC , "auto inc dec")
> Index: gcse.c
> ===================================================================
> --- gcse.c (revision 146799)
> +++ gcse.c (working copy)
> @@ -280,13 +280,8 @@ along with GCC; see the file COPYING3.
>
> /* GCSE global vars. */
>
> -/* Note whether or not we should run jump optimization after gcse. We
> - want to do this for two cases.
> -
> - * If we changed any jumps via cprop.
> -
> - * If we added any labels via edge splitting. */
> -static int run_jump_opt_after_gcse;
> +/* Set to non-zero if CSE should run after all GCSE optimizations are done. */
> +int flag_rerun_cse_after_global_opts;
>
> /* An obstack for our working variables. */
> static struct obstack gcse_obstack;
> @@ -370,11 +365,6 @@ static struct hash_table expr_hash_table
> /* Copy propagation hash table. */
> static struct hash_table set_hash_table;
>
> -/* Maximum register number in function prior to doing gcse + 1.
> - Registers created during this pass have regno >= max_gcse_regno.
> - This is named with "gcse" to not collide with global of same name. */
> -static unsigned int max_gcse_regno;
> -
> /* This is a list of expressions which are MEMs and will be used by load
> or store motion.
> Load motion tracks MEMs which aren't killed by
> @@ -450,7 +440,6 @@ static int global_copy_prop_count;
> static sbitmap *ae_kill, *ae_gen;
>
> static void compute_can_copy (void);
> -static void recompute_all_luids (void);
> static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
> static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
> static void *gcse_alloc (unsigned long);
> @@ -502,11 +491,10 @@ static int cprop_jump (basic_block, rtx,
> static void mems_conflict_for_gcse_p (rtx, const_rtx, void *);
> static int load_killed_in_block_p (const_basic_block, int, const_rtx, int);
> static void canon_list_insert (rtx, const_rtx, void *);
> -static int cprop_insn (rtx, int);
> -static int cprop (int);
> +static int cprop_insn (rtx);
> static void find_implicit_sets (void);
> -static int one_cprop_pass (int, bool, bool);
> -static bool constprop_register (rtx, rtx, rtx, bool);
> +static int one_cprop_pass (void);
> +static bool constprop_register (rtx, rtx, rtx);
> static struct expr *find_bypass_set (int, int);
> static bool reg_killed_on_edge (const_rtx, const_edge);
> static int bypass_block (basic_block, rtx, rtx);
> @@ -521,14 +509,14 @@ static void pre_insert_copy_insn (struct
> static void pre_insert_copies (void);
> static int pre_delete (void);
> static int pre_gcse (void);
> -static int one_pre_gcse_pass (int);
> +static int one_pre_gcse_pass (void);
> static void add_label_notes (rtx, rtx);
> static void alloc_code_hoist_mem (int, int);
> static void free_code_hoist_mem (void);
> static void compute_code_hoist_vbeinout (void);
> static void compute_code_hoist_data (void);
> static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *);
> -static void hoist_code (void);
> +static int hoist_code (void);
> static int one_code_hoisting_pass (void);
> static rtx process_insert_insn (struct expr *);
> static int pre_edge_insert (struct edge_list *, struct expr **);
> @@ -566,14 +554,14 @@ static void remove_reachable_equiv_notes
> static void replace_store_insn (rtx, rtx, basic_block, struct ls_expr *);
> static void delete_store (struct ls_expr *, basic_block);
> static void free_store_memory (void);
> -static void store_motion (void);
> +static int one_store_motion_pass (void);
> static void free_insn_expr_list_list (rtx *);
> static void clear_modify_mem_tables (void);
> static void free_modify_mem_tables (void);
> static rtx gcse_emit_move_after (rtx, rtx, rtx);
> static void local_cprop_find_used_regs (rtx *, void *);
> -static bool do_local_cprop (rtx, rtx, bool);
> -static void local_cprop_pass (bool);
> +static bool do_local_cprop (rtx, rtx);
> +static int local_cprop_pass (void);
> static bool is_too_expensive (const char *);
>
> #define GNEW(T) ((T *) gmalloc (sizeof (T)))
> @@ -588,155 +576,6 @@ static bool is_too_expensive (const char
> #define GOBNEW(T) ((T *) gcse_alloc (sizeof (T)))
> #define GOBNEWVAR(T, S) ((T *) gcse_alloc ((S)))
>
> -
> -/* Entry point for global common subexpression elimination.
> - F is the first instruction in the function. Return nonzero if a
> - change is mode. */
> -
> -static int
> -gcse_main (rtx f ATTRIBUTE_UNUSED)
> -{
> - int changed;
> - /* Point to release obstack data from for each pass. */
> - char *gcse_obstack_bottom;
> -
> - /* We do not construct an accurate cfg in functions which call
> - setjmp, so just punt to be safe. */
> - if (cfun->calls_setjmp)
> - return 0;
> -
> - /* Assume that we do not need to run jump optimizations after gcse. */
> - run_jump_opt_after_gcse = 0;
> -
> - /* Identify the basic block information for this function, including
> - successors and predecessors. */
> - max_gcse_regno = max_reg_num ();
> -
> - df_note_add_problem ();
> - df_analyze ();
> -
> - if (dump_file)
> - dump_flow_info (dump_file, dump_flags);
> -
> - /* Return if there's nothing to do, or it is too expensive. */
> - if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
> - || is_too_expensive (_("GCSE disabled")))
> - return 0;
> -
> - gcc_obstack_init (&gcse_obstack);
> - bytes_used = 0;
> -
> - /* We need alias. */
> - init_alias_analysis ();
> -
> - gcse_obstack_bottom = GOBNEWVAR (char, 1);
> - changed = 0;
> -
> - if (dump_file)
> - fprintf (dump_file, "GCSE pass\n\n");
> -
> - max_gcse_regno = max_reg_num ();
> -
> - alloc_gcse_mem ();
> -
> - /* Don't allow constant propagation to modify jumps
> - during this pass. */
> - if (dbg_cnt (cprop1))
> - {
> - timevar_push (TV_CPROP1);
> - changed = one_cprop_pass (1, false, false);
> - if (changed)
> - recompute_all_luids ();
> - timevar_pop (TV_CPROP1);
> - }
> -
> - if (optimize_function_for_speed_p (cfun))
> - {
> - timevar_push (TV_PRE);
> - changed |= one_pre_gcse_pass (1);
> - /* We may have just created new basic blocks. Release and
> - recompute various things which are sized on the number of
> - basic blocks.
> - ??? There would be no need for this if we used a block
> - based Lazy Code Motion variant, with all (or selected)
> - edges split before running the pass. That would also
> - help find_implicit_sets for cprop. FIXME. */
> - if (changed)
> - {
> - free_modify_mem_tables ();
> - modify_mem_list = GCNEWVEC (rtx, last_basic_block);
> - canon_modify_mem_list = GCNEWVEC (rtx, last_basic_block);
> - }
> -
> - df_analyze ();
> - run_jump_opt_after_gcse = 1;
> - timevar_pop (TV_PRE);
> - }
> - else
> - {
> - /* This function is being optimized for code size.
> - It does not make sense to run code hoisting unless we are optimizing
> - for code size -- it rarely makes programs faster, and can make
> - them bigger if we did partial redundancy elimination (when optimizing
> - for space, we don't run the partial redundancy algorithms). */
> - timevar_push (TV_HOIST);
> - max_gcse_regno = max_reg_num ();
> - alloc_gcse_mem ();
> - one_code_hoisting_pass ();
> - timevar_pop (TV_HOIST);
> - }
> -
> - free_gcse_mem ();
> -
> - if (dump_file)
> - {
> - fprintf (dump_file, "\n");
> - fflush (dump_file);
> - }
> -
> - obstack_free (&gcse_obstack, gcse_obstack_bottom);
> -
> - /* Do the second const/copy propagation pass, including cprop into
> - conditional jumps. */
> - if (dbg_cnt (cprop2))
> - {
> - max_gcse_regno = max_reg_num ();
> - alloc_gcse_mem ();
> -
> - /* This time, go ahead and allow cprop to alter jumps. */
> - timevar_push (TV_CPROP2);
> - changed = one_cprop_pass (2, true, true);
> - if (changed)
> - recompute_all_luids ();
> - timevar_pop (TV_CPROP2);
> - free_gcse_mem ();
> - }
> -
> - if (dump_file)
> - {
> - fprintf (dump_file, "GCSE of %s: %d basic blocks, ",
> - current_function_name (), n_basic_blocks);
> - fprintf (dump_file, "pass 1, %d bytes\n\n", bytes_used);
> - }
> -
> - obstack_free (&gcse_obstack, NULL);
> -
> - /* We are finished with alias.
> - ??? Actually we recompute alias in store_motion. */
> - end_alias_analysis ();
> -
> - /* Run store motion. */
> - if (optimize_function_for_speed_p (cfun) && flag_gcse_sm)
> - {
> - timevar_push (TV_LSM);
> - store_motion ();
> - timevar_pop (TV_LSM);
> - }
> -
> - /* Record where pseudo-registers are set. */
> - return run_jump_opt_after_gcse;
> -}
> -
> /* Misc. utilities. */
>
> /* Nonzero for each mode that supports (set (reg) (reg)).
> @@ -790,19 +629,6 @@ can_copy_p (enum machine_mode mode)
> return can_copy[mode] != 0;
> }
>
> -/* Recompute the DF LUIDs for all basic blocks. If a sub-pass in this
> - file changes something, we have to recompute them for the next pass.
> - FIXME: If we would track which basic blocks we touch, we could
> - update LUIDs in only those basic blocks. */
> -
> -static void
> -recompute_all_luids (void)
> -{
> - basic_block bb;
> - FOR_EACH_BB (bb)
> - df_recompute_luids (bb);
> -}
> -
>
> /* Cover function to xmalloc to record bytes allocated. */
>
> @@ -1447,11 +1273,10 @@ insert_set_in_table (rtx x, rtx insn, st
> /* First occurrence of this expression in this basic block. */
> cur_occr = GOBNEW (struct occr);
> bytes_used += sizeof (struct occr);
> -
> - cur_occr->insn = insn;
> - cur_occr->next = cur_expr->avail_occr;
> - cur_occr->deleted_p = 0;
> - cur_expr->avail_occr = cur_occr;
> + cur_occr->insn = insn;
> + cur_occr->next = cur_expr->avail_occr;
> + cur_occr->deleted_p = 0;
> + cur_expr->avail_occr = cur_occr;
> }
> }
>
> @@ -1839,14 +1664,14 @@ record_last_set_info (rtx dest, const_rt
> static void
> compute_hash_table_work (struct hash_table *table)
> {
> - unsigned int i;
> + int i;
>
> /* re-Cache any INSN_LIST nodes we have allocated. */
> clear_modify_mem_tables ();
> /* Some working arrays used to track first and last set in each block. */
> - reg_avail_info = GNEWVEC (struct reg_avail_info, max_gcse_regno);
> + reg_avail_info = GNEWVEC (struct reg_avail_info, max_reg_num ());
>
> - for (i = 0; i < max_gcse_regno; ++i)
> + for (i = 0; i < max_reg_num (); ++i)
> reg_avail_info[i].last_bb = NULL;
>
> FOR_EACH_BB (current_bb)
> @@ -2631,8 +2456,6 @@ cprop_jump (basic_block bb, rtx setcc, r
> delete_insn (setcc);
> #endif
>
> - run_jump_opt_after_gcse = 1;
> -
> global_const_prop_count++;
> if (dump_file != NULL)
> {
> @@ -2666,14 +2489,13 @@ cprop_jump (basic_block bb, rtx setcc, r
> }
>
> static bool
> -constprop_register (rtx insn, rtx from, rtx to, bool alter_jumps)
> +constprop_register (rtx insn, rtx from, rtx to)
> {
> rtx sset;
>
> /* Check for reg or cc0 setting instructions followed by
> conditional branch instructions first. */
> - if (alter_jumps
> - && (sset = single_set (insn)) != NULL
> + if ((sset = single_set (insn)) != NULL
> && NEXT_INSN (insn)
> && any_condjump_p (NEXT_INSN (insn)) && onlyjump_p (NEXT_INSN (insn)))
> {
> @@ -2694,7 +2516,7 @@ constprop_register (rtx insn, rtx from,
>
> Right now the insn in question must look like
> (set (pc) (if_then_else ...)) */
> - else if (alter_jumps && any_condjump_p (insn) && onlyjump_p (insn))
> + else if (any_condjump_p (insn) && onlyjump_p (insn))
> return cprop_jump (BLOCK_FOR_INSN (insn), NULL, insn, from, to);
> return 0;
> }
> @@ -2703,7 +2525,7 @@ constprop_register (rtx insn, rtx from,
> The result is nonzero if a change was made. */
>
> static int
> -cprop_insn (rtx insn, int alter_jumps)
> +cprop_insn (rtx insn)
> {
> struct reg_use *reg_used;
> int changed = 0;
> @@ -2728,11 +2550,6 @@ cprop_insn (rtx insn, int alter_jumps)
> rtx pat, src;
> struct expr *set;
>
> - /* Ignore registers created by GCSE.
> - We do this because ... */
> - if (regno >= max_gcse_regno)
> - continue;
> -
> /* If the register has already been set in this block, there's
> nothing we can do. */
> if (! oprs_not_set_p (reg_used->reg_rtx, insn))
> @@ -2753,7 +2570,7 @@ cprop_insn (rtx insn, int alter_jumps)
> /* Constant propagation. */
> if (gcse_constant_p (src))
> {
> - if (constprop_register (insn, reg_used->reg_rtx, src, alter_jumps))
> + if (constprop_register (insn, reg_used->reg_rtx, src))
> {
> changed = 1;
> global_const_prop_count++;
> @@ -2840,11 +2657,10 @@ local_cprop_find_used_regs (rtx *xptr, v
> find_used_regs (xptr, data);
> }
>
> -/* Try to perform local const/copy propagation on X in INSN.
> - If ALTER_JUMPS is false, changing jump insns is not allowed. */
> +/* Try to perform local const/copy propagation on X in INSN. */
>
> static bool
> -do_local_cprop (rtx x, rtx insn, bool alter_jumps)
> +do_local_cprop (rtx x, rtx insn)
> {
> rtx newreg = NULL, newcnst = NULL;
>
> @@ -2877,7 +2693,7 @@ do_local_cprop (rtx x, rtx insn, bool al
> || ! MEM_P (XEXP (note, 0))))
> newreg = this_rtx;
> }
> - if (newcnst && constprop_register (insn, x, newcnst, alter_jumps))
> + if (newcnst && constprop_register (insn, x, newcnst))
> {
> if (dump_file != NULL)
> {
> @@ -2907,12 +2723,10 @@ do_local_cprop (rtx x, rtx insn, bool al
> return false;
> }
>
> -/* Do local const/copy propagation (i.e. within each basic block).
> - If ALTER_JUMPS is true, allow propagating into jump insns, which
> - could modify the CFG. */
> +/* Do local const/copy propagation (i.e. within each basic block). */
>
> -static void
> -local_cprop_pass (bool alter_jumps)
> +static int
> +local_cprop_pass (void)
> {
> basic_block bb;
> rtx insn;
> @@ -2938,7 +2752,7 @@ local_cprop_pass (bool alter_jumps)
> for (reg_used = &reg_use_table[0]; reg_use_count > 0;
> reg_used++, reg_use_count--)
> {
> - if (do_local_cprop (reg_used->reg_rtx, insn, alter_jumps))
> + if (do_local_cprop (reg_used->reg_rtx, insn))
> {
> changed = true;
> break;
> @@ -2958,52 +2772,6 @@ local_cprop_pass (bool alter_jumps)
>
> cselib_finish ();
>
> - /* Global analysis may get into infinite loops for unreachable blocks. */
> - if (changed && alter_jumps)
> - delete_unreachable_blocks ();
> -}
> -
> -/* Forward propagate copies. This includes copies and constants. Return
> - nonzero if a change was made. */
> -
> -static int
> -cprop (int alter_jumps)
> -{
> - int changed;
> - basic_block bb;
> - rtx insn;
> -
> - /* Note we start at block 1. */
> - if (ENTRY_BLOCK_PTR->next_bb == EXIT_BLOCK_PTR)
> - {
> - if (dump_file != NULL)
> - fprintf (dump_file, "\n");
> - return 0;
> - }
> -
> - changed = 0;
> - FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb->next_bb, EXIT_BLOCK_PTR, next_bb)
> - {
> - /* Reset tables used to keep track of what's still valid [since the
> - start of the block]. */
> - reset_opr_set_tables ();
> -
> - FOR_BB_INSNS (bb, insn)
> - if (INSN_P (insn))
> - {
> - changed |= cprop_insn (insn, alter_jumps);
> -
> - /* Keep track of everything modified by this insn. */
> - /* ??? Need to be careful w.r.t. mods done to INSN. Don't
> - call mark_oprs_set if we turned the insn into a NOTE. */
> - if (! NOTE_P (insn))
> - mark_oprs_set (insn);
> - }
> - }
> -
> - if (dump_file != NULL)
> - fprintf (dump_file, "\n");
> -
> return changed;
> }
>
> @@ -3060,7 +2828,12 @@ implicit_set_cond_p (const_rtx cond)
> following "if (x == 2)", the then branch may be optimized as though the
> conditional performed an "explicit set", in this example, "x = 2". This
> function records the set patterns that are implicit at the start of each
> - basic block. */
> + basic block.
> +
> + FIXME: This would be more effective if critical edges are pre-split. As
> + it is now, we can't record implicit sets for blocks that have
> + critical successor edges. This results in missed optimizations
> + and in more (unnecessary) work in cfgcleanup.c:thread_jump(). */
>
> static void
> find_implicit_sets (void)
> @@ -3085,7 +2858,9 @@ find_implicit_sets (void)
> dest = GET_CODE (cond) == EQ ? BRANCH_EDGE (bb)->dest
> : FALLTHRU_EDGE (bb)->dest;
>
> - if (dest && single_pred_p (dest)
> + if (dest
> + /* Record nothing for a critical edge. */
> + && single_pred_p (dest)
> && dest != EXIT_BLOCK_PTR)
> {
> new_rtx = gen_rtx_SET (VOIDmode, XEXP (cond, 0),
> @@ -3106,63 +2881,6 @@ find_implicit_sets (void)
> fprintf (dump_file, "Found %d implicit sets\n", count);
> }
>
> -/* Perform one copy/constant propagation pass.
> - PASS is the pass count. If CPROP_JUMPS is true, perform constant
> - propagation into conditional jumps. If BYPASS_JUMPS is true,
> - perform conditional jump bypassing optimizations. */
> -
> -static int
> -one_cprop_pass (int pass, bool cprop_jumps, bool bypass_jumps)
> -{
> - int changed = 0;
> -
> - global_const_prop_count = local_const_prop_count = 0;
> - global_copy_prop_count = local_copy_prop_count = 0;
> -
> - if (cprop_jumps)
> - local_cprop_pass (cprop_jumps);
> -
> - /* Determine implicit sets. */
> - implicit_sets = XCNEWVEC (rtx, last_basic_block);
> - find_implicit_sets ();
> -
> - alloc_hash_table (get_max_uid (), &set_hash_table, 1);
> - compute_hash_table (&set_hash_table);
> -
> - /* Free implicit_sets before peak usage. */
> - free (implicit_sets);
> - implicit_sets = NULL;
> -
> - if (dump_file)
> - dump_hash_table (dump_file, "SET", &set_hash_table);
> - if (set_hash_table.n_elems > 0)
> - {
> - alloc_cprop_mem (last_basic_block, set_hash_table.n_elems);
> - compute_cprop_data ();
> - changed = cprop (cprop_jumps);
> - if (bypass_jumps)
> - changed |= bypass_conditional_jumps ();
> - free_cprop_mem ();
> - }
> -
> - free_hash_table (&set_hash_table);
> -
> - if (dump_file)
> - {
> - fprintf (dump_file, "CPROP of %s, pass %d: %d bytes needed, ",
> - current_function_name (), pass, bytes_used);
> - fprintf (dump_file, "%d local const props, %d local copy props, ",
> - local_const_prop_count, local_copy_prop_count);
> - fprintf (dump_file, "%d global const props, %d global copy props\n\n",
> - global_const_prop_count, global_copy_prop_count);
> - }
> - /* Global analysis may get into infinite loops for unreachable blocks. */
> - if (changed && cprop_jumps)
> - delete_unreachable_blocks ();
> -
> - return changed;
> -}
> -
> /* Bypass conditional jumps. */
>
> /* The value of last_basic_block at the beginning of the jump_bypass
> @@ -3302,9 +3020,6 @@ bypass_block (basic_block bb, rtx setcc,
> struct expr *set;
> rtx src, new_rtx;
>
> - if (regno >= max_gcse_regno)
> - continue;
> -
> set = find_bypass_set (regno, e->src->index);
>
> if (! set)
> @@ -3888,7 +3603,7 @@ pre_edge_insert (struct edge_list *edge_
>
> if (dump_file)
> {
> - fprintf (dump_file, "PRE/HOIST: edge (%d,%d), ",
> + fprintf (dump_file, "PRE: edge (%d,%d), ",
> bb->index,
> INDEX_EDGE_SUCC_BB (edge_list, e)->index);
> fprintf (dump_file, "copy expression %d\n",
> @@ -4232,13 +3947,25 @@ pre_gcse (void)
> Return nonzero if a change was made. */
>
> static int
> -one_pre_gcse_pass (int pass)
> +one_pre_gcse_pass (void)
> {
> int changed = 0;
>
> gcse_subst_count = 0;
> gcse_create_count = 0;
>
> + /* Return if there's nothing to do, or it is too expensive. */
> + if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
> + || is_too_expensive (_("PRE disabled")))
> + return 0;
> +
> + /* We need alias. */
> + init_alias_analysis ();
> +
> + bytes_used = 0;
> + gcc_obstack_init (&gcse_obstack);
> + alloc_gcse_mem ();
> +
> alloc_hash_table (get_max_uid (), &expr_hash_table, 0);
> add_noreturn_fake_exit_edges ();
> if (flag_gcse_lm)
> @@ -4262,10 +3989,16 @@ one_pre_gcse_pass (int pass)
> remove_fake_exit_edges ();
> free_hash_table (&expr_hash_table);
>
> + free_gcse_mem ();
> + obstack_free (&gcse_obstack, NULL);
> +
> + /* We are finished with alias. */
> + end_alias_analysis ();
> +
> if (dump_file)
> {
> - fprintf (dump_file, "\nPRE GCSE of %s, pass %d: %d bytes needed, ",
> - current_function_name (), pass, bytes_used);
> + fprintf (dump_file, "PRE GCSE of %s, %d basic blocks, %d bytes needed, ",
> + current_function_name (), n_basic_blocks, bytes_used);
> fprintf (dump_file, "%d substs, %d insns created\n",
> gcse_subst_count, gcse_create_count);
> }
> @@ -4530,7 +4263,7 @@ hoist_expr_reaches_here_p (basic_block e
>
> /* Actually perform code hoisting. */
>
> -static void
> +static int
> hoist_code (void)
> {
> basic_block bb, dominated;
> @@ -4538,6 +4271,7 @@ hoist_code (void)
> unsigned int i,j;
> struct expr **index_map;
> struct expr *expr;
> + int changed = 0;
>
> sbitmap_vector_zero (hoist_exprs, last_basic_block);
>
> @@ -4669,6 +4403,9 @@ hoist_code (void)
> gcse_emit_move_after (expr->reaching_reg, SET_DEST (set), insn);
> delete_insn (insn);
> occr->deleted_p = 1;
> + changed = 1;
> + gcse_subst_count++;
> +
> if (!insn_inserted_p)
> {
> insert_insn_end_basic_block (index_map[i], bb, 0);
> @@ -4682,6 +4419,8 @@ hoist_code (void)
> }
>
> free (index_map);
> +
> + return changed;
> }
>
> /* Top level routine to perform one code hoisting (aka unification) pass
> @@ -4693,6 +4432,21 @@ one_code_hoisting_pass (void)
> {
> int changed = 0;
>
> + gcse_subst_count = 0;
> + gcse_create_count = 0;
> +
> + /* Return if there's nothing to do, or it is too expensive. */
> + if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
> + || is_too_expensive (_("GCSE disabled")))
> + return 0;
> +
> + /* We need alias. */
> + init_alias_analysis ();
> +
> + bytes_used = 0;
> + gcc_obstack_init (&gcse_obstack);
> + alloc_gcse_mem ();
> +
> alloc_hash_table (get_max_uid (), &expr_hash_table, 0);
> compute_hash_table (&expr_hash_table);
> if (dump_file)
> @@ -4702,11 +4456,24 @@ one_code_hoisting_pass (void)
> {
> alloc_code_hoist_mem (last_basic_block, expr_hash_table.n_elems);
> compute_code_hoist_data ();
> - hoist_code ();
> + changed = hoist_code ();
> free_code_hoist_mem ();
> }
>
> free_hash_table (&expr_hash_table);
> + free_gcse_mem ();
> + obstack_free (&gcse_obstack, NULL);
> +
> + /* We are finished with alias. */
> + end_alias_analysis ();
> +
> + if (dump_file)
> + {
> + fprintf (dump_file, "HOIST of %s, %d basic blocks, %d bytes needed, ",
> + current_function_name (), n_basic_blocks, bytes_used);
> + fprintf (dump_file, "%d substs, %d insns created\n",
> + gcse_subst_count, gcse_create_count);
> + }
>
> return changed;
> }
> @@ -5441,8 +5208,7 @@ compute_store_table (void)
> rtx insn, pat, tmp;
> int *last_set_in, *already_set;
> struct ls_expr * ptr, **prev_next_ptr_ptr;
> -
> - max_gcse_regno = max_reg_num ();
> + unsigned int max_gcse_regno = max_reg_num ();
>
> pre_ldst_mems = 0;
> pre_ldst_table = htab_create (13, pre_ldst_expr_hash,
> @@ -5762,6 +5528,7 @@ build_store_vectors (void)
> int *regs_set_in_block;
> rtx insn, st;
> struct ls_expr * ptr;
> + unsigned int max_gcse_regno = max_reg_num ();
>
> /* Build the gen_vector. This is any store in the table which is not killed
> by aliasing later in its block. */
> @@ -6060,7 +5827,7 @@ replace_store_insn (rtx reg, rtx del, ba
> fprintf (dump_file,
> "STORE_MOTION delete insn in BB %d:\n ", bb->index);
> print_inline_rtx (dump_file, del, 6);
> - fprintf (dump_file, "\nSTORE MOTION replaced with insn:\n ");
> + fprintf (dump_file, "\nSTORE_MOTION replaced with insn:\n ");
> print_inline_rtx (dump_file, insn, 6);
> fprintf (dump_file, "\n");
> }
> @@ -6142,21 +5909,19 @@ free_store_memory (void)
> }
>
> /* Perform store motion. Much like gcse, except we move expressions the
> - other way by looking at the flowgraph in reverse. */
> + other way by looking at the flowgraph in reverse.
> + Return non-zero if transformations are performed by the pass. */
>
> -static void
> -store_motion (void)
> +static int
> +one_store_motion_pass (void)
> {
> basic_block bb;
> int x;
> struct ls_expr * ptr;
> int update_flow = 0;
>
> - if (dump_file)
> - {
> - fprintf (dump_file, "before store motion\n");
> - print_rtl (dump_file, get_insns ());
> - }
> + gcse_subst_count = 0;
> + gcse_create_count = 0;
>
> init_alias_analysis ();
>
> @@ -6167,7 +5932,7 @@ store_motion (void)
> htab_delete (pre_ldst_table);
> pre_ldst_table = NULL;
> end_alias_analysis ();
> - return;
> + return 0;
> }
>
> /* Now compute kill & transp vectors. */
> @@ -6203,11 +5968,17 @@ store_motion (void)
>
> FOR_EACH_BB (bb)
> if (TEST_BIT (pre_delete_map[bb->index], ptr->index))
> - delete_store (ptr, bb);
> + {
> + delete_store (ptr, bb);
> + gcse_subst_count++;
> + }
>
> for (x = 0; x < NUM_EDGES (edge_list); x++)
> if (TEST_BIT (pre_insert_map[x], ptr->index))
> - update_flow |= insert_store (ptr, INDEX_EDGE (edge_list, x));
> + {
> + update_flow |= insert_store (ptr, INDEX_EDGE (edge_list, x));
> + gcse_create_count++;
> + }
> }
>
> if (update_flow)
> @@ -6217,59 +5988,19 @@ store_motion (void)
> free_edge_list (edge_list);
> remove_fake_exit_edges ();
> end_alias_analysis ();
> -}
> -
> -
> -/* Entry point for jump bypassing optimization pass. */
> -
> -static int
> -bypass_jumps (void)
> -{
> - int changed;
> -
> - /* We do not construct an accurate cfg in functions which call
> - setjmp, so just punt to be safe. */
> - if (cfun->calls_setjmp)
> - return 0;
> -
> - /* Identify the basic block information for this function, including
> - successors and predecessors. */
> - max_gcse_regno = max_reg_num ();
> -
> - if (dump_file)
> - dump_flow_info (dump_file, dump_flags);
> -
> - /* Return if there's nothing to do, or it is too expensive. */
> - if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
> - || is_too_expensive (_ ("jump bypassing disabled")))
> - return 0;
> -
> - gcc_obstack_init (&gcse_obstack);
> - bytes_used = 0;
> -
> - /* We need alias. */
> - init_alias_analysis ();
> -
> - max_gcse_regno = max_reg_num ();
> - alloc_gcse_mem ();
> - changed = one_cprop_pass (3, true, true);
> - free_gcse_mem ();
>
> if (dump_file)
> {
> - fprintf (dump_file, "BYPASS of %s: %d basic blocks, ",
> + fprintf (dump_file, "STORE_MOTION of %s, %d basic blocks, ",
> current_function_name (), n_basic_blocks);
> - fprintf (dump_file, "%d bytes\n\n", bytes_used);
> + fprintf (dump_file, "%d substs, %d insns created\n",
> + gcse_subst_count, gcse_create_count);
> }
>
> - obstack_free (&gcse_obstack, NULL);
> -
> - /* We are finished with alias. */
> - end_alias_analysis ();
> -
> - return changed;
> + return (gcse_subst_count > 0 || gcse_create_count > 0);
> }
>
> +
> /* Return true if the graph is too expensive to optimize. PASS is the
> optimization about to be performed. */
>
> @@ -6309,110 +6040,271 @@ is_too_expensive (const char *pass)
>
> return false;
> }
> +
> +
> +/* Main function for the CPROP pass. */
> +
> +static int
> +one_cprop_pass (void)
> +{
> + int changed = 0;
> +
> + /* Return if there's nothing to do, or it is too expensive. */
> + if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
> + || is_too_expensive (_ ("const/copy propagation disabled")))
> + return 0;
> +
> + global_const_prop_count = local_const_prop_count = 0;
> + global_copy_prop_count = local_copy_prop_count = 0;
> +
> + bytes_used = 0;
> + gcc_obstack_init (&gcse_obstack);
> + alloc_gcse_mem ();
> +
> + /* Do a local const/copy propagation pass first. The global pass
> + only handles global opportunities.
> + If the local pass changes something, remove any unreachable blocks
> + because the CPROP global dataflow analysis may get into infinite
> + loops for CFGs with unreachable blocks.
> +
> + FIXME: This local pass should not be necessary after CSE (but for
> + some reason it still is). It is also (proven) not necessary
> + to run the local pass right after FWPROP.
> +
> + FIXME: The global analysis would not get into infinite loops if it
> + would use the DF solver (via df_simple_dataflow) instead of
> + the solver implemented in this file. */
> + if (local_cprop_pass ())
> + {
> + delete_unreachable_blocks ();
> + df_analyze ();
> + }
> +
> + /* Determine implicit sets. */
> + implicit_sets = XCNEWVEC (rtx, last_basic_block);
> + find_implicit_sets ();
> +
> + alloc_hash_table (get_max_uid (), &set_hash_table, 1);
> + compute_hash_table (&set_hash_table);
> +
> + /* Free implicit_sets before peak usage. */
> + free (implicit_sets);
> + implicit_sets = NULL;
> +
> + if (dump_file)
> + dump_hash_table (dump_file, "SET", &set_hash_table);
> + if (set_hash_table.n_elems > 0)
> + {
> + basic_block bb;
> + rtx insn;
> +
> + alloc_cprop_mem (last_basic_block, set_hash_table.n_elems);
> + compute_cprop_data ();
> +
> + FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb->next_bb, EXIT_BLOCK_PTR, next_bb)
> + {
> + /* Reset tables used to keep track of what's still valid [since
> + the start of the block]. */
> + reset_opr_set_tables ();
> +
> + FOR_BB_INSNS (bb, insn)
> + if (INSN_P (insn))
> + {
> + changed |= cprop_insn (insn);
> +
> + /* Keep track of everything modified by this insn. */
> + /* ??? Need to be careful w.r.t. mods done to INSN.
> + Don't call mark_oprs_set if we turned the
> + insn into a NOTE. */
> + if (! NOTE_P (insn))
> + mark_oprs_set (insn);
> + }
> + }
> +
> + changed |= bypass_conditional_jumps ();
> + free_cprop_mem ();
> + }
> +
> + free_hash_table (&set_hash_table);
> + free_gcse_mem ();
> + obstack_free (&gcse_obstack, NULL);
> +
> + if (dump_file)
> + {
> + fprintf (dump_file, "CPROP of %s, %d basic blocks, %d bytes needed, ",
> + current_function_name (), n_basic_blocks, bytes_used);
> + fprintf (dump_file, "%d local const props, %d local copy props, ",
> + local_const_prop_count, local_copy_prop_count);
> + fprintf (dump_file, "%d global const props, %d global copy props\n\n",
> + global_const_prop_count, global_copy_prop_count);
> + }
> +
> + return changed;
> +}
> +
>
> +/* All the passes implemented in this file. Each pass has its
> + own gate and execute function, and at the end of the file a
> + pass definition for passes.c.
> +
> + We do not construct an accurate cfg in functions which call
> + setjmp, so none of these passes runs if the function calls
> + setjmp.
> + FIXME: Should just handle setjmp via REG_SETJMP notes. */
> +
> static bool
> -gate_handle_jump_bypass (void)
> +gate_rtl_cprop (void)
> {
> return optimize > 0 && flag_gcse
> - && dbg_cnt (jump_bypass);
> + && !cfun->calls_setjmp
> + && dbg_cnt (cprop);
> }
>
> -/* Perform jump bypassing and control flow optimizations. */
> static unsigned int
> -rest_of_handle_jump_bypass (void)
> +execute_rtl_cprop (void)
> {
> delete_unreachable_blocks ();
> - if (bypass_jumps ())
> - {
> - delete_trivially_dead_insns (get_insns (), max_reg_num ());
> - rebuild_jump_labels (get_insns ());
> - cleanup_cfg (0);
> - }
> + df_note_add_problem ();
> + df_set_flags (DF_LR_RUN_DCE);
> + df_analyze ();
> + flag_rerun_cse_after_global_opts |= one_cprop_pass ();
> + return 0;
> +}
> +
> +static bool
> +gate_rtl_pre (void)
> +{
> + return optimize > 0 && flag_gcse
> + && !cfun->calls_setjmp
> + && optimize_function_for_speed_p (cfun)
> + && dbg_cnt (pre);
> +}
> +
> +static unsigned int
> +execute_rtl_pre (void)
> +{
> + delete_unreachable_blocks ();
> + df_note_add_problem ();
> + df_analyze ();
> + flag_rerun_cse_after_global_opts |= one_pre_gcse_pass ();
> + return 0;
> +}
> +
> +static bool
> +gate_rtl_hoist (void)
> +{
> + return optimize > 0 && flag_gcse
> + && !cfun->calls_setjmp
> + /* It does not make sense to run code hoisting unless we are optimizing
> + for code size -- it rarely makes programs faster, and can make them
> + bigger if we did PRE (when optimizing for space, we don't run PRE). */
> + && optimize_function_for_size_p (cfun)
> + && dbg_cnt (hoist);
> +}
> +
> +static unsigned int
> +execute_rtl_hoist (void)
> +{
> + delete_unreachable_blocks ();
> + df_note_add_problem ();
> + df_analyze ();
> + flag_rerun_cse_after_global_opts |= one_code_hoisting_pass ();
> + return 0;
> +}
> +
> +static bool
> +gate_rtl_store_motion (void)
> +{
> + return optimize > 0 && flag_gcse_sm
> + && !cfun->calls_setjmp
> + && optimize_function_for_speed_p (cfun)
> + && dbg_cnt (store_motion);
> +}
> +
> +static unsigned int
> +execute_rtl_store_motion (void)
> +{
> + delete_unreachable_blocks ();
> + df_note_add_problem ();
> + df_analyze ();
> + flag_rerun_cse_after_global_opts |= one_store_motion_pass ();
> return 0;
> }
>
> -struct rtl_opt_pass pass_jump_bypass =
> +struct rtl_opt_pass pass_rtl_cprop =
> {
> {
> RTL_PASS,
> - "bypass", /* name */
> - gate_handle_jump_bypass, /* gate */
> - rest_of_handle_jump_bypass, /* execute */
> + "cprop", /* name */
> + gate_rtl_cprop, /* gate */
> + execute_rtl_cprop, /* execute */
> NULL, /* sub */
> NULL, /* next */
> 0, /* static_pass_number */
> - TV_BYPASS, /* tv_id */
> + TV_CPROP, /* tv_id */
> PROP_cfglayout, /* properties_required */
> 0, /* properties_provided */
> 0, /* properties_destroyed */
> 0, /* todo_flags_start */
> + TODO_df_finish | TODO_verify_rtl_sharing |
> TODO_dump_func |
> - TODO_ggc_collect | TODO_verify_flow /* todo_flags_finish */
> + TODO_verify_flow | TODO_ggc_collect /* todo_flags_finish */
> }
> };
>
> -
> -static bool
> -gate_handle_gcse (void)
> +struct rtl_opt_pass pass_rtl_pre =
> {
> - return optimize > 0 && flag_gcse
> - && dbg_cnt (gcse);
> -}
> -
> + {
> + RTL_PASS,
> + "pre", /* name */
> + gate_rtl_pre, /* gate */
> + execute_rtl_pre, /* execute */
> + NULL, /* sub */
> + NULL, /* next */
> + 0, /* static_pass_number */
> + TV_PRE, /* tv_id */
> + PROP_cfglayout, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_df_finish | TODO_verify_rtl_sharing |
> + TODO_dump_func |
> + TODO_verify_flow | TODO_ggc_collect /* todo_flags_finish */
> + }
> +};
>
> -static unsigned int
> -rest_of_handle_gcse (void)
> +struct rtl_opt_pass pass_rtl_hoist =
> {
> - int save_csb, save_cfj;
> - int tem2 = 0, tem;
> - tem = gcse_main (get_insns ());
> - delete_trivially_dead_insns (get_insns (), max_reg_num ());
> - rebuild_jump_labels (get_insns ());
> - save_csb = flag_cse_skip_blocks;
> - save_cfj = flag_cse_follow_jumps;
> - flag_cse_skip_blocks = flag_cse_follow_jumps = 0;
> -
> - /* If -fexpensive-optimizations, re-run CSE to clean up things done
> - by gcse. */
> - if (flag_expensive_optimizations)
> - {
> - timevar_push (TV_CSE);
> - tem2 = cse_main (get_insns (), max_reg_num ());
> - df_finish_pass (false);
> - purge_all_dead_edges ();
> - delete_trivially_dead_insns (get_insns (), max_reg_num ());
> - timevar_pop (TV_CSE);
> - cse_not_expected = !flag_rerun_cse_after_loop;
> - }
> -
> - /* If gcse or cse altered any jumps, rerun jump optimizations to clean
> - things up. */
> - if (tem || tem2 == 2)
> - {
> - timevar_push (TV_JUMP);
> - rebuild_jump_labels (get_insns ());
> - cleanup_cfg (0);
> - timevar_pop (TV_JUMP);
> - }
> - else if (tem2 == 1)
> - cleanup_cfg (0);
> -
> - flag_cse_skip_blocks = save_csb;
> - flag_cse_follow_jumps = save_cfj;
> - return 0;
> -}
> + {
> + RTL_PASS,
> + "hoist", /* name */
> + gate_rtl_hoist, /* gate */
> + execute_rtl_hoist, /* execute */
> + NULL, /* sub */
> + NULL, /* next */
> + 0, /* static_pass_number */
> + TV_HOIST, /* tv_id */
> + PROP_cfglayout, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_df_finish | TODO_verify_rtl_sharing |
> + TODO_dump_func |
> + TODO_verify_flow | TODO_ggc_collect /* todo_flags_finish */
> + }
> +};
>
> -struct rtl_opt_pass pass_gcse =
> +struct rtl_opt_pass pass_rtl_store_motion =
> {
> {
> RTL_PASS,
> - "gcse1", /* name */
> - gate_handle_gcse, /* gate */
> - rest_of_handle_gcse, /* execute */
> + "store_motion", /* name */
> + gate_rtl_store_motion, /* gate */
> + execute_rtl_store_motion, /* execute */
> NULL, /* sub */
> NULL, /* next */
> 0, /* static_pass_number */
> - TV_GCSE, /* tv_id */
> + TV_LSM, /* tv_id */
> PROP_cfglayout, /* properties_required */
> 0, /* properties_provided */
> 0, /* properties_destroyed */
> @@ -6423,5 +6315,4 @@ struct rtl_opt_pass pass_gcse =
> }
> };
>
> -
> #include "gt-gcse.h"
> Index: common.opt
> ===================================================================
> --- common.opt (revision 146798)
> +++ common.opt (working copy)
> @@ -397,8 +397,8 @@ Common Report Var(flag_cse_follow_jumps)
> When running CSE, follow jumps to their targets
>
> fcse-skip-blocks
> -Common Report Var(flag_cse_skip_blocks) Optimization
> -When running CSE, follow conditional jumps
> +Common
> +Does nothing. Preserved for backward compatibility.
>
> fcx-limited-range
> Common Report Var(flag_cx_limited_range) Optimization
> Index: passes.c
> ===================================================================
> --- passes.c (revision 146798)
> +++ passes.c (working copy)
> @@ -733,7 +733,12 @@ init_optimization_passes (void)
> NEXT_PASS (pass_df_initialize_opt);
> NEXT_PASS (pass_cse);
> NEXT_PASS (pass_rtl_fwprop);
> - NEXT_PASS (pass_gcse);
> + NEXT_PASS (pass_rtl_cprop);
> + NEXT_PASS (pass_rtl_pre);
> + NEXT_PASS (pass_rtl_hoist);
> + NEXT_PASS (pass_rtl_cprop);
> + NEXT_PASS (pass_rtl_store_motion);
> + NEXT_PASS (pass_cse_after_global_opts);
> NEXT_PASS (pass_rtl_ifcvt);
> /* Perform loop optimizations. It might be better to do them a bit
> sooner, but we want the profile feedback to work more
> @@ -750,7 +755,7 @@ init_optimization_passes (void)
> *p = NULL;
> }
> NEXT_PASS (pass_web);
> - NEXT_PASS (pass_jump_bypass);
> + NEXT_PASS (pass_rtl_cprop);
> NEXT_PASS (pass_cse2);
> NEXT_PASS (pass_rtl_dse1);
> NEXT_PASS (pass_rtl_fwprop_addr);
>
* Re: [PATCH][RFA] Split passes from gcse_main into separate passes
2009-04-26 21:27 Steven Bosscher
@ 2009-04-26 21:37 ` Steven Bosscher
2009-04-27 8:45 ` Richard Guenther
2010-12-03 15:05 ` H.J. Lu
2 siblings, 0 replies; 6+ messages in thread
From: Steven Bosscher @ 2009-04-26 21:37 UTC (permalink / raw)
To: GCC Patches; +Cc: Richard Guenther
On Sun, Apr 26, 2009 at 11:23 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
> ia64-unknown-linux-gnu with store motion enabled.
Not -m32/-m64 for ia64, of course...
Copy&paste, the enemy of quality.
Ciao!
Steven
* [PATCH][RFA] Split passes from gcse_main into separate passes
@ 2009-04-26 21:27 Steven Bosscher
2009-04-26 21:37 ` Steven Bosscher
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Steven Bosscher @ 2009-04-26 21:27 UTC (permalink / raw)
To: GCC Patches; +Cc: Richard Guenther
Hi,
This is the next step in the big gcse cleanup. This patch splits the
uber-pass gcse_main() into individual passes: CPROP, PRE, HOIST,
STORE_MOTION, and local CSE.
After this split, we no longer have a reason to *not* alter jumps or
bypass jumps in CPROP. Therefore, the separate jump bypassing pass is
removed and CPROP now always alters jumps and bypasses jumps (this is
extremely cheap anyway).
Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
x86_64-unknown-linux-gnu with store motion enabled.
Bootstrapped&tested (-m32/-m64) with c,c++,fortran on
ia64-unknown-linux-gnu with store motion enabled.
Bootstrap&test of all languages ongoing on ia64-unknown-linux-gnu.
OK for trunk if everything passes?
Ciao!
Steven
* dbgcnt.def (cprop1, cprop2, gcse, jump_bypass): Remove
(cprop, hoist, pre, store_motion): New debug counters.
* tree-pass.h (pass_tracer): Move to list of gimple passes, it
is not an RTL pass anymore.
(pass_profiling): Remove extern decl for pass removed in 2005.
(pass_gcse, pass_jump_bypass): Remove.
* final.c (rest_of_clean_state): Set flag_rerun_cse_after_global_opts
to 0 for clean state.
* toplev.h (flag_rerun_cse_after_global_opts): Add extern declaration.
* cse.c (gate_handle_cse_after_global_opts,
rest_of_handle_cse_after_global_opts): New functions.
(pass_cse_after_global_opts): New pass, does local CSE.
* timevar.def (TV_GCSE, TV_CPROP1, TV_CPROP2, TV_BYPASS): Remove.
(TV_CPROP): New timevar.
* gcse.c (flag_rerun_cse_after_global_opts): New global variable.
(run_jump_opt_after_gcse, max_gcse_regno): Remove global vars.
(gcse_main, recompute_all_luids): Remove.
(compute_hash_table_work): Call max_reg_num instead of reading
max_gcse_regno.
(cprop_jump): Don't set run_jump_opt_after_gcse.
(constprop_register): Always allow to alter jumps.
(cprop_insn): Likewise.
(do_local_cprop): Likewise.
(local_cprop_pass): Likewise. Return non-zero if something changed.
(cprop): Remove function, fold interesting bits into one_cprop_pass.
(find_implicit_sets): Add note about missed optimization opportunity.
(one_cprop_pass): Rewrite to be "the" CPROP pass, called from the
pass_rtl_cprop execute function.
Don't bother tracking the pass number; each pass gets its own dump file
now anyway.
Always allow to alter jumps and bypass jumps.
(bypass_block): Don't ignore regno >= max_gcse_regno, find_bypass_set
will just find no suitable set.
(pre_edge_insert): Fix dumping, this function is for PRE only.
(one_pre_gcse_pass): Rewrite to be "the" PRE pass, called from the
pass_rtl_pre execute function.
(hoist_code): Return non-zero if something changed. Keep track of
substitutions and insertions for statistics gathering similar to PRE.
(one_code_hoisting_pass): Rewrite to be "the" code hoisting pass,
called from the pass_rtl_hoist execute function. Show pass statistics.
(compute_store_table): Use max_reg_num directly instead of using the
formerly global max_gcse_regno.
(build_store_vectors): Likewise.
(replace_store_insn): Fix dumping.
(store_motion): Rename to ...
(one_store_motion_pass): ... this. Rewrite to be "the" STORE_MOTION
pass, called from the pass_rtl_store_motion execute function. Keep
track of substitutions and insertions for statistics gathering similar
to PRE.
(bypass_jumps): Remove, fold interesting bits into ...
(one_cprop_pass): ... this. Rewrite to be "the" CPROP pass, called
from the pass_rtl_cprop execute function.
(gate_handle_jump_bypass, rest_of_handle_jump_bypass,
pass_jump_bypass): Remove.
(gate_handle_gcse, rest_of_handle_gcse): Remove.
(gate_rtl_cprop, execute_rtl_cprop, pass_rtl_cprop): New.
(gate_rtl_pre, execute_rtl_pre, pass_rtl_pre): New.
(gate_rtl_hoist, execute_rtl_hoist, pass_rtl_hoist): New.
(gate_rtl_store_motion, execute_rtl_store_motion,
pass_rtl_store_motion): New.
* common.opt: Remove flag_cse_skip_blocks, adjust documentation to
make it clear that -fcse-skip-blocks is a no-op for backward compat.
* passes.c (init_optimization_passes): Remove pass_gcse and
pass_jump_bypass. Schedule cprop, pre, hoist, cprop, store_motion,
and cse_after_global_opts in place of pass_gcse. Schedule cprop
instead of pass_jump_bypass.
Index: dbgcnt.def
===================================================================
--- dbgcnt.def (revision 146798)
+++ dbgcnt.def (working copy)
@@ -145,8 +145,7 @@ DEBUG_COUNTER (auto_inc_dec)
DEBUG_COUNTER (ccp)
DEBUG_COUNTER (cfg_cleanup)
DEBUG_COUNTER (cse2_move2add)
-DEBUG_COUNTER (cprop1)
-DEBUG_COUNTER (cprop2)
+DEBUG_COUNTER (cprop)
DEBUG_COUNTER (dce)
DEBUG_COUNTER (dce_fast)
DEBUG_COUNTER (dce_ud)
@@ -155,17 +154,17 @@ DEBUG_COUNTER (df_byte_scan)
DEBUG_COUNTER (dse)
DEBUG_COUNTER (dse1)
DEBUG_COUNTER (dse2)
-DEBUG_COUNTER (gcse)
DEBUG_COUNTER (gcse2_delete)
DEBUG_COUNTER (global_alloc_at_func)
DEBUG_COUNTER (global_alloc_at_reg)
+DEBUG_COUNTER (hoist)
DEBUG_COUNTER (ia64_sched2)
DEBUG_COUNTER (if_conversion)
DEBUG_COUNTER (if_after_combine)
DEBUG_COUNTER (if_after_reload)
-DEBUG_COUNTER (jump_bypass)
DEBUG_COUNTER (local_alloc_for_sched)
DEBUG_COUNTER (postreload_cse)
+DEBUG_COUNTER (pre)
DEBUG_COUNTER (pre_insn)
DEBUG_COUNTER (treepre_insert)
DEBUG_COUNTER (sched2_func)
@@ -177,5 +176,6 @@ DEBUG_COUNTER (sel_sched_cnt)
DEBUG_COUNTER (sel_sched_region_cnt)
DEBUG_COUNTER (sel_sched_insn_cnt)
DEBUG_COUNTER (sms_sched_loop)
+DEBUG_COUNTER (store_motion)
DEBUG_COUNTER (split_for_sched2)
DEBUG_COUNTER (tail_call)
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 146798)
+++ tree-pass.h (working copy)
@@ -396,6 +396,7 @@ extern struct gimple_opt_pass pass_rebui
extern struct gimple_opt_pass pass_remove_cgraph_callee_edges;
extern struct gimple_opt_pass pass_build_cgraph_edges;
extern struct gimple_opt_pass pass_local_pure_const;
+extern struct gimple_opt_pass pass_tracer;
/* IPA Passes */
extern struct ipa_opt_pass pass_ipa_inline;
@@ -437,11 +438,12 @@ extern struct rtl_opt_pass pass_rtl_dce;
extern struct rtl_opt_pass pass_rtl_dse1;
extern struct rtl_opt_pass pass_rtl_dse2;
extern struct rtl_opt_pass pass_rtl_dse3;
-extern struct rtl_opt_pass pass_gcse;
-extern struct rtl_opt_pass pass_jump_bypass;
-extern struct rtl_opt_pass pass_profiling;
+extern struct rtl_opt_pass pass_rtl_cprop;
+extern struct rtl_opt_pass pass_rtl_pre;
+extern struct rtl_opt_pass pass_rtl_hoist;
+extern struct rtl_opt_pass pass_rtl_store_motion;
+extern struct rtl_opt_pass pass_cse_after_global_opts;
extern struct rtl_opt_pass pass_rtl_ifcvt;
-extern struct gimple_opt_pass pass_tracer;
extern struct rtl_opt_pass pass_into_cfg_layout_mode;
extern struct rtl_opt_pass pass_outof_cfg_layout_mode;
Index: final.c
===================================================================
--- final.c (revision 146798)
+++ final.c (working copy)
@@ -4298,6 +4298,7 @@ rest_of_clean_state (void)
sdbout_types (NULL_TREE);
#endif
+ flag_rerun_cse_after_global_opts = 0;
reload_completed = 0;
epilogue_completed = 0;
#ifdef STACK_REGS
Index: toplev.h
===================================================================
--- toplev.h (revision 146798)
+++ toplev.h (working copy)
@@ -132,6 +132,7 @@ extern int flag_if_conversion;
extern int flag_if_conversion2;
extern int flag_keep_static_consts;
extern int flag_peel_loops;
+extern int flag_rerun_cse_after_global_opts;
extern int flag_rerun_cse_after_loop;
extern int flag_thread_jumps;
extern int flag_tracer;
Index: cse.c
===================================================================
--- cse.c (revision 146798)
+++ cse.c (working copy)
@@ -6997,3 +6997,65 @@ struct rtl_opt_pass pass_cse2 =
TODO_verify_flow /* todo_flags_finish */
}
};
+
+static bool
+gate_handle_cse_after_global_opts (void)
+{
+ return optimize > 0 && flag_rerun_cse_after_global_opts;
+}
+
+/* Run second CSE pass after loop optimizations. */
+static unsigned int
+rest_of_handle_cse_after_global_opts (void)
+{
+ int save_cfj;
+ int tem;
+
+ /* We only want to do local CSE, so don't follow jumps. */
+ save_cfj = flag_cse_follow_jumps;
+ flag_cse_follow_jumps = 0;
+
+ rebuild_jump_labels (get_insns ());
+ tem = cse_main (get_insns (), max_reg_num ());
+ purge_all_dead_edges ();
+ delete_trivially_dead_insns (get_insns (), max_reg_num ());
+
+ cse_not_expected = !flag_rerun_cse_after_loop;
+
+ /* If cse altered any jumps, rerun jump opts to clean things up. */
+ if (tem == 2)
+ {
+ timevar_push (TV_JUMP);
+ rebuild_jump_labels (get_insns ());
+ cleanup_cfg (0);
+ timevar_pop (TV_JUMP);
+ }
+ else if (tem == 1)
+ cleanup_cfg (0);
+
+ flag_cse_follow_jumps = save_cfj;
+ return 0;
+}
+
+struct rtl_opt_pass pass_cse_after_global_opts =
+{
+ {
+ RTL_PASS,
+ "cse_local", /* name */
+ gate_handle_cse_after_global_opts, /* gate */
+ rest_of_handle_cse_after_global_opts, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_CSE, /* tv_id */
+ 0, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_df_finish | TODO_verify_rtl_sharing |
+ TODO_dump_func |
+ TODO_ggc_collect |
+ TODO_verify_flow /* todo_flags_finish */
+ }
+};
+
Index: timevar.def
===================================================================
--- timevar.def (revision 146798)
+++ timevar.def (working copy)
@@ -154,13 +154,10 @@ DEFTIMEVAR (TV_DCE , "
DEFTIMEVAR (TV_DSE1 , "dead store elim1")
DEFTIMEVAR (TV_DSE2 , "dead store elim2")
DEFTIMEVAR (TV_LOOP , "loop analysis")
-DEFTIMEVAR (TV_GCSE , "global CSE")
-DEFTIMEVAR (TV_CPROP1 , "CPROP 1")
+DEFTIMEVAR (TV_CPROP , "CPROP")
DEFTIMEVAR (TV_PRE , "PRE")
DEFTIMEVAR (TV_HOIST , "code hoisting")
-DEFTIMEVAR (TV_CPROP2 , "CPROP 2")
DEFTIMEVAR (TV_LSM , "LSM")
-DEFTIMEVAR (TV_BYPASS , "bypass jumps")
DEFTIMEVAR (TV_TRACER , "tracer")
DEFTIMEVAR (TV_WEB , "web")
DEFTIMEVAR (TV_AUTO_INC_DEC , "auto inc dec")
Index: gcse.c
===================================================================
--- gcse.c (revision 146799)
+++ gcse.c (working copy)
@@ -280,13 +280,8 @@ along with GCC; see the file COPYING3.
\f
/* GCSE global vars. */
-/* Note whether or not we should run jump optimization after gcse. We
- want to do this for two cases.
-
- * If we changed any jumps via cprop.
-
- * If we added any labels via edge splitting. */
-static int run_jump_opt_after_gcse;
+/* Set to non-zero if CSE should run after all GCSE optimizations are done. */
+int flag_rerun_cse_after_global_opts;
/* An obstack for our working variables. */
static struct obstack gcse_obstack;
@@ -370,11 +365,6 @@ static struct hash_table expr_hash_table
/* Copy propagation hash table. */
static struct hash_table set_hash_table;
-/* Maximum register number in function prior to doing gcse + 1.
- Registers created during this pass have regno >= max_gcse_regno.
- This is named with "gcse" to not collide with global of same name. */
-static unsigned int max_gcse_regno;
-
/* This is a list of expressions which are MEMs and will be used by load
or store motion.
Load motion tracks MEMs which aren't killed by
@@ -450,7 +440,6 @@ static int global_copy_prop_count;
static sbitmap *ae_kill, *ae_gen;
\f
static void compute_can_copy (void);
-static void recompute_all_luids (void);
static void *gmalloc (size_t) ATTRIBUTE_MALLOC;
static void *gcalloc (size_t, size_t) ATTRIBUTE_MALLOC;
static void *gcse_alloc (unsigned long);
@@ -502,11 +491,10 @@ static int cprop_jump (basic_block, rtx,
static void mems_conflict_for_gcse_p (rtx, const_rtx, void *);
static int load_killed_in_block_p (const_basic_block, int, const_rtx, int);
static void canon_list_insert (rtx, const_rtx, void *);
-static int cprop_insn (rtx, int);
-static int cprop (int);
+static int cprop_insn (rtx);
static void find_implicit_sets (void);
-static int one_cprop_pass (int, bool, bool);
-static bool constprop_register (rtx, rtx, rtx, bool);
+static int one_cprop_pass (void);
+static bool constprop_register (rtx, rtx, rtx);
static struct expr *find_bypass_set (int, int);
static bool reg_killed_on_edge (const_rtx, const_edge);
static int bypass_block (basic_block, rtx, rtx);
@@ -521,14 +509,14 @@ static void pre_insert_copy_insn (struct
static void pre_insert_copies (void);
static int pre_delete (void);
static int pre_gcse (void);
-static int one_pre_gcse_pass (int);
+static int one_pre_gcse_pass (void);
static void add_label_notes (rtx, rtx);
static void alloc_code_hoist_mem (int, int);
static void free_code_hoist_mem (void);
static void compute_code_hoist_vbeinout (void);
static void compute_code_hoist_data (void);
static int hoist_expr_reaches_here_p (basic_block, int, basic_block, char *);
-static void hoist_code (void);
+static int hoist_code (void);
static int one_code_hoisting_pass (void);
static rtx process_insert_insn (struct expr *);
static int pre_edge_insert (struct edge_list *, struct expr **);
@@ -566,14 +554,14 @@ static void remove_reachable_equiv_notes
static void replace_store_insn (rtx, rtx, basic_block, struct ls_expr *);
static void delete_store (struct ls_expr *, basic_block);
static void free_store_memory (void);
-static void store_motion (void);
+static int one_store_motion_pass (void);
static void free_insn_expr_list_list (rtx *);
static void clear_modify_mem_tables (void);
static void free_modify_mem_tables (void);
static rtx gcse_emit_move_after (rtx, rtx, rtx);
static void local_cprop_find_used_regs (rtx *, void *);
-static bool do_local_cprop (rtx, rtx, bool);
-static void local_cprop_pass (bool);
+static bool do_local_cprop (rtx, rtx);
+static int local_cprop_pass (void);
static bool is_too_expensive (const char *);
#define GNEW(T) ((T *) gmalloc (sizeof (T)))
@@ -588,155 +576,6 @@ static bool is_too_expensive (const char
#define GOBNEW(T) ((T *) gcse_alloc (sizeof (T)))
#define GOBNEWVAR(T, S) ((T *) gcse_alloc ((S)))
\f
-
-/* Entry point for global common subexpression elimination.
- F is the first instruction in the function. Return nonzero if a
- change is mode. */
-
-static int
-gcse_main (rtx f ATTRIBUTE_UNUSED)
-{
- int changed;
- /* Point to release obstack data from for each pass. */
- char *gcse_obstack_bottom;
-
- /* We do not construct an accurate cfg in functions which call
- setjmp, so just punt to be safe. */
- if (cfun->calls_setjmp)
- return 0;
-
- /* Assume that we do not need to run jump optimizations after gcse. */
- run_jump_opt_after_gcse = 0;
-
- /* Identify the basic block information for this function, including
- successors and predecessors. */
- max_gcse_regno = max_reg_num ();
-
- df_note_add_problem ();
- df_analyze ();
-
- if (dump_file)
- dump_flow_info (dump_file, dump_flags);
-
- /* Return if there's nothing to do, or it is too expensive. */
- if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
- || is_too_expensive (_("GCSE disabled")))
- return 0;
-
- gcc_obstack_init (&gcse_obstack);
- bytes_used = 0;
-
- /* We need alias. */
- init_alias_analysis ();
-
- gcse_obstack_bottom = GOBNEWVAR (char, 1);
- changed = 0;
-
- if (dump_file)
- fprintf (dump_file, "GCSE pass\n\n");
-
- max_gcse_regno = max_reg_num ();
-
- alloc_gcse_mem ();
-
- /* Don't allow constant propagation to modify jumps
- during this pass. */
- if (dbg_cnt (cprop1))
- {
- timevar_push (TV_CPROP1);
- changed = one_cprop_pass (1, false, false);
- if (changed)
- recompute_all_luids ();
- timevar_pop (TV_CPROP1);
- }
-
- if (optimize_function_for_speed_p (cfun))
- {
- timevar_push (TV_PRE);
- changed |= one_pre_gcse_pass (1);
- /* We may have just created new basic blocks. Release and
- recompute various things which are sized on the number of
- basic blocks.
- ??? There would be no need for this if we used a block
- based Lazy Code Motion variant, with all (or selected)
- edges split before running the pass. That would also
- help find_implicit_sets for cprop. FIXME. */
- if (changed)
- {
- free_modify_mem_tables ();
- modify_mem_list = GCNEWVEC (rtx, last_basic_block);
- canon_modify_mem_list = GCNEWVEC (rtx, last_basic_block);
- }
-
- df_analyze ();
- run_jump_opt_after_gcse = 1;
- timevar_pop (TV_PRE);
- }
- else
- {
- /* This function is being optimized for code size.
- It does not make sense to run code hoisting unless we are optimizing
- for code size -- it rarely makes programs faster, and can make
- them bigger if we did partial redundancy elimination (when optimizing
- for space, we don't run the partial redundancy algorithms). */
- timevar_push (TV_HOIST);
- max_gcse_regno = max_reg_num ();
- alloc_gcse_mem ();
- one_code_hoisting_pass ();
- timevar_pop (TV_HOIST);
- }
-
- free_gcse_mem ();
-
- if (dump_file)
- {
- fprintf (dump_file, "\n");
- fflush (dump_file);
- }
-
- obstack_free (&gcse_obstack, gcse_obstack_bottom);
-
- /* Do the second const/copy propagation pass, including cprop into
- conditional jumps. */
- if (dbg_cnt (cprop2))
- {
- max_gcse_regno = max_reg_num ();
- alloc_gcse_mem ();
-
- /* This time, go ahead and allow cprop to alter jumps. */
- timevar_push (TV_CPROP2);
- changed = one_cprop_pass (2, true, true);
- if (changed)
- recompute_all_luids ();
- timevar_pop (TV_CPROP2);
- free_gcse_mem ();
- }
-
- if (dump_file)
- {
- fprintf (dump_file, "GCSE of %s: %d basic blocks, ",
- current_function_name (), n_basic_blocks);
- fprintf (dump_file, "pass 1, %d bytes\n\n", bytes_used);
- }
-
- obstack_free (&gcse_obstack, NULL);
-
- /* We are finished with alias.
- ??? Actually we recompute alias in store_motion. */
- end_alias_analysis ();
-
- /* Run store motion. */
- if (optimize_function_for_speed_p (cfun) && flag_gcse_sm)
- {
- timevar_push (TV_LSM);
- store_motion ();
- timevar_pop (TV_LSM);
- }
-
- /* Record where pseudo-registers are set. */
- return run_jump_opt_after_gcse;
-}
-\f
/* Misc. utilities. */
/* Nonzero for each mode that supports (set (reg) (reg)).
@@ -790,19 +629,6 @@ can_copy_p (enum machine_mode mode)
return can_copy[mode] != 0;
}
-/* Recompute the DF LUIDs for all basic blocks. If a sub-pass in this
- file changes something, we have to recompute them for the next pass.
- FIXME: If we would track which basic blocks we touch, we could
- update LUIDs in only those basic blocks. */
-
-static void
-recompute_all_luids (void)
-{
- basic_block bb;
- FOR_EACH_BB (bb)
- df_recompute_luids (bb);
-}
-
\f
/* Cover function to xmalloc to record bytes allocated. */
@@ -1447,11 +1273,10 @@ insert_set_in_table (rtx x, rtx insn, st
/* First occurrence of this expression in this basic block. */
cur_occr = GOBNEW (struct occr);
bytes_used += sizeof (struct occr);
-
- cur_occr->insn = insn;
- cur_occr->next = cur_expr->avail_occr;
- cur_occr->deleted_p = 0;
- cur_expr->avail_occr = cur_occr;
+ cur_occr->insn = insn;
+ cur_occr->next = cur_expr->avail_occr;
+ cur_occr->deleted_p = 0;
+ cur_expr->avail_occr = cur_occr;
}
}
@@ -1839,14 +1664,14 @@ record_last_set_info (rtx dest, const_rt
static void
compute_hash_table_work (struct hash_table *table)
{
- unsigned int i;
+ int i;
/* re-Cache any INSN_LIST nodes we have allocated. */
clear_modify_mem_tables ();
/* Some working arrays used to track first and last set in each block. */
- reg_avail_info = GNEWVEC (struct reg_avail_info, max_gcse_regno);
+ reg_avail_info = GNEWVEC (struct reg_avail_info, max_reg_num ());
- for (i = 0; i < max_gcse_regno; ++i)
+ for (i = 0; i < max_reg_num (); ++i)
reg_avail_info[i].last_bb = NULL;
FOR_EACH_BB (current_bb)
@@ -2631,8 +2456,6 @@ cprop_jump (basic_block bb, rtx setcc, r
delete_insn (setcc);
#endif
- run_jump_opt_after_gcse = 1;
-
global_const_prop_count++;
if (dump_file != NULL)
{
@@ -2666,14 +2489,13 @@ cprop_jump (basic_block bb, rtx setcc, r
}
static bool
-constprop_register (rtx insn, rtx from, rtx to, bool alter_jumps)
+constprop_register (rtx insn, rtx from, rtx to)
{
rtx sset;
/* Check for reg or cc0 setting instructions followed by
conditional branch instructions first. */
- if (alter_jumps
- && (sset = single_set (insn)) != NULL
+ if ((sset = single_set (insn)) != NULL
&& NEXT_INSN (insn)
&& any_condjump_p (NEXT_INSN (insn)) && onlyjump_p (NEXT_INSN (insn)))
{
@@ -2694,7 +2516,7 @@ constprop_register (rtx insn, rtx from,
Right now the insn in question must look like
(set (pc) (if_then_else ...)) */
- else if (alter_jumps && any_condjump_p (insn) && onlyjump_p (insn))
+ else if (any_condjump_p (insn) && onlyjump_p (insn))
return cprop_jump (BLOCK_FOR_INSN (insn), NULL, insn, from, to);
return 0;
}
@@ -2703,7 +2525,7 @@ constprop_register (rtx insn, rtx from,
The result is nonzero if a change was made. */
static int
-cprop_insn (rtx insn, int alter_jumps)
+cprop_insn (rtx insn)
{
struct reg_use *reg_used;
int changed = 0;
@@ -2728,11 +2550,6 @@ cprop_insn (rtx insn, int alter_jumps)
rtx pat, src;
struct expr *set;
- /* Ignore registers created by GCSE.
- We do this because ... */
- if (regno >= max_gcse_regno)
- continue;
-
/* If the register has already been set in this block, there's
nothing we can do. */
if (! oprs_not_set_p (reg_used->reg_rtx, insn))
@@ -2753,7 +2570,7 @@ cprop_insn (rtx insn, int alter_jumps)
/* Constant propagation. */
if (gcse_constant_p (src))
{
- if (constprop_register (insn, reg_used->reg_rtx, src, alter_jumps))
+ if (constprop_register (insn, reg_used->reg_rtx, src))
{
changed = 1;
global_const_prop_count++;
@@ -2840,11 +2657,10 @@ local_cprop_find_used_regs (rtx *xptr, v
find_used_regs (xptr, data);
}
-/* Try to perform local const/copy propagation on X in INSN.
- If ALTER_JUMPS is false, changing jump insns is not allowed. */
+/* Try to perform local const/copy propagation on X in INSN. */
static bool
-do_local_cprop (rtx x, rtx insn, bool alter_jumps)
+do_local_cprop (rtx x, rtx insn)
{
rtx newreg = NULL, newcnst = NULL;
@@ -2877,7 +2693,7 @@ do_local_cprop (rtx x, rtx insn, bool al
|| ! MEM_P (XEXP (note, 0))))
newreg = this_rtx;
}
- if (newcnst && constprop_register (insn, x, newcnst, alter_jumps))
+ if (newcnst && constprop_register (insn, x, newcnst))
{
if (dump_file != NULL)
{
@@ -2907,12 +2723,10 @@ do_local_cprop (rtx x, rtx insn, bool al
return false;
}
-/* Do local const/copy propagation (i.e. within each basic block).
- If ALTER_JUMPS is true, allow propagating into jump insns, which
- could modify the CFG. */
+/* Do local const/copy propagation (i.e. within each basic block). */
-static void
-local_cprop_pass (bool alter_jumps)
+static int
+local_cprop_pass (void)
{
basic_block bb;
rtx insn;
@@ -2938,7 +2752,7 @@ local_cprop_pass (bool alter_jumps)
for (reg_used = &reg_use_table[0]; reg_use_count > 0;
reg_used++, reg_use_count--)
{
- if (do_local_cprop (reg_used->reg_rtx, insn, alter_jumps))
+ if (do_local_cprop (reg_used->reg_rtx, insn))
{
changed = true;
break;
@@ -2958,52 +2772,6 @@ local_cprop_pass (bool alter_jumps)
cselib_finish ();
- /* Global analysis may get into infinite loops for unreachable blocks. */
- if (changed && alter_jumps)
- delete_unreachable_blocks ();
-}
-
-/* Forward propagate copies. This includes copies and constants. Return
- nonzero if a change was made. */
-
-static int
-cprop (int alter_jumps)
-{
- int changed;
- basic_block bb;
- rtx insn;
-
- /* Note we start at block 1. */
- if (ENTRY_BLOCK_PTR->next_bb == EXIT_BLOCK_PTR)
- {
- if (dump_file != NULL)
- fprintf (dump_file, "\n");
- return 0;
- }
-
- changed = 0;
- FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb->next_bb, EXIT_BLOCK_PTR, next_bb)
- {
- /* Reset tables used to keep track of what's still valid [since the
- start of the block]. */
- reset_opr_set_tables ();
-
- FOR_BB_INSNS (bb, insn)
- if (INSN_P (insn))
- {
- changed |= cprop_insn (insn, alter_jumps);
-
- /* Keep track of everything modified by this insn. */
- /* ??? Need to be careful w.r.t. mods done to INSN. Don't
- call mark_oprs_set if we turned the insn into a NOTE. */
- if (! NOTE_P (insn))
- mark_oprs_set (insn);
- }
- }
-
- if (dump_file != NULL)
- fprintf (dump_file, "\n");
-
return changed;
}
@@ -3060,7 +2828,12 @@ implicit_set_cond_p (const_rtx cond)
following "if (x == 2)", the then branch may be optimized as though the
conditional performed an "explicit set", in this example, "x = 2". This
function records the set patterns that are implicit at the start of each
- basic block. */
+ basic block.
+
+ FIXME: This would be more effective if critical edges were pre-split. As
+ it is now, we can't record implicit sets for blocks that have
+ critical successor edges. This results in missed optimizations
+ and in more (unnecessary) work in cfgcleanup.c:thread_jump(). */
static void
find_implicit_sets (void)
@@ -3085,7 +2858,9 @@ find_implicit_sets (void)
dest = GET_CODE (cond) == EQ ? BRANCH_EDGE (bb)->dest
: FALLTHRU_EDGE (bb)->dest;
- if (dest && single_pred_p (dest)
+ if (dest
+ /* Record nothing for a critical edge. */
+ && single_pred_p (dest)
&& dest != EXIT_BLOCK_PTR)
{
new_rtx = gen_rtx_SET (VOIDmode, XEXP (cond, 0),
@@ -3106,63 +2881,6 @@ find_implicit_sets (void)
fprintf (dump_file, "Found %d implicit sets\n", count);
}
-/* Perform one copy/constant propagation pass.
- PASS is the pass count. If CPROP_JUMPS is true, perform constant
- propagation into conditional jumps. If BYPASS_JUMPS is true,
- perform conditional jump bypassing optimizations. */
-
-static int
-one_cprop_pass (int pass, bool cprop_jumps, bool bypass_jumps)
-{
- int changed = 0;
-
- global_const_prop_count = local_const_prop_count = 0;
- global_copy_prop_count = local_copy_prop_count = 0;
-
- if (cprop_jumps)
- local_cprop_pass (cprop_jumps);
-
- /* Determine implicit sets. */
- implicit_sets = XCNEWVEC (rtx, last_basic_block);
- find_implicit_sets ();
-
- alloc_hash_table (get_max_uid (), &set_hash_table, 1);
- compute_hash_table (&set_hash_table);
-
- /* Free implicit_sets before peak usage. */
- free (implicit_sets);
- implicit_sets = NULL;
-
- if (dump_file)
- dump_hash_table (dump_file, "SET", &set_hash_table);
- if (set_hash_table.n_elems > 0)
- {
- alloc_cprop_mem (last_basic_block, set_hash_table.n_elems);
- compute_cprop_data ();
- changed = cprop (cprop_jumps);
- if (bypass_jumps)
- changed |= bypass_conditional_jumps ();
- free_cprop_mem ();
- }
-
- free_hash_table (&set_hash_table);
-
- if (dump_file)
- {
- fprintf (dump_file, "CPROP of %s, pass %d: %d bytes needed, ",
- current_function_name (), pass, bytes_used);
- fprintf (dump_file, "%d local const props, %d local copy props, ",
- local_const_prop_count, local_copy_prop_count);
- fprintf (dump_file, "%d global const props, %d global copy props\n\n",
- global_const_prop_count, global_copy_prop_count);
- }
- /* Global analysis may get into infinite loops for unreachable blocks. */
- if (changed && cprop_jumps)
- delete_unreachable_blocks ();
-
- return changed;
-}
-\f
/* Bypass conditional jumps. */
/* The value of last_basic_block at the beginning of the jump_bypass
@@ -3302,9 +3020,6 @@ bypass_block (basic_block bb, rtx setcc,
struct expr *set;
rtx src, new_rtx;
- if (regno >= max_gcse_regno)
- continue;
-
set = find_bypass_set (regno, e->src->index);
if (! set)
@@ -3888,7 +3603,7 @@ pre_edge_insert (struct edge_list *edge_
if (dump_file)
{
- fprintf (dump_file, "PRE/HOIST: edge (%d,%d), ",
+ fprintf (dump_file, "PRE: edge (%d,%d), ",
bb->index,
INDEX_EDGE_SUCC_BB (edge_list, e)->index);
fprintf (dump_file, "copy expression %d\n",
@@ -4232,13 +3947,25 @@ pre_gcse (void)
Return nonzero if a change was made. */
static int
-one_pre_gcse_pass (int pass)
+one_pre_gcse_pass (void)
{
int changed = 0;
gcse_subst_count = 0;
gcse_create_count = 0;
+ /* Return if there's nothing to do, or it is too expensive. */
+ if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
+ || is_too_expensive (_("PRE disabled")))
+ return 0;
+
+ /* We need alias. */
+ init_alias_analysis ();
+
+ bytes_used = 0;
+ gcc_obstack_init (&gcse_obstack);
+ alloc_gcse_mem ();
+
alloc_hash_table (get_max_uid (), &expr_hash_table, 0);
add_noreturn_fake_exit_edges ();
if (flag_gcse_lm)
@@ -4262,10 +3989,16 @@ one_pre_gcse_pass (int pass)
remove_fake_exit_edges ();
free_hash_table (&expr_hash_table);
+ free_gcse_mem ();
+ obstack_free (&gcse_obstack, NULL);
+
+ /* We are finished with alias. */
+ end_alias_analysis ();
+
if (dump_file)
{
- fprintf (dump_file, "\nPRE GCSE of %s, pass %d: %d bytes needed, ",
- current_function_name (), pass, bytes_used);
+ fprintf (dump_file, "PRE GCSE of %s, %d basic blocks, %d bytes needed, ",
+ current_function_name (), n_basic_blocks, bytes_used);
fprintf (dump_file, "%d substs, %d insns created\n",
gcse_subst_count, gcse_create_count);
}
@@ -4530,7 +4263,7 @@ hoist_expr_reaches_here_p (basic_block e
\f
/* Actually perform code hoisting. */
-static void
+static int
hoist_code (void)
{
basic_block bb, dominated;
@@ -4538,6 +4271,7 @@ hoist_code (void)
unsigned int i,j;
struct expr **index_map;
struct expr *expr;
+ int changed = 0;
sbitmap_vector_zero (hoist_exprs, last_basic_block);
@@ -4669,6 +4403,9 @@ hoist_code (void)
gcse_emit_move_after (expr->reaching_reg, SET_DEST (set), insn);
delete_insn (insn);
occr->deleted_p = 1;
+ changed = 1;
+ gcse_subst_count++;
+
if (!insn_inserted_p)
{
insert_insn_end_basic_block (index_map[i], bb, 0);
@@ -4682,6 +4419,8 @@ hoist_code (void)
}
free (index_map);
+
+ return changed;
}
/* Top level routine to perform one code hoisting (aka unification) pass
@@ -4693,6 +4432,21 @@ one_code_hoisting_pass (void)
{
int changed = 0;
+ gcse_subst_count = 0;
+ gcse_create_count = 0;
+
+ /* Return if there's nothing to do, or it is too expensive. */
+ if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
+ || is_too_expensive (_("GCSE disabled")))
+ return 0;
+
+ /* We need alias. */
+ init_alias_analysis ();
+
+ bytes_used = 0;
+ gcc_obstack_init (&gcse_obstack);
+ alloc_gcse_mem ();
+
alloc_hash_table (get_max_uid (), &expr_hash_table, 0);
compute_hash_table (&expr_hash_table);
if (dump_file)
@@ -4702,11 +4456,24 @@ one_code_hoisting_pass (void)
{
alloc_code_hoist_mem (last_basic_block, expr_hash_table.n_elems);
compute_code_hoist_data ();
- hoist_code ();
+ changed = hoist_code ();
free_code_hoist_mem ();
}
free_hash_table (&expr_hash_table);
+ free_gcse_mem ();
+ obstack_free (&gcse_obstack, NULL);
+
+ /* We are finished with alias. */
+ end_alias_analysis ();
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "HOIST of %s, %d basic blocks, %d bytes needed, ",
+ current_function_name (), n_basic_blocks, bytes_used);
+ fprintf (dump_file, "%d substs, %d insns created\n",
+ gcse_subst_count, gcse_create_count);
+ }
return changed;
}
@@ -5441,8 +5208,7 @@ compute_store_table (void)
rtx insn, pat, tmp;
int *last_set_in, *already_set;
struct ls_expr * ptr, **prev_next_ptr_ptr;
-
- max_gcse_regno = max_reg_num ();
+ unsigned int max_gcse_regno = max_reg_num ();
pre_ldst_mems = 0;
pre_ldst_table = htab_create (13, pre_ldst_expr_hash,
@@ -5762,6 +5528,7 @@ build_store_vectors (void)
int *regs_set_in_block;
rtx insn, st;
struct ls_expr * ptr;
+ unsigned int max_gcse_regno = max_reg_num ();
/* Build the gen_vector. This is any store in the table which is not killed
by aliasing later in its block. */
@@ -6060,7 +5827,7 @@ replace_store_insn (rtx reg, rtx del, ba
fprintf (dump_file,
"STORE_MOTION delete insn in BB %d:\n ", bb->index);
print_inline_rtx (dump_file, del, 6);
- fprintf (dump_file, "\nSTORE MOTION replaced with insn:\n ");
+ fprintf (dump_file, "\nSTORE_MOTION replaced with insn:\n ");
print_inline_rtx (dump_file, insn, 6);
fprintf (dump_file, "\n");
}
@@ -6142,21 +5909,19 @@ free_store_memory (void)
}
/* Perform store motion. Much like gcse, except we move expressions the
- other way by looking at the flowgraph in reverse. */
+ other way by looking at the flowgraph in reverse.
+ Return non-zero if transformations are performed by the pass. */
-static void
-store_motion (void)
+static int
+one_store_motion_pass (void)
{
basic_block bb;
int x;
struct ls_expr * ptr;
int update_flow = 0;
- if (dump_file)
- {
- fprintf (dump_file, "before store motion\n");
- print_rtl (dump_file, get_insns ());
- }
+ gcse_subst_count = 0;
+ gcse_create_count = 0;
init_alias_analysis ();
@@ -6167,7 +5932,7 @@ store_motion (void)
htab_delete (pre_ldst_table);
pre_ldst_table = NULL;
end_alias_analysis ();
- return;
+ return 0;
}
/* Now compute kill & transp vectors. */
@@ -6203,11 +5968,17 @@ store_motion (void)
FOR_EACH_BB (bb)
if (TEST_BIT (pre_delete_map[bb->index], ptr->index))
- delete_store (ptr, bb);
+ {
+ delete_store (ptr, bb);
+ gcse_subst_count++;
+ }
for (x = 0; x < NUM_EDGES (edge_list); x++)
if (TEST_BIT (pre_insert_map[x], ptr->index))
- update_flow |= insert_store (ptr, INDEX_EDGE (edge_list, x));
+ {
+ update_flow |= insert_store (ptr, INDEX_EDGE (edge_list, x));
+ gcse_create_count++;
+ }
}
if (update_flow)
@@ -6217,59 +5988,19 @@ store_motion (void)
free_edge_list (edge_list);
remove_fake_exit_edges ();
end_alias_analysis ();
-}
-
-\f
-/* Entry point for jump bypassing optimization pass. */
-
-static int
-bypass_jumps (void)
-{
- int changed;
-
- /* We do not construct an accurate cfg in functions which call
- setjmp, so just punt to be safe. */
- if (cfun->calls_setjmp)
- return 0;
-
- /* Identify the basic block information for this function, including
- successors and predecessors. */
- max_gcse_regno = max_reg_num ();
-
- if (dump_file)
- dump_flow_info (dump_file, dump_flags);
-
- /* Return if there's nothing to do, or it is too expensive. */
- if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
- || is_too_expensive (_ ("jump bypassing disabled")))
- return 0;
-
- gcc_obstack_init (&gcse_obstack);
- bytes_used = 0;
-
- /* We need alias. */
- init_alias_analysis ();
-
- max_gcse_regno = max_reg_num ();
- alloc_gcse_mem ();
- changed = one_cprop_pass (3, true, true);
- free_gcse_mem ();
if (dump_file)
{
- fprintf (dump_file, "BYPASS of %s: %d basic blocks, ",
+ fprintf (dump_file, "STORE_MOTION of %s, %d basic blocks, ",
current_function_name (), n_basic_blocks);
- fprintf (dump_file, "%d bytes\n\n", bytes_used);
+ fprintf (dump_file, "%d substs, %d insns created\n",
+ gcse_subst_count, gcse_create_count);
}
- obstack_free (&gcse_obstack, NULL);
-
- /* We are finished with alias. */
- end_alias_analysis ();
-
- return changed;
+ return (gcse_subst_count > 0 || gcse_create_count > 0);
}
+\f
/* Return true if the graph is too expensive to optimize. PASS is the
optimization about to be performed. */
@@ -6309,110 +6040,271 @@ is_too_expensive (const char *pass)
return false;
}
+
+\f
+/* Main function for the CPROP pass. */
+
+static int
+one_cprop_pass (void)
+{
+ int changed = 0;
+
+ /* Return if there's nothing to do, or it is too expensive. */
+ if (n_basic_blocks <= NUM_FIXED_BLOCKS + 1
+ || is_too_expensive (_("const/copy propagation disabled")))
+ return 0;
+
+ global_const_prop_count = local_const_prop_count = 0;
+ global_copy_prop_count = local_copy_prop_count = 0;
+
+ bytes_used = 0;
+ gcc_obstack_init (&gcse_obstack);
+ alloc_gcse_mem ();
+
+ /* Do a local const/copy propagation pass first. The global pass
+ only handles global opportunities.
+ If the local pass changes something, remove any unreachable blocks
+ because the CPROP global dataflow analysis may get into infinite
+ loops for CFGs with unreachable blocks.
+
+ FIXME: This local pass should not be necessary after CSE (but for
+ some reason it still is). It is also (proven) not necessary
+ to run the local pass right after FWPROP.
+
+ FIXME: The global analysis would not get into infinite loops if it
+ would use the DF solver (via df_simple_dataflow) instead of
+ the solver implemented in this file. */
+ if (local_cprop_pass ())
+ {
+ delete_unreachable_blocks ();
+ df_analyze ();
+ }
+
+ /* Determine implicit sets. */
+ implicit_sets = XCNEWVEC (rtx, last_basic_block);
+ find_implicit_sets ();
+
+ alloc_hash_table (get_max_uid (), &set_hash_table, 1);
+ compute_hash_table (&set_hash_table);
+
+ /* Free implicit_sets before peak usage. */
+ free (implicit_sets);
+ implicit_sets = NULL;
+
+ if (dump_file)
+ dump_hash_table (dump_file, "SET", &set_hash_table);
+ if (set_hash_table.n_elems > 0)
+ {
+ basic_block bb;
+ rtx insn;
+
+ alloc_cprop_mem (last_basic_block, set_hash_table.n_elems);
+ compute_cprop_data ();
+
+ FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR->next_bb->next_bb, EXIT_BLOCK_PTR, next_bb)
+ {
+ /* Reset tables used to keep track of what's still valid [since
+ the start of the block]. */
+ reset_opr_set_tables ();
+
+ FOR_BB_INSNS (bb, insn)
+ if (INSN_P (insn))
+ {
+ changed |= cprop_insn (insn);
+
+ /* Keep track of everything modified by this insn. */
+ /* ??? Need to be careful w.r.t. mods done to INSN.
+ Don't call mark_oprs_set if we turned the
+ insn into a NOTE. */
+ if (! NOTE_P (insn))
+ mark_oprs_set (insn);
+ }
+ }
+
+ changed |= bypass_conditional_jumps ();
+ free_cprop_mem ();
+ }
+
+ free_hash_table (&set_hash_table);
+ free_gcse_mem ();
+ obstack_free (&gcse_obstack, NULL);
+
+ if (dump_file)
+ {
+ fprintf (dump_file, "CPROP of %s, %d basic blocks, %d bytes needed, ",
+ current_function_name (), n_basic_blocks, bytes_used);
+ fprintf (dump_file, "%d local const props, %d local copy props, ",
+ local_const_prop_count, local_copy_prop_count);
+ fprintf (dump_file, "%d global const props, %d global copy props\n\n",
+ global_const_prop_count, global_copy_prop_count);
+ }
+
+ return changed;
+}
+
\f
+/* All the passes implemented in this file. Each pass has its
+ own gate and execute function, and at the end of the file a
+ pass definition for passes.c.
+
+ We do not construct an accurate cfg in functions which call
+ setjmp, so none of these passes runs if the function calls
+ setjmp.
+ FIXME: Should just handle setjmp via REG_SETJMP notes. */
+
static bool
-gate_handle_jump_bypass (void)
+gate_rtl_cprop (void)
{
return optimize > 0 && flag_gcse
- && dbg_cnt (jump_bypass);
+ && !cfun->calls_setjmp
+ && dbg_cnt (cprop);
}
-/* Perform jump bypassing and control flow optimizations. */
static unsigned int
-rest_of_handle_jump_bypass (void)
+execute_rtl_cprop (void)
{
delete_unreachable_blocks ();
- if (bypass_jumps ())
- {
- delete_trivially_dead_insns (get_insns (), max_reg_num ());
- rebuild_jump_labels (get_insns ());
- cleanup_cfg (0);
- }
+ df_note_add_problem ();
+ df_set_flags (DF_LR_RUN_DCE);
+ df_analyze ();
+ flag_rerun_cse_after_global_opts |= one_cprop_pass ();
+ return 0;
+}
+
+static bool
+gate_rtl_pre (void)
+{
+ return optimize > 0 && flag_gcse
+ && !cfun->calls_setjmp
+ && optimize_function_for_speed_p (cfun)
+ && dbg_cnt (pre);
+}
+
+static unsigned int
+execute_rtl_pre (void)
+{
+ delete_unreachable_blocks ();
+ df_note_add_problem ();
+ df_analyze ();
+ flag_rerun_cse_after_global_opts |= one_pre_gcse_pass ();
+ return 0;
+}
+
+static bool
+gate_rtl_hoist (void)
+{
+ return optimize > 0 && flag_gcse
+ && !cfun->calls_setjmp
+ /* It does not make sense to run code hoisting unless we are optimizing
+ for code size -- it rarely makes programs faster, and can make them
+ bigger if we did PRE (when optimizing for space, we don't run PRE). */
+ && optimize_function_for_size_p (cfun)
+ && dbg_cnt (hoist);
+}
+
+static unsigned int
+execute_rtl_hoist (void)
+{
+ delete_unreachable_blocks ();
+ df_note_add_problem ();
+ df_analyze ();
+ flag_rerun_cse_after_global_opts |= one_code_hoisting_pass ();
+ return 0;
+}
+
+static bool
+gate_rtl_store_motion (void)
+{
+ return optimize > 0 && flag_gcse_sm
+ && !cfun->calls_setjmp
+ && optimize_function_for_speed_p (cfun)
+ && dbg_cnt (store_motion);
+}
+
+static unsigned int
+execute_rtl_store_motion (void)
+{
+ delete_unreachable_blocks ();
+ df_note_add_problem ();
+ df_analyze ();
+ flag_rerun_cse_after_global_opts |= one_store_motion_pass ();
return 0;
}
-struct rtl_opt_pass pass_jump_bypass =
+struct rtl_opt_pass pass_rtl_cprop =
{
{
RTL_PASS,
- "bypass", /* name */
- gate_handle_jump_bypass, /* gate */
- rest_of_handle_jump_bypass, /* execute */
+ "cprop", /* name */
+ gate_rtl_cprop, /* gate */
+ execute_rtl_cprop, /* execute */
NULL, /* sub */
NULL, /* next */
0, /* static_pass_number */
- TV_BYPASS, /* tv_id */
+ TV_CPROP, /* tv_id */
PROP_cfglayout, /* properties_required */
0, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
+ TODO_df_finish | TODO_verify_rtl_sharing |
TODO_dump_func |
- TODO_ggc_collect | TODO_verify_flow /* todo_flags_finish */
+ TODO_verify_flow | TODO_ggc_collect /* todo_flags_finish */
}
};
-
-static bool
-gate_handle_gcse (void)
+struct rtl_opt_pass pass_rtl_pre =
{
- return optimize > 0 && flag_gcse
- && dbg_cnt (gcse);
-}
-
+ {
+ RTL_PASS,
+ "pre", /* name */
+ gate_rtl_pre, /* gate */
+ execute_rtl_pre, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_PRE, /* tv_id */
+ PROP_cfglayout, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_df_finish | TODO_verify_rtl_sharing |
+ TODO_dump_func |
+ TODO_verify_flow | TODO_ggc_collect /* todo_flags_finish */
+ }
+};
-static unsigned int
-rest_of_handle_gcse (void)
+struct rtl_opt_pass pass_rtl_hoist =
{
- int save_csb, save_cfj;
- int tem2 = 0, tem;
- tem = gcse_main (get_insns ());
- delete_trivially_dead_insns (get_insns (), max_reg_num ());
- rebuild_jump_labels (get_insns ());
- save_csb = flag_cse_skip_blocks;
- save_cfj = flag_cse_follow_jumps;
- flag_cse_skip_blocks = flag_cse_follow_jumps = 0;
-
- /* If -fexpensive-optimizations, re-run CSE to clean up things done
- by gcse. */
- if (flag_expensive_optimizations)
- {
- timevar_push (TV_CSE);
- tem2 = cse_main (get_insns (), max_reg_num ());
- df_finish_pass (false);
- purge_all_dead_edges ();
- delete_trivially_dead_insns (get_insns (), max_reg_num ());
- timevar_pop (TV_CSE);
- cse_not_expected = !flag_rerun_cse_after_loop;
- }
-
- /* If gcse or cse altered any jumps, rerun jump optimizations to clean
- things up. */
- if (tem || tem2 == 2)
- {
- timevar_push (TV_JUMP);
- rebuild_jump_labels (get_insns ());
- cleanup_cfg (0);
- timevar_pop (TV_JUMP);
- }
- else if (tem2 == 1)
- cleanup_cfg (0);
-
- flag_cse_skip_blocks = save_csb;
- flag_cse_follow_jumps = save_cfj;
- return 0;
-}
+ {
+ RTL_PASS,
+ "hoist", /* name */
+ gate_rtl_hoist, /* gate */
+ execute_rtl_hoist, /* execute */
+ NULL, /* sub */
+ NULL, /* next */
+ 0, /* static_pass_number */
+ TV_HOIST, /* tv_id */
+ PROP_cfglayout, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_df_finish | TODO_verify_rtl_sharing |
+ TODO_dump_func |
+ TODO_verify_flow | TODO_ggc_collect /* todo_flags_finish */
+ }
+};
-struct rtl_opt_pass pass_gcse =
+struct rtl_opt_pass pass_rtl_store_motion =
{
{
RTL_PASS,
- "gcse1", /* name */
- gate_handle_gcse, /* gate */
- rest_of_handle_gcse, /* execute */
+ "store_motion", /* name */
+ gate_rtl_store_motion, /* gate */
+ execute_rtl_store_motion, /* execute */
NULL, /* sub */
NULL, /* next */
0, /* static_pass_number */
- TV_GCSE, /* tv_id */
+ TV_LSM, /* tv_id */
PROP_cfglayout, /* properties_required */
0, /* properties_provided */
0, /* properties_destroyed */
@@ -6423,5 +6315,4 @@ struct rtl_opt_pass pass_gcse =
}
};
-
#include "gt-gcse.h"
Index: common.opt
===================================================================
--- common.opt (revision 146798)
+++ common.opt (working copy)
@@ -397,8 +397,8 @@ Common Report Var(flag_cse_follow_jumps)
When running CSE, follow jumps to their targets
fcse-skip-blocks
-Common Report Var(flag_cse_skip_blocks) Optimization
-When running CSE, follow conditional jumps
+Common
+Does nothing. Preserved for backward compatibility.
fcx-limited-range
Common Report Var(flag_cx_limited_range) Optimization
Index: passes.c
===================================================================
--- passes.c (revision 146798)
+++ passes.c (working copy)
@@ -733,7 +733,12 @@ init_optimization_passes (void)
NEXT_PASS (pass_df_initialize_opt);
NEXT_PASS (pass_cse);
NEXT_PASS (pass_rtl_fwprop);
- NEXT_PASS (pass_gcse);
+ NEXT_PASS (pass_rtl_cprop);
+ NEXT_PASS (pass_rtl_pre);
+ NEXT_PASS (pass_rtl_hoist);
+ NEXT_PASS (pass_rtl_cprop);
+ NEXT_PASS (pass_rtl_store_motion);
+ NEXT_PASS (pass_cse_after_global_opts);
NEXT_PASS (pass_rtl_ifcvt);
/* Perform loop optimizations. It might be better to do them a bit
sooner, but we want the profile feedback to work more
@@ -750,7 +755,7 @@ init_optimization_passes (void)
*p = NULL;
}
NEXT_PASS (pass_web);
- NEXT_PASS (pass_jump_bypass);
+ NEXT_PASS (pass_rtl_cprop);
NEXT_PASS (pass_cse2);
NEXT_PASS (pass_rtl_dse1);
NEXT_PASS (pass_rtl_fwprop_addr);
end of thread, other threads:[~2010-12-03 15:05 UTC | newest]
Thread overview: 6+ messages
2009-04-26 22:50 [PATCH][RFA] Split passes from gcse_main into separate passes Bradley Lucier
2009-04-27 5:53 ` Steven Bosscher
-- strict thread matches above, loose matches on Subject: below --
2009-04-26 21:27 Steven Bosscher
2009-04-26 21:37 ` Steven Bosscher
2009-04-27 8:45 ` Richard Guenther
2010-12-03 15:05 ` H.J. Lu