* [RFC] Selective scheduling pass
From: Andrey Belevantsev @ 2008-06-03 14:24 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov

Hello,

The patches in this thread introduce the selective scheduler in GCC, implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander Monakov, and Maxim Kuvyrkov while he was at ISP RAS. The selective scheduler is aimed at scheduling-eager targets such as ia64, power6, and cell. The implementation contains both the scheduler and a software pipeliner, which can be used on loops with control flow that SMS does not handle. The scheduler can work either before or after register allocation, but it is currently tuned to work after.

The scheduler was bootstrapped and tested on ia64, with all default languages, both as the first and as the second scheduler. It was also bootstrapped with C, C++, and Fortran enabled on ppc64 and x86-64.
On ia64, SPEC2k FP results comparing -O3 -ffast-math on trunk and the sel-sched branch show a 3.8% speedup on average; SPEC INT shows both small speedups and regressions, staying roughly neutral on average:

168.wupwise      513     552     7.60%
171.swim         757     772     1.98%
172.mgrid        570     643    12.81%
173.applu        503     524     4.17%
177.mesa         796     795    -0.13%
178.galgel       814     787    -3.32%
179.art         1990    2098     5.43%
183.equake       513     569    10.92%
187.facerec      958     991     3.44%
188.ammp         765     775     1.31%
189.lucas        860     869     1.05%
191.fma3d        549     536    -2.37%
200.sixtrack     300     323     7.67%
301.apsi         522     546     4.60%
Geomean       673.97  699.87     3.84%

164.gzip         683     682    -0.15%
175.vpr          814     802    -1.47%
176.gcc         1080    1069    -1.02%
181.mcf          701     708     1.00%
186.crafty       872     855    -1.95%
197.parser       729     728    -0.14%
252.eon          793     785    -1.01%
253.perlbmk      824     839     1.82%
254.gap          558     569     1.97%
255.vortex      1012     966    -4.55%
256.bzip2        758     762     0.53%
300.twolf       1005    1015     1.00%
Geomean       806.04  803.25    -0.35%

On power6, Revital Eres saw speedups on several tests; additional tuning is required to get good results there, which is complicated by the fact that we do not have power6 hardware. On cell, there was some third-party testing in 2007 showing 4-6% speedups, but I do not have more detailed information.

The compile-time slowdown measured with --enable-checking=assert is quite significant -- about 12% on SPEC INT and about 18% on SPEC FP and the cc1-i-files collection. For this reason, we have enabled the selective scheduler by default at -O3 on ia64 and disabled it by default on other targets. Our current plan is to work on further compile-time improvements and on performance tuning for ppc and cell, hopefully with the help of the IBM Haifa folks. If we complete this work before the end of stage 2, we can also enable selective scheduling at -O3 for ppc in 4.4. In the mid-term, we will work on removing the ebb scheduler, as it is now used only on ia64 and will be superseded by the selective scheduler once we further improve compile time.

Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
* Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-06-03 14:26 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 3687 bytes --]

Hello,

This patch contains the middle-end changes needed for the selective scheduler. They are relatively small and include the following:

o hooks to catch new insns and basic blocks, which can be generated as a result of changing control flow in cfgrtl mode. These are outside the scheduler's control, but we still need to initialize internal data for them.

o fixes needed to work with loop data after register allocation and in cfgrtl mode; I think one of them was sent to me by Zdenek some time ago.

o a function in genautomata to output the maximal insn latency.

o an iterator over hard register sets analogous to the bitmap iterator.

o an interface to validate_replace_rtx that allows postponing simplification of the new rtx until later.

o an interface to rtx_equal_p and hash_rtx that allows skipping certain parts of an rtx while comparing or hashing. This is needed for unification of e.g. control-speculative and data-speculative insns (which have different patterns, of course). A similar mechanism for may_trap_p is already in trunk, where it is implemented via a target hook.

OK for trunk?

Andrey

2008-06-03  Andrey Belevantsev  <abel@ispras.ru>
            Dmitry Melnik  <dm@ispras.ru>
            Dmitry Zhurikhin  <zhur@ispras.ru>
            Alexander Monakov  <amonakov@ispras.ru>
            Maxim Kuvyrkov  <maxim@codesourcery.com>

        * cfghooks.h (get_cfg_hooks, set_cfg_hooks): New prototypes.
        * cfghooks.c (get_cfg_hooks, set_cfg_hooks): New functions.
        (make_forwarder_block): Update loop latch if we have redirected
        the loop latch edge.
        * cfgloop.c (get_loop_body_in_custom_order): New function.
        * cfgloop.h (LOOPS_HAVE_FALLTHRU_PREHEADERS): New enum field.
        (CP_FALLTHRU_PREHEADERS): Likewise.
        (get_loop_body_in_custom_order): Declare.
        * cfgloopmanip.c (has_preds_from_loop): New.
        (create_preheader): Honor CP_FALLTHRU_PREHEADERS.
        Assert that the preheader edge will be fall thru when it is set.
        * cse.c (hash_rtx_cb): New.
        (hash_rtx): Use it.
        * emit-rtl.c (add_insn, add_insn_after, add_insn_before,
        emit_insn_after_1): Call insn_added hook.
        * genattr.c (main): Output maximal_insn_latency prototype.
        * genautomata.c (output_default_latencies): New.  Factor its
        code from ...
        (output_internal_insn_latency_func): ... here.
        (output_internal_maximal_insn_latency_func): New.
        (output_maximal_insn_latency_func): New.
        * hard-reg-set.h (UHOST_BITS_PER_WIDE_INT): Define unconditionally.
        (struct hard_reg_set_iterator): New.
        (hard_reg_set_iter_init, hard_reg_set_iter_set,
        hard_reg_set_iter_next): New functions.
        (EXECUTE_IF_SET_IN_HARD_REG_SET): New macro.
        * lists.c (remove_free_INSN_LIST_node,
        remove_free_EXPR_LIST_node): New functions.
        * loop-init.c (loop_optimizer_init): When
        LOOPS_HAVE_FALLTHRU_PREHEADERS, set CP_FALLTHRU_PREHEADERS when
        calling create_preheaders.
        (loop_optimizer_finalize): Do not verify flow info after reload.
        * passes.c (init_optimization_passes): Move pass_compute_alignments
        after pass_machine_reorg.
        * recog.c (validate_replace_rtx_1): New parameter simplify.
        Default it to true.  Update all uses.  Factor out simplifying
        code to ...
        (simplify_while_replacing): ... this new function.
        (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): New.
        * recog.h (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): Declare.
        * rtl.c (rtx_equal_p_cb): New.
        (rtx_equal_p): Use it.
        * rtl.h (rtx_equal_p_cb, hash_rtx_cb): Declare.
        (remove_free_INSN_LIST_NODE, remove_free_EXPR_LIST_node,
        debug_bb_n_slim, debug_bb_slim, print_rtl_slim,
        sel_sched_fix_param, insn_added): Likewise.
        * rtlhooks-def.h (RTL_HOOKS_INSN_ADDED): Define to NULL.
        Add to RTL_HOOKS_INITIALIZER.

[-- Attachment #2: sel-sched-merge-middle-end.diff.gz --]
[-- Type: application/gzip, Size: 11818 bytes --]
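The hard-register-set iterator added above (EXECUTE_IF_SET_IN_HARD_REG_SET and friends) mirrors the existing bitmap iterator: it visits the index of every set bit in a multi-word bit set, lowest first. The sketch below models just that visiting order on a simplified two-word set; the types and names are illustrative, not GCC's actual HARD_REG_SET representation.

```c
#include <assert.h>

/* Simplified model of a hard-reg-set walk: a fixed-size bit set stored
   as an array of words, scanned bit by bit from register number 0. */
#define N_WORDS 2
#define BITS_PER_WORD 32

typedef struct { unsigned int words[N_WORDS]; } reg_set;

/* Store the index of every set bit into OUT (at most MAX entries);
   return how many were found. */
static int collect_set_regs (const reg_set *s, int *out, int max)
{
  int n = 0;
  for (int w = 0; w < N_WORDS; w++)
    {
      unsigned int bits = s->words[w];
      /* Shift the word right until no set bits remain. */
      for (int b = 0; bits != 0; b++, bits >>= 1)
        if ((bits & 1) && n < max)
          out[n++] = w * BITS_PER_WORD + b;
    }
  return n;
}
```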
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-06-11 1:04 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> 2008-06-03  Andrey Belevantsev  <abel@ispras.ru>
>             Dmitry Melnik  <dm@ispras.ru>
>             Dmitry Zhurikhin  <zhur@ispras.ru>
>             Alexander Monakov  <amonakov@ispras.ru>
>             Maxim Kuvyrkov  <maxim@codesourcery.com>
>
>         * cfghooks.h (get_cfg_hooks, set_cfg_hooks): New prototypes.
>
>         * cfghooks.c (get_cfg_hooks, set_cfg_hooks): New functions.
>         (make_forwarder_block): Update loop latch if we have redirected
>         the loop latch edge.
>
>         * cfgloop.c (get_loop_body_in_custom_order): New function.
>
>         * cfgloop.h (LOOPS_HAVE_FALLTHRU_PREHEADERS): New enum field.
>         (CP_FALLTHRU_PREHEADERS): Likewise.
>         (get_loop_body_in_custom_order): Declare.
>
>         * cfgloopmanip.c (has_preds_from_loop): New.
>         (create_preheader): Honor CP_FALLTHRU_PREHEADERS.
>         Assert that the preheader edge will be fall thru when it is set.
>
>         * cse.c (hash_rtx_cb): New.
>         (hash_rtx): Use it.
>
>         * emit-rtl.c (add_insn, add_insn_after, add_insn_before,
>         emit_insn_after_1): Call insn_added hook.
>
>         * genattr.c (main): Output maximal_insn_latency prototype.
>
>         * genautomata.c (output_default_latencies): New.  Factor its
>         code from ...
>         (output_internal_insn_latency_func): ... here.
>         (output_internal_maximal_insn_latency_func): New.
>         (output_maximal_insn_latency_func): New.
>
>         * hard-reg-set.h (UHOST_BITS_PER_WIDE_INT): Define unconditionally.
>         (struct hard_reg_set_iterator): New.
>         (hard_reg_set_iter_init, hard_reg_set_iter_set,
>         hard_reg_set_iter_next): New functions.
>         (EXECUTE_IF_SET_IN_HARD_REG_SET): New macro.
>
>         * lists.c (remove_free_INSN_LIST_node,
>         remove_free_EXPR_LIST_node): New functions.
>
>         * loop-init.c (loop_optimizer_init): When
>         LOOPS_HAVE_FALLTHRU_PREHEADERS, set CP_FALLTHRU_PREHEADERS when
>         calling create_preheaders.
>         (loop_optimizer_finalize): Do not verify flow info after reload.
>
>         * passes.c (init_optimization_passes): Move pass_compute_alignments
>         after pass_machine_reorg.
>
>         * recog.c (validate_replace_rtx_1): New parameter simplify.
>         Default it to true.  Update all uses.  Factor out simplifying
>         code to ...
>         (simplify_while_replacing): ... this new function.
>         (validate_replace_rtx_part,
>         validate_replace_rtx_part_nosimplify): New.
>
>         * recog.h (validate_replace_rtx_part,
>         validate_replace_rtx_part_nosimplify): Declare.
>
>         * rtl.c (rtx_equal_p_cb): New.
>         (rtx_equal_p): Use it.
>
>         * rtl.h (rtx_equal_p_cb, hash_rtx_cb): Declare.
>         (remove_free_INSN_LIST_NODE, remove_free_EXPR_LIST_node,
>         debug_bb_n_slim, debug_bb_slim, print_rtl_slim,
>         sel_sched_fix_param, insn_added): Likewise.
>
>         * rtlhooks-def.h (RTL_HOOKS_INSN_ADDED): Define to NULL.
>         Add to RTL_HOOKS_INITIALIZER.

! if (jump != NULL)
!   {
!     /* If we redirected the loop latch edge, the JUMP block now acts like
!        the new latch of the loop.  */
!     if (current_loops != NULL
!         && dummy->loop_father->header == dummy
!         && dummy->loop_father->latch == e_src)
!       dummy->loop_father->latch = jump;

I think you need to check that dummy->loop_father != NULL before you dereference it.

  && !((flags & CP_SIMPLE_PREHEADERS)
!      && !single_succ_p (single_entry->src))
!     && !((flags & CP_FALLTHRU_PREHEADERS
!           && (JUMP_P (BB_END (single_entry->src))
!               || has_preds_from_loop (single_entry->src, loop)))))

This code needs a comment.  Actually, I think it would be better to break up the complex condition into three simpler conditions, ideally ones which can be understood without applying DeMorgan's law.  Also, you need to update the comment on the function as a whole.
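Ian's suggestion can be illustrated in miniature: a single negated conjunction forces the reader to apply De Morgan's law, while early returns with positive conditions read directly. The flag values and predicate arguments below are simplified stand-ins for the real create_preheader logic, not the actual GCC code; the two forms are logically equivalent.

```c
#include <assert.h>
#include <stdbool.h>

#define CP_SIMPLE_PREHEADERS   1
#define CP_FALLTHRU_PREHEADERS 2

/* Original shape: one big negated conjunction. */
static bool need_new_preheader_original (int flags, bool single_succ,
                                         bool ends_in_jump, bool preds_from_loop)
{
  return !((flags & CP_SIMPLE_PREHEADERS) && !single_succ)
         && !((flags & CP_FALLTHRU_PREHEADERS)
              && (ends_in_jump || preds_from_loop));
}

/* Refactored shape: each disqualifying case stated positively. */
static bool need_new_preheader_split (int flags, bool single_succ,
                                      bool ends_in_jump, bool preds_from_loop)
{
  /* A simple preheader was requested, but the entry block has several
     successors, so it cannot serve as one.  */
  if ((flags & CP_SIMPLE_PREHEADERS) && !single_succ)
    return false;
  /* A fallthru preheader was requested, but the entry block ends in a
     jump or has predecessors inside the loop.  */
  if ((flags & CP_FALLTHRU_PREHEADERS) && (ends_in_jump || preds_from_loop))
    return false;
  return true;
}
```

An exhaustive check over all inputs confirms the two forms agree.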
*************** hash_rtx (const_rtx x, enum machine_mode
*** 2237,2243 ****
        x = XEXP (x, 0);
        goto repeat;

!     case USE:
        /* A USE that mentions non-volatile memory needs special
           handling since the MEM may be BLKmode which normally
           prevents an entry from being made.  Pure calls are
--- 2241,2247 ----
        x = XEXP (x, 0);
        goto repeat;

!       case USE:
        /* A USE that mentions non-volatile memory needs special
           handling since the MEM may be BLKmode which normally
           prevents an entry from being made.  Pure calls are

A whitespace change in the wrong direction.  Please don't apply this bit.

*************** hash_rtx (const_rtx x, enum machine_mode
*** 2330,2343 ****
            goto repeat;
          }

!       hash += hash_rtx (XEXP (x, i), 0, do_not_record_p,
!                         hash_arg_in_memory_p, have_reg_qty);
        break;

      case 'E':
        for (j = 0; j < XVECLEN (x, i); j++)
!         hash += hash_rtx (XVECEXP (x, i, j), 0, do_not_record_p,
!                           hash_arg_in_memory_p, have_reg_qty);
        break;

      case 's':
--- 2339,2355 ----
            goto repeat;
          }

!       hash += hash_rtx_cb (XEXP (x, i), 0, do_not_record_p,
!                           hash_arg_in_memory_p,
!                            have_reg_qty, cb);
        break;

      case 'E':
        for (j = 0; j < XVECLEN (x, i); j++)
!         hash += hash_rtx_cb (XVECEXP (x, i, j), 0,
!                             do_not_record_p,
!                              hash_arg_in_memory_p,
!                             have_reg_qty, cb);
        break;

      case 's':

The indentation seems to have gone wrong here.

*** trunk/gcc/genautomata.c     Mon Sep 17 10:03:51 2007
--- sel-sched-branch/gcc/genautomata.c  Mon Apr 14 17:13:39 2008
*************** output_min_insn_conflict_delay_func (voi
*** 8067,8079 ****
    fprintf (output_file, "}\n\n");
  }

- /* Output function `internal_insn_latency'.  */
  static void
! output_internal_insn_latency_func (void)
  {
-   decl_t decl;
-   struct bypass_decl *bypass;
    int i, j, col;
    const char *tabletype = "unsigned char";

    /* Find the smallest integer type that can hold all the default
--- 8067,8077 ----
    fprintf (output_file, "}\n\n");
  }

  static void
! output_default_latencies (void)
  {
    int i, j, col;
+   decl_t decl;
    const char *tabletype = "unsigned char";

    /* Find the smallest integer type that can hold all the default

Don't remove the comment on the function, correct it.

+   if (iter->bits)
+     goto next_bit;

This goto seems unnecessarily confusing.  A simple "break" should work here.

*** trunk/gcc/passes.c  Fri May 30 17:32:06 2008
--- sel-sched-branch/gcc/passes.c  Fri May 23 18:48:33 2008
*************** init_optimization_passes (void)
*** 770,780 ****
        NEXT_PASS (pass_split_before_regstack);
        NEXT_PASS (pass_stack_regs_run);
      }
-   NEXT_PASS (pass_compute_alignments);
    NEXT_PASS (pass_duplicate_computed_gotos);
    NEXT_PASS (pass_variable_tracking);
    NEXT_PASS (pass_free_cfg);
    NEXT_PASS (pass_machine_reorg);
    NEXT_PASS (pass_cleanup_barriers);
    NEXT_PASS (pass_delay_slots);
    NEXT_PASS (pass_split_for_shorten_branches);
--- 770,780 ----
        NEXT_PASS (pass_split_before_regstack);
        NEXT_PASS (pass_stack_regs_run);
      }
    NEXT_PASS (pass_duplicate_computed_gotos);
    NEXT_PASS (pass_variable_tracking);
    NEXT_PASS (pass_free_cfg);
    NEXT_PASS (pass_machine_reorg);
+   NEXT_PASS (pass_compute_alignments);
    NEXT_PASS (pass_cleanup_barriers);
    NEXT_PASS (pass_delay_slots);
    NEXT_PASS (pass_split_for_shorten_branches);

This looks wrong.  I don't think you can call pass_compute_alignments after calling pass_free_cfg.

*************** extern void set_curr_insn_source_locatio
*** 2305,2308 ****
--- 2327,2332 ----
  extern void set_curr_insn_block (tree);
  extern int curr_insn_locator (void);

+ #define insn_added (rtl_hooks.insn_added)
+
  #endif /* ! GCC_RTL_H */

We have a lot of #define's like this because nobody wanted to clean up the existing code.  For new code, I don't think we need to add the #define's.  Just use rtl_hooks.insn_added.

That said, I'm not sure I like insn_added very much.  It seems like a relatively fragile hook, as it will be hard to detect cases when it is used incorrectly.  Can you expand on why this is needed?
For building data structures, why does it not suffice to use get_max_uid?  What sorts of insns do you expect to see created here?

Thanks.

Ian
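The rtl_hooks mechanism Ian points to is a global struct of function pointers with a static default initializer; a pass installs its callback for the duration of the pass and restores the default afterwards. A toy model of that pattern, with names modeled loosely on the patch rather than taken from trunk:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int uid; } insn;

/* Struct-of-function-pointers hook table with a NULL default,
   mirroring the RTL_HOOKS_INSN_ADDED / RTL_HOOKS_INITIALIZER idea. */
struct rtl_hooks_t {
  void (*insn_added) (insn *);
};

#define RTL_HOOKS_INSN_ADDED NULL
struct rtl_hooks_t rtl_hooks = { RTL_HOOKS_INSN_ADDED };

static int seen_uids;
static void record_insn (insn *i) { seen_uids += i->uid; }

/* The emit-rtl side: notify whoever registered a callback, if anyone. */
static void add_insn (insn *i)
{
  if (rtl_hooks.insn_added != NULL)
    rtl_hooks.insn_added (i);
}
```

A pass would set `rtl_hooks.insn_added` on entry and reset it to NULL on exit, which is exactly what makes the hook easy to leave installed by mistake.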
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-06-11 13:40 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Hello Ian,

Thanks for reviewing the patch!

Ian Lance Taylor wrote:
> This looks wrong.  I don't think you can call pass_compute_alignments
> after calling pass_free_cfg.

On ia64 we needed to compute alignments after scheduling was done, i.e. after pass_machine_reorg.  Otherwise cfg changes messed up the alignments; for example, a loop label could move to another basic block.  Of course, on ia64 there is a cfg at that point, and that is why it worked.  I missed that this would not be the case for other targets.  What would you suggest doing instead?

> That said, I'm not sure I like insn_added very much.  It seems like a
> relatively fragile hook, as it will be hard to detect cases when it is
> used incorrectly.  Can you expand on why this is needed?  For building
> data structures, why does it not suffice to use get_max_uid?  What
> sorts of insns do you expect to see created here?

We have control over all insns we create in the scheduler, and we properly initialize data for them, except for the jumps that get created during e.g. redirect_edge_and_branch.  The hook was invented to catch these.  We could probably manage by e.g. recording get_max_uid before and after the calls to the cfgrtl functions and then passing the new insns to the initialization engine, but we need an rtx there, not a uid.  How can we get it?

Andrey
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-06-11 14:30 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> Ian Lance Taylor wrote:
>> This looks wrong.  I don't think you can call pass_compute_alignments
>> after calling pass_free_cfg.
> On ia64 we needed to compute alignments after scheduling was done,
> i.e. after pass_machine_reorg.  Otherwise cfg changes messed up the
> alignments; for example, a loop label could move to another basic
> block.  Of course, on ia64 there is a cfg at that point, and that is
> why it worked.  I missed that this would not be the case for other
> targets.  What would you suggest doing instead?

I would suggest that you have the ia64 machine_reorg pass call compute_alignments itself.  Admittedly compute_alignments will be run twice for the ia64, but it should be a fairly fast pass--it loops through all the basic blocks, but not through all the insns.

>> That said, I'm not sure I like insn_added very much.  It seems like a
>> relatively fragile hook, as it will be hard to detect cases when it is
>> used incorrectly.  Can you expand on why this is needed?  For building
>> data structures, why does it not suffice to use get_max_uid?  What
>> sorts of insns do you expect to see created here?
> We have control over all insns we create in the scheduler, and we
> properly initialize data for them, except for the jumps that get
> created during e.g. redirect_edge_and_branch.  The hook was invented
> to catch these.  We could probably manage by e.g. recording get_max_uid
> before and after the calls to the cfgrtl functions and then passing the
> new insns to the initialization engine, but we need an rtx there, not
> a uid.  How can we get it?

Unfortunately, there is no mapping from the UID to the insn.  I was thinking of, e.g., using the UID to scale array sizes.

If you look at haifa-sched.c, you'll see that it uses calls like redirect_edge_succ, generates branch insns itself, and calls extend_global (a haifa-sched.c function) to build information about the insn.  Is it reasonable for your code to work at that level?

Since you have data about all insns, don't you also need data about insns which have changed or are deleted?

Ian
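The alternative Andrey floats -- recording get_max_uid before a cfgrtl call and recovering the new insns afterwards -- amounts to walking the affected part of the insn stream and handing every insn whose uid exceeds the saved maximum to the initialization code. A toy model of that recovery step; the linked chain here is a stand-in for GCC's rtx insn list, not the real representation:

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn chain: uid plus next pointer. */
typedef struct insn_s { int uid; struct insn_s *next; } insn_t;

static int init_calls;

/* Stand-in for the scheduler's per-insn initialization routine. */
static void sel_init_new_insn (insn_t *i)
{
  (void) i;
  init_calls++;
}

/* After a cfgrtl call, pass every insn created since OLD_MAX_UID
   (i.e. with a larger uid) to the initializer.  Walking the chain
   recovers the rtx that the uid alone cannot provide. */
static void init_insns_created_after (insn_t *chain, int old_max_uid)
{
  for (insn_t *i = chain; i != NULL; i = i->next)
    if (i->uid > old_max_uid)
      sel_init_new_insn (i);
}
```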
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-06-27 13:10 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Hello Ian,

Sorry for the delayed answer -- I have just returned from traveling (including the summit).  I am now working on fixing the issues you have pointed out.

Ian Lance Taylor wrote:
> I would suggest that you have the ia64 machine_reorg pass call
> compute_alignments itself.  Admittedly compute_alignments will be run
> twice for the ia64, but it should be a fairly fast pass--it loops
> through all the basic blocks, but not through all the insns.

I will try that.

> Unfortunately, there is no mapping from the UID to the insn.  I was
> thinking of, e.g., using the UID to scale array sizes.
>
> If you look at haifa-sched.c, you'll see that it uses calls like
> redirect_edge_succ, generates branch insns itself, and calls
> extend_global (a haifa-sched.c function) to build information about
> the insn.  Is it reasonable for your code to work at that level?

That would require reimplementing e.g. split_edge and redirect_edge_and_branch inside the scheduler, so that we can see the actual insn created.  I don't think this is reasonable.

If you're uncomfortable with the idea of the hook, I can invent something along the lines of searching for the new jumps in the code and passing them to the initialization routines.  This would effectively find insns given their UIDs and the knowledge that they were created somewhere near the given point in the CFG.  I think this will not happen often enough to have a significant effect on compile time.  The hook just seemed the simpler way of doing this.

> Since you have data about all insns, don't you also need data about
> insns which have changed or are deleted?

Not quite.  We always change an insn's uid when its pattern is changed (which also does not happen very often).  The dependence caches used for on-the-fly analysis rely on this, as they use UIDs as keys.  Overall, the data is kept valid only for insns that are actually in the insn stream, as we only either collect them as possible scheduling candidates or propagate through them.  The data for deleted insns remains in the array and gets freed after the current region has been scheduled.

Andrey
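The invariant described here -- per-insn data keyed by uid, with any pattern change assigning a fresh uid -- guarantees that a stale cache entry is never read back for a modified insn. A miniature model of that scheme; all names and the representation are ours, not the scheduler's:

```c
#include <assert.h>

#define MAX_UID 64

typedef struct { int uid; int pattern; } insn;

static int dep_cache[MAX_UID];  /* keyed by uid; 0 means "no entry" */
static int next_uid = 1;

static insn make_insn (int pattern)
{
  insn i = { next_uid++, pattern };
  return i;
}

static void cache_deps (const insn *i, int deps) { dep_cache[i->uid] = deps; }
static int  cached_deps (const insn *i)          { return dep_cache[i->uid]; }

/* Changing the pattern renames the insn, which implicitly invalidates
   whatever the cache held under the old uid. */
static void change_pattern (insn *i, int pattern)
{
  i->pattern = pattern;
  i->uid = next_uid++;
}
```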
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-06-30 16:16 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> If you're uncomfortable with the idea of the hook, I can invent
> something along the lines of searching for the new jumps in the code
> and passing them to the initialization routines.  This would
> effectively find insns given their UIDs and the knowledge that they
> were created somewhere near the given point in the CFG.  I think this
> will not happen often enough to have a significant effect on compile
> time.  The hook just seemed the simpler way of doing this.

Well, I'm uncomfortable with the idea of the hook.  I wouldn't necessarily mind a complete hook interface.  But the one you've implemented seems sort of ad hoc and easy to get wrong.  We don't currently have any way for a pass to clearly track every change to the RTL insn stream.  If we need that, I think we should do it for real.

Ian
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-07-08 14:54 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Hello,

Ian Lance Taylor wrote:
> Well, I'm uncomfortable with the idea of the hook.  I wouldn't
> necessarily mind a complete hook interface.  But the one you've
> implemented seems sort of ad hoc and easy to get wrong.  We don't
> currently have any way for a pass to clearly track every change to the
> RTL insn stream.  If we need that, I think we should do it for real.

I have looked closely at the places where jumps are generated by cfgrtl.c.  There are only two of them, one in force_fallthru_and_redirect and one in try_redirect_by_replacing_jump, and all our uses of split_edge and redirect_edge_and_branch lead to these places.  What if I add an interface to register/unregister a hook that would give notice of new jumps created by those functions?  This way, the changes in the scheduler will be minimal, and the hook itself would be much safer.  I can make it a general cfg hook if desired, but I doubt that tree cfg or cfglayout would use it.

Andrey
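The register/unregister interface floated here could be as small as a list of callbacks that the two cfgrtl jump-creation sites run for each jump they emit. The sketch below is purely illustrative -- no such interface exists in the tree being patched, and all names are invented:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int uid; } insn;
typedef void (*new_jump_fn) (insn *);

#define MAX_LISTENERS 4
static new_jump_fn listeners[MAX_LISTENERS];

/* Register FN; return its slot, or -1 if the table is full. */
static int register_new_jump_hook (new_jump_fn fn)
{
  for (int i = 0; i < MAX_LISTENERS; i++)
    if (listeners[i] == NULL)
      {
        listeners[i] = fn;
        return i;
      }
  return -1;
}

static void unregister_new_jump_hook (int slot) { listeners[slot] = NULL; }

/* Called by the jump-creation sites (force_fallthru_and_redirect,
   try_redirect_by_replacing_jump in this proposal) for each new jump. */
static void notify_new_jump (insn *jump)
{
  for (int i = 0; i < MAX_LISTENERS; i++)
    if (listeners[i] != NULL)
      listeners[i] (jump);
}

static int jumps_seen;
static void count_jump (insn *j) { (void) j; jumps_seen++; }
```

Explicit registration makes the hook's lifetime visible at the call sites, which is what would make it safer than a silently installed global.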
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-07-08 15:29 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> I have looked closely at the places where jumps are generated by
> cfgrtl.c.  There are only two of them, one in
> force_fallthru_and_redirect and one in try_redirect_by_replacing_jump,
> and all our uses of split_edge and redirect_edge_and_branch lead to
> these places.  What if I add an interface to register/unregister a
> hook that would give notice of new jumps created by those functions?
> This way, the changes in the scheduler will be minimal, and the hook
> itself would be much safer.  I can make it a general cfg hook if
> desired, but I doubt that tree cfg or cfglayout would use it.

I really think that Steven's suggestion of using cfglayout mode is correct.

Ian
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-08-22 15:55 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 3175 bytes --]

Hello Ian,

Here is the updated patch, which addresses all your comments and has the RTL hooks removed.  As I wrote in a separate email (http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01052.html), I apologize for making too much noise about the RTL hooks problem -- with Zdenek's suggestion, it is actually solved very easily.

Compared to the original patch, the only addition is in final.c; it fixes a FAIL on ia64 with the dump-addr.c test case.  When the dump file is used, dominator information does not get freed in compute_alignments, resulting in an ICE in verify_dominators later in selective scheduling.  The addition itself was committed to the branch at http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01633.html.

Thanks again for your review,
Andrey

2008-08-22  Andrey Belevantsev  <abel@ispras.ru>
            Dmitry Melnik  <dm@ispras.ru>
            Dmitry Zhurikhin  <zhur@ispras.ru>
            Alexander Monakov  <amonakov@ispras.ru>
            Maxim Kuvyrkov  <maxim@codesourcery.com>

        * cfghooks.h (get_cfg_hooks, set_cfg_hooks): New prototypes.
        * cfghooks.c (get_cfg_hooks, set_cfg_hooks): New functions.
        (make_forwarder_block): Update loop latch if we have redirected
        the loop latch edge.
        * cfgloop.c (get_loop_body_in_custom_order): New function.
        * cfgloop.h (LOOPS_HAVE_FALLTHRU_PREHEADERS): New enum field.
        (CP_FALLTHRU_PREHEADERS): Likewise.
        (get_loop_body_in_custom_order): Declare.
        * cfgloopmanip.c (has_preds_from_loop): New.
        (create_preheader): Honor CP_FALLTHRU_PREHEADERS.
        Assert that the preheader edge will be fall thru when it is set.
        * cse.c (hash_rtx_cb): New.
        (hash_rtx): Use it.
        * final.c (compute_alignments): Export.  Free dominance info
        after loop_optimizer_finalize.
        * genattr.c (main): Output maximal_insn_latency prototype.
        * genautomata.c (output_default_latencies): New.  Factor its
        code from ...
        (output_internal_insn_latency_func): ... here.
        (output_internal_maximal_insn_latency_func): New.
        (output_maximal_insn_latency_func): New.
        * hard-reg-set.h (UHOST_BITS_PER_WIDE_INT): Define unconditionally.
        (struct hard_reg_set_iterator): New.
        (hard_reg_set_iter_init, hard_reg_set_iter_set,
        hard_reg_set_iter_next): New functions.
        (EXECUTE_IF_SET_IN_HARD_REG_SET): New macro.
        * lists.c (remove_free_INSN_LIST_node,
        remove_free_EXPR_LIST_node): New functions.
        * loop-init.c (loop_optimizer_init): When
        LOOPS_HAVE_FALLTHRU_PREHEADERS, set CP_FALLTHRU_PREHEADERS when
        calling create_preheaders.
        (loop_optimizer_finalize): Do not verify flow info after reload.
        * recog.c (validate_replace_rtx_1): New parameter simplify.
        Default it to true.  Update all uses.  Factor out simplifying
        code to ...
        (simplify_while_replacing): ... this new function.
        (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): New.
        * recog.h (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): Declare.
        * rtl.c (rtx_equal_p_cb): New.
        (rtx_equal_p): Use it.
        * rtl.h (rtx_equal_p_cb, hash_rtx_cb): Declare.
        (remove_free_INSN_LIST_NODE, remove_free_EXPR_LIST_node,
        debug_bb_n_slim, debug_bb_slim, print_rtl_slim): Likewise.
        * vecprim.h: Add a vector type for unsigned int.

[-- Attachment #2: sel-sched-middle.diff.gz --]
[-- Type: application/gzip, Size: 26761 bytes --]
* Selective scheduling pass - scheduler changes [2/3]
From: Andrey Belevantsev @ 2008-06-03 14:27 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks

[-- Attachment #1: Type: text/plain, Size: 15469 bytes --]

Hello,

This patch is the largest part of the implementation, and it contains the changes to the scheduler itself as well as the new files.  The main changes in the scheduler fall into two categories: changes to the dependence analysis and changes to the initialization mechanism.

The dependence analysis changes are as follows:

o new hooks are introduced so that the schedulers can perform actions when a dependence is found, either a memory or a register one.  This is needed for on-the-fly dependence analysis, which we have to do because it is unclear how we could keep the dependence graph up to date in the presence of control dependencies and register renaming.  I think a project for creating a proper dependence graph, e.g. one distinguishing between control and data dependencies and storing the origin of each dependence, is needed to make this possible.

o readonly dependence contexts: the analysis generates all the proper dependencies but does not change the context it is based on.  This is needed to speed up the on-the-fly analysis.

o sched_get_condition is rewritten so that it does not generate rtx garbage every time it needs to canonicalize a condition.
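The readonly-context idea above can be pictured with a tiny model: the analyzer keeps the last writer of each register in its context and, when the readonly flag is set, still reports dependences against that context but never updates it, so one context can be probed by many candidate insns. Everything below is a simplification of ours, not the sched-deps representation:

```c
#include <assert.h>
#include <stdbool.h>

#define N_REGS 8

typedef struct {
  int last_writer[N_REGS];  /* uid of the last insn writing each reg; 0 = none */
  bool readonly;            /* report dependences, but do not mutate */
} deps_ctx;

static int deps_found;
static void note_dep (int producer, int consumer)
{
  (void) producer; (void) consumer;
  deps_found++;
}

/* Analyze an insn UID that reads register SRC and writes register DEST. */
static void analyze_insn (deps_ctx *ctx, int uid, int src, int dest)
{
  if (ctx->last_writer[src] != 0)
    note_dep (ctx->last_writer[src], uid);   /* true dependence */
  if (ctx->last_writer[dest] != 0)
    note_dep (ctx->last_writer[dest], uid);  /* output dependence */
  if (!ctx->readonly)
    ctx->last_writer[dest] = uid;            /* only mutate when allowed */
}
```

The speedup comes from reuse: a context built once at a program point can be consulted for every candidate insn considered there, instead of being copied per candidate.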
The initialization changes can be described as an effort to separate common scheduler data, private data, and dependence analysis data, and to create a uniform mechanism for initializing per-insn and per-basic-block data.  The former is achieved by factoring the common/private/deps data and the code initializing it into separate structures and functions.  For example, the functions sched_init/sched_finish initialize/finalize the common part (df/aliases/dfa/etc), haifa_sched_{init,finish} work with the haifa data, and so on.  The latter is achieved via the sched_scan interface, which allows scanning an arbitrary array of insns/bbs and calling the user-defined hooks {init,extend}_{insn,bb} on each of them.  As a followup, the common part of the code, which is now in haifa-sched.c, can be moved to a new sched-common.c file.  We did not do that for the sake of an easier merge process.

The overview of the selective scheduler implementation is given at the beginning of the sel-sched.c file.  There are also papers in the GCC Summit proceedings of 2006 and 2007.  The sel-sched.c file contains the main scheduling routines, and the sel-sched-ir.c file contains lower-level routines that manipulate the scheduler IR and data structures.  There are also the sel-sched-dump.[ch] files, which contain the dumping infrastructure for the scheduler and some code that eases debugging.  Some of the code may be omitted from the merge if we find it too specific.

OK for trunk?

Andrey

2008-06-03  Andrey Belevantsev  <abel@ispras.ru>
            Dmitry Melnik  <dm@ispras.ru>
            Dmitry Zhurikhin  <zhur@ispras.ru>
            Alexander Monakov  <amonakov@ispras.ru>
            Maxim Kuvyrkov  <maxim@codesourcery.com>

        * sel-sched.h, sel-sched-dump.h, sel-sched-ir.h, sel-sched.c,
        sel-sched-dump.c, sel-sched-ir.c: New files.
        * Makefile.in (OBJS-common): Add selective scheduling object
        files.
        (sel-sched.o, sel-sched-dump.o, sel-sched-ir.o): New entries.
        (SEL_SCHED_IR_H, SEL_SCHED_DUMP_H): New entries.
* common.opt (fsel-sched-bookkeeping, fsel-sched-pipelining, fsel-sched-pipelining-outer-loops, fsel-sched-renaming, fsel-sched-substitution, fselective-scheduling): New flags. * haifa-sched.c: Include vecprim.h. (issue_rate, sched_verbose_param, note_list, dfa_state_size, ready_try, cycle_issued_insns, dfa_lookahead, max_luid, spec_info): Make global. (old_max_uid, old_last_basic_block): Remove. (h_i_d): Make it a vector. (INSN_TICK, INTER_TICK, QUEUE_INDEX, INSN_COST): Make them work through HID macro. (after_recovery, adding_bb_to_current_region_p): New variables to handle correct insertion of the recovery code. (struct ready_list): Move declaration to sched-int.h. (rgn_n_insns): Removed. (rtx_vec_t): Move to sched-int.h. (find_insn_reg_weight): Remove. (find_insn_reg_weight1): Rename to find_insn_reg_weight. (extend_h_i_d, init_h_i_d, haifa_init_h_i_d, haifa_finish_h_i_d): New functions to initialize / finalize haifa instruction data. (dep_weak): Move to sched-deps.c. Rename to ds_weak. (unlink_other_notes): Move logic to add_to_note_list. Handle selective scheduler. (ready_lastpos, ready_element, ready_sort, reemit_notes, move_insn, find_fallthru_edge): Make global, remove static prototypes. (max_issue): Add privileged_n and state parameters. Use them. (extend_global, extend_all): Removed. (init_before_recovery): Add new param. Fix the handling of the case when we insert a recovery code before the EXIT which has a predecessor with a fallthrough edge to it. (create_recovery_block): Make global. Rename to sched_create_recovery_block. Update. (change_pattern): Rename to sched_change_pattern. Make global. (speculate_insn): Rename to sched_speculate_insn. Make global. Split haifa-specific functionality into ... (haifa_change_pattern): New static function. (sched_extend_bb, sched_init_bb): New static functions. (sched_extend_bb): Add the prototype. (current_sched_info): Change type to ... (struct haifa_sched_info): ... this. New structure. 
Move Haifa-specific fields from struct sched_info. (insn_cost): Adjust for selective scheduling. (dep_cost_1): New static function. Prototype it. Move logic from ... (insn_cost1): ... here. (dep_cost): Use dep_cost_1. (priority): Adjust to work with selective scheduling. Use sched_deps_info instead of current_sched_info. Process the corner case when all dependencies don't contribute to priority. (rank_for_schedule): Use ds_weak instead of dep_weak. (advance_state): New function. Move logic from ... (advance_one_cycle): ... here. (add_to_note_list, concat_note_lists): New functions. (rm_other_notes): Make static. Adjust for selective scheduling. (remove_notes, restore_other_notes): New functions. (move_insn): Don't call reemit_notes. (choose_ready): Remove lookahead variable, use dfa_lookahead. Remove more_issue, max_points. Move the code to initialize max_lookahead_tries to max_issue. (schedule_block): Remove rgn_n_insns1 parameter. Don't allocate ready. Adjust uses of move_insn. Call restore_other_notes. (luid): Remove. (sched_init, sched_finish): Move Haifa-specific initialization/ finalization to ... (haifa_sched_init, haifa_sched_finish): ... respectively. New functions. (setup_sched_dump): New function. (haifa_init_only_bb): New static function. (haifa_speculate_insn): New static function. (try_ready): Use haifa_* instead of speculate_insn and change_pattern. (extend_ready, extend_all): Remove. (sched_extend_ready_list, sched_finish_ready_list): New functions. (create_check_block_twin, add_to_speculative_block): Use haifa_insns_init instead of extend_global. Update to use new initialization functions. Change parameter. (add_block): Remove. (sched_scan_info): New. (extend_bb, init_bb, extend_insn, init_insn, init_insns_in_bb, sched_scan): New static functions for walking through scheduling region. (sched_init_bbs): New functions to init / finalize basic block information. (sched_luids): New vector variable to replace uid_to_luid. 
(luids_extend_insn): New function. (sched_max_luid): New variable. (luids_init_insn): New function. (sched_init_luids, sched_finish_luids): New functions. (insn_luid): New debug function. (sched_extend_target): New function. (haifa_init_insn): New static function. (sched_init_only_bb): New hook. (sched_split_block): New hook. (sched_split_block_1): New function. (sched_create_empty_bb): New hook. (sched_create_empty_bb_1): New function. (common_sched_info, ready): New global variables. (current_sched_info_var): Remove. (move_block_after_check): Use common_sched_info. (haifa_luid_for_non_insn): New static function. (init_before_recovery): Use haifa_init_only_bb instead of add_block. * modulo-sched.c: (sms_sched_info): Rename to sms_common_sched_info. (sms_sched_deps_info, sms_sched_info): New. (setup_sched_infos): New. (sms_schedule): Initialize them. Call haifa_sched_init/finish. Do not call regstat_free_calls_crossed, as it called by sched_init. (sms_print_insn): Use const_rtx. * params.def (PARAM_MAX_PIPELINE_REGION_BLOCKS, PARAM_MAX_PIPELINE_REGION_INSNS, PARAM_SELSCHED_MAX_LOOKAHEAD, PARAM_SELSCHED_MAX_SCHED_TIMES, PARAM_SELSCHED_INSNS_TO_RENAME): New. * sched-deps.c (sched_deps_info): New. Update all relevant uses of current_sched_info to use it. (enum reg_pending_barrier_mode): Move to sched-int.h. (h_d_i_d): New variable. Initialize to NULL. ({true, output, anti, spec, forward}_dependency_cache): Initialize to NULL. (sched_has_condition_p): New function. Adjust users of sched_get_condition to use it instead. (conditions_mutex_p): Add arguments indicating which conditions are reversed. Use them. (sched_get_condition_with_rev): Rename from sched_get_condition. Add argument to indicate whether returned condition is reversed. Do not generate new rtx when condition should be reversed; indicate it by setting new argument instead. (add_dependence_list_and_free): Add deps parameter. Update all users. Do not free dependence list when deps context is readonly. 
(add_insn_mem_dependence, flush_pending_lists): Adjust for readonly contexts. (remove_from_dependence_list, remove_from_both_dependence_lists): New. (remove_from_deps): New. Use the above functions. (deps_analyze_insn): Do not flush pending write lists on speculation checks. Do not make speculation check a scheduling barrier for memory references. (cur_max_luid, cur_insn, can_start_lhs_rhs_p): New static variables. (add_or_update_back_dep_1): Initialize present_dep_type. (haifa_start_insn, haifa_finish_insn, haifa_note_reg_set, haifa_note_reg_clobber, haifa_note_reg_use, haifa_note_mem_dep, haifa_note_dep): New functions implementing dependence hooks for the Haifa scheduler. (note_reg_use, note_reg_set, note_reg_clobber, note_mem_dep, note_dep): New functions. (ds_to_dt): New function. (sched_analyze_reg, sched_analyze_1, sched_analyze_2, sched_analyze_insn): Update to use dependency hooks infrastructure and readonly contexts. (deps_analyze_insn): New function. Move part of logic from ... (sched_analyze): ... here. Also move some logic to ... (deps_start_bb): ... here. New function. (add_forw_dep, delete_forw_dep): Guard use of INSN_DEP_COUNT with sel_sched_p. (sched_deps_init): New function. Move code from ... (init_dependency_caches): ... here. (init_deps_data_vector): New. (sched_deps_finish): New function. Move code from ... (free_dependency_caches): ... here. (init_deps_global, finish_deps_global): Adjust for use with selective scheduling. (get_dep_weak): Move logic to ... (get_dep_weak_1): New function. (ds_merge): Move logic to ... (ds_merge_1): New static function. (ds_full_merge, ds_max_merge, ds_get_speculation_types): New functions. (ds_get_max_dep_weak): New function. * sched-ebb.c (sched_n_insns): Rename to sched_rgn_n_insns. (n_insns): Rename to rgn_n_insns. (debug_ebb_dependencies): New function. (init_ready_list): Use it. (ebb_print_insn): Indicate when an insn starts a new cycle. 
(contributes_to_priority, compute_jump_reg_dependencies, add_remove_insn, fix_recovery_cfg): Add ebb_ prefix to function names. (ebb_sched_deps_info, ebb_common_sched_info): New variables. (schedule_ebb): Initialize them. Use remove_notes instead of rm_other_notes. Use haifa_local_init/finish. (schedule_ebbs): Use haifa_sched_init/finish. * sched-int.h: Include basic-block.h and vecprim.h. (sched_verbose_param, enum sched_pass_id_t, bb_vec_t, insn_vec_t, rtx_vec_t): New. (struct sched_scan_info_def): New structure. (sched_scan_info, sched_scan, sched_init_bbs, sched_init_luids, sched_finish_luids, sched_extend_target, haifa_init_h_i_d, haifa_finish_h_i_d): Declare. (struct common_sched_info_def): New. (common_sched_info, haifa_common_sched_info, sched_emulate_haifa_p): Declare. (sel_sched_p): New. (sched_luids): Declare. (INSN_LUID, LUID_BY_UID, SET_INSN_LUID): Declare. (sched_max_luid, insn_luid): Declare. (note_list, remove_notes, restore_other_notes, bb_note): Declare. (sched_insns_init, sched_insns_finish, xrecalloc, move_insn, reemit_notes, print_insn, print_pattern, print_value, haifa_classify_insn, sel_find_rgns, sel_mark_hard_insn, dfa_state_size, advance_state, setup_sched_dump, sched_init, sched_finish, sel_insn_is_speculation_check): Export. (struct ready_list): Move from haifa-sched.c. (ready_try, ready, max_issue): Export. (find_fallthru_edge, sched_init_only_bb, sched_split_block, sched_split_block_1, sched_create_empty_bb, sched_create_empty_bb_1, sched_create_recovery_block, sched_create_recovery_edges): Export. (enum reg_pending_barrier_mode): Export. (struct deps): New fields `last_reg_pending_barrier' and `readonly'. (deps_t): New. (struct sched_info): Move compute_jump_reg_dependencies, use_cselib ... (struct haifa_insn_data): and cant_move to ... (struct sched_deps_info_def): ... this new structure. (h_i_d): Export. (HID): New accessor macro. Rewrite h_i_d accessor macros through HID. (struct region): Move from sched-rgn.h. 
(nr_regions, rgn_table, rgn_bb_table, block_to_bb, containing_rgn, RGN_NR_BLOCKS, RGN_BLOCKS, RGN_DONT_CALC_DEPS, RGN_HAS_REAL_EBB, BLOCK_TO_BB, CONTAINING_RGN): Export. (ebb_head, BB_TO_BLOCK, EBB_FIRST_BB, EBB_LAST_BB, INSN_BB): Likewise. (current_nr_blocks, current_blocks, target_bb): Likewise. (sched_is_disabled_for_current_region_p, sched_rgn_init, sched_rgn_finish, rgn_setup_region, sched_rgn_compute_dependencies, sched_rgn_local_init, extend_regions, rgn_make_new_region_out_of_new_block, compute_priorities, debug_rgn_dependencies, free_rgn_deps, contributes_to_priority, extend_rgns, deps_join rgn_setup_common_sched_info, rgn_setup_sched_infos, debug_regions, debug_region, dump_region_dot, dump_region_dot_file, haifa_sched_init, haifa_sched_finish): Export. * sched-rgn.c: Export region data structures. (debug_region, bb_in_region_p, dump_region_dot_file, dump_region_dot): New. (too_large): Use estimate_number_of_insns. (haifa_find_rgns): New. Move the code from ... (find_rgns): ... here. Call either sel_find_rgns or haifa_find_rgns. (free_trg_info): New. (compute_trg_info): Allocate candidate tables here instead of ... (init_ready_list): ... here. (rgn_common_sched_info, rgn_const_sched_deps_info, rgn_const_sel_sched_deps_info, rgn_sched_deps_info): New. (deps_join): New, extracted from ... (propagate_deps): ... here. (free_rgn_deps, compute_priorities): New function. (sched_rgn_init, sched_rgn_finish): New functions. (schedule_region): Use them. (sched_rgn_local_preinit, sched_rgn_local_init, sched_rgn_local_free, sched_rgn_local_finish): New functions. (rgn_make_new_region_out_of_new_block): New. * sched-vis.c (print_value, print_pattern): Make global. (dump_insn_slim_1, print_rtl_slim, debug_bb_slim, debug_bb_n_slim): New functions. * target-def.h (TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT): New target hooks. Initialize them to 0. 
* target.h (struct gcc_target): Add them. * doc/invoke.texi: Document new flags and parameters. * doc/tm.texi: Document new target hooks. [-- Attachment #2: sel-sched-merge-sched.diff.gz --] [-- Type: application/gzip, Size: 198019 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
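The sched_scan mechanism from the cover text of this patch (a uniform walk over arbitrary arrays of insns/bbs that calls user-registered {init,extend}_{insn,bb} hooks) can be sketched as follows. This is a simplified model under invented types, assuming integer ids stand in for rtx insns and basic blocks; only the hook-table shape mirrors the patch's sched_scan_info_def:

```c
#include <assert.h>
#include <stddef.h>

/* User hooks, any of which may be NULL: "extend" grows the per-object
   data arrays once, "init" fills in the entry for one object.  */
struct sched_scan_info_def {
  void (*extend_bb)(void);
  void (*init_bb)(int bb);
  void (*extend_insn)(void);
  void (*init_insn)(int insn);
};

/* Walk arbitrary arrays of basic blocks and insns, invoking whichever
   hooks the current scheduler variant registered.  */
static void sched_scan(const struct sched_scan_info_def *ssi,
                       const int *bbs, int nbbs,
                       const int *insns, int ninsns) {
  if (ssi->extend_bb) ssi->extend_bb();
  if (ssi->init_bb)
    for (int i = 0; i < nbbs; i++) ssi->init_bb(bbs[i]);
  if (ssi->extend_insn) ssi->extend_insn();
  if (ssi->init_insn)
    for (int i = 0; i < ninsns; i++) ssi->init_insn(insns[i]);
}

static int luid[16];
static int next_luid;
static void my_extend_insn(void) { next_luid = 1; }  /* luid 0 reserved */
static void my_init_insn(int insn) { luid[insn] = next_luid++; }

/* Assign luids to three insns; the bb hooks are simply not registered.  */
int demo(void) {
  struct sched_scan_info_def ssi = { NULL, NULL, my_extend_insn, my_init_insn };
  int insns[] = { 4, 7, 2 };
  sched_scan(&ssi, NULL, 0, insns, 3);
  return luid[4] * 100 + luid[7] * 10 + luid[2];
}
```

The point of the indirection is that luid assignment, haifa data, and selective-scheduler data can each register their own hooks and reuse one scanning loop instead of duplicating the region walk.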
* Re: Selective scheduling pass - scheduler changes [2/3] 2008-06-03 14:27 ` Selective scheduling pass - scheduler changes [2/3] Andrey Belevantsev @ 2008-06-03 22:03 ` Vladimir Makarov 2008-08-22 15:52 ` Andrey Belevantsev 0 siblings, 1 reply; 28+ messages in thread From: Vladimir Makarov @ 2008-06-03 22:03 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Ayal Zaks Andrey Belevantsev wrote: > Hello, > > This patch is the largest part of the implementation, and it shows > changes to the scheduler itself as well as the new files. The main > changes in the scheduler fall into two categories: changes to the > dependence analysis and changes to the initialization mechanism. I'll look at the patch. But taking the size of the patch, the review probably will take several weeks. > > > OK for trunk? > Andrey > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - scheduler changes [2/3] 2008-06-03 22:03 ` Vladimir Makarov @ 2008-08-22 15:52 ` Andrey Belevantsev 0 siblings, 0 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-08-22 15:52 UTC (permalink / raw) To: Vladimir Makarov; +Cc: GCC Patches, Jim Wilson, Ayal Zaks [-- Attachment #1: Type: text/plain, Size: 691 bytes --] Vladimir Makarov wrote: > Andrey Belevantsev wrote: >> Hello, >> >> This patch is the largest part of the implementation, and it shows >> changes to the scheduler itself as well as the new files. The main >> changes in the scheduler fall into two categories: changes to the >> dependence analysis and changes to the initialization mechanism. > > I'll look at the patch. But taking the size of the patch, the review > probably will take several weeks. Thanks again for the review, Vlad. For the record, here is the updated patch against trunk version 139129 incorporating all your suggestions from the review to which I answered with separate patches to the branch. Yours, Andrey [-- Attachment #2: sel-sched.changelog --] [-- Type: text/plain, Size: 16839 bytes --] 2008-08-22 Andrey Belevantsev <abel@ispras.ru> Dmitry Melnik <dm@ispras.ru> Dmitry Zhurikhin <zhur@ispras.ru> Alexander Monakov <amonakov@ispras.ru> Maxim Kuvyrkov <maxim@codesourcery.com> * sel-sched.h, sel-sched-dump.h, sel-sched-ir.h, sel-sched.c, sel-sched-dump.c, sel-sched-ir.c: New files. * Makefile.in (OBJS-common): Add selective scheduling object files. (sel-sched.o, sel-sched-dump.o, sel-sched-ir.o): New entries. (SEL_SCHED_IR_H, SEL_SCHED_DUMP_H): New entries. (sched-vis.o): Add dependency on $(INSN_ATTR_H). * common.opt (fsel-sched-bookkeeping, fsel-sched-pipelining, fsel-sched-pipelining-outer-loops, fsel-sched-renaming, fsel-sched-substitution, fselective-scheduling): New flags. * haifa-sched.c: Include vecprim.h and cfgloop.h. (issue_rate, sched_verbose_param, note_list, dfa_state_size, ready_try, cycle_issued_insns, spec_info): Make global. 
(readyp): Initialize. (dfa_lookahead): New global variable. (old_max_uid, old_last_basic_block): Remove. (h_i_d): Make it a vector. (INSN_TICK, INTER_TICK, QUEUE_INDEX, INSN_COST): Make them work through HID macro. (after_recovery, adding_bb_to_current_region_p): New variables to handle correct insertion of the recovery code. (struct ready_list): Move declaration to sched-int.h. (rgn_n_insns): Removed. (rtx_vec_t): Move to sched-int.h. (find_insn_reg_weight): Remove. (find_insn_reg_weight1): Rename to find_insn_reg_weight. (haifa_init_h_i_d, haifa_finish_h_i_d): New functions to initialize / finalize haifa instruction data. (extend_h_i_d, init_h_i_d): Rewrite. (unlink_other_notes): Move logic to add_to_note_list. Handle selective scheduler. (ready_lastpos, ready_element, ready_sort, reemit_notes, find_fallthru_edge): Make global, remove static prototypes. (max_issue): Make global. Add privileged_n and state parameters. Use them. (extend_global, extend_all): Removed. (init_before_recovery): Add new param. Fix the handling of the case when we insert a recovery code before the EXIT which has a predecessor with a fallthrough edge to it. (create_recovery_block): Make global. Rename to sched_create_recovery_block. Update. (change_pattern): Rename to sched_change_pattern. Make global. (speculate_insn): Rename to sched_speculate_insn. Make global. Split haifa-specific functionality into ... (haifa_change_pattern): New static function. (sched_extend_bb): New static function. (sched_init_bbs): New function. (current_sched_info): Change type to struct haifa_sched_info. (insn_cost): Adjust for selective scheduling. (dep_cost_1): New function. Move logic from ... (dep_cost): ... here. (dep_cost): Use dep_cost_1. (contributes_to_priority_p): Use sched_deps_info instead of current_sched_info. (priority): Adjust to work with selective scheduling. Process the corner case when all dependencies don't contribute to priority. (rank_for_schedule): Use ds_weak instead of dep_weak. 
(advance_state): New function. Move logic from ... (advance_one_cycle): ... here. (add_to_note_list, concat_note_lists): New functions. (rm_other_notes): Make static. Adjust for selective scheduling. (remove_notes, restore_other_notes): New functions. (move_insn): Add two arguments. Update assert. Don't call reemit_notes. (choose_ready): Remove lookahead variable, use dfa_lookahead. Remove more_issue, max_points. Move the code to initialize max_lookahead_tries to max_issue. (schedule_block): Remove rgn_n_insns1 parameter. Don't allocate ready. Adjust use of move_insn. Call restore_other_notes. (luid): Remove. (sched_init, sched_finish): Move Haifa-specific initialization/ finalization to ... (haifa_sched_init, haifa_sched_finish): ... respectively. New functions. (setup_sched_dump): New function. (haifa_init_only_bb): New static function. (haifa_speculate_insn): New static function. (try_ready): Use haifa_* instead of speculate_insn and change_pattern. (extend_ready, extend_all): Remove. (sched_extend_ready_list, sched_finish_ready_list): New functions. (create_check_block_twin, add_to_speculative_block): Use haifa_insns_init instead of extend_global. Update to use new initialization functions. Change parameter. Factor out code from create_check_block_twin to ... (sched_create_recovery_edges) ... this new function. (add_block): Remove. (sched_scan_info): New. (extend_bb): Use sched_scan_info. (init_bb, extend_insn, init_insn, init_insns_in_bb, sched_scan): New static functions for walking through scheduling region. (sched_luids): New vector variable to replace uid_to_luid. (luids_extend_insn): New function. (sched_max_luid): New variable. (luids_init_insn): New function. (sched_init_luids, sched_finish_luids): New functions. (insn_luid): New debug function. (sched_extend_target): New function. (haifa_init_insn): New static function. (sched_init_only_bb): New hook. (sched_split_block): New hook. (sched_split_block_1): New function. (sched_create_empty_bb): New hook. 
(sched_create_empty_bb_1): New function. (common_sched_info, ready): New global variables. (current_sched_info_var): Remove. (move_block_after_check): Use common_sched_info. (haifa_luid_for_non_insn): New static function. (init_before_recovery): Use haifa_init_only_bb instead of add_block. * modulo-sched.c: (issue_rate): Remove static declaration. (sms_sched_info): Change type to haifa_sched_info. (sms_sched_deps_info, sms_common_sched_info): New variables. (setup_sched_infos): New. (sms_schedule): Initialize them. Call haifa_sched_init/finish. Do not call regstat_free_calls_crossed. (sms_print_insn): Use const_rtx. * params.def (PARAM_MAX_PIPELINE_REGION_BLOCKS, PARAM_MAX_PIPELINE_REGION_INSNS, PARAM_SELSCHED_MAX_LOOKAHEAD, PARAM_SELSCHED_MAX_SCHED_TIMES, PARAM_SELSCHED_INSNS_TO_RENAME, PARAM_SCHED_MEM_TRUE_DEP_COST): New. * sched-deps.c (sched_deps_info): New. Update all relevant uses of current_sched_info to use it. (enum reg_pending_barrier_mode): Move to sched-int.h. (h_d_i_d): New variable. Initialize to NULL. ({true, output, anti, spec, forward}_dependency_cache): Initialize to NULL. (estimate_dep_weak): Remove static declaration. (sched_has_condition_p): New function. Adjust users of sched_get_condition to use it instead. (conditions_mutex_p): Add arguments indicating which conditions are reversed. Use them. (sched_get_condition_with_rev): Rename from sched_get_condition. Add argument to indicate whether returned condition is reversed. Do not generate new rtx when condition should be reversed; indicate it by setting new argument instead. (add_dependence_list_and_free): Add deps parameter. Update all users. Do not free dependence list when deps context is readonly. (add_insn_mem_dependence, flush_pending_lists): Adjust for readonly contexts. (remove_from_dependence_list, remove_from_both_dependence_lists): New. (remove_from_deps): New. Use the above functions. (cur_insn, can_start_lhs_rhs_p): New static variables. 
(add_or_update_back_dep_1): Initialize present_dep_type. (haifa_start_insn, haifa_finish_insn, haifa_note_reg_set, haifa_note_reg_clobber, haifa_note_reg_use, haifa_note_mem_dep, haifa_note_dep): New functions implementing dependence hooks for the Haifa scheduler. (note_reg_use, note_reg_set, note_reg_clobber, note_mem_dep, note_dep): New functions. (ds_to_dt, extend_deps_reg_info, maybe_extend_reg_info_p): New functions. (init_deps): Initialize last_reg_pending_barrier and deps->readonly. (free_deps): Initialize deps->reg_last. (sched_analyze_reg, sched_analyze_1, sched_analyze_2, sched_analyze_insn): Update to use dependency hooks infrastructure and readonly contexts. (deps_analyze_insn): New function. Move part of logic from ... (sched_analyze): ... here. Also move some logic to ... (deps_start_bb): ... here. New function. (add_forw_dep, delete_forw_dep): Guard use of INSN_DEP_COUNT with sel_sched_p. (sched_deps_init): New function. Move code from ... (init_dependency_caches): ... here. Remove. (init_deps_data_vector): New. (sched_deps_finish): New function. Move code from ... (free_dependency_caches): ... here. Remove. (init_deps_global, finish_deps_global): Adjust for use with selective scheduling. (get_dep_weak): Move logic to ... (get_dep_weak_1): New function. (ds_merge): Move logic to ... (ds_merge_1): New static function. (ds_full_merge, ds_max_merge, ds_get_speculation_types): New functions. (ds_get_max_dep_weak): New function. * sched-ebb.c (sched_n_insns): Rename to sched_rgn_n_insns. (n_insns): Rename to rgn_n_insns. (debug_ebb_dependencies): New function. (init_ready_list): Use it. (begin_schedule_ready): Use sched_init_only_bb. (ebb_print_insn): Indicate when an insn starts a new cycle. (contributes_to_priority, compute_jump_reg_dependencies, add_remove_insn, fix_recovery_cfg): Add ebb_ prefix to function names. (add_block1): Remove to ebb_add_block. (ebb_sched_deps_info, ebb_common_sched_info): New variables. (schedule_ebb): Initialize them. 
Use remove_notes instead of rm_other_notes. Use haifa_local_init/finish. (schedule_ebbs): Use haifa_sched_init/finish. * sched-int.h: Include vecprim.h, remove rtl.h. (struct ready_list): Delete declaration. (sched_verbose_param, enum sched_pass_id_t, bb_vec_t, insn_vec_t, rtx_vec_t): New. (struct sched_scan_info_def): New structure. (sched_scan_info, sched_scan, sched_init_bbs, sched_init_luids, sched_finish_luids, sched_extend_target, haifa_init_h_i_d, haifa_finish_h_i_d): Declare. (struct common_sched_info_def): New. (common_sched_info, haifa_common_sched_info, sched_emulate_haifa_p): Declare. (sel_sched_p): New. (sched_luids): Declare. (INSN_LUID, LUID_BY_UID, SET_INSN_LUID): Declare. (sched_max_luid, insn_luid): Declare. (note_list, remove_notes, restore_other_notes, bb_note): Declare. (sched_insns_init, sched_insns_finish, xrecalloc, reemit_notes, print_insn, print_pattern, print_value, haifa_classify_insn, sel_find_rgns, sel_mark_hard_insn, dfa_state_size, advance_state, setup_sched_dump, sched_init, sched_finish, sel_insn_is_speculation_check): Export. (struct ready_list): Move from haifa-sched.c. (ready_try, ready, max_issue): Export. (ebb_compute_jump_reg_dependencies, find_fallthru_edge, sched_init_only_bb, sched_split_block, sched_split_block_1, sched_create_empty_bb, sched_create_empty_bb_1, sched_create_recovery_block, sched_create_recovery_edges): Export. (enum reg_pending_barrier_mode): Export. (struct deps): New fields `last_reg_pending_barrier' and `readonly'. (deps_t): New. (struct sched_info): Rename to haifa_sched_info. Use const_rtx for print_insn field. Move add_block and fix_recovery_cfg to common_sched_info_def. Move compute_jump_reg_dependencies, use_cselib ... (struct sched_deps_info_def): ... this new structure. (sched_deps_info): Declare. (struct spec_info_def): Remove weakness_cutoff, add data_weakness_cutoff and control_weakness_cutoff. (spec_info): Declare. (struct _haifa_deps_insn_data): Split from haifa_insn_data. 
Add dep_count field. (struct haifa_insn_data): Rename to struct _haifa_insn_data. (haifa_insn_data_def, haifa_insn_data_t): New typedefs. (current_sched_info): Change type to struct haifa_sched_info. (haifa_deps_insn_data_def, haifa_deps_insn_data_t): New typedefs. (h_d_i_d): New variable. (HDID): New accessor macro. (h_i_d): Change type to VEC (haifa_insn_data_def, heap) *. (HID): New accessor macro. Rewrite h_i_d accessor macros through HID and HDID. (IS_SPECULATION_CHECK_P): Update for selective scheduler. (enum SCHED_FLAGS): Update for selective scheduler. (enum SPEC_SCHED_FLAGS): New flag SEL_SCHED_SPEC_DONT_CHECK_CONTROL. (init_dependency_caches, free_dependency_caches): Delete declarations. (deps_analyze_insn, remove_from_deps, get_dep_weak_1, estimate_dep_weak, ds_full_merge, ds_max_merge, ds_weak, ds_get_speculation_types, ds_get_max_dep_weak, sched_deps_init, sched_deps_finish, haifa_note_reg_set, haifa_note_reg_use, haifa_note_reg_clobber, maybe_extend_reg_info_p, deps_start_bb, ds_to_dt): Export. (rm_other_notes): Delete declaration. (schedule_block): Remove one argument. (cycle_issued_insns, issue_rate, dfa_lookahead, ready_sort, ready_element, ready_lastpos, sched_extend_ready_list, sched_finish_ready_list, sched_change_pattern, sched_speculate_insn, concat_note_lists): Export. (struct region): Move from sched-rgn.h. (nr_regions, rgn_table, rgn_bb_table, block_to_bb, containing_rgn, RGN_NR_BLOCKS, RGN_BLOCKS, RGN_DONT_CALC_DEPS, RGN_HAS_REAL_EBB, BLOCK_TO_BB, CONTAINING_RGN): Export. (ebb_head, BB_TO_BLOCK, EBB_FIRST_BB, EBB_LAST_BB, INSN_BB): Likewise. (current_nr_blocks, current_blocks, target_bb): Likewise. 
(dep_cost_1, sched_is_disabled_for_current_region_p, sched_rgn_init, sched_rgn_finish, rgn_setup_region, sched_rgn_compute_dependencies, sched_rgn_local_init, extend_regions, rgn_make_new_region_out_of_new_block, compute_priorities, debug_rgn_dependencies, free_rgn_deps, contributes_to_priority, extend_rgns, deps_join rgn_setup_common_sched_info, rgn_setup_sched_infos, debug_regions, debug_region, dump_region_dot, dump_region_dot_file, haifa_sched_init, haifa_sched_finish): Export. * sched-rgn.c: Include sel-sched.h. (ref_counts): New static variable. Use it ... (INSN_REF_COUNT): ... here. Rewrite and move closer to uses. (FED_BY_SPEC_LOAD, IS_LOAD_INSN): Rewrite to use HID accessor macro. (sched_is_disabled_for_current_region_p): Delete static declaration. (struct region): Move to sched-int.h. (nr_regions, rgn_table, rgn_bb_table, block_to_bb, containing_rgn, ebb_head): Define and initialize. (RGN_NR_BLOCKS, RGN_BLOCKS, RGN_DONT_CALC_DEPS, RGN_HAS_REAL_EBB, BLOCK_TO_BB, CONTAINING_RGN, debug_regions, extend_regions, BB_TO_BLOCK, EBB_FIRST_BB, EBB_LAST_BB): Move to sched-int.h. (find_single_block_region): Add new argument to indicate that EBB regions should be constructed. (debug_live): Delete declaration. (current_nr_blocks, current_blocks, target_bb): Remove static qualifiers. (compute_dom_prob_ps, check_live, update_live, set_spec_fed): Delete declaration. (init_regions): Delete declaration. (debug_region, bb_in_region_p, dump_region_dot_file, dump_region_dot, rgn_estimate_number_of_insns): New. (too_large): Use estimate_number_of_insns. (haifa_find_rgns): New. Move the code from ... (find_rgns): ... here. Call either sel_find_rgns or haifa_find_rgns. (free_trg_info): New. (compute_trg_info): Allocate candidate tables here instead of ... (init_ready_list): ... here. (rgn_print_insn): Use const_rtx. (contributes_to_priority, extend_regions): Delete static declaration. (add_remove_insn, fix_recovery_cfg): Add rgn_ to function names. 
(add_block1): Rename to rgn_add_block. (debug_rgn_dependencies): Delete static qualifier. (new_ready): Use sched_deps_info. Simplify. (rgn_common_sched_info, rgn_const_sched_deps_info, rgn_const_sel_sched_deps_info, rgn_sched_deps_info, rgn_sched_info): New. (region_sched_info): Rename to rgn_const_sched_info. (deps_join): New, extracted from ... (propagate_deps): ... here. (compute_block_dependences, debug_dependencies): Update for selective scheduling. (free_rgn_deps, compute_priorities): New functions. (sched_rgn_init, sched_rgn_finish, rgn_setup_region, sched_rgn_compute_dependencies): New functions. (schedule_region): Use them. (sched_rgn_local_init, sched_rgn_local_free, sched_rgn_local_finish, rgn_setup_common_sched_info, rgn_setup_sched_infos): New functions. (schedule_insns): Call new functions that were split out. (rgn_make_new_region_out_of_new_block): New. (rest_of_handle_sched, rest_of_handle_sched2): Call selective scheduling when appropriate. * sched-vis.c: Include insn-attr.h. (print_value, print_pattern): Make global. (print_rtl_slim, debug_bb_slim, debug_bb_n_slim): New functions. * target-def.h (TARGET_SCHED_ADJUST_COST_2, TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT, TARGET_SCHED_GET_INSN_CHECKED_DS, TARGET_SCHED_GET_INSN_SPEC_DS, TARGET_SCHED_SKIP_RTX_P): New target hooks. Initialize them to 0. (TARGET_SCHED_GEN_CHECK): Rename to TARGET_SCHED_GEN_SPEC_CHECK. * target.h (struct gcc_target): Add them. Rename gen_check field to gen_spec_check. * flags.h (sel_sched_switch_set): Declare. * opts.c (sel_sched_switch_set): New variable. (decode_options): Unset flag_sel_sched_pipelining_outer_loops if pipelining is disabled from command line. (common_handle_option): Record whether selective scheduling is requested from command line. * doc/invoke.texi: Document new flags and parameters. * doc/tm.texi: Document new target hooks. 
[-- Attachment #3: sel-sched.diff.gz --] [-- Type: application/gzip, Size: 192076 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev 2008-06-03 14:26 ` Selective scheduling pass - middle end changes [1/1] Andrey Belevantsev 2008-06-03 14:27 ` Selective scheduling pass - scheduler changes [2/3] Andrey Belevantsev @ 2008-06-03 14:28 ` Andrey Belevantsev 2008-08-22 16:04 ` Andrey Belevantsev 2008-06-03 22:03 ` [RFC] Selective scheduling pass Vladimir Makarov ` (2 subsequent siblings) 5 siblings, 1 reply; 28+ messages in thread From: Andrey Belevantsev @ 2008-06-03 14:28 UTC (permalink / raw) To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks [-- Attachment #1: Type: text/plain, Size: 8286 bytes --] Hello, This patch shows the target-dependent changes for the selective scheduler. The majority of the changes are in the config/ia64/ia64.c file. They also include a lot of tunings done throughout the project. Each tuning originated from a problematic test case (usually from SPEC or from the Al Aburto tests) that it fixed. The summary of changes is as follows: o speculation support is improved to allow more patterns to be speculative (speculable1 and speculable2 attributes mark patterns/alternatives that are valid for speculation); o bundling also optimizes for a minimal number of mid-bundle stops; o we lower the priority of memory operations if we have issued too many of them on the current cycle; o default function and loop alignment is set to 64 and 32, respectively; o we discard the cost of memory dependencies that are likely false; o we place a stop bit after every simulated processor cycle; o the incorrect bypass in itanium2.md that resulted in stalls between fma and st insns is removed. Also, to support proper alignment of scheduled loops, we have put pass_compute_alignments after pass_machine_reorg (this part is actually in the middle-end patch, but I mention it here as it was inspired by the Itanium). 
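The "lower the priority of memory operations" tuning can be sketched as a reorder-style heuristic over a ready list. All names and the fixed limit here are invented for illustration; the real implementation lives in ia64_dfa_sched_reorder and tracks mem_ops_in_group per cycle:

```c
#include <assert.h>

/* Simplified ready-list entry: a priority plus a memory-op flag.  */
struct ready_insn { int priority; int is_mem; };

/* A toy version of the heuristic: once the number of memory operations
   issued in the current cycle reaches a limit, stable-partition the
   ready list so that non-memory insns come first.  */
static void lower_mem_priority(struct ready_insn *ready, int n,
                               int mem_ops_in_group, int limit) {
  if (mem_ops_in_group < limit || n > 32)
    return;
  struct ready_insn tmp[32];
  int k = 0;
  for (int i = 0; i < n; i++)           /* non-memory insns first */
    if (!ready[i].is_mem) tmp[k++] = ready[i];
  for (int i = 0; i < n; i++)           /* memory insns demoted */
    if (ready[i].is_mem) tmp[k++] = ready[i];
  for (int i = 0; i < n; i++) ready[i] = tmp[i];
}

/* Three ready insns, highest priority first; the top one is a load.
   With the per-cycle memory budget exhausted, the load drops behind
   both ALU insns.  */
int demo(void) {
  struct ready_insn ready[] = { {9, 1}, {7, 0}, {5, 0} };
  lower_mem_priority(ready, 3, /*mem_ops_in_group=*/4, /*limit=*/4);
  return ready[0].priority * 100 + ready[1].priority * 10
         + ready[2].priority;
}
```

Keeping the partition stable preserves the original priority order within each class, so the heuristic only delays memory operations relative to everything else rather than rescrambling the whole list.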
The rs6000 change is a minimal version needed to support the selective scheduler for a target. As we now can have several points in a region at which we are scheduling, the backend can no longer save the scheduler state in private variables and use it in the hooks (e.g. last_scheduled_insn). For that purpose, a concept of a target context is introduced: all private scheduler-related target info should be put in there, and the target should provide hooks for creating/deleting/setting as current a target context. The scheduler then treats target contexts as opaque pointers. Also, we do not yet support adjust_priority hooks (but the work on this is underway), so that part of the rs6000 scheduler hooks is disabled. OK for trunk? Andrey 2008-06-03 Andrey Belevantsev <abel@ispras.ru> Dmitry Melnik <dm@ispras.ru> Dmitry Zhurikhin <zhur@ispras.ru> Alexander Monakov <amonakov@ispras.ru> Maxim Kuvyrkov <maxim@codesourcery.com> * config/ia64/ia64.c: Include sel-sched.h. Rewrite speculation hooks. (ia64_gen_spec_insn): Removed. (get_spec_check_gen_function, insn_can_be_in_speculative_p, ia64_gen_spec_check): New static functions. (ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_skip_rtx_p): Declare functions. (ia64_needs_block_p): Change prototype. (ia64_gen_check): Rename to ia64_gen_spec_check. (ia64_adjust_cost): Rename to ia64_adjust_cost_2. Add new parameter into declaration, add special memory dependencies handling. (TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT, TARGET_SCHED_GET_INSN_SPEC_DS, TARGET_SCHED_GET_INSN_CHECKED_DS, TARGET_SCHED_SKIP_RTX_P): Define new target hooks. (TARGET_SCHED_GEN_CHECK): Rename to TARGET_SCHED_GEN_SPEC_CHECK. 
(ia64_override_options): Turn on selective scheduling with -O3, disable -fauto-inc-dec. Initialize align_loops and align_functions to 32 and 64, respectively. Set global selective scheduling flags according to target-dependent flags. (rtx_needs_barrier): Support UNSPEC_LDS_A. (group_barrier_needed): Use new mstop-bit-before-check flag. Add heuristic. (dfa_state_size): Make global. (spec_check_no, max_uid): Remove. (mem_ops_in_group, current_cycle): New variables. (ia64_sched_init): Disable checks for !SCHED_GROUP_P after reload. Initialize new variables. (is_load_p, record_memory_reference): New functions. (ia64_dfa_sched_reorder): Lower priority of loads when limit is reached. (ia64_variable_issue): Change use of current_sched_info to sched_deps_info. Update comment. Note if a load or a store is issued. (ia64_first_cycle_multipass_dfa_lookahead_guard_spec): Require a cycle advance if maximal number of loads or stores was issued on current cycle. (scheduled_good_insn): New static helper function. (ia64_dfa_new_cycle): Assert that last_scheduled_insn is set when a group barrier is needed. Fix vertical spacing. Guard the code doing state transition with last_scheduled_insn check. Mark that a stop bit should be before current insn if there was a cycle advance. Update current_cycle and mem_ops_in_group. (ia64_h_i_d_extended): Change use of current_sched_info to sched_deps_info. Reallocate stops_p by larger chunks. (struct _ia64_sched_context): New structure. (ia64_sched_context_t): New typedef. (ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context): New static functions. (gen_func_t): New typedef. (get_spec_load_gen_function): New function. (SPEC_GEN_EXTEND_OFFSET): Declare. (ia64_set_sched_flags): Check common_sched_info instead of *flags. (get_mode_no_for_insn): Change the condition that prevents use of special hardware registers so it can now handle pseudos. (get_spec_unspec_code): New function. 
(ia64_skip_rtx_p, get_insn_spec_code, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_gen_spec_load): New static functions. (ia64_speculate_insn, ia64_needs_block_p): Support branchy checks during selective scheduling. (ia64_speculate_insn): Use ds_get_speculation_types when determining whether we need to change the pattern. (SPEC_GEN_LD_MAP, SPEC_GEN_CHECK_OFFSET): Declare. (ia64_spec_check_src_p): Support new speculation/check codes. (struct bundle_state): New field. (issue_nops_and_insn): Initialize it. (insert_bundle_state): Minimize mid-bundle stop bits. (important_for_bundling_p): New function. (get_next_important_insn): Use important_for_bundling_p. (bundling): When shifting TImode from unimportant insns, ignore also group barriers. Assert that best state is found before the backward bundling pass. Print number of mid-bundle stop bits. Minimize mid-bundle stop bits. Check correct calculation of mid-bundle stop bits. (ia64_sched_finish, final_emit_insn_group_barriers): Fix formatting. (final_emit_insn_group_barriers): Emit stop bits before insns starting a new cycle. (sel2_run): New variable. (ia64_reorg): When flag_selective_scheduling is set, run the selective scheduling pass instead of schedule_ebbs. Adjust for flag_selective_scheduling2. (ia64_optimization_options): Declare new parameter. * config/ia64/ia64.md (speculable1, speculable2): New attributes. (UNSPEC_LDS_A): New UNSPEC. (movqi_internal, movhi_internal, movsi_internal, movdi_internal, movti_internal, movsf_internal, movdf_internal, movxf_internal): Make visible. Add speculable* attributes. (output_c_nc): New mode attribute. (mov<mode>_speculative_a, zero_extend<mode>di2_speculative_a, mov<mode>_nc, zero_extend<mode>di2_nc, advanced_load_check_nc_<mode>): New insns. (zero_extend*): Add speculable* attributes. * config/ia64/ia64.opt (msched_fp_mem_deps_zero_cost): New option. (msched-stop-bits-after-every-cycle): Likewise. (mstop-bit-before-check): Likewise. 
(msched-max-memory-insns, msched-max-memory-insns-hard-limit): Likewise. (msched-spec-verbose, msched-prefer-non-data-spec-insns, msched-prefer-non-control-spec-insns, msched-count-spec-in-critical-path, msel-sched-renaming, msel-sched-substitution, msel-sched-data-spec, msel-sched-control-spec, msel-sched-dont-check-control-spec): Use Target Report Var instead of Common Report Var. * config/ia64/itanium2.md: Remove strange bypass. * config/ia64/t-ia64 (ia64.o): Add dependency on sel-sched.h. * config/rs6000/rs6000.c (rs6000_init_sched_context, rs6000_alloc_sched_context, rs6000_set_sched_context, rs6000_free_sched_context): New functions. (struct _rs6000_sched_context): New. (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective scheduling. (rs6000_sched_finish): Do not run for selective scheduling. [-- Attachment #2: sel-sched-merge-targets.diff.gz --] [-- Type: application/gzip, Size: 21330 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-06-03 14:28 ` Selective scheduling pass - target changes (ia64 & rs6000) [3/3] Andrey Belevantsev @ 2008-08-22 16:04 ` Andrey Belevantsev 2008-08-29 13:41 ` [Ping] [GWP/ia64/rs6000 maintainer needed] " Andrey Belevantsev 2008-09-25 22:39 ` sje 0 siblings, 2 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-08-22 16:04 UTC (permalink / raw) To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks [-- Attachment #1: Type: text/plain, Size: 6719 bytes --] Hello, This is the updated patch which I resend with the other parts. I think the only change is that the flag_selective_scheduling option is turned on in ia64_optimization_options, not in ia64_override_options. This is actually the last part of the selective scheduler patch that did not get reviewed yet. Maybe a global write maintainer and a rs6000 maintainer could have a look? Thanks, Andrey 2008-08-22 Andrey Belevantsev <abel@ispras.ru> Dmitry Melnik <dm@ispras.ru> Dmitry Zhurikhin <zhur@ispras.ru> Alexander Monakov <amonakov@ispras.ru> Maxim Kuvyrkov <maxim@codesourcery.com> * config/ia64/ia64.c: Include sel-sched.h. Rewrite speculation hooks. (ia64_gen_spec_insn): Removed. (get_spec_check_gen_function, insn_can_be_in_speculative_p, ia64_gen_spec_check): New static functions. (ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_skip_rtx_p): Declare functions. (ia64_needs_block_p): Change prototype. (ia64_gen_check): Rename to ia64_gen_spec_check. (ia64_adjust_cost): Rename to ia64_adjust_cost_2. Add new parameter into declaration, add special memory dependencies handling. 
(TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT, TARGET_SCHED_GET_INSN_SPEC_DS, TARGET_SCHED_GET_INSN_CHECKED_DS, TARGET_SCHED_SKIP_RTX_P): Define new target hooks. (TARGET_SCHED_GEN_CHECK): Rename to TARGET_SCHED_GEN_SPEC_CHECK. (ia64_optimization_options): Turn on selective scheduling with -O3, disable -fauto-inc-dec. (ia64_override_options): Initialize align_loops and align_functions to 32 and 64, respectively. Set global selective scheduling flags according to target-dependent flags. (rtx_needs_barrier): Support UNSPEC_LDS_A. (group_barrier_needed): Use new mstop-bit-before-check flag. Add heuristic. (dfa_state_size): Make global. (spec_check_no, max_uid): Remove. (mem_ops_in_group, current_cycle): New variables. (ia64_sched_init): Disable checks for !SCHED_GROUP_P after reload. Initialize new variables. (is_load_p, record_memory_reference): New functions. (ia64_dfa_sched_reorder): Lower priority of loads when limit is reached. (ia64_variable_issue): Change use of current_sched_info to sched_deps_info. Update comment. Note if a load or a store is issued. (ia64_first_cycle_multipass_dfa_lookahead_guard_spec): Require a cycle advance if maximal number of loads or stores was issued on current cycle. (scheduled_good_insn): New static helper function. (ia64_dfa_new_cycle): Assert that last_scheduled_insn is set when a group barrier is needed. Fix vertical spacing. Guard the code doing state transition with last_scheduled_insn check. Mark that a stop bit should be before current insn if there was a cycle advance. Update current_cycle and mem_ops_in_group. (ia64_h_i_d_extended): Change use of current_sched_info to sched_deps_info. Reallocate stops_p by larger chunks. (struct _ia64_sched_context): New structure. (ia64_sched_context_t): New typedef. 
(ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context): New static functions. (gen_func_t): New typedef. (get_spec_load_gen_function): New function. (SPEC_GEN_EXTEND_OFFSET): Declare. (ia64_set_sched_flags): Check common_sched_info instead of *flags. (get_mode_no_for_insn): Change the condition that prevents use of special hardware registers so it can now handle pseudos. (get_spec_unspec_code): New function. (ia64_skip_rtx_p, get_insn_spec_code, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_gen_spec_load): New static functions. (ia64_speculate_insn, ia64_needs_block_p): Support branchy checks during selective scheduling. (ia64_speculate_insn): Use ds_get_speculation_types when determining whether we need to change the pattern. (SPEC_GEN_LD_MAP, SPEC_GEN_CHECK_OFFSET): Declare. (ia64_spec_check_src_p): Support new speculation/check codes. (struct bundle_state): New field. (issue_nops_and_insn): Initialize it. (insert_bundle_state): Minimize mid-bundle stop bits. (important_for_bundling_p): New function. (get_next_important_insn): Use important_for_bundling_p. (bundling): When shifting TImode from unimportant insns, ignore also group barriers. Assert that best state is found before the backward bundling pass. Print number of mid-bundle stop bits. Minimize mid-bundle stop bits. Check correct calculation of mid-bundle stop bits. (ia64_sched_finish, final_emit_insn_group_barriers): Fix formatting. (final_emit_insn_group_barriers): Emit stop bits before insns starting a new cycle. (sel2_run): New variable. (ia64_reorg): When flag_selective_scheduling is set, run the selective scheduling pass instead of schedule_ebbs. Adjust for flag_selective_scheduling2. (ia64_optimization_options): Declare new parameter. * config/ia64/ia64.md (speculable1, speculable2): New attributes. (UNSPEC_LDS_A): New UNSPEC. 
(movqi_internal, movhi_internal, movsi_internal, movdi_internal, movti_internal, movsf_internal, movdf_internal, movxf_internal): Make visible. Add speculable* attributes. (output_c_nc): New mode attribute. (mov<mode>_speculative_a, zero_extend<mode>di2_speculative_a, mov<mode>_nc, zero_extend<mode>di2_nc, advanced_load_check_nc_<mode>): New insns. (zero_extend*): Add speculable* attributes. * config/ia64/ia64.opt (msched_fp_mem_deps_zero_cost): New option. (msched-stop-bits-after-every-cycle): Likewise. (mstop-bit-before-check): Likewise. (msched-max-memory-insns, msched-max-memory-insns-hard-limit): Likewise. (msched-spec-verbose, msched-prefer-non-data-spec-insns, msched-prefer-non-control-spec-insns, msched-count-spec-in-critical-path, msel-sched-renaming, msel-sched-substitution, msel-sched-data-spec, msel-sched-control-spec, msel-sched-dont-check-control-spec): Use Target Report Var instead of Common Report Var. * config/ia64/itanium2.md: Remove strange bypass. * config/ia64/t-ia64 (ia64.o): Add dependency on sel-sched.h. * config/rs6000/rs6000.c (rs6000_init_sched_context, rs6000_alloc_sched_context, rs6000_set_sched_context, rs6000_free_sched_context): New functions. (struct _rs6000_sched_context): New. (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective scheduling. (rs6000_sched_finish): Do not run for selective scheduling. [-- Attachment #2: sel-sched-target.diff.gz --] [-- Type: application/gzip, Size: 21401 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ping] [GWP/ia64/rs6000 maintainer needed] Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-08-22 16:04 ` Andrey Belevantsev @ 2008-08-29 13:41 ` Andrey Belevantsev 2008-08-29 15:01 ` Mark Mitchell 2008-09-25 22:39 ` sje 1 sibling, 1 reply; 28+ messages in thread From: Andrey Belevantsev @ 2008-08-29 13:41 UTC (permalink / raw) To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks, Mark Mitchell Hello, [CC'ing Mark as both GWP and RM] Andrey Belevantsev wrote:
> This is actually the last part of the selective scheduler patch that did
> not get reviewed yet. Maybe a global write maintainer and a rs6000
> maintainer could have a look?
The target changes of the selective scheduler are still the only unreviewed part. Without this part, the other scheduler reviews will be useless. There are only a few days left before stage1 closes. As Jim doesn't have enough time to look at the patch, and there are no more ia64 maintainers, maybe a global write maintainer will take a look? The last version of the patch can be found at http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01669.html. Andrey ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ping] [GWP/ia64/rs6000 maintainer needed] Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-08-29 13:41 ` [Ping] [GWP/ia64/rs6000 maintainer needed] " Andrey Belevantsev @ 2008-08-29 15:01 ` Mark Mitchell 0 siblings, 0 replies; 28+ messages in thread From: Mark Mitchell @ 2008-08-29 15:01 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov, Ayal Zaks Andrey Belevantsev wrote: > Hello, > > [CC'ing Mark as both GWP and RM] > > Andrey Belevantsev wrote: >> This is actually the last part of the selective scheduler patch that >> did not get reviewed yet. Maybe a global write maintainer and a >> rs6000 maintainer could have a look? I can try to take a look, but my current priority is Graphite. I am hoping to finish reviewing that today. Thanks, -- Mark Mitchell CodeSourcery mark@codesourcery.com (650) 331-3385 x713 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-08-22 16:04 ` Andrey Belevantsev 2008-08-29 13:41 ` [Ping] [GWP/ia64/rs6000 maintainer needed] " Andrey Belevantsev @ 2008-09-25 22:39 ` sje 2008-09-26 14:57 ` Andrey Belevantsev 1 sibling, 1 reply; 28+ messages in thread From: sje @ 2008-09-25 22:39 UTC (permalink / raw) To: abel; +Cc: gcc-patches, wilson, vmakarov Andrey, I have started looking at the IA64 specific parts of the selective scheduling branch. I still need some more time but I was wondering if you could update it so that it is up-to-date with respect to the main trunk. I tried to apply the patch so I could look at some of the changes with more context and ia64.c would not apply cleanly. Here are a few minor comments from what I have reviewed so far. I didn't include the patch and put the comments inline since the patch is so large and I only had a few comments. There are some places where lines start with spaces instead of tabs even though they are indented enough to use tabs, and a couple of functions (ia64_sched_init and ia64_sched_final) had lines where the only change was from tabs to spaces. In ia64_clear_sched_context we free _sc->prev_cycle_state; I was wondering if we should set it to NULL after freeing it. Or are we going to free _sc right after this so that it doesn't matter? Is the mflag_sched_spec_verbose flag really needed? It looks like all it does is dump output to stderr instead of the normal dump file. In get_mode_no_for_insn, there is a check:
(AR_CCV_REGNUM <= REGNO (reg) && REGNO (reg) <= AR_EC_REGNUM)
I think this should be replaced with AR_REGNO_P (). Steve Ellcey sje@cup.hp.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-09-25 22:39 ` sje @ 2008-09-26 14:57 ` Andrey Belevantsev 2008-10-03 22:22 ` Steve Ellcey 0 siblings, 1 reply; 28+ messages in thread From: Andrey Belevantsev @ 2008-09-26 14:57 UTC (permalink / raw) To: Steve Ellcey; +Cc: gcc-patches, wilson, vmakarov, Alexander Monakov sje@cup.hp.com wrote:
> I have started looking at the IA64 specific parts of the selective
> scheduling branch. I still need some more time but I was wondering if you
> could update it so that it is up-to-date with respect to the main trunk.
> I tried to apply the patch so I could look at some of the changes with
> more context and ia64.c would not apply cleanly.
We (Alexander and myself) just did it, so the current sel-sched branch has the version of the config/ia64/* files that we'd like to see on trunk.
> There are some places where lines start with spaces instead of tabs even
> though they are indented enough to use tabs and a couple of functions
> (ia64_sched_init and ia64_sched_final) had lines where the only change
> was from tabs to spaces.
This is fixed on the branch.
> In ia64_clear_sched_context we free _sc->prev_cycle_state, I was
> wondering if we should set it to NULL after freeing it. Or are
> we going to free _sc right after this so that it doesn't matter?
Sometimes we free it and sometimes we don't, but I agree with you that it would be clearer to set it to NULL. I will prepare a patch and check it in on the branch.
> Is the mflag_sched_spec_verbose flag really needed? It looks like
> all it does is dump output to stderr instead of the normal dump file.
No, it is not needed, but it is also not present on the branch. We have reviewed the other introduced flags. We will remove the -msel-sched* flags related to speculation, as we can use the existing flags, and we will remove the mstop-bit-before-check flag, as it doesn't improve performance on average.
> In get_mode_no_for_insn, there is a check:
>
> (AR_CCV_REGNUM <= REGNO (reg) && REGNO (reg) <= AR_EC_REGNUM)
>
> I think this should be replaced with AR_REGNO_P ().
Fixed on the branch. Thanks a lot for your efforts! Andrey ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-09-26 14:57 ` Andrey Belevantsev @ 2008-10-03 22:22 ` Steve Ellcey 2008-10-06 17:26 ` Andrey Belevantsev 0 siblings, 1 reply; 28+ messages in thread From: Steve Ellcey @ 2008-10-03 22:22 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: gcc-patches, wilson, vmakarov, Alexander Monakov On Fri, 2008-09-26 at 17:05 +0400, Andrey Belevantsev wrote: > sje@cup.hp.com wrote: > > I have started looking at the IA64 specific parts of the selective > > scheduling branch. I still need some more time but I was wondering if you > > could update it so that it is up-to-date with respect to the main trunk. > > I tried to apply the patch so I could look at some of the changes with > > more context and ia64.c would not apply cleanly. > We (Alexander and myself) just did it, so current sel-sched branch has > the version of config/ia64/* files that we'd like to see on trunk. > > Andrey Andrey, I have reviewed the changes on the sel-sched-branch and approve the IA64 specific changes. I noticed there were a few non-IA64 changes on the branch and obviously I can't approve those but the IA64 changes look OK. Steve Ellcey sje@cup.hp.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-10-03 22:22 ` Steve Ellcey @ 2008-10-06 17:26 ` Andrey Belevantsev 0 siblings, 0 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-10-06 17:26 UTC (permalink / raw) To: sje; +Cc: gcc-patches, wilson, vmakarov, Alexander Monakov Steve Ellcey wrote:
> I have reviewed the changes on the sel-sched-branch and approve the IA64
> specific changes. I noticed there were a few non-IA64 changes on the branch
> and obviously I can't approve those but the IA64 changes look OK.
Thanks for the review! I will retest the IA64 changes over the next few days and commit. The non-IA64 changes should be just the difference in the scheduling hooks due to the changes in how we handle speculation. Vlad had already approved those with the main part of the sel-sched patch; I just had to revert those pieces when committing the main part without the IA64 part. Of course, if there are other changes, such as bugfixes not yet approved for mainline, I will not commit them. Thanks again, Andrey ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass 2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev ` (2 preceding siblings ...) 2008-06-03 14:28 ` Selective scheduling pass - target changes (ia64 & rs6000) [3/3] Andrey Belevantsev @ 2008-06-03 22:03 ` Vladimir Makarov 2008-06-04 16:55 ` Mark Mitchell 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培) 5 siblings, 0 replies; 28+ messages in thread From: Vladimir Makarov @ 2008-06-03 22:03 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson Andrey Belevantsev wrote:
> Hello,
>
> The patches in this thread introduce selective scheduler in GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
> Monakov, and Maxim Kuvyrkov while he was at ISP RAS. Selective
> scheduler is aimed at scheduling eager targets such as ia64, power6,
> and cell. The implementation contains both the scheduler and the
> software pipeliner, which can be used on loops with control flow not
> handled by SMS. The scheduler can work either before or after
> register allocation, but it is currently tuned to work after.
>
> The scheduler was bootstrapped and tested on ia64, with all default
> languages, both as a first and as a second scheduler. It was also
> bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk
> and sel-sched branch show 3.8% speedup on average, SPEC INT shows both
> small speedups and regressions, staying around neutral in average:
>
Congratulations! I have followed the project for a long time. Finally a useful milestone has been achieved, and you have got a pretty big improvement. The scheduling algorithm is superior to what we had because it permits improving insn schedules on all execution paths by insn cloning and other transformations.
> On power6, Revital Eres saw speedups on several tests; additional
> tuning is required to get good results there, which is complicated
> because we don't have power6.
> On cell, there was some third-party testing in 2007, showing 4-6% speedups, but I don't have more detailed information.
>
> Compile time slowdown measured with --enable-checking=assert is quite
> significant -- about 12% on spec int and about 18% on spec fp and
> cc1-i-files collection. For this reason, we have enabled selective
> scheduler by default at -O3 on ia64 and disabled by default on other
> targets.
Itanium is a pretty specific target. It would be interesting to know how big the slowdown is for ppc.
> Our current plan is to work on further compile time improvements and
> performance tuning for ppc and cell, hopefully with the help of IBM
> Haifa folks. If we will complete this work before the end of stage2,
> then we can enable selective scheduling at -O3 also for ppc in 4.4.
> In the mid-term, we will work on removing the ebb scheduler, as it is
> now used on ia64 only and will be superseded by selective scheduler
> when we'll further improve compile time.
I think we should finally get rid of the EBB scheduler. You could try to improve the compile-time problem by preventing some transformations in the new scheduler in -O2 mode. If you solve the compile-time problem, I think we should work on removing the Haifa scheduler too, to have just one insn scheduler. But as I understand it, that will not happen soon. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass 2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev ` (3 preceding siblings ...) 2008-06-03 22:03 ` [RFC] Selective scheduling pass Vladimir Makarov @ 2008-06-04 16:55 ` Mark Mitchell 2008-06-04 20:50 ` Andrey Belevantsev 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培) 5 siblings, 1 reply; 28+ messages in thread From: Mark Mitchell @ 2008-06-04 16:55 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov Andrey Belevantsev wrote: > The patches in this thread introduce selective scheduler in GCC, > implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander > Monakov, and Maxim Kuvyrkov while he was at ISP RAS. Selective > scheduler is aimed at scheduling eager targets such as ia64, power6, and > cell. The implementation contains both the scheduler and the software > pipeliner, which can be used on loops with control flow not handled by > SMS. The scheduler can work either before or after register allocation, > but it is currently tuned to work after. > > On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk > and sel-sched branch show 3.8% speedup on average, SPEC INT shows both > small speedups and regressions, staying around neutral in average: That's a very good result. Congratulations! I know that this scheduler is aimed at CPUs like the ones you mention above. However, would it function correctly on other CPUs with more "traditional" characteristics, like older ARM, MIPS, or x86 cores? And, would it be reasonably possible to tune it for those CPUs as well? As with the IRA allocator, I'd like to avoid having multiple schedulers in GCC. (I know we've done that for a while, but I still think it's undesirable.) So, I'd like to see if we can get this to work well across all of the Primary and Secondary CPUs, and then just make it "the GCC scheduler" rather than an optional thing enabled at some optimization levels on some CPUs. Do you think that's feasible? 
Or do you think that there are inherent aspects of the algorithm that mean that we need to have this new scheduler for one class of CPUs and the old scheduler for the other class? Is there any way to make the new scheduler do a reasonable job with the existing descriptions in GCC, so that port maintainers can tune later, or is a level of effort like that for Itanium required?
> Compile time slowdown measured with --enable-checking=assert is quite
> significant -- about 12% on spec int and about 18% on spec fp and
> cc1-i-files collection. For this reason, we have enabled selective
> scheduler by default at -O3 on ia64 and disabled by default on other
> targets.
Do you understand what's causing the compile-time slowdown? Thanks, -- Mark Mitchell CodeSourcery mark@codesourcery.com (650) 331-3385 x713 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass 2008-06-04 16:55 ` Mark Mitchell @ 2008-06-04 20:50 ` Andrey Belevantsev 0 siblings, 0 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-06-04 20:50 UTC (permalink / raw) To: Mark Mitchell; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov Mark Mitchell wrote:
> That's a very good result. Congratulations!
Thank you!
> I know that this scheduler is aimed at CPUs like the ones you mention
> above. However, would it function correctly on other CPUs with more
> "traditional" characteristics, like older ARM, MIPS, or x86 cores? And,
> would it be reasonably possible to tune it for those CPUs as well?
When a target doesn't do anything "fancy" in scheduler hooks, everything should just work (modulo bugs, of course; we've tried only ppc64 and x86-64). In case a target saves some information describing the scheduler's state, simple hooks manipulating this data should be implemented, as we did for the rs6000 port.
> As with the IRA allocator, I'd like to avoid having multiple schedulers
> in GCC. (I know we've done that for a while, but I still think it's
> undesirable.) So, I'd like to see if we can get this to work well
> across all of the Primary and Secondary CPUs, and then just make it "the
> GCC scheduler" rather than an optional thing enabled at some
> optimization levels on some CPUs.
This is our goal as well, and I think it can be done incrementally. We are now working on the ppc performance. Then we need to tune the scheduler so that for traditional targets it is no worse in performance and the slowdown is reasonable, e.g. by disabling pipelining and decreasing the scheduling window. The last thing to do is to speed up the implementation so that for scheduling-eager targets with pipelining enabled the slowdown will be acceptable for -O2.
Note that the selective scheduler does not subsume SMS, but complements it, because SMS does a better job for countable loops, but cannot handle loops with control flow or with an unknown number of iterations. So in any case there will be two schedulers.
> Do you think that's feasible? Or do you think that there are inherent
> aspects of the algorithm that mean that we need to have this new
> scheduler for one class of CPUs and the old scheduler for the other
> class? Is there any way to make the new scheduler do a reasonable job
> with the existing descriptions in GCC, so that port maintainers can tune
> later, or is a level of effort like that for Itanium required?
The ia64 backend is very complex, and we put a lot of effort into tuning it by itself -- you can see it in my other mail about the target changes. So I think that tuning for other targets will be simpler. The cell results I mentioned in the mail were received from a guy who did the tuning internally at Samsung, and AFAIR he didn't mention any target-independent changes he had to do; basically he just made it work.
>> Compile time slowdown measured with --enable-checking=assert is quite
>> significant -- about 12% on spec int and about 18% on spec fp and
>> cc1-i-files collection. For this reason, we have enabled selective
>> scheduler by default at -O3 on ia64 and disabled by default on other
>> targets.
>
> Do you understand what's causing the compile-time slowdown?
The part that takes the most time is the update of availability sets, as this is the central part of the algorithm. Renaming is quite expensive too, but we have tackled this by limiting it to only a few insns with the largest priority. To make the updates faster, you need to build the data dependence graph and keep it up to date while scheduling. Unfortunately, we didn't manage to do this during this project.
The first step towards this goal will be to make the dependence graph classify the dependencies it builds -- control/data, lhs/rhs, register/memory, etc. Then we can devise the mechanism for updating the graph, which would not be trivial -- e.g. when an insn gets renamed, we introduce a register-register copy which can generate completely new register dependencies that cannot be derived from the existing ones. Such a project is likely to make it to trunk in the next release cycle, and that would correspond to the last step of the incremental approach outlined above.

Yours,
Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass
  2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev
  ` (4 preceding siblings ...)
  2008-06-04 16:55 ` Mark Mitchell
@ 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
  2008-06-05 13:49 ` Andrey Belevantsev
  5 siblings, 1 reply; 28+ messages in thread
From: Seongbae Park (박성배, 朴成培) @ 2008-06-05 3:45 UTC (permalink / raw)
  To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

On Tue, Jun 3, 2008 at 7:16 AM, Andrey Belevantsev <abel@ispras.ru> wrote:
> Hello,
>
> The patches in this thread introduce selective scheduler in GCC, implemented
> by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander Monakov, and Maxim
> Kuvyrkov while he was at ISP RAS. Selective scheduler is aimed at
> scheduling eager targets such as ia64, power6, and cell. The implementation
> contains both the scheduler and the software pipeliner, which can be used on
> loops with control flow not handled by SMS. The scheduler can work either
> before or after register allocation, but it is currently tuned to work
> after.
>
> The scheduler was bootstrapped and tested on ia64, with all default
> languages, both as a first and as a second scheduler. It was also
> bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk and
> sel-sched branch show 3.8% speedup on average, SPEC INT shows both small
> speedups and regressions, staying around neutral in average:
>
> 168.wupwise 513 552 7,60%
> 171.swim 757 772 1,98%
> 172.mgrid 570 643 12,81%
> 173.applu 503 524 4,17%
> 177.mesa 796 795 -0,13%
> 178.galgel 814 787 -3,32%
> 179.art 1990 2098 5,43%
> 183.equake 513 569 10,92%
> 187.facerec 958 991 3,44%
> 188.ammp 765 775 1,31%
> 189.lucas 860 869 1,05%
> 191.fma3d 549 536 -2,37%
> 200.sixtrack 300 323 7,67%
> 301.apsi 522 546 4,60%
> Geomean 673,97 699,87 3,84%
>
> 164.gzip 683 682 -0,15%
> 175.vpr 814 802 -1,47%
> 176.gcc 1080 1069 -1,02%
> 181.mcf 701 708 1,00%
> 186.crafty 872 855 -1,95%
> 197.parser 729 728 -0,14%
> 252.eon 793 785 -1,01%
> 253.perlbmk 824 839 1,82%
> 254.gap 558 569 1,97%
> 255.vortex 1012 966 -4,55%
> 256.bzip2 758 762 0,53%
> 300.twolf 1005 1015 1,00%
> Geomean 806,04 803,25 -0,35%

Presumably this is with any profile feedback ?
If so, numbers look ok.

Have you tried it with profile feedback ?
Selective scheduling (and most other aggressive global scheduling algorithms)
can benefit quite a bit from profile feedback,
and tuning can be quite different for with and without profile feedback.

Seongbae

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass
  2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
@ 2008-06-05 13:49 ` Andrey Belevantsev
  0 siblings, 0 replies; 28+ messages in thread
From: Andrey Belevantsev @ 2008-06-05 13:49 UTC (permalink / raw)
  To: "Seongbae Park (박성배, 朴成培)"
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Seongbae Park (박성배, 朴成培) wrote:
> Presumably this is with any profile feedback ?
> If so, numbers look ok.

You probably mean that the numbers are without profile feedback. This is true.

> Have you tried it with profile feedback ?
> Selective scheduling (and most other aggressive global scheduling algorithms)
> can benefit quite a bit from profile feedback,
> and tuning can be quite different for with and without profile feedback.

No, we haven't tried that. I got the impression that profile-based optimizations are not of great importance to GCC developers, so we focused on tuning without profile feedback. Nevertheless, we'll try SPEC with profile feedback tonight. I will be happy to discuss how the scheduler can be tuned to use the profiling information -- will you attend the summit, btw?

Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3]
@ 2008-08-29 15:10 David Edelsohn
  2008-08-31 13:35 ` Andrey Belevantsev
  0 siblings, 1 reply; 28+ messages in thread
From: David Edelsohn @ 2008-08-29 15:10 UTC (permalink / raw)
  To: Andrey Belevantsev
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov, Ayal Zaks, Mark Mitchell

        * config/rs6000/rs6000.c (rs6000_init_sched_context,
        rs6000_alloc_sched_context, rs6000_set_sched_context,
        rs6000_free_sched_context): New functions.
        (struct _rs6000_sched_context): New.
        (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective
        scheduling.
        (rs6000_sched_finish): Do not run for selective scheduling.

The rs6000 part of the patch is okay with a modification to the following chunk:

*************** rs6000_sched_finish (FILE *dump, int sch
*** 20085,20091 ****
    if (reload_completed && rs6000_sched_groups)
      {
!       if (rs6000_sched_insert_nops == sched_finish_none)
          return;

        if (rs6000_sched_insert_nops == sched_finish_pad_groups)
--- 20103,20110 ----
    if (reload_completed && rs6000_sched_groups)
      {
!       if (rs6000_sched_insert_nops == sched_finish_none
!           || sel_sched_p ())
          return;

        if (rs6000_sched_insert_nops == sched_finish_pad_groups)

Please change this to a separate test for clarity:

+      /* Do not run sched_finish hook when selective scheduling enabled.  */
+      if (sel_sched_p ())
+        return;
+
       if (rs6000_sched_insert_nops == sched_finish_none)
         return;

instead of combining the tests.

Also, target maintainers have flexibility during stage 3 with respect to changes local to a port, so the Itanium changes can be approved and committed during stage 3, although earlier would be better.

Thanks, David

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3]
  2008-08-29 15:10 Selective scheduling pass - target changes (ia64 & rs6000) [3/3] David Edelsohn
@ 2008-08-31 13:35 ` Andrey Belevantsev
  0 siblings, 0 replies; 28+ messages in thread
From: Andrey Belevantsev @ 2008-08-31 13:35 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov, Ayal Zaks, Mark Mitchell

Hello,

David Edelsohn wrote:
> * config/rs6000/rs6000.c (rs6000_init_sched_context,
> rs6000_alloc_sched_context, rs6000_set_sched_context,
> rs6000_free_sched_context): New functions.
> (struct _rs6000_sched_context): New.
> (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective
> scheduling.
> (rs6000_sched_finish): Do not run for selective scheduling.
>
> The rs6000 part of the patch is okay with a modification to the following chunk:

Thanks for the review, I'll fix that up.

> Also, target maintainers have flexibility during stage 3 with respect
> to changes local to a port, so the Itanium changes can be approved and
> committed during stage 3, although earlier would be better.

I didn't know that. But what would be the preferred policy for checking in the scheduler in this case? Should I wait for the Itanium changes to be reviewed and then commit the whole patch, which means that the target-independent changes would be committed during stage 3? Or should I check in the scheduler without the ia64 changes now, which means it will be non-functional on Itanium, and commit to reverting it in case the ia64 changes are not reviewed even during stage 3? I'd appreciate advice on how to proceed.

Thanks, Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2008-10-06 17:06 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev
2008-06-03 14:26 ` Selective scheduling pass - middle end changes [1/1] Andrey Belevantsev
2008-06-11 1:04   ` Ian Lance Taylor
2008-06-11 13:40     ` Andrey Belevantsev
2008-06-11 14:30       ` Ian Lance Taylor
2008-06-27 13:10         ` Andrey Belevantsev
2008-06-30 16:16           ` Ian Lance Taylor
2008-07-08 14:54             ` Andrey Belevantsev
2008-07-08 15:29               ` Ian Lance Taylor
2008-08-22 15:55                 ` Andrey Belevantsev
2008-06-03 14:27 ` Selective scheduling pass - scheduler changes [2/3] Andrey Belevantsev
2008-06-03 22:03   ` Vladimir Makarov
2008-08-22 15:52     ` Andrey Belevantsev
2008-06-03 14:28 ` Selective scheduling pass - target changes (ia64 & rs6000) [3/3] Andrey Belevantsev
2008-08-22 16:04   ` Andrey Belevantsev
2008-08-29 13:41     ` [Ping] [GWP/ia64/rs6000 maintainer needed] Andrey Belevantsev
2008-08-29 15:01       ` Mark Mitchell
2008-09-25 22:39         ` sje
2008-09-26 14:57           ` Andrey Belevantsev
2008-10-03 22:22             ` Steve Ellcey
2008-10-06 17:26               ` Andrey Belevantsev
2008-06-03 22:03 ` [RFC] Selective scheduling pass Vladimir Makarov
2008-06-04 16:55 ` Mark Mitchell
2008-06-04 20:50   ` Andrey Belevantsev
2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
2008-06-05 13:49   ` Andrey Belevantsev
2008-08-29 15:10 Selective scheduling pass - target changes (ia64 & rs6000) [3/3] David Edelsohn
2008-08-31 13:35 ` Andrey Belevantsev