* [RFC] Selective scheduling pass
From: Andrey Belevantsev @ 2008-06-03 14:24 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov

Hello,

The patches in this thread introduce the selective scheduler in GCC, implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander Monakov, and Maxim Kuvyrkov while he was at ISP RAS. The selective scheduler is aimed at scheduling-eager targets such as ia64, power6, and cell. The implementation contains both the scheduler and a software pipeliner, which can be used on loops with control flow that SMS does not handle. The scheduler can work either before or after register allocation, but it is currently tuned to work after.

The scheduler was bootstrapped and tested on ia64, with all default languages, both as the first and as the second scheduler. It was also bootstrapped with C, C++, and Fortran enabled on ppc64 and x86-64.
On ia64, SPEC2k FP results comparing -O3 -ffast-math on trunk and the sel-sched branch show a 3.8% speedup on average; SPEC INT shows both small speedups and regressions, staying roughly neutral on average:

168.wupwise      513     552     7.60%
171.swim         757     772     1.98%
172.mgrid        570     643    12.81%
173.applu        503     524     4.17%
177.mesa         796     795    -0.13%
178.galgel       814     787    -3.32%
179.art         1990    2098     5.43%
183.equake       513     569    10.92%
187.facerec      958     991     3.44%
188.ammp         765     775     1.31%
189.lucas        860     869     1.05%
191.fma3d        549     536    -2.37%
200.sixtrack     300     323     7.67%
301.apsi         522     546     4.60%
Geomean       673.97  699.87     3.84%

164.gzip         683     682    -0.15%
175.vpr          814     802    -1.47%
176.gcc         1080    1069    -1.02%
181.mcf          701     708     1.00%
186.crafty       872     855    -1.95%
197.parser       729     728    -0.14%
252.eon          793     785    -1.01%
253.perlbmk      824     839     1.82%
254.gap          558     569     1.97%
255.vortex      1012     966    -4.55%
256.bzip2        758     762     0.53%
300.twolf       1005    1015     1.00%
Geomean       806.04  803.25    -0.35%

On power6, Revital Eres saw speedups on several tests; additional tuning is required to get good results there, which is complicated by the fact that we do not have power6 hardware. On cell, there was some third-party testing in 2007 showing 4-6% speedups, but I do not have more detailed information.

The compile-time slowdown measured with --enable-checking=assert is quite significant -- about 12% on SPEC INT and about 18% on SPEC FP and the cc1-i-files collection. For this reason, we have enabled the selective scheduler by default at -O3 on ia64 and disabled it by default on other targets. Our current plan is to work on further compile-time improvements and on performance tuning for ppc and cell, hopefully with the help of the IBM Haifa folks. If we complete this work before the end of stage 2, we can also enable selective scheduling at -O3 for ppc in 4.4. In the mid-term, we will work on removing the ebb scheduler, as it is now used only on ia64 and will be superseded by the selective scheduler once we further improve compile time.

Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
* Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-06-03 14:26 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 3687 bytes --]

Hello,

This patch contains the middle-end changes needed for the selective scheduler. They are relatively small and include the following:

o hooks to catch new insns and basic blocks, which can be generated as a result of changing control flow in cfgrtl mode. These are outside the scheduler's control, but we still need to initialize internal data for them.

o fixes needed to work with loop data after register allocation and in cfgrtl mode; I think one of them was sent to me by Zdenek some time ago.

o a function in genautomata to output the maximal insn latency.

o an iterator over hard register sets analogous to the bitmap iterator.

o an interface to validate_replace_rtx that allows postponing simplification of the new rtx until later.

o an interface to rtx_equal_p and hash_rtx that allows skipping certain parts of an rtx while comparing or hashing. This is needed for unification of e.g. control-speculative and data-speculative insns (which have different patterns, of course). A similar mechanism for may_trap_p is already in trunk, where it is implemented via a target hook.

OK for trunk?

Andrey

2008-06-03  Andrey Belevantsev  <abel@ispras.ru>
            Dmitry Melnik  <dm@ispras.ru>
            Dmitry Zhurikhin  <zhur@ispras.ru>
            Alexander Monakov  <amonakov@ispras.ru>
            Maxim Kuvyrkov  <maxim@codesourcery.com>

        * cfghooks.h (get_cfg_hooks, set_cfg_hooks): New prototypes.
        * cfghooks.c (get_cfg_hooks, set_cfg_hooks): New functions.
        (make_forwarder_block): Update loop latch if we have redirected
        the loop latch edge.
        * cfgloop.c (get_loop_body_in_custom_order): New function.
        * cfgloop.h (LOOPS_HAVE_FALLTHRU_PREHEADERS): New enum field.
        (CP_FALLTHRU_PREHEADERS): Likewise.
        (get_loop_body_in_custom_order): Declare.
        * cfgloopmanip.c (has_preds_from_loop): New.
        (create_preheader): Honor CP_FALLTHRU_PREHEADERS.
        Assert that the preheader edge will be fall thru when it is set.
        * cse.c (hash_rtx_cb): New.
        (hash_rtx): Use it.
        * emit-rtl.c (add_insn, add_insn_after, add_insn_before,
        emit_insn_after_1): Call insn_added hook.
        * genattr.c (main): Output maximal_insn_latency prototype.
        * genautomata.c (output_default_latencies): New.  Factor its
        code from ...
        (output_internal_insn_latency_func): ... here.
        (output_internal_maximal_insn_latency_func): New.
        (output_maximal_insn_latency_func): New.
        * hard-reg-set.h (UHOST_BITS_PER_WIDE_INT): Define unconditionally.
        (struct hard_reg_set_iterator): New.
        (hard_reg_set_iter_init, hard_reg_set_iter_set,
        hard_reg_set_iter_next): New functions.
        (EXECUTE_IF_SET_IN_HARD_REG_SET): New macro.
        * lists.c (remove_free_INSN_LIST_node,
        remove_free_EXPR_LIST_node): New functions.
        * loop-init.c (loop_optimizer_init): When
        LOOPS_HAVE_FALLTHRU_PREHEADERS, set CP_FALLTHRU_PREHEADERS when
        calling create_preheaders.
        (loop_optimizer_finalize): Do not verify flow info after reload.
        * passes.c (init_optimization_passes): Move pass_compute_alignments
        after pass_machine_reorg.
        * recog.c (validate_replace_rtx_1): New parameter simplify.
        Default it to true.  Update all uses.  Factor out simplifying
        code to ...
        (simplify_while_replacing): ... this new function.
        (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): New.
        * recog.h (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): Declare.
        * rtl.c (rtx_equal_p_cb): New.
        (rtx_equal_p): Use it.
        * rtl.h (rtx_equal_p_cb, hash_rtx_cb): Declare.
        (remove_free_INSN_LIST_NODE, remove_free_EXPR_LIST_node,
        debug_bb_n_slim, debug_bb_slim, print_rtl_slim,
        sel_sched_fix_param, insn_added): Likewise.
        * rtlhooks-def.h (RTL_HOOKS_INSN_ADDED): Define to NULL.
        Add to RTL_HOOKS_INITIALIZER.

[-- Attachment #2: sel-sched-merge-middle-end.diff.gz --]
[-- Type: application/gzip, Size: 11818 bytes --]
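The hard-register-set iterator added above (EXECUTE_IF_SET_IN_HARD_REG_SET and friends) mirrors the existing bitmap iterator: it visits the index of every set bit in a multi-word bit set, lowest first. The sketch below models just that visiting order on a simplified two-word set; the types and names are illustrative, not GCC's actual HARD_REG_SET representation.

```c
#include <assert.h>

/* Simplified model of a hard-reg-set walk: a fixed-size bit set stored
   as an array of words, scanned bit by bit from register number 0. */
#define N_WORDS 2
#define BITS_PER_WORD 32

typedef struct { unsigned int words[N_WORDS]; } reg_set;

/* Store the index of every set bit into OUT (at most MAX entries);
   return how many were found. */
static int collect_set_regs (const reg_set *s, int *out, int max)
{
  int n = 0;
  for (int w = 0; w < N_WORDS; w++)
    {
      unsigned int bits = s->words[w];
      /* Shift the word right until no set bits remain. */
      for (int b = 0; bits != 0; b++, bits >>= 1)
        if ((bits & 1) && n < max)
          out[n++] = w * BITS_PER_WORD + b;
    }
  return n;
}
```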
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-06-11 1:04 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> 2008-06-03  Andrey Belevantsev  <abel@ispras.ru>
>             Dmitry Melnik  <dm@ispras.ru>
>             Dmitry Zhurikhin  <zhur@ispras.ru>
>             Alexander Monakov  <amonakov@ispras.ru>
>             Maxim Kuvyrkov  <maxim@codesourcery.com>
>
>         * cfghooks.h (get_cfg_hooks, set_cfg_hooks): New prototypes.
>
>         * cfghooks.c (get_cfg_hooks, set_cfg_hooks): New functions.
>         (make_forwarder_block): Update loop latch if we have redirected
>         the loop latch edge.
>
>         * cfgloop.c (get_loop_body_in_custom_order): New function.
>
>         * cfgloop.h (LOOPS_HAVE_FALLTHRU_PREHEADERS): New enum field.
>         (CP_FALLTHRU_PREHEADERS): Likewise.
>         (get_loop_body_in_custom_order): Declare.
>
>         * cfgloopmanip.c (has_preds_from_loop): New.
>         (create_preheader): Honor CP_FALLTHRU_PREHEADERS.
>         Assert that the preheader edge will be fall thru when it is set.
>
>         * cse.c (hash_rtx_cb): New.
>         (hash_rtx): Use it.
>
>         * emit-rtl.c (add_insn, add_insn_after, add_insn_before,
>         emit_insn_after_1): Call insn_added hook.
>
>         * genattr.c (main): Output maximal_insn_latency prototype.
>
>         * genautomata.c (output_default_latencies): New.  Factor its
>         code from ...
>         (output_internal_insn_latency_func): ... here.
>         (output_internal_maximal_insn_latency_func): New.
>         (output_maximal_insn_latency_func): New.
>
>         * hard-reg-set.h (UHOST_BITS_PER_WIDE_INT): Define unconditionally.
>         (struct hard_reg_set_iterator): New.
>         (hard_reg_set_iter_init, hard_reg_set_iter_set,
>         hard_reg_set_iter_next): New functions.
>         (EXECUTE_IF_SET_IN_HARD_REG_SET): New macro.
>
>         * lists.c (remove_free_INSN_LIST_node,
>         remove_free_EXPR_LIST_node): New functions.
>
>         * loop-init.c (loop_optimizer_init): When
>         LOOPS_HAVE_FALLTHRU_PREHEADERS, set CP_FALLTHRU_PREHEADERS when
>         calling create_preheaders.
>         (loop_optimizer_finalize): Do not verify flow info after reload.
>
>         * passes.c (init_optimization_passes): Move pass_compute_alignments
>         after pass_machine_reorg.
>
>         * recog.c (validate_replace_rtx_1): New parameter simplify.
>         Default it to true.  Update all uses.  Factor out simplifying
>         code to ...
>         (simplify_while_replacing): ... this new function.
>         (validate_replace_rtx_part,
>         validate_replace_rtx_part_nosimplify): New.
>
>         * recog.h (validate_replace_rtx_part,
>         validate_replace_rtx_part_nosimplify): Declare.
>
>         * rtl.c (rtx_equal_p_cb): New.
>         (rtx_equal_p): Use it.
>
>         * rtl.h (rtx_equal_p_cb, hash_rtx_cb): Declare.
>         (remove_free_INSN_LIST_NODE, remove_free_EXPR_LIST_node,
>         debug_bb_n_slim, debug_bb_slim, print_rtl_slim,
>         sel_sched_fix_param, insn_added): Likewise.
>
>         * rtlhooks-def.h (RTL_HOOKS_INSN_ADDED): Define to NULL.
>         Add to RTL_HOOKS_INITIALIZER.

! if (jump != NULL)
!   {
!     /* If we redirected the loop latch edge, the JUMP block now acts like
!        the new latch of the loop.  */
!     if (current_loops != NULL
!         && dummy->loop_father->header == dummy
!         && dummy->loop_father->latch == e_src)
!       dummy->loop_father->latch = jump;

I think you need to check that dummy->loop_father != NULL before you dereference it.

  && !((flags & CP_SIMPLE_PREHEADERS)
!      && !single_succ_p (single_entry->src))
!     && !((flags & CP_FALLTHRU_PREHEADERS
!           && (JUMP_P (BB_END (single_entry->src))
!               || has_preds_from_loop (single_entry->src, loop)))))

This code needs a comment.  Actually, I think it would be better to break up the complex condition into three simpler conditions, ideally ones which can be understood without applying DeMorgan's law.  Also, you need to update the comment on the function as a whole.
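Ian's suggestion can be illustrated in miniature: a single negated conjunction forces the reader to apply De Morgan's law, while early returns with positive conditions read directly. The flag values and predicate arguments below are simplified stand-ins for the real create_preheader logic, not the actual GCC code; the two forms are logically equivalent.

```c
#include <assert.h>
#include <stdbool.h>

#define CP_SIMPLE_PREHEADERS   1
#define CP_FALLTHRU_PREHEADERS 2

/* Original shape: one big negated conjunction. */
static bool need_new_preheader_original (int flags, bool single_succ,
                                         bool ends_in_jump, bool preds_from_loop)
{
  return !((flags & CP_SIMPLE_PREHEADERS) && !single_succ)
         && !((flags & CP_FALLTHRU_PREHEADERS)
              && (ends_in_jump || preds_from_loop));
}

/* Refactored shape: each disqualifying case stated positively. */
static bool need_new_preheader_split (int flags, bool single_succ,
                                      bool ends_in_jump, bool preds_from_loop)
{
  /* A simple preheader was requested, but the entry block has several
     successors, so it cannot serve as one.  */
  if ((flags & CP_SIMPLE_PREHEADERS) && !single_succ)
    return false;
  /* A fallthru preheader was requested, but the entry block ends in a
     jump or has predecessors inside the loop.  */
  if ((flags & CP_FALLTHRU_PREHEADERS) && (ends_in_jump || preds_from_loop))
    return false;
  return true;
}
```

An exhaustive check over all inputs confirms the two forms agree.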
*************** hash_rtx (const_rtx x, enum machine_mode
*** 2237,2243 ****
        x = XEXP (x, 0);
        goto repeat;

!     case USE:
        /* A USE that mentions non-volatile memory needs special
           handling since the MEM may be BLKmode which normally
           prevents an entry from being made.  Pure calls are
--- 2241,2247 ----
        x = XEXP (x, 0);
        goto repeat;

!       case USE:
        /* A USE that mentions non-volatile memory needs special
           handling since the MEM may be BLKmode which normally
           prevents an entry from being made.  Pure calls are

A whitespace change in the wrong direction.  Please don't apply this bit.

*************** hash_rtx (const_rtx x, enum machine_mode
*** 2330,2343 ****
            goto repeat;
          }

!       hash += hash_rtx (XEXP (x, i), 0, do_not_record_p,
!                         hash_arg_in_memory_p, have_reg_qty);
        break;

      case 'E':
        for (j = 0; j < XVECLEN (x, i); j++)
!         hash += hash_rtx (XVECEXP (x, i, j), 0, do_not_record_p,
!                           hash_arg_in_memory_p, have_reg_qty);
        break;

      case 's':
--- 2339,2355 ----
            goto repeat;
          }

!       hash += hash_rtx_cb (XEXP (x, i), 0, do_not_record_p,
!                           hash_arg_in_memory_p,
!                            have_reg_qty, cb);
        break;

      case 'E':
        for (j = 0; j < XVECLEN (x, i); j++)
!         hash += hash_rtx_cb (XVECEXP (x, i, j), 0,
!                             do_not_record_p,
!                              hash_arg_in_memory_p,
!                             have_reg_qty, cb);
        break;

      case 's':

The indentation seems to have gone wrong here.

*** trunk/gcc/genautomata.c     Mon Sep 17 10:03:51 2007
--- sel-sched-branch/gcc/genautomata.c  Mon Apr 14 17:13:39 2008
*************** output_min_insn_conflict_delay_func (voi
*** 8067,8079 ****
    fprintf (output_file, "}\n\n");
  }

- /* Output function `internal_insn_latency'.  */
  static void
! output_internal_insn_latency_func (void)
  {
-   decl_t decl;
-   struct bypass_decl *bypass;
    int i, j, col;
    const char *tabletype = "unsigned char";

    /* Find the smallest integer type that can hold all the default
--- 8067,8077 ----
    fprintf (output_file, "}\n\n");
  }

  static void
! output_default_latencies (void)
  {
    int i, j, col;
+   decl_t decl;
    const char *tabletype = "unsigned char";

    /* Find the smallest integer type that can hold all the default

Don't remove the comment on the function, correct it.

+   if (iter->bits)
+     goto next_bit;

This goto seems unnecessarily confusing.  A simple "break" should work here.

*** trunk/gcc/passes.c  Fri May 30 17:32:06 2008
--- sel-sched-branch/gcc/passes.c  Fri May 23 18:48:33 2008
*************** init_optimization_passes (void)
*** 770,780 ****
        NEXT_PASS (pass_split_before_regstack);
        NEXT_PASS (pass_stack_regs_run);
      }
-   NEXT_PASS (pass_compute_alignments);
    NEXT_PASS (pass_duplicate_computed_gotos);
    NEXT_PASS (pass_variable_tracking);
    NEXT_PASS (pass_free_cfg);
    NEXT_PASS (pass_machine_reorg);
    NEXT_PASS (pass_cleanup_barriers);
    NEXT_PASS (pass_delay_slots);
    NEXT_PASS (pass_split_for_shorten_branches);
--- 770,780 ----
        NEXT_PASS (pass_split_before_regstack);
        NEXT_PASS (pass_stack_regs_run);
      }
    NEXT_PASS (pass_duplicate_computed_gotos);
    NEXT_PASS (pass_variable_tracking);
    NEXT_PASS (pass_free_cfg);
    NEXT_PASS (pass_machine_reorg);
+   NEXT_PASS (pass_compute_alignments);
    NEXT_PASS (pass_cleanup_barriers);
    NEXT_PASS (pass_delay_slots);
    NEXT_PASS (pass_split_for_shorten_branches);

This looks wrong.  I don't think you can call pass_compute_alignments after calling pass_free_cfg.

*************** extern void set_curr_insn_source_locatio
*** 2305,2308 ****
--- 2327,2332 ----
  extern void set_curr_insn_block (tree);
  extern int curr_insn_locator (void);

+ #define insn_added (rtl_hooks.insn_added)
+
  #endif /* ! GCC_RTL_H */

We have a lot of #define's like this because nobody wanted to clean up the existing code.  For new code, I don't think we need to add the #define's.  Just use rtl_hooks.insn_added.

That said, I'm not sure I like insn_added very much.  It seems like a relatively fragile hook, as it will be hard to detect cases when it is used incorrectly.  Can you expand on why this is needed?
For building data structures, why does it not suffice to use get_max_uid?  What sorts of insns do you expect to see created here?

Thanks.

Ian
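The rtl_hooks mechanism Ian points to is a global struct of function pointers with a static default initializer; a pass installs its callback for the duration of the pass and restores the default afterwards. A toy model of that pattern, with names modeled loosely on the patch rather than taken from trunk:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int uid; } insn;

/* Struct-of-function-pointers hook table with a NULL default,
   mirroring the RTL_HOOKS_INSN_ADDED / RTL_HOOKS_INITIALIZER idea. */
struct rtl_hooks_t {
  void (*insn_added) (insn *);
};

#define RTL_HOOKS_INSN_ADDED NULL
struct rtl_hooks_t rtl_hooks = { RTL_HOOKS_INSN_ADDED };

static int seen_uids;
static void record_insn (insn *i) { seen_uids += i->uid; }

/* The emit-rtl side: notify whoever registered a callback, if anyone. */
static void add_insn (insn *i)
{
  if (rtl_hooks.insn_added != NULL)
    rtl_hooks.insn_added (i);
}
```

A pass would set `rtl_hooks.insn_added` on entry and reset it to NULL on exit, which is exactly what makes the hook easy to leave installed by mistake.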
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-06-11 13:40 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Hello Ian,

Thanks for reviewing the patch!

Ian Lance Taylor wrote:
> This looks wrong.  I don't think you can call pass_compute_alignments
> after calling pass_free_cfg.

On ia64 we needed to compute alignments after scheduling was done, i.e. after pass_machine_reorg.  Otherwise cfg changes messed up the alignments; for example, a loop label could move to another basic block.  Of course, on ia64 there is a cfg at that point, and that is why it worked.  I missed that this would not be the case for other targets.  What would you suggest doing instead?

> That said, I'm not sure I like insn_added very much.  It seems like a
> relatively fragile hook, as it will be hard to detect cases when it is
> used incorrectly.  Can you expand on why this is needed?  For building
> data structures, why does it not suffice to use get_max_uid?  What
> sorts of insns do you expect to see created here?

We have control over all insns we create in the scheduler, and we properly initialize data for them, except for the jumps that get created during e.g. redirect_edge_and_branch.  The hook was invented to catch these.  We could probably manage by e.g. recording get_max_uid before and after the calls to the cfgrtl functions and then passing the new insns to the initialization engine, but we need an rtx there, not a uid.  How can we get it?

Andrey
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-06-11 14:30 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> Ian Lance Taylor wrote:
>> This looks wrong.  I don't think you can call pass_compute_alignments
>> after calling pass_free_cfg.
> On ia64 we needed to compute alignments after scheduling was done,
> i.e. after pass_machine_reorg.  Otherwise cfg changes messed up the
> alignments; for example, a loop label could move to another basic
> block.  Of course, on ia64 there is a cfg at that point, and that is
> why it worked.  I missed that this would not be the case for other
> targets.  What would you suggest doing instead?

I would suggest that you have the ia64 machine_reorg pass call compute_alignments itself.  Admittedly compute_alignments will be run twice for the ia64, but it should be a fairly fast pass--it loops through all the basic blocks, but not through all the insns.

>> That said, I'm not sure I like insn_added very much.  It seems like a
>> relatively fragile hook, as it will be hard to detect cases when it is
>> used incorrectly.  Can you expand on why this is needed?  For building
>> data structures, why does it not suffice to use get_max_uid?  What
>> sorts of insns do you expect to see created here?
> We have control over all insns we create in the scheduler, and we
> properly initialize data for them, except for the jumps that get
> created during e.g. redirect_edge_and_branch.  The hook was invented
> to catch these.  We could probably manage by e.g. recording get_max_uid
> before and after the calls to the cfgrtl functions and then passing the
> new insns to the initialization engine, but we need an rtx there, not
> a uid.  How can we get it?

Unfortunately, there is no mapping from the UID to the insn.  I was thinking of, e.g., using the UID to scale array sizes.

If you look at haifa-sched.c, you'll see that it uses calls like redirect_edge_succ, generates branch insns itself, and calls extend_global (a haifa-sched.c function) to build information about the insn.  Is it reasonable for your code to work at that level?

Since you have data about all insns, don't you also need data about insns which have changed or are deleted?

Ian
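The alternative Andrey floats -- recording get_max_uid before a cfgrtl call and recovering the new insns afterwards -- amounts to walking the affected part of the insn stream and handing every insn whose uid exceeds the saved maximum to the initialization code. A toy model of that recovery step; the linked chain here is a stand-in for GCC's rtx insn list, not the real representation:

```c
#include <assert.h>
#include <stddef.h>

/* Toy insn chain: uid plus next pointer. */
typedef struct insn_s { int uid; struct insn_s *next; } insn_t;

static int init_calls;

/* Stand-in for the scheduler's per-insn initialization routine. */
static void sel_init_new_insn (insn_t *i)
{
  (void) i;
  init_calls++;
}

/* After a cfgrtl call, pass every insn created since OLD_MAX_UID
   (i.e. with a larger uid) to the initializer.  Walking the chain
   recovers the rtx that the uid alone cannot provide. */
static void init_insns_created_after (insn_t *chain, int old_max_uid)
{
  for (insn_t *i = chain; i != NULL; i = i->next)
    if (i->uid > old_max_uid)
      sel_init_new_insn (i);
}
```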
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-06-27 13:10 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Hello Ian,

Sorry for the delayed answer -- I have just returned from traveling (including the summit).  I am now working on fixing the issues you have pointed out.

Ian Lance Taylor wrote:
> I would suggest that you have the ia64 machine_reorg pass call
> compute_alignments itself.  Admittedly compute_alignments will be run
> twice for the ia64, but it should be a fairly fast pass--it loops
> through all the basic blocks, but not through all the insns.

I will try that.

> Unfortunately, there is no mapping from the UID to the insn.  I was
> thinking of, e.g., using the UID to scale array sizes.
>
> If you look at haifa-sched.c, you'll see that it uses calls like
> redirect_edge_succ, generates branch insns itself, and calls
> extend_global (a haifa-sched.c function) to build information about
> the insn.  Is it reasonable for your code to work at that level?

That would require reimplementing e.g. split_edge and redirect_edge_and_branch inside the scheduler, so that we can see the actual insn created.  I don't think this is reasonable.

If you're uncomfortable with the idea of the hook, I can invent something along the lines of searching for the new jumps in the code and passing them to the initialization routines.  This would effectively find insns given their UIDs and the knowledge that they were created somewhere near the given point in the CFG.  I think this will not happen often enough to have a significant effect on compile time.  The hook just seemed the simpler way of doing this.

> Since you have data about all insns, don't you also need data about
> insns which have changed or are deleted?

Not quite.  We always change an insn's uid when its pattern is changed (which also does not happen very often).  The dependence caches used for on-the-fly analysis rely on this, as they use UIDs as keys.  Overall, the data is kept valid only for insns that are actually in the insn stream, as we only either collect them as possible scheduling candidates or propagate through them.  The data for deleted insns remains in the array and gets freed after the current region has been scheduled.

Andrey
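The invariant described here -- per-insn data keyed by uid, with any pattern change assigning a fresh uid -- guarantees that a stale cache entry is never read back for a modified insn. A miniature model of that scheme; all names and the representation are ours, not the scheduler's:

```c
#include <assert.h>

#define MAX_UID 64

typedef struct { int uid; int pattern; } insn;

static int dep_cache[MAX_UID];  /* keyed by uid; 0 means "no entry" */
static int next_uid = 1;

static insn make_insn (int pattern)
{
  insn i = { next_uid++, pattern };
  return i;
}

static void cache_deps (const insn *i, int deps) { dep_cache[i->uid] = deps; }
static int  cached_deps (const insn *i)          { return dep_cache[i->uid]; }

/* Changing the pattern renames the insn, which implicitly invalidates
   whatever the cache held under the old uid. */
static void change_pattern (insn *i, int pattern)
{
  i->pattern = pattern;
  i->uid = next_uid++;
}
```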
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-06-30 16:16 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> If you're uncomfortable with the idea of the hook, I can invent
> something along the lines of searching for the new jumps in the code
> and passing them to the initialization routines.  This would
> effectively find insns given their UIDs and the knowledge that they
> were created somewhere near the given point in the CFG.  I think this
> will not happen often enough to have a significant effect on compile
> time.  The hook just seemed the simpler way of doing this.

Well, I'm uncomfortable with the idea of the hook.  I wouldn't necessarily mind a complete hook interface.  But the one you've implemented seems sort of ad hoc and easy to get wrong.  We don't currently have any way for a pass to clearly track every change to the RTL insn stream.  If we need that, I think we should do it for real.

Ian
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-07-08 14:54 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Hello,

Ian Lance Taylor wrote:
> Well, I'm uncomfortable with the idea of the hook.  I wouldn't
> necessarily mind a complete hook interface.  But the one you've
> implemented seems sort of ad hoc and easy to get wrong.  We don't
> currently have any way for a pass to clearly track every change to the
> RTL insn stream.  If we need that, I think we should do it for real.

I have looked closely at the places where jumps are generated by cfgrtl.c.  There are only two of them, one in force_fallthru_and_redirect and one in try_redirect_by_replacing_jump, and all our uses of split_edge and redirect_edge_and_branch lead to these places.  What if I add an interface to register/unregister a hook that would give notice of new jumps created by those functions?  This way, the changes in the scheduler will be minimal, and the hook itself would be much safer.  I can make it a general cfg hook if desired, but I doubt that tree cfg or cfglayout would use it.

Andrey
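The register/unregister interface floated here could be as small as a list of callbacks that the two cfgrtl jump-creation sites run for each jump they emit. The sketch below is purely illustrative -- no such interface exists in the tree being patched, and all names are invented:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int uid; } insn;
typedef void (*new_jump_fn) (insn *);

#define MAX_LISTENERS 4
static new_jump_fn listeners[MAX_LISTENERS];

/* Register FN; return its slot, or -1 if the table is full. */
static int register_new_jump_hook (new_jump_fn fn)
{
  for (int i = 0; i < MAX_LISTENERS; i++)
    if (listeners[i] == NULL)
      {
        listeners[i] = fn;
        return i;
      }
  return -1;
}

static void unregister_new_jump_hook (int slot) { listeners[slot] = NULL; }

/* Called by the jump-creation sites (force_fallthru_and_redirect,
   try_redirect_by_replacing_jump in this proposal) for each new jump. */
static void notify_new_jump (insn *jump)
{
  for (int i = 0; i < MAX_LISTENERS; i++)
    if (listeners[i] != NULL)
      listeners[i] (jump);
}

static int jumps_seen;
static void count_jump (insn *j) { (void) j; jumps_seen++; }
```

Explicit registration makes the hook's lifetime visible at the call sites, which is what would make it safer than a silently installed global.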
* Re: Selective scheduling pass - middle end changes [1/1]
From: Ian Lance Taylor @ 2008-07-08 15:29 UTC
To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Andrey Belevantsev <abel@ispras.ru> writes:

> I have looked closely at the places where jumps are generated by
> cfgrtl.c.  There are only two of them, one in
> force_fallthru_and_redirect and one in try_redirect_by_replacing_jump,
> and all our uses of split_edge and redirect_edge_and_branch lead to
> these places.  What if I add an interface to register/unregister a
> hook that would give notice of new jumps created by those functions?
> This way, the changes in the scheduler will be minimal, and the hook
> itself would be much safer.  I can make it a general cfg hook if
> desired, but I doubt that tree cfg or cfglayout would use it.

I really think that Steven's suggestion of using cfglayout mode is correct.

Ian
* Re: Selective scheduling pass - middle end changes [1/1]
From: Andrey Belevantsev @ 2008-08-22 15:55 UTC
To: Ian Lance Taylor; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

[-- Attachment #1: Type: text/plain, Size: 3175 bytes --]

Hello Ian,

Here is the updated patch, which addresses all your comments and has the RTL hooks removed.  As I wrote in a separate email (http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01052.html), I apologize for making too much noise about the RTL hooks problem -- with Zdenek's suggestion, it is actually solved very easily.

Compared to the original patch, the only addition is in final.c; it fixes a FAIL on ia64 with the dump-addr.c test case.  When the dump file is used, dominator information does not get freed in compute_alignments, resulting in an ICE in verify_dominators later in selective scheduling.  The addition itself was committed to the branch at http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01633.html.

Thanks again for your review,
Andrey

2008-08-22  Andrey Belevantsev  <abel@ispras.ru>
            Dmitry Melnik  <dm@ispras.ru>
            Dmitry Zhurikhin  <zhur@ispras.ru>
            Alexander Monakov  <amonakov@ispras.ru>
            Maxim Kuvyrkov  <maxim@codesourcery.com>

        * cfghooks.h (get_cfg_hooks, set_cfg_hooks): New prototypes.
        * cfghooks.c (get_cfg_hooks, set_cfg_hooks): New functions.
        (make_forwarder_block): Update loop latch if we have redirected
        the loop latch edge.
        * cfgloop.c (get_loop_body_in_custom_order): New function.
        * cfgloop.h (LOOPS_HAVE_FALLTHRU_PREHEADERS): New enum field.
        (CP_FALLTHRU_PREHEADERS): Likewise.
        (get_loop_body_in_custom_order): Declare.
        * cfgloopmanip.c (has_preds_from_loop): New.
        (create_preheader): Honor CP_FALLTHRU_PREHEADERS.
        Assert that the preheader edge will be fall thru when it is set.
        * cse.c (hash_rtx_cb): New.
        (hash_rtx): Use it.
        * final.c (compute_alignments): Export.  Free dominance info
        after loop_optimizer_finalize.
        * genattr.c (main): Output maximal_insn_latency prototype.
        * genautomata.c (output_default_latencies): New.  Factor its
        code from ...
        (output_internal_insn_latency_func): ... here.
        (output_internal_maximal_insn_latency_func): New.
        (output_maximal_insn_latency_func): New.
        * hard-reg-set.h (UHOST_BITS_PER_WIDE_INT): Define unconditionally.
        (struct hard_reg_set_iterator): New.
        (hard_reg_set_iter_init, hard_reg_set_iter_set,
        hard_reg_set_iter_next): New functions.
        (EXECUTE_IF_SET_IN_HARD_REG_SET): New macro.
        * lists.c (remove_free_INSN_LIST_node,
        remove_free_EXPR_LIST_node): New functions.
        * loop-init.c (loop_optimizer_init): When
        LOOPS_HAVE_FALLTHRU_PREHEADERS, set CP_FALLTHRU_PREHEADERS when
        calling create_preheaders.
        (loop_optimizer_finalize): Do not verify flow info after reload.
        * recog.c (validate_replace_rtx_1): New parameter simplify.
        Default it to true.  Update all uses.  Factor out simplifying
        code to ...
        (simplify_while_replacing): ... this new function.
        (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): New.
        * recog.h (validate_replace_rtx_part,
        validate_replace_rtx_part_nosimplify): Declare.
        * rtl.c (rtx_equal_p_cb): New.
        (rtx_equal_p): Use it.
        * rtl.h (rtx_equal_p_cb, hash_rtx_cb): Declare.
        (remove_free_INSN_LIST_NODE, remove_free_EXPR_LIST_node,
        debug_bb_n_slim, debug_bb_slim, print_rtl_slim): Likewise.
        * vecprim.h: Add a vector type for unsigned int.

[-- Attachment #2: sel-sched-middle.diff.gz --]
[-- Type: application/gzip, Size: 26761 bytes --]
* Selective scheduling pass - scheduler changes [2/3]
From: Andrey Belevantsev @ 2008-06-03 14:27 UTC
To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks

[-- Attachment #1: Type: text/plain, Size: 15469 bytes --]

Hello,

This patch is the largest part of the implementation, and it contains the changes to the scheduler itself as well as the new files.  The main changes in the scheduler fall into two categories: changes to the dependence analysis and changes to the initialization mechanism.

The dependence analysis changes are as follows:

o new hooks are introduced so that the schedulers can perform actions when a dependence is found, either a memory or a register one.  This is needed for on-the-fly dependence analysis, which we have to do because it is unclear how we could keep the dependence graph up to date in the presence of control dependencies and register renaming.  I think a project for creating a proper dependence graph, e.g. one distinguishing between control and data dependencies and storing the origin of each dependence, is needed to make this possible.

o readonly dependence contexts: the analysis generates all the proper dependencies but does not change the context it is based on.  This is needed to speed up the on-the-fly analysis.

o sched_get_condition is rewritten so that it does not generate rtx garbage every time it needs to canonicalize a condition.
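The readonly-context idea above can be pictured with a tiny model: the analyzer keeps the last writer of each register in its context and, when the readonly flag is set, still reports dependences against that context but never updates it, so one context can be probed by many candidate insns. Everything below is a simplification of ours, not the sched-deps representation:

```c
#include <assert.h>
#include <stdbool.h>

#define N_REGS 8

typedef struct {
  int last_writer[N_REGS];  /* uid of the last insn writing each reg; 0 = none */
  bool readonly;            /* report dependences, but do not mutate */
} deps_ctx;

static int deps_found;
static void note_dep (int producer, int consumer)
{
  (void) producer; (void) consumer;
  deps_found++;
}

/* Analyze an insn UID that reads register SRC and writes register DEST. */
static void analyze_insn (deps_ctx *ctx, int uid, int src, int dest)
{
  if (ctx->last_writer[src] != 0)
    note_dep (ctx->last_writer[src], uid);   /* true dependence */
  if (ctx->last_writer[dest] != 0)
    note_dep (ctx->last_writer[dest], uid);  /* output dependence */
  if (!ctx->readonly)
    ctx->last_writer[dest] = uid;            /* only mutate when allowed */
}
```

The speedup comes from reuse: a context built once at a program point can be consulted for every candidate insn considered there, instead of being copied per candidate.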
The initialization changes can be described as an effort to separate common scheduler data, private data, and dependence analysis data, and to create a uniform mechanism for initializing per-insn and per-basic-block data.  The former is achieved by factoring the common/private/deps data and the code initializing it into separate structures and functions.  For example, the functions sched_init/sched_finish initialize/finalize the common part (df/aliases/dfa/etc), haifa_sched_{init,finish} work with the haifa data, and so on.  The latter is achieved via the sched_scan interface, which allows scanning an arbitrary array of insns/bbs and calling the user-defined hooks {init,extend}_{insn,bb} on each of them.  As a followup, the common part of the code, which is now in haifa-sched.c, can be moved to a new sched-common.c file.  We did not do that for the sake of an easier merge process.

The overview of the selective scheduler implementation is given at the beginning of the sel-sched.c file.  There are also papers in the GCC Summit proceedings of 2006 and 2007.  The sel-sched.c file contains the main scheduling routines, and the sel-sched-ir.c file contains lower-level routines that manipulate the scheduler IR and data structures.  There are also the sel-sched-dump.[ch] files, which contain the dumping infrastructure for the scheduler and some code that eases debugging.  Some of the code may be omitted from the merge if we find it too specific.

OK for trunk?

Andrey

2008-06-03  Andrey Belevantsev  <abel@ispras.ru>
            Dmitry Melnik  <dm@ispras.ru>
            Dmitry Zhurikhin  <zhur@ispras.ru>
            Alexander Monakov  <amonakov@ispras.ru>
            Maxim Kuvyrkov  <maxim@codesourcery.com>

        * sel-sched.h, sel-sched-dump.h, sel-sched-ir.h, sel-sched.c,
        sel-sched-dump.c, sel-sched-ir.c: New files.
        * Makefile.in (OBJS-common): Add selective scheduling object
        files.
        (sel-sched.o, sel-sched-dump.o, sel-sched-ir.o): New entries.
        (SEL_SCHED_IR_H, SEL_SCHED_DUMP_H): New entries.
* common.opt (fsel-sched-bookkeeping, fsel-sched-pipelining, fsel-sched-pipelining-outer-loops, fsel-sched-renaming, fsel-sched-substitution, fselective-scheduling): New flags. * haifa-sched.c: Include vecprim.h. (issue_rate, sched_verbose_param, note_list, dfa_state_size, ready_try, cycle_issued_insns, dfa_lookahead, max_luid, spec_info): Make global. (old_max_uid, old_last_basic_block): Remove. (h_i_d): Make it a vector. (INSN_TICK, INTER_TICK, QUEUE_INDEX, INSN_COST): Make them work through HID macro. (after_recovery, adding_bb_to_current_region_p): New variables to handle correct insertion of the recovery code. (struct ready_list): Move declaration to sched-int.h. (rgn_n_insns): Removed. (rtx_vec_t): Move to sched-int.h. (find_insn_reg_weight): Remove. (find_insn_reg_weight1): Rename to find_insn_reg_weight. (extend_h_i_d, init_h_i_d, haifa_init_h_i_d, haifa_finish_h_i_d): New functions to initialize / finalize haifa instruction data. (dep_weak): Move to sched-deps.c. Rename to ds_weak. (unlink_other_notes): Move logic to add_to_note_list. Handle selective scheduler. (ready_lastpos, ready_element, ready_sort, reemit_notes, move_insn, find_fallthru_edge): Make global, remove static prototypes. (max_issue): Add privileged_n and state parameters. Use them. (extend_global, extend_all): Removed. (init_before_recovery): Add new param. Fix the handling of the case when we insert a recovery code before the EXIT which has a predecessor with a fallthrough edge to it. (create_recovery_block): Make global. Rename to sched_create_recovery_block. Update. (change_pattern): Rename to sched_change_pattern. Make global. (speculate_insn): Rename to sched_speculate_insn. Make global. Split haifa-specific functionality into ... (haifa_change_pattern): New static function. (sched_extend_bb, sched_init_bb): New static functions. (sched_extend_bb): Add the prototype. (current_sched_info): Change type to ... (struct haifa_sched_info): ... this. New structure. 
Move Haifa-specific fields from struct sched_info. (insn_cost): Adjust for selective scheduling. (dep_cost_1): New static function. Prototype it. Move logic from ... (insn_cost1): ... here. (dep_cost): Use dep_cost_1. (priority): Adjust to work with selective scheduling. Use sched_deps_info instead of current_sched_info. Process the corner case when all dependencies don't contribute to priority. (rank_for_schedule): Use ds_weak instead of dep_weak. (advance_state): New function. Move logic from ... (advance_one_cycle): ... here. (add_to_note_list, concat_note_lists): New functions. (rm_other_notes): Make static. Adjust for selective scheduling. (remove_notes, restore_other_notes): New functions. (move_insn): Don't call reemit_notes. (choose_ready): Remove lookahead variable, use dfa_lookahead. Remove more_issue, max_points. Move the code to initialize max_lookahead_tries to max_issue. (schedule_block): Remove rgn_n_insns1 parameter. Don't allocate ready. Adjust uses of move_insn. Call restore_other_notes. (luid): Remove. (sched_init, sched_finish): Move Haifa-specific initialization/ finalization to ... (haifa_sched_init, haifa_sched_finish): ... respectively. New functions. (setup_sched_dump): New function. (haifa_init_only_bb): New static function. (haifa_speculate_insn): New static function. (try_ready): Use haifa_* instead of speculate_insn and change_pattern. (extend_ready, extend_all): Remove. (sched_extend_ready_list, sched_finish_ready_list): New functions. (create_check_block_twin, add_to_speculative_block): Use haifa_insns_init instead of extend_global. Update to use new initialization functions. Change parameter. (add_block): Remove. (sched_scan_info): New. (extend_bb, init_bb, extend_insn, init_insn, init_insns_in_bb, sched_scan): New static functions for walking through scheduling region. (sched_init_bbs): New functions to init / finalize basic block information. (sched_luids): New vector variable to replace uid_to_luid. 
(luids_extend_insn): New function. (sched_max_luid): New variable. (luids_init_insn): New function. (sched_init_luids, sched_finish_luids): New functions. (insn_luid): New debug function. (sched_extend_target): New function. (haifa_init_insn): New static function. (sched_init_only_bb): New hook. (sched_split_block): New hook. (sched_split_block_1): New function. (sched_create_empty_bb): New hook. (sched_create_empty_bb_1): New function. (common_sched_info, ready): New global variables. (current_sched_info_var): Remove. (move_block_after_check): Use common_sched_info. (haifa_luid_for_non_insn): New static function. (init_before_recovery): Use haifa_init_only_bb instead of add_block. * modulo-sched.c: (sms_sched_info): Rename to sms_common_sched_info. (sms_sched_deps_info, sms_sched_info): New. (setup_sched_infos): New. (sms_schedule): Initialize them. Call haifa_sched_init/finish. Do not call regstat_free_calls_crossed, as it called by sched_init. (sms_print_insn): Use const_rtx. * params.def (PARAM_MAX_PIPELINE_REGION_BLOCKS, PARAM_MAX_PIPELINE_REGION_INSNS, PARAM_SELSCHED_MAX_LOOKAHEAD, PARAM_SELSCHED_MAX_SCHED_TIMES, PARAM_SELSCHED_INSNS_TO_RENAME): New. * sched-deps.c (sched_deps_info): New. Update all relevant uses of current_sched_info to use it. (enum reg_pending_barrier_mode): Move to sched-int.h. (h_d_i_d): New variable. Initialize to NULL. ({true, output, anti, spec, forward}_dependency_cache): Initialize to NULL. (sched_has_condition_p): New function. Adjust users of sched_get_condition to use it instead. (conditions_mutex_p): Add arguments indicating which conditions are reversed. Use them. (sched_get_condition_with_rev): Rename from sched_get_condition. Add argument to indicate whether returned condition is reversed. Do not generate new rtx when condition should be reversed; indicate it by setting new argument instead. (add_dependence_list_and_free): Add deps parameter. Update all users. Do not free dependence list when deps context is readonly. 
(add_insn_mem_dependence, flush_pending_lists): Adjust for readonly contexts. (remove_from_dependence_list, remove_from_both_dependence_lists): New. (remove_from_deps): New. Use the above functions. (deps_analyze_insn): Do not flush pending write lists on speculation checks. Do not make speculation check a scheduling barrier for memory references. (cur_max_luid, cur_insn, can_start_lhs_rhs_p): New static variables. (add_or_update_back_dep_1): Initialize present_dep_type. (haifa_start_insn, haifa_finish_insn, haifa_note_reg_set, haifa_note_reg_clobber, haifa_note_reg_use, haifa_note_mem_dep, haifa_note_dep): New functions implementing dependence hooks for the Haifa scheduler. (note_reg_use, note_reg_set, note_reg_clobber, note_mem_dep, note_dep): New functions. (ds_to_dt): New function. (sched_analyze_reg, sched_analyze_1, sched_analyze_2, sched_analyze_insn): Update to use dependency hooks infrastructure and readonly contexts. (deps_analyze_insn): New function. Move part of logic from ... (sched_analyze): ... here. Also move some logic to ... (deps_start_bb): ... here. New function. (add_forw_dep, delete_forw_dep): Guard use of INSN_DEP_COUNT with sel_sched_p. (sched_deps_init): New function. Move code from ... (init_dependency_caches): ... here. (init_deps_data_vector): New. (sched_deps_finish): New function. Move code from ... (free_dependency_caches): ... here. (init_deps_global, finish_deps_global): Adjust for use with selective scheduling. (get_dep_weak): Move logic to ... (get_dep_weak_1): New function. (ds_merge): Move logic to ... (ds_merge_1): New static function. (ds_full_merge, ds_max_merge, ds_get_speculation_types): New functions. (ds_get_max_dep_weak): New function. * sched-ebb.c (sched_n_insns): Rename to sched_rgn_n_insns. (n_insns): Rename to rgn_n_insns. (debug_ebb_dependencies): New function. (init_ready_list): Use it. (ebb_print_insn): Indicate when an insn starts a new cycle. 
(contributes_to_priority, compute_jump_reg_dependencies, add_remove_insn, fix_recovery_cfg): Add ebb_ prefix to function names. (ebb_sched_deps_info, ebb_common_sched_info): New variables. (schedule_ebb): Initialize them. Use remove_notes instead of rm_other_notes. Use haifa_local_init/finish. (schedule_ebbs): Use haifa_sched_init/finish. * sched-int.h: Include basic-block.h and vecprim.h. (sched_verbose_param, enum sched_pass_id_t, bb_vec_t, insn_vec_t, rtx_vec_t): New. (struct sched_scan_info_def): New structure. (sched_scan_info, sched_scan, sched_init_bbs, sched_init_luids, sched_finish_luids, sched_extend_target, haifa_init_h_i_d, haifa_finish_h_i_d): Declare. (struct common_sched_info_def): New. (common_sched_info, haifa_common_sched_info, sched_emulate_haifa_p): Declare. (sel_sched_p): New. (sched_luids): Declare. (INSN_LUID, LUID_BY_UID, SET_INSN_LUID): Declare. (sched_max_luid, insn_luid): Declare. (note_list, remove_notes, restore_other_notes, bb_note): Declare. (sched_insns_init, sched_insns_finish, xrecalloc, move_insn, reemit_notes, print_insn, print_pattern, print_value, haifa_classify_insn, sel_find_rgns, sel_mark_hard_insn, dfa_state_size, advance_state, setup_sched_dump, sched_init, sched_finish, sel_insn_is_speculation_check): Export. (struct ready_list): Move from haifa-sched.c. (ready_try, ready, max_issue): Export. (find_fallthru_edge, sched_init_only_bb, sched_split_block, sched_split_block_1, sched_create_empty_bb, sched_create_empty_bb_1, sched_create_recovery_block, sched_create_recovery_edges): Export. (enum reg_pending_barrier_mode): Export. (struct deps): New fields `last_reg_pending_barrier' and `readonly'. (deps_t): New. (struct sched_info): Move compute_jump_reg_dependencies, use_cselib ... (struct haifa_insn_data): and cant_move to ... (struct sched_deps_info_def): ... this new structure. (h_i_d): Export. (HID): New accessor macro. Rewrite h_i_d accessor macros through HID. (struct region): Move from sched-rgn.h. 
(nr_regions, rgn_table, rgn_bb_table, block_to_bb, containing_rgn, RGN_NR_BLOCKS, RGN_BLOCKS, RGN_DONT_CALC_DEPS, RGN_HAS_REAL_EBB, BLOCK_TO_BB, CONTAINING_RGN): Export. (ebb_head, BB_TO_BLOCK, EBB_FIRST_BB, EBB_LAST_BB, INSN_BB): Likewise. (current_nr_blocks, current_blocks, target_bb): Likewise. (sched_is_disabled_for_current_region_p, sched_rgn_init, sched_rgn_finish, rgn_setup_region, sched_rgn_compute_dependencies, sched_rgn_local_init, extend_regions, rgn_make_new_region_out_of_new_block, compute_priorities, debug_rgn_dependencies, free_rgn_deps, contributes_to_priority, extend_rgns, deps_join rgn_setup_common_sched_info, rgn_setup_sched_infos, debug_regions, debug_region, dump_region_dot, dump_region_dot_file, haifa_sched_init, haifa_sched_finish): Export. * sched-rgn.c: Export region data structures. (debug_region, bb_in_region_p, dump_region_dot_file, dump_region_dot): New. (too_large): Use estimate_number_of_insns. (haifa_find_rgns): New. Move the code from ... (find_rgns): ... here. Call either sel_find_rgns or haifa_find_rgns. (free_trg_info): New. (compute_trg_info): Allocate candidate tables here instead of ... (init_ready_list): ... here. (rgn_common_sched_info, rgn_const_sched_deps_info, rgn_const_sel_sched_deps_info, rgn_sched_deps_info): New. (deps_join): New, extracted from ... (propagate_deps): ... here. (free_rgn_deps, compute_priorities): New function. (sched_rgn_init, sched_rgn_finish): New functions. (schedule_region): Use them. (sched_rgn_local_preinit, sched_rgn_local_init, sched_rgn_local_free, sched_rgn_local_finish): New functions. (rgn_make_new_region_out_of_new_block): New. * sched-vis.c (print_value, print_pattern): Make global. (dump_insn_slim_1, print_rtl_slim, debug_bb_slim, debug_bb_n_slim): New functions. * target-def.h (TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT): New target hooks. Initialize them to 0. 
* target.h (struct gcc_target): Add them. * doc/invoke.texi: Document new flags and parameters. * doc/tm.texi: Document new target hooks. [-- Attachment #2: sel-sched-merge-sched.diff.gz --] [-- Type: application/gzip, Size: 198019 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
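The sched_scan mechanism from the cover text of this patch (a uniform walk over arbitrary arrays of insns/bbs that calls user-registered {init,extend}_{insn,bb} hooks) can be sketched as follows. This is a simplified model under invented types, assuming integer ids stand in for rtx insns and basic blocks; only the hook-table shape mirrors the patch's sched_scan_info_def:

```c
#include <assert.h>
#include <stddef.h>

/* User hooks, any of which may be NULL: "extend" grows the per-object
   data arrays once, "init" fills in the entry for one object.  */
struct sched_scan_info_def {
  void (*extend_bb)(void);
  void (*init_bb)(int bb);
  void (*extend_insn)(void);
  void (*init_insn)(int insn);
};

/* Walk arbitrary arrays of basic blocks and insns, invoking whichever
   hooks the current scheduler variant registered.  */
static void sched_scan(const struct sched_scan_info_def *ssi,
                       const int *bbs, int nbbs,
                       const int *insns, int ninsns) {
  if (ssi->extend_bb) ssi->extend_bb();
  if (ssi->init_bb)
    for (int i = 0; i < nbbs; i++) ssi->init_bb(bbs[i]);
  if (ssi->extend_insn) ssi->extend_insn();
  if (ssi->init_insn)
    for (int i = 0; i < ninsns; i++) ssi->init_insn(insns[i]);
}

static int luid[16];
static int next_luid;
static void my_extend_insn(void) { next_luid = 1; }  /* luid 0 reserved */
static void my_init_insn(int insn) { luid[insn] = next_luid++; }

/* Assign luids to three insns; the bb hooks are simply not registered.  */
int demo(void) {
  struct sched_scan_info_def ssi = { NULL, NULL, my_extend_insn, my_init_insn };
  int insns[] = { 4, 7, 2 };
  sched_scan(&ssi, NULL, 0, insns, 3);
  return luid[4] * 100 + luid[7] * 10 + luid[2];
}
```

The point of the indirection is that luid assignment, haifa data, and selective-scheduler data can each register their own hooks and reuse one scanning loop instead of duplicating the region walk.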
* Re: Selective scheduling pass - scheduler changes [2/3] 2008-06-03 14:27 ` Selective scheduling pass - scheduler changes [2/3] Andrey Belevantsev @ 2008-06-03 22:03 ` Vladimir Makarov 2008-08-22 15:52 ` Andrey Belevantsev 0 siblings, 1 reply; 28+ messages in thread From: Vladimir Makarov @ 2008-06-03 22:03 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Ayal Zaks Andrey Belevantsev wrote: > Hello, > > This patch is the largest part of the implementation, and it shows > changes to the scheduler itself as well as the new files. The main > changes in the scheduler fall into two categories: changes to the > dependence analysis and changes to the initialization mechanism. I'll look at the patch. But taking the size of the patch, the review probably will take several weeks. > > > OK for trunk? > Andrey > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - scheduler changes [2/3] 2008-06-03 22:03 ` Vladimir Makarov @ 2008-08-22 15:52 ` Andrey Belevantsev 0 siblings, 0 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-08-22 15:52 UTC (permalink / raw) To: Vladimir Makarov; +Cc: GCC Patches, Jim Wilson, Ayal Zaks [-- Attachment #1: Type: text/plain, Size: 691 bytes --] Vladimir Makarov wrote: > Andrey Belevantsev wrote: >> Hello, >> >> This patch is the largest part of the implementation, and it shows >> changes to the scheduler itself as well as the new files. The main >> changes in the scheduler fall into two categories: changes to the >> dependence analysis and changes to the initialization mechanism. > > I'll look at the patch. But taking the size of the patch, the review > probably will take several weeks. Thanks again for the review, Vlad. For the record, here is the updated patch against trunk version 139129 incorporating all your suggestions from the review to which I answered with separate patches to the branch. Yours, Andrey [-- Attachment #2: sel-sched.changelog --] [-- Type: text/plain, Size: 16839 bytes --] 2008-08-22 Andrey Belevantsev <abel@ispras.ru> Dmitry Melnik <dm@ispras.ru> Dmitry Zhurikhin <zhur@ispras.ru> Alexander Monakov <amonakov@ispras.ru> Maxim Kuvyrkov <maxim@codesourcery.com> * sel-sched.h, sel-sched-dump.h, sel-sched-ir.h, sel-sched.c, sel-sched-dump.c, sel-sched-ir.c: New files. * Makefile.in (OBJS-common): Add selective scheduling object files. (sel-sched.o, sel-sched-dump.o, sel-sched-ir.o): New entries. (SEL_SCHED_IR_H, SEL_SCHED_DUMP_H): New entries. (sched-vis.o): Add dependency on $(INSN_ATTR_H). * common.opt (fsel-sched-bookkeeping, fsel-sched-pipelining, fsel-sched-pipelining-outer-loops, fsel-sched-renaming, fsel-sched-substitution, fselective-scheduling): New flags. * haifa-sched.c: Include vecprim.h and cfgloop.h. (issue_rate, sched_verbose_param, note_list, dfa_state_size, ready_try, cycle_issued_insns, spec_info): Make global. 
(readyp): Initialize. (dfa_lookahead): New global variable. (old_max_uid, old_last_basic_block): Remove. (h_i_d): Make it a vector. (INSN_TICK, INTER_TICK, QUEUE_INDEX, INSN_COST): Make them work through HID macro. (after_recovery, adding_bb_to_current_region_p): New variables to handle correct insertion of the recovery code. (struct ready_list): Move declaration to sched-int.h. (rgn_n_insns): Removed. (rtx_vec_t): Move to sched-int.h. (find_insn_reg_weight): Remove. (find_insn_reg_weight1): Rename to find_insn_reg_weight. (haifa_init_h_i_d, haifa_finish_h_i_d): New functions to initialize / finalize haifa instruction data. (extend_h_i_d, init_h_i_d): Rewrite. (unlink_other_notes): Move logic to add_to_note_list. Handle selective scheduler. (ready_lastpos, ready_element, ready_sort, reemit_notes, find_fallthru_edge): Make global, remove static prototypes. (max_issue): Make global. Add privileged_n and state parameters. Use them. (extend_global, extend_all): Removed. (init_before_recovery): Add new param. Fix the handling of the case when we insert a recovery code before the EXIT which has a predecessor with a fallthrough edge to it. (create_recovery_block): Make global. Rename to sched_create_recovery_block. Update. (change_pattern): Rename to sched_change_pattern. Make global. (speculate_insn): Rename to sched_speculate_insn. Make global. Split haifa-specific functionality into ... (haifa_change_pattern): New static function. (sched_extend_bb): New static function. (sched_init_bbs): New function. (current_sched_info): Change type to struct haifa_sched_info. (insn_cost): Adjust for selective scheduling. (dep_cost_1): New function. Move logic from ... (dep_cost): ... here. (dep_cost): Use dep_cost_1. (contributes_to_priority_p): Use sched_deps_info instead of current_sched_info. (priority): Adjust to work with selective scheduling. Process the corner case when all dependencies don't contribute to priority. (rank_for_schedule): Use ds_weak instead of dep_weak. 
(advance_state): New function. Move logic from ... (advance_one_cycle): ... here. (add_to_note_list, concat_note_lists): New functions. (rm_other_notes): Make static. Adjust for selective scheduling. (remove_notes, restore_other_notes): New functions. (move_insn): Add two arguments. Update assert. Don't call reemit_notes. (choose_ready): Remove lookahead variable, use dfa_lookahead. Remove more_issue, max_points. Move the code to initialize max_lookahead_tries to max_issue. (schedule_block): Remove rgn_n_insns1 parameter. Don't allocate ready. Adjust use of move_insn. Call restore_other_notes. (luid): Remove. (sched_init, sched_finish): Move Haifa-specific initialization/ finalization to ... (haifa_sched_init, haifa_sched_finish): ... respectively. New functions. (setup_sched_dump): New function. (haifa_init_only_bb): New static function. (haifa_speculate_insn): New static function. (try_ready): Use haifa_* instead of speculate_insn and change_pattern. (extend_ready, extend_all): Remove. (sched_extend_ready_list, sched_finish_ready_list): New functions. (create_check_block_twin, add_to_speculative_block): Use haifa_insns_init instead of extend_global. Update to use new initialization functions. Change parameter. Factor out code from create_check_block_twin to ... (sched_create_recovery_edges) ... this new function. (add_block): Remove. (sched_scan_info): New. (extend_bb): Use sched_scan_info. (init_bb, extend_insn, init_insn, init_insns_in_bb, sched_scan): New static functions for walking through scheduling region. (sched_luids): New vector variable to replace uid_to_luid. (luids_extend_insn): New function. (sched_max_luid): New variable. (luids_init_insn): New function. (sched_init_luids, sched_finish_luids): New functions. (insn_luid): New debug function. (sched_extend_target): New function. (haifa_init_insn): New static function. (sched_init_only_bb): New hook. (sched_split_block): New hook. (sched_split_block_1): New function. (sched_create_empty_bb): New hook. 
(sched_create_empty_bb_1): New function. (common_sched_info, ready): New global variables. (current_sched_info_var): Remove. (move_block_after_check): Use common_sched_info. (haifa_luid_for_non_insn): New static function. (init_before_recovery): Use haifa_init_only_bb instead of add_block. * modulo-sched.c: (issue_rate): Remove static declaration. (sms_sched_info): Change type to haifa_sched_info. (sms_sched_deps_info, sms_common_sched_info): New variables. (setup_sched_infos): New. (sms_schedule): Initialize them. Call haifa_sched_init/finish. Do not call regstat_free_calls_crossed. (sms_print_insn): Use const_rtx. * params.def (PARAM_MAX_PIPELINE_REGION_BLOCKS, PARAM_MAX_PIPELINE_REGION_INSNS, PARAM_SELSCHED_MAX_LOOKAHEAD, PARAM_SELSCHED_MAX_SCHED_TIMES, PARAM_SELSCHED_INSNS_TO_RENAME, PARAM_SCHED_MEM_TRUE_DEP_COST): New. * sched-deps.c (sched_deps_info): New. Update all relevant uses of current_sched_info to use it. (enum reg_pending_barrier_mode): Move to sched-int.h. (h_d_i_d): New variable. Initialize to NULL. ({true, output, anti, spec, forward}_dependency_cache): Initialize to NULL. (estimate_dep_weak): Remove static declaration. (sched_has_condition_p): New function. Adjust users of sched_get_condition to use it instead. (conditions_mutex_p): Add arguments indicating which conditions are reversed. Use them. (sched_get_condition_with_rev): Rename from sched_get_condition. Add argument to indicate whether returned condition is reversed. Do not generate new rtx when condition should be reversed; indicate it by setting new argument instead. (add_dependence_list_and_free): Add deps parameter. Update all users. Do not free dependence list when deps context is readonly. (add_insn_mem_dependence, flush_pending_lists): Adjust for readonly contexts. (remove_from_dependence_list, remove_from_both_dependence_lists): New. (remove_from_deps): New. Use the above functions. (cur_insn, can_start_lhs_rhs_p): New static variables. 
(add_or_update_back_dep_1): Initialize present_dep_type. (haifa_start_insn, haifa_finish_insn, haifa_note_reg_set, haifa_note_reg_clobber, haifa_note_reg_use, haifa_note_mem_dep, haifa_note_dep): New functions implementing dependence hooks for the Haifa scheduler. (note_reg_use, note_reg_set, note_reg_clobber, note_mem_dep, note_dep): New functions. (ds_to_dt, extend_deps_reg_info, maybe_extend_reg_info_p): New functions. (init_deps): Initialize last_reg_pending_barrier and deps->readonly. (free_deps): Initialize deps->reg_last. (sched_analyze_reg, sched_analyze_1, sched_analyze_2, sched_analyze_insn): Update to use dependency hooks infrastructure and readonly contexts. (deps_analyze_insn): New function. Move part of logic from ... (sched_analyze): ... here. Also move some logic to ... (deps_start_bb): ... here. New function. (add_forw_dep, delete_forw_dep): Guard use of INSN_DEP_COUNT with sel_sched_p. (sched_deps_init): New function. Move code from ... (init_dependency_caches): ... here. Remove. (init_deps_data_vector): New. (sched_deps_finish): New function. Move code from ... (free_dependency_caches): ... here. Remove. (init_deps_global, finish_deps_global): Adjust for use with selective scheduling. (get_dep_weak): Move logic to ... (get_dep_weak_1): New function. (ds_merge): Move logic to ... (ds_merge_1): New static function. (ds_full_merge, ds_max_merge, ds_get_speculation_types): New functions. (ds_get_max_dep_weak): New function. * sched-ebb.c (sched_n_insns): Rename to sched_rgn_n_insns. (n_insns): Rename to rgn_n_insns. (debug_ebb_dependencies): New function. (init_ready_list): Use it. (begin_schedule_ready): Use sched_init_only_bb. (ebb_print_insn): Indicate when an insn starts a new cycle. (contributes_to_priority, compute_jump_reg_dependencies, add_remove_insn, fix_recovery_cfg): Add ebb_ prefix to function names. (add_block1): Remove to ebb_add_block. (ebb_sched_deps_info, ebb_common_sched_info): New variables. (schedule_ebb): Initialize them. 
Use remove_notes instead of rm_other_notes. Use haifa_local_init/finish. (schedule_ebbs): Use haifa_sched_init/finish. * sched-int.h: Include vecprim.h, remove rtl.h. (struct ready_list): Delete declaration. (sched_verbose_param, enum sched_pass_id_t, bb_vec_t, insn_vec_t, rtx_vec_t): New. (struct sched_scan_info_def): New structure. (sched_scan_info, sched_scan, sched_init_bbs, sched_init_luids, sched_finish_luids, sched_extend_target, haifa_init_h_i_d, haifa_finish_h_i_d): Declare. (struct common_sched_info_def): New. (common_sched_info, haifa_common_sched_info, sched_emulate_haifa_p): Declare. (sel_sched_p): New. (sched_luids): Declare. (INSN_LUID, LUID_BY_UID, SET_INSN_LUID): Declare. (sched_max_luid, insn_luid): Declare. (note_list, remove_notes, restore_other_notes, bb_note): Declare. (sched_insns_init, sched_insns_finish, xrecalloc, reemit_notes, print_insn, print_pattern, print_value, haifa_classify_insn, sel_find_rgns, sel_mark_hard_insn, dfa_state_size, advance_state, setup_sched_dump, sched_init, sched_finish, sel_insn_is_speculation_check): Export. (struct ready_list): Move from haifa-sched.c. (ready_try, ready, max_issue): Export. (ebb_compute_jump_reg_dependencies, find_fallthru_edge, sched_init_only_bb, sched_split_block, sched_split_block_1, sched_create_empty_bb, sched_create_empty_bb_1, sched_create_recovery_block, sched_create_recovery_edges): Export. (enum reg_pending_barrier_mode): Export. (struct deps): New fields `last_reg_pending_barrier' and `readonly'. (deps_t): New. (struct sched_info): Rename to haifa_sched_info. Use const_rtx for print_insn field. Move add_block and fix_recovery_cfg to common_sched_info_def. Move compute_jump_reg_dependencies, use_cselib ... (struct sched_deps_info_def): ... this new structure. (sched_deps_info): Declare. (struct spec_info_def): Remove weakness_cutoff, add data_weakness_cutoff and control_weakness_cutoff. (spec_info): Declare. (struct _haifa_deps_insn_data): Split from haifa_insn_data. 
Add dep_count field. (struct haifa_insn_data): Rename to struct _haifa_insn_data. (haifa_insn_data_def, haifa_insn_data_t): New typedefs. (current_sched_info): Change type to struct haifa_sched_info. (haifa_deps_insn_data_def, haifa_deps_insn_data_t): New typedefs. (h_d_i_d): New variable. (HDID): New accessor macro. (h_i_d): Change type to VEC (haifa_insn_data_def, heap) *. (HID): New accessor macro. Rewrite h_i_d accessor macros through HID and HDID. (IS_SPECULATION_CHECK_P): Update for selective scheduler. (enum SCHED_FLAGS): Update for selective scheduler. (enum SPEC_SCHED_FLAGS): New flag SEL_SCHED_SPEC_DONT_CHECK_CONTROL. (init_dependency_caches, free_dependency_caches): Delete declarations. (deps_analyze_insn, remove_from_deps, get_dep_weak_1, estimate_dep_weak, ds_full_merge, ds_max_merge, ds_weak, ds_get_speculation_types, ds_get_max_dep_weak, sched_deps_init, sched_deps_finish, haifa_note_reg_set, haifa_note_reg_use, haifa_note_reg_clobber, maybe_extend_reg_info_p, deps_start_bb, ds_to_dt): Export. (rm_other_notes): Delete declaration. (schedule_block): Remove one argument. (cycle_issued_insns, issue_rate, dfa_lookahead, ready_sort, ready_element, ready_lastpos, sched_extend_ready_list, sched_finish_ready_list, sched_change_pattern, sched_speculate_insn, concat_note_lists): Export. (struct region): Move from sched-rgn.h. (nr_regions, rgn_table, rgn_bb_table, block_to_bb, containing_rgn, RGN_NR_BLOCKS, RGN_BLOCKS, RGN_DONT_CALC_DEPS, RGN_HAS_REAL_EBB, BLOCK_TO_BB, CONTAINING_RGN): Export. (ebb_head, BB_TO_BLOCK, EBB_FIRST_BB, EBB_LAST_BB, INSN_BB): Likewise. (current_nr_blocks, current_blocks, target_bb): Likewise. 
(dep_cost_1, sched_is_disabled_for_current_region_p, sched_rgn_init, sched_rgn_finish, rgn_setup_region, sched_rgn_compute_dependencies, sched_rgn_local_init, extend_regions, rgn_make_new_region_out_of_new_block, compute_priorities, debug_rgn_dependencies, free_rgn_deps, contributes_to_priority, extend_rgns, deps_join rgn_setup_common_sched_info, rgn_setup_sched_infos, debug_regions, debug_region, dump_region_dot, dump_region_dot_file, haifa_sched_init, haifa_sched_finish): Export. * sched-rgn.c: Include sel-sched.h. (ref_counts): New static variable. Use it ... (INSN_REF_COUNT): ... here. Rewrite and move closer to uses. (FED_BY_SPEC_LOAD, IS_LOAD_INSN): Rewrite to use HID accessor macro. (sched_is_disabled_for_current_region_p): Delete static declaration. (struct region): Move to sched-int.h. (nr_regions, rgn_table, rgn_bb_table, block_to_bb, containing_rgn, ebb_head): Define and initialize. (RGN_NR_BLOCKS, RGN_BLOCKS, RGN_DONT_CALC_DEPS, RGN_HAS_REAL_EBB, BLOCK_TO_BB, CONTAINING_RGN, debug_regions, extend_regions, BB_TO_BLOCK, EBB_FIRST_BB, EBB_LAST_BB): Move to sched-int.h. (find_single_block_region): Add new argument to indicate that EBB regions should be constructed. (debug_live): Delete declaration. (current_nr_blocks, current_blocks, target_bb): Remove static qualifiers. (compute_dom_prob_ps, check_live, update_live, set_spec_fed): Delete declaration. (init_regions): Delete declaration. (debug_region, bb_in_region_p, dump_region_dot_file, dump_region_dot, rgn_estimate_number_of_insns): New. (too_large): Use estimate_number_of_insns. (haifa_find_rgns): New. Move the code from ... (find_rgns): ... here. Call either sel_find_rgns or haifa_find_rgns. (free_trg_info): New. (compute_trg_info): Allocate candidate tables here instead of ... (init_ready_list): ... here. (rgn_print_insn): Use const_rtx. (contributes_to_priority, extend_regions): Delete static declaration. (add_remove_insn, fix_recovery_cfg): Add rgn_ to function names. 
(add_block1): Rename to rgn_add_block. (debug_rgn_dependencies): Delete static qualifier. (new_ready): Use sched_deps_info. Simplify. (rgn_common_sched_info, rgn_const_sched_deps_info, rgn_const_sel_sched_deps_info, rgn_sched_deps_info, rgn_sched_info): New. (region_sched_info): Rename to rgn_const_sched_info. (deps_join): New, extracted from ... (propagate_deps): ... here. (compute_block_dependences, debug_dependencies): Update for selective scheduling. (free_rgn_deps, compute_priorities): New functions. (sched_rgn_init, sched_rgn_finish, rgn_setup_region, sched_rgn_compute_dependencies): New functions. (schedule_region): Use them. (sched_rgn_local_init, sched_rgn_local_free, sched_rgn_local_finish, rgn_setup_common_sched_info, rgn_setup_sched_infos): New functions. (schedule_insns): Call new functions that were split out. (rgn_make_new_region_out_of_new_block): New. (rest_of_handle_sched, rest_of_handle_sched2): Call selective scheduling when appropriate. * sched-vis.c: Include insn-attr.h. (print_value, print_pattern): Make global. (print_rtl_slim, debug_bb_slim, debug_bb_n_slim): New functions. * target-def.h (TARGET_SCHED_ADJUST_COST_2, TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT, TARGET_SCHED_GET_INSN_CHECKED_DS, TARGET_SCHED_GET_INSN_SPEC_DS, TARGET_SCHED_SKIP_RTX_P): New target hooks. Initialize them to 0. (TARGET_SCHED_GEN_CHECK): Rename to TARGET_SCHED_GEN_SPEC_CHECK. * target.h (struct gcc_target): Add them. Rename gen_check field to gen_spec_check. * flags.h (sel_sched_switch_set): Declare. * opts.c (sel_sched_switch_set): New variable. (decode_options): Unset flag_sel_sched_pipelining_outer_loops if pipelining is disabled from command line. (common_handle_option): Record whether selective scheduling is requested from command line. * doc/invoke.texi: Document new flags and parameters. * doc/tm.texi: Document new target hooks. 
[-- Attachment #3: sel-sched.diff.gz --] [-- Type: application/gzip, Size: 192076 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev 2008-06-03 14:26 ` Selective scheduling pass - middle end changes [1/1] Andrey Belevantsev 2008-06-03 14:27 ` Selective scheduling pass - scheduler changes [2/3] Andrey Belevantsev @ 2008-06-03 14:28 ` Andrey Belevantsev 2008-08-22 16:04 ` Andrey Belevantsev 2008-06-03 22:03 ` [RFC] Selective scheduling pass Vladimir Makarov ` (2 subsequent siblings) 5 siblings, 1 reply; 28+ messages in thread From: Andrey Belevantsev @ 2008-06-03 14:28 UTC (permalink / raw) To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks [-- Attachment #1: Type: text/plain, Size: 8286 bytes --] Hello, This patch shows the target-dependent changes for the selective scheduler. The majority of the changes are in the config/ia64/ia64.c file. They also include a lot of tunings done throughout the project. Each tuning originated from a problematic test case (usually from SPEC or from the Al Aburto tests) that it fixed. The summary of changes is as follows: o speculation support is improved to allow more patterns to be speculative (speculable1 and speculable2 attributes mark patterns/alternatives that are valid for speculation); o bundling also optimizes for a minimal number of mid-bundle stops; o we lower the priority of memory operations if we have issued too many of them on the current cycle; o default function and loop alignment is set to 64 and 32, respectively; o we discard the cost of memory dependencies that are likely false; o we place a stop bit after every simulated processor cycle; o the incorrect bypass in itanium2.md that resulted in stalls between fma and st insns is removed. Also, to support proper alignment of scheduled loops, we have put pass_compute_alignments after pass_machine_reorg (this part is actually in the middle-end patch, but I mention it here as it was inspired by the Itanium). 
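The "lower the priority of memory operations" tuning can be sketched as a reorder-style heuristic over a ready list. All names and the fixed limit here are invented for illustration; the real implementation lives in ia64_dfa_sched_reorder and tracks mem_ops_in_group per cycle:

```c
#include <assert.h>

/* Simplified ready-list entry: a priority plus a memory-op flag.  */
struct ready_insn { int priority; int is_mem; };

/* A toy version of the heuristic: once the number of memory operations
   issued in the current cycle reaches a limit, stable-partition the
   ready list so that non-memory insns come first.  */
static void lower_mem_priority(struct ready_insn *ready, int n,
                               int mem_ops_in_group, int limit) {
  if (mem_ops_in_group < limit || n > 32)
    return;
  struct ready_insn tmp[32];
  int k = 0;
  for (int i = 0; i < n; i++)           /* non-memory insns first */
    if (!ready[i].is_mem) tmp[k++] = ready[i];
  for (int i = 0; i < n; i++)           /* memory insns demoted */
    if (ready[i].is_mem) tmp[k++] = ready[i];
  for (int i = 0; i < n; i++) ready[i] = tmp[i];
}

/* Three ready insns, highest priority first; the top one is a load.
   With the per-cycle memory budget exhausted, the load drops behind
   both ALU insns.  */
int demo(void) {
  struct ready_insn ready[] = { {9, 1}, {7, 0}, {5, 0} };
  lower_mem_priority(ready, 3, /*mem_ops_in_group=*/4, /*limit=*/4);
  return ready[0].priority * 100 + ready[1].priority * 10
         + ready[2].priority;
}
```

Keeping the partition stable preserves the original priority order within each class, so the heuristic only delays memory operations relative to everything else rather than rescrambling the whole list.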
The rs6000 change is a minimal version needed to support the selective scheduler for a target. As we now can have several points in a region at which we are scheduling, the backend can no longer save the scheduler state in private variables and use it in the hooks (e.g. last_scheduled_insn). For that purpose, a concept of a target context is introduced: all private scheduler-related target info should be put in there, and the target should provide hooks for creating/deleting/setting as current a target context. The scheduler then treats target contexts as opaque pointers. Also, we do not yet support adjust_priority hooks (but the work on this is underway), so that part of the rs6000 scheduler hooks is disabled. OK for trunk? Andrey 2008-06-03 Andrey Belevantsev <abel@ispras.ru> Dmitry Melnik <dm@ispras.ru> Dmitry Zhurikhin <zhur@ispras.ru> Alexander Monakov <amonakov@ispras.ru> Maxim Kuvyrkov <maxim@codesourcery.com> * config/ia64/ia64.c: Include sel-sched.h. Rewrite speculation hooks. (ia64_gen_spec_insn): Removed. (get_spec_check_gen_function, insn_can_be_in_speculative_p, ia64_gen_spec_check): New static functions. (ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_skip_rtx_p): Declare functions. (ia64_needs_block_p): Change prototype. (ia64_gen_check): Rename to ia64_gen_spec_check. (ia64_adjust_cost): Rename to ia64_adjust_cost_2. Add new parameter into declaration, add special memory dependencies handling. (TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT, TARGET_SCHED_GET_INSN_SPEC_DS, TARGET_SCHED_GET_INSN_CHECKED_DS, TARGET_SCHED_SKIP_RTX_P): Define new target hooks. (TARGET_SCHED_GEN_CHECK): Rename to TARGET_SCHED_GEN_SPEC_CHECK. 
(ia64_override_options): Turn on selective scheduling with -O3, disable -fauto-inc-dec. Initialize align_loops and align_functions to 32 and 64, respectively. Set global selective scheduling flags according to target-dependent flags. (rtx_needs_barrier): Support UNSPEC_LDS_A. (group_barrier_needed): Use new mstop-bit-before-check flag. Add heuristic. (dfa_state_size): Make global. (spec_check_no, max_uid): Remove. (mem_ops_in_group, current_cycle): New variables. (ia64_sched_init): Disable checks for !SCHED_GROUP_P after reload. Initialize new variables. (is_load_p, record_memory_reference): New functions. (ia64_dfa_sched_reorder): Lower priority of loads when limit is reached. (ia64_variable_issue): Change use of current_sched_info to sched_deps_info. Update comment. Note if a load or a store is issued. (ia64_first_cycle_multipass_dfa_lookahead_guard_spec): Require a cycle advance if maximal number of loads or stores was issued on current cycle. (scheduled_good_insn): New static helper function. (ia64_dfa_new_cycle): Assert that last_scheduled_insn is set when a group barrier is needed. Fix vertical spacing. Guard the code doing state transition with last_scheduled_insn check. Mark that a stop bit should be before current insn if there was a cycle advance. Update current_cycle and mem_ops_in_group. (ia64_h_i_d_extended): Change use of current_sched_info to sched_deps_info. Reallocate stops_p by larger chunks. (struct _ia64_sched_context): New structure. (ia64_sched_context_t): New typedef. (ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context): New static functions. (gen_func_t): New typedef. (get_spec_load_gen_function): New function. (SPEC_GEN_EXTEND_OFFSET): Declare. (ia64_set_sched_flags): Check common_sched_info instead of *flags. (get_mode_no_for_insn): Change the condition that prevents use of special hardware registers so it can now handle pseudos. (get_spec_unspec_code): New function. 
(ia64_skip_rtx_p, get_insn_spec_code, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_gen_spec_load): New static functions. (ia64_speculate_insn, ia64_needs_block_p): Support branchy checks during selective scheduling. (ia64_speculate_insn): Use ds_get_speculation_types when determining whether we need to change the pattern. (SPEC_GEN_LD_MAP, SPEC_GEN_CHECK_OFFSET): Declare. (ia64_spec_check_src_p): Support new speculation/check codes. (struct bundle_state): New field. (issue_nops_and_insn): Initialize it. (insert_bundle_state): Minimize mid-bundle stop bits. (important_for_bundling_p): New function. (get_next_important_insn): Use important_for_bundling_p. (bundling): When shifting TImode from unimportant insns, ignore also group barriers. Assert that best state is found before the backward bundling pass. Print number of mid-bundle stop bits. Minimize mid-bundle stop bits. Check correct calculation of mid-bundle stop bits. (ia64_sched_finish, final_emit_insn_group_barriers): Fix formatting. (final_emit_insn_group_barriers): Emit stop bits before insns starting a new cycle. (sel2_run): New variable. (ia64_reorg): When flag_selective_scheduling is set, run the selective scheduling pass instead of schedule_ebbs. Adjust for flag_selective_scheduling2. (ia64_optimization_options): Declare new parameter. * config/ia64/ia64.md (speculable1, speculable2): New attributes. (UNSPEC_LDS_A): New UNSPEC. (movqi_internal, movhi_internal, movsi_internal, movdi_internal, movti_internal, movsf_internal, movdf_internal, movxf_internal): Make visible. Add speculable* attributes. (output_c_nc): New mode attribute. (mov<mode>_speculative_a, zero_extend<mode>di2_speculative_a, mov<mode>_nc, zero_extend<mode>di2_nc, advanced_load_check_nc_<mode>): New insns. (zero_extend*): Add speculable* attributes. * config/ia64/ia64.opt (msched_fp_mem_deps_zero_cost): New option. (msched-stop-bits-after-every-cycle): Likewise. (mstop-bit-before-check): Likewise. 
(msched-max-memory-insns, msched-max-memory-insns-hard-limit): Likewise. (msched-spec-verbose, msched-prefer-non-data-spec-insns, msched-prefer-non-control-spec-insns, msched-count-spec-in-critical-path, msel-sched-renaming, msel-sched-substitution, msel-sched-data-spec, msel-sched-control-spec, msel-sched-dont-check-control-spec): Use Target Report Var instead of Common Report Var. * config/ia64/itanium2.md: Remove strange bypass. * config/ia64/t-ia64 (ia64.o): Add dependency on sel-sched.h. * config/rs6000/rs6000.c (rs6000_init_sched_context, rs6000_alloc_sched_context, rs6000_set_sched_context, rs6000_free_sched_context): New functions. (struct _rs6000_sched_context): New. (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective scheduling. (rs6000_sched_finish): Do not run for selective scheduling. [-- Attachment #2: sel-sched-merge-targets.diff.gz --] [-- Type: application/gzip, Size: 21330 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-06-03 14:28 ` Selective scheduling pass - target changes (ia64 & rs6000) [3/3] Andrey Belevantsev @ 2008-08-22 16:04 ` Andrey Belevantsev 2008-08-29 13:41 ` [Ping] [GWP/ia64/rs6000 maintainer needed] " Andrey Belevantsev 2008-09-25 22:39 ` sje 0 siblings, 2 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-08-22 16:04 UTC (permalink / raw) To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks [-- Attachment #1: Type: text/plain, Size: 6719 bytes --] Hello, This is the updated patch which I resend with the other parts. I think the only change is that the flag_selective_scheduling option is turned on in ia64_optimization_options, not in ia64_override_options. This is actually the last part of the selective scheduler patch that did not get reviewed yet. Maybe a global write maintainer and a rs6000 maintainer could have a look? Thanks, Andrey 2008-08-22 Andrey Belevantsev <abel@ispras.ru> Dmitry Melnik <dm@ispras.ru> Dmitry Zhurikhin <zhur@ispras.ru> Alexander Monakov <amonakov@ispras.ru> Maxim Kuvyrkov <maxim@codesourcery.com> * config/ia64/ia64.c: Include sel-sched.h. Rewrite speculation hooks. (ia64_gen_spec_insn): Removed. (get_spec_check_gen_function, insn_can_be_in_speculative_p, ia64_gen_spec_check): New static functions. (ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_skip_rtx_p): Declare functions. (ia64_needs_block_p): Change prototype. (ia64_gen_check): Rename to ia64_gen_spec_check. (ia64_adjust_cost): Rename to ia64_adjust_cost_2. Add new parameter into declaration, add special memory dependencies handling. 
(TARGET_SCHED_ALLOC_SCHED_CONTEXT, TARGET_SCHED_INIT_SCHED_CONTEXT, TARGET_SCHED_SET_SCHED_CONTEXT, TARGET_SCHED_CLEAR_SCHED_CONTEXT, TARGET_SCHED_FREE_SCHED_CONTEXT, TARGET_SCHED_GET_INSN_SPEC_DS, TARGET_SCHED_GET_INSN_CHECKED_DS, TARGET_SCHED_SKIP_RTX_P): Define new target hooks. (TARGET_SCHED_GEN_CHECK): Rename to TARGET_SCHED_GEN_SPEC_CHECK. (ia64_optimization_options): Turn on selective scheduling with -O3, disable -fauto-inc-dec. (ia64_override_options): Initialize align_loops and align_functions to 32 and 64, respectively. Set global selective scheduling flags according to target-dependent flags. (rtx_needs_barrier): Support UNSPEC_LDS_A. (group_barrier_needed): Use new mstop-bit-before-check flag. Add heuristic. (dfa_state_size): Make global. (spec_check_no, max_uid): Remove. (mem_ops_in_group, current_cycle): New variables. (ia64_sched_init): Disable checks for !SCHED_GROUP_P after reload. Initialize new variables. (is_load_p, record_memory_reference): New functions. (ia64_dfa_sched_reorder): Lower priority of loads when limit is reached. (ia64_variable_issue): Change use of current_sched_info to sched_deps_info. Update comment. Note if a load or a store is issued. (ia64_first_cycle_multipass_dfa_lookahead_guard_spec): Require a cycle advance if maximal number of loads or stores was issued on current cycle. (scheduled_good_insn): New static helper function. (ia64_dfa_new_cycle): Assert that last_scheduled_insn is set when a group barrier is needed. Fix vertical spacing. Guard the code doing state transition with last_scheduled_insn check. Mark that a stop bit should be before current insn if there was a cycle advance. Update current_cycle and mem_ops_in_group. (ia64_h_i_d_extended): Change use of current_sched_info to sched_deps_info. Reallocate stops_p by larger chunks. (struct _ia64_sched_context): New structure. (ia64_sched_context_t): New typedef. 
(ia64_alloc_sched_context, ia64_init_sched_context, ia64_set_sched_context, ia64_clear_sched_context, ia64_free_sched_context): New static functions. (gen_func_t): New typedef. (get_spec_load_gen_function): New function. (SPEC_GEN_EXTEND_OFFSET): Declare. (ia64_set_sched_flags): Check common_sched_info instead of *flags. (get_mode_no_for_insn): Change the condition that prevents use of special hardware registers so it can now handle pseudos. (get_spec_unspec_code): New function. (ia64_skip_rtx_p, get_insn_spec_code, ia64_get_insn_spec_ds, ia64_get_insn_checked_ds, ia64_gen_spec_load): New static functions. (ia64_speculate_insn, ia64_needs_block_p): Support branchy checks during selective scheduling. (ia64_speculate_insn): Use ds_get_speculation_types when determining whether we need to change the pattern. (SPEC_GEN_LD_MAP, SPEC_GEN_CHECK_OFFSET): Declare. (ia64_spec_check_src_p): Support new speculation/check codes. (struct bundle_state): New field. (issue_nops_and_insn): Initialize it. (insert_bundle_state): Minimize mid-bundle stop bits. (important_for_bundling_p): New function. (get_next_important_insn): Use important_for_bundling_p. (bundling): When shifting TImode from unimportant insns, ignore also group barriers. Assert that best state is found before the backward bundling pass. Print number of mid-bundle stop bits. Minimize mid-bundle stop bits. Check correct calculation of mid-bundle stop bits. (ia64_sched_finish, final_emit_insn_group_barriers): Fix formatting. (final_emit_insn_group_barriers): Emit stop bits before insns starting a new cycle. (sel2_run): New variable. (ia64_reorg): When flag_selective_scheduling is set, run the selective scheduling pass instead of schedule_ebbs. Adjust for flag_selective_scheduling2. (ia64_optimization_options): Declare new parameter. * config/ia64/ia64.md (speculable1, speculable2): New attributes. (UNSPEC_LDS_A): New UNSPEC. 
(movqi_internal, movhi_internal, movsi_internal, movdi_internal, movti_internal, movsf_internal, movdf_internal, movxf_internal): Make visible. Add speculable* attributes. (output_c_nc): New mode attribute. (mov<mode>_speculative_a, zero_extend<mode>di2_speculative_a, mov<mode>_nc, zero_extend<mode>di2_nc, advanced_load_check_nc_<mode>): New insns. (zero_extend*): Add speculable* attributes. * config/ia64/ia64.opt (msched_fp_mem_deps_zero_cost): New option. (msched-stop-bits-after-every-cycle): Likewise. (mstop-bit-before-check): Likewise. (msched-max-memory-insns, msched-max-memory-insns-hard-limit): Likewise. (msched-spec-verbose, msched-prefer-non-data-spec-insns, msched-prefer-non-control-spec-insns, msched-count-spec-in-critical-path, msel-sched-renaming, msel-sched-substitution, msel-sched-data-spec, msel-sched-control-spec, msel-sched-dont-check-control-spec): Use Target Report Var instead of Common Report Var. * config/ia64/itanium2.md: Remove strange bypass. * config/ia64/t-ia64 (ia64.o): Add dependency on sel-sched.h. * config/rs6000/rs6000.c (rs6000_init_sched_context, rs6000_alloc_sched_context, rs6000_set_sched_context, rs6000_free_sched_context): New functions. (struct _rs6000_sched_context): New. (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective scheduling. (rs6000_sched_finish): Do not run for selective scheduling. [-- Attachment #2: sel-sched-target.diff.gz --] [-- Type: application/gzip, Size: 21401 bytes --] ^ permalink raw reply [flat|nested] 28+ messages in thread
* [Ping] [GWP/ia64/rs6000 maintainer needed] Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-08-22 16:04 ` Andrey Belevantsev @ 2008-08-29 13:41 ` Andrey Belevantsev 2008-08-29 15:01 ` Mark Mitchell 2008-09-25 22:39 ` sje 1 sibling, 1 reply; 28+ messages in thread From: Andrey Belevantsev @ 2008-08-29 13:41 UTC (permalink / raw) To: GCC Patches; +Cc: Jim Wilson, Vladimir Makarov, Ayal Zaks, Mark Mitchell Hello, [CC'ing Mark as both GWP and RM] Andrey Belevantsev wrote:
> This is actually the last part of the selective scheduler patch that did
> not get reviewed yet. Maybe a global write maintainer and a rs6000
> maintainer could have a look?
The target changes of the selective scheduler are still the only unreviewed part. Without this part, the other scheduler reviews will be useless. There are only a few days left before stage1 closes. As Jim doesn't have enough time to look at the patch, and there are no more ia64 maintainers, maybe a global write maintainer will take a look? The last version of the patch can be found at http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01669.html. Andrey ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Ping] [GWP/ia64/rs6000 maintainer needed] Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-08-29 13:41 ` [Ping] [GWP/ia64/rs6000 maintainer needed] " Andrey Belevantsev @ 2008-08-29 15:01 ` Mark Mitchell 0 siblings, 0 replies; 28+ messages in thread From: Mark Mitchell @ 2008-08-29 15:01 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov, Ayal Zaks Andrey Belevantsev wrote: > Hello, > > [CC'ing Mark as both GWP and RM] > > Andrey Belevantsev wrote: >> This is actually the last part of the selective scheduler patch that >> did not get reviewed yet. Maybe a global write maintainer and a >> rs6000 maintainer could have a look? I can try to take a look, but my current priority is Graphite. I am hoping to finish reviewing that today. Thanks, -- Mark Mitchell CodeSourcery mark@codesourcery.com (650) 331-3385 x713 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-08-22 16:04 ` Andrey Belevantsev 2008-08-29 13:41 ` [Ping] [GWP/ia64/rs6000 maintainer needed] " Andrey Belevantsev @ 2008-09-25 22:39 ` sje 2008-09-26 14:57 ` Andrey Belevantsev 1 sibling, 1 reply; 28+ messages in thread From: sje @ 2008-09-25 22:39 UTC (permalink / raw) To: abel; +Cc: gcc-patches, wilson, vmakarov Andrey, I have started looking at the IA64 specific parts of the selective scheduling branch. I still need some more time but I was wondering if you could update it so that it is up-to-date with respect to the main trunk. I tried to apply the patch so I could look at some of the changes with more context and ia64.c would not apply cleanly. Here are a few minor comments from what I have reviewed so far. I didn't include the patch and put the comments inline since the patch is so large and I only had a few comments. There are some places where lines start with spaces instead of tabs even though they are indented enough to use tabs, and a couple of functions (ia64_sched_init and ia64_sched_final) had lines where the only change was from tabs to spaces. In ia64_clear_sched_context we free _sc->prev_cycle_state; I was wondering if we should set it to NULL after freeing it. Or are we going to free _sc right after this so that it doesn't matter? Is the mflag_sched_spec_verbose flag really needed? It looks like all it does is dump output to stderr instead of the normal dump file. In get_mode_no_for_insn, there is a check:
(AR_CCV_REGNUM <= REGNO (reg) && REGNO (reg) <= AR_EC_REGNUM)
I think this should be replaced with AR_REGNO_P (). Steve Ellcey sje@cup.hp.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-09-25 22:39 ` sje @ 2008-09-26 14:57 ` Andrey Belevantsev 2008-10-03 22:22 ` Steve Ellcey 0 siblings, 1 reply; 28+ messages in thread From: Andrey Belevantsev @ 2008-09-26 14:57 UTC (permalink / raw) To: Steve Ellcey; +Cc: gcc-patches, wilson, vmakarov, Alexander Monakov sje@cup.hp.com wrote:
> I have started looking at the IA64 specific parts of the selective
> scheduling branch. I still need some more time but I was wondering if you
> could update it so that it is up-to-date with respect to the main trunk.
> I tried to apply the patch so I could look at some of the changes with
> more context and ia64.c would not apply cleanly.
We (Alexander and myself) just did it, so the current sel-sched branch has the version of the config/ia64/* files that we'd like to see on trunk.
> There are some places where lines start with spaces instead of tabs even
> though they are indented enough to use tabs and a couple of functions
> (ia64_sched_init and ia64_sched_final) had lines where the only change
> was from tabs to spaces.
This is fixed on the branch.
> In ia64_clear_sched_context we free _sc->prev_cycle_state, I was
> wondering if we should set it to NULL after freeing it. Or are
> we going to free _sc right after this so that it doesn't matter?
Sometimes we free it and sometimes we don't, but I agree with you that it would be clearer to set it to NULL. I will prepare a patch and check it in on the branch.
> Is the mflag_sched_spec_verbose flag really needed? It looks like
> all it does is dump output to stderr instead of the normal dump file.
No, it is not needed, but it is also not present on the branch. We have reviewed the other introduced flags. We will remove the -msel-sched* flags related to speculation, as we can use the existing flags, and we will remove the mstop-bit-before-check flag, as it doesn't improve performance on average.
> In get_mode_no_for_insn, there is a check:
>
> (AR_CCV_REGNUM <= REGNO (reg) && REGNO (reg) <= AR_EC_REGNUM)
>
> I think this should be replaced with AR_REGNO_P ().
Fixed on the branch. Thanks a lot for your efforts! Andrey ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-09-26 14:57 ` Andrey Belevantsev @ 2008-10-03 22:22 ` Steve Ellcey 2008-10-06 17:26 ` Andrey Belevantsev 0 siblings, 1 reply; 28+ messages in thread From: Steve Ellcey @ 2008-10-03 22:22 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: gcc-patches, wilson, vmakarov, Alexander Monakov On Fri, 2008-09-26 at 17:05 +0400, Andrey Belevantsev wrote: > sje@cup.hp.com wrote: > > I have started looking at the IA64 specific parts of the selective > > scheduling branch. I still need some more time but I was wondering if you > > could update it so that it is up-to-date with respect to the main trunk. > > I tried to apply the patch so I could look at some of the changes with > > more context and ia64.c would not apply cleanly. > We (Alexander and myself) just did it, so current sel-sched branch has > the version of config/ia64/* files that we'd like to see on trunk. > > Andrey Andrey, I have reviewed the changes on the sel-sched-branch and approve the IA64 specific changes. I noticed there were a few non-IA64 changes on the branch and obviously I can't approve those but the IA64 changes look OK. Steve Ellcey sje@cup.hp.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3] 2008-10-03 22:22 ` Steve Ellcey @ 2008-10-06 17:26 ` Andrey Belevantsev 0 siblings, 0 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-10-06 17:26 UTC (permalink / raw) To: sje; +Cc: gcc-patches, wilson, vmakarov, Alexander Monakov Steve Ellcey wrote:
> I have reviewed the changes on the sel-sched-branch and approve the IA64
> specific changes. I noticed there were a few non-IA64 changes on the branch
> and obviously I can't approve those but the IA64 changes look OK.
Thanks for the review! I will retest the IA64 changes over the next few days and commit. The non-IA64 changes should be just the difference in the scheduling hooks due to the changes in how we handle speculation. Vlad had already approved those with the main part of the sel-sched patch; I just had to revert those pieces when committing the main part without the IA64 part. Of course, if there are other changes, such as bugfixes not yet approved for mainline, I will not commit them. Thanks again, Andrey ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass 2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev ` (2 preceding siblings ...) 2008-06-03 14:28 ` Selective scheduling pass - target changes (ia64 & rs6000) [3/3] Andrey Belevantsev @ 2008-06-03 22:03 ` Vladimir Makarov 2008-06-04 16:55 ` Mark Mitchell 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培) 5 siblings, 0 replies; 28+ messages in thread From: Vladimir Makarov @ 2008-06-03 22:03 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson Andrey Belevantsev wrote:
> Hello,
>
> The patches in this thread introduce selective scheduler in GCC,
> implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander
> Monakov, and Maxim Kuvyrkov while he was at ISP RAS. Selective
> scheduler is aimed at scheduling eager targets such as ia64, power6,
> and cell. The implementation contains both the scheduler and the
> software pipeliner, which can be used on loops with control flow not
> handled by SMS. The scheduler can work either before or after
> register allocation, but it is currently tuned to work after.
>
> The scheduler was bootstrapped and tested on ia64, with all default
> languages, both as a first and as a second scheduler. It was also
> bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk
> and sel-sched branch show 3.8% speedup on average, SPEC INT shows both
> small speedups and regressions, staying around neutral in average:
>
Congratulations! I have followed the project for a long time. Finally a useful milestone has been achieved, and you have got a pretty big improvement. The scheduling algorithm is superior to what we had because it permits improving insn schedules on all execution paths by insn cloning and other transformations.
> On power6, Revital Eres saw speedups on several tests; additional
> tuning is required to get good results there, which is complicated
> because we don't have power6.
> On cell, there was some third-party testing in 2007, showing 4-6% speedups, but I don't have more detailed information.
>
> Compile time slowdown measured with --enable-checking=assert is quite
> significant -- about 12% on spec int and about 18% on spec fp and
> cc1-i-files collection. For this reason, we have enabled selective
> scheduler by default at -O3 on ia64 and disabled by default on other
> targets.
Itanium is a pretty specific target. It would be interesting to know how big the slowdown is for ppc.
> Our current plan is to work on further compile time improvements and
> performance tuning for ppc and cell, hopefully with the help of IBM
> Haifa folks. If we will complete this work before the end of stage2,
> then we can enable selective scheduling at -O3 also for ppc in 4.4.
> In the mid-term, we will work on removing the ebb scheduler, as it is
> now used on ia64 only and will be superseded by selective scheduler
> when we'll further improve compile time.
I think we should finally get rid of the EBB scheduler. You could try to improve the compile-time problem by preventing some transformations in the new scheduler in -O2 mode. If you solve the compile-time problem, I think we should work on removing the Haifa scheduler too, to have just one insn scheduler. But as I understand it, that will not happen soon. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass 2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev ` (3 preceding siblings ...) 2008-06-03 22:03 ` [RFC] Selective scheduling pass Vladimir Makarov @ 2008-06-04 16:55 ` Mark Mitchell 2008-06-04 20:50 ` Andrey Belevantsev 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培) 5 siblings, 1 reply; 28+ messages in thread From: Mark Mitchell @ 2008-06-04 16:55 UTC (permalink / raw) To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov Andrey Belevantsev wrote: > The patches in this thread introduce selective scheduler in GCC, > implemented by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander > Monakov, and Maxim Kuvyrkov while he was at ISP RAS. Selective > scheduler is aimed at scheduling eager targets such as ia64, power6, and > cell. The implementation contains both the scheduler and the software > pipeliner, which can be used on loops with control flow not handled by > SMS. The scheduler can work either before or after register allocation, > but it is currently tuned to work after. > > On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk > and sel-sched branch show 3.8% speedup on average, SPEC INT shows both > small speedups and regressions, staying around neutral in average: That's a very good result. Congratulations! I know that this scheduler is aimed at CPUs like the ones you mention above. However, would it function correctly on other CPUs with more "traditional" characteristics, like older ARM, MIPS, or x86 cores? And, would it be reasonably possible to tune it for those CPUs as well? As with the IRA allocator, I'd like to avoid having multiple schedulers in GCC. (I know we've done that for a while, but I still think it's undesirable.) So, I'd like to see if we can get this to work well across all of the Primary and Secondary CPUs, and then just make it "the GCC scheduler" rather than an optional thing enabled at some optimization levels on some CPUs. Do you think that's feasible? 
Or do you think that there are inherent aspects of the algorithm that mean that we need to have this new scheduler for one class of CPUs and the old scheduler for the other class? Is there any way to make the new scheduler do a reasonable job with the existing descriptions in GCC, so that port maintainers can tune later, or is a level of effort like that for Itanium required?
> Compile time slowdown measured with --enable-checking=assert is quite
> significant -- about 12% on spec int and about 18% on spec fp and
> cc1-i-files collection. For this reason, we have enabled selective
> scheduler by default at -O3 on ia64 and disabled by default on other
> targets.
Do you understand what's causing the compile-time slowdown? Thanks, -- Mark Mitchell CodeSourcery mark@codesourcery.com (650) 331-3385 x713 ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass 2008-06-04 16:55 ` Mark Mitchell @ 2008-06-04 20:50 ` Andrey Belevantsev 0 siblings, 0 replies; 28+ messages in thread From: Andrey Belevantsev @ 2008-06-04 20:50 UTC (permalink / raw) To: Mark Mitchell; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov Mark Mitchell wrote:
> That's a very good result. Congratulations!
Thank you!
> I know that this scheduler is aimed at CPUs like the ones you mention
> above. However, would it function correctly on other CPUs with more
> "traditional" characteristics, like older ARM, MIPS, or x86 cores? And,
> would it be reasonably possible to tune it for those CPUs as well?
When a target doesn't do anything "fancy" in scheduler hooks, everything should just work (modulo bugs, of course; we've tried only ppc64 and x86-64). In case a target saves some information describing the scheduler's state, simple hooks manipulating this data should be implemented, as we did for the rs6000 port.
> As with the IRA allocator, I'd like to avoid having multiple schedulers
> in GCC. (I know we've done that for a while, but I still think it's
> undesirable.) So, I'd like to see if we can get this to work well
> across all of the Primary and Secondary CPUs, and then just make it "the
> GCC scheduler" rather than an optional thing enabled at some
> optimization levels on some CPUs.
This is our goal as well, and I think it can be done incrementally. We are now working on the ppc performance. Then we need to tune the scheduler so that for traditional targets it is no worse in performance and the slowdown is reasonable, e.g. by disabling pipelining and decreasing the scheduling window. The last thing to do is to speed up the implementation so that for scheduling-eager targets with pipelining enabled the slowdown will be acceptable for -O2.
Note that the selective scheduler does not subsume SMS, but complements it, because SMS does a better job for countable loops, but cannot handle loops with control flow or with an unknown number of iterations. So in any case there will be two schedulers.
> Do you think that's feasible? Or do you think that there are inherent
> aspects of the algorithm that mean that we need to have this new
> scheduler for one class of CPUs and the old scheduler for the other
> class? Is there any way to make the new scheduler do a reasonable job
> with the existing descriptions in GCC, so that port maintainers can tune
> later, or is a level of effort like that for Itanium required?
The ia64 backend is very complex, and we put a lot of effort into tuning it by itself -- you can see it in my other mail about the target changes. So I think that tuning for other targets will be simpler. The cell results I mentioned in the mail were received from a guy who did the tuning internally at Samsung, and AFAIR he didn't mention any target-independent changes he had to do; basically he just made it work.
>> Compile time slowdown measured with --enable-checking=assert is quite
>> significant -- about 12% on spec int and about 18% on spec fp and
>> cc1-i-files collection. For this reason, we have enabled selective
>> scheduler by default at -O3 on ia64 and disabled by default on other
>> targets.
>
> Do you understand what's causing the compile-time slowdown?
The part that takes the most time is the update of availability sets, as this is the central part of the algorithm. Renaming is quite expensive too, but we have tackled this by limiting it to only a few insns with the largest priority. To make the updates faster, you need to build the data dependence graph and keep it up to date while scheduling. Unfortunately, we didn't manage to do this during this project.
The first step towards this goal will be to make the dependence graph classify the dependencies it builds -- control/data, lhs/rhs, register/memory, etc. Then we can devise the mechanism for updating the graph, which would not be trivial -- e.g. when an insn gets renamed, we introduce a register-register copy which can generate completely new register dependencies that cannot be derived from the existing ones. Such a project is likely to make it to trunk in the next release cycle, and that would correspond to the last step of the incremental approach outlined above.

Yours,
Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass
  2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev
  ` (4 preceding siblings ...)
  2008-06-04 16:55 ` Mark Mitchell
@ 2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
  2008-06-05 13:49 ` Andrey Belevantsev
  5 siblings, 1 reply; 28+ messages in thread
From: Seongbae Park (박성배, 朴成培) @ 2008-06-05 3:45 UTC (permalink / raw)
  To: Andrey Belevantsev; +Cc: GCC Patches, Jim Wilson, Vladimir Makarov

On Tue, Jun 3, 2008 at 7:16 AM, Andrey Belevantsev <abel@ispras.ru> wrote:
> Hello,
>
> The patches in this thread introduce selective scheduler in GCC, implemented
> by myself, Dmitry Melnik, Dmitry Zhurikhin, Alexander Monakov, and Maxim
> Kuvyrkov while he was at ISP RAS. Selective scheduler is aimed at
> scheduling eager targets such as ia64, power6, and cell. The implementation
> contains both the scheduler and the software pipeliner, which can be used on
> loops with control flow not handled by SMS. The scheduler can work either
> before or after register allocation, but it is currently tuned to work
> after.
>
> The scheduler was bootstrapped and tested on ia64, with all default
> languages, both as a first and as a second scheduler. It was also
> bootstrapped with c, c++, and fortran enabled on ppc64 and x86-64.
>
> On ia64, test results on SPEC2k FP comparing -O3 -ffast-math on trunk and
> sel-sched branch show 3.8% speedup on average, SPEC INT shows both small
> speedups and regressions, staying around neutral in average:
>
> 168.wupwise 513 552 7,60%
> 171.swim 757 772 1,98%
> 172.mgrid 570 643 12,81%
> 173.applu 503 524 4,17%
> 177.mesa 796 795 -0,13%
> 178.galgel 814 787 -3,32%
> 179.art 1990 2098 5,43%
> 183.equake 513 569 10,92%
> 187.facerec 958 991 3,44%
> 188.ammp 765 775 1,31%
> 189.lucas 860 869 1,05%
> 191.fma3d 549 536 -2,37%
> 200.sixtrack 300 323 7,67%
> 301.apsi 522 546 4,60%
> Geomean 673,97 699,87 3,84%
>
> 164.gzip 683 682 -0,15%
> 175.vpr 814 802 -1,47%
> 176.gcc 1080 1069 -1,02%
> 181.mcf 701 708 1,00%
> 186.crafty 872 855 -1,95%
> 197.parser 729 728 -0,14%
> 252.eon 793 785 -1,01%
> 253.perlbmk 824 839 1,82%
> 254.gap 558 569 1,97%
> 255.vortex 1012 966 -4,55%
> 256.bzip2 758 762 0,53%
> 300.twolf 1005 1015 1,00%
> Geomean 806,04 803,25 -0,35%

Presumably this is with any profile feedback ?
If so, numbers look ok.

Have you tried it with profile feedback ?
Selective scheduling (and most other aggressive global scheduling algorithms)
can benefit quite a bit from profile feedback,
and tuning can be quite different for with and without profile feedback.

Seongbae

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Selective scheduling pass
  2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
@ 2008-06-05 13:49 ` Andrey Belevantsev
  0 siblings, 0 replies; 28+ messages in thread
From: Andrey Belevantsev @ 2008-06-05 13:49 UTC (permalink / raw)
  To: "Seongbae Park (박성배, 朴成培)"
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov

Seongbae Park (박성배, 朴成培) wrote:
> Presumably this is with any profile feedback ?
> If so, numbers look ok.

You probably mean that the numbers are without profile feedback. This is true.

> Have you tried it with profile feedback ?
> Selective scheduling (and most other aggressive global scheduling algorithms)
> can benefit quite a bit from profile feedback,
> and tuning can be quite different for with and without profile feedback.

No, we haven't tried that. I got the impression that profile-based optimizations are not of great importance to GCC developers, so we focused on tuning without profile feedback. Nevertheless, we'll try SPEC with profile feedback tonight. I will be happy to discuss how the scheduler can be tuned to use the profiling information -- will you attend the summit, btw?

Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3]
@ 2008-08-29 15:10 David Edelsohn
  2008-08-31 13:35 ` Andrey Belevantsev
  0 siblings, 1 reply; 28+ messages in thread
From: David Edelsohn @ 2008-08-29 15:10 UTC (permalink / raw)
  To: Andrey Belevantsev
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov, Ayal Zaks, Mark Mitchell

        * config/rs6000/rs6000.c (rs6000_init_sched_context,
        rs6000_alloc_sched_context, rs6000_set_sched_context,
        rs6000_free_sched_context): New functions.
        (struct _rs6000_sched_context): New.
        (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective
        scheduling.
        (rs6000_sched_finish): Do not run for selective scheduling.

The rs6000 part of the patch is okay with a modification to the following chunk:

*************** rs6000_sched_finish (FILE *dump, int sch
*** 20085,20091 ****
    if (reload_completed && rs6000_sched_groups)
      {
!       if (rs6000_sched_insert_nops == sched_finish_none)
          return;

        if (rs6000_sched_insert_nops == sched_finish_pad_groups)
--- 20103,20110 ----
    if (reload_completed && rs6000_sched_groups)
      {
!       if (rs6000_sched_insert_nops == sched_finish_none
!           || sel_sched_p ())
          return;

        if (rs6000_sched_insert_nops == sched_finish_pad_groups)

Please change this to a separate test for clarity:

+      /* Do not run sched_finish hook when selective scheduling enabled.  */
+      if (sel_sched_p ())
+        return;
+
       if (rs6000_sched_insert_nops == sched_finish_none)
         return;

instead of combining the tests.

Also, target maintainers have flexibility during stage 3 with respect to changes local to a port, so the Itanium changes can be approved and committed during stage 3, although earlier would be better.

Thanks, David

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Selective scheduling pass - target changes (ia64 & rs6000) [3/3]
  2008-08-29 15:10 Selective scheduling pass - target changes (ia64 & rs6000) [3/3] David Edelsohn
@ 2008-08-31 13:35 ` Andrey Belevantsev
  0 siblings, 0 replies; 28+ messages in thread
From: Andrey Belevantsev @ 2008-08-31 13:35 UTC (permalink / raw)
  To: David Edelsohn
  Cc: GCC Patches, Jim Wilson, Vladimir Makarov, Ayal Zaks, Mark Mitchell

Hello,

David Edelsohn wrote:
> * config/rs6000/rs6000.c (rs6000_init_sched_context,
> rs6000_alloc_sched_context, rs6000_set_sched_context,
> rs6000_free_sched_context): New functions.
> (struct _rs6000_sched_context): New.
> (rs6000_sched_reorder2): Do not modify INSN_PRIORITY for selective
> scheduling.
> (rs6000_sched_finish): Do not run for selective scheduling.
>
> The rs6000 part of the patch is okay with a modification to the following chunk:

Thanks for the review, I'll fix that up.

> Also, target maintainers have flexibility during stage 3 with respect
> to changes local to a port, so the Itanium changes can be approved and
> committed during stage 3, although earlier would be better.

I didn't know that. But what would be the preferred policy for checking in the scheduler in this case? Should I wait for the Itanium changes to be reviewed and then commit the whole patch, which means that the target-independent changes would be committed during stage 3? Or should I check in the scheduler without the ia64 changes now, which means it will be non-functional on Itanium, and commit to reverting it in case the ia64 changes are not reviewed even during stage 3? I'd appreciate advice on how to proceed.

Thanks, Andrey

^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2008-10-06 17:06 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-06-03 14:24 [RFC] Selective scheduling pass Andrey Belevantsev
2008-06-03 14:26 ` Selective scheduling pass - middle end changes [1/1] Andrey Belevantsev
2008-06-11 1:04   ` Ian Lance Taylor
2008-06-11 13:40     ` Andrey Belevantsev
2008-06-11 14:30       ` Ian Lance Taylor
2008-06-27 13:10         ` Andrey Belevantsev
2008-06-30 16:16           ` Ian Lance Taylor
2008-07-08 14:54             ` Andrey Belevantsev
2008-07-08 15:29               ` Ian Lance Taylor
2008-08-22 15:55                 ` Andrey Belevantsev
2008-06-03 14:27 ` Selective scheduling pass - scheduler changes [2/3] Andrey Belevantsev
2008-06-03 22:03   ` Vladimir Makarov
2008-08-22 15:52     ` Andrey Belevantsev
2008-06-03 14:28 ` Selective scheduling pass - target changes (ia64 & rs6000) [3/3] Andrey Belevantsev
2008-08-22 16:04   ` Andrey Belevantsev
2008-08-29 13:41     ` [Ping] [GWP/ia64/rs6000 maintainer needed] Andrey Belevantsev
2008-08-29 15:01       ` Mark Mitchell
2008-09-25 22:39         ` sje
2008-09-26 14:57           ` Andrey Belevantsev
2008-10-03 22:22             ` Steve Ellcey
2008-10-06 17:26               ` Andrey Belevantsev
2008-06-03 22:03 ` [RFC] Selective scheduling pass Vladimir Makarov
2008-06-04 16:55 ` Mark Mitchell
2008-06-04 20:50   ` Andrey Belevantsev
2008-06-05 3:45 ` Seongbae Park (박성배, 朴成培)
2008-06-05 13:49   ` Andrey Belevantsev
2008-08-29 15:10 Selective scheduling pass - target changes (ia64 & rs6000) [3/3] David Edelsohn
2008-08-31 13:35 ` Andrey Belevantsev