Subject: [PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something
From: Richard Guenther @ 2009-07-14 16:27 UTC
  To: gcc-patches; +Cc: Daniel Berlin


This tries to get rid of the sledge-hammer that disables PRE completely
if the current function is not optimized for speed (which is always the
case if optimize_size is set ...).

The idea is to still allow the regular and the phi-translation-triggered
full redundancy elimination even on paths that are optimized for size.
Thus, we limit ourselves to performing insertions only when they remove
a full redundancy on a path in the CFG that we want to optimize for speed.

The effect of this patch is that PRE is now enabled at -Os but performs
only full redundancy elimination (as if PRE ran but never inserted
anything).  This should reduce code size and remove the odd behavior
that a value-numbering (VN) run before the loop optimizations is missing
at -Os.
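
For example, in the (hypothetical) function below a + b is only partially
redundant at the return: removing the redundancy would require inserting
a + b on the path where c is false.  With this patch that insertion is
skipped when the path is optimized for size, while expressions that are
fully redundant (available on every incoming path) are still removed:

/* Hypothetical illustration, not part of the patch.  At -O2, PRE can
   insert a + b on the path where c is false so that the a + b in the
   return becomes fully redundant; at -Os, with this patch, no insertion
   is done because it would grow code on a size-optimized path.  */
int
f (int a, int b, int c)
{
  int x = 0;
  if (c)
    x = a + b;           /* a + b is computed only on this path ...  */
  return x + (a + b);    /* ... so it is only partially redundant here.  */
}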

I wonder whether we want to specialize the case where the value is
available in all predecessors but is not the same value in each of them
(so we would only need to insert a PHI node).  For blocks with a low
indegree this might result in smaller code as well.
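
A (hypothetical) example of that case: x + 1 translates to a + 1 on one
incoming edge and b + 1 on the other, and both values are already computed
in the predecessors, so only a PHI node merging the existing names would
be needed:

/* Hypothetical illustration, not part of the patch.  x + 1 is available
   in every predecessor of the return block (as a + 1 on one edge, b + 1
   on the other), but not as the same value, so making it fully redundant
   would require inserting only a PHI node and no new computation on any
   edge.  */
int
g (int a, int b, int c, int *p)
{
  int x, y;
  if (c)
    {
      x = a;
      y = a + 1;
    }
  else
    {
      x = b;
      y = b + 1;
    }
  *p = y;
  return x + 1;
}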

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2009-07-14  Richard Guenther  <rguenther@suse.de>

	* tree-ssa-pre.c (do_regular_insertion): Only insert if a
	redundancy along a path in the CFG we want to optimize for speed
	is going to be removed.
	(execute_pre): Do partial-PRE only if the function is to be
	optimized for speed.
	(gate_pre): Do not turn off all of PRE when not optimizing a
	function for speed.

Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c	(revision 149626)
--- gcc/tree-ssa-pre.c	(working copy)
*************** do_regular_insertion (basic_block block,
*** 3352,3357 ****
--- 3352,3358 ----
  	  pre_expr eprime = NULL;
  	  edge_iterator ei;
  	  pre_expr edoubleprime = NULL;
+ 	  bool do_insertion = false;
  
  	  val = get_expr_value_id (expr);
  	  if (bitmap_set_contains_value (PHI_GEN (block), val))
*************** do_regular_insertion (basic_block block,
*** 3403,3408 ****
--- 3404,3413 ----
  		{
  		  avail[bprime->index] = edoubleprime;
  		  by_some = true;
+ 		  /* We want to perform insertions to remove a redundancy on
+ 		     a path in the CFG we want to optimize for speed.  */
+ 		  if (optimize_edge_for_speed_p (pred))
+ 		    do_insertion = true;
  		  if (first_s == NULL)
  		    first_s = edoubleprime;
  		  else if (!pre_expr_eq (first_s, edoubleprime))
*************** do_regular_insertion (basic_block block,
*** 3413,3419 ****
  	     already existing along every predecessor, and
  	     it's defined by some predecessor, it is
  	     partially redundant.  */
! 	  if (!cant_insert && !all_same && by_some && dbg_cnt (treepre_insert))
  	    {
  	      if (insert_into_preds_of_block (block, get_expression_id (expr),
  					      avail))
--- 3418,3425 ----
  	     already existing along every predecessor, and
  	     it's defined by some predecessor, it is
  	     partially redundant.  */
! 	  if (!cant_insert && !all_same && by_some && do_insertion
! 	      && dbg_cnt (treepre_insert))
  	    {
  	      if (insert_into_preds_of_block (block, get_expression_id (expr),
  					      avail))
*************** fini_pre (bool do_fre)
*** 4475,4485 ****
     only wants to do full redundancy elimination.  */
  
  static unsigned int
! execute_pre (bool do_fre ATTRIBUTE_UNUSED)
  {
    unsigned int todo = 0;
  
!   do_partial_partial = optimize > 2;
  
    /* This has to happen before SCCVN runs because
       loop_optimizer_init may create new phis, etc.  */
--- 4481,4491 ----
     only wants to do full redundancy elimination.  */
  
  static unsigned int
! execute_pre (bool do_fre)
  {
    unsigned int todo = 0;
  
!   do_partial_partial = optimize > 2 && optimize_function_for_speed_p (cfun);
  
    /* This has to happen before SCCVN runs because
       loop_optimizer_init may create new phis, etc.  */
*************** do_pre (void)
*** 4563,4570 ****
  static bool
  gate_pre (void)
  {
!   /* PRE tends to generate bigger code.  */
!   return flag_tree_pre != 0 && optimize_function_for_speed_p (cfun);
  }
  
  struct gimple_opt_pass pass_pre =
--- 4569,4575 ----
  static bool
  gate_pre (void)
  {
!   return flag_tree_pre != 0;
  }
  
  struct gimple_opt_pass pass_pre =

Subject: Re: [PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something
From: Richard Guenther @ 2009-10-21 14:49 UTC
  To: gcc-patches

On Tue, 14 Jul 2009, Richard Guenther wrote:

> 
> This tries to get rid of the sledge-hammer that disables PRE completely
> if the current function is not optimized for speed (which is always the
> case if optimize_size is set ...).
> 
> The idea is to still allow the regular and the phi-translation-triggered
> full redundancy elimination even on paths that are optimized for size.
> Thus, we limit ourselves to performing insertions only when they remove
> a full redundancy on a path in the CFG that we want to optimize for speed.
> 
> The effect of this patch is that PRE is now enabled at -Os but performs
> only full redundancy elimination (as if PRE ran but never inserted
> anything).  This should reduce code size and remove the odd behavior
> that a value-numbering (VN) run before the loop optimizations is missing
> at -Os.
> 
> I wonder whether we want to specialize the case where the value is
> available in all predecessors but is not the same value in each of them
> (so we would only need to insert a PHI node).  For blocks with a low
> indegree this might result in smaller code as well.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Any comments?

No public comments yet - Steven said it clashes with his -Os PRE
patches, but I don't see those going in during stage3.  This patch now
also addresses PR41778, so I added that testcase, re-bootstrapped
and tested the patch and threw it on one of our x86_64 SPEC 2006 testers.
If that doesn't turn up something odd I'll apply the patch tomorrow.

Thanks,
Richard.


-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
