public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [tcb] Improve SSA-CCP to work with aliased stores, structures and arrays
@ 2004-10-15  4:55 Diego Novillo
  2004-10-15 19:22 ` Daniel Berlin
  0 siblings, 1 reply; 2+ messages in thread
From: Diego Novillo @ 2004-10-15  4:55 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7175 bytes --]


Given

----------------------------------------------------------------------
struct A a;
const int B = 42;

void foo (int i)
{
  if (i > 10)
    a.a = 42;
  else
    {
      a.b = 21;
      a.a = a.b + 21;
    }

  if (a.a != B)
    link_error ();
}
----------------------------------------------------------------------

GCC will not fold 'a.a != B' into '0'.

While CCP has all the logic to propagate the constants in this case, it
is missing the ability to associate constants with stores and loads
(i.e., pointer dereferences, structures and global/aliased variables). 
We don't keep loads and stores in SSA, but we do build a factored
use-def web for them (in the virtual operands).

Essentially, the patch extends Brian Booth's work to propagate constants
in global variables.  It associates constant values with the SSA names
in the V_MAY_DEF/V_MUST_DEF operands.  However, this is not enough,
because we glob partial loads/stores with the base symbol:

	# a_5 = V_MAY_DEF <a_4>
	a.a = 2;

	# VUSE <a_5>
	x_3 = a.b;

In the example above, CCP will associate value '2' with 'a_5', but it
would be wrong to replace the load from 'a.b' with '2', because '2' had
been stored into a.a.  So, we also keep track of the memory reference
expression where the store was done.  This may not be necessary when we
start representing partial loads and stores (Dan?).

We also can't rely on looking up the store by checking the defining
statement of the SSA name, because we also need to propagate through
PHIs.

Store CCP is not too expensive, but it does have to do more work than
regular CCP, so I only enabled it at -O2 and parameterized CCP with a
flag to only run store CCP separately.  In particular, it's not
necessary to run store CCP before scalarization.  After timing various
alternatives, I settled on 3 CCP runs: one standard CCP early, one store
CCP after SRA, and another standard CCP near the end.

The patch also rearranges calls to FRE and copy-prop, and removes the
loop around the dominator pass.  In a test of cc1-i-files and PR8361, we
propagate 35% more constants than in mainline.  The timings aren't
significantly different.  Here's a summary of the major differences on
cc1-i-files:

Phase                     	Before	After	% change
 tree copy propagation        	 0.00	 7.36	    N/A
 tree PTA                     	 2.10	 1.76	 -16.2%
 tree alias analysis          	 2.01	 1.96	  -2.5%
 tree PHI insertion           	 4.15	 3.85	  -7.2%
 tree SSA rewrite             	 5.47	 5.04	  -7.9%
 tree SSA other               	23.25	20.37	 -12.4%
 tree operand scan            	19.43	21.10	   8.6%
 dominator optimization       	32.78	16.56	 -49.5%
 tree SRA                     	 0.33	 0.28	 -15.2%
 tree STORE-CCP               	 0.00	 2.28	   N/A
 tree CCP                     	 2.00	 4.42	 121.0%
 tree split crit edges        	 3.17	 3.03	  -4.4%
 tree FRE                     	 0.00	13.89	    N/A
 tree PRE                     	 5.50	 6.68	  21.5%
 tree conservative DCE        	 2.36	 4.64	  96.6%
 tree aggressive DCE          	 1.22	 1.55	  27.0%
 expand                       	25.30	20.25	 -20.0%
 varconst                     	 2.85	 2.65	  -7.0%
 jump                         	 1.50	 1.38	  -8.0%
 CSE                          	44.73	41.72	  -6.7%
 bypass jumps                 	 5.02	 4.28	 -14.7%
 CSE 2                        	28.61	26.40	  -7.7%
 branch prediction            	 5.16	 4.91	  -4.8%
 local alloc                  	 8.10	 7.65	  -5.6%
 global alloc                 	23.31	23.00	  -1.3%
 reload CSE regs              	11.21	10.22	  -8.8%
 flow 2                       	 2.55	 2.19	 -14.1%
Total tree passes           	143.28	157.10	   9.6%
Total compile time             	751.50	749.87	  -0.2%

So the net effect is positive.  We spend more time in the tree passes,
but the total compile time is somewhat better.  I expect to keep
improving these times.

Bootstrapped and tested x86, x86-64 and ppc.


Diego.

2004-10-14  Diego Novillo  <dnovillo@redhat.com>

	Constant propagation of stores and loads.

	* common.opt (ftree-store-ccp): New flag.
	* opts.c (decode_options): Enable STORE-CCP at -O2.
	* timevar.def (TV_TREE_STORE_CCP): New timer.
	* tree-flow.h (struct prop_value_d): Declare.
	(unmodifiable_var_p): Declare.
	* tree-optimize.c (init_tree_optimization_passes): Re-arrange
	CCP, FRE and copy-prop passes.
	Add call to pass_store_ccp.
	* tree-pass.h (pass_store_ccp): Declare.
	* tree-ssa-ccp.c (value): Remove.  Update all users.
	(const_val): Declare.
	(value_vector): Remove.  Update all users.
	(ccp_lattice_t): Rename from latticevalue.  Update all users.
	(do_store_ccp): Declare.
	(debug_lattice_value): New function.
	(get_default_value): Re-write.
	(set_lattice_value): Do not allow lattice transitions to lower
	lattice values (with the exception of CONSTANT->UNKNOWN_VAL).
	Return false on UNINITIALIZED->UNDEFINED transitions.
	(def_to_varying): Remove.
	(likely_value): Re-write.
	(need_imm_uses_for): Call likely_value if VAR is defined by an
	assignment.
	(ccp_initialize): Remove variable 'is_may_def'.
	(struct prop_stats_d): Declare.
	(prop_stats): Declare.
	(replace_uses_in): Count number of constants and copies
	propagated.
	(replace_vuse_in): Likewise.
	(replace_phi_args_in): Factor out from ...
	(substitute_and_fold): ... here.
	Initialize prop_stats.
	Always remove EH edges if the statement no longer has EH
	information.
	(ccp_lattice_meet): Re-write.  Add documentation for
	UNKNOWN_VAL.
	(ccp_visit_phi_node): Do not handle UNKNOWN_VAL if
	do_store_ccp is false.
	(ccp_fold): Do not handle loads if do_store_ccp is false.
	(visit_assignment): Re-write.
	(ccp_visit_stmt): Visit all assignments.
	(execute_ssa_ccp): Add argument 'store_ccp'.
	(do_ssa_ccp): New function.
	(do_ssa_store_ccp): New function.
	(gate_store_ccp): Declare.
	* tree-ssa-copy.c (copy_of): Declare.
	(do_store_copy_prop): Declare.
	(stmt_may_generate_copy): New function.
	(need_imm_uses_for): Call it.
	(get_first_copy_of): Use copy_of instead of SSA_NAME_VALUE.
	(get_last_copy_of): Likewise.
	(set_first_copy_of): Likewise.
	(dump_copy_of): Likewise.
	(copy_prop_visit_assignment): Likewise.
	(copy_prop_visit_cond_stmt): Likewise.
	(copy_prop_visit_stmt): Likewise.
	(init_copy_prop): Likewise.
	(fini_copy_prop): Likewise.
	* tree-ssa-dom.c (struct opt_stats_d): Add fields
	num_const_prop and num_copy_prop.
	(tree_ssa_dominator_optimize): Do not execute more than once.
	(dump_dominator_optimization_stats): Show constants and copies
	propagated.
	(cprop_into_successor_phis): Do not replace if NEW == ORIG.
	(cprop_operand):  Do not replace if OP == VAL.
	Do not call get_virtual_var when VAL is a GIMPLE register.
	Keep statistics on propagated copies and constants.
	* tree-ssa-propagate.c (first_vuse): New function.
	(first_vdef): New function.
	(stmt_makes_single_load): New function.
	(stmt_makes_single_store): New function.
	* tree-ssa-propagate.h (first_vuse): Declare.
	(first_vdef): Declare.
	(stmt_makes_single_load): Declare.
	(stmt_makes_single_store): Declare.
	* tree-ssa.c (verify_ssa): Call verify_stmts.
	* doc/invoke.texi (-fdump-tree-storeccp): Document.
	(-ftree-store-ccp): Document.

[-- Attachment #2: 20041014-store-ccp.diff.gz --]
[-- Type: application/x-gzip, Size: 24299 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [tcb] Improve SSA-CCP to work with aliased stores, structures and arrays
  2004-10-15  4:55 [tcb] Improve SSA-CCP to work with aliased stores, structures and arrays Diego Novillo
@ 2004-10-15 19:22 ` Daniel Berlin
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Berlin @ 2004-10-15 19:22 UTC (permalink / raw)
  To: Diego Novillo; +Cc: gcc-patches

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2100 bytes --]



On Fri, 15 Oct 2004, Diego Novillo wrote:

>
> Given
>
> ----------------------------------------------------------------------
> struct A a;
> const int B = 42;
>
> void foo (int i)
> {
>  if (i > 10)
>    a.a = 42;
>  else
>    {
>      a.b = 21;
>      a.a = a.b + 21;
>    }
>
>  if (a.a != B)
>    link_error ();
> }
> ----------------------------------------------------------------------
>
> GCC will not fold 'a.a != B' into '0'.
>
> While CCP has all the logic to propagate the constants in this case, it
> is missing the ability to associate constants with stores and loads
> (i.e., pointer dereferences, structures and global/aliased variables).
> We don't keep loads and stores in SSA, but we do build a factored
> use-def web for them (in the virtual operands).
>
> Essentially, the patch extends Brian Booth's work to propagate constants
> in global variables.  It associates constant values with the SSA names
> in the V_MAY_DEF/V_MUST_DEF operands.  However, this is not enough,
> because we glob partial loads/stores with the base symbol:
>
> 	# a_5 = V_MAY_DEF <a_4>
> 	a.a = 2;
>
> 	# VUSE <a_5>
> 	x_3 = a.b;
>
> In the example above, CCP will associate value '2' with 'a_5', but it
> would be wrong to replace the load from 'a.b' with '2', because '2' had
> been stored into a.a.  So, we also keep track of the memory reference
> expression where the store was done.  This may not be necessary when we
> start representing partial loads and stores (Dan?).

It won't be necessary.
Where we don't know, you'll see VUSE<a_5, 0, -1> (IE it's somewhere int he 
object) and where we do know, you'll see VUSE <a_5, 2, 4> or whatever.

Also, the immediate uses of <a_5, 2, 4> will be linked up with the right 
uses, and vice versa.
The only thing you'll have to watch out for is the "don't know" 
definitions of variables, which you can't propagate from (since we don't 
know if it really wrote that part), and don't know 
uses of variablse, which you can't propagate to.

You can think of V_MAY_DEF <a_5, 2, 4> as V_MUST_DEF <a_5, 2, 4>. We 
*know* it defined bytes 2-6 of a_5.

--Dan

[-- Attachment #2: Type: APPLICATION/X-GZIP, Size: 24299 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2004-10-15 19:22 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-15  4:55 [tcb] Improve SSA-CCP to work with aliased stores, structures and arrays Diego Novillo
2004-10-15 19:22 ` Daniel Berlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).