* [PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something
From: Richard Guenther @ 2009-07-14 16:27 UTC (permalink / raw)
To: gcc-patches; +Cc: Daniel Berlin
This tries to get rid of the sledge-hammer that disables PRE if
the current function is not optimized for speed (which is always
true if optimize_size is set ...).
The idea is to allow regular and phi-translation triggered full redundancy
elimination to be performed even if a path is to be optimized for size.
Thus, we limit ourselves to performing insertions only when they remove a
full redundancy on a path in the CFG that we want to optimize for speed.
The effect of this patch is that PRE is now enabled at -Os but performs
only full redundancy elimination (as if PRE ran but never inserted
anything). This should reduce code size and fix the odd situation that
value numbering (VN) before the loop optimizations is missing at -Os.
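To illustrate what full redundancy elimination without insertion can do, consider this hypothetical example (not from the patch or the testsuite):

```c
/* Hypothetical illustration: 'a + b' is computed on every path into
   the final block and has the same value on each of them, so PRE at
   -Os can reuse the merged result without inserting any new
   computation.  */
int
f (int a, int b, int p)
{
  int t;
  if (p)
    t = a + b;
  else
    t = a + b;
  /* 'a + b' here is fully redundant; eliminating it needs no
     insertion, only reuse of 't'.  */
  return t + (a + b);
}
```

With the patch, -Os would remove the second computation of a + b; a partially redundant expression (computed on only some incoming paths) would still be left alone unless a speed-optimized path benefits.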
I wonder whether we want to specialize the case where the value is
available in all predecessors but is not the same value (so we would
only need to insert a PHI node). For low-indegree blocks this might
result in smaller code as well.
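A sketch of that case (again hypothetical, not part of the patch): both predecessors already compute x * y, just under different SSA values of x, so merging the two existing results with a PHI node would make the computation after the join fully redundant without inserting any new arithmetic:

```c
/* Hypothetical illustration: each predecessor of the join block
   already computes x * y, but with different values of x, so the
   value is available on every incoming edge without being the same
   value.  A PHI node merging the two 't's would suffice to remove
   the computation after the join.  */
int
g (int x, int y, int p)
{
  int t;
  if (p)
    t = x * y;
  else
    {
      x = x + 1;
      t = x * y;
    }
  /* 'x * y' is available in all preds, just not as one value; a PHI
     of the existing results would make it fully redundant here.  */
  return t + x * y;
}
```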
Bootstrap and regtest running on x86_64-unknown-linux-gnu.
Any comments?
Thanks,
Richard.
2009-07-14 Richard Guenther <rguenther@suse.de>
* tree-ssa-pre.c (do_regular_insertion): Only insert if a
redundancy along a path in the CFG we want to optimize for speed
is going to be removed.
(execute_pre): Do partial-PRE only if the function is to be
optimized for speed.
(gate_pre): Do not turn off all of PRE when not optimizing a
function for speed.
Index: gcc/tree-ssa-pre.c
===================================================================
*** gcc/tree-ssa-pre.c (revision 149626)
--- gcc/tree-ssa-pre.c (working copy)
*************** do_regular_insertion (basic_block block,
*** 3352,3357 ****
--- 3352,3358 ----
pre_expr eprime = NULL;
edge_iterator ei;
pre_expr edoubleprime = NULL;
+ bool do_insertion = false;
val = get_expr_value_id (expr);
if (bitmap_set_contains_value (PHI_GEN (block), val))
*************** do_regular_insertion (basic_block block,
*** 3403,3408 ****
--- 3404,3413 ----
{
avail[bprime->index] = edoubleprime;
by_some = true;
+ /* We want to perform insertions to remove a redundancy on
+ a path in the CFG we want to optimize for speed. */
+ if (optimize_edge_for_speed_p (pred))
+ do_insertion = true;
if (first_s == NULL)
first_s = edoubleprime;
else if (!pre_expr_eq (first_s, edoubleprime))
*************** do_regular_insertion (basic_block block,
*** 3413,3419 ****
already existing along every predecessor, and
it's defined by some predecessor, it is
partially redundant. */
! if (!cant_insert && !all_same && by_some && dbg_cnt (treepre_insert))
{
if (insert_into_preds_of_block (block, get_expression_id (expr),
avail))
--- 3418,3425 ----
already existing along every predecessor, and
it's defined by some predecessor, it is
partially redundant. */
! if (!cant_insert && !all_same && by_some && do_insertion
! && dbg_cnt (treepre_insert))
{
if (insert_into_preds_of_block (block, get_expression_id (expr),
avail))
*************** fini_pre (bool do_fre)
*** 4475,4485 ****
only wants to do full redundancy elimination. */
static unsigned int
! execute_pre (bool do_fre ATTRIBUTE_UNUSED)
{
unsigned int todo = 0;
! do_partial_partial = optimize > 2;
/* This has to happen before SCCVN runs because
loop_optimizer_init may create new phis, etc. */
--- 4481,4491 ----
only wants to do full redundancy elimination. */
static unsigned int
! execute_pre (bool do_fre)
{
unsigned int todo = 0;
! do_partial_partial = optimize > 2 && optimize_function_for_speed_p (cfun);
/* This has to happen before SCCVN runs because
loop_optimizer_init may create new phis, etc. */
*************** do_pre (void)
*** 4563,4570 ****
static bool
gate_pre (void)
{
! /* PRE tends to generate bigger code. */
! return flag_tree_pre != 0 && optimize_function_for_speed_p (cfun);
}
struct gimple_opt_pass pass_pre =
--- 4569,4575 ----
static bool
gate_pre (void)
{
! return flag_tree_pre != 0;
}
struct gimple_opt_pass pass_pre =
* Re: [PATCH] Tune PRE insertion, make -Os -ftree-pre actually do something
From: Richard Guenther @ 2009-10-21 14:49 UTC (permalink / raw)
To: gcc-patches
On Tue, 14 Jul 2009, Richard Guenther wrote:
>
> This tries to get rid of the sledge-hammer that disables PRE if
> the current function is not optimized for speed (which is always
> true if optimize_size is set ...).
>
> The idea is to allow regular and phi-translation triggered full redundancy
> elimination to be performed even if a path is to be optimized for size.
> Thus, we limit ourselves to performing insertions only when they remove a
> full redundancy on a path in the CFG that we want to optimize for speed.
>
> The effect of this patch is that PRE is now enabled at -Os but performs
> only full redundancy elimination (as if PRE ran but never inserted
> anything). This should reduce code size and fix the odd situation that
> value numbering (VN) before the loop optimizations is missing at -Os.
>
> I wonder whether we want to specialize the case where the value is
> available in all predecessors but is not the same value (so we would
> only need to insert a PHI node). For low-indegree blocks this might
> result in smaller code as well.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Any comments?
No public comments yet - Steven said it clashes with his -Os PRE
patches, but I don't see them going in in stage3. This patch now
also addresses PR41778, so I added that testcase, re-bootstrapped
and tested the patch, and threw it on one of our x86_64 SPEC 2006 testers.
If that doesn't turn up something odd I'll apply the patch tomorrow.
Thanks,
Richard.
> [...]
--
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex