From: Richard Guenther <richard.guenther@gmail.com>
To: Tom de Vries <vries@codesourcery.com>
Cc: Steven Bosscher <stevenb.gcc@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup.
Date: Tue, 12 Jul 2011 14:37:00 -0000 [thread overview]
Message-ID: <CAFiYyc2VhgbAZ8U2iDMJqApzHR+GyCdOfskMEVvr5Mf2TwB8_A@mail.gmail.com> (raw)
In-Reply-To: <4E1C3A19.8060706@codesourcery.com>
On Tue, Jul 12, 2011 at 2:12 PM, Tom de Vries <vries@codesourcery.com> wrote:
> Hi Richard,
>
> here's a new version of the pass. I attempted to address your comments as much
> as possible. The pass was bootstrapped and reg-tested on x86_64.
>
> On 06/14/2011 05:01 PM, Richard Guenther wrote:
>> On Fri, Jun 10, 2011 at 6:54 PM, Tom de Vries <vries@codesourcery.com> wrote:
>>> Hi Richard,
>>>
>>> thanks for the review.
>>>
>>> On 06/08/2011 11:55 AM, Richard Guenther wrote:
>>>> On Wed, Jun 8, 2011 at 11:42 AM, Tom de Vries <vries@codesourcery.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I have a patch for PR43864. The patch adds a gimple level duplicate block
>>>>> cleanup. The patch has been bootstrapped and reg-tested on x86_64, and
>>>>> reg-tested on ARM. The size impact on ARM for spec2000 is shown in the following
>>>>> table (%, lower is better).
>>>>>
>>>>>               none            pic
>>>>>           thumb1  thumb2  thumb1  thumb2
>>>>> spec2000    99.9    99.9    99.8    99.8
>>>>>
>>>>> PR43864 is currently marked as a duplicate of PR20070, but I'm not sure that the
>>>>> optimizations proposed in PR20070 would fix this PR.
>>>>>
>>>>> The problem in this PR is that, when compiling with -O2, the example below
>>>>> should have only one call to free. The original problem is formulated in
>>>>> terms of -Os; currently we do generate just one call to free with -Os,
>>>>> although still not the smallest code possible. I'll show the -O2 case here,
>>>>> since that's similar to the original PR.
>>>>>
>>>
>>> Example A. (naming it for reference below)
>>>
>>>>> #include <stdio.h>
>>>>> void foo (char*, FILE*);
>>>>> char* hprofStartupp(char *outputFileName, char *ctx)
>>>>> {
>>>>>   char fileName[1000];
>>>>>   FILE *fp;
>>>>>   sprintf(fileName, outputFileName);
>>>>>   if (access(fileName, 1) == 0) {
>>>>>     free(ctx);
>>>>>     return 0;
>>>>>   }
>>>>>
>>>>>   fp = fopen(fileName, 0);
>>>>>   if (fp == 0) {
>>>>>     free(ctx);
>>>>>     return 0;
>>>>>   }
>>>>>
>>>>>   foo(outputFileName, fp);
>>>>>
>>>>>   return ctx;
>>>>> }
>>>>>
>>>>> AFAIU, there are 2 complementary methods of RTL optimization proposed in PR20070:
>>>>> - Merging 2 blocks which are identical except for input registers, by using a
>>>>>   conditional move to choose between the different input registers.
>>>>> - Merging 2 blocks which have different local registers, by ignoring those
>>>>>   differences.
>>>>>
>>>>> Blocks .L6 and .L7 have no difference in local registers, but they have a
>>>>> difference in input registers: r3 and r1. Replacing the move to r5 by a
>>>>> conditional move would probably be beneficial in terms of size, but it's not
>>>>> clear what condition the conditional move should be using. Calculating such a
>>>>> condition would add in size and lengthen the execution path.
>>>>>
>>>>> gcc -O2 -march=armv7-a -mthumb pr43864.c -S:
>>>>> ...
>>>>>         push    {r4, r5, lr}
>>>>>         mov     r4, r0
>>>>>         sub     sp, sp, #1004
>>>>>         mov     r5, r1
>>>>>         mov     r0, sp
>>>>>         mov     r1, r4
>>>>>         bl      sprintf
>>>>>         mov     r0, sp
>>>>>         movs    r1, #1
>>>>>         bl      access
>>>>>         mov     r3, r0
>>>>>         cbz     r0, .L6
>>>>>         movs    r1, #0
>>>>>         mov     r0, sp
>>>>>         bl      fopen
>>>>>         mov     r1, r0
>>>>>         cbz     r0, .L7
>>>>>         mov     r0, r4
>>>>>         bl      foo
>>>>> .L3:
>>>>>         mov     r0, r5
>>>>>         add     sp, sp, #1004
>>>>>         pop     {r4, r5, pc}
>>>>> .L6:
>>>>>         mov     r0, r5
>>>>>         mov     r5, r3
>>>>>         bl      free
>>>>>         b       .L3
>>>>> .L7:
>>>>>         mov     r0, r5
>>>>>         mov     r5, r1
>>>>>         bl      free
>>>>>         b       .L3
>>>>> ...
>>>>>
>>>>> The proposed patch solves the problem by dealing with the 2 blocks at a level
>>>>> where they are still identical: the gimple level. It detects that the 2 blocks
>>>>> are identical, and removes one of them.
>>>>>
>>>>> The following table shows the impact of the patch on the example in terms of
>>>>> size for -march=armv7-a:
>>>>>
>>>>>            without  with  delta
>>>>> Os      :      108   104     -4
>>>>> O2      :      120   104    -16
>>>>> Os thumb:       68    64     -4
>>>>> O2 thumb:       76    64    -12
>>>>>
>>>>> The gain in size for -O2 is that of removing the entire block, plus the
>>>>> replacement of 2 moves by a constant set, which also decreases the execution
>>>>> path. The patch ensures optimal code for both -O2 and -Os.
>>>>>
>>>>>
>>>>> By keeping track of equivalent definitions in the 2 blocks, we can ignore those
>>>>> differences in the comparison. Without this feature, we would only match blocks
>>>>> with resultless operations, due to the SSA nature of gimple.
>>>>> For example, with this feature, we reduce the following function to its minimum
>>>>> at gimple level, rather than at rtl level.
>>>>>
>>>
>>> Example B. (naming it for reference below)
>>>
>>>>> int f(int c, int b, int d)
>>>>> {
>>>>>   int r, e;
>>>>>
>>>>>   if (c)
>>>>>     r = b + d;
>>>>>   else
>>>>>     {
>>>>>       e = b + d;
>>>>>       r = e;
>>>>>     }
>>>>>
>>>>>   return r;
>>>>> }
>>>>>
>>>>> ;; Function f (f)
>>>>>
>>>>> f (int c, int b, int d)
>>>>> {
>>>>>   int e;
>>>>>
>>>>> <bb 2>:
>>>>>   e_6 = b_3(D) + d_4(D);
>>>>>   return e_6;
>>>>>
>>>>> }
>>>>>
>>>>> I'll send the patch with the testcases in a separate email.
>>>>>
>>>>> OK for trunk?
>>>>
>>>> I don't like that you hook this into cleanup_tree_cfg - that is called
>>>> _way_ too often.
>>>>
>>>
>>> Here is a reworked patch that addresses several concerns, particularly the
>>> compile time overhead.
>>>
>>> Changes:
>>> - The optimization is now in a separate file.
>>> - The optimization is now a pass rather than a cleanup. That allowed me to
>>> remove the test for pass-local flags.
>>> New is the pass driver tail_merge_optimize, based on
>>> tree-cfgcleanup.c:cleanup_tree_cfg_1.
>>> - The pass is run once, on SSA. Before, the patch would
>>> fix example A only before SSA and example B only on SSA.
>>> In order to fix example A on SSA, I added these changes:
>>> - handle the vop state at entry of bb1 and bb2 as equal (gimple_equal_p)
>>> - insert vop phi in bb2, and use that one (update_vuses)
>>> - complete pt_solutions_equal_p.
>>>
>>> Passed x86_64 bootstrapping and regression testing, currently regtesting on ARM.
>>>
>>> I placed the pass at the earliest point where it fixes example B: after copy
>>> propagation and dead code elimination, specifically, after the first invocation
>>> of pass_cd_dce. Do you know of any other points where the pass should be
>>> scheduled?
>>
>> It's probably reasonable to run it after IPA inlining has taken place which
>> means insert it somewhere after the second pass_fre (I'd suggest after
>> pass_merge_phi).
>>
>
> I placed it there, but I ran into some interaction with
> pass_late_warn_uninitialized. Addition of the pass makes test
> gcc.dg/uninit-pred-2_c.c fail.
>
> FAIL: gcc.dg/uninit-pred-2_c.c bogus uninitialized var warning
> (test for bogus messages, line 43)
> FAIL: gcc.dg/uninit-pred-2_c.c real uninitialized var warning
> (test for warnings, line 45)
>
> int foo_2 (int n, int m, int r)
> {
>   int flag = 0;
>   int v;
>
>   if (n)
>     {
>       v = r;
>       flag = 1;
>     }
>
>   if (m) g++;
>   else bar ();
>
>   if (flag)
>     blah (v); /* { dg-bogus "uninitialized" "bogus uninitialized var warning" } */
>   else
>     blah (v); /* { dg-warning "uninitialized" "real uninitialized var warning" } */
>
>   return 0;
> }
>
> The pass replaces the second call to blah with the first one, and eliminates
> the if. After that, the uninitialized warning is issued for the line number
> of the first call to blah, while at source level the warning only makes sense
> for the second call to blah.
>
> Shall I try putting the pass after pass_late_warn_uninitialized?
No, simply pass -fno-tree-tail-merge in the testcase.
>> But my general comment still applies - I don't like the structural
>> comparison code at all and you should really use the value-numbering
>> machineries we have
>
> I now use sccvn.
Good.
>> or even better, merge this pass with FRE itself
>> (or DOM if that suits you more). For FRE you'd want to hook into
>> tree-ssa-pre.c:eliminate().
>>
>
> If we need to do the transformation after pass_late_warn_uninitialized, it needs
> to stay on its own, I suppose.
I suppose part of the high cost of the pass is running SCCVN, so it
makes sense to share that with the existing FRE run. Any reason
you use VN_NOWALK?
>>>> This also duplicates the literal matching done on the RTL level - instead
>>>> I think this optimization would be more related to value-numbering
>>>> (either that of SCCVN/FRE/PRE or that of DOM which also does
>>>> jump-threading).
>>>
>>> The pass currently does just duplicate block elimination, not cross-jumping.
>>> If we would like to extend this to cross-jumping, I think we need to do the
>>> reverse of value numbering: walk backwards over the bb, and keep track of the
>>> way values are used rather than defined. This will allow us to make a cut
>>> halfway through a basic block.
>>
>> I don't understand - I propose to do literal matching but using value-numbering
>> for tracking equivalences to avoid literal matching for stuff we know is
>> equivalent. In fact I think it will be mostly calls and stores where we
>> need to do literal matching, but never intermediate computations on
>> registers.
>>
>
> I tried to implement that scheme now.
>
>> But maybe I miss something here.
>>
>>> In general, we cannot cut halfway through a basic block in the current implementation
>>> (of value numbering and forward matching), since we assume equivalence of the
>>> incoming vops at bb entry. This assumption is in general only valid if we indeed
>>> replace the entire block by another entire block.
>>
>> Why are VOPs of concern at all?
>>
>
> In the previous version, I inserted the phis for the vops manually.
> In the current version of the pass, I let TODO_update_ssa_only_virtuals deal
> with vops, so it's not relevant anymore.
>
>>> I imagine that a cross-jumping heuristic would be based on the length of the
>>> match and the amount of non-vop phis it would introduce. Then value numbering
>>> would be something orthogonal to this optimization, which would reduce amount of
>>> phis needed for a cross-jump.
>>> I think it would make sense to use SCCVN value numbering at the point that we
>>> have this backward matching.
>>>
>>> I'm not sure whether it's a good idea to try to replace the current forward
>>> local value numbering with SCCVN value numbering, since we currently declare
>>> vops equal, which are, in the global sense, not equal. And once we go to
>>> backward matching, we'll need something to keep track of the uses, and we can
>>> reuse the current infrastructure for that, but not the SCCVN value numbering.
>>>
>>> Does that make any sense?
>>
>> Ok, let me think about this a bit.
>>
>
> I tried to be clearer on this in the header comment of the pass.
>
>> For now about the patch in general. The functions need renaming to
>> something more sensible now that this isn't cfg-cleanup anymore.
>>
>> I miss a general overview of the pass - it's hard to reverse engineer
>> its working for me.
>
> I added a header comment.
>
>> Like (working backwards), you are detecting
>> duplicate predecessors
>> - that obviously doesn't work for duplicates
>> without any successors, like those ending in noreturn calls.
>>
>
> Merging of blocks without successors works now.
>
>> +  n = EDGE_COUNT (bb->preds);
>> +
>> +  for (i = 0; i < n; ++i)
>> +    {
>> +      e1 = EDGE_PRED (bb, i);
>> +      if (e1->flags & EDGE_COMPLEX)
>> +        continue;
>> +      for (j = i + 1; j < n; ++j)
>> +        {
>>
>> that's quadratic in the number of predecessors.
>>
>
> The quadratic comparison is now limited by PARAM_TAIL_MERGE_MAX_COMPARISONS.
> Each bb is compared to maximally PARAM_TAIL_MERGE_MAX_COMPARISONS similar bbs
> per worklist iteration.
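The capped pairwise walk described above can be modeled in a few lines of plain C. This is an illustrative standalone sketch, not GCC code: `count_capped_pairs` and `MAX_COMPARISONS` are invented names standing in for the per-worklist comparison loop and PARAM_TAIL_MERGE_MAX_COMPARISONS.

```c
#include <assert.h>

/* Illustrative sketch (not GCC code): each of the N blocks is compared
   against at most MAX_COMPARISONS later blocks per worklist iteration,
   instead of against all later blocks.  */
#define MAX_COMPARISONS 10

/* Count how many candidate pairs are actually inspected for N blocks.  */
static int
count_capped_pairs (int n)
{
  int pairs = 0;
  int i, j, compared;

  for (i = 0; i < n; ++i)
    for (j = i + 1, compared = 0;
         j < n && compared < MAX_COMPARISONS;
         ++j, ++compared)
      pairs++;

  return pairs;
}
```

With the cap, the number of inspected pairs grows roughly linearly in the number of blocks instead of quadratically: 945 instead of 4950 pairs for 100 blocks with a cap of 10.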
>
>> +  /* Block e1->src might be deleted.  If bb and e1->src are the same
>> +     block, delete e2->src instead, by swapping e1 and e2.  */
>> +  e1_swapped = (bb == e1->src) ? e2 : e1;
>> +  e2_swapped = (bb == e1->src) ? e1 : e2;
>>
>> is that because you incrementally merge preds two at a time? As you
>> are deleting blocks don't you need to adjust the quadratic walking?
>> Thus, with say four equivalent preds won't your code crash anyway?
>>
>
> I think it was to make calculation of dominator info easier, but I use now
> functions from dominance.c for that, so this piece of code is gone.
>
>> I think the code needs to delay the CFG manipulation to the end
>> of this function.
>>
>
> I now delay the cfg manipulation till after each analysis phase.
>
>> +/* Returns whether for all phis in E1->dest the phi alternatives for E1 and
>> +   E2 are either:
>> +   - equal, or
>> +   - defined locally in E1->src and E2->src.
>> +   In the latter case, register the alternatives in *PHI_EQUIV.  */
>> +
>> +static bool
>> +same_or_local_phi_alternatives (equiv_t *phi_equiv, edge e1, edge e2)
>> +{
>> +  int n1 = e1->dest_idx;
>> +  int n2 = e2->dest_idx;
>> +  gimple_stmt_iterator gsi;
>> +  basic_block dest = e1->dest;
>> +  gcc_assert (dest == e2->dest);
>>
>> too many asserts in general - I'd say for this case pass in the destination
>> block as argument.
>>
>> + gcc_assert (val1 != NULL_TREE);
>> + gcc_assert (val2 != NULL_TREE);
>>
>> superfluous.
>>
>> +static bool
>> +cleanup_duplicate_preds_1 (equiv_t phi_equiv, edge e1, edge e2)
>> ...
>> + VEC (edge,heap) *redirected_edges;
>> + gcc_assert (bb == e2->dest);
>>
>> same.
>>
>> + if (e1->flags != e2->flags)
>> + return false;
>>
>> that's bad - it should handle EDGE_TRUE/FALSE_VALUE mismatches
>> by swapping edges in the preds.
>>
>
> That's handled now.
>
>> + /* TODO: We could allow multiple successor edges here, as long as bb1 and bb2
>> + have the same successors. */
>> + if (EDGE_COUNT (bb1->succs) != 1 || EDGE_COUNT (bb2->succs) != 1)
>> + return false;
>>
>> hm, ok - that would need fixing, too. Same or mergeable successors
>> of course, which makes me wonder if doing this whole transformation
>> incrementally and locally is a good idea ;) Also
>>
>
> Also handled now.
>
>> + /* Calculate the changes to be made to the dominator info.
>> + Calculate bb2_dom. */
>> ...
>>
>> wouldn't be necessary I suppose (just throw away dom info after the
>> pass).
>>
>> That is, I'd globally record BB equivalences (thus, "value-number"
>> BBs) and apply the CFG manipulations at a single point.
>>
>
> I delay the cfg manipulation till after each analysis phase. Delaying the cfg
> manipulation till the end of the pass instead might make the analysis code more
> convoluted.
>
>> Btw, I miss where you insert PHI nodes for all uses that flow in
>> from the preds preds - you do that for VOPs but not for real
>> operands?
>>
>
> Indeed, inserting phis for non-vops is a todo.
>
>> + /* Replace uses of vuse2 with uses of the phi. */
>> + for (gsi = gsi_start_bb (bb2); !gsi_end_p (gsi); gsi_next (&gsi))
>> + {
>>
>> why not walk immediate uses of the old PHI and SET_USE to
>> the new one instead (for those uses in the duplicate BB of course)?
>>
>
> And I no longer insert VOP phis, but let a TODO handle that, so this code is gone.
Ok. Somewhat costly in comparison though.
>> + case GSS_CALL:
>> + if (!pt_solution_equal_p (gimple_call_use_set (s1),
>> + gimple_call_use_set (s2))
>>
>> I don't understand why you are concerned about equality of
>> points-to information. Why not simply ior it (pt_solution_ior_into - note
>> they are shared so you need to unshare them first).
>>
>
> I let a todo handle the alias info now.
Hmm, that's not going to work if it's needed for correctness.
>> +/* Return true if p1 and p2 can be considered equal. */
>> +
>> +static bool
>> +pt_solution_equal_p (struct pt_solution *p1, struct pt_solution *p2)
>>
>> would go into tree-ssa-structalias.c instead.
>>
>> +static bool
>> +gimple_base_equal_p (gimple s1, gimple s2)
>> +{
>> ...
>> + if (gimple_modified_p (s1) || gimple_modified_p (s2))
>> + return false;
>>
>> that shouldn't be of concern.
>>
>> + if (s1->gsbase.subcode != s2->gsbase.subcode)
>> + return false;
>>
>> for assigns that are of class GIMPLE_SINGLE_RHS we do not
>> update subcode during transformations so it can differ for now
>> equal statements.
>>
>
> handled properly now.
>
>> I'm not sure if a splay tree for the SSA name version equivalency
>> map is the best representation - I would have used a simple
>> array of num_ssa_names size and assign value-numbers
>> (the lesser version for example).
>>
>> Thus equiv_insert would do
>>
>> value = MIN (SSA_NAME_VERSION (val1), SSA_NAME_VERSION (val2));
>> values[SSA_NAME_VERSION (val1)] = value;
>> values[SSA_NAME_VERSION (val2)] = value;
>>
>> if the names are not defined in bb1 resp. bb2 we would have to insert
>> a PHI node in the merged block - that would be a cost thingy for
>> doing this value-numbering in a more global way.
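The array-based scheme suggested above can be sketched standalone (not GCC code: plain ints stand in for SSA_NAME_VERSIONs, and the chain-following lookup is an addition of this sketch, not part of the suggestion):

```c
#include <assert.h>

/* Illustrative sketch of the suggested array-based equivalence map:
   an array of size num_ssa_names where equivalent names share the
   lesser version number as their value number.  */

/* Start with each name in its own equivalence class.  */
static void
equiv_init (int *values, int num_names)
{
  int i;
  for (i = 0; i < num_names; ++i)
    values[i] = i;
}

/* Follow the chain down to the class representative.  */
static int
equiv_lookup (const int *values, int version)
{
  while (values[version] != version)
    version = values[version];
  return version;
}

/* Record that VAL1 and VAL2 are equivalent: both classes map to the
   lesser representative, as in the suggested equiv_insert.  */
static void
equiv_insert (int *values, int val1, int val2)
{
  int r1 = equiv_lookup (values, val1);
  int r2 = equiv_lookup (values, val2);
  int value = r1 < r2 ? r1 : r2;
  values[r1] = value;
  values[r2] = value;
}
```

Compared to a splay tree, lookups here are simple array reads, and merging two classes is two stores plus the representative chase.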
>>
>
> local value numbering code has been removed.
>
>> You don't seem to be concerned about the SSA names points-to
>> information, but it surely has the same issues as that of the calls
>> (so either they should be equal or they should be conservatively
>> merged). But as-is you don't allow any values to flow into the
>> merged blocks that are not equal for both edges, no?
>>
>
> Correct, that's still a todo.
>
>> + TV_TREE_CFG, /* tv_id */
>>
>> add a new timevar. We want to be able to turn the pass off,
>> so also add a new option (I can see it might make debugging harder
>> in some cases).
>>
>
> I added -ftree-tail-merge and TV_TREE_TAIL_MERGE.
>
>> Can you assess the effect of the patch on GCC itself (for example
>> when building cc1?)? What's the size benefit and the compile-time
>> overhead?
>>
>
> effect on building cc1:
>
>               real        user         sin
>
>              real        user         sys
> without:  19m50.158s  19m 2.090s  0m20.860s
> with:     19m59.456s  19m17.170s  0m20.350s
>                       ----------
>                        +15.080s
>                          +1.31%
That's quite a lot of time.
> $ size without/cc1 with/cc1
>        text   data      bss       dec      hex  filename
>    17515986  41320  1364352  18921658  120b8ba  without/cc1
>    17399226  41320  1364352  18804898  11ef0a2  with/cc1
>    --------
>     -116760
>      -0.67%
>
> OK for trunk, provided build & reg-testing on ARM is ok?
I miss additions to the testsuite.
+static bool
+bb_dominated_by_p (basic_block bb1, basic_block bb2)
please use
+  if (TREE_CODE (val1) == SSA_NAME)
+    {
+      if (!same_preds
+          && !SSA_NAME_IS_DEFAULT_DEF (val1)
+          && !dominated_by_p (bb2, gimple_bb (SSA_NAME_DEF_STMT (val1))))
+        return false;
instead. All stmts should have a BB apart from def stmts of default defs
(which are gimple_nops).
+/* Return the canonical scc_vn tree for X, if we can use scc_vn_info.
+ Otherwise, return X. */
+
+static tree
+gvn_val (tree x)
+{
+  return ((scc_vn_ok && x != NULL && TREE_CODE (x) == SSA_NAME)
+          ? VN_INFO ((x))->valnum : x);
+}
I suppose we want to export vn_valueize from tree-ssa-sccvn.c instead
which seems to perform the same. Do you ever call the above
when scc_vn_ok is false or x is NULL?
+static bool
+gvn_uses_equal (tree val1, tree val2, basic_block bb1,
+ basic_block bb2, bool same_preds)
+{
+ gimple def1, def2;
+ basic_block def1_bb, def2_bb;
+
+ if (val1 == NULL_TREE || val2 == NULL_TREE)
+ return false;
does this ever happen?
+ if (gvn_val (val1) != gvn_val (val2))
+ return false;
I suppose a shortcut
if (val1 == val2)
return true;
is possible?
+static int *bb_size;
+
+/* Init bb_size administration. */
+
+static void
+init_bb_size (void)
+{
if you need more per-BB info you can hook it into bb->aux. What's the
size used for (I guess I'll see below ...)?
+ for (gsi = gsi_start_nondebug_bb (bb);
+ !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+ size++;
is pretty rough. I guess for a quick false result for comparing BBs
(which means you could initialize the info lazily?)
+struct same_succ
+{
+  /* The bbs that have the same successor bbs.  */
+  bitmap bbs;
+  /* The successor bbs.  */
+  bitmap succs;
+  /* Indicates whether the EDGE_TRUE/FALSE_VALUEs of succ_flags are swapped for
+     bb.  */
+  bitmap inverse;
+  /* The edge flags for each of the successor bbs.  */
+  VEC (int, heap) *succ_flags;
+  /* Indicates whether the struct is in the worklist.  */
+  bool in_worklist;
+};
looks somewhat odd at first sight - maybe a overall comment what this
is used for is missing. Well, let's see.
+static hashval_t
+same_succ_hash (const void *ve)
+{
+  const_same_succ_t e = (const_same_succ_t) ve;
+  hashval_t hashval = bitmap_hash (e->succs);
+  int flags;
+  unsigned int i;
+  unsigned int first = bitmap_first_set_bit (e->bbs);
+  int size = bb_size[first];
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  basic_block bb = BASIC_BLOCK (first);
+
+  hashval = iterative_hash_hashval_t (size, hashval);
+  for (gsi = gsi_start_nondebug_bb (bb);
+       !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      hashval = iterative_hash_hashval_t (gimple_code (stmt), hashval);
+      if (!is_gimple_call (stmt))
+        continue;
+      if (gimple_call_internal_p (stmt))
+        hashval = iterative_hash_hashval_t
+          ((hashval_t) gimple_call_internal_fn (stmt), hashval);
+      else
+        hashval = iterative_hash_expr (gimple_call_fn (stmt), hashval);
you could also keep a cache of the BB hash as you keep a cache
of the size (if this function is called multiple times per BB).
The hash looks relatively weak - for all assignments it will hash
in GIMPLE_ASSIGN only ... I'd at least hash in gimple_assign_rhs_code.
The call handling OTOH looks overly complicated to me ;)
The hash will be dependent on stmt ordering even if that doesn't matter,
like
i = i + 1;
j = j - 1;
vs. the swapped variant. Similarly, the successor edges are not sorted,
so true/false edges may be in different order.
Not sure yet if your comparison function would make those BBs
unequal anyway.
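The ordering concern can be demonstrated with a toy model (standalone C, not GCC's hash routines): an order-sensitive combiner, like iterative hashing, distinguishes "i = i + 1; j = j - 1;" from the swapped order, while a commutative combiner does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Order-sensitive combiner, analogous to iterative hashing: the result
   depends on the order in which values are folded in.  */
static uint32_t
combine_ordered (uint32_t h, uint32_t v)
{
  return h * 31u + v;
}

static uint32_t
hash_seq_ordered (const uint32_t *stmts, size_t n)
{
  uint32_t h = 0;
  size_t i;
  for (i = 0; i < n; ++i)
    h = combine_ordered (h, stmts[i]);
  return h;
}

/* Addition commutes, so this hash is independent of statement order.  */
static uint32_t
hash_seq_commutative (const uint32_t *stmts, size_t n)
{
  uint32_t h = 0;
  size_t i;
  for (i = 0; i < n; ++i)
    h += stmts[i] * 2654435761u;
  return h;
}
```

A commutative combiner trades some hash strength for order-insensitivity; sorting the successor edge flags before hashing would address the true/false-edge ordering the same way.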
+static bool
+inverse_flags (const_same_succ_t e1, const_same_succ_t e2)
+{
+ int f1a, f1b, f2a, f2b;
+ int mask = ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE);
+
+ if (VEC_length (int, e1->succ_flags) != 2)
+ return false;
...
I wonder why you keep a VEC of successor edges in same_succ_t
instead of using the embedded successor edge vector in the basic_block
structure?
+ bb_to_same_succ[bb->index] = *slot;
looks like a candidate for per-BB info in bb->aux, too.
+static void
+find_same_succ (void)
+{
+  int i;
+  same_succ_t same = same_succ_alloc ();
+
+  for (i = 0; i < last_basic_block; ++i)
+    {
+      find_same_succ_bb (BASIC_BLOCK (i), &same);
+      if (same == NULL)
+        same = same_succ_alloc ();
+    }
I suppose you want FOR_EACH_BB (excluding entry/exit block) or
FOR_ALL_BB (including them). The above can also
have BASIC_BLOCK (i) == NULL. Similarly in other places.
+  for (i = 0; i < n1; ++i)
+    {
+      ei = EDGE_PRED (bb1, i);
+      for (j = 0; j < n2; ++j)
+        {
+          ej = EDGE_PRED (bb2, j);
+          if (ei->src != ej->src)
+            continue;
+          nr_matches++;
+          break;
+        }
+    }
  FOR_EACH_EDGE (ei, iterator, bb1->preds)
    if (!find_edge (ei->src, bb2))
      return false;

is easier to parse.
+static bool
+gimple_subcode_equal_p (gimple s1, gimple s2, bool inv_cond)
+{
+ tree var, var_type;
+ bool honor_nans;
+
+ if (is_gimple_assign (s1)
+ && gimple_assign_rhs_class (s1) == GIMPLE_SINGLE_RHS)
+ return true;
the subcode for GIMPLE_SINGLE_RHS is gimple_assign_rhs_code
(TREE_CODE of gimple_assign_rhs1 actually).
+static bool
+gimple_base_equal_p (gimple s1, gimple s2, bool inv_cond)
I wonder if you still need this given ..
+static bool
+gimple_equal_p (gimple s1, gimple s2, bool same_preds, bool inv_cond)
+{
+  unsigned int i;
+  enum gimple_statement_structure_enum gss;
+  tree lhs1, lhs2;
+  basic_block bb1 = gimple_bb (s1), bb2 = gimple_bb (s2);
+
+  /* Handle omp gimples conservatively.  */
+  if (is_gimple_omp (s1) || is_gimple_omp (s2))
+    return false;
+
+  /* Handle lhs.  */
+  lhs1 = gimple_get_lhs (s1);
+  lhs2 = gimple_get_lhs (s2);
+  if (lhs1 != NULL_TREE && lhs2 != NULL_TREE)
+    return (same_preds && TREE_CODE (lhs1) == SSA_NAME
+            && TREE_CODE (lhs2) == SSA_NAME
+            && gvn_val (lhs1) == gvn_val (lhs2));
+  else if (!(lhs1 == NULL_TREE && lhs2 == NULL_TREE))
+    return false;
all lhs equivalency is deferred to GVN (which means all GIMPLE_ASSIGN
and GIMPLE_CALL stmts with a lhs).
That leaves the case of calls without a lhs. I'd rather structure this
function like
if (gimple_code (s1) != gimple_code (s2))
  return false;

switch (gimple_code (s1))
  {
  case GIMPLE_CALL:
    ... compare arguments ...
    if equal, ok; if not and we have a lhs, use GVN.
  case GIMPLE_ASSIGN:
    ... compare GVN of the LHS ...
  case GIMPLE_COND:
    ... compare operands ...
  default:
    return false;
  }
+static bool
+bb_gimple_equal_p (basic_block bb1, basic_block bb2, bool same_preds,
+ bool inv_cond)
+
+{
you don't do an early out by comparing the pre-computed sizes. Mind
you can have hashtable collisions where they still differ (did you
check hashtable stats on it? how is the collision rate?)
+static bool
+bb_has_non_vop_phi (basic_block bb)
+{
+  gimple_seq phis = phi_nodes (bb);
+  gimple phi;
+
+  if (phis == NULL)
+    return false;
+
+  if (!gimple_seq_singleton_p (phis))
+    return true;
+
+  phi = gimple_seq_first_stmt (phis);
+  return !VOID_TYPE_P (TREE_TYPE (gimple_phi_result (phi)));
return is_gimple_reg (gimple_phi_result (phi));
+static void
+update_debug_stmts (void)
+{
+  int i;
+  basic_block bb;
+
+  for (i = 0; i < last_basic_block; ++i)
+    {
+      gimple stmt;
+      gimple_stmt_iterator gsi;
+
+      bb = BASIC_BLOCK (i);
FOR_EACH_BB
it must be possible to avoid scanning basic-blocks that are not affected
by the transform, no? In fact the only affected basic-blocks should be
those that were merged with another block?
+ /* Mark vops for updating. Without this, TODO_update_ssa_only_virtuals
+ won't do anything. */
+ mark_sym_for_renaming (gimple_vop (cfun));
it won't insert any PHIs, that's correct. Still somewhat ugly, a manual
update of PHI nodes should be possible.
+ if (dump_file)
+ {
+ fprintf (dump_file, "Before TODOs.\n");
with TDF_DETAILS only please.
+ free_dominance_info (CDI_DOMINATORS);
if you keep dominance info up-to-date there is no need to free it.
+ TODO_verify_ssa | TODO_verify_stmts
+ | TODO_verify_flow | TODO_update_ssa_only_virtuals
+ | TODO_rebuild_alias
please no TODO_rebuild_alias, simply remove it - alias info in merged
paths should be compatible enough if there is value-equivalence between
SSA names. At least you can't rely on TODO_rebuild_alias for
correctness - it is skipped if IPA PTA was run for example.
+ | TODO_cleanup_cfg
is that needed? If so return it from your execute function if you changed
anything only. But I doubt your transformation introduces cleanup
opportunities?
New options and params need documentation in doc/invoke.texi.
Thanks,
Richard.
> Thanks,
> - Tom
>
> 2011-07-12 Tom de Vries <tom@codesourcery.com>
>
> PR middle-end/43864
> * tree-ssa-tail-merge.c: New file.
> (bb_dominated_by_p): New function.
> (scc_vn_ok): New var.
> (init_gvn, delete_gvn, gvn_val, gvn_uses_equal): New function.
> (bb_size): New var.
> (init_bb_size, delete_bb_size): New function.
> (struct same_succ): Define.
> (same_succ_t, const_same_succ_t): New typedef.
> (same_succ_print, same_succ_print_traverse, same_succ_hash)
> (inverse_flags, same_succ_equal, same_succ_alloc, same_succ_delete)
> (same_succ_reset): New function.
> (same_succ_htab, bb_to_same_succ, same_succ_edge_flags)
> (bitmap deleted_bbs, deleted_bb_preds): New vars.
> (debug_same_succ): New function.
> (worklist): New var.
> (print_worklist, add_to_worklist, find_same_succ_bb, find_same_succ)
> (init_worklist, delete_worklist, delete_basic_block_same_succ)
> (update_worklist): New function.
> (struct bb_cluster): Define.
> (bb_cluster_t, const_bb_cluster_t): New typedef.
> (print_cluster, debug_cluster, same_predecessors)
> (add_bb_to_cluster, new_cluster, delete_cluster): New function.
> (merge_cluster, all_clusters): New var.
> (alloc_cluster_vectors, reset_cluster_vectors, delete_cluster_vectors)
> (merge_clusters, set_cluster): New function.
> (gimple_subcode_equal_p, gimple_base_equal_p, gimple_equal_p)
> (bb_gimple_equal_p): New function.
> (find_duplicate, same_phi_alternatives_1, same_phi_alternatives)
> (bb_has_non_vop_phi, find_clusters_1, find_clusters): New function.
> (replace_block_by, apply_clusters): New function.
> (update_debug_stmt, update_debug_stmt): New function.
> (tail_merge_optimize, gate_tail_merge): New function.
> (pass_tail_merge): New gimple pass.
> * tree-pass.h (pass_tail_merge): Declare new pass.
> * passes.c (init_optimization_passes): Use new pass.
> * Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o.
> (tree-ssa-tail-merge.o): New rule.
> * opts.c (default_options_table): Set OPT_ftree_tail_merge by default at
> OPT_LEVELS_2_PLUS.
> * timevar.def (TV_TREE_TAIL_MERGE): New timevar.
> * common.opt (ftree-tail-merge): New switches.
> * params.def (PARAM_TAIL_MERGE_MAX_COMPARISONS): New parameter.
>
2011-06-08 9:49 Tom de Vries
2011-06-08 9:55 ` [PATCH, PR43864] Gimple level duplicate block cleanup - test cases Tom de Vries
2011-07-18 2:54 ` Tom de Vries
2011-08-19 18:38 ` Tom de Vries
2011-08-25 10:09 ` Richard Guenther
2011-06-08 10:09 ` [PATCH, PR43864] Gimple level duplicate block cleanup Richard Guenther
2011-06-08 10:40 ` Steven Bosscher
2011-06-10 17:16 ` Tom de Vries
2011-06-14 15:12 ` Richard Guenther
2011-07-12 12:21 ` Tom de Vries
2011-07-12 14:37 ` Richard Guenther [this message]
2011-07-18 0:41 ` Tom de Vries
2011-07-22 15:54 ` Richard Guenther
2011-08-19 18:33 ` Tom de Vries
2011-08-24 9:00 ` Tom de Vries
2011-08-25 1:07 ` Ian Lance Taylor
2011-08-25 9:30 ` Richard Guenther
2011-06-10 18:43 ` Jeff Law