I'm very sorry there seems to be encoding issue in the attachment in my last email. Attached is the new patch. Thanks, Di Zhao > -----Original Message----- > From: Di Zhao OS > Sent: Tuesday, November 16, 2021 1:24 AM > To: 'Richard Biener' > Cc: gcc-patches@gcc.gnu.org > Subject: RE: [PATCH v2] tree-optimization/101186 - extend FRE with > "equivalence map" for condition prediction > > Attached is the updated patch. Fixed some errors in testcases. > > > -----Original Message----- > > From: Richard Biener > > Sent: Wednesday, November 10, 2021 5:44 PM > > To: Di Zhao OS > > Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod > > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with > > "equivalence map" for condition prediction > > > > On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS > > wrote: > > > > > > Hi, > > > > > > Attached is a new version of the patch, mainly for improving performance > > > and simplifying the code. > > > > The patch doesn't apply anymore, can you update it please? > > > > I see the new ssa-fre-101.c test already passing without the patch. > > It was a mistake in test ssa-fre-101.c::g to define the variables with > the unsigned integers, in this way "a >= 0" is always true. After modified > the case, now fre1 in the patch can remove unreachable call "foo ()", and > evrp on trunk does not. > > > Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan > > the pass dump after fre1 which is evrp so it seems that evrp already > > handles the equivalences (likely with the relation oracle) now? > > I'm sure there are second order effects when eliminating conditions > > in FRE but did you re-evaluate what made you improve VN to see > > if the cases are handled as expected now without this change? > > In case ssa-fre-102.c, the unreachable call "foo ()" can be removed by > evrp, but fre in the patch can additionally replace "g = x + b" with > "g = 99". (Again I'm sorry the regex to check this was wrong..) > > Test cases to simulate the original problem I found are moved into > gcc.dg/tree-ssa/ssa-pre-34.c. The unreachable calls to "foo ()" are > still removed by pre with the patch. (If compiled with -O3, the > "foo ()"s in test file can now be removed by thread2/threadfull2 and > dom3 on trunk. This relies on jump threading across the loops, so > even with -O3, similar cases may not get optimized say if there're > too many statements to copy.) > > Thanks, > Di Zhao > > > > > I will still look at and consider the change btw, but given the EVRP > > improvements I'm also considering to remove the predication > > support from VN alltogether. At least in the non-iterating mode > > it should be trivially easy to use rangers relation oracle to simplify > > predicates. For the iterating mode it might not be 100% effective > > since I'm not sure we can make it use the current SSA values and > > how it would behave with those eventually changing to worse. > > > > Andrew, how would one ask the relation oracle to simplify a > > condition? Do I have to do any bookkeeping to register > > predicates on edges for it? > > > > Thanks, > > Richard. > > > > > First, regarding the comments: > > > > > > > -----Original Message----- > > > > From: Richard Biener > > > > Sent: Friday, October 1, 2021 9:00 PM > > > > To: Di Zhao OS > > > > Cc: gcc-patches@gcc.gnu.org > > > > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with > > > > "equivalence map" for condition prediction > > > > > > > > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS > > > > wrote: > > > > > > > > > > Sorry about updating on this after so long. It took me much time to work out a > > > > > new plan and pass the tests. > > > > > > > > > > The new idea is to use one variable to represent a set of equal variables at > > > > > some basic-block. This variable is called a "equivalence head" or "equiv-head" > > > > > in the code. (There's no-longer a "equivalence map".) > > > > > > > > > > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary > > > > > equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t > > > > > map. Besides, when inserting into vn_nary_op_t map, make the new result at > > > > > front of the vn_pval list, so that when searching for a variable's > > > > > equivalence head, the first result represents the largest equivalence set at > > > > > current location. > > > > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry. > > > > > For recorded equivalences, the reference is result->entry; for normal N-ary > > > > > operations, the reference is operand->entry. > > > > > - When recording equivalences, if one side A is constant or has more refs, make > > > > > it the new equivalence head of the other side B. Traverse B's ref- list,if a > > > > > variable C's previous equiv-head is B, update to A. And re-insert B's n-ary > > > > > operations by replacing B with A. > > > > > - When inserting and looking for the results of n-ary operations, insert and > > > > > lookup by the operands' equiv-heads. > > > > > ... > > > > > > > > > > Thanks, > > > > > Di Zhao > > > > > > > > > > -------- > > > > > Extend FRE with temporary equivalences. > > > > > > > > Comments on the patch: > > > > > > > > + /* nary_ref count. */ > > > > + unsigned num_nary_ref; > > > > + > > > > > > > > I think a unsigned short should be enough and that would nicely > > > > pack after value_id together with the bitfield (maybe change that > > > > to unsigned short :1 then). > > > > > > Changed num_nary_ref to unsigned short and moved after value_id. > > > > > > > @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb, > > > > tree val = gimple_simplify (gimple_cond_code (last), > > > > boolean_type_node, lhs, rhs, > > > > NULL, vn_valueize); > > > > + vn_nary_op_t vnresult = NULL; > > > > /* If the condition didn't simplfy see if we have recorded > > > > an expression from sofar taken edges. */ > > > > if (! val || TREE_CODE (val) != INTEGER_CST) > > > > { > > > > - vn_nary_op_t vnresult; > > > > > > > > looks like you don't need vnresult outside of the if()? > > > > > > vnresult is reused later to record equivalences generated by PHI nodes. > > > > > > > +/* Find predicated value of vn_nary_op by the operands' equivalences. Return > > > > + * NULL_TREE if no known result is found. */ > > > > + > > > > +static tree > > > > +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb, > > > > + vn_nary_op_t *vnresult) > > > > +{ > > > > + lookup_equiv_heads (vno->length, vno->op, vno->op, bb); > > > > + tree result > > > > + = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type); > > > > > > > > why is it necessary to simplify here? It looks like the caller > > > > already does this. > > > > > > In the new patch, changed the code a little to remove redundant calculation. > > > > > > > I wonder whether it's valid to always perform find_predicated_value_by_equivs > > > > from inside vn_nary_op_get_predicated_value instead of bolting it to only > > > > a single place? > > > > > > Removed function find_predicated_value_by_equivs and inlined the code. > > > > > > Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left > > > vn_nary_op_get_predicated_value unchanged. Instead, operands are set to > > > equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always > > > inserted and searched by equiv-heads. > > > > > > > + > > > > +static vn_nary_op_t > > > > +val_equiv_insert (tree op1, tree op2, edge e) > > > > +{ > > > > > > > > + if (is_gimple_min_invariant (lhs)) > > > > + std::swap (lhs, rhs); > > > > + if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME) > > > > + /* Possible if invoked from record_equiv_from_previous_cond. */ > > > > + return NULL; > > > > > > > > Better formulate all of the above in terms of only SSA_NAME checks since... > > > > > > > > + /* Make the hand-side with more recorded n-ary expressions new > > > > + * equivalence-head, to make fewer re-insertions. */ > > > > + if (TREE_CODE (rhs) == SSA_NAME > > > > + && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref) > > > > + std::swap (lhs, rhs); > > > > > > > > here LHS needs to be an SSA_NAME. > > > > > > Tried to fix this in the new patch. > > > > > > > + /* Record equivalence as unary NOP_EXPR. */ > > > > + vn_nary_op_t val > > > > + = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs), > > > > + &lhs, rhs, 0, bb); > > > > > > > > so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs > > > > with value rhs? > > > > Presumably you think that's good enough to not overlap with entries > > > > for "real" IL? > > > > > > By checking related codes, it seems to me there's no overlap as long as > > > equivalences are inserted and searched as predicated_values. It should be easy > > > to replace this with other preferred opcode. > > > > > > > In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node, > > > > &[lhs, rhs], boolean_true_node, ...) like I think what would be > > > > inserted for by the > > > > existing code in process_bb via vn_nary_op_insert_pieces_predicated? > > > > It would probably felt more natural if the "new" code bolted on the case where > > > > that is called with an equivalence? > > > > > > The "equivalence head"s actually implement Disjoint-sets. So by a hashtable > > > search in the nary-map, the set of equivalences of a variable can be found. > > > Also it's easy to find out whether two variables are equal. And the > > > disjoint-sets are path-compressed by updating "equivalence head" and > > > re-inserting predicates. > > > > > > In the new attached patch, predicates like "A==B is true" and "A!=B is false" > > > are not inserted, as they can be computed from equivalences. Same with the > > > predicates derived from them. This reduces insertions of predicates by a lot ( > > > counting the equivalences themselves and re-inserted predicates). Below are > > > test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19): > > > > > > |more bb |more bb |more stmt|more stmt|less |less > > > |removed |removed |removed |removed |predicates|predicates > > > |at fre1 |at fre1 |at fre1 |at fre1 |inserted |inserted > > > -------------------------------------------------------------------------- > > > 500.perlbench_r| 64 | 1.56% | 102 | 0.18% | 51928 | 53.44% > > > 502.gcc_r | 679 | 4.88% | 290 | 0.21% | 149968 | 65.5% > > > 505.mcf_r | 5 | 35.71% | 9 | 1.40% | 349 | 27.48% > > > 520.omnetpp | 132 | 5.36% | 48 | 0.14% | 28192 | 58.38% > > > 523.xalancbmk_r| 231 | 3.15% | 308 | 0.35% | 65131 | 58.95% > > > 525.x264_r | 4 | 1.36% | 27 | 0.11% | 10401 | 40.68% > > > 531.deepsjeng_r| 1 | 3.45% | 2 | 0.14% | 1412 | 53.81% > > > 541.leela_r | 2 | 0.62% | 5 | 0.06% | 3874 | 48.56% > > > 548.exchange2_r| 0 | 0.00% | 3 | 0.04% | 159 | 3.33% > > > 557.xz_r | 0 | 0.00% | 3 | 0.07% | 1894 | 52.39% > > > > > > Considering the copying and unwinding work saved, this patch could reduce > > > compile time rather than increase it. > > > > > > > Btw, I didn't find any condition that prevents equivalences being used for > > > > non-integer types? Are you sure this is all valid for FP compares without > > > > further restrictions? > > > > > > I think this approach follows sccvn's previous decisions on FP comparisons, > > > for equivalences are inserted only when EQ_EXPRs were evaluated to be true. > > > The patch is tested on regression tests and SPEC fprate cases. > > > > > > > > > > > @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned > > > > int length, enum tree_code code, > > > > fprintf (dump_file, " == %s\n", > > > > integer_zerop (result) ? "false" : "true"); > > > > } > > > > - vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id); > > > > - init_vn_nary_op_from_pieces (vno1, length, code, type, ops); > > > > - vno1->predicated_values = 1; > > > > - vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack, > > > > - sizeof (vn_pval)); > > > > - vno1->u.values->next = NULL; > > > > - vno1->u.values->result = result; > > > > - vno1->u.values->n = 1; > > > > - vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index; > > > > - return vn_nary_op_insert_into (vno1, valid_info->nary, true); > > > > + /* Insert by operands' equiv-heads. */ > > > > + tree new_ops[3]; > > > > + lookup_equiv_heads (length, ops, new_ops, pred_e->dest); > > > > + /* Skip if the result is known. */ > > > > + tree simplified = simplify_nary_op (length, code, new_ops, type); > > > > + if (simplified) > > > > + { > > > > + if (dump_file && (dump_flags & TDF_DETAILS)) > > > > + { > > > > + fprintf (dump_file, "Result is known: "); > > > > + print_generic_expr (dump_file, simplified, TDF_NONE); > > > > + fprintf (dump_file, ", skipped.\n"); > > > > + } > > > > + return NULL; > > > > + } > > > > > > > > soo - how do the existing predicated expressions change to be handled here? > > > > Why would they ever be simplified? Should this simplification not better happen > > > > upthread when visiting the conditional? Maybe that needs to use the equivs > > > > already? > > > > > > This happens when updating variable A's equiv_head from B to C, and > > > re-inserting former predicates of B with C. For example, "x < 0" is recorded, > > > then x is assigned with new equiv_head -1 at some location, then -1 < 0 can > > > simplify. It shouldn't happen when inserting new predicates. > > > > > > > > > > > You re-use the predicated value list which I think is great, can you explain > > > > how "latest" is meant in the following? > > > > > > > > @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno, > > > > vn_nary_op_table_type *table, > > > > next = &(*next)->next; > > > > } > > > > if (!found) > > > > - *next = nval; > > > > + { > > > > + /* Insert new value at the front, so that lookup_equiv_head can > > > > + * find the latest result. */ > > > > + nval->next = vno->u.values; > > > > + vno->u.values = nval; > > > > + } > > > > > > It meant the "most general" result. (I picked the wrong word. The description > > > is changed in the new patch file. ) > > > > > > There can be multiple valid results at the same time. As we traverse in RPO and > > > update equiv-heads (i.e. unite disjoint-sets), the last inserted result > > > represents the most complete set of equivalences of the variable. > > > > > > (In the last patch version, when "Appending predicate to value", the new result > > > is not moved to the front. That violates the property of equiv-heads and some > > > opportunities will be lost. Fixed this in the new patch file.) > > > > > > > > > > > OK, I have to stop now but will try to understand and review further next week. > > > > Sorry for the delays but the patch is quite complex to understand :/ > > > > > > > > Richard. > > > > > > > > > 2021-09-16 Di Zhao > > > > > > > > > > > Other changes: > > > - Fixed some errs in comments. > > > - On recording temporary equivalences generated by PHI nodes, altered the > > > logic, hope it can be a bit more clear. > > > 1. When a conditional expression's true/false edge E is assessed to be > > > executable, search for conflicting predicates for E to be taken. If a > > > conflicting predicate is found at L, and L's single predecessor P > > > dominates current BB, then we know P->L cannot be taken if E is taken. > > > 2. Process phi_bb (immediately dominated by P, and dominates BB), rule out > > > PHI arguments (x2) that can only come from P->L. If there's a single > > > PHI argument (x1) left, record the equivalence of x1 and PHI result x > > > at edge E. > > > ----- > > > | P | > > > ----- > > > | \ > > > | ----- > > > | | L | <== conflicting predicate's location > > > | ----- > > > | / > > > ------------------------- > > > | x = PHI | phi_bb > > > ------------------------- > > > | > > > .... > > > | > > > ----- > > > | BB | <== current bb > > > ----- > > > / \ edge E > > > ... ------- > > > | dest | > > > ------- > > > > > > - Fixed a correctness problem when recording equivalences from PHI in previous > > > version. Also test case ssa-fre-95.c:k in previous version was incorrect, > > > fixed that. > > > > > > Thanks, > > > Di Zhao > > --- > > Extend FRE with temporary equivalences. > > 2021-11-15 Di Zhao > > gcc/ChangeLog: > PR tree-optimization/101186 > * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a > predicate already). > (dominated_by_p_w_unex): Moved upward. > (vn_nary_op_get_predicated_value): Moved upward. > (is_vn_valid_at_bb): Check if vn_pval is valid at BB. > (lookup_equiv_head): Lookup the "equivalence head" of given node. > (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes. > (vn_tracking_edge): Extracted utility function. > (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s. > (vn_nary_op_insert_into): Insert new value at the front. > (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values > from pieces. > (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads. > (push_new_nary_ref): Insert a back-reference to vn_nary_op_t. > (val_equiv_insert): Record temporary equivalence. > (vn_nary_op_insert_pieces_predicated): Record equivalences instead of > some predicates; insert back-refs. > (record_equiv_from_prev_phi_1): Record temporary equivalences generated > by PHI nodes. > (record_equiv_from_prev_phi): Given an outgoing edge of a conditional > expression taken, record equivalences generated by PHI nodes. > (visit_nary_op): Add lookup previous results of N-ary operations by > equivalences. > (insert_related_predicates_on_edge): Some predicates can be computed > from equivalences, no need to insert them. > (process_bb): Add lookup predicated values by equivalenc~/tmp/time- frees. > (struct unwind_state): Unwind state of back-refs to vn_nary_op_t. > (do_unwind): Unwind the back-refs to vn_nary_op_t. > (do_rpo_vn): Update back-reference unwind state. > * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the > nary map entries. > > gcc/testsuite/ChangeLog: > > * gcc.dg/tree-ssa/pr68619-2.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-1.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-2.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-3.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-5.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-7.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-8.c: Disable fre. > * gcc.dg/tree-ssa/pr71947-9.c: Disable fre. > * gcc.dg/tree-ssa/vrp03.c: Disable fre. > * gcc.dg/tree-ssa/ssa-fre-100.c: New test. > * gcc.dg/tree-ssa/ssa-fre-101.c: New test. > * gcc.dg/tree-ssa/ssa-fre-102.c: New test. > * gcc.dg/tree-ssa/ssa-pre-34.c: New test.