I tried to improve the patch following your advices and to catch more opportunities. Hope it'll be helpful. On 6/24/21 8:29 AM, Richard Biener wrote: > On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches patches@gcc.gnu.org> wrote: > > I have some reservations about extending the ad-hoc "predicated value" code. > > Some comments on the patch: > > +/* hashtable & helpers to record equivalences at given bb. */ > + > +typedef struct val_equiv_s > +{ > + val_equiv_s *next; > + val_equiv_s *unwind_to; > + hashval_t hashcode; > + /* SSA name this val_equiv_s is associated with. */ > + tree name; > + /* RESULT in a vn_pval entry is SSA name of a equivalence. */ > + vn_pval *values; > +} * val_equiv_t; > > all of this (and using a hashtable for recording) is IMHO a bit overkill. > Since you only ever record equivalences for values the more natural place to > hook those in is the vn_ssa_aux structure where we also record the availability > chain. I tried to store the equivalences in the vn_ssa_aux structure, but I didn't optimize the second case successfully: I need to record the equivalence of a PHI expression's result and arguments, but their value number results will become VARYING first, so they won't be changed. Maybe I'm missing something, or can I force change a VARYING result? Besides, the way currently used, equivalences only need to be "predictable" rather than available, maybe availability chains do not represent them very well? > There's little commentary in the new code, in particular function-level > comments are missing everywhere. Added more comments. > There's complexity issues, like I see val_equiv_insert has a "recurse" > feature but also find_predicated_value_by_equiv is quadratic in the number of > equivalences of the lhs/rhs. Without knowing what the recursion on the > former is for - nothing tells me - I suspect either of both should be redundant. The intention was, given {A==B, B==C, X==Y, Y==Z} and a previous result of "C opcode Z", to find the result of "A opcode Y". I removed the "recurse" feature and modified the searching logic so solve the issue. Now a temporary hash_set is used to record the equivalences that are visited when searching. > You seem to record equivalences at possible use points which looks odd at best > - I'd expected equivalences being recorded at the same point we record > predicated values and for the current condition, not the one determining some > other predication. > What was the motivation to do it the way you do it? The purpose is to "bring down" what can be known from a previous basic-block that effectively dominates current block, but not actually does so (in the example it is because jump threading is hindered by a loop). For example in this case: if (a != 0) // Nothing useful can be recorded here, because this BB doesn't dominate // the BB that we want to simplify. c = b; for (unsigned i = 0; i < c; i++) { if (a != 0) // The recording is triggered here. { // c == b will be recorded here, so it can be used for simplification. // In gimple it is the equivalence of a PHI's result and argument. if (i >= b) foo (); These requires finding a previous condition that is identical with current one, so it is convenient to do this in FRE. Besides, as FRE records derived predicate, so for relative conditions there also might be opportunities for optimization. In the new patch code this is included. Besides, to find more opportunities, added a hashmap to store mappings from immediate dominators to basic-blocks with PHIs of interest. > Why is the code conditional on 'iterate'? I haven't worked it out to fit the non-iterate mode, so it now breaks the if_conversion pass. I think this is because some of the equivalence-recordings are too optimistic for non-iterate mode. > You've probably realized that there's no "nice" way to handle temporary > equivalences in a VN scheme using hashing for expressions (unless you degrade > hashes a lot). I modified the code to use TREE_HASH on ssa names. Would that be better? > You quote opportunities that are catched with this like > > + if (a != 0) > + { > + c = b; > + } > + for (unsigned i = 0; i < c; i++) > + { > + if (a != 0) > + { > + if (i >= b) > + /* Should be eliminated. > + */ > + foo (); > > but say other "obvious" cases like > > + if (a != 0) > + { > + c = b; > + } > + for (unsigned i = 0; i < c; i++) > + { > + if (a != 0) > + { > + /* Should be zero. */ > return b - c; > > are not handled. That's in line with the current "value predication" > which mainly aims at catching simple jump threading opportunities; you only > simplify conditions with the recorded equivalences. But then the complexity of > handling equivalences does probably not outweight the opportunities catched - > can you share some numbers on how many branches are newly known taken > during VN with this patch during say bootstrap or build of SPEC CPU? I extended the code a little to cover the cases like "A - A" and "A xor A". Here are some results on one bootstrap step: | values found | more bb removed | values found in all | in fre1 | at fre1 | fre & pre passes ----------------------------------------------------------------- bootstrap | 592 | 40 | 1272 As the code is different for bootstrap, the "more bb removed" metric is not precise. I also tested on SPEC CPU 2017 integer cases: | values found | more bb removed | values found in all | in fre1 | at fre1 | fre & pre passes ----------------------------------------------------------------- 500.perlbench_r| 3 | 0 | 9 502.gcc_r | 25 | 39 | 241 520.omnetpp | 9 | 6 | 34 523.xalancbmk_r| 12 | 0 | 35 541.leela_r | 2 | 0 | 2 In cases not listed above there's no value found by equivalences. Benefits after fre1 are not counted as CGF may be different from here (for 523.xalancbmk_r there're 8 more basic-blocks removed at fre3). Although the chances are not plenty, there might be potential benefits, such as making a function pure. > I've hoped to ditch the current "value predication" code by eventually using the > relation oracle from ranger but did not yet have the chance to look into that. > Now, the predicated equivalences are likely not something that infrastructure > can handle? > > In the end I think we should research into maintaining an alternate expression > table for conditions (those we like to simplify with > equivalences) and use a data structure that more easily allows to introduce > (temporary) equivalences. Like by maintaining back-references of values we've > recorded a condition for and a way to quickly re-canonicalize conditions. Well - > it requires some research, as said. > > Richard. > > > Regards, > > Di Zhao > > > > Extend FRE with an "equivalence map" for condition prediction. > > > > 2021-06-24 Di Zhao > > Thanks, Di Zhao -------- Extend FRE with an "equivalence map" for condition prediction. 2021-07-18 Di Zhao gcc/ChangeLog: PR tree-optimization/101186 * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function. (dominated_by_p_w_unex): Moved upward, no change. (vn_nary_op_get_predicated_value): Moved upward, no change. (struct val_equiv_hasher): Hasher for the "equivalence map". (is_vn_valid_at_bb): Check if vn_pval is valid at BB. (val_equiv_insert): Insert into "equivalence map". (vn_lookup_binary_op_result): Lookup binary expression's result by VN. (iterate_val_equivs): Iterate on equivalences and returns a non-NULL result returned by callback. (find_predicated_binary_by_lhs_equiv): Callback for iterate_val_equivs. Lookup a binary operations result by LHS equivalences. (find_predicated_binary_by_rhs_equiv): Callback for iterate_val_equivs. Lookup a binary operations result by RHS equivalences. (find_predicated_binary_by_equiv): Lookup predicated value of a binary operation by equivalences. (is_relaxed_cond_code): Whether operation code is a relaxed condition code derived from original code. (branch_may_come_from_another): Whether there's a path from the one true or false destination to another. (record_equiv_from_previous_edge): Record equivalence relation from a previous condition on current bb' true and false edges. (record_equiv_from_previous_cond): Record equivalences generated by previous conditions on current BB's true and false edges. (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert into the "equivalence map" for predicate like "x==y is true". (record_dom_to_phi_bb): Record mappings from immediate dominator to basic_block with PHIs. (vn_phi_lookup): Record mappings from immediate dominator to PHIs. (visit_nary_op): Add lookup predicated values of binaries by equivalences. (free_rpo_vn): Free the "equivalence map". (process_bb): Insert into & lookup from the "equivalence map". (struct unwind_state): Add "equivalence map" unwind state. (do_unwind): Unwind the "equivalence map". (do_rpo_vn): Update "equivalence map" unwind state. gcc/testsuite/ChangeLog: PR tree-optimization/101186 * gcc.dg/tree-ssa/pr71947-1.c: Disable fre. * gcc.dg/tree-ssa/pr71947-2.c: Disable fre. * gcc.dg/tree-ssa/pr71947-3.c: Disable fre. * gcc.dg/tree-ssa/pr71947-5.c: Disable fre. * gcc.dg/tree-ssa/vrp03.c: Disable fre. * gcc.dg/tree-ssa/ssa-fre-95.c: New test. * gcc.dg/tree-ssa/ssa-fre-96.c: New test.