Richard, I changed algorithm for bool pattern repair. It turned out that ifcvt_local_dce phaase is required since for test-case I sent you in previous mail vectorization is not performed without dead code elimination: For the loop #pragma omp simd safelen(8) for (i=0; i<512; i++) { float t = a[i]; if (t > 0.0f & t < 1.0e+17f) if (c[i] != 0) res += 1; } I've got the following message from vectorizer: t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0; t3.c:10:11: note: bit-precision arithmetic not supported. t3.c:10:11: note: not vectorized: relevant stmt not supported: _ifc__39 = t_5 > 0.0; It is caused by the following dead predicate computations after critical edge splitting: (after combine blocks): : # res_15 = PHI # i_16 = PHI # ivtmp_14 = PHI t_5 = a[i_16]; _6 = t_5 > 0.0; _7 = t_5 < 9.9999998430674944e+16; _8 = _6 & _7; _10 = &c[i_16]; _ifc__36 = _8 ? 4294967295 : 0; _9 = MASK_LOAD (_10, 0B, _ifc__36); _28 = _8; _29 = _9 != 0; _30 = _28 & _29; // Statements below are dead!! _31 = _8; _32 = _9 != 0; _33 = ~_32; _34 = _31 & _33; // End of dead statements. _ifc__35 = _30 ? 1 : 0; res_1 = res_15 + _ifc__35; i_11 = i_16 + 1; ivtmp_13 = ivtmp_14 - 1; if (ivtmp_13 != 0) goto ; else goto ; But if we delete these statements loop will be vectorized. New patch is attached. Thanks. Yuri. 2014-12-19 14:45 GMT+03:00 Richard Biener : > On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev wrote: >> Richard, >> >> I am sending you full patch (~1000 lines) but if you need only patch.1 >> and patch.2 will let me know and i'll send you reduced patch. >> >> Below are few comments regarding your remarks for patch.3. >> >> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case >> when dead code elimination is required to vectorize loop, i.e. dead >> statement is marked as relevant. >> 2. You wrote: >>> The "retry" code also looks odd - why do you walk the BB multiple >>> times instead of just doing sth like >>> >>> while (!has_single_use (lhs)) >>> { >>> gimple copy = ifcvt_split_def_stmt (def_stmt); >>> ifcvt_walk_pattern_tree (copy); >>> } >>> >>> thus returning the copy you create and re-process it (the copy should >>> now have a single-use). >> >> The problem is that not only top SSA_NAME (lhs) may have multiple uses >> but some intermediate variables too. For example, for the following >> test-case >> >> float a[1000]; >> int c[1000]; >> >> int foo() >> { >> int i, res = 0; >> #pragma omp simd safelen(8) >> for (i=0; i<512; i++) >> { >> float t = a[i]; >> if (t > 0.0f & t < 1.0e+17f) >> if (c[i] != 0) >> res += 1; >> } >> return res; >> } >> >> After combine_blocks we have the following bb: >> >> : >> # res_15 = PHI >> # i_16 = PHI >> # ivtmp_14 = PHI >> t_5 = a[i_16]; >> _6 = t_5 > 0.0; >> _7 = t_5 < 9.9999998430674944e+16; >> _8 = _6 & _7; >> _10 = &c[i_16]; >> _ifc__32 = _8 ? 4294967295 : 0; >> _9 = MASK_LOAD (_10, 0B, _ifc__32); >> _28 = _8; >> _29 = _9 != 0; >> _30 = _28 & _29; >> _ifc__31 = _30 ? 1 : 0; >> res_1 = res_15 + _ifc__31; >> i_11 = i_16 + 1; >> ivtmp_13 = ivtmp_14 - 1; >> if (ivtmp_13 != 0) >> goto ; >> else >> goto ; >> >> and we can see that _8 has multiple uses. Also note that after splitting of >> _8 = _6 & _7 >> we also get multiple uses for definition of _6 and _7. So I used this >> iterative algorithm as the simplest one. > > But it walks the entire pattern again and again while you only need to > ensure you walk the pattern tree of the now single-use DEF again > (in fact, rather than replacing a random USE in ifcvt_split_def_stmt > you should pass down the user_operand_p that you need to make > single-use). > >> I think it would be nice to re-use some utility from tree-vect-patterns.c >> for stmt_is_root_of_bool_pattern. >> >> I assume that function stmt_is_root_of_bool_pattern can be simplified >> to check on COND_EXPR only since PHI predication and memory access >> predication produced only such statements,i.e. it can look like >> >> static bool >> stmt_is_root_of_bool_pattern (gimple stmt, tree *var) >> { >> enum tree_code code; >> tree lhs, rhs; >> >> code = gimple_assign_rhs_code (stmt); >> if (code == COND_EXPR) >> { >> rhs = gimple_assign_rhs1 (stmt); >> if (TREE_CODE (rhs) != SSA_NAME) >> return false; >> *var = rhs; >> return true; >> } >> return false; >> } >> >> I also did few minor changes in patch.2. >> >> 3. You can also notice that I inserted code in tree_if_conversion to >> do loop version if explicit option "-ftree-loop-if-convert" was not >> passed to compiler, i.e. we perform if-conversion for loop >> vectorization only and if it does not take place, we should delete >> if-converted version of loop. >> What is your opinion? > > Overall part 1 and part 2 look good to me, predicate_scalar_phi > looks in need of some refactoring to avoid duplicate code. We can > do that a followup. > > Part 3 still needs the iteration to be resolved and make the use we > actually care about single-use, not a random one so we can avoid > iterating completely. > > Richard. > >> Thanks. >> Yuri. >> >> 2014-12-17 18:41 GMT+03:00 Richard Biener : >>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev wrote: >>>> Hi Richard, >>>> >>>> Here is updated patch which includes >>>> (1) split critical edges for aggressive if conversion. >>>> (2) delete all stuff related to support of critical edge predication. >>>> (3) only one function - predicate_scalar_phi performs predication. >>>> (4) function find_phi_replacement_condition was deleted since it was >>>> included in predicate_scalar_phi for phi with two arguments. >>>> >>>> I checked that patch works in stress testing mode, i.e. with >>>> aggressive if conversion by default. >>>> >>>> What is your opinion? >>> >>> Looks ok overall, but please simply do >>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>> if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop) >>> split_edge (e); >>> >>> for all blocks apart from the latch. >>> >>> Can you please send a combined patch up to this one? Looking at >>> the incremental diff is somewhat hard. Thus a patch including all >>> patches from patch1 to this one. >>> >>> Thanks, >>> Richard. >>> >>>> >>>> Thanks. >>>> Yuri. >>>> >>>> 2014-12-11 11:59 GMT+03:00 Richard Biener : >>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev wrote: >>>>>> Richard, >>>>>> >>>>>> Thanks for your reply! >>>>>> >>>>>> I didn't understand your point: >>>>>> >>>>>> Well, I don't mind splitting all critical edges unconditionally >>>>>> >>>>>> but you do it unconditionally in proposed patch. >>>>> >>>>> I don't mind means I am fine with it. >>>>> >>>>>> Also I assume that >>>>>> call of split_critical_edges() can break ssa. For example, we can >>>>>> split headers of loops, loop exit blocks etc. >>>>> >>>>> How does that "break SSA"? You mean loop-closed SSA? I'd >>>>> be surprised if so but that may be possible. >>>>> >>>>>> I prefer to do something >>>>>> more loop-specialized, e.g. call edge_split() for critical edges >>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge >>>>>> destination bb belongs to loop). >>>>> >>>>> That works for me as well but it is more complicated to implement. >>>>> Ideally you'd only split one edge if you find a block with only critical >>>>> predecessors (where we'd currently give up). But note that this >>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it >>>>> will change loop->num_nodes so we have to be more careful in >>>>> constructing the loop calling if_convertible_bb_p. >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener : >>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> Sorry that I forgot to delete debug dump from my fix. >>>>>>>> I have few questions about your comments. >>>>>>>> >>>>>>>> 1. You wrote : >>>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>>> path >>>>>>>> Did you mean that I must combine predicate_scalar_phi and >>>>>>>> predicate_extended scalar phi to one function? >>>>>>>> Please note that if additional flag was not set up (i.e. >>>>>>>> aggressive_if_conv is false) extended predication is required more >>>>>>>> compile time since it builds hash_map. >>>>>>> >>>>>>> It's compile-time complexity is reasonable enough even for >>>>>>> non-aggressive if-conversion. >>>>>>> >>>>>>>> 2. About critical edge splitting. >>>>>>>> >>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv >>>>>>>> option only; (2) should we split all critical edges. >>>>>>>> Note that this leads to recomputing of topological order. >>>>>>> >>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus >>>>>>> do something like >>>>>>> >>>>>>> Index: gcc/tree-if-conv.c >>>>>>> =================================================================== >>>>>>> --- gcc/tree-if-conv.c (revision 218515) >>>>>>> +++ gcc/tree-if-conv.c (working copy) >>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>>>>>> if (number_of_loops (fun) <= 1) >>>>>>> return 0; >>>>>>> >>>>>>> + bool critical_edges_split_p = false; >>>>>>> FOR_EACH_LOOP (loop, 0) >>>>>>> if (flag_tree_loop_if_convert == 1 >>>>>>> || flag_tree_loop_if_convert_stores == 1 >>>>>>> || ((flag_tree_loop_vectorize || loop->force_vectorize) >>>>>>> && !loop->dont_vectorize)) >>>>>>> - todo |= tree_if_conversion (loop); >>>>>>> + { >>>>>>> + if (!critical_edges_split_p) >>>>>>> + { >>>>>>> + split_critical_edges (); >>>>>>> + critical_edges_split_p = true; >>>>>>> + todo |= TODO_cleanup_cfg; >>>>>>> + } >>>>>>> + todo |= tree_if_conversion (loop); >>>>>>> + } >>>>>>> >>>>>>> #ifdef ENABLE_CHECKING >>>>>>> { >>>>>>> >>>>>>>> It is worth noting that in current implementation bb's with 2 >>>>>>>> predecessors and both are on critical edges are accepted without >>>>>>>> additional option. >>>>>>> >>>>>>> Yes, I know. >>>>>>> >>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more >>>>>>> to it and even fix the critical edge missed optimization with splitting >>>>>>> critical edges then I am all for that solution. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Thanks ahead. >>>>>>>> Yuri. >>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener : >>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev wrote: >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> Here is updated patch2 with the following changes: >>>>>>>>>> 1. Delete functions phi_has_two_different_args and find_insertion_point. >>>>>>>>>> 2. Use only one function for extended predication - >>>>>>>>>> predicate_extended_scalar_phi. >>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic >>>>>>>>>> blocks if it has 2 predecessors and >>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors >>>>>>>>>> and at least one incoming edge >>>>>>>>>> is critical. This saved iterator can be used by extended phi predication. >>>>>>>>>> >>>>>>>>>> Here is motivated test-case which explains this point. >>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>>>>>>>> -ftree-loop-vectorize -fopenmp options. >>>>>>>>>> The problem phi is in bb-7: >>>>>>>>>> >>>>>>>>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>>>>>>>> { >>>>>>>>>> : >>>>>>>>>> xmax_edge_18 = xmax_edge_36 + 1; >>>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>>> goto ; >>>>>>>>>> else >>>>>>>>>> goto ; >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>>>>>>>> { >>>>>>>>>> : >>>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>>> goto ; >>>>>>>>>> else >>>>>>>>>> goto ; >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>>>>>>>> { >>>>>>>>>> : >>>>>>>>>> # xmax_edge_30 = PHI >>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>> goto ; >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out >>>>>>>>>> restoring gsi in predicate_all_scalar_phi: >>>>>>>>>> #if 0 >>>>>>>>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>>>>>>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>>>>>>>> gsi = bb_insert_point (bb); >>>>>>>>>> else >>>>>>>>>> #endif >>>>>>>>>> gsi = gsi_after_labels (bb); >>>>>>>>>> >>>>>>>>>> we will get ICE: >>>>>>>>>> t5.c: In function 'foo': >>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use >>>>>>>>>> void foo (int n) >>>>>>>>>> ^ >>>>>>>>>> for SSA_NAME: _1 in statement: >>>>>>>>>> _52 = _1 & _3; >>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>>>>>>>> >>>>>>>>>> smce predicate computations were inserted in bb_7. >>>>>>>>> >>>>>>>>> The issue is obviously that the predicates have already been emitted >>>>>>>>> in the target BB - that's of course the wrong place. This is done >>>>>>>>> by insert_gimplified_predicates. >>>>>>>>> >>>>>>>>> This just shows how edge predicate handling is broken - we don't >>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates >>>>>>>>> but push those to e->dest which makes this really messy. >>>>>>>>> >>>>>>>>> Rather than having a separate phase where we insert all >>>>>>>>> gimplified bb predicates we should do that on-demand when >>>>>>>>> predicating a PHI. >>>>>>>>> >>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard >>>>>>>>> the printfs properly. >>>>>>>>> >>>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>>> paths. >>>>>>>>> >>>>>>>>> I'm not at all happy with this code. It may be existing if-conv codes >>>>>>>>> fault but making it even worse is not an option. >>>>>>>>> >>>>>>>>> Again - what's wrong with simply splitting critical edges if >>>>>>>>> aggressive_if_conv? I think that would very much simplify >>>>>>>>> things here. Or alternatively use gsi_insert_on_edge and >>>>>>>>> commit edge insertions before merging the blocks. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> ChangeLog is >>>>>>>>>> >>>>>>>>>> 2014-12-09 Yuri Rumyantsev >>>>>>>>>> >>>>>>>>>> * tree-if-conv.c : Include hash-map.h. >>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>>>>>>>> statement iterator. >>>>>>>>>> (bb_insert_point): New function. >>>>>>>>>> (set_bb_insert_point): New function. >>>>>>>>>> (has_pred_critical_p): New function. >>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>> AGGRESSIVE_IF_CONV is true. >>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>> non-critical incoming edge. >>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false. >>>>>>>>>> Change check that block containing reduction statement candidate >>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments. >>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of >>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>> (struct phi_args_hash_traits): New type. >>>>>>>>>> (phi_args_hash_traits::hash): New function. >>>>>>>>>> (phi_args_hash_traits::equal_keys): New function. >>>>>>>>>> (gen_phi_arg_condition): New function. >>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both >>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and >>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more >>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical. >>>>>>>>>> Use standard gsi_after_labels otherwise. >>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to >>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either >>>>>>>>>> number of predecessors is geater 2 and at least one incoming edge is >>>>>>>>>> critical. >>>>>>>>>> Add check that non-predicated block may have statements to insert. >>>>>>>>>> Insert predicate computation of BB just after label if >>>>>>>>>> EXTENDED_PREDICATION is true. >>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>>>>>>>>> is copy of inner or outer loop force_vectorize field. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener : >>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>> Richard, >>>>>>>>>>>> >>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>>>>>>>> >>>>>>>>>>>> typedef struct bb_predicate_s { >>>>>>>>>>>> >>>>>>>>>>>> /* The condition under which this basic block is executed. */ >>>>>>>>>>>> tree predicate; >>>>>>>>>>>> >>>>>>>>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>>>>>>>> recorded here, in order to avoid the duplication of computations >>>>>>>>>>>> that occur in previous conditions. See PR44483. */ >>>>>>>>>>>> gimple_seq predicate_gimplified_stmts; >>>>>>>>>>>> >>>>>>>>>>>> /* Insertion point for blocks having incoming critical edges. */ >>>>>>>>>>>> gimple_stmt_iterator gsi; >>>>>>>>>>>> } *bb_predicate_p; >>>>>>>>>>>> >>>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before >>>>>>>>>>>> insertion code for predicate computation. I checked that this fix >>>>>>>>>>>> works. >>>>>>>>>>> >>>>>>>>>>> Huh? I still wonder what the issue is with inserting everything >>>>>>>>>>> after the PHI we predicate. >>>>>>>>>>> >>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite >>>>>>>>>>> that will hopefully fail if doing that. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and >>>>>>>>>>>> predicate_arbitrary.. functions as you proposed. >>>>>>>>>>>> >>>>>>>>>>>> Best regards. >>>>>>>>>>>> Yuri. >>>>>>>>>>>> >>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener : >>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>> Thanks Richard for your quick reply! >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication >>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>>>>>>>>>> splitting is not a good decision. >>>>>>>>>>>>> >>>>>>>>>>>>> Why not always insert before the use? Which would be after labels, >>>>>>>>>>>>> what we do for two-arg PHIs. That is, how can it be that you predicate >>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case. >>>>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener : >>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # res_1 = PHI >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only >>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in >>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it >>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce >>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments >>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>>>>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # res_1 = PHI >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>>>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge >>>>>>>>>>>>>>>> predicates, e.g. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>> # xmax_edge_30 = PHI >>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>>>>>>>> _48 = _46 & _47; >>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>>>>>>>> _54 = ~_53; >>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>>>>>>>> _56 = _54 & _55; >>>>>>>>>>>>>>>> _57 = _48 | _56; >>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>>>>> goto ; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can >>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the >>>>>>>>>>>>>>>> block end. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>>>>>>>>> for critical edges unless you split them). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues, >>>>>>>>>>>>>>> like splitting the edge! Certainly not involving a function walking >>>>>>>>>>>>>>> GENERIC expressions. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener : >>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>>>>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced >>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different >>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>>>>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>>>>>>>> New function phi_has_two_different_args was introduced to detect such phi. >>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it >>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge >>>>>>>>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of >>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate >>>>>>>>>>>>>>>>>> computations are in the block where code for phi predication must be >>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is >>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates >>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code >>>>>>>>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>>>>>>>>>>>>>> That would even make PHI more optimal. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point. All SSA names >>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG >>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case. >>>>>>>>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of course needs >>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>>>>>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>>>>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label >>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.