From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 8888 invoked by alias); 10 Dec 2014 15:22:31 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 8872 invoked by uid 89); 10 Dec 2014 15:22:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-wg0-f42.google.com Received: from mail-wg0-f42.google.com (HELO mail-wg0-f42.google.com) (74.125.82.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 10 Dec 2014 15:22:21 +0000 Received: by mail-wg0-f42.google.com with SMTP id z12so3953870wgg.15 for ; Wed, 10 Dec 2014 07:22:18 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.194.71.203 with SMTP id x11mr7517421wju.131.1418224938478; Wed, 10 Dec 2014 07:22:18 -0800 (PST) Received: by 10.216.77.73 with HTTP; Wed, 10 Dec 2014 07:22:18 -0800 (PST) In-Reply-To: References: Date: Wed, 10 Dec 2014 15:22:00 -0000 Message-ID: Subject: Re: [PATCH 2/3] Extended if-conversion From: Yuri Rumyantsev To: Richard Biener Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2014-12/txt/msg00933.txt.bz2 Richard, Thanks for your reply! I didn't understand your point: Well, I don't mind splitting all critical edges unconditionally but you do it unconditionally in proposed patch. Also I assume that call of split_critical_edges() can break ssa. For example, we can split headers of loops, loop exit blocks etc. I prefer to do something more loop-specialized, e.g. call edge_split() for critical edges outgoing from bb ending with GIMPLE_COND stmt (assuming that edge destination bb belongs to loop). 2014-12-10 17:31 GMT+03:00 Richard Biener : > On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev wrote: >> Richard, >> >> Sorry that I forgot to delete debug dump from my fix. >> I have few questions about your comments. >> >> 1. You wrote : >>> You also still have two functions for PHI predication. And the >>> new extended variant doesn't commonize the 2-args and general >>> path >> Did you mean that I must combine predicate_scalar_phi and >> predicate_extended scalar phi to one function? >> Please note that if additional flag was not set up (i.e. >> aggressive_if_conv is false) extended predication is required more >> compile time since it builds hash_map. > > It's compile-time complexity is reasonable enough even for > non-aggressive if-conversion. > >> 2. About critical edge splitting. >> >> Did you mean that we should perform it (1) under aggressive_if_conv >> option only; (2) should we split all critical edges. >> Note that this leads to recomputing of topological order. > > Well, I don't mind splitting all critical edges unconditionally, thus > do something like > > Index: gcc/tree-if-conv.c > =================================================================== > --- gcc/tree-if-conv.c (revision 218515) > +++ gcc/tree-if-conv.c (working copy) > @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f > if (number_of_loops (fun) <= 1) > return 0; > > + bool critical_edges_split_p = false; > FOR_EACH_LOOP (loop, 0) > if (flag_tree_loop_if_convert == 1 > || flag_tree_loop_if_convert_stores == 1 > || ((flag_tree_loop_vectorize || loop->force_vectorize) > && !loop->dont_vectorize)) > - todo |= tree_if_conversion (loop); > + { > + if (!critical_edges_split_p) > + { > + split_critical_edges (); > + critical_edges_split_p = true; > + todo |= TODO_cleanup_cfg; > + } > + todo |= tree_if_conversion (loop); > + } > > #ifdef ENABLE_CHECKING > { > >> It is worth noting that in current implementation bb's with 2 >> predecessors and both are on critical edges are accepted without >> additional option. > > Yes, I know. > > tree-if-conv.c is a mess right now and if we can avoid adding more > to it and even fix the critical edge missed optimization with splitting > critical edges then I am all for that solution. > > Richard. > >> Thanks ahead. >> Yuri. >> 2014-12-09 18:20 GMT+03:00 Richard Biener : >>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev wrote: >>>> Richard, >>>> >>>> Here is updated patch2 with the following changes: >>>> 1. Delete functions phi_has_two_different_args and find_insertion_point. >>>> 2. Use only one function for extended predication - >>>> predicate_extended_scalar_phi. >>>> 3. Save gsi before insertion of predicate computations for basic >>>> blocks if it has 2 predecessors and >>>> both incoming edges are critical or it gas more than 2 predecessors >>>> and at least one incoming edge >>>> is critical. This saved iterator can be used by extended phi predication. >>>> >>>> Here is motivated test-case which explains this point. >>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>> -ftree-loop-vectorize -fopenmp options. >>>> The problem phi is in bb-7: >>>> >>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>> { >>>> : >>>> xmax_edge_18 = xmax_edge_36 + 1; >>>> if (xmax_17 == xmax_27) >>>> goto ; >>>> else >>>> goto ; >>>> >>>> } >>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>> { >>>> : >>>> if (xmax_17 == xmax_27) >>>> goto ; >>>> else >>>> goto ; >>>> >>>> } >>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>> { >>>> : >>>> # xmax_edge_30 = PHI >>>> xmax_edge_19 = xmax_edge_39 + 1; >>>> goto ; >>>> >>>> } >>>> >>>> Note that both incoming edges to bb_7 are critical. If we comment out >>>> restoring gsi in predicate_all_scalar_phi: >>>> #if 0 >>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>> gsi = bb_insert_point (bb); >>>> else >>>> #endif >>>> gsi = gsi_after_labels (bb); >>>> >>>> we will get ICE: >>>> t5.c: In function 'foo': >>>> t5.c:9:6: error: definition in block 4 follows the use >>>> void foo (int n) >>>> ^ >>>> for SSA_NAME: _1 in statement: >>>> _52 = _1 & _3; >>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>> >>>> smce predicate computations were inserted in bb_7. >>> >>> The issue is obviously that the predicates have already been emitted >>> in the target BB - that's of course the wrong place. This is done >>> by insert_gimplified_predicates. >>> >>> This just shows how edge predicate handling is broken - we don't >>> seem to have a sequence of gimplified stmts for edge predicates >>> but push those to e->dest which makes this really messy. >>> >>> Rather than having a separate phase where we insert all >>> gimplified bb predicates we should do that on-demand when >>> predicating a PHI. >>> >>> Your patch writes to stderr - that's bad - use dump_file and guard >>> the printfs properly. >>> >>> You also still have two functions for PHI predication. And the >>> new extended variant doesn't commonize the 2-args and general >>> paths. >>> >>> I'm not at all happy with this code. It may be existing if-conv codes >>> fault but making it even worse is not an option. >>> >>> Again - what's wrong with simply splitting critical edges if >>> aggressive_if_conv? I think that would very much simplify >>> things here. Or alternatively use gsi_insert_on_edge and >>> commit edge insertions before merging the blocks. >>> >>> Thanks, >>> Richard. >>> >>>> ChangeLog is >>>> >>>> 2014-12-09 Yuri Rumyantsev >>>> >>>> * tree-if-conv.c : Include hash-map.h. >>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>> statement iterator. >>>> (bb_insert_point): New function. >>>> (set_bb_insert_point): New function. >>>> (has_pred_critical_p): New function. >>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>> AGGRESSIVE_IF_CONV is true. >>>> (if_convertible_bb_p): Delete check that bb has at least one >>>> non-critical incoming edge. >>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>>> Allow interchange PHI arguments if EXTENDED is false. >>>> Change check that block containing reduction statement candidate >>>> is predecessor of phi-block since phi may have more than two arguments. >>>> (predicate_scalar_phi): Add new arguments for call of >>>> is_cond_scalar_reduction. >>>> (get_predicate_for_edge): New function. >>>> (struct phi_args_hash_traits): New type. >>>> (phi_args_hash_traits::hash): New function. >>>> (phi_args_hash_traits::equal_keys): New function. >>>> (gen_phi_arg_condition): New function. >>>> (predicate_extended_scalar_phi): New function. >>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >>>> to true if BB containing phi has more than 2 predecessors or both >>>> incoming edges are critical. Invoke find_phi_replacement_condition and >>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>> has 2 predecessors and both incoming edges are critical or it has more >>>> than 2 predecessors and atleast one incoming edge is critical. >>>> Use standard gsi_after_labels otherwise. >>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >>>> to save gsi before insertion of predicate computations. SEt-up it to >>>> true for BB with 2 predecessors and critical incoming edges either >>>> number of predecessors is geater 2 and at least one incoming edge is >>>> critical. >>>> Add check that non-predicated block may have statements to insert. >>>> Insert predicate computation of BB just after label if >>>> EXTENDED_PREDICATION is true. >>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>>> is copy of inner or outer loop force_vectorize field. >>>> >>>> >>>> >>>> >>>> 2014-12-04 16:37 GMT+03:00 Richard Biener : >>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev wrote: >>>>>> Richard, >>>>>> >>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>> >>>>>> typedef struct bb_predicate_s { >>>>>> >>>>>> /* The condition under which this basic block is executed. */ >>>>>> tree predicate; >>>>>> >>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>> recorded here, in order to avoid the duplication of computations >>>>>> that occur in previous conditions. See PR44483. */ >>>>>> gimple_seq predicate_gimplified_stmts; >>>>>> >>>>>> /* Insertion point for blocks having incoming critical edges. */ >>>>>> gimple_stmt_iterator gsi; >>>>>> } *bb_predicate_p; >>>>>> >>>>>> and this iterator is saved in insert_gimplified_predicates before >>>>>> insertion code for predicate computation. I checked that this fix >>>>>> works. >>>>> >>>>> Huh? I still wonder what the issue is with inserting everything >>>>> after the PHI we predicate. >>>>> >>>>> Well, your updated patch will come with testcases for the testsuite >>>>> that will hopefully fail if doing that. >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> Now I am implementing merging of predicate_extended.. and >>>>>> predicate_arbitrary.. functions as you proposed. >>>>>> >>>>>> Best regards. >>>>>> Yuri. >>>>>> >>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener : >>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev wrote: >>>>>>>> Thanks Richard for your quick reply! >>>>>>>> >>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>> insertion point - if bb has use of phi result insert phi predication >>>>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>>>> splitting is not a good decision. >>>>>>> >>>>>>> Why not always insert before the use? Which would be after labels, >>>>>>> what we do for two-arg PHIs. That is, how can it be that you predicate >>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>> can only happen for backedges but those you can't remove in any case. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> >>>>>>>> Best regards. >>>>>>>> Yuri. >>>>>>>> >>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener : >>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev wrote: >>>>>>>>>> Hi Richard, >>>>>>>>>> >>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>> >>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>> >>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>> >>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>> { >>>>>>>>>> float t = a[i]; >>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>> res += 1; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>> >>>>>>>>>> # res_1 = PHI >>>>>>>>>> >>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only >>>>>>>>>> and only one check can be used for phi predication (for reduction in >>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it >>>>>>>>>> even if we sort all phi argument values since we still have to produce >>>>>>>>>> a chain of cond expressions to perform phi predication (see comments >>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>> >>>>>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>> >>>>>>>>> # res_1 = PHI >>>>>>>>> >>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>> >>>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>> >>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>>>>>>>> only critical incoming edges and both contain code computing edge >>>>>>>>>> predicates, e.g. >>>>>>>>>> >>>>>>>>>> : >>>>>>>>>> # xmax_edge_30 = PHI >>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>> _48 = _46 & _47; >>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>> _54 = ~_53; >>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>> _56 = _54 & _55; >>>>>>>>>> _57 = _48 | _56; >>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>> goto ; >>>>>>>>>> >>>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>>>> insertion point will be "after labels" Note also that phi result can >>>>>>>>>> have use in this block too, so we can't put predication code to the >>>>>>>>>> block end. >>>>>>>>> >>>>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>>> for critical edges unless you split them). >>>>>>>>> >>>>>>>>> I think I've told you before that I prefer simple solutions to such issues, >>>>>>>>> like splitting the edge! Certainly not involving a function walking >>>>>>>>> GENERIC expressions. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>> >>>>>>>>>> Best regards. >>>>>>>>>> Yuri. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener : >>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>> >>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced >>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different >>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>> New function phi_has_two_different_args was introduced to detect such phi. >>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it >>>>>>>>>>>> guarantees that all computations related to predicate of normal edge >>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of >>>>>>>>>>>> block. But this is not true for critical edges for which predicate >>>>>>>>>>>> computations are in the block where code for phi predication must be >>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is >>>>>>>>>>>> simply found out the last statement in block defining predicates >>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code >>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>> >>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>> >>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>> >>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>>>>>>>> That would even make PHI more optimal. >>>>>>>>>>> >>>>>>>>>>> I don't understand the need for find_insertion_point. All SSA names >>>>>>>>>>> required for the predicates are defined upward - and the complex CFG >>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>>>>>>>> inserted code if you insert after labels just like for the other case. >>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of course needs >>>>>>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>>>>>> >>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>>>>> >>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> ChangeLog: >>>>>>>>>>>> >>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev >>>>>>>>>>>> >>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label >>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.