From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 57194 invoked by alias); 15 Jul 2015 10:49:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 57185 invoked by uid 89); 15 Jul 2015 10:49:55 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-oi0-f42.google.com Received: from mail-oi0-f42.google.com (HELO mail-oi0-f42.google.com) (209.85.218.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 15 Jul 2015 10:49:50 +0000 Received: by oiab3 with SMTP id b3so25559437oia.1 for ; Wed, 15 Jul 2015 03:49:48 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.202.49.212 with SMTP id x203mr2950984oix.81.1436957388227; Wed, 15 Jul 2015 03:49:48 -0700 (PDT) Received: by 10.76.115.167 with HTTP; Wed, 15 Jul 2015 03:49:48 -0700 (PDT) In-Reply-To: References: Date: Wed, 15 Jul 2015 10:58:00 -0000 Message-ID: Subject: Re: conditional lim From: Richard Biener To: Evgeniya Maenkova Cc: GCC Patches Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-07/txt/msg01251.txt.bz2 On Wed, Jul 15, 2015 at 10:36 AM, Evgeniya Maenkova wrote: > On Tue, Jul 14, 2015 at 2:54 PM, Richard Biener > wrote: >> On Mon, Jun 29, 2015 at 4:21 PM, Evgeniya Maenkova >> wrote: >>> On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener >>> wrote: >>>> On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova >>>> wrote: >>>>> On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener >>>>> wrote: >>>>>> On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova >>>>>> wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> Here is some explanation. I hope you let me know if I need to clari= fy something. >>>>>>> >>>>>>> Also, you asked me about concrete example, to make sure you don=E2= =80=99t miss >>>>>>> my answer here is the link: >>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. >>>>>>> >>>>>>> Also, I doubt whether it=E2=80=99s convenient for you to create a b= uild with >>>>>>> my patch or not. May be to clarify things you could send me some >>>>>>> examples/concrete cases, then I=E2=80=99ll compile them with >>>>>>> =E2=80=93fdump-tree-loopinit-details and =E2=80=93fdump-tree-lim-de= tails and send you >>>>>>> these dumps. May be these dumps will be useful. (I=E2=80=99ll only = disable >>>>>>> cleanup_cfg TODO after lim to let you know the exact picture after >>>>>>> lim). >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> 1. invariantness _dom_walker =E2=80=93 >>>>>>> >>>>>>> 1.1 for each GIMPLE_COND in given bb calls handle_cond_stmt to ca= ll >>>>>>> for true and false edges handle_branch_edge, which calls SET_TARGET= _OF >>>>>>> for all bb =E2=80=98predicated=E2=80=99 by given GIMPLE_COND. >>>>>>> >>>>>>> SET_TARGET_OF sets in basic_blocks aux 2 facts: >>>>>>> >>>>>>> a) this is true or false edge; >>>>>>> >>>>>>> b) link to cond stmt; >>>>>>> >>>>>>> Handle_branch_edge works this way: >>>>>>> >>>>>>> If (cond1) >>>>>>> >>>>>>> { >>>>>>> >>>>>>> bb1; >>>>>>> >>>>>>> if (cond2} >>>>>>> >>>>>>> { >>>>>>> >>>>>>> bb2; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> Being called for cond1, it sets cond1 as condition for both bb1 = and >>>>>>> bb2 (the whole branch for cond1, ie also for bb containing cond2), >>>>>>> then this method will be called (as there is dominance order) for >>>>>>> cond2 to correct things (ie to set cond2 as condition for bb2). >>>>>> >>>>>> Hmm, why not track the current condition as state during the DOM walk >>>>>> and thus avoid processing more than one basic-block in handle_branch= _edge? >>>>>> Thus get rid of handle_branch_edge and instead do everything in hand= le_cond_stmt >>>>>> plus the dom-walkers BB visitor? >>>>>> >>>>> I need to look more carefully how to implement it, but I think I >>>>> understand what you mean and this optimization of course looks >>>>> reasonable to me. Will do. >>>>> >>>>>> I see you don't handle BBs with multiple predecessors - that's ok, b= ut >>>>>> are you sure you don't run into correctness issues when not marking = such >>>>>> BBs as predicated? This misses handling of, say >>>>>> >>>>>> if (a || b) >>>>>> bb; >>>>>> >>>>>> which is a pity (but can be fixed later if desired). >>>>>> >>>>> I had some test (in gcc testsuite or bootstrap build) which worked >>>>> incorrectly because of multiple predecessors. As far as I remember the >>>>> situation was (further, will make some notes about such tests to >>>>> clarify this better), I mean with previous version of my code which >>>>> handled bb with 2 predecessors: >>>>> if (a) >>>>> tmpvar=3Dsomething; >>>>> while() >>>>> if (a || b) >>>>> basic_block {do something with tmpvar;} // I mean basic block >>>>> predicated by bb with a and bb with b >>>>> >>>>> So, if a is false, I mean we didn't do tmpvar=3Dsomething (outside >>>>> loop), BUT we are in basick_block (we went by bb with b), we got >>>>> Segmentation falt in basic_block {do something with tmpvar;}. >>>>> >>>>> I think we can reproduce all the details of this test if I remove not >>>>> handling bb with 2 predecessors. >>>>> >>>>> So I wouldn't move bb with 2 predecessors (this is not always executed >>>>> bb in any loop, not conditionally, they will not be moved at all). >>>>> >>>>> This is my more detail explanation on this point. Perhaps, I didn't >>>>> understand your question about correctness. Could you repeat it in >>>>> other words (based on this new clarification). >>>>> >>>>> So I think according to current code it will not be moved. What >>>>> incorrectness do you mean? >>>> >>>> If the block isn't marked as predicated the question is whether it is >>>> handled correctly or assumed to be unconditionally executed. >>>> >>>>>> I note that collecting predicates has similarities to what if-conver= sion >>>>>> does in tree-ifcvt.c (even if its implementation is completely diffe= rent, >>>>>> of course). >>>>>> >>>>> >>>>> Ok, I'll look at this. But could you please clarify your point? >>>>> (Should I just take this into account with low priority and look at >>>>> this later or you want some refactoring?) >>>> >>>> I just noted similar code exists elsewhere - it may be possible to >>>> factor it out but I didn't investigate. And no, doing that isn't a pr= erequesite >>>> for this patch. >>>> >>>>>>> 1.2 As 1.1 goes we identify whether some bb is predicated by some >>>>>>> condition or not. >>>>>>> >>>>>>> bb->aux->type will be [TRUE/FALSE]_TARGET_OF and >>>>>>> bb->aux->cond_stmt=3Dcond stmt (the nearest condition). >>>>>>> >>>>>>> If bb is always executed bb->aux->type =3D ALWAYS_EXECUTED_IN, >>>>>>> bb->loop->loop (this info was available in the clean build). >>>>>>> >>>>>>> 1.3 As this walker is called in dominance order, information about >>>>>>> condition is available when invariantness_dom_walker is called for >>>>>>> given bb. So we can make some computations based on bb->aux >>>>>>> structure. This is done in check_conditionally_executed. The main g= oal >>>>>>> of this method is to check that the root condition is always execut= ed >>>>>>> in the loop. I did so to avoid situation like this >>>>>>> >>>>>>> Loop: >>>>>>> >>>>>>> Jmp somewhere; >>>>>>> >>>>>>> If (cond1) >>>>>>> >>>>>>> If (cond2) >>>>>>> >>>>>>> Etc >>>>>>> >>>>>>> By =E2=80=98root condition=E2=80=99 I mean cond1 in this case (the = first cond in >>>>>>> dominance order). >>>>>> >>>>>> Ok, but this can't happen (an uncontional jump to somehwere). It >>>>>> will always be a conditional jump (a loop exit) and thus should >>>>>> be handled already. >>>>>> >>>>>> Did you have a testcase that was mishandled? >>>>> >>>>> No, I have not such testcase. This is because I;m newbie at gcc and >>>>> compilers. If you tell me this fact, let's just remove this check? >>>>> Correct? >>>> >>>> Yes. >>>> >>>>>> >>>>>>> If =E2=80=98root condition=E2=80=99 is not always executed we disa= ble (free) all the >>>>>>> info in bb->aux, ie all the blocks becomes neither conditionally nor >>>>>>> always executed according to bb->aux. >>>>>>> >>>>>>> There is some optimization to don=E2=80=99t go each time trough the= conditions >>>>>>> chain (cond2->cond1), let me skip such details for now. >>>>>>> >>>>>>> >>>>>>> >>>>>>> 1.4 Then we calculate tgt_level (which is used in move_computati= ons later) >>>>>>> >>>>>>> The patch is the same for phi and regular stmt (calculate_tgt_level) >>>>>>> >>>>>>> 1) If stmt is conditionally executed we compute max movement >>>>>>> (determine_max_movement) starting from >>>>>>> get_lim_data(condition)->max_loop. >>>>>>> >>>>>>> 2) If stmt is not cond executed as start level for >>>>>>> determine_max_movement computations we choose ALWAYS_EXECUTED_IN. >>>>>>> >>>>>>> To clarify why, see the comment >>>>>>> >>>>>>> /* The reason why we don't set start_level for MOVE_POSSIBLE >>>>>>> >>>>>>> statements to more outer level is: this statement could d= epend on >>>>>>> >>>>>>> conditional statements and even probably is not defined w= ithout this >>>>>>> >>>>>>> condition. So either we should analyze these ones and move >>>>>>> >>>>>>> under condition somehow or keep more strong start_level .= */ >>>>>>> >>>>>>> As you noted in your review there was some refactoring. Of course, I >>>>>>> had no goal to refactor existing code, I intended to remove some >>>>>>> duplication which I caused by my changes. I hope we discuss in >>>>>>> details later if you don=E2=80=99t like some refactoring. >>>>>> >>>>>> Refactoring makes a big patch harder to review because it ends up >>>>>> containing no-op changes. It also makes it appear bigger than it >>>>>> really is ;) >>>>>> >>>>>> A reasonable solution is to factor out the refactoring to a separate= patch. >>>>>> >>>>> I see what you mean. >>>>> >>>>>>> 2. store_motion - for some stmts set flags to ignore predica= ted >>>>>>> execution (I mean to move statements without conditions). >>>>>> >>>>>> That's for the existing "predicated" support as far as I can see. >>>>>> >>>>>>> >>>>>>> >>>>>>> 3. move_computations. The patch is doing something in the >>>>>>> following 3 calls inside of this method. >>>>>>> >>>>>>> >>>>>>> >>>>>>> 3.1 (more details below) walker.walk =E2=80=93 move computations(= create if() >>>>>>> structure) and store information to create phi nodes later (which >>>>>>> statements require phi nodes, as they conditionally executed and th= ere >>>>>>> are still their uses on other levels, what are the names for such p= hi >>>>>>> nodes, as we replace uses in here, in walker.walk.) >>>>>> >>>>>> What kind of uses are that? By definition all stmts executed >>>>>> under condition A that are used in code not under condition A have >>>>>> a PHI node associated. This is why the existing conditional movement >>>>>> handling moves PHI nodes (and all dependent computations). >>>>>> >>>>>> Are you thinking of >>>>>> >>>>>> for (;;) >>>>>> if (a) x =3D 1; >>>>>> >>>>>> .. use of x >>>>>> >>>>>> ? >>>>> No, I think in this case phi node of x will be used in initial >>>>> (=3D=3Dloopinit) code (in 'use of x') instead of some version of x (w= hich >>>>> I move). >>>>> >>>>>>If you move the PHI node for x then you don't need to insert any other >>>>>> PHI nodes (and you _do_ have to move the PHI node anyway). >>>>> >>>>>> >>>>>>> 3.2 (more details below) create_phi_nodes =E2=80=93 create phi no= des for >>>>>>> statements which require that (see 3.1) >>>>>> >>>>>> Only PHI nodes need PHI node creation and only its dependences >>>>>> might need predication. >>>>>> >>>>>> That is, if you move a conditionally executed statement but not >>>>>> any PHI node it feeds then the movement is useless. >>>>>> >>>>>> But maybe I am missing something about the PHI business. >>>>>> >>>>> Ok. I have different understanding on this (and am afraid of this of >>>>> course :) ). >>>>> >>>>> See the example: >>>>> void bar (long); >>>>> void foo (int cond1, int cond2, int cond3, int m, int n) >>>>> { >>>>> unsigned x; >>>>> unsigned inv1; >>>>> >>>>> for (x =3D 0; x < 1000; ++x) >>>>> { >>>>> if (cond1) >>>>> { >>>>> int tmp_inv1 =3D m*n; >>>>> non_inv1 =3D tmp_inv1 + x; >>>>> } >>>>> } >>>>> bar (non_inv1); >>>>> } >>>>> >>>>> This is working example, will attach loopinit and lim1 dumps. >>>>> There is no phi node for tmp_inv1 in loopinitat all. However, I think >>>>> we should move tmp_inv1. To use tmp_inv1 in non_inv1 we need phi node >>>>> (otherwise we get verification failure as bb with conditionally moved >>>>> tmp_inv1 doesn;t dominate bb with non_inv1). Another example if >>>>> non_inv1 is invariant but has not enough cost or something else. In >>>>> short, if it's still used in initial or others loops. Say, non_inv1 is >>>>> invariant and moved in another loop. Or something else. >>>>> >>>>> So, when I read >>>>> 'Only PHI nodes need PHI node creation' >>>>> and >>>>> 'That is, if you move a conditionally executed statement but not any >>>>> PHI node it feeds then the movement is useless.' >>>>> I think about this example and don't understand. Could you clarify? >>>> >>>> Ok, so this case is already handled by LIM by simply moving tmp_inv1 = =3D m*n >>>> (and making it unconditionally execute before the loop). In fact whet= her >>>> we can move it does not depend on whether 'cond' is invariant or not. >>>> >>>> If we couldn't execute m*n unconditionally on the loop preheader edge >>>> then we'd have to be able to also move the guarding condition. Note >>>> that cost considerations shouldn't come into play here. >>>> >>>> So I was only thinking of the case where we also want to move the >>>> condition (which is wanted if we want to move a PHI node). >>>> >>> >>> Don't understand "So I was only thinking of the case where we also >>> want to move the condition (which is wanted if we want to move a PHI >>> node)." >>> >>> What about the case when we want to move condition but don't have phi >>> node to move. >>> >>> Let's consider a little bit modified example: >>> >>> void bar (long); >>> void foo (int cond1, int cond2, int cond3, int m, int n) >>> { >>> unsigned x; >>> unsigned inv1; >>> >>> for (x =3D 0; x < 1000; ++x) >>> { >>> if (n) >>> { >>> int tmp_inv1 =3D m/n; >>> non_inv1 =3D tmp_inv1 + x; >>> } >>> } >>> bar (non_inv1); >>> } >>> We couldn't move tmp_inv1 uncoditionally. Right (I have no clean >>> build at hands right now to check. If I'm wrong let's replace m/n to >>> something that couldn't be unconditionally moved :))? There is no phi >>> node for tmp_inv1. >>> Could you clarify what conditional lim should do in your opinion here? >> >> Nothing at this point. If only then for the simple reason to make the >> first iteration of the patch simpler (still covering most cases). >> >> You seem to want to catch 100% of all possible cases. >> >>> Is it too little gain to move statements that only used in the same bb >>> or branch (I mean before phi node creation, even if stmt had phi >>> node)? >>> >>> In fact, we could have here 2 thousands of divisions: >>> if (n) >>> { >>> int tmp_inv1 =3D m/n; >>> tmp_inv2 =3D something_which_couldn't_be moved_unconditionally (tmp= _inv1); >>> ... >>> tmp_invN =3D something_which_couldn't_be moved_unconditionally (tmp= _invN-1); >>> non_inv1 =3D tmp_invN + x; >>> } >> >> Of course we could. But the patch didn't even exercise this case (no te= stcase). > > Patch handles this case (this is why I created phi nodes which we > tried to discuss). > Your comment is about test case not about the patch. My comment was about the lack of a testcase for this in the patch submissio= n. But my whole point is that supporting this should be split off to a followup patch to make both patches easier to review. Richard. >> >> Please make patches simple - support for moving non-PHIs can be added >> as a followup. The first important part is to make the current code >> (moving PHIs) >> create control-flow rather than if-converted form. >> >> Richard. >> >>> >>>>>>> >>>>>>> 3.3 replace_uses_in_conditions >>>>>>> >>>>>>> After computations movements we can have several copies of the cond >>>>>>> stmt. In 3.1 we do replacement of uses based on stmt=E2=80=99s tgt_= level. For, >>>>>>> cond stmt it doesn=E2=80=99t work. As cond stmt, of course, have so= mething in >>>>>>> lim_aux_data->tgt_level, but, in fact, they are moved only if >>>>>>> corresponding invariants are moved. So, in fact, the same cond (cop= ies >>>>>>> of it, of course) could go to the different levels. So to fix these >>>>>>> uses, we call replace_uses_in_conditions. >>>>>> >>>>>> Hmm, but stmts that a condition depends on always have at least the >>>>>> target level of the condition statement. Thus their computation is >>>>>> available in all other "same" conditon statements that might appear >>>>>> in more inner loop levels, no? So isn't this just about performing = the >>>>>> movement in a proper ordering? >>>>> >>>>> Let me give your more details, then may be you repeat question in oth= er words. >>>>> for () >>>>> { >>>>> if (a) >>>>> { >>>>> inv1 =3D some_computations; >>>>> if (inv1 < inv2) >>>>> { >>>>> inv3; >>>>> inv4; >>>>> } >>>>> } >>>>> } >>>>> >>>>> So, the order for movement is: inv1; inv3; inv4. When we move inv1 we >>>>> have tgt_level for inv3 and inv4 computed but we have not created >>>>> copies of 'if (inv1 < inv2)' created. In theory, we can have several >>>>> copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in >>>>> initial cycle. if (1) is in tgt level of inv1 we need no replacement, >>>>> in other cases we need replacements. Of course, we know this(all the >>>>> info about tgt_levels) when we move inv1, but then we need to create >>>>> all needed copies (store it somehow, etc) of 'if (inv1 < inv2') and >>>>> make replacement in these copies. I don't see now why this is much >>>>> better. This doesn't mean that current solution is perfect, just >>>>> clarify your thought. >>>> >>>> I still don't get what "replacements" actually mean. Let's extend the >>>> example to >>>> >>>> for () (A) >>>> { >>>>> for () (B) >>>>> { >>>>> if (a) >>>>> { >>>>> inv1 =3D some_computations; >>>>> if (inv1 < inv2) >>>>> { >>>>> inv3; >>>>> inv4; >>>>> } >>>>> } >>>>> } >>>> } >>>> >>>> then you say the target level of inv1 could be 'A' and the target >>>> level of inv3 'B' >>>> and that of inv4 is 'B' as well. >>>> >>>> That's true. So we move inv1 to before 'A' and inv3/inv4 to before 'B= ' guarded >>>> with if (inv1 < inv2). I fail to see what "copies" we need here. 'in= v1' is >>>> available in all target levels below its own, no copies needed. >>>> >>> >>> I meant the copies of conditions (COND_STMT: inv1 < inv2). >>> >>> Let's modify this example: tgt_level (inv1) =3D A; tgt_level(inv3) =3D = A; >>> tgt_level(inv4) =3D B. >>> >>> Then >>> if (a) >>> inv1=3Dsome_computations; >>> if (inv1 < inv2) >>> inv3; >>> >>> for () (A) >>> { >>> if (phi_for_inv1 >> inv3 >>> for () (B) >>> { >>> if (a) >>> { >>> if (phi_for_inv1 < inv2) >>> { >>> } >>> } >>> } >>> So, we have 3 copies of inv1 < inv2 after lim (in some of them we use >>> inv1, in other phi_for_inv1). But this is the second priority for now >>> (this is all about whether we should move statements w/o phi node). >>> >>>>>> >>>>>>> More details on 3.1 and 3.2 >>>>>>> >>>>>>> 3.1 The patch is very similar for phi stmts and regular stmts (there >>>>>>> is some difference for regular stmts related to the case when we mo= ve >>>>>>> sequence instead of single stmt, let=E2=80=99s skip it in this desc= ription, >>>>>>> will describe later. BTW, there are comments, in corresponding part= s). >>>>>>> >>>>>>> a) for each bb we call check_conditionally_executed and if bb is co= nd >>>>>>> executed then invariant statement will be moved conditionally. >>>>>>> >>>>>>> b) prepare_edge_for_stmt collects information needed to create phi >>>>>>> nodes in 3.2(add_info_for_phi) AND prepares edge ie creates if() >>>>>>> structure needed to move statement >>>>>>> (get_block_for_predicated_statement, see below). All of this is done >>>>>>> for cond executed stmt. Nothing is done for non cond executed. >>>>>>> >>>>>>> BTW, there is some strange names for functions like missed ending >>>>>>> (prepare_edge_for_stm, w/o t in the end, this is because of my scri= pt >>>>>>> to add =E2=80=98 =E2=80=98 before =E2=80=98(=E2=80=98, will fix in = the next patch version, ignore >>>>>>> please so far). >>>>>>> >>>>>>> Get_block_for_predicated_stmt: >>>>>>> >>>>>>> Create bb where to add cond stmt to (or return existing, each loop = has >>>>>>> a map (level_cond_data_map is map , >>>>>>> cond_data_map is map ) to identify if bb for >>>>>>> given cond was created or not for given level). >>>>>>> >>>>>>> Parameters of this method: >>>>>>> >>>>>>> Cond =E2=80=93 condition of stmt, branch =E2=80=93 is it on true or= false edge of >>>>>>> cond, cond_map - see above, level (tgt_level), preheader =E2=80=93 >>>>>>> preheader_edge for level. >>>>>>> >>>>>>> So first of all we check whether there is a basic block for given >>>>>>> condition in given loop. >>>>>>> >>>>>>> a) If such bb exists then we trying to find corresponding edge, >>>>>>> destination of this edge shouldn=E2=80=99t be a post dominator of bb >>>>>>> (find_branch_edge). >>>>>>> >>>>>>> a.1) if such edge exists we return the most remote in this branch bb >>>>>>> (or create it), see get_most_remote_in_branch; >>>>>>> >>>>>>> a.2) if such edge doesn=E2=80=99t exists, we create corresponding b= b, see make_branch >>>>>>> >>>>>>> b) If such bb doesn=E2=80=99t exist we create it (recursive= ly calling >>>>>>> get_block_for_predicated_stmt if required bb is conditionally >>>>>>> executed). >>>>>>> >>>>>>> Will not describe it further so far (let me know if you would like = to >>>>>>> read additional details). >>>>>>> >>>>>>> Add_info_for_phi >>>>>>> >>>>>>> The main goal of this method is to identify situation where >>>>>>> stmt(parameter,ie the statement which we are moving) or more exactly >>>>>>> lhs of this statement >>>>>>> >>>>>>> a) is still used in other loops, so we need to create phi node= to >>>>>>> use phi name inside other loops. >>>>>>> >>>>>>> b) Is used in phi node of original loop and this phi node is >>>>>>> going to be moved. BUT, as we move phi node as assignment (in the c= ase >>>>>>> of phi node with 2 arguments) we need to create phi node to use lhs= of >>>>>>> statement (param of add_info_for_phi). >>>>>>> >>>>>>> In these cases we make corresponding replacements and store >>>>>>> information needed to create phi nodes and make some other >>>>>>> replacements in 3.3 (replace_uses_in_conditions). >>>>>>> >>>>>>> Add_info_for_phi calls create_tmp_var which requires some explanati= on. >>>>>>> >>>>>>> Create_tmp_var >>>>>>> >>>>>>> To create new names for phi nodes and to create default def for phi >>>>>>> arguments I use make_ssa_name and get_or_create_ssa_default_def. Th= ese >>>>>>> methods have some asserts (being merged): (TREE_CODE (old_var) = =3D=3D >>>>>>> VAR_DECL || TREE_CODE (old_var) =3D=3D PARM_DECL || TREE_CODE >>>>>>> (old_var) =3D=3D RESULT_DECL). >>>>>>> >>>>>>> So in some cases (when I know that I need to use methods mentioned >>>>>>> above, see the start of create_tmp_var, ie the uses in other loops)= I >>>>>>> create new tmp variable to overcome these asserts. To be honest I >>>>>>> don=E2=80=99t like this but don=E2=80=99t know how to deal with thi= s better. >>>>>> >>>>>> Hmm, don't you just run into SSA names here? That is, TREE_CODE >>>>>> (old_var) =3D=3D SSA_NAME? >>>>>> You can use make_ssa_name (TREE_TYPE (old_var), "plim") in this case. >>>>>> >>>>>> Or you run into the issue that anonymous SSA names cannot have defau= lt >>>>>> defintions (I don't understand why you need all the fuss about defau= lt defs). >>>>> >>>>> I used default defs to write something in phi args when I have no val= ues on >>>>> corresponding branches. Say there is some tmp value which I should >>>>> create phi for. >>>>> if (a) >>>>> { >>>>> tmpvar=3Dsomething >>>>> b =3D something(tmpvar); >>>>> } >>>>> else >>>>> { >>>>> } >>>>> When I move if (a) {tmpvar =3D something} and create phi for tmpvar (= to >>>>> use phi result >>>>> in the initial cycle) i need to write something for 'not a' phi arg. >>>>> BUT, in some cases I use >>>>> build_zero_cst (which I started to use after default defs use. I mean >>>>> I didn't know about >>>>> build_zero_cst while I started to use default defs.). So may be I ca= n replace >>>>> uses of default defs in such cases by build_zero_cst? I mean just SOM= E values >>>>> (as I know that control flow will never go to this places actually). I >>>>> mean if !a tmpvar will not used. >>>> >>>> Ok, this is again about the case where you are not moving a PHI node b= ut >>>> creating it because you want to conditionally execute invariant stmts. >>>> Yes, a default def is ok here, but you can always use a new VAR_DECL >>>> just for the default def creation like gimple_fold_call does for examp= le: >>>> >>>> tree var =3D create_tmp_var (TREE_TYPE (lhs)= ); >>>> tree def =3D get_or_create_ssa_default_def (= cfun, var); >>>> >>>> Note that I think we don't need to bother about this case as we'd only >>>> want to move PHI nodes in the first place (so you already have values >>>> for all PHI args). >>>> >>>>>> >>>>>>> And a couple of words regarding where we store info for phi nodes: >>>>>>> >>>>>>> Each loop contains in its aux phi_data structure. On this stage we >>>>>>> only add there stmt (if phi node will be required), phi name=3Dphi = node >>>>>>> result name in names_map or fl_names_map). See the comment about >>>>>>> phi_data in patch. >>>>>>> >>>>>>> If there should be created several phi nodes for given lhs (I mean = the >>>>>>> first place for phi node doesn=E2=80=99t dominate on corresponding = preheader, >>>>>>> we add only the last name in names_map (as intermediate names could= be >>>>>>> created later, but the last name in bb which dominates preheader >>>>>>> should be used for replacement in other loops. Replacements were >>>>>>> already made in walker). >>>>>>> >>>>>>> If lhs is used only in phi node, which are moved and transformed in= to >>>>>>> assignment, and there is no uses in other loops, we need to create >>>>>>> only first level phi node (don=E2=80=99t need to create phi nodes t= ill some bb >>>>>>> which dominates preheader), then we add such name to fl_names_map. >>>>>>> >>>>>>> 3.2 Create_phi_nodes >>>>>>> >>>>>>> For each of the loops we go over ((phi_data*) loop->aux)->stmts. Th= ese >>>>>>> are statements which were moved, but they have uses in the original >>>>>>> loop (more exactly, their uses in some other loops replaced by some >>>>>>> name from ((phi_data*) loop->aux)->names_map or ((phi_data*) >>>>>>> loop->aux)->fl_names_map. >>>>>>> >>>>>>> So for each of such stmts we create phi nodes (for its lhs) chain. >>>>>>> >>>>>>> Basic block for phi node is found in get_bb_for_phi for given bb w= ith >>>>>>> stmt. If basic block for phi node dominates preheader->src we end >>>>>>> chain creation. There is some special condition for the case when we >>>>>>> need to create only first level phi node. (I described this situati= on >>>>>>> above, but let me know If I need to add more details. >>>>>>> >>>>>>> Basic blocks can have maps to identify if we created phi node for >>>>>>> given variable in given basic_block, so we only add an edge argument >>>>>>> for such case (phi_map =3D new hash_map>>>>>> gphi*>*>). >>>>>> >>>>>> Ok. >>>>>> >>>>>> Well - I think it might be easier code-generation-wise to have a sep= arate >>>>>> phase move_phis executed before move_computations, that just moves >>>>>> invariant PHI nodes (which this all is about - see above) and their >>>>>> controlling conditions and only then move their dependencies. >>>>> I didn;t understand so far (I think let's clarify previous questions,= especially >>>>> about phi nodes then let's go back to this one.) >>>> >>>> Yeah, I think the PHI node thing is our major mis-understanding (or the >>>> main thing that I think complicates the patch for too little gain - at= least >>>> initially). >>>> >>>> Thanks, >>>> Richard. >>>> >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> In short, that=E2=80=99s all. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Evgeniya >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova >>>>>>> wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Could you please review my patch for predicated lim? >>>>>>>> >>>>>>>> Let me note some details about it: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 1) Phi statements are still moved only if they have 1 or 2 >>>>>>>> arguments. However, phi statements could be move under conditions = (as >>>>>>>> it=E2=80=99s done for the other statements). Probably, phi statem= ent motion >>>>>>>> with 3 + arguments could be implemented in the next patch after >>>>>>>> predicated lim. >>>>>>>> >>>>>>>> 2) Patch has limitations/features like (it was ok to me to >>>>>>>> implement it such way, maybe I=E2=80=99m not correct. ): >>>>>>>> >>>>>>>> a) Loop1 >>>>>>>> >>>>>>>> { >>>>>>>> >>>>>>>> If (a) >>>>>>>> >>>>>>>> Loop2 >>>>>>>> >>>>>>>> { >>>>>>>> >>>>>>>> Stmt - Invariant for Loop1 >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> In this case Stmt will be moved only out of Loop2, because = of if (a). >>>>>>>> >>>>>>>> b) Or >>>>>>>> >>>>>>>> Loop1 >>>>>>>> >>>>>>>> { >>>>>>>> >>>>>>>> =E2=80=A6 >>>>>>>> >>>>>>>> If (cond1) >>>>>>>> >>>>>>>> If (cond2) >>>>>>>> >>>>>>>> If (cond3) >>>>>>>> >>>>>>>> Stmt; >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> Stmt will be moved out only if cond1 is always executed in Loop1. >>>>>>>> >>>>>>>> c) It took me a long time to write all of these code, so the= re >>>>>>>> might be other peculiarities which I forgot to mention. :) >>>>>>>> >>>>>>>> Let=E2=80=99s discuss these ones as you wil= l review my patch. >>>>>>>> >>>>>>>> 3) Patch consists of 9 files: >>>>>>>> >>>>>>>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests: >>>>>>>> >>>>>>>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >>>>>>>> predicated lim moves 2 more statements out of the loop; >>>>>>>> >>>>>>>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with = conditional >>>>>>>> lim recip optimization in this test doesn=E2=80=99t work (the corr= esponding >>>>>>>> value is below threshold as I could see in the code for recip, 1<3= ). >>>>>>>> So to have recip working in this test I changed test a little bit. >>>>>>>> >>>>>>>> b) gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se >>>>>>>> >>>>>>>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >>>>>>>> >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >>>>>>>> >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >>>>>>>> >>>>>>>> the tests for conditional lim. >>>>>>>> >>>>>>>> 4) Patch testing: >>>>>>>> >>>>>>>> a) make =E2=80=93k check (no difference in results for me for= the clean >>>>>>>> build and the patched one, >>>>>>>> >>>>>>>> - Revision: 222849, >>>>>>>> >>>>>>>> - uname -a >>>>>>>> >>>>>>>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue= Oct >>>>>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >>>>>>>> >>>>>>>> b) Bootstrap. >>>>>>>> >>>>>>>> It goes well now, however to fix it I have made a temporary hack = in >>>>>>>> the lim code. And with this fix patch definitely shouldn=E2=80=99t= be >>>>>>>> committed. >>>>>>>> >>>>>>>> I did so, as I would like to discuss this issue first. >>>>>>>> >>>>>>>> The problem is: I got stage2-stage3 comparison failure on the sing= le >>>>>>>> file (tree-vect-data-refs.o). After some investigation I understood >>>>>>>> that tree-vect-data-refs.o differs being compiled with and without >>>>>>>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is = =E2=80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99, >>>>>>>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to s= implify things I can >>>>>>>> reproduce this difference on the same build (even not bootstrapped= ), >>>>>>>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it). >>>>>>>> >>>>>>>> Of course, the file compiled with =E2=80=93g option will contain d= ebug >>>>>>>> information and will differ from the corresponding file without de= bug >>>>>>>> information. I mean there is the difference reported by >>>>>>>> contrib/compare-debug (I mean we talk about stripped files). >>>>>>>> >>>>>>>> Then I compared assemblers and lim dump files and made a conclusion >>>>>>>> the difference is: >>>>>>>> >>>>>>>> There is statement _484=3Dsomething, which is conditionally moved = out of >>>>>>>> loop X. In non debug cases no usages of _484 are left inside the l= oop >>>>>>>> X. In debug case, there is the single use in debug statement. So f= or >>>>>>>> debug case my patch generates phi statement for _484 to make it >>>>>>>> available inside the loop X. For non debug case, no such phi state= ment >>>>>>>> generated as there is no uses of _484. >>>>>>>> >>>>>>>> There is one more statement with lhs=3D_493, with exactly this pat= tern >>>>>>>> of use. But this is not important for our discussion. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> So, in my opinion it=E2=80=99s ok to generate additional phi node = for debug >>>>>>>> case. But I=E2=80=99m not a compiler expert and maybe there is some >>>>>>>> requirement that debug and non-debug versions should differ only by >>>>>>>> debug statements, I don=E2=80=99t know. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> My temporary hack fix is just removing of such debug statements (no >>>>>>>> debug statements, no problems J). >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> The alternatives I see are: >>>>>>>> >>>>>>>> - Move debug statements outside the loop (not good in my = opinion); >>>>>>>> >>>>>>>> - Somehow reset values in debug statements (not good in my >>>>>>>> opinion, if I could provide correct information for them); >>>>>>>> >>>>>>>> - Generate phi for debug statements and fix bootstrap (sa= y, >>>>>>>> let=E2=80=99s have some list of files, which we have special check= for. I mean >>>>>>>> for my case, the difference is not in stage2 and stage3, it=E2=80= =99s in debug >>>>>>>> and non-debug). I like this variant. However, as I don=E2=80=99t s= ee such list >>>>>>>> of special files in the whole gcc, I think I might be missing >>>>>>>> something. >>>>>>>> >>>>>>>> What do you think? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Evgeniya >>> >>> >>> >>> -- >>> Thanks, >>> >>> Evgeniya >>> >>> perfstories.wordpress.com > > > > -- > Thanks, > > Evgeniya > > perfstories.wordpress.com