From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-402978-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 129839 invoked by alias); 15 Jul 2015 08:36:13 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 129830 invoked by uid 89); 15 Jul 2015 08:36:12 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-la0-f41.google.com
Received: from mail-la0-f41.google.com (HELO mail-la0-f41.google.com) (209.85.215.41) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 15 Jul 2015 08:36:08 +0000
Received: by lagw2 with SMTP id w2so20240317lag.3        for <gcc-patches@gcc.gnu.org>; Wed, 15 Jul 2015 01:36:04 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.152.25.227 with SMTP id f3mr2789409lag.67.1436949364293; Wed, 15 Jul 2015 01:36:04 -0700 (PDT)
Received: by 10.112.52.100 with HTTP; Wed, 15 Jul 2015 01:36:04 -0700 (PDT)
In-Reply-To: <CAFiYyc2quZ6WkyWhN0H9J_HS7HKX32qeE-jAW5EKW2DiH2WJZg@mail.gmail.com>
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw@mail.gmail.com>	<CANVW7MP0Uv3LSSd=JfD5hTnMv75SpQ1WMrV_eWMa9owZ3T+yYg@mail.gmail.com>	<CAFiYyc3Do3Mn+eAgQGpbZCtzUy4y=FvbbtuPo2xkPwi=y=U2hg@mail.gmail.com>	<CANVW7MNHfPnXeU0bvQRQxjfX0Cq3Fm2ZbQBBowzt36WYg8Zhng@mail.gmail.com>	<CAFiYyc2w_f5WJrnyAgDf=Y8haqjDvF_d4RgqoHq_jwf5oopDdg@mail.gmail.com>	<CANVW7MN6jczCDYsvuYhTWSNTQnARfRUjWTxZpqAj8v868Beo8w@mail.gmail.com>	<CAFiYyc2quZ6WkyWhN0H9J_HS7HKX32qeE-jAW5EKW2DiH2WJZg@mail.gmail.com>
Date: Wed, 15 Jul 2015 08:43:00 -0000
Message-ID: <CANVW7MMRX_HUfbheEyn1OFL7x3x=1Qo=0=70y-Rcc13DmcSvsA@mail.gmail.com>
Subject: Re: conditional lim
From: Evgeniya Maenkova <evgeniya.maenkova@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg01239.txt.bz2

On Tue, Jul 14, 2015 at 2:54 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Jun 29, 2015 at 4:21 PM, Evgeniya Maenkova
> <evgeniya.maenkova@gmail.com> wrote:
>> On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova
>>> <evgeniya.maenkova@gmail.com> wrote:
>>>> On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova
>>>>> <evgeniya.maenkova@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> Here is some explanation. I hope you let me know if I need to clarif=
y something.
>>>>>>
>>>>>> Also, you asked me about concrete example, to make sure you don=E2=
=80=99t miss
>>>>>> my answer here is the link:
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html.
>>>>>>
>>>>>> Also, I doubt whether it=E2=80=99s convenient for you to create a bu=
ild with
>>>>>> my patch or not. May be to clarify things you could send me some
>>>>>> examples/concrete cases, then I=E2=80=99ll compile them with
>>>>>> =E2=80=93fdump-tree-loopinit-details and =E2=80=93fdump-tree-lim-det=
ails and send you
>>>>>> these dumps. May be these dumps will be useful. (I=E2=80=99ll only d=
isable
>>>>>> cleanup_cfg TODO after lim to let you know the exact picture after
>>>>>> lim).
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> 1.       invariantness _dom_walker =E2=80=93
>>>>>>
>>>>>> 1.1   for each GIMPLE_COND in given bb calls handle_cond_stmt to call
>>>>>> for true and false edges handle_branch_edge, which calls SET_TARGET_=
OF
>>>>>> for all bb =E2=80=98predicated=E2=80=99 by given GIMPLE_COND.
>>>>>>
>>>>>> SET_TARGET_OF sets in basic_blocks aux 2 facts:
>>>>>>
>>>>>> a)      this is true or false edge;
>>>>>>
>>>>>> b)      link to cond stmt;
>>>>>>
>>>>>> Handle_branch_edge works this way:
>>>>>>
>>>>>> If (cond1)
>>>>>>
>>>>>>   {
>>>>>>
>>>>>>      bb1;
>>>>>>
>>>>>>      if (cond2}
>>>>>>
>>>>>>        {
>>>>>>
>>>>>>            bb2;
>>>>>>
>>>>>>         }
>>>>>>
>>>>>>    Being called for cond1, it sets cond1 as condition for both bb1 a=
nd
>>>>>> bb2 (the whole branch for cond1, ie also for bb containing cond2),
>>>>>> then this method will be called (as there is dominance order) for
>>>>>> cond2 to correct things (ie to set cond2 as condition for bb2).
>>>>>
>>>>> Hmm, why not track the current condition as state during the DOM walk
>>>>> and thus avoid processing more than one basic-block in handle_branch_=
edge?
>>>>> Thus get rid of handle_branch_edge and instead do everything in handl=
e_cond_stmt
>>>>> plus the dom-walkers BB visitor?
>>>>>
>>>> I need to look more carefully how to implement it, but I think I
>>>> understand what you mean and this optimization of course looks
>>>> reasonable to me. Will do.
>>>>
>>>>> I see you don't handle BBs with multiple predecessors - that's ok, but
>>>>> are you sure you don't run into correctness issues when not marking s=
uch
>>>>> BBs as predicated?  This misses handling of, say
>>>>>
>>>>>  if (a || b)
>>>>>    bb;
>>>>>
>>>>> which is a pity (but can be fixed later if desired).
>>>>>
>>>> I had some test (in gcc testsuite or bootstrap build) which worked
>>>> incorrectly because of multiple predecessors. As far as I remember the
>>>> situation was (further, will make some notes about such tests to
>>>> clarify this better), I mean with previous version of my code which
>>>> handled bb with 2 predecessors:
>>>> if (a)
>>>>   tmpvar=3Dsomething;
>>>> while()
>>>>   if (a || b)
>>>>       basic_block {do something with tmpvar;} // I mean basic block
>>>> predicated by bb with a and bb with b
>>>>
>>>> So, if a is false, I mean we didn't do tmpvar=3Dsomething (outside
>>>> loop), BUT we are in basick_block  (we went by bb with b), we got
>>>> Segmentation falt in basic_block {do something with tmpvar;}.
>>>>
>>>> I think we can reproduce all the details of this test if I remove not
>>>> handling bb with 2 predecessors.
>>>>
>>>> So I wouldn't move bb with 2 predecessors (this is not always executed
>>>> bb in any loop, not conditionally, they will not be moved at all).
>>>>
>>>> This is my more detail explanation on this point. Perhaps, I didn't
>>>> understand your question about correctness. Could you repeat it in
>>>> other words (based on this new clarification).
>>>>
>>>> So I think according to current code it will not be moved. What
>>>> incorrectness do you mean?
>>>
>>> If the block isn't marked as predicated the question is whether it is
>>> handled correctly or assumed to be unconditionally executed.
>>>
>>>>> I note that collecting predicates has similarities to what if-convers=
ion
>>>>> does in tree-ifcvt.c (even if its implementation is completely differ=
ent,
>>>>> of course).
>>>>>
>>>>
>>>> Ok, I'll look at this. But could you please clarify your point?
>>>> (Should I just take this into account with low priority and look at
>>>> this later or you want some refactoring?)
>>>
>>> I just noted similar code exists elsewhere - it may be possible to
>>> factor it out but I didn't investigate.  And no, doing that isn't a pre=
requesite
>>> for this patch.
>>>
>>>>>> 1.2   As 1.1 goes we identify whether some bb is predicated by some
>>>>>> condition or not.
>>>>>>
>>>>>> bb->aux->type will be [TRUE/FALSE]_TARGET_OF and
>>>>>> bb->aux->cond_stmt=3Dcond stmt (the nearest condition).
>>>>>>
>>>>>> If bb is always executed bb->aux->type =3D ALWAYS_EXECUTED_IN,
>>>>>> bb->loop->loop (this info was available in the clean build).
>>>>>>
>>>>>> 1.3   As this walker is called in dominance order, information about
>>>>>> condition is available when invariantness_dom_walker is called for
>>>>>> given bb.  So we can make some computations based on bb->aux
>>>>>> structure. This is done in check_conditionally_executed. The main go=
al
>>>>>> of this method is to check that the root condition is always executed
>>>>>> in the loop. I did so to avoid situation like this
>>>>>>
>>>>>> Loop:
>>>>>>
>>>>>>    Jmp somewhere;
>>>>>>
>>>>>>   If (cond1)
>>>>>>
>>>>>>       If (cond2)
>>>>>>
>>>>>>          Etc
>>>>>>
>>>>>> By =E2=80=98root condition=E2=80=99 I mean cond1 in this case (the f=
irst cond in
>>>>>> dominance order).
>>>>>
>>>>> Ok, but this can't happen (an uncontional jump to somehwere).  It
>>>>> will always be a conditional jump (a loop exit) and thus should
>>>>> be handled already.
>>>>>
>>>>> Did you have a testcase that was mishandled?
>>>>
>>>> No, I have not such testcase. This is because I;m newbie at gcc and
>>>> compilers. If you tell me this fact, let's just remove this check?
>>>> Correct?
>>>
>>> Yes.
>>>
>>>>>
>>>>>> If  =E2=80=98root condition=E2=80=99 is not always executed we disab=
le (free) all the
>>>>>> info in bb->aux, ie all the blocks becomes neither conditionally nor
>>>>>> always executed according to bb->aux.
>>>>>>
>>>>>> There is some optimization to don=E2=80=99t go each time trough the =
conditions
>>>>>> chain (cond2->cond1), let me skip such details for now.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1.4   Then we calculate tgt_level  (which is used in move_computatio=
ns later)
>>>>>>
>>>>>> The patch is the same for phi and regular stmt (calculate_tgt_level)
>>>>>>
>>>>>> 1)      If stmt is conditionally executed we compute max movement
>>>>>> (determine_max_movement) starting from
>>>>>> get_lim_data(condition)->max_loop.
>>>>>>
>>>>>> 2)      If stmt is not cond executed as start level  for
>>>>>> determine_max_movement computations we choose ALWAYS_EXECUTED_IN.
>>>>>>
>>>>>> To clarify why, see the comment
>>>>>>
>>>>>> /* The reason why we don't set start_level for MOVE_POSSIBLE
>>>>>>
>>>>>>           statements to more outer level is: this statement could de=
pend on
>>>>>>
>>>>>>           conditional statements and even probably is not defined wi=
thout this
>>>>>>
>>>>>>           condition. So either we should analyze these ones and move
>>>>>>
>>>>>>           under condition somehow or keep more strong start_level . =
*/
>>>>>>
>>>>>> As you noted in your review there was some refactoring. Of course, I
>>>>>> had no goal to refactor existing code, I intended to remove some
>>>>>> duplication which I caused by my changes.  I hope we discuss in
>>>>>> details later if you don=E2=80=99t like some refactoring.
>>>>>
>>>>> Refactoring makes a big patch harder to review because it ends up
>>>>> containing no-op changes.  It also makes it appear bigger than it
>>>>> really is ;)
>>>>>
>>>>> A reasonable solution is to factor out the refactoring to a separate =
patch.
>>>>>
>>>> I see what you mean.
>>>>
>>>>>> 2.       store_motion  - for some stmts set flags to ignore predicat=
ed
>>>>>> execution (I mean to move statements without conditions).
>>>>>
>>>>> That's for the existing "predicated" support as far as I can see.
>>>>>
>>>>>>
>>>>>>
>>>>>> 3.       move_computations. The patch is doing something in the
>>>>>> following 3 calls inside of this method.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 3.1   (more details below) walker.walk =E2=80=93 move computations(c=
reate if()
>>>>>> structure) and store information to create phi nodes later (which
>>>>>> statements require phi nodes, as they conditionally executed and the=
re
>>>>>> are still their uses on other levels, what are the names for such phi
>>>>>> nodes, as we replace uses in here, in walker.walk.)
>>>>>
>>>>> What kind of uses are that?  By definition all stmts executed
>>>>> under condition A that are used in code not under condition A have
>>>>> a PHI node associated.  This is why the existing conditional movement
>>>>> handling moves PHI nodes (and all dependent computations).
>>>>>
>>>>> Are you thinking of
>>>>>
>>>>>   for (;;)
>>>>>     if (a) x =3D 1;
>>>>>
>>>>>   .. use of x
>>>>>
>>>>> ?
>>>> No, I think in this case phi node of x will be used in initial
>>>> (=3D=3Dloopinit) code (in 'use of x') instead of some version of x (wh=
ich
>>>> I move).
>>>>
>>>>>If you move the PHI node for x then you don't need to insert any other
>>>>> PHI nodes (and you _do_ have to move the PHI node anyway).
>>>>
>>>>>
>>>>>> 3.2   (more details below) create_phi_nodes =E2=80=93 create phi nod=
es for
>>>>>> statements which require that (see 3.1)
>>>>>
>>>>> Only PHI nodes need PHI node creation and only its dependences
>>>>> might need predication.
>>>>>
>>>>> That is, if you move a conditionally executed statement but not
>>>>> any PHI node it feeds then the movement is useless.
>>>>>
>>>>> But maybe I am missing something about the PHI business.
>>>>>
>>>> Ok. I have different understanding on this (and am afraid of this of
>>>> course :) ).
>>>>
>>>> See the example:
>>>> void bar (long);
>>>> void foo (int cond1, int cond2, int cond3, int m, int n)
>>>> {
>>>>   unsigned x;
>>>>   unsigned inv1;
>>>>
>>>>   for (x =3D 0; x < 1000; ++x)
>>>>     {
>>>>       if (cond1)
>>>>         {
>>>>           int tmp_inv1 =3D m*n;
>>>>           non_inv1 =3D tmp_inv1 + x;
>>>>         }
>>>>     }
>>>>   bar (non_inv1);
>>>> }
>>>>
>>>> This is working example, will attach loopinit and lim1 dumps.
>>>> There is no phi node for tmp_inv1 in loopinitat all. However, I think
>>>> we should move tmp_inv1. To use tmp_inv1 in non_inv1 we need phi node
>>>> (otherwise we get verification failure as bb with conditionally moved
>>>> tmp_inv1 doesn;t dominate bb with non_inv1). Another example if
>>>> non_inv1 is invariant but has not enough cost or something else. In
>>>> short, if it's still used in initial or others loops. Say, non_inv1 is
>>>> invariant and moved in another loop. Or something else.
>>>>
>>>> So, when I read
>>>> 'Only PHI nodes need PHI node creation'
>>>> and
>>>> 'That is, if you move a conditionally executed statement but not any
>>>> PHI node it feeds then the movement is useless.'
>>>> I think about this example and don't understand. Could you clarify?
>>>
>>> Ok, so this case is already handled by LIM by simply moving tmp_inv1 =
=3D m*n
>>> (and making it unconditionally execute before the loop).  In fact wheth=
er
>>> we can move it does not depend on whether 'cond' is invariant or not.
>>>
>>> If we couldn't execute m*n unconditionally on the loop preheader edge
>>> then we'd have to be able to also move the guarding condition.  Note
>>> that cost considerations shouldn't come into play here.
>>>
>>> So I was only thinking of the case where we also want to move the
>>> condition (which is wanted if we want to move a PHI node).
>>>
>>
>> Don't understand "So I was only thinking of the case where we also
>> want to move the condition (which is wanted if we want to move a PHI
>> node)."
>>
>> What about the case when we want to move condition but don't have phi
>> node to move.
>>
>> Let's consider a little bit modified example:
>>
>> void bar (long);
>> void foo (int cond1, int cond2, int cond3, int m, int n)
>>   {
>>      unsigned x;
>>      unsigned inv1;
>>
>>      for (x =3D 0; x < 1000; ++x)
>>        {
>>          if (n)
>>            {
>>              int tmp_inv1 =3D m/n;
>>              non_inv1 =3D tmp_inv1 + x;
>>             }
>>        }
>>      bar (non_inv1);
>>  }
>> We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
>> build at hands right now to check. If I'm wrong let's replace m/n to
>> something that couldn't be unconditionally moved :))? There is no phi
>> node for tmp_inv1.
>> Could you clarify what conditional lim should do in your opinion here?
>
> Nothing at this point.  If only then for the simple reason to make the
> first iteration of the patch simpler (still covering most cases).
>
> You seem to want to catch 100% of all possible cases.
>
>> Is it too little gain to move statements that only used in the same bb
>> or branch (I mean before phi node creation, even if stmt had phi
>> node)?
>>
>> In fact, we could have here 2 thousands of divisions:
>> if (n)
>>   {
>>     int tmp_inv1 =3D m/n;
>>     tmp_inv2 =3D something_which_couldn't_be moved_unconditionally (tmp_=
inv1);
>>     ...
>>     tmp_invN =3D something_which_couldn't_be moved_unconditionally (tmp_=
invN-1);
>>     non_inv1 =3D tmp_invN + x;
>>   }
>
> Of course we could.  But the patch didn't even exercise this case (no tes=
tcase).

Patch handles this case (this is why I created phi nodes which we
tried to discuss).
Your comment is about test case not about the patch.

>
> Please make patches simple - support for moving non-PHIs can be added
> as a followup.  The first important part is to make the current code
> (moving PHIs)
> create control-flow rather than if-converted form.
>
> Richard.
>
>>
>>>>>>
>>>>>> 3.3   replace_uses_in_conditions
>>>>>>
>>>>>> After computations movements we can have several copies of the cond
>>>>>> stmt. In 3.1 we do replacement of uses based on stmt=E2=80=99s tgt_l=
evel. For,
>>>>>> cond stmt it doesn=E2=80=99t work. As cond stmt, of course, have som=
ething in
>>>>>> lim_aux_data->tgt_level, but, in fact, they are moved only if
>>>>>> corresponding invariants are moved. So, in fact, the same cond (copi=
es
>>>>>> of it, of course) could go to the different levels. So to fix these
>>>>>> uses, we call replace_uses_in_conditions.
>>>>>
>>>>> Hmm, but stmts that a condition depends on always have at least the
>>>>> target level of the condition statement.  Thus their computation is
>>>>> available in all other "same" conditon statements that might appear
>>>>> in more inner loop levels, no?  So isn't this just about performing t=
he
>>>>> movement in a proper ordering?
>>>>
>>>> Let me give your more details, then may be you repeat question in othe=
r words.
>>>> for ()
>>>> {
>>>>   if (a)
>>>>     {
>>>>        inv1 =3D some_computations;
>>>>        if (inv1 < inv2)
>>>>             {
>>>>               inv3;
>>>>               inv4;
>>>>             }
>>>>        }
>>>> }
>>>>
>>>> So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
>>>> have tgt_level for inv3 and inv4 computed but we have not created
>>>> copies of 'if (inv1 < inv2)' created. In theory, we can have several
>>>> copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
>>>> initial cycle. if (1) is in tgt level of inv1 we need no replacement,
>>>> in other cases we need replacements. Of course, we know this(all the
>>>> info about tgt_levels) when we move inv1, but then we need to create
>>>> all needed copies (store it somehow, etc) of 'if (inv1 < inv2') and
>>>> make replacement in these copies. I don't see now why this is much
>>>> better. This doesn't mean that current solution is perfect, just
>>>> clarify your thought.
>>>
>>> I still don't get what "replacements" actually mean.  Let's extend the
>>> example to
>>>
>>> for ()  (A)
>>>   {
>>>> for ()  (B)
>>>> {
>>>>   if (a)
>>>>     {
>>>>        inv1 =3D some_computations;
>>>>        if (inv1 < inv2)
>>>>             {
>>>>               inv3;
>>>>               inv4;
>>>>             }
>>>>        }
>>>> }
>>>  }
>>>
>>> then you say the target level of inv1 could be 'A' and the target
>>> level of inv3 'B'
>>> and that of inv4 is 'B' as well.
>>>
>>> That's true.  So we move inv1 to before 'A' and inv3/inv4 to before 'B'=
 guarded
>>> with if (inv1 < inv2).  I fail to see what "copies" we need here.  'inv=
1' is
>>> available in all target levels below its own, no copies needed.
>>>
>>
>> I meant the copies of conditions (COND_STMT: inv1 < inv2).
>>
>> Let's modify this example: tgt_level (inv1) =3D A; tgt_level(inv3) =3D A;
>> tgt_level(inv4) =3D B.
>>
>>  Then
>> if (a)
>>    inv1=3Dsome_computations;
>> if (inv1 < inv2)
>>    inv3;
>> <preheader edge for A, where phi for inv1(phi_for_inv1) is computed>
>> for ()  (A)
>>  {
>>  if (phi_for_inv1 <inv2)
>>    inv3
>>  for ()  (B)
>>    {
>>    if (a)
>>      {
>>         if (phi_for_inv1 < inv2)
>>           {
>>           }
>>      }
>>  }
>> So, we have 3 copies of inv1 < inv2 after lim (in some of them we use
>> inv1, in other phi_for_inv1). But this is the second priority for now
>> (this is all about whether we should move statements w/o phi node).
>>
>>>>>
>>>>>> More details on 3.1 and 3.2
>>>>>>
>>>>>> 3.1 The patch is very similar for phi stmts and regular stmts (there
>>>>>> is some difference for regular stmts related to the case when we move
>>>>>> sequence instead of single stmt, let=E2=80=99s skip it in this descr=
iption,
>>>>>> will describe later. BTW, there are comments, in corresponding parts=
).
>>>>>>
>>>>>> a) for each bb we call check_conditionally_executed and if bb is cond
>>>>>> executed then invariant statement  will be moved conditionally.
>>>>>>
>>>>>> b) prepare_edge_for_stmt collects information needed to create phi
>>>>>> nodes in 3.2(add_info_for_phi) AND prepares edge ie creates if()
>>>>>> structure needed to move statement
>>>>>> (get_block_for_predicated_statement, see below). All of this is done
>>>>>> for cond executed stmt. Nothing is done for non cond executed.
>>>>>>
>>>>>> BTW, there is some strange names for functions like missed ending
>>>>>> (prepare_edge_for_stm, w/o t in the end, this is because of my script
>>>>>> to add =E2=80=98 =E2=80=98 before =E2=80=98(=E2=80=98, will fix in t=
he next patch version, ignore
>>>>>> please so far).
>>>>>>
>>>>>> Get_block_for_predicated_stmt:
>>>>>>
>>>>>> Create bb where to add cond stmt to (or return existing, each loop h=
as
>>>>>> a map  (level_cond_data_map is map <loop, cond_data_map> ,
>>>>>> cond_data_map is map <gimple, basic_block>) to identify if bb for
>>>>>> given cond was created or not for given level).
>>>>>>
>>>>>> Parameters of this method:
>>>>>>
>>>>>> Cond =E2=80=93 condition of stmt, branch =E2=80=93 is it on true or =
false edge of
>>>>>> cond, cond_map  - see above, level (tgt_level), preheader =E2=80=93
>>>>>> preheader_edge for level.
>>>>>>
>>>>>> So first of all we check whether there is a basic block for given
>>>>>> condition in given loop.
>>>>>>
>>>>>> a)      If such bb exists then we trying to find corresponding edge,
>>>>>> destination of this edge shouldn=E2=80=99t be a post dominator of bb
>>>>>> (find_branch_edge).
>>>>>>
>>>>>> a.1) if such edge exists we return the most remote in this branch bb
>>>>>> (or create it), see get_most_remote_in_branch;
>>>>>>
>>>>>> a.2) if such edge doesn=E2=80=99t exists, we create corresponding bb=
, see make_branch
>>>>>>
>>>>>>        b)  If such bb doesn=E2=80=99t exist we create it (recursivel=
y calling
>>>>>> get_block_for_predicated_stmt if required bb is conditionally
>>>>>> executed).
>>>>>>
>>>>>> Will not describe it further so far (let me know if you would like to
>>>>>> read additional details).
>>>>>>
>>>>>> Add_info_for_phi
>>>>>>
>>>>>> The main goal of this method is to identify situation where
>>>>>> stmt(parameter,ie the statement which we are moving) or more exactly
>>>>>> lhs of this statement
>>>>>>
>>>>>> a)      is still used in other loops, so we need to create phi node =
to
>>>>>> use phi name inside other loops.
>>>>>>
>>>>>> b)      Is used in phi node of original loop and this phi node is
>>>>>> going to be moved. BUT, as we move phi node as assignment (in the ca=
se
>>>>>> of phi node with 2 arguments) we need to create phi node to use lhs =
of
>>>>>> statement (param of add_info_for_phi).
>>>>>>
>>>>>> In these cases we make corresponding replacements and store
>>>>>> information needed to create phi nodes and make some other
>>>>>> replacements in 3.3 (replace_uses_in_conditions).
>>>>>>
>>>>>> Add_info_for_phi calls create_tmp_var which requires some explanatio=
n.
>>>>>>
>>>>>> Create_tmp_var
>>>>>>
>>>>>> To create new names for phi nodes and to create default def for phi
>>>>>> arguments I use make_ssa_name and get_or_create_ssa_default_def. The=
se
>>>>>> methods have some asserts (being merged):   (TREE_CODE (old_var) =3D=
=3D
>>>>>> VAR_DECL   || TREE_CODE (old_var) =3D=3D PARM_DECL   || TREE_CODE
>>>>>> (old_var) =3D=3D RESULT_DECL).
>>>>>>
>>>>>> So in some cases (when I know that I need to use methods mentioned
>>>>>> above, see the start of create_tmp_var, ie the uses in other loops) I
>>>>>> create new tmp variable to overcome these asserts. To be honest I
>>>>>> don=E2=80=99t like this but don=E2=80=99t know how to deal with this=
 better.
>>>>>
>>>>> Hmm, don't you just run into SSA names here?  That is, TREE_CODE
>>>>> (old_var) =3D=3D SSA_NAME?
>>>>> You can use make_ssa_name (TREE_TYPE (old_var), "plim") in this case.
>>>>>
>>>>> Or you run into the issue that anonymous SSA names cannot have default
>>>>> defintions (I don't understand why you need all the fuss about defaul=
t defs).
>>>>
>>>> I used default defs to write something in phi args when I have no valu=
es on
>>>> corresponding branches. Say there is some tmp value which I should
>>>> create phi for.
>>>> if (a)
>>>>    {
>>>>      tmpvar=3Dsomething
>>>>      b =3D something(tmpvar);
>>>>     }
>>>> else
>>>>    {
>>>>    }
>>>> When I move if (a) {tmpvar =3D something} and create phi for tmpvar (to
>>>> use phi result
>>>> in the initial cycle) i need to write something for 'not a' phi arg.
>>>> BUT, in some cases I use
>>>>  build_zero_cst (which I started to use after default defs use. I mean
>>>> I didn't know about
>>>>  build_zero_cst while I started to use default defs.). So may be I can=
 replace
>>>> uses of default defs in such cases by build_zero_cst? I mean just SOME=
 values
>>>> (as I know that control flow will never go to this places actually). I
>>>> mean if !a tmpvar will not used.
>>>
>>> Ok, this is again about the case where you are not moving a PHI node but
>>> creating it because you want to conditionally execute invariant stmts.
>>> Yes, a default def is ok here, but you can always use a new VAR_DECL
>>> just for the default def creation like gimple_fold_call does for exampl=
e:
>>>
>>>                           tree var =3D create_tmp_var (TREE_TYPE (lhs));
>>>                           tree def =3D get_or_create_ssa_default_def (c=
fun, var);
>>>
>>> Note that I think we don't need to bother about this case as we'd only
>>> want to move PHI nodes in the first place (so you already have values
>>> for all PHI args).
>>>
>>>>>
>>>>>> And a couple of words regarding where we store info for phi nodes:
>>>>>>
>>>>>> Each loop contains in its aux  phi_data structure. On this stage we
>>>>>> only add there stmt (if phi node will be required), phi name=3Dphi n=
ode
>>>>>> result name in names_map or fl_names_map). See the comment about
>>>>>> phi_data in patch.
>>>>>>
>>>>>> If there should be created several phi nodes for given lhs (I mean t=
he
>>>>>> first place for phi node doesn=E2=80=99t dominate on corresponding p=
reheader,
>>>>>> we add only the last name in names_map (as intermediate names could =
be
>>>>>> created later, but the last name in bb which dominates preheader
>>>>>> should be used for replacement in other loops. Replacements were
>>>>>> already made in walker).
>>>>>>
>>>>>> If lhs is used only in phi node, which are moved and transformed into
>>>>>> assignment, and there is no uses in other loops, we need to create
>>>>>> only first level phi node (don=E2=80=99t need to create phi nodes ti=
ll some bb
>>>>>> which dominates preheader), then we add such name to fl_names_map.
>>>>>>
>>>>>> 3.2 Create_phi_nodes
>>>>>>
>>>>>> For each of the loops we go over ((phi_data*) loop->aux)->stmts. The=
se
>>>>>> are statements which were moved, but they have uses in the original
>>>>>> loop (more exactly, their uses in some other loops replaced by some
>>>>>> name from ((phi_data*) loop->aux)->names_map or ((phi_data*)
>>>>>> loop->aux)->fl_names_map.
>>>>>>
>>>>>> So for each of such stmts we create phi nodes (for its lhs) chain.
>>>>>>
>>>>>> Basic block for phi node is found in get_bb_for_phi  for given bb wi=
th
>>>>>> stmt. If basic block for phi node dominates preheader->src we end
>>>>>> chain creation. There is some special condition for the case when we
>>>>>> need to create only first level phi node. (I described this situation
>>>>>> above, but let me know If I need to add  more details.
>>>>>>
>>>>>> Basic blocks can have maps to identify if we created phi node for
>>>>>> given variable in given basic_block, so we only add an edge argument
>>>>>> for such case (phi_map =3D new hash_map<basic_block, hash_map<tree,
>>>>>> gphi*>*>).
>>>>>
>>>>> Ok.
>>>>>
>>>>> Well - I think it might be easier code-generation-wise to have a sepa=
rate
>>>>> phase move_phis executed before move_computations, that just moves
>>>>> invariant PHI nodes (which this all is about - see above) and their
>>>>> controlling conditions and only then move their dependencies.
>>>> I didn;t understand so far (I think let's clarify previous questions,e=
specially
>>>> about phi nodes then let's go back to this one.)
>>>
>>> Yeah, I think the PHI node thing is our major mis-understanding (or the
>>> main thing that I think complicates the patch for too little gain - at =
least
>>> initially).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> In short, that=E2=80=99s all.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Evgeniya
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
>>>>>> <evgeniya.maenkova@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could you please review my patch for predicated lim?
>>>>>>>
>>>>>>> Let me note some details about it:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 1)      Phi statements are still moved only if they have 1 or 2
>>>>>>> arguments. However, phi statements could be move under conditions (=
as
>>>>>>> it=E2=80=99s done for the other statements).  Probably, phi stateme=
nt motion
>>>>>>> with 3 + arguments could be implemented in the next patch after
>>>>>>> predicated lim.
>>>>>>>
>>>>>>> 2)      Patch has limitations/features like (it was ok to me to
>>>>>>> implement it such way, maybe I=E2=80=99m not correct. ):
>>>>>>>
>>>>>>> a)      Loop1
>>>>>>>
>>>>>>>     {
>>>>>>>
>>>>>>>               If (a)
>>>>>>>
>>>>>>>                  Loop2
>>>>>>>
>>>>>>>                      {
>>>>>>>
>>>>>>>                        Stmt - Invariant for Loop1
>>>>>>>
>>>>>>>                      }
>>>>>>>
>>>>>>>              }
>>>>>>>
>>>>>>>        In this case Stmt will be moved only out of Loop2, because o=
f if (a).
>>>>>>>
>>>>>>> b)      Or
>>>>>>>
>>>>>>> Loop1
>>>>>>>
>>>>>>>     {
>>>>>>>
>>>>>>>          =E2=80=A6
>>>>>>>
>>>>>>>          If (cond1)
>>>>>>>
>>>>>>>               If (cond2)
>>>>>>>
>>>>>>>                   If (cond3)
>>>>>>>
>>>>>>>                       Stmt;
>>>>>>>
>>>>>>>        }
>>>>>>>
>>>>>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>>>>>
>>>>>>> c)       It took me a long time to write all of these code, so there
>>>>>>> might be other peculiarities which I forgot to mention. :)
>>>>>>>
>>>>>>>                        Let=E2=80=99s discuss these ones as you will=
 review my patch.
>>>>>>>
>>>>>>> 3)      Patch consists of 9 files:
>>>>>>>
>>>>>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests:
>>>>>>>
>>>>>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>>>>>> predicated lim moves 2 more statements out of the loop;
>>>>>>>
>>>>>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with c=
onditional
>>>>>>> lim recip optimization in this test doesn=E2=80=99t work (the corre=
sponding
>>>>>>> value is below threshold as I could see in the code for recip, 1<3).
>>>>>>> So to have recip working in this test I changed test a little bit.
>>>>>>>
>>>>>>> b)      gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se
>>>>>>>
>>>>>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>>>>>
>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>>>>>
>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>>>>>
>>>>>>> the tests for conditional lim.
>>>>>>>
>>>>>>> 4)      Patch testing:
>>>>>>>
>>>>>>> a)      make =E2=80=93k check (no difference in results for me for =
the clean
>>>>>>> build and the patched one,
>>>>>>>
>>>>>>> -          Revision: 222849,
>>>>>>>
>>>>>>> -          uname -a
>>>>>>>
>>>>>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue =
Oct
>>>>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>>>>>
>>>>>>> b)      Bootstrap.
>>>>>>>
>>>>>>>  It goes well now, however to fix it I have made a temporary hack in
>>>>>>> the lim code. And with this fix patch definitely shouldn=E2=80=99t =
be
>>>>>>> committed.
>>>>>>>
>>>>>>> I did so, as I would like to discuss this issue first.
>>>>>>>
>>>>>>> The problem is: I got stage2-stage3 comparison failure on the single
>>>>>>> file (tree-vect-data-refs.o). After some investigation I understood
>>>>>>> that tree-vect-data-refs.o differs being compiled with and without
>>>>>>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is  =
=E2=80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99,
>>>>>>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to si=
mplify things I can
>>>>>>> reproduce this difference on the same build (even not bootstrapped),
>>>>>>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it).
>>>>>>>
>>>>>>> Of course, the file compiled with =E2=80=93g option will contain de=
bug
>>>>>>> information and will differ from the corresponding file without deb=
ug
>>>>>>> information. I mean there is the difference reported by
>>>>>>> contrib/compare-debug (I mean we talk about stripped files).
>>>>>>>
>>>>>>> Then I compared assemblers and lim dump files and made a conclusion
>>>>>>> the difference is:
>>>>>>>
>>>>>>> There is statement _484=3Dsomething, which is conditionally moved o=
ut of
>>>>>>> loop X. In non debug cases no usages of _484 are left inside the lo=
op
>>>>>>> X. In debug case, there is the single use in debug statement. So for
>>>>>>> debug case my patch generates phi statement for _484 to make it
>>>>>>> available inside the loop X. For non debug case, no such phi statem=
ent
>>>>>>> generated as there is no uses of _484.
>>>>>>>
>>>>>>> There is one more statement with lhs=3D_493, with exactly this patt=
ern
>>>>>>> of use. But this is not important for our discussion.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So, in my opinion it=E2=80=99s ok to generate additional phi node f=
or debug
>>>>>>> case. But I=E2=80=99m not a compiler expert and maybe there is some
>>>>>>> requirement that debug and non-debug versions should differ only by
>>>>>>> debug statements, I don=E2=80=99t know.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> My temporary hack fix is just removing of such debug statements (no
>>>>>>> debug statements, no problems J).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The alternatives I see are:
>>>>>>>
>>>>>>> -          Move debug statements outside the loop (not good in my o=
pinion);
>>>>>>>
>>>>>>> -          Somehow reset values in debug statements (not good in my
>>>>>>> opinion, if I could provide correct information for them);
>>>>>>>
>>>>>>> -          Generate phi for debug statements and fix bootstrap (say,
>>>>>>> let=E2=80=99s have some list of files, which we have special check =
for. I mean
>>>>>>> for my case, the difference is not in stage2 and stage3, it=E2=80=
=99s in debug
>>>>>>> and non-debug). I like this variant. However, as I don=E2=80=99t se=
e such list
>>>>>>> of special files in the whole gcc, I think I might be missing
>>>>>>> something.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Evgeniya
>>
>>
>>
>> --
>> Thanks,
>>
>> Evgeniya
>>
>> perfstories.wordpress.com


--=20
Thanks,

Evgeniya

perfstories.wordpress.com