From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-402990-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 57194 invoked by alias); 15 Jul 2015 10:49:55 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 57185 invoked by uid 89); 15 Jul 2015 10:49:55 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-oi0-f42.google.com
Received: from mail-oi0-f42.google.com (HELO mail-oi0-f42.google.com) (209.85.218.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 15 Jul 2015 10:49:50 +0000
Received: by oiab3 with SMTP id b3so25559437oia.1        for <gcc-patches@gcc.gnu.org>; Wed, 15 Jul 2015 03:49:48 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.202.49.212 with SMTP id x203mr2950984oix.81.1436957388227; Wed, 15 Jul 2015 03:49:48 -0700 (PDT)
Received: by 10.76.115.167 with HTTP; Wed, 15 Jul 2015 03:49:48 -0700 (PDT)
In-Reply-To: <CANVW7MMRX_HUfbheEyn1OFL7x3x=1Qo=0=70y-Rcc13DmcSvsA@mail.gmail.com>
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw@mail.gmail.com>	<CANVW7MP0Uv3LSSd=JfD5hTnMv75SpQ1WMrV_eWMa9owZ3T+yYg@mail.gmail.com>	<CAFiYyc3Do3Mn+eAgQGpbZCtzUy4y=FvbbtuPo2xkPwi=y=U2hg@mail.gmail.com>	<CANVW7MNHfPnXeU0bvQRQxjfX0Cq3Fm2ZbQBBowzt36WYg8Zhng@mail.gmail.com>	<CAFiYyc2w_f5WJrnyAgDf=Y8haqjDvF_d4RgqoHq_jwf5oopDdg@mail.gmail.com>	<CANVW7MN6jczCDYsvuYhTWSNTQnARfRUjWTxZpqAj8v868Beo8w@mail.gmail.com>	<CAFiYyc2quZ6WkyWhN0H9J_HS7HKX32qeE-jAW5EKW2DiH2WJZg@mail.gmail.com>	<CANVW7MMRX_HUfbheEyn1OFL7x3x=1Qo=0=70y-Rcc13DmcSvsA@mail.gmail.com>
Date: Wed, 15 Jul 2015 10:58:00 -0000
Message-ID: <CAFiYyc0kjcEyGPa7pNjuc1UUnjD+uM1PMpq0iHQ8c-_zd3rniQ@mail.gmail.com>
Subject: Re: conditional lim
From: Richard Biener <richard.guenther@gmail.com>
To: Evgeniya Maenkova <evgeniya.maenkova@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-07/txt/msg01251.txt.bz2

On Wed, Jul 15, 2015 at 10:36 AM, Evgeniya Maenkova
<evgeniya.maenkova@gmail.com> wrote:
> On Tue, Jul 14, 2015 at 2:54 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Jun 29, 2015 at 4:21 PM, Evgeniya Maenkova
>> <evgeniya.maenkova@gmail.com> wrote:
>>> On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova
>>>> <evgeniya.maenkova@gmail.com> wrote:
>>>>> On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova
>>>>>> <evgeniya.maenkova@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Here is some explanation. I hope you let me know if I need to clari=
fy something.
>>>>>>>
>>>>>>> Also, you asked me about concrete example, to make sure you don=E2=
=80=99t miss
>>>>>>> my answer here is the link:
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html.
>>>>>>>
>>>>>>> Also, I doubt whether it=E2=80=99s convenient for you to create a b=
uild with
>>>>>>> my patch or not. May be to clarify things you could send me some
>>>>>>> examples/concrete cases, then I=E2=80=99ll compile them with
>>>>>>> =E2=80=93fdump-tree-loopinit-details and =E2=80=93fdump-tree-lim-de=
tails and send you
>>>>>>> these dumps. May be these dumps will be useful. (I=E2=80=99ll only =
disable
>>>>>>> cleanup_cfg TODO after lim to let you know the exact picture after
>>>>>>> lim).
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> 1.       invariantness _dom_walker =E2=80=93
>>>>>>>
>>>>>>> 1.1   for each GIMPLE_COND in given bb calls handle_cond_stmt to ca=
ll
>>>>>>> for true and false edges handle_branch_edge, which calls SET_TARGET=
_OF
>>>>>>> for all bb =E2=80=98predicated=E2=80=99 by given GIMPLE_COND.
>>>>>>>
>>>>>>> SET_TARGET_OF sets in basic_blocks aux 2 facts:
>>>>>>>
>>>>>>> a)      this is true or false edge;
>>>>>>>
>>>>>>> b)      link to cond stmt;
>>>>>>>
>>>>>>> Handle_branch_edge works this way:
>>>>>>>
>>>>>>> If (cond1)
>>>>>>>
>>>>>>>   {
>>>>>>>
>>>>>>>      bb1;
>>>>>>>
>>>>>>>      if (cond2}
>>>>>>>
>>>>>>>        {
>>>>>>>
>>>>>>>            bb2;
>>>>>>>
>>>>>>>         }
>>>>>>>
>>>>>>>    Being called for cond1, it sets cond1 as condition for both bb1 =
and
>>>>>>> bb2 (the whole branch for cond1, ie also for bb containing cond2),
>>>>>>> then this method will be called (as there is dominance order) for
>>>>>>> cond2 to correct things (ie to set cond2 as condition for bb2).
>>>>>>
>>>>>> Hmm, why not track the current condition as state during the DOM walk
>>>>>> and thus avoid processing more than one basic-block in handle_branch=
_edge?
>>>>>> Thus get rid of handle_branch_edge and instead do everything in hand=
le_cond_stmt
>>>>>> plus the dom-walkers BB visitor?
>>>>>>
>>>>> I need to look more carefully how to implement it, but I think I
>>>>> understand what you mean and this optimization of course looks
>>>>> reasonable to me. Will do.
>>>>>
>>>>>> I see you don't handle BBs with multiple predecessors - that's ok, b=
ut
>>>>>> are you sure you don't run into correctness issues when not marking =
such
>>>>>> BBs as predicated?  This misses handling of, say
>>>>>>
>>>>>>  if (a || b)
>>>>>>    bb;
>>>>>>
>>>>>> which is a pity (but can be fixed later if desired).
>>>>>>
>>>>> I had some test (in gcc testsuite or bootstrap build) which worked
>>>>> incorrectly because of multiple predecessors. As far as I remember the
>>>>> situation was (further, will make some notes about such tests to
>>>>> clarify this better), I mean with previous version of my code which
>>>>> handled bb with 2 predecessors:
>>>>> if (a)
>>>>>   tmpvar=3Dsomething;
>>>>> while()
>>>>>   if (a || b)
>>>>>       basic_block {do something with tmpvar;} // I mean basic block
>>>>> predicated by bb with a and bb with b
>>>>>
>>>>> So, if a is false, I mean we didn't do tmpvar=3Dsomething (outside
>>>>> loop), BUT we are in basick_block  (we went by bb with b), we got
>>>>> Segmentation falt in basic_block {do something with tmpvar;}.
>>>>>
>>>>> I think we can reproduce all the details of this test if I remove not
>>>>> handling bb with 2 predecessors.
>>>>>
>>>>> So I wouldn't move bb with 2 predecessors (this is not always executed
>>>>> bb in any loop, not conditionally, they will not be moved at all).
>>>>>
>>>>> This is my more detail explanation on this point. Perhaps, I didn't
>>>>> understand your question about correctness. Could you repeat it in
>>>>> other words (based on this new clarification).
>>>>>
>>>>> So I think according to current code it will not be moved. What
>>>>> incorrectness do you mean?
>>>>
>>>> If the block isn't marked as predicated the question is whether it is
>>>> handled correctly or assumed to be unconditionally executed.
>>>>
>>>>>> I note that collecting predicates has similarities to what if-conver=
sion
>>>>>> does in tree-ifcvt.c (even if its implementation is completely diffe=
rent,
>>>>>> of course).
>>>>>>
>>>>>
>>>>> Ok, I'll look at this. But could you please clarify your point?
>>>>> (Should I just take this into account with low priority and look at
>>>>> this later or you want some refactoring?)
>>>>
>>>> I just noted similar code exists elsewhere - it may be possible to
>>>> factor it out but I didn't investigate.  And no, doing that isn't a pr=
erequesite
>>>> for this patch.
>>>>
>>>>>>> 1.2   As 1.1 goes we identify whether some bb is predicated by some
>>>>>>> condition or not.
>>>>>>>
>>>>>>> bb->aux->type will be [TRUE/FALSE]_TARGET_OF and
>>>>>>> bb->aux->cond_stmt=3Dcond stmt (the nearest condition).
>>>>>>>
>>>>>>> If bb is always executed bb->aux->type =3D ALWAYS_EXECUTED_IN,
>>>>>>> bb->loop->loop (this info was available in the clean build).
>>>>>>>
>>>>>>> 1.3   As this walker is called in dominance order, information about
>>>>>>> condition is available when invariantness_dom_walker is called for
>>>>>>> given bb.  So we can make some computations based on bb->aux
>>>>>>> structure. This is done in check_conditionally_executed. The main g=
oal
>>>>>>> of this method is to check that the root condition is always execut=
ed
>>>>>>> in the loop. I did so to avoid situation like this
>>>>>>>
>>>>>>> Loop:
>>>>>>>
>>>>>>>    Jmp somewhere;
>>>>>>>
>>>>>>>   If (cond1)
>>>>>>>
>>>>>>>       If (cond2)
>>>>>>>
>>>>>>>          Etc
>>>>>>>
>>>>>>> By =E2=80=98root condition=E2=80=99 I mean cond1 in this case (the =
first cond in
>>>>>>> dominance order).
>>>>>>
>>>>>> Ok, but this can't happen (an uncontional jump to somehwere).  It
>>>>>> will always be a conditional jump (a loop exit) and thus should
>>>>>> be handled already.
>>>>>>
>>>>>> Did you have a testcase that was mishandled?
>>>>>
>>>>> No, I have not such testcase. This is because I;m newbie at gcc and
>>>>> compilers. If you tell me this fact, let's just remove this check?
>>>>> Correct?
>>>>
>>>> Yes.
>>>>
>>>>>>
>>>>>>> If  =E2=80=98root condition=E2=80=99 is not always executed we disa=
ble (free) all the
>>>>>>> info in bb->aux, ie all the blocks becomes neither conditionally nor
>>>>>>> always executed according to bb->aux.
>>>>>>>
>>>>>>> There is some optimization to don=E2=80=99t go each time trough the=
 conditions
>>>>>>> chain (cond2->cond1), let me skip such details for now.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 1.4   Then we calculate tgt_level  (which is used in move_computati=
ons later)
>>>>>>>
>>>>>>> The patch is the same for phi and regular stmt (calculate_tgt_level)
>>>>>>>
>>>>>>> 1)      If stmt is conditionally executed we compute max movement
>>>>>>> (determine_max_movement) starting from
>>>>>>> get_lim_data(condition)->max_loop.
>>>>>>>
>>>>>>> 2)      If stmt is not cond executed as start level  for
>>>>>>> determine_max_movement computations we choose ALWAYS_EXECUTED_IN.
>>>>>>>
>>>>>>> To clarify why, see the comment
>>>>>>>
>>>>>>> /* The reason why we don't set start_level for MOVE_POSSIBLE
>>>>>>>
>>>>>>>           statements to more outer level is: this statement could d=
epend on
>>>>>>>
>>>>>>>           conditional statements and even probably is not defined w=
ithout this
>>>>>>>
>>>>>>>           condition. So either we should analyze these ones and move
>>>>>>>
>>>>>>>           under condition somehow or keep more strong start_level .=
 */
>>>>>>>
>>>>>>> As you noted in your review there was some refactoring. Of course, I
>>>>>>> had no goal to refactor existing code, I intended to remove some
>>>>>>> duplication which I caused by my changes.  I hope we discuss in
>>>>>>> details later if you don=E2=80=99t like some refactoring.
>>>>>>
>>>>>> Refactoring makes a big patch harder to review because it ends up
>>>>>> containing no-op changes.  It also makes it appear bigger than it
>>>>>> really is ;)
>>>>>>
>>>>>> A reasonable solution is to factor out the refactoring to a separate=
 patch.
>>>>>>
>>>>> I see what you mean.
>>>>>
>>>>>>> 2.       store_motion  - for some stmts set flags to ignore predica=
ted
>>>>>>> execution (I mean to move statements without conditions).
>>>>>>
>>>>>> That's for the existing "predicated" support as far as I can see.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3.       move_computations. The patch is doing something in the
>>>>>>> following 3 calls inside of this method.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3.1   (more details below) walker.walk =E2=80=93 move computations(=
create if()
>>>>>>> structure) and store information to create phi nodes later (which
>>>>>>> statements require phi nodes, as they conditionally executed and th=
ere
>>>>>>> are still their uses on other levels, what are the names for such p=
hi
>>>>>>> nodes, as we replace uses in here, in walker.walk.)
>>>>>>
>>>>>> What kind of uses are that?  By definition all stmts executed
>>>>>> under condition A that are used in code not under condition A have
>>>>>> a PHI node associated.  This is why the existing conditional movement
>>>>>> handling moves PHI nodes (and all dependent computations).
>>>>>>
>>>>>> Are you thinking of
>>>>>>
>>>>>>   for (;;)
>>>>>>     if (a) x =3D 1;
>>>>>>
>>>>>>   .. use of x
>>>>>>
>>>>>> ?
>>>>> No, I think in this case phi node of x will be used in initial
>>>>> (=3D=3Dloopinit) code (in 'use of x') instead of some version of x (w=
hich
>>>>> I move).
>>>>>
>>>>>>If you move the PHI node for x then you don't need to insert any other
>>>>>> PHI nodes (and you _do_ have to move the PHI node anyway).
>>>>>
>>>>>>
>>>>>>> 3.2   (more details below) create_phi_nodes =E2=80=93 create phi no=
des for
>>>>>>> statements which require that (see 3.1)
>>>>>>
>>>>>> Only PHI nodes need PHI node creation and only its dependences
>>>>>> might need predication.
>>>>>>
>>>>>> That is, if you move a conditionally executed statement but not
>>>>>> any PHI node it feeds then the movement is useless.
>>>>>>
>>>>>> But maybe I am missing something about the PHI business.
>>>>>>
>>>>> Ok. I have different understanding on this (and am afraid of this of
>>>>> course :) ).
>>>>>
>>>>> See the example:
>>>>> void bar (long);
>>>>> void foo (int cond1, int cond2, int cond3, int m, int n)
>>>>> {
>>>>>   unsigned x;
>>>>>   unsigned inv1;
>>>>>
>>>>>   for (x =3D 0; x < 1000; ++x)
>>>>>     {
>>>>>       if (cond1)
>>>>>         {
>>>>>           int tmp_inv1 =3D m*n;
>>>>>           non_inv1 =3D tmp_inv1 + x;
>>>>>         }
>>>>>     }
>>>>>   bar (non_inv1);
>>>>> }
>>>>>
>>>>> This is working example, will attach loopinit and lim1 dumps.
>>>>> There is no phi node for tmp_inv1 in loopinitat all. However, I think
>>>>> we should move tmp_inv1. To use tmp_inv1 in non_inv1 we need phi node
>>>>> (otherwise we get verification failure as bb with conditionally moved
>>>>> tmp_inv1 doesn;t dominate bb with non_inv1). Another example if
>>>>> non_inv1 is invariant but has not enough cost or something else. In
>>>>> short, if it's still used in initial or others loops. Say, non_inv1 is
>>>>> invariant and moved in another loop. Or something else.
>>>>>
>>>>> So, when I read
>>>>> 'Only PHI nodes need PHI node creation'
>>>>> and
>>>>> 'That is, if you move a conditionally executed statement but not any
>>>>> PHI node it feeds then the movement is useless.'
>>>>> I think about this example and don't understand. Could you clarify?
>>>>
>>>> Ok, so this case is already handled by LIM by simply moving tmp_inv1 =
=3D m*n
>>>> (and making it unconditionally execute before the loop).  In fact whet=
her
>>>> we can move it does not depend on whether 'cond' is invariant or not.
>>>>
>>>> If we couldn't execute m*n unconditionally on the loop preheader edge
>>>> then we'd have to be able to also move the guarding condition.  Note
>>>> that cost considerations shouldn't come into play here.
>>>>
>>>> So I was only thinking of the case where we also want to move the
>>>> condition (which is wanted if we want to move a PHI node).
>>>>
>>>
>>> Don't understand "So I was only thinking of the case where we also
>>> want to move the condition (which is wanted if we want to move a PHI
>>> node)."
>>>
>>> What about the case when we want to move condition but don't have phi
>>> node to move.
>>>
>>> Let's consider a little bit modified example:
>>>
>>> void bar (long);
>>> void foo (int cond1, int cond2, int cond3, int m, int n)
>>>   {
>>>      unsigned x;
>>>      unsigned inv1;
>>>
>>>      for (x =3D 0; x < 1000; ++x)
>>>        {
>>>          if (n)
>>>            {
>>>              int tmp_inv1 =3D m/n;
>>>              non_inv1 =3D tmp_inv1 + x;
>>>             }
>>>        }
>>>      bar (non_inv1);
>>>  }
>>> We couldn't  move tmp_inv1 uncoditionally. Right (I have no clean
>>> build at hands right now to check. If I'm wrong let's replace m/n to
>>> something that couldn't be unconditionally moved :))? There is no phi
>>> node for tmp_inv1.
>>> Could you clarify what conditional lim should do in your opinion here?
>>
>> Nothing at this point.  If only then for the simple reason to make the
>> first iteration of the patch simpler (still covering most cases).
>>
>> You seem to want to catch 100% of all possible cases.
>>
>>> Is it too little gain to move statements that only used in the same bb
>>> or branch (I mean before phi node creation, even if stmt had phi
>>> node)?
>>>
>>> In fact, we could have here 2 thousands of divisions:
>>> if (n)
>>>   {
>>>     int tmp_inv1 =3D m/n;
>>>     tmp_inv2 =3D something_which_couldn't_be moved_unconditionally (tmp=
_inv1);
>>>     ...
>>>     tmp_invN =3D something_which_couldn't_be moved_unconditionally (tmp=
_invN-1);
>>>     non_inv1 =3D tmp_invN + x;
>>>   }
>>
>> Of course we could.  But the patch didn't even exercise this case (no te=
stcase).
>
> Patch handles this case (this is why I created phi nodes which we
> tried to discuss).
> Your comment is about test case not about the patch.

My comment was about the lack of a testcase for this in the patch submissio=
n.

But my whole point is that supporting this should be split off to a followup
patch to make both patches easier to review.

Richard.

>>
>> Please make patches simple - support for moving non-PHIs can be added
>> as a followup.  The first important part is to make the current code
>> (moving PHIs)
>> create control-flow rather than if-converted form.
>>
>> Richard.
>>
>>>
>>>>>>>
>>>>>>> 3.3   replace_uses_in_conditions
>>>>>>>
>>>>>>> After computations movements we can have several copies of the cond
>>>>>>> stmt. In 3.1 we do replacement of uses based on stmt=E2=80=99s tgt_=
level. For,
>>>>>>> cond stmt it doesn=E2=80=99t work. As cond stmt, of course, have so=
mething in
>>>>>>> lim_aux_data->tgt_level, but, in fact, they are moved only if
>>>>>>> corresponding invariants are moved. So, in fact, the same cond (cop=
ies
>>>>>>> of it, of course) could go to the different levels. So to fix these
>>>>>>> uses, we call replace_uses_in_conditions.
>>>>>>
>>>>>> Hmm, but stmts that a condition depends on always have at least the
>>>>>> target level of the condition statement.  Thus their computation is
>>>>>> available in all other "same" conditon statements that might appear
>>>>>> in more inner loop levels, no?  So isn't this just about performing =
the
>>>>>> movement in a proper ordering?
>>>>>
>>>>> Let me give your more details, then may be you repeat question in oth=
er words.
>>>>> for ()
>>>>> {
>>>>>   if (a)
>>>>>     {
>>>>>        inv1 =3D some_computations;
>>>>>        if (inv1 < inv2)
>>>>>             {
>>>>>               inv3;
>>>>>               inv4;
>>>>>             }
>>>>>        }
>>>>> }
>>>>>
>>>>> So, the order for movement is: inv1; inv3; inv4. When we move inv1 we
>>>>> have tgt_level for inv3 and inv4 computed but we have not created
>>>>> copies of 'if (inv1 < inv2)' created. In theory, we can have several
>>>>> copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in
>>>>> initial cycle. if (1) is in tgt level of inv1 we need no replacement,
>>>>> in other cases we need replacements. Of course, we know this(all the
>>>>> info about tgt_levels) when we move inv1, but then we need to create
>>>>> all needed copies (store it somehow, etc) of 'if (inv1 < inv2') and
>>>>> make replacement in these copies. I don't see now why this is much
>>>>> better. This doesn't mean that current solution is perfect, just
>>>>> clarify your thought.
>>>>
>>>> I still don't get what "replacements" actually mean.  Let's extend the
>>>> example to
>>>>
>>>> for ()  (A)
>>>>   {
>>>>> for ()  (B)
>>>>> {
>>>>>   if (a)
>>>>>     {
>>>>>        inv1 =3D some_computations;
>>>>>        if (inv1 < inv2)
>>>>>             {
>>>>>               inv3;
>>>>>               inv4;
>>>>>             }
>>>>>        }
>>>>> }
>>>>  }
>>>>
>>>> then you say the target level of inv1 could be 'A' and the target
>>>> level of inv3 'B'
>>>> and that of inv4 is 'B' as well.
>>>>
>>>> That's true.  So we move inv1 to before 'A' and inv3/inv4 to before 'B=
' guarded
>>>> with if (inv1 < inv2).  I fail to see what "copies" we need here.  'in=
v1' is
>>>> available in all target levels below its own, no copies needed.
>>>>
>>>
>>> I meant the copies of conditions (COND_STMT: inv1 < inv2).
>>>
>>> Let's modify this example: tgt_level (inv1) =3D A; tgt_level(inv3) =3D =
A;
>>> tgt_level(inv4) =3D B.
>>>
>>>  Then
>>> if (a)
>>>    inv1=3Dsome_computations;
>>> if (inv1 < inv2)
>>>    inv3;
>>> <preheader edge for A, where phi for inv1(phi_for_inv1) is computed>
>>> for ()  (A)
>>>  {
>>>  if (phi_for_inv1 <inv2)
>>>    inv3
>>>  for ()  (B)
>>>    {
>>>    if (a)
>>>      {
>>>         if (phi_for_inv1 < inv2)
>>>           {
>>>           }
>>>      }
>>>  }
>>> So, we have 3 copies of inv1 < inv2 after lim (in some of them we use
>>> inv1, in other phi_for_inv1). But this is the second priority for now
>>> (this is all about whether we should move statements w/o phi node).
>>>
>>>>>>
>>>>>>> More details on 3.1 and 3.2
>>>>>>>
>>>>>>> 3.1 The patch is very similar for phi stmts and regular stmts (there
>>>>>>> is some difference for regular stmts related to the case when we mo=
ve
>>>>>>> sequence instead of single stmt, let=E2=80=99s skip it in this desc=
ription,
>>>>>>> will describe later. BTW, there are comments, in corresponding part=
s).
>>>>>>>
>>>>>>> a) for each bb we call check_conditionally_executed and if bb is co=
nd
>>>>>>> executed then invariant statement  will be moved conditionally.
>>>>>>>
>>>>>>> b) prepare_edge_for_stmt collects information needed to create phi
>>>>>>> nodes in 3.2(add_info_for_phi) AND prepares edge ie creates if()
>>>>>>> structure needed to move statement
>>>>>>> (get_block_for_predicated_statement, see below). All of this is done
>>>>>>> for cond executed stmt. Nothing is done for non cond executed.
>>>>>>>
>>>>>>> BTW, there is some strange names for functions like missed ending
>>>>>>> (prepare_edge_for_stm, w/o t in the end, this is because of my scri=
pt
>>>>>>> to add =E2=80=98 =E2=80=98 before =E2=80=98(=E2=80=98, will fix in =
the next patch version, ignore
>>>>>>> please so far).
>>>>>>>
>>>>>>> Get_block_for_predicated_stmt:
>>>>>>>
>>>>>>> Create bb where to add cond stmt to (or return existing, each loop =
has
>>>>>>> a map  (level_cond_data_map is map <loop, cond_data_map> ,
>>>>>>> cond_data_map is map <gimple, basic_block>) to identify if bb for
>>>>>>> given cond was created or not for given level).
>>>>>>>
>>>>>>> Parameters of this method:
>>>>>>>
>>>>>>> Cond =E2=80=93 condition of stmt, branch =E2=80=93 is it on true or=
 false edge of
>>>>>>> cond, cond_map  - see above, level (tgt_level), preheader =E2=80=93
>>>>>>> preheader_edge for level.
>>>>>>>
>>>>>>> So first of all we check whether there is a basic block for given
>>>>>>> condition in given loop.
>>>>>>>
>>>>>>> a)      If such bb exists then we trying to find corresponding edge,
>>>>>>> destination of this edge shouldn=E2=80=99t be a post dominator of bb
>>>>>>> (find_branch_edge).
>>>>>>>
>>>>>>> a.1) if such edge exists we return the most remote in this branch bb
>>>>>>> (or create it), see get_most_remote_in_branch;
>>>>>>>
>>>>>>> a.2) if such edge doesn=E2=80=99t exists, we create corresponding b=
b, see make_branch
>>>>>>>
>>>>>>>        b)  If such bb doesn=E2=80=99t exist we create it (recursive=
ly calling
>>>>>>> get_block_for_predicated_stmt if required bb is conditionally
>>>>>>> executed).
>>>>>>>
>>>>>>> Will not describe it further so far (let me know if you would like =
to
>>>>>>> read additional details).
>>>>>>>
>>>>>>> Add_info_for_phi
>>>>>>>
>>>>>>> The main goal of this method is to identify situation where
>>>>>>> stmt(parameter,ie the statement which we are moving) or more exactly
>>>>>>> lhs of this statement
>>>>>>>
>>>>>>> a)      is still used in other loops, so we need to create phi node=
 to
>>>>>>> use phi name inside other loops.
>>>>>>>
>>>>>>> b)      Is used in phi node of original loop and this phi node is
>>>>>>> going to be moved. BUT, as we move phi node as assignment (in the c=
ase
>>>>>>> of phi node with 2 arguments) we need to create phi node to use lhs=
 of
>>>>>>> statement (param of add_info_for_phi).
>>>>>>>
>>>>>>> In these cases we make corresponding replacements and store
>>>>>>> information needed to create phi nodes and make some other
>>>>>>> replacements in 3.3 (replace_uses_in_conditions).
>>>>>>>
>>>>>>> Add_info_for_phi calls create_tmp_var which requires some explanati=
on.
>>>>>>>
>>>>>>> Create_tmp_var
>>>>>>>
>>>>>>> To create new names for phi nodes and to create default def for phi
>>>>>>> arguments I use make_ssa_name and get_or_create_ssa_default_def. Th=
ese
>>>>>>> methods have some asserts (being merged):   (TREE_CODE (old_var) =
=3D=3D
>>>>>>> VAR_DECL   || TREE_CODE (old_var) =3D=3D PARM_DECL   || TREE_CODE
>>>>>>> (old_var) =3D=3D RESULT_DECL).
>>>>>>>
>>>>>>> So in some cases (when I know that I need to use methods mentioned
>>>>>>> above, see the start of create_tmp_var, ie the uses in other loops)=
 I
>>>>>>> create new tmp variable to overcome these asserts. To be honest I
>>>>>>> don=E2=80=99t like this but don=E2=80=99t know how to deal with thi=
s better.
>>>>>>
>>>>>> Hmm, don't you just run into SSA names here?  That is, TREE_CODE
>>>>>> (old_var) =3D=3D SSA_NAME?
>>>>>> You can use make_ssa_name (TREE_TYPE (old_var), "plim") in this case.
>>>>>>
>>>>>> Or you run into the issue that anonymous SSA names cannot have defau=
lt
>>>>>> defintions (I don't understand why you need all the fuss about defau=
lt defs).
>>>>>
>>>>> I used default defs to write something in phi args when I have no val=
ues on
>>>>> corresponding branches. Say there is some tmp value which I should
>>>>> create phi for.
>>>>> if (a)
>>>>>    {
>>>>>      tmpvar=3Dsomething
>>>>>      b =3D something(tmpvar);
>>>>>     }
>>>>> else
>>>>>    {
>>>>>    }
>>>>> When I move if (a) {tmpvar =3D something} and create phi for tmpvar (=
to
>>>>> use phi result
>>>>> in the initial cycle) i need to write something for 'not a' phi arg.
>>>>> BUT, in some cases I use
>>>>>  build_zero_cst (which I started to use after default defs use. I mean
>>>>> I didn't know about
>>>>>  build_zero_cst while I started to use default defs.). So may be I ca=
n replace
>>>>> uses of default defs in such cases by build_zero_cst? I mean just SOM=
E values
>>>>> (as I know that control flow will never go to this places actually). I
>>>>> mean if !a tmpvar will not used.
>>>>
>>>> Ok, this is again about the case where you are not moving a PHI node b=
ut
>>>> creating it because you want to conditionally execute invariant stmts.
>>>> Yes, a default def is ok here, but you can always use a new VAR_DECL
>>>> just for the default def creation like gimple_fold_call does for examp=
le:
>>>>
>>>>                           tree var =3D create_tmp_var (TREE_TYPE (lhs)=
);
>>>>                           tree def =3D get_or_create_ssa_default_def (=
cfun, var);
>>>>
>>>> Note that I think we don't need to bother about this case as we'd only
>>>> want to move PHI nodes in the first place (so you already have values
>>>> for all PHI args).
>>>>
>>>>>>
>>>>>>> And a couple of words regarding where we store info for phi nodes:
>>>>>>>
>>>>>>> Each loop contains in its aux  phi_data structure. On this stage we
>>>>>>> only add there stmt (if phi node will be required), phi name=3Dphi =
node
>>>>>>> result name in names_map or fl_names_map). See the comment about
>>>>>>> phi_data in patch.
>>>>>>>
>>>>>>> If there should be created several phi nodes for given lhs (I mean =
the
>>>>>>> first place for phi node doesn=E2=80=99t dominate on corresponding =
preheader,
>>>>>>> we add only the last name in names_map (as intermediate names could=
 be
>>>>>>> created later, but the last name in bb which dominates preheader
>>>>>>> should be used for replacement in other loops. Replacements were
>>>>>>> already made in walker).
>>>>>>>
>>>>>>> If lhs is used only in phi node, which are moved and transformed in=
to
>>>>>>> assignment, and there is no uses in other loops, we need to create
>>>>>>> only first level phi node (don=E2=80=99t need to create phi nodes t=
ill some bb
>>>>>>> which dominates preheader), then we add such name to fl_names_map.
>>>>>>>
>>>>>>> 3.2 Create_phi_nodes
>>>>>>>
>>>>>>> For each of the loops we go over ((phi_data*) loop->aux)->stmts. Th=
ese
>>>>>>> are statements which were moved, but they have uses in the original
>>>>>>> loop (more exactly, their uses in some other loops replaced by some
>>>>>>> name from ((phi_data*) loop->aux)->names_map or ((phi_data*)
>>>>>>> loop->aux)->fl_names_map.
>>>>>>>
>>>>>>> So for each of such stmts we create phi nodes (for its lhs) chain.
>>>>>>>
>>>>>>> Basic block for phi node is found in get_bb_for_phi  for given bb w=
ith
>>>>>>> stmt. If basic block for phi node dominates preheader->src we end
>>>>>>> chain creation. There is some special condition for the case when we
>>>>>>> need to create only first level phi node. (I described this situati=
on
>>>>>>> above, but let me know If I need to add  more details.
>>>>>>>
>>>>>>> Basic blocks can have maps to identify if we created phi node for
>>>>>>> given variable in given basic_block, so we only add an edge argument
>>>>>>> for such case (phi_map =3D new hash_map<basic_block, hash_map<tree,
>>>>>>> gphi*>*>).
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>> Well - I think it might be easier code-generation-wise to have a sep=
arate
>>>>>> phase move_phis executed before move_computations, that just moves
>>>>>> invariant PHI nodes (which this all is about - see above) and their
>>>>>> controlling conditions and only then move their dependencies.
>>>>> I didn;t understand so far (I think let's clarify previous questions,=
especially
>>>>> about phi nodes then let's go back to this one.)
>>>>
>>>> Yeah, I think the PHI node thing is our major mis-understanding (or the
>>>> main thing that I think complicates the patch for too little gain - at=
 least
>>>> initially).
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> In short, that=E2=80=99s all.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Evgeniya
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
>>>>>>> <evgeniya.maenkova@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Could you please review my patch for predicated lim?
>>>>>>>>
>>>>>>>> Let me note some details about it:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 1)      Phi statements are still moved only if they have 1 or 2
>>>>>>>> arguments. However, phi statements could be move under conditions =
(as
>>>>>>>> it=E2=80=99s done for the other statements).  Probably, phi statem=
ent motion
>>>>>>>> with 3 + arguments could be implemented in the next patch after
>>>>>>>> predicated lim.
>>>>>>>>
>>>>>>>> 2)      Patch has limitations/features like (it was ok to me to
>>>>>>>> implement it such way, maybe I=E2=80=99m not correct. ):
>>>>>>>>
>>>>>>>> a)      Loop1
>>>>>>>>
>>>>>>>>     {
>>>>>>>>
>>>>>>>>               If (a)
>>>>>>>>
>>>>>>>>                  Loop2
>>>>>>>>
>>>>>>>>                      {
>>>>>>>>
>>>>>>>>                        Stmt - Invariant for Loop1
>>>>>>>>
>>>>>>>>                      }
>>>>>>>>
>>>>>>>>              }
>>>>>>>>
>>>>>>>>        In this case Stmt will be moved only out of Loop2, because =
of if (a).
>>>>>>>>
>>>>>>>> b)      Or
>>>>>>>>
>>>>>>>> Loop1
>>>>>>>>
>>>>>>>>     {
>>>>>>>>
>>>>>>>>          =E2=80=A6
>>>>>>>>
>>>>>>>>          If (cond1)
>>>>>>>>
>>>>>>>>               If (cond2)
>>>>>>>>
>>>>>>>>                   If (cond3)
>>>>>>>>
>>>>>>>>                       Stmt;
>>>>>>>>
>>>>>>>>        }
>>>>>>>>
>>>>>>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>>>>>>
>>>>>>>> c)       It took me a long time to write all of these code, so the=
re
>>>>>>>> might be other peculiarities which I forgot to mention. :)
>>>>>>>>
>>>>>>>>                        Let=E2=80=99s discuss these ones as you wil=
l review my patch.
>>>>>>>>
>>>>>>>> 3)      Patch consists of 9 files:
>>>>>>>>
>>>>>>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests:
>>>>>>>>
>>>>>>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>>>>>>> predicated lim moves 2 more statements out of the loop;
>>>>>>>>
>>>>>>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with =
conditional
>>>>>>>> lim recip optimization in this test doesn=E2=80=99t work (the corr=
esponding
>>>>>>>> value is below threshold as I could see in the code for recip, 1<3=
).
>>>>>>>> So to have recip working in this test I changed test a little bit.
>>>>>>>>
>>>>>>>> b)      gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se
>>>>>>>>
>>>>>>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>>>>>>
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>>>>>>
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>>>>>>
>>>>>>>> the tests for conditional lim.
>>>>>>>>
>>>>>>>> 4)      Patch testing:
>>>>>>>>
>>>>>>>> a)      make =E2=80=93k check (no difference in results for me for=
 the clean
>>>>>>>> build and the patched one,
>>>>>>>>
>>>>>>>> -          Revision: 222849,
>>>>>>>>
>>>>>>>> -          uname -a
>>>>>>>>
>>>>>>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue=
 Oct
>>>>>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>>>>>>
>>>>>>>> b)      Bootstrap.
>>>>>>>>
>>>>>>>>  It goes well now, however to fix it I have made a temporary hack =
in
>>>>>>>> the lim code. And with this fix patch definitely shouldn=E2=80=99t=
 be
>>>>>>>> committed.
>>>>>>>>
>>>>>>>> I did so, as I would like to discuss this issue first.
>>>>>>>>
>>>>>>>> The problem is: I got stage2-stage3 comparison failure on the sing=
le
>>>>>>>> file (tree-vect-data-refs.o). After some investigation I understood
>>>>>>>> that tree-vect-data-refs.o differs being compiled with and without
>>>>>>>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is =
 =E2=80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99,
>>>>>>>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to s=
implify things I can
>>>>>>>> reproduce this difference on the same build (even not bootstrapped=
),
>>>>>>>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it).
>>>>>>>>
>>>>>>>> Of course, the file compiled with =E2=80=93g option will contain d=
ebug
>>>>>>>> information and will differ from the corresponding file without de=
bug
>>>>>>>> information. I mean there is the difference reported by
>>>>>>>> contrib/compare-debug (I mean we talk about stripped files).
>>>>>>>>
>>>>>>>> Then I compared assemblers and lim dump files and made a conclusion
>>>>>>>> the difference is:
>>>>>>>>
>>>>>>>> There is statement _484=3Dsomething, which is conditionally moved =
out of
>>>>>>>> loop X. In non debug cases no usages of _484 are left inside the l=
oop
>>>>>>>> X. In debug case, there is the single use in debug statement. So f=
or
>>>>>>>> debug case my patch generates phi statement for _484 to make it
>>>>>>>> available inside the loop X. For non debug case, no such phi state=
ment
>>>>>>>> generated as there is no uses of _484.
>>>>>>>>
>>>>>>>> There is one more statement with lhs=3D_493, with exactly this pat=
tern
>>>>>>>> of use. But this is not important for our discussion.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So, in my opinion it=E2=80=99s ok to generate additional phi node =
for debug
>>>>>>>> case. But I=E2=80=99m not a compiler expert and maybe there is some
>>>>>>>> requirement that debug and non-debug versions should differ only by
>>>>>>>> debug statements, I don=E2=80=99t know.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> My temporary hack fix is just removing of such debug statements (no
>>>>>>>> debug statements, no problems J).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The alternatives I see are:
>>>>>>>>
>>>>>>>> -          Move debug statements outside the loop (not good in my =
opinion);
>>>>>>>>
>>>>>>>> -          Somehow reset values in debug statements (not good in my
>>>>>>>> opinion, if I could provide correct information for them);
>>>>>>>>
>>>>>>>> -          Generate phi for debug statements and fix bootstrap (sa=
y,
>>>>>>>> let=E2=80=99s have some list of files, which we have special check=
 for. I mean
>>>>>>>> for my case, the difference is not in stage2 and stage3, it=E2=80=
=99s in debug
>>>>>>>> and non-debug). I like this variant. However, as I don=E2=80=99t s=
ee such list
>>>>>>>> of special files in the whole gcc, I think I might be missing
>>>>>>>> something.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Evgeniya
>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Evgeniya
>>>
>>> perfstories.wordpress.com
>
>
>
> --
> Thanks,
>
> Evgeniya
>
> perfstories.wordpress.com