From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-398983-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 21785 invoked by alias); 27 May 2015 13:07:54 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 21764 invoked by uid 89); 27 May 2015 13:07:52 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-lb0-f180.google.com
Received: from mail-lb0-f180.google.com (HELO mail-lb0-f180.google.com) (209.85.217.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 27 May 2015 13:07:42 +0000
Received: by lbbqq2 with SMTP id qq2so6735678lbb.3        for <gcc-patches@gcc.gnu.org>; Wed, 27 May 2015 06:07:39 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.112.159.162 with SMTP id xd2mr27182541lbb.67.1432732059470; Wed, 27 May 2015 06:07:39 -0700 (PDT)
Received: by 10.112.1.100 with HTTP; Wed, 27 May 2015 06:07:39 -0700 (PDT)
In-Reply-To: <CAFiYyc3sR2Z0GVZ+kiQ99GUWyYmuSTNnCdmcPLnFO9aOAEh2NA@mail.gmail.com>
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw@mail.gmail.com>	<CAFiYyc0jxE958ROvpAEZb53xVOgn=wjm3XSLn-92SiZ3+3t0OA@mail.gmail.com>	<CANVW7MN_LYqO-p4Vh2LQ6vSEc6RRUMhUo=ydfjr17kGW=XeVLg@mail.gmail.com>	<CAFiYyc3sR2Z0GVZ+kiQ99GUWyYmuSTNnCdmcPLnFO9aOAEh2NA@mail.gmail.com>
Date: Wed, 27 May 2015 13:36:00 -0000
Message-ID: <CANVW7MNTf8DJrXrv=kG3WimkqorSwtNjgBUpW4tygja7vaZhJg@mail.gmail.com>
Subject: Re: conditional lim
From: Evgeniya Maenkova <evgeniya.maenkova@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-05/txt/msg02421.txt.bz2

On Wed, May 27, 2015 at 2:11 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
> <evgeniya.maenkova@gmail.com> wrote:
>> Hi, Richard
>>
>> Thanks for review starting.
>>
>> Do you see any major issues with this patch (i.e. algorithms and ideas
>> that should be completely replaced, effectively causing the re-write
>> of most code)?
>>
>> To decide if there are major issues in the patch, perhaps, you need
>> additional clarifications from me? Could you point at the places where
>> additional explanations could save you most effort?
>>
>> Your answers to these questions are looking the first priority ones.
>> You wrote about several issues in the code, which are looking as easy
>> (or almost easy ;) to fix(inline functions, unswitch-loops flag,
>> comments, etc). But, I think you agree, let=E2=80=99s first decide about=
 the
>> major issues (I mean, whether we continue with this patch or starting
>> new one, this will save a lot of time for both of us).
>
> I didn't get an overall idea on how the patch works, that is, how it inte=
grates
> with the existing algorithm.  If you can elaborate on that a bit that wou=
ld
> be helpful.
>
Hi,
Sure, I'll write you some notes in several days.

> I think the code-generation part needs some work (whether by following
> my idea with re-using copy_bbs or whether by basically re-implementing
> it is up to debate).  How does your code handle
>
>   for ()
>     {
>        if (cond1)
>         {
>            if (cond2)
>              invariant;
>            if (cond3)
>              invariant;
>         }
>     }
>
> ?  Optimally we'd have before the loop exactly the same if () structure
> (thus if (cond1) is shared).

If both invariants are going out of the same loop (i mean tgt_level),
then if structure will be the same.
for1()
  for ()
    {
      if (cond1)
        {
          if (cond2)
            invariant1;
          if (cond3)
            invariant2;
        }
    }

will be transformed to
for1()
  if (cond1)
    {
      if (cond2)
        invariant1;
      if (cond3)
        invariant2;
     }
  }
  for ()
    {
      if (cond1)
        {
          if (cond2);
          if (cond3);
        }
    }
(I don't cleanup empty if's in lim code).


If these invarians are moved in different loops then

for1
  for2()
    for()
      {
        if (cond1)
          {
            if (cond2)
              invariant1;
            if (cond3)
              invariant2;
           }
       }

will be transformed to:
for1
  {
    if (cond1)
      if (cond2)
        invariant1;
    for2()
      {
         if (cond1)
           if (cond3)
             invariant2;
         for()
           {
              if (cond1)
               {
                 if (cond2);
                 if (cond3);
               }
           }
        }
   }

Of course, there could be some bugs, but the idea was as mentioned above.

This transformation was looking logical to me. What do you think?

Thanks,

Evgeniya

>
> Richard.


>
>
>> Thanks,
>>
>> Evgeniya
>>
>> On Tue, May 26, 2015 at 2:31 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
>>> <evgeniya.maenkova@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Could you please review my patch for predicated lim?
>>>>
>>>> Let me note some details about it:
>>>>
>>>>
>>>>
>>>> 1)      Phi statements are still moved only if they have 1 or 2
>>>> arguments. However, phi statements could be move under conditions (as
>>>> it=E2=80=99s done for the other statements).  Probably, phi statement =
motion
>>>> with 3 + arguments could be implemented in the next patch after
>>>> predicated lim.
>>>>
>>>> 2)      Patch has limitations/features like (it was ok to me to
>>>> implement it such way, maybe I=E2=80=99m not correct. ):
>>>>
>>>> a)      Loop1
>>>>
>>>>     {
>>>>
>>>>               If (a)
>>>>
>>>>                  Loop2
>>>>
>>>>                      {
>>>>
>>>>                        Stmt - Invariant for Loop1
>>>>
>>>>                      }
>>>>
>>>>              }
>>>>
>>>>        In this case Stmt will be moved only out of Loop2, because of i=
f (a).
>>>>
>>>> b)      Or
>>>>
>>>> Loop1
>>>>
>>>>     {
>>>>
>>>>          =E2=80=A6
>>>>
>>>>          If (cond1)
>>>>
>>>>               If (cond2)
>>>>
>>>>                   If (cond3)
>>>>
>>>>                       Stmt;
>>>>
>>>>        }
>>>>
>>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>>
>>>> c)       It took me a long time to write all of these code, so there
>>>> might be other peculiarities which I forgot to mention. :)
>>>>
>>>>                        Let=E2=80=99s discuss these ones as you will re=
view my patch.
>>>>
>>>> 3)      Patch consists of 9 files:
>>>>
>>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests:
>>>>
>>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>>> predicated lim moves 2 more statements out of the loop;
>>>>
>>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with cond=
itional
>>>> lim recip optimization in this test doesn=E2=80=99t work (the correspo=
nding
>>>> value is below threshold as I could see in the code for recip, 1<3).
>>>> So to have recip working in this test I changed test a little bit.
>>>>
>>>> b)      gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se
>>>>
>>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>>
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>>
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>>
>>>> the tests for conditional lim.
>>>>
>>>> 4)      Patch testing:
>>>>
>>>> a)      make =E2=80=93k check (no difference in results for me for the=
 clean
>>>> build and the patched one,
>>>>
>>>> -          Revision: 222849,
>>>>
>>>> -          uname -a
>>>>
>>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>>
>>>> b)      Bootstrap.
>>>>
>>>>  It goes well now, however to fix it I have made a temporary hack in
>>>> the lim code. And with this fix patch definitely shouldn=E2=80=99t be
>>>> committed.
>>>>
>>>> I did so, as I would like to discuss this issue first.
>>>>
>>>> The problem is: I got stage2-stage3 comparison failure on the single
>>>> file (tree-vect-data-refs.o). After some investigation I understood
>>>> that tree-vect-data-refs.o differs being compiled with and without
>>>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is  =E2=
=80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99,
>>>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to simpl=
ify things I can
>>>> reproduce this difference on the same build (even not bootstrapped),
>>>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it).
>>>>
>>>> Of course, the file compiled with =E2=80=93g option will contain debug
>>>> information and will differ from the corresponding file without debug
>>>> information. I mean there is the difference reported by
>>>> contrib/compare-debug (I mean we talk about stripped files).
>>>>
>>>> Then I compared assemblers and lim dump files and made a conclusion
>>>> the difference is:
>>>>
>>>> There is statement _484=3Dsomething, which is conditionally moved out =
of
>>>> loop X. In non debug cases no usages of _484 are left inside the loop
>>>> X. In debug case, there is the single use in debug statement. So for
>>>> debug case my patch generates phi statement for _484 to make it
>>>> available inside the loop X. For non debug case, no such phi statement
>>>> generated as there is no uses of _484.
>>>>
>>>> There is one more statement with lhs=3D_493, with exactly this pattern
>>>> of use. But this is not important for our discussion.
>>>>
>>>>
>>>>
>>>> So, in my opinion it=E2=80=99s ok to generate additional phi node for =
debug
>>>> case. But I=E2=80=99m not a compiler expert and maybe there is some
>>>> requirement that debug and non-debug versions should differ only by
>>>> debug statements, I don=E2=80=99t know.
>>>
>>> As Andi said code generation needs to be the same.
>>>
>>>> My temporary hack fix is just removing of such debug statements (no
>>>> debug statements, no problems J).
>>>
>>> That's fine and generally what is done in other passes.
>>>
>>>> The alternatives I see are:
>>>>
>>>> -          Move debug statements outside the loop (not good in my opin=
ion);
>>>>
>>>> -          Somehow reset values in debug statements (not good in my
>>>> opinion, if I could provide correct information for them);
>>>
>>> You could re-set them via gimple_debug_bind_reset_value which
>>> will tell the user that at this point the value is optimized out
>>> (that's slightly
>>> better than removing the debug stmts).
>>>
>>>> -          Generate phi for debug statements and fix bootstrap (say,
>>>> let=E2=80=99s have some list of files, which we have special check for=
. I mean
>>>> for my case, the difference is not in stage2 and stage3, it=E2=80=99s =
in debug
>>>> and non-debug). I like this variant. However, as I don=E2=80=99t see s=
uch list
>>>> of special files in the whole gcc, I think I might be missing
>>>> something.
>>>
>>> The policy ;)
>>>
>>>> What do you think?
>>>
>>> Now about the patch.  Generally there seem to be spurious whitespace-on=
ly
>>> or formatting differences - try to avoid these.
>>>
>>> All new functiuons need a comment explaining them and their function
>>> parameters.
>>>
>>> +static cond_data_map *get_cond_data_map (struct loop * level)
>>> +{
>>>
>>> per GCC coding conventions this should be formatted like
>>>
>>> static cond_data_map *
>>> get_cond_data_map (struct loop * level)
>>> {
>>>
>>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\
>>> +                 BB->aux =3D (void *) (XCNEW (struct move_cond));\
>>> +                 BB_MOVE_COND (BB)->type=3DALWAYS_EXECUTED_IN;\
>>> +                 BB_MOVE_COND (BB)->loop=3DVAL;
>>>
>>> this kind of side-effects in macros are bad - better turn them into
>>> inline functions.
>>>
>>> -  if (flag_unswitch_loops
>>> -      && gimple_code (stmt) =3D=3D GIMPLE_COND)
>>> -    {
>>> -      /* If we perform unswitching, force the operands of the invariant
>>> -        condition to be moved out of the loop.  */
>>> -      return MOVE_POSSIBLE;
>>> -    }
>>> +  if (gimple_code (stmt) =3D=3D GIMPLE_COND)
>>> +    return MOVE_POSSIBLE;
>>>
>>> any reason you removed the guard on flag_unswitch_loops?  We may
>>> want to add a new flag, certainly this transform might not be suitable
>>> for -O[12s] for example.  Like if not all stmts inside a conditional can
>>> be moved the transform will increase code-size.
>>>
>>> You are doing a (lot of?) refactoring which makes review of the patch
>>> harder.  For example
>>>
>>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost,
>>> +                        bool cond_executed, basic_block bb)
>>> +{
>>> +  struct lim_aux_data *lim_dat
>>> ...
>>> +
>>> +  if (lim_data->cost >=3D LIM_EXPENSIVE)
>>> +    set_profitable_level (stmt);
>>> +  return true;
>>> +}
>>>
>>> ...
>>>
>>> -
>>> -       lim_data =3D init_lim_data (stmt);
>>> -       lim_data->always_executed_in =3D outermost;
>>> -
>>> -       if (!determine_max_movement (stmt, false))
>>> -         {
>>> -           lim_data->max_loop =3D NULL;
>>> -           continue;
>>> -         }
>>> -
>>> -       if (dump_file && (dump_flags & TDF_DETAILS))
>>> -         {
>>> -           print_gimple_stmt (dump_file, stmt, 2, 0);
>>> -           fprintf (dump_file, "  invariant up to level %d, cost %d.\n=
\n",
>>> -                    loop_depth (lim_data->max_loop),
>>> -                    lim_data->cost);
>>> -         }
>>> -
>>> -       if (lim_data->cost >=3D LIM_EXPENSIVE)
>>> -         set_profitable_level (stmt);
>>> +       if (!calculate_tgt_level (stmt, outermost, cond_executed, bb))
>>> +         continue;
>>>
>>> where it is not obvious why moving set_profitable_level should be
>>> in a function named calculate_tgt_level.  In fact determine_max_movement
>>> was supposed to be the abstraction already.
>>>
>>> Without yet getting an idea of the overall algorithm (I'm really missin=
g such
>>> a thing...) it seems to me that the current handling of doing invariant=
 motion
>>> of PHIs should handle the analysis phase just fine (you might be adding
>>> more advanced cases in this patch, but I'd rather do that as a followup=
 for
>>> initial support).
>>>
>>> The code-gen part looks somewhat ad-hoc to me.  In principle if we have
>>> a list of PHIs to hoist we can materialize the needed part of the origi=
nal
>>> basic-block structure on the pre-header edge together with the controll=
ing
>>> conditionals.  The loop pre-header would be the always-executed BB
>>> and the needed part would always start with the loop header and be a
>>> single-entry multiple-exit region.  So ideally we could use sth like
>>> gimple_duplicate_sese_region with the BB copying worker copy_bbs
>>> refactored/replaced with something only copying controlling conditionals
>>> and statements/PHIs that a callback identifies (tree-loop-distribution.c
>>> would also like sth like that - currently it copies whole loops and rem=
oves
>>> stmts it didn't intend to copy...).
>>>
>>> So basically invariantness_dom_walker would record a vector of PHIs
>>> to hoist (or we'd construct that at move_computations time).
>>> If there is any control-flow to hoist then we'd do an alternative
>>> move_computations by "copying" the needed CFG and covered stmts.
>>> In theory doing it the loop-distribution way (simply copy the whole
>>> loop body and then remove anything not needed) would also be possible
>>> though that wouldn't be the very best idea obviously ;)
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> Evgeniya
>>
>>
>>
>> --
>> Thanks,
>>
>> Evgeniya
>>
>> perfstories.wordpress.com


--=20
Thanks,

Evgeniya

perfstories.wordpress.com