From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-398968-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 112057 invoked by alias); 27 May 2015 10:12:05 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 112039 invoked by uid 89); 27 May 2015 10:12:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-ob0-f178.google.com
Received: from mail-ob0-f178.google.com (HELO mail-ob0-f178.google.com) (209.85.214.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 27 May 2015 10:12:01 +0000
Received: by obbgf1 with SMTP id gf1so3615483obb.2        for <gcc-patches@gcc.gnu.org>; Wed, 27 May 2015 03:11:59 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.60.176.37 with SMTP id cf5mr25716078oec.19.1432721519751; Wed, 27 May 2015 03:11:59 -0700 (PDT)
Received: by 10.76.115.167 with HTTP; Wed, 27 May 2015 03:11:59 -0700 (PDT)
In-Reply-To: <CANVW7MN_LYqO-p4Vh2LQ6vSEc6RRUMhUo=ydfjr17kGW=XeVLg@mail.gmail.com>
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw@mail.gmail.com>	<CAFiYyc0jxE958ROvpAEZb53xVOgn=wjm3XSLn-92SiZ3+3t0OA@mail.gmail.com>	<CANVW7MN_LYqO-p4Vh2LQ6vSEc6RRUMhUo=ydfjr17kGW=XeVLg@mail.gmail.com>
Date: Wed, 27 May 2015 10:52:00 -0000
Message-ID: <CAFiYyc3sR2Z0GVZ+kiQ99GUWyYmuSTNnCdmcPLnFO9aOAEh2NA@mail.gmail.com>
Subject: Re: conditional lim
From: Richard Biener <richard.guenther@gmail.com>
To: Evgeniya Maenkova <evgeniya.maenkova@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-05/txt/msg02406.txt.bz2

On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova
<evgeniya.maenkova@gmail.com> wrote:
> Hi, Richard
>
> Thanks for review starting.
>
> Do you see any major issues with this patch (i.e. algorithms and ideas
> that should be completely replaced, effectively causing the re-write
> of most code)?
>
> To decide if there are major issues in the patch, perhaps, you need
> additional clarifications from me? Could you point at the places where
> additional explanations could save you most effort?
>
> Your answers to these questions are looking the first priority ones.
> You wrote about several issues in the code, which are looking as easy
> (or almost easy ;) to fix(inline functions, unswitch-loops flag,
> comments, etc). But, I think you agree, let=E2=80=99s first decide about =
the
> major issues (I mean, whether we continue with this patch or starting
> new one, this will save a lot of time for both of us).

I didn't get an overall idea on how the patch works, that is, how it integr=
ates
with the existing algorithm.  If you can elaborate on that a bit that would
be helpful.

I think the code-generation part needs some work (whether by following
my idea with re-using copy_bbs or whether by basically re-implementing
it is up to debate).  How does your code handle

  for ()
    {
       if (cond1)
        {
           if (cond2)
             invariant;
           if (cond3)
             invariant;
        }
    }

?  Optimally we'd have before the loop exactly the same if () structure
(thus if (cond1) is shared).

Richard.


> Thanks,
>
> Evgeniya
>
> On Tue, May 26, 2015 at 2:31 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova
>> <evgeniya.maenkova@gmail.com> wrote:
>>> Hi,
>>>
>>> Could you please review my patch for predicated lim?
>>>
>>> Let me note some details about it:
>>>
>>>
>>>
>>> 1)      Phi statements are still moved only if they have 1 or 2
>>> arguments. However, phi statements could be move under conditions (as
>>> it=E2=80=99s done for the other statements).  Probably, phi statement m=
otion
>>> with 3 + arguments could be implemented in the next patch after
>>> predicated lim.
>>>
>>> 2)      Patch has limitations/features like (it was ok to me to
>>> implement it such way, maybe I=E2=80=99m not correct. ):
>>>
>>> a)      Loop1
>>>
>>>     {
>>>
>>>               If (a)
>>>
>>>                  Loop2
>>>
>>>                      {
>>>
>>>                        Stmt - Invariant for Loop1
>>>
>>>                      }
>>>
>>>              }
>>>
>>>        In this case Stmt will be moved only out of Loop2, because of if=
 (a).
>>>
>>> b)      Or
>>>
>>> Loop1
>>>
>>>     {
>>>
>>>          =E2=80=A6
>>>
>>>          If (cond1)
>>>
>>>               If (cond2)
>>>
>>>                   If (cond3)
>>>
>>>                       Stmt;
>>>
>>>        }
>>>
>>> Stmt will be moved out only if cond1 is always executed in Loop1.
>>>
>>> c)       It took me a long time to write all of these code, so there
>>> might be other peculiarities which I forgot to mention. :)
>>>
>>>                        Let=E2=80=99s discuss these ones as you will rev=
iew my patch.
>>>
>>> 3)      Patch consists of 9 files:
>>>
>>> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests:
>>>
>>> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
>>> predicated lim moves 2 more statements out of the loop;
>>>
>>> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with condi=
tional
>>> lim recip optimization in this test doesn=E2=80=99t work (the correspon=
ding
>>> value is below threshold as I could see in the code for recip, 1<3).
>>> So to have recip working in this test I changed test a little bit.
>>>
>>> b)      gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se
>>>
>>> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>>>
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>>>
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>>>
>>> the tests for conditional lim.
>>>
>>> 4)      Patch testing:
>>>
>>> a)      make =E2=80=93k check (no difference in results for me for the =
clean
>>> build and the patched one,
>>>
>>> -          Revision: 222849,
>>>
>>> -          uname -a
>>>
>>>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>>>
>>> b)      Bootstrap.
>>>
>>>  It goes well now, however to fix it I have made a temporary hack in
>>> the lim code. And with this fix patch definitely shouldn=E2=80=99t be
>>> committed.
>>>
>>> I did so, as I would like to discuss this issue first.
>>>
>>> The problem is: I got stage2-stage3 comparison failure on the single
>>> file (tree-vect-data-refs.o). After some investigation I understood
>>> that tree-vect-data-refs.o differs being compiled with and without
>>> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is  =E2=
=80=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99,
>>> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to simpli=
fy things I can
>>> reproduce this difference on the same build (even not bootstrapped),
>>> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it).
>>>
>>> Of course, the file compiled with =E2=80=93g option will contain debug
>>> information and will differ from the corresponding file without debug
>>> information. I mean there is the difference reported by
>>> contrib/compare-debug (I mean we talk about stripped files).
>>>
>>> Then I compared assemblers and lim dump files and made a conclusion
>>> the difference is:
>>>
>>> There is statement _484=3Dsomething, which is conditionally moved out of
>>> loop X. In non debug cases no usages of _484 are left inside the loop
>>> X. In debug case, there is the single use in debug statement. So for
>>> debug case my patch generates phi statement for _484 to make it
>>> available inside the loop X. For non debug case, no such phi statement
>>> generated as there is no uses of _484.
>>>
>>> There is one more statement with lhs=3D_493, with exactly this pattern
>>> of use. But this is not important for our discussion.
>>>
>>>
>>>
>>> So, in my opinion it=E2=80=99s ok to generate additional phi node for d=
ebug
>>> case. But I=E2=80=99m not a compiler expert and maybe there is some
>>> requirement that debug and non-debug versions should differ only by
>>> debug statements, I don=E2=80=99t know.
>>
>> As Andi said code generation needs to be the same.
>>
>>> My temporary hack fix is just removing of such debug statements (no
>>> debug statements, no problems J).
>>
>> That's fine and generally what is done in other passes.
>>
>>> The alternatives I see are:
>>>
>>> -          Move debug statements outside the loop (not good in my opini=
on);
>>>
>>> -          Somehow reset values in debug statements (not good in my
>>> opinion, if I could provide correct information for them);
>>
>> You could re-set them via gimple_debug_bind_reset_value which
>> will tell the user that at this point the value is optimized out
>> (that's slightly
>> better than removing the debug stmts).
>>
>>> -          Generate phi for debug statements and fix bootstrap (say,
>>> let=E2=80=99s have some list of files, which we have special check for.=
 I mean
>>> for my case, the difference is not in stage2 and stage3, it=E2=80=99s i=
n debug
>>> and non-debug). I like this variant. However, as I don=E2=80=99t see su=
ch list
>>> of special files in the whole gcc, I think I might be missing
>>> something.
>>
>> The policy ;)
>>
>>> What do you think?
>>
>> Now about the patch.  Generally there seem to be spurious whitespace-only
>> or formatting differences - try to avoid these.
>>
>> All new functiuons need a comment explaining them and their function
>> parameters.
>>
>> +static cond_data_map *get_cond_data_map (struct loop * level)
>> +{
>>
>> per GCC coding conventions this should be formatted like
>>
>> static cond_data_map *
>> get_cond_data_map (struct loop * level)
>> {
>>
>> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\
>> +                 BB->aux =3D (void *) (XCNEW (struct move_cond));\
>> +                 BB_MOVE_COND (BB)->type=3DALWAYS_EXECUTED_IN;\
>> +                 BB_MOVE_COND (BB)->loop=3DVAL;
>>
>> this kind of side-effects in macros are bad - better turn them into
>> inline functions.
>>
>> -  if (flag_unswitch_loops
>> -      && gimple_code (stmt) =3D=3D GIMPLE_COND)
>> -    {
>> -      /* If we perform unswitching, force the operands of the invariant
>> -        condition to be moved out of the loop.  */
>> -      return MOVE_POSSIBLE;
>> -    }
>> +  if (gimple_code (stmt) =3D=3D GIMPLE_COND)
>> +    return MOVE_POSSIBLE;
>>
>> any reason you removed the guard on flag_unswitch_loops?  We may
>> want to add a new flag, certainly this transform might not be suitable
>> for -O[12s] for example.  Like if not all stmts inside a conditional can
>> be moved the transform will increase code-size.
>>
>> You are doing a (lot of?) refactoring which makes review of the patch
>> harder.  For example
>>
>> +bool calculate_tgt_level (gimple stmt, struct loop *outermost,
>> +                        bool cond_executed, basic_block bb)
>> +{
>> +  struct lim_aux_data *lim_dat
>> ...
>> +
>> +  if (lim_data->cost >=3D LIM_EXPENSIVE)
>> +    set_profitable_level (stmt);
>> +  return true;
>> +}
>>
>> ...
>>
>> -
>> -       lim_data =3D init_lim_data (stmt);
>> -       lim_data->always_executed_in =3D outermost;
>> -
>> -       if (!determine_max_movement (stmt, false))
>> -         {
>> -           lim_data->max_loop =3D NULL;
>> -           continue;
>> -         }
>> -
>> -       if (dump_file && (dump_flags & TDF_DETAILS))
>> -         {
>> -           print_gimple_stmt (dump_file, stmt, 2, 0);
>> -           fprintf (dump_file, "  invariant up to level %d, cost %d.\n\=
n",
>> -                    loop_depth (lim_data->max_loop),
>> -                    lim_data->cost);
>> -         }
>> -
>> -       if (lim_data->cost >=3D LIM_EXPENSIVE)
>> -         set_profitable_level (stmt);
>> +       if (!calculate_tgt_level (stmt, outermost, cond_executed, bb))
>> +         continue;
>>
>> where it is not obvious why moving set_profitable_level should be
>> in a function named calculate_tgt_level.  In fact determine_max_movement
>> was supposed to be the abstraction already.
>>
>> Without yet getting an idea of the overall algorithm (I'm really missing=
 such
>> a thing...) it seems to me that the current handling of doing invariant =
motion
>> of PHIs should handle the analysis phase just fine (you might be adding
>> more advanced cases in this patch, but I'd rather do that as a followup =
for
>> initial support).
>>
>> The code-gen part looks somewhat ad-hoc to me.  In principle if we have
>> a list of PHIs to hoist we can materialize the needed part of the origin=
al
>> basic-block structure on the pre-header edge together with the controlli=
ng
>> conditionals.  The loop pre-header would be the always-executed BB
>> and the needed part would always start with the loop header and be a
>> single-entry multiple-exit region.  So ideally we could use sth like
>> gimple_duplicate_sese_region with the BB copying worker copy_bbs
>> refactored/replaced with something only copying controlling conditionals
>> and statements/PHIs that a callback identifies (tree-loop-distribution.c
>> would also like sth like that - currently it copies whole loops and remo=
ves
>> stmts it didn't intend to copy...).
>>
>> So basically invariantness_dom_walker would record a vector of PHIs
>> to hoist (or we'd construct that at move_computations time).
>> If there is any control-flow to hoist then we'd do an alternative
>> move_computations by "copying" the needed CFG and covered stmts.
>> In theory doing it the loop-distribution way (simply copy the whole
>> loop body and then remove anything not needed) would also be possible
>> though that wouldn't be the very best idea obviously ;)
>>
>> Thanks,
>> Richard.
>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Evgeniya
>
>
>
> --
> Thanks,
>
> Evgeniya
>
> perfstories.wordpress.com