From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-399316-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 39572 invoked by alias); 29 May 2015 13:14:35 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 39554 invoked by uid 89); 29 May 2015 13:14:34 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-lb0-f175.google.com
Received: from mail-lb0-f175.google.com (HELO mail-lb0-f175.google.com) (209.85.217.175) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 29 May 2015 13:14:31 +0000
Received: by lbbuc2 with SMTP id uc2so48245117lbb.2        for <gcc-patches@gcc.gnu.org>; Fri, 29 May 2015 06:14:28 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.112.163.35 with SMTP id yf3mr7752418lbb.107.1432905268153; Fri, 29 May 2015 06:14:28 -0700 (PDT)
Received: by 10.112.1.100 with HTTP; Fri, 29 May 2015 06:14:28 -0700 (PDT)
In-Reply-To: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw@mail.gmail.com>
References: <CANVW7MPfRvpNjABKuqup9GxbUk7A=BaRqwsPcbP8DRpGoGMWdw@mail.gmail.com>
Date: Fri, 29 May 2015 13:32:00 -0000
Message-ID: <CANVW7MP0Uv3LSSd=JfD5hTnMv75SpQ1WMrV_eWMa9owZ3T+yYg@mail.gmail.com>
Subject: Re: conditional lim
From: Evgeniya Maenkova <evgeniya.maenkova@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-05/txt/msg02754.txt.bz2

Hi Richard,

Here is some explanation. I hope you let me know if I need to clarify somet=
hing.

Also, you asked me about concrete example, to make sure you don=E2=80=99t m=
iss
my answer here is the link:
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html.

Also, I doubt whether it=E2=80=99s convenient for you to create a build with
my patch or not. May be to clarify things you could send me some
examples/concrete cases, then I=E2=80=99ll compile them with
=E2=80=93fdump-tree-loopinit-details and =E2=80=93fdump-tree-lim-details an=
d send you
these dumps. May be these dumps will be useful. (I=E2=80=99ll only disable
cleanup_cfg TODO after lim to let you know the exact picture after
lim).

What do you think?

1.       invariantness _dom_walker =E2=80=93

1.1   for each GIMPLE_COND in given bb calls handle_cond_stmt to call
for true and false edges handle_branch_edge, which calls SET_TARGET_OF
for all bb =E2=80=98predicated=E2=80=99 by given GIMPLE_COND.

SET_TARGET_OF sets in basic_blocks aux 2 facts:

a)      this is true or false edge;

b)      link to cond stmt;

Handle_branch_edge works this way:

If (cond1)

  {

     bb1;

     if (cond2}

       {

           bb2;

        }

   Being called for cond1, it sets cond1 as condition for both bb1 and
bb2 (the whole branch for cond1, ie also for bb containing cond2),
then this method will be called (as there is dominance order) for
cond2 to correct things (ie to set cond2 as condition for bb2).

1.2   As 1.1 goes we identify whether some bb is predicated by some
condition or not.

bb->aux->type will be [TRUE/FALSE]_TARGET_OF and
bb->aux->cond_stmt=3Dcond stmt (the nearest condition).

If bb is always executed bb->aux->type =3D ALWAYS_EXECUTED_IN,
bb->loop->loop (this info was available in the clean build).

1.3   As this walker is called in dominance order, information about
condition is available when invariantness_dom_walker is called for
given bb.  So we can make some computations based on bb->aux
structure. This is done in check_conditionally_executed. The main goal
of this method is to check that the root condition is always executed
in the loop. I did so to avoid situation like this

Loop:

   Jmp somewhere;

  If (cond1)

      If (cond2)

         Etc

By =E2=80=98root condition=E2=80=99 I mean cond1 in this case (the first co=
nd in
dominance order).

If  =E2=80=98root condition=E2=80=99 is not always executed we disable (fre=
e) all the
info in bb->aux, ie all the blocks becomes neither conditionally nor
always executed according to bb->aux.

There is some optimization to don=E2=80=99t go each time trough the conditi=
ons
chain (cond2->cond1), let me skip such details for now.


1.4   Then we calculate tgt_level  (which is used in move_computations late=
r)

The patch is the same for phi and regular stmt (calculate_tgt_level)

1)      If stmt is conditionally executed we compute max movement
(determine_max_movement) starting from
get_lim_data(condition)->max_loop.

2)      If stmt is not cond executed as start level  for
determine_max_movement computations we choose ALWAYS_EXECUTED_IN.

To clarify why, see the comment

/* The reason why we don't set start_level for MOVE_POSSIBLE

          statements to more outer level is: this statement could depend on

          conditional statements and even probably is not defined without t=
his

          condition. So either we should analyze these ones and move

          under condition somehow or keep more strong start_level . */

As you noted in your review there was some refactoring. Of course, I
had no goal to refactor existing code, I intended to remove some
duplication which I caused by my changes.  I hope we discuss in
details later if you don=E2=80=99t like some refactoring.

2.       store_motion  - for some stmts set flags to ignore predicated
execution (I mean to move statements without conditions).


3.       move_computations. The patch is doing something in the
following 3 calls inside of this method.


3.1   (more details below) walker.walk =E2=80=93 move computations(create i=
f()
structure) and store information to create phi nodes later (which
statements require phi nodes, as they conditionally executed and there
are still their uses on other levels, what are the names for such phi
nodes, as we replace uses in here, in walker.walk.)

3.2   (more details below) create_phi_nodes =E2=80=93 create phi nodes for
statements which require that (see 3.1)

3.3   replace_uses_in_conditions

After computations movements we can have several copies of the cond
stmt. In 3.1 we do replacement of uses based on stmt=E2=80=99s tgt_level. F=
or,
cond stmt it doesn=E2=80=99t work. As cond stmt, of course, have something =
in
lim_aux_data->tgt_level, but, in fact, they are moved only if
corresponding invariants are moved. So, in fact, the same cond (copies
of it, of course) could go to the different levels. So to fix these
uses, we call replace_uses_in_conditions.

More details on 3.1 and 3.2

3.1 The patch is very similar for phi stmts and regular stmts (there
is some difference for regular stmts related to the case when we move
sequence instead of single stmt, let=E2=80=99s skip it in this description,
will describe later. BTW, there are comments, in corresponding parts).

a) for each bb we call check_conditionally_executed and if bb is cond
executed then invariant statement  will be moved conditionally.

b) prepare_edge_for_stmt collects information needed to create phi
nodes in 3.2(add_info_for_phi) AND prepares edge ie creates if()
structure needed to move statement
(get_block_for_predicated_statement, see below). All of this is done
for cond executed stmt. Nothing is done for non cond executed.

BTW, there is some strange names for functions like missed ending
(prepare_edge_for_stm, w/o t in the end, this is because of my script
to add =E2=80=98 =E2=80=98 before =E2=80=98(=E2=80=98, will fix in the next=
 patch version, ignore
please so far).

Get_block_for_predicated_stmt:

Create bb where to add cond stmt to (or return existing, each loop has
a map  (level_cond_data_map is map <loop, cond_data_map> ,
cond_data_map is map <gimple, basic_block>) to identify if bb for
given cond was created or not for given level).

Parameters of this method:

Cond =E2=80=93 condition of stmt, branch =E2=80=93 is it on true or false e=
dge of
cond, cond_map  - see above, level (tgt_level), preheader =E2=80=93
preheader_edge for level.

So first of all we check whether there is a basic block for given
condition in given loop.

a)      If such bb exists then we trying to find corresponding edge,
destination of this edge shouldn=E2=80=99t be a post dominator of bb
(find_branch_edge).

a.1) if such edge exists we return the most remote in this branch bb
(or create it), see get_most_remote_in_branch;

a.2) if such edge doesn=E2=80=99t exists, we create corresponding bb, see m=
ake_branch

       b)  If such bb doesn=E2=80=99t exist we create it (recursively calli=
ng
get_block_for_predicated_stmt if required bb is conditionally
executed).

Will not describe it further so far (let me know if you would like to
read additional details).

Add_info_for_phi

The main goal of this method is to identify situation where
stmt(parameter,ie the statement which we are moving) or more exactly
lhs of this statement

a)      is still used in other loops, so we need to create phi node to
use phi name inside other loops.

b)      Is used in phi node of original loop and this phi node is
going to be moved. BUT, as we move phi node as assignment (in the case
of phi node with 2 arguments) we need to create phi node to use lhs of
statement (param of add_info_for_phi).

In these cases we make corresponding replacements and store
information needed to create phi nodes and make some other
replacements in 3.3 (replace_uses_in_conditions).

Add_info_for_phi calls create_tmp_var which requires some explanation.

Create_tmp_var

To create new names for phi nodes and to create default def for phi
arguments I use make_ssa_name and get_or_create_ssa_default_def. These
methods have some asserts (being merged):   (TREE_CODE (old_var) =3D=3D
VAR_DECL   || TREE_CODE (old_var) =3D=3D PARM_DECL   || TREE_CODE
(old_var) =3D=3D RESULT_DECL).

So in some cases (when I know that I need to use methods mentioned
above, see the start of create_tmp_var, ie the uses in other loops) I
create new tmp variable to overcome these asserts. To be honest I
don=E2=80=99t like this but don=E2=80=99t know how to deal with this better.

And a couple of words regarding where we store info for phi nodes:

Each loop contains in its aux  phi_data structure. On this stage we
only add there stmt (if phi node will be required), phi name=3Dphi node
result name in names_map or fl_names_map). See the comment about
phi_data in patch.

If there should be created several phi nodes for given lhs (I mean the
first place for phi node doesn=E2=80=99t dominate on corresponding preheade=
r,
we add only the last name in names_map (as intermediate names could be
created later, but the last name in bb which dominates preheader
should be used for replacement in other loops. Replacements were
already made in walker).

If lhs is used only in phi node, which are moved and transformed into
assignment, and there is no uses in other loops, we need to create
only first level phi node (don=E2=80=99t need to create phi nodes till some=
 bb
which dominates preheader), then we add such name to fl_names_map.

3.2 Create_phi_nodes

For each of the loops we go over ((phi_data*) loop->aux)->stmts. These
are statements which were moved, but they have uses in the original
loop (more exactly, their uses in some other loops replaced by some
name from ((phi_data*) loop->aux)->names_map or ((phi_data*)
loop->aux)->fl_names_map.

So for each of such stmts we create phi nodes (for its lhs) chain.

Basic block for phi node is found in get_bb_for_phi  for given bb with
stmt. If basic block for phi node dominates preheader->src we end
chain creation. There is some special condition for the case when we
need to create only first level phi node. (I described this situation
above, but let me know If I need to add  more details.

Basic blocks can have maps to identify if we created phi node for
given variable in given basic_block, so we only add an edge argument
for such case (phi_map =3D new hash_map<basic_block, hash_map<tree,
gphi*>*>).

In short, that=E2=80=99s all.

Thanks,

Evgeniya


On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova
<evgeniya.maenkova@gmail.com> wrote:
> Hi,
>
> Could you please review my patch for predicated lim?
>
> Let me note some details about it:
>
>
>
> 1)      Phi statements are still moved only if they have 1 or 2
> arguments. However, phi statements could be move under conditions (as
> it=E2=80=99s done for the other statements).  Probably, phi statement mot=
ion
> with 3 + arguments could be implemented in the next patch after
> predicated lim.
>
> 2)      Patch has limitations/features like (it was ok to me to
> implement it such way, maybe I=E2=80=99m not correct. ):
>
> a)      Loop1
>
>     {
>
>               If (a)
>
>                  Loop2
>
>                      {
>
>                        Stmt - Invariant for Loop1
>
>                      }
>
>              }
>
>        In this case Stmt will be moved only out of Loop2, because of if (=
a).
>
> b)      Or
>
> Loop1
>
>     {
>
>          =E2=80=A6
>
>          If (cond1)
>
>               If (cond2)
>
>                   If (cond3)
>
>                       Stmt;
>
>        }
>
> Stmt will be moved out only if cond1 is always executed in Loop1.
>
> c)       It took me a long time to write all of these code, so there
> might be other peculiarities which I forgot to mention. :)
>
>                        Let=E2=80=99s discuss these ones as you will revie=
w my patch.
>
> 3)      Patch consists of 9 files:
>
> a)      gcc/testsuite/gcc.dg/tree-ssa/loop-7.c,
> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 changed tests:
>
> -          gcc/testsuite/gcc.dg/tree-ssa/loop-7.c  changed as
> predicated lim moves 2 more statements out of the loop;
>
> -          gcc/testsuite/gcc.dg/tree-ssa/recip-3.c =E2=80=93 with conditi=
onal
> lim recip optimization in this test doesn=E2=80=99t work (the correspondi=
ng
> value is below threshold as I could see in the code for recip, 1<3).
> So to have recip working in this test I changed test a little bit.
>
> b)      gcc/tree-ssa-loop-im.c =E2=80=93 the patched lim per se
>
> c)       gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c,
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c,
>
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c,
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c,
>
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c,
> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
>
> the tests for conditional lim.
>
> 4)      Patch testing:
>
> a)      make =E2=80=93k check (no difference in results for me for the cl=
ean
> build and the patched one,
>
> -          Revision: 222849,
>
> -          uname -a
>
>                Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct
> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux
>
> b)      Bootstrap.
>
>  It goes well now, however to fix it I have made a temporary hack in
> the lim code. And with this fix patch definitely shouldn=E2=80=99t be
> committed.
>
> I did so, as I would like to discuss this issue first.
>
> The problem is: I got stage2-stage3 comparison failure on the single
> file (tree-vect-data-refs.o). After some investigation I understood
> that tree-vect-data-refs.o differs being compiled with and without
> =E2=80=98-g=E2=80=99 option (yes, more exactly on stage 2 this is  =E2=80=
=98-g =E2=80=93O2 =E2=80=93gtoggle=E2=80=99,
> and for stage 3 this is =E2=80=98-g =E2=80=93O2=E2=80=99. But to simplify=
 things I can
> reproduce this difference on the same build (even not bootstrapped),
> with option =E2=80=98 =E2=80=93g=E2=80=99 and without it).
>
> Of course, the file compiled with =E2=80=93g option will contain debug
> information and will differ from the corresponding file without debug
> information. I mean there is the difference reported by
> contrib/compare-debug (I mean we talk about stripped files).
>
> Then I compared assemblers and lim dump files and made a conclusion
> the difference is:
>
> There is statement _484=3Dsomething, which is conditionally moved out of
> loop X. In non debug cases no usages of _484 are left inside the loop
> X. In debug case, there is the single use in debug statement. So for
> debug case my patch generates phi statement for _484 to make it
> available inside the loop X. For non debug case, no such phi statement
> generated as there is no uses of _484.
>
> There is one more statement with lhs=3D_493, with exactly this pattern
> of use. But this is not important for our discussion.
>
>
>
> So, in my opinion it=E2=80=99s ok to generate additional phi node for deb=
ug
> case. But I=E2=80=99m not a compiler expert and maybe there is some
> requirement that debug and non-debug versions should differ only by
> debug statements, I don=E2=80=99t know.
>
>
>
> My temporary hack fix is just removing of such debug statements (no
> debug statements, no problems J).
>
>
>
> The alternatives I see are:
>
> -          Move debug statements outside the loop (not good in my opinion=
);
>
> -          Somehow reset values in debug statements (not good in my
> opinion, if I could provide correct information for them);
>
> -          Generate phi for debug statements and fix bootstrap (say,
> let=E2=80=99s have some list of files, which we have special check for. I=
 mean
> for my case, the difference is not in stage2 and stage3, it=E2=80=99s in =
debug
> and non-debug). I like this variant. However, as I don=E2=80=99t see such=
 list
> of special files in the whole gcc, I think I might be missing
> something.
>
> What do you think?
>
>
>
> Thanks,
>
>
>
> Evgeniya