From: "Bin.Cheng" <amker.cheng@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: Bin Cheng <Bin.Cheng@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, Richard Biener <richard.guenther@gmail.com>,
"matz@suse.de" <matz@suse.de>
Subject: Re: [PATCH GCC][V2]A simple implementation of loop interchange
Date: Tue, 28 Nov 2017 16:16:00 -0000 [thread overview]
Message-ID: <CAHFci28-H_96it=qx_ZRvB5yrreMo1D4rvgexc7Bqe4HU1U=0Q@mail.gmail.com> (raw)
In-Reply-To: <1511884833.27881.15.camel@redhat.com>
On Tue, Nov 28, 2017 at 4:00 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> On Tue, 2017-11-28 at 15:26 +0000, Bin Cheng wrote:
>> Hi,
>> This is updated patch with review comments resolved. Some
>> explanation embedded below.
>>
>> On Mon, Nov 20, 2017 at 2:46 PM, Richard Biener <richard.guenther@gma
>> il.com> wrote:
>> > On Thu, Nov 16, 2017 at 4:18 PM, Bin.Cheng <amker.cheng@gmail.com>
>> > wrote:
>> > > On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz <matz@suse.de>
>> > > wrote:
>> > > > Hello,
>> > > >
>> > > > On Fri, 22 Sep 2017, Bin.Cheng wrote:
>> > > >
>> > > > > This is updated patch for loop interchange with review
>> > > > > suggestions
>> > > > > resolved. Changes are:
>> > > > > 1) It does more light weight checks like rectangle loop
>> > > > > nest check
>> > > > > earlier than before.
>> > > > > 2) It checks profitability of interchange before data
>> > > > > dependence computation.
>> > > > > 3) It calls find_data_references_in_loop only once for a
>> > > > > loop nest now.
>> > > > > 4) Data dependence is open-computed so that we can skip
>> > > > > instantly at
>> > > > > unknown dependence.
>> > > > > 5) It improves code generation in mapping induction
>> > > > > variables for
>> > > > > loop nest, as well as
>> > > > > adding a simple dead code elimination pass.
>> > > > > 6) It changes magic constants into parameters.
>> > > >
>> > > > So I have a couple comments/questions. Something stylistic:
>> > >
>> > > Hi Michael,
>> > > Thanks for reviewing.
>> > >
>> > > >
>> > > > > +class loop_cand
>> > > > > +{
>> > > > > +public:
>> > > > > ...
>> > > > > + friend class tree_loop_interchange;
>> > > > > +private:
>> > > >
>> > > > Just make this all public (and hence a struct, not class).
>> > > > No need for friends in file local classes.
>> > >
>> > > Done.
>> > >
>> > > >
>> > > > > +single_use_in_loop (tree var, struct loop *loop)
>> > > > > ...
>> > > > > + FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
>> > > > > + {
>> > > > > + stmt = USE_STMT (use_p);
>> > > > > ...
>> > > > > + basic_block bb = gimple_bb (stmt);
>> > > > > + gcc_assert (bb != NULL);
>> > > >
>> > > > This pattern reoccurs often in your patch: you check for a bb
>> > > > associated
>> > > > for a USE_STMT. Uses of SSA names always occur in basic
>> > > > blocks, no need
>> > > > for checking.
>> > >
>> > > Done.
>> > >
>> > > >
>> > > > Then, something about your handling of simple reductions:
>> > > >
>> > > > > +void
>> > > > > +loop_cand::classify_simple_reduction (reduction_p re)
>> > > > > +{
>> > > > > ...
>> > > > > + /* Require memory references in producer and consumer are
>> > > > > the same so
>> > > > > + that we can undo reduction during interchange. */
>> > > > > + if (re->init_ref && !operand_equal_p (re->init_ref, re-
>> > > > > >fini_ref, 0))
>> > > > > + return;
>> > > >
>> > > > Where is it checked that the undoing transformation is legal
>> > > > also
>> > > > from a data dep point of view? Think code like this:
>> > > >
>> > > > sum = X[i];
>> > > > for (j ...)
>> > > > sum += X[j];
>> > > > X[i] = sum;
>> > > >
>> > > > Moving the store into the inner loop isn't always correct and I
>> > > > don't seem
>> > > > to find where the above situation is rejected.
>> > >
>> > > Yeah. for the old patch, it's possible to have such loop wrongly
>> > > interchanged;
>> > > in practice, it's hard to create an example. The pass will give
>> > > up
>> > > when computing
>> > > data dep between references in inner/outer loops. In this
>> > > updated
>> > > patch, it's fixed
>> > > by giving up if there is any dependence between references of
>> > > inner/outer loops.
>> > >
>> > > >
>> > > > Maybe I'm confused because I also don't see where you even can
>> > > > get into
>> > > > the above situation (though I do see testcases about
>> > > > this). The thing is,
>> > > > for an 2d loop nest to contain something like the above
>> > > > reduction it can't
>> > > > be perfect:
>> > > >
>> > > > for (j) {
>> > > > int sum = X[j]; // 1
>> > > > for (i)
>> > > > sum += Y[j][i];
>> > > > X[j] = sum; // 2
>> > > > }
>> > > >
>> > > > But you do check for perfectness in
>> > > > proper_loop_form_for_interchange and
>> > > > prepare_perfect_loop_nest, so either you can't get into the
>> > > > situation or
>> > > > the checking can't be complete, or you define the above to be
>> > > > perfect
>> > > > nevertheless (probably because the load and store are in outer
>> > > > loop
>> > > > header/exit blocks?). The latter would mean that you accept
>> > > > also other
>> > > > code in header/footer of loops from a pure CFG perspective, so
>> > > > where is it
>> > > > checked that that other code (which aren't simple reductions)
>> > > > isn't
>> > > > harmful to the transformation?
>> > >
>> > > Yes, I used the name perfect loop nest, but the pass can handle
>> > > special form
>> > > imperfect loop nest for the simple reduction. I added comments
>> > > describing
>> > > this before function prepare_perfect_loop_nest.
>> > >
>> > > >
>> > > > Then, the data dependence part of the new pass:
>> > > >
>> > > > > +bool
>> > > > > +tree_loop_interchange::valid_data_dependences (unsigned
>> > > > > inner, unsigned outer)
>> > > > > +{
>> > > > > + struct data_dependence_relation *ddr;
>> > > > > +
>> > > > > + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
>> > > > > + {
>> > > > > + /* Skip no-dependence case. */
>> > > > > + if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> > > > > + continue;
>> > > > > +
>> > > > > + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
>> > > > > + {
>> > > > > + lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
>> > > > > + unsigned level = dependence_level (dist_vect,
>> > > > > loop_nest.length ());
>> > > > > +
>> > > > > + /* If there is no carried dependence. */
>> > > > > + if (level == 0)
>> > > > > + continue;
>> > > > > +
>> > > > > + level --;
>> > > > > + /* Skip case which has '>' as the leftmost
>> > > > > direction. */
>> > > > > + if (!lambda_vector_lexico_pos (dist_vect, level))
>> > > > > + return false;
>> > > >
>> > > > Shouldn't happen as dist vectors are forced positive via
>> > > > DDR_REVERSED.
>> > >
>> > > Done.
>> > >
>> > > >
>> > > > > + /* If dependence is carried by outer loop of the two
>> > > > > loops for
>> > > > > + interchange. */
>> > > > > + if (level < outer)
>> > > > > + continue;
>> > > > > +
>> > > > > + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
>> > > > > + /* If directions at both inner/outer levels are the
>> > > > > same. */
>> > > > > + if (dir_vect[inner] == dir_vect[outer])
>> > > > > + continue;
>> > > > > +
>> > > > > + /* Be conservative, skip case if either direction at
>> > > > > inner/outer
>> > > > > + levels is not '=' or '<'. */
>> > > > > + if (dir_vect[inner] != dir_equal
>> > > > > + && dir_vect[inner] != dir_positive
>> > > > > + && dir_vect[inner] != dir_independent
>> > > > > + && dir_vect[inner] != dir_positive_or_equal)
>> > > > > + return false;
>> > > > > +
>> > > > > + if (dir_vect[outer] != dir_equal
>> > > > > + && dir_vect[outer] != dir_positive
>> > > > > + && dir_vect[outer] != dir_independent
>> > > > > + && dir_vect[outer] != dir_positive_or_equal)
>> > > > > + return false;
>> > > >
>> > > > Checking dir vectors doesn't make much sense in GCC: the
>> > > > elements are only
>> > > > ever set to dir_positive, dir_negative or dir_equal, exactly
>> > > > when distance
>> > > > is
>> > > > > 0, < 0 or == 0. So checking dist vector is enough. (though
>> > > > sameness of
>> > > > direction checks sameness of sign with zero). Incidentally:
>> > >
>> > > Done.
>> > >
>> > > >
>> > > > > +tree_loop_interchange::update_data_deps (unsigned inner,
>> > > > > unsigned outer)
>> > > > > +{
>> > > > > + struct data_dependence_relation *ddr;
>> > > > > +
>> > > > > + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
>> > > > > + {
>> > > > > + /* Skip no-dependence case. */
>> > > > > + if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> > > > > + continue;
>> > > > > +
>> > > > > + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
>> > > > > + {
>> > > > > + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
>> > > > > + std::swap (dir_vect[inner], dir_vect[outer]);
>> > > > > + }
>> > > > > + }
>> > > > > +}
>> > > >
>> > > > Here you swap only the direction but not the distance vector,
>> > > > which can't
>> > > > be right. I suggest only using (and updating) the distance
>> > > > vector.
>> > >
>> > > Yeah, fixed.
>> > >
>> > > >
>> > > > And then your usage and update of DR_ACCESS_FNs: there's quite
>> > > > some
>> > > > complexity connected with that and I'm not sure how worthwhile
>> > > > it is.
>> > > > You're basically using the ACCESS_FNs to determine
>> > > > profitability (and not
>> > > > for validity, and that's good). But e.g. for pointer based
>> > > > accesses like
>> > > > in fortran with explicit address arithmetic the relation
>> > > > between access-fn
>> > > > step and stride and actual access stride isn't that easy (e.g.
>> > > > in your
>> > > > should_interchange_loops function iloop_stride and oloop_stride
>> > > > will
>> > > > always be one for pointer based accesses).
>> > > >
>> > > > Conceptually what you should check is how the access address
>> > > > for each data
>> > > > ref revolves for each loop, so why not doing this
>> > > > explicitely? What I
>> > > > mean is: calculate a (complicated) chrec for the DR addresses
>> > > > for the
>> > > > whole nest at the beginning. It should be in the form like
>> > > > (assume "+"
>> > > > always):
>> > > >
>> > > > {{{init, s1}_l1, s2}_l2, s3}_l3
>> > > >
>> > > > (i.e. all steps should be invariants/constants, and only one
>> > > > non-chrec
>> > > > init value). Addresses which aren't in this form you're
>> > > > already ignoring
>> > > > right now, so you could continue doing that. (Or better said,
>> > > > all
>> > > > non-constant steps you regard as being AVG_DIM_SIZE, which you
>> > > > still can
>> > > > continue doing).
>> > > >
>> > > > Now, with the above form you can form expressions for the
>> > > > difference
>> > > > between addresses per iteration for each loop (i.e. the address
>> > > > stride per
>> > > > loop); store these. Then, when interchanging loops you need to
>> > > > merely
>> > > > swap these expressions like you have to with the distance
>> > > > vector, instead
>> > > > of fiddling inside the DR_ACCESS_FNs themself. Much code would
>> > > > go away.
>> > >
>> > > Yeah. Did similar thing in loop nest distribution pass. See
>> > > compute_access_range
>> > > in tree-loop-distribution.c. Actually, I would do the same here
>> > > if I
>> > > had implemented
>> > > this pass after loop nest distribution patches. Done in this
>> > > updated patch.
>> > >
>> > > >
>> > > > Testcases: given that we had to remove our old separate
>> > > > interchange pass
>> > > > because it miscompiled stuff all over I'm missing some
>> > > > testcases where
>> > > > interchange should _not_ happen for validity reasons, like my
>> > > > above
>> > > > example with an reduction that can't be moved inside. Perhaps
>> > > > you can
>> > > > think of some more.
>> > >
>> > > As mentioned above, it's hard to create test that fail exactly
>> > > for this reason.
>> > > I added one that data dependence prevents us from interchanging
>> > > the loop.
>> > >
>> > > >
>> > > > I hope this is of some help to you :)
>> > >
>> > > Thanks again, it's very helpful.
>> > >
>> > > I also fixed several bugs of previous implementation, mostly
>> > > about debug info
>> > > statements and simple reductions. As for test, I enabled this
>> > > pass by default,
>> > > bootstrap and regtest GCC, I also build/run specs. There must be
>> > > some other
>> > > latent bugs in it, but guess we have to exercise it by enabling
>> > > it at
>> > > some point.
>> > >
>> > > So any comments?
>> >
>> > bool
>> > -gsi_remove (gimple_stmt_iterator *i, bool remove_permanently)
>> > +gsi_remove (gimple_stmt_iterator *i, bool remove_permanently, bool
>> > insert_dbg)
>> > {
>> >
>> > that you need this suggests you do stmt removal in wrong order (you
>> > need to
>> > do reverse dom order).
>>
>> As below code in handling debug uses, this updated patch gives up on
>> more cases
>> by scrapping debug uses now. Hopefully this isn't a problem,
>> debugging experience
>> for interchange loops is bad already?
>> >
>> > +/* Maximum number of statements in loop nest for loop
>> > interchange. */
>> > +
>> > +DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
>> > + "loop-interchange-max-num-stmts",
>> > + "The maximum number of stmts in loop nest for loop
>> > interchange.",
>> > + 64, 0, 0)
>> >
>> > is that to limit dependence computation? In this case you should
>> > probably
>> > limit the number of data references instead?
>>
>> Hmm, I kept this one and it is to limit the size of loops for
>> interchange.
>>
>> >
>> > +ftree-loop-interchange
>> > +Common Report Var(flag_tree_loop_interchange) Optimization
>> > +Enable loop interchange on trees.
>> > +
>> >
>> > please re-use -floop-interchange instead and change the GRAPHITE
>> > tests
>> > to use -floop-nest-optimize. You can do that as pre-approved thing
>> > now.
>>
>> Done. I will send an independent patch adjusting GRAPHITE tests.
>>
>> >
>> > Please enable the pass by default at O3 via opts.c.
>>
>> I will do it in a separated patch because many vectorization tests
>> are vulnerable
>> to interchange. I checked these tests, interchange is good, we need
>> to disable
>> explicitly.
>>
>> >
>> > diff --git a/gcc/tree-ssa-loop-interchange.cc b/gcc/tree-ssa-loop-
>> > interchange.cc
>> >
>> > gimple-loop-interchange.cc please.
>> >
>> > new file mode 100644
>> > index 0000000..abffbf6
>> > --- /dev/null
>> > +++ b/gcc/tree-ssa-loop-interchange.cc
>> > @@ -0,0 +1,2274 @@
>> > +/* Loop invariant motion.
>> > + Copyright (C) 2017 Free Software Foundation, Inc.
>> >
>> > Loop invariant motion? ... ;)
>> >
>> > Please add a "Contributed by ..." to have an easy way to figure
>> > people to blame.
>> >
>> > +}*induction_p;
>> > +
>> >
>> > space after '*'
>> >
>> > +}*reduction_p;
>> > +
>> >
>> > likewise.
>>
>> All done.
>>
>> >
>> > +/* Return true if PHI is unsupported in loop interchange, i.e, PHI
>> > contains
>> > + ssa var appearing in any abnormal phi node. */
>> > +
>> > +static inline bool
>> > +unsupported_phi_node (gphi *phi)
>> > +{
>> > + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
>> > + return true;
>> > +
>> > + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
>> > + {
>> > + tree arg = PHI_ARG_DEF (phi, i);
>> > + if (TREE_CODE (arg) == SSA_NAME
>> > + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg))
>> > + return true;
>> > + }
>> > +
>> > + return false;
>> >
>> > I believe the above isn't necessary given you rule out abnormal
>> > edges
>> > into the loop.
>> > Did you have a testcase that broke? A minor thing I guess if it is
>> > just for extra
>> > safety...
>>
>> Extra safety I guess. I now remove this and haven't run into
>> compilation issues so far.
>>
>> >
>> > +/* Return true if all stmts in BB can be supported by loop
>> > interchange,
>> > + otherwise return false. ILOOP is not NULL if this loop_cand is
>> > the
>> > + outer loop in loop nest. */
>> > +
>> > +bool
>> > +loop_cand::unsupported_operation (basic_block bb, loop_cand
>> > *iloop)
>> > +{
>> >
>> > docs and return value suggest this be named supported_operation
>>
>> Done.
>>
>> >
>> > + /* Or it's invariant memory reference and only used by inner
>> > loop. */
>> > + if (gimple_assign_single_p (stmt)
>> > + && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
>> > + && TREE_CODE (lhs) == SSA_NAME
>> > + && single_use_in_loop (lhs, iloop->loop))
>> > + continue;
>> >
>> > comment suggests multiple uses in loop would be ok?
>>
>> Comment changed.
>>
>> >
>> > + if ((lhs = gimple_assign_lhs (producer)) == NULL_TREE
>> > + || lhs != re->init)
>> > + return;
>> > +
>> > + if ((rhs = gimple_assign_rhs1 (producer)) == NULL_TREE
>> > + || !REFERENCE_CLASS_P (rhs))
>> > + return;
>> >
>> > lhs and rhs are never NULL. Please initialize them outside of the
>> > if.
>> > You want to disallow DECL_P rhs here?
>>
>> Done.
>>
>> >
>> > Can you add an overall function comment what this function
>> > does? It seems
>> > to detect a reduction as produced by loop store-motion? Thus it
>> > tries to
>> > get at enough information to perform the reverse transform?
>>
>> Yes. Comment added.
>>
>> >
>> > During review I have a hard time distinguishing between locals and
>> > members
>> > so can you please prefix all member variables with m_ as according
>> > to our
>> > code guidelines? I guess what adds to the confusion is the
>> > loop_cand argument
>> > that sometimes is present for loop_cand member functions...
>> > (I personally prefer to prefix all member accesses with this-> but
>> > that's harder
>> > to enforce)
>>
>> Done by using m_* stuff.
>>
>> >
>> > +static void
>> > +find_deps_in_bb_for_stmt (gimple_seq *stmts, basic_block bb,
>> > gimple *consumer)
>> > +{
>> >
>> > this extracts stmts starting from consumer but rules out other
>> > consumers (recursively).
>> > Is that intended? I wonder if you can avoid this dance...
>>
>> Okay, so in case the reduction is initialized from constant value, we
>> need to generate new
>> load memory reference when undoing it. The new memory reference may
>> depends on idx
>> variables like in ARRAY_REF. This function tries to find all
>> depended stmts for inserting it
>> I didn't look into how GRAPHITE tracks the idx variable and
>> regenerate it when necessary,
>> maybe we can do the same in the future.
>>
>> >
>> > + /* Transform simple reduction of below form:
>> > +
>> > + init = 0;
>> > + loop:
>> > + var = phi<init, next>
>> > + next = var op ...
>> > + reduc_sum = phi<next>
>> > + MEM_REF[...] = reduc_sum
>> > +
>> > + into:
>> > +
>> >
>> > which one is 'consumer'? I wonder if you can simply leave all the
>> > dead code in place
>> > just emitting the load-update-store stmts and removing the
>> > MEM_REF[...] = reduc_sum
>> > above?
>>
>> Done. I simplified the code generation for the pass and uses
>> prerequisite simple dce
>> interface to remove dead code. Hope it's clearer now.
>>
>> >
>> > +/* Eliminate dead code after loop interchange. */
>> > +
>> > +void
>> > +loop_cand::eliminate_dead_code (void)
>> > +{
>> >
>> > PRE tracks "possible" dead defs and has a worklist algorithm in
>> > remove_dead_inserted_code.
>> > I wonder if you can do sth similar? That is, I wonder if doing a
>> > sweep from last to first
>> > stmt wouldn't be better here?
>> >
>> > + /* Given copy propagation is done during interchange, we
>> > can
>> > + simply check zero uses of var and eliminate it. */
>> > + if (is_gimple_assign (stmt)
>> > + && !gimple_vuse (stmt)
>> >
>> > you probably meant to check gimple_vdef?
>> >
>> > + && !gimple_has_volatile_ops (stmt)
>> > + && !gimple_has_side_effects (stmt)
>> >
>> > the former is redundant
>> >
>> > + && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
>> > + && TREE_CODE (lhs) == SSA_NAME
>> > + && has_zero_uses (lhs))
>> >
>> > if you use gimple_get_lhs () you can also handle calls.
>> >
>> > That said, this seems to be a very poor DCE, why is it necessary at
>> > all?
>>
>> Local DCE removed by reusing new interface simple_dce_from_worklist.
>> But the DCE is necessary giving dead code lasts to vectorizer. At
>> least we
>> can save compilation time.
>>
>> >
>> > +/* Interchange niters info of ILOOP and OLOOP while reset any
>> > other niters
>> > + estimates information for now. */
>> > +
>> > +static inline void
>> > +interchange_nb_iterations (struct loop *iloop, struct loop *oloop)
>> > +{
>> > + tree nb_iterations = oloop->nb_iterations;
>> > +
>> > + oloop->any_upper_bound = false;
>> > + oloop->any_likely_upper_bound = false;
>> > + free_numbers_of_iterations_estimates (oloop);
>> > +
>> > + oloop->nb_iterations = iloop->nb_iterations;
>> > +
>> > + iloop->any_upper_bound = false;
>> > + iloop->any_likely_upper_bound = false;
>> > + free_numbers_of_iterations_estimates (iloop);
>> > +
>> > + iloop->nb_iterations = nb_iterations;
>> >
>> > use std::swap? Also I think if you can keep nb_iterations you
>> > can certainly keep the upper bounds. You're probably
>> > afraid of the ->stmt references in the nb_iter_bound entries?
>> >
>> > Anyway, either scrap everything or try to keep everything.
>>
>> Yeah, not only the stmts, but also the control_iv information because
>> the SCEV
>> information may be corrupted during code transformation.
>> Now I discarded all the information.
>>
>> >
>> > + for (i = 0; oloop.reductions.iterate (i, &re); ++i)
>> > + {
>> > + if (re->type != DOUBLE_RTYPE)
>> > + gcc_unreachable ();
>> > +
>> > + use_operand_p use_p;
>> > + imm_use_iterator iterator;
>> > + FOR_EACH_IMM_USE_FAST (use_p, iterator, re->var)
>> > + mark_or_remove_dbg_stmt (USE_STMT (use_p), re->var);
>> > + FOR_EACH_IMM_USE_FAST (use_p, iterator, re->next)
>> > + mark_or_remove_dbg_stmt (USE_STMT (use_p), re->next);
>> > + if (TREE_CODE (re->init) == SSA_NAME)
>> > + {
>> > + FOR_EACH_IMM_USE_FAST (use_p, iterator, re->init)
>> > + mark_or_remove_dbg_stmt (USE_STMT (use_p), re->init);
>> > + }
>> >
>> > can you add a comment what you are doing here?
>> >
>> > Note that other loop opts simply scrap all debug stmts ...
>>
>> As mentioned above, updated patch doesn't try hard to maintain debug
>> use info any more.
>>
>> >
>> > +static void
>> > +compute_access_stride (struct loop *loop_nest, struct loop *loop,
>> > + data_reference_p dr)
>> > +{
>> > ...
>> > + tree ref = DR_REF (dr);
>> > + tree scev_base = build_fold_addr_expr (ref);
>> > + tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
>> > + tree niters = build_int_cst (sizetype, AVG_LOOP_NITER);
>> > + access_size = fold_build2 (MULT_EXPR, sizetype, niters,
>> > access_size);
>> > +
>> > + do {
>> > + tree scev_fn = analyze_scalar_evolution (loop, scev_base);
>> > + if (chrec_contains_undetermined (scev_fn)
>> > + || chrec_contains_symbols_defined_in_loop (scev_fn, loop-
>> > >num))
>> > + break;
>> > ...
>> > + strides->safe_push (scev_step);
>> > + } while (loop != loop_nest && (loop = loop_outer (loop)) !=
>> > NULL);
>> > +
>> >
>> > I _think_ you want to do
>> >
>> > scev_fn = analyze_scalar_evolution (loop, scev_base); //
>> > assuming
>> > DR_STMT (dr) is in loop
>> > scev_fn = instantiate_parameters (nest, scev_fn);
>> > if (chrec_contains_undetermined (scev_fn))
>> > return; // false?
>> >
>> > and analyze the result which should be of the form
>> >
>> > { { { init, +, step1 }_1, +, step2 }_2, + , step3 }_3 ...
>> >
>> > if canonical. I think estimate_val_by_simplify_replace isn't
>> > needed
>> > if you do that
>> > (it also looks odd to replace all vairables in step by niter...).
>>
>> I replied on this in previous message, instantiate_parameters doesn't
>> always
>> give canonical form result as expected. The loop here could be seen
>> as a
>> local instantiate process, right?
>> Also estimate_val_by_simplify_replace is needed for pointers, where
>> strides
>> are computed from niters of loops which could be non compilation time
>> constant.
>> But yes, it's an odd fixup after I failed to do anything better.
>>
>> >
>> > I think keeping the chrec in the above form is also more suitable
>> > for what
>> > the caller does so the outermost loop is simply
>> >
>> > loop = loop_nest;
>> > loop-over-all-dr-chrecs
>> > if (flow_loop_nested_p (loop, CHREC_LOOP (chrec)))
>> > loop = CHREC_LOOP (chrec);
>> >
>> > given the outermost loop is the outer evolution. If you sort the
>> > stride vecs from inner
>> > to outer you don't need prune_access_strides_not_in_loop.
>>
>> Hmm, So stripping outer loops prefer inner to outer sort of strides,
>> but cost computation
>> during interchange prefers outer to inner sort because loop_nest in
>> tree-data-ref is sorted
>> in this way. Seems a single prune_* function is better than fiddling
>> with cost computation.
>>
>> >
>> > +/* Count and return the number of loops in LOOP_NEST. */
>> > +
>> > +unsigned int
>> > +num_loops_in_loop_nest (struct loop *loop_nest)
>> > +{
>> > + unsigned num_loops;
>> > + for (num_loops = 0; loop_nest; num_loops++, loop_nest =
>> > loop_nest->inner)
>> > + ;
>> > + return num_loops;
>> >
>> > loop_depth (innermost) - loop_depth (nest)?
>>
>> Done.
>>
>> >
>> > +static bool
>> > +should_interchange_loops (unsigned i_idx, unsigned o_idx,
>> > + vec<data_reference_p> datarefs,
>> > + bool innermost_loops_p, bool dump_info_p
>> > = true)
>> > +{
>> >
>> > isn't all we need associating the above CHREC to sort after the
>> > CHREC_RIGHTs
>> > and figure a permutation sequence to arrive there? That is for the
>> > local
>> > decision you compute here it is CHREC_RIGHT [i_idx] > CHREC_RIGHT
>> > [o_idx]
>> > when we should interchange?
>> >
>> > That subloop_stride_p and tracking invariant DRs looks a bit
>> > odd. For loops
>> > where a DR is invariant you simply do not have an evolution in that
>> > loop.
>> > You seem to simply add strides in the inner and outer loops for
>> > each DR,
>> > can you explain how that works? Also I guess strides bigger than
>> > the
>> > various cache-line size parameters should be treated 'equal'? That
>> > is,
>> > if we don't get any DR to a step that results in L1 cache hits
>> > because
>> > two DRs share a cache line the interchange is pointless?
>>
>> So given below loop:
>>
>> for (int i = 0; i < M; i++)
>> for (int j = 0; j < M; j++)
>> for (int k = 0; k < M; k++)
>> a[k][0][i] = b[k][0][i]
>>
>> We check if memory reference is invariant wrto a loop only if it has
>> zero strides within
>> current loop nest. In this example, there is no invariant given
>> address changes in the
>> innermost loop.
>> For strides bigger than cache-line size, it's also possible the
>> interchange is wanted, as
>> in below example:
>>
>> for (int i = 0; i < M; i++) //loop 1
>> for (int j = 0; j < M; j++) //loop 2
>> for (int k = 0; k < M; k++) //loop 3
>> a[j][i][k] = b[j][i][k]
>>
>> Strides for loop 1/2 are very like to be big, but after interchange,
>> we will have stream
>> access of both arrays.
>>
>> More advanced heuristics may be possible, but so far the estimation
>> is quite good by
>> checking all interchanges I looked into.
>>
>> >
>> > +/* Prune DATAREFS by removing any data reference not inside of
>> > LOOP. */
>> > +
>> > +static inline void
>> > +prune_datarefs_not_in_loop (struct loop *loop,
>> > vec<data_reference_p> datarefs)
>> > +{
>> > + struct data_reference *dr;
>> > +
>> > + for (unsigned i = 0; datarefs.iterate (i, &dr);)
>> > + if (flow_bb_inside_loop_p (loop, gimple_bb (DR_STMT (dr))))
>> > + i++;
>> > + else
>> > + {
>> > + datarefs.ordered_remove (i);
>> >
>> > that's expensive. It's better to keep moving DRs we want to keep
>> > when walking the array. That is, add a j you increment only when
>> > we keep a DR, moving *i to *j.
>>
>> Done.
>>
>> >
>> > + if (dr->aux)
>> > + {
>> > + DR_ACCESS_STRIDE (dr)->release ();
>> > + free (dr->aux);
>> > + }
>> > + free_data_ref (dr);
>> > + }
>> > +}
>> >
>> > +
>> > + start_loop = prune_non_rectangle_loop_nest (innermost_loop,
>> > start_loop);
>> > +
>> >
>> > Hmm. If you instantiate the SCEV for the niters for each loop in
>> > the nest
>> > wrt the nest you can figure if it has any evolution in sth else
>> > than the
>> > loop (evolution_function_is_univariate_p). That is, this is not a
>> > problem
>> > until you arrive at analyzing DR strides, right? At which point
>> > you
>> > can check for the appropriate form?
>>
>> Hmm, not really. The niter relation may not appear in SCEV of
>> reference addr.
>> For example, below loop:
>>
>> for (int i = 0; i < M; i++) //loop 1
>> for (int j = 0; j < M; j++) //loop 2
>> for (int k = 0; k < i; k++) //loop 3
>> a[k][0][i] = b[k][0][i]
>>
>> There is no information in data reference about i/j loops.
>> Anyway, I refactored the code and put this check in
>> proper_loop_form_for_interchange.
>> Simpler I think.
>>
>> >
>> > + if (find_data_references_in_loop (start_loop, datarefs) ==
>> > chrec_dont_know
>> > + /* Check if there is no data reference. */
>> > + || datarefs->length () == 0
>> > + /* Check if there are too many of data references. */
>> > + || ((int) datarefs->length ()
>> > + > PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
>> > + /* Check if there is any data reference in loop latch. We
>> > can't handle
>> > + loops which loop header and data references have different
>> > execution
>> > + times. */
>> > + || dataref_niters_diff_to_loop_header (*datarefs)
>> >
>> > this suggests to do your own find_data_references_in_loop so you
>> > can early
>> > out.
>>
>> I refactored the code a bit. Now this check is in
>> proper_loop_form_for_interchange,
>> but I do customized my own data references finder. It's needed to
>> strip outer loops
>> once a difficult reference is found.
>>
>> >
>> > Overall the flow through the pass is a bit hard to follow given
>> > there are
>> > IMHO too many functions.
>>
>> Yeah, I removed quite number of small functions and refactor the code
>> a lot. Hope this
>> version is more straightforward.
>> >
>> > +unsigned int
>> > +pass_linterchange::execute (function *fun)
>> > +{
>> > + if (number_of_loops (fun) <= 2)
>> > + return 0;
>> > +
>> > + bool changed_p = false;;
>> > + struct loop *loop;
>> > + vec<loop_p> loop_nest;
>> > + vec<data_reference_p> datarefs;
>> > + vec<ddr_p> ddrs;
>> > +
>> > + FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
>> > + if (prepare_perfect_loop_nest (loop, &loop_nest, &datarefs,
>> > &ddrs))
>> > + {
>> > + tree_loop_interchange loop_interchange (loop_nest,
>> > datarefs, ddrs);
>> > + changed_p |= loop_interchange.interchange ();
>> > + }
>> >
>> > you leak datarefs/ddrs?
>>
>> It was release in destructor, but I refactored it anyway. I will
>> push the code to branch
>> gcc.gnu.org/svn/gcc/branches/gimple-linterchange.
>>
>> Thanks again for the comment of you two.
>>
>> Thanks,
>> bin
>> 2017-11-27 Bin Cheng <bin.cheng@arm.com>
>>
>> * Makefile.in (gimple-loop-interchange.o): New object file.
>> * common.opt (floop-interchange): Reuse the option from
>> graphite.
>> * doc/invoke.texi (-floop-interchange): Ditto. New document.
>> * gimple-loop-interchange.cc: New file.
>> * params.def (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS): New
>> parameter.
>> (PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): New parameter.
>> * passes.def (pass_linterchange): New pass.
>> * timevar.def (TV_LINTERCHANGE): New time var.
>> * tree-pass.h (make_pass_linterchange): New declaration.
>> * tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to
>> external
>> interchange. Record IV before/after increment in new
>> parameters.
>> * tree-ssa-loop-ivopts.h (create_canonical_iv): New
>> declaration.
>>
>> gcc/testsuite
>> 2017-11-27 Bin Cheng <bin.cheng@arm.com>
>>
>> * gcc.dg/tree-ssa/loop-interchange-1.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-2.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-3.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-4.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-5.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-6.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-7.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-8.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-9.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-10.c: New test.
>> * gcc.dg/tree-ssa/loop-interchange-11.c: New test.
>
> [...]
>> diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
>> new file mode 100644
>> index 0000000..9a65b28
>> --- /dev/null
>> +++ b/gcc/gimple-loop-interchange.cc
>
> [...]
>
>> +/* Return the reduction if STMT is one of its lcssa PHI, producer or consumer
>> + stmt. */
>> +
>> +reduction_p
>> +loop_cand::find_reduction_by_stmt (gimple *stmt)
>> +{
>> + gphi *phi = NULL;
>> + reduction_p re;
>> +
>> + if (is_a <gphi *> (stmt))
>> + phi = as_a <gphi *> (stmt);
>
> I happened to notice a few places in the patch where you use is_a<>
> followed by as_a<>.
>
> Sorry if you covered this in an earlier round of review.
>
> I believe the code could be slightly simplified by using is-a.h's
> dyn_cast<> here, giving something like:
>
> gphi *phi = dyn_cast <gphi *> (stmt);
>
> [...]
>
>> +/* Analyze reduction variable VAR for inner loop of the loop nest to be
>> + interchanged. Return true if analysis succeeds. */
>> +
>> +bool
>> +loop_cand::analyze_iloop_reduction_var (tree var)
>> +{
>
> [...]
>
>> +
>> + /* Or else it's used in PHI itself. */
>> + use_phi = NULL;
>> + if (is_a <gphi *> (stmt)
>> + && (use_phi = as_a <gphi *> (stmt)) != NULL
>> + && use_phi == phi)
>> + continue;
>
> Similarly here; I belive this could be simplified to:
>
> /* Or else it's used in PHI itself. */
> use_phi = dyn_cast <gphin *> (stmt);
> if (use_phi != NULL
> && use_phi == phi)
> continue;
>
> and given that (I think) phi is non-NULL, this could be:
>
> /* Or else it's used in PHI itself. */
> use_phi = dyn_cast <gphi *> (stmt);
> if (use_phi == phi)
> continue;
>
> for 3 lines rather than 5 lines.
>
> [...]
>
>> +bool
>> +loop_cand::analyze_oloop_reduction_var (loop_cand *iloop, tree var)
>> +{
>
> [...]
>
>> + /* Outer loop's reduction should only be used to initialize inner loop's
>> + simple reduction. */
>> + FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
>> + {
>> + stmt = USE_STMT (use_p);
>> + if (is_gimple_debug (stmt))
>> + continue;
>> +
>> + if (!flow_bb_inside_loop_p (m_loop, gimple_bb (stmt)))
>> + return false;
>> +
>> + if (! is_a <gphi *> (stmt)
>> + || (use_phi = as_a <gphi *> (stmt)) == NULL
>> + || use_phi != inner_re->phi)
>> + return false;
>> + }
>
> Another one here. The:
> use_phi = as_a <gphi *> (stmt)) == NULL
> looks strange to me: the "is_a <gphi *>" tests for stmt->code, so stmt can't
> be NULL, and hence use_phi can't be NULL.
>
> You don't seem to use the "use_phi" after each FOR_EACH_IMM_USE_FAST loop,
> so this could in theory be simplified to:
>
> if (stmt != inner_re->phi)
> return false;
>
> Though maybe I'm missing something; presumably that logic was there
> for a reason.
>
>> +
>> + /* Check this reduction is correctly used outside of loop via lcssa phi. */
>> + FOR_EACH_IMM_USE_FAST (use_p, iterator, next)
>> + {
>> + stmt = USE_STMT (use_p);
>> + if (is_gimple_debug (stmt))
>> + continue;
>> +
>> + /* Or else it's used in PHI itself. */
>> + use_phi = NULL;
>> + if (is_a <gphi *> (stmt)
>> + && (use_phi = as_a <gphi *> (stmt)) != NULL
>> + && use_phi == phi)
>> + continue;
>> + if (lcssa_phi == NULL
>> + && use_phi != NULL
>> + && gimple_bb (stmt) == e->dest
>> + && PHI_ARG_DEF_FROM_EDGE (use_phi, e) == next)
>> + lcssa_phi = use_phi;
>> + else
>> + return false;
>> + }
>
> ...and here, which I believe could become:
>
> use_phi = dyn_cast <gphi *> (stmt);
> if (use_phi == phi)
> continue;
> if (lcssa_phi == NULL
> [..etc..]
>
> [...]
>
> Hope this is constructive
Yes, of course. I will fix these in the next update.
Thanks,
bin
> Dave
next prev parent reply other threads:[~2017-11-28 16:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-28 15:55 Bin Cheng
2017-11-28 16:14 ` David Malcolm
2017-11-28 16:16 ` Bin.Cheng [this message]
2017-11-30 13:19 ` Richard Biener
2017-11-30 14:34 ` Richard Biener
2017-11-30 15:07 ` Bin.Cheng
2017-11-30 15:14 ` Richard Biener
2017-11-30 16:01 ` Richard Biener
2017-11-30 18:24 ` Bin.Cheng
2017-11-30 18:41 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHFci28-H_96it=qx_ZRvB5yrreMo1D4rvgexc7Bqe4HU1U=0Q@mail.gmail.com' \
--to=amker.cheng@gmail.com \
--cc=Bin.Cheng@arm.com \
--cc=dmalcolm@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=matz@suse.de \
--cc=nd@arm.com \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).