On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz wrote: > Hello, > > On Fri, 22 Sep 2017, Bin.Cheng wrote: > >> This is updated patch for loop interchange with review suggestions >> resolved. Changes are: >> 1) It does more light weight checks like rectangle loop nest check >> earlier than before. >> 2) It checks profitability of interchange before data dependence computation. >> 3) It calls find_data_references_in_loop only once for a loop nest now. >> 4) Data dependence is open-computed so that we can skip instantly at >> unknown dependence. >> 5) It improves code generation in mapping induction variables for >> loop nest, as well as >> adding a simple dead code elimination pass. >> 6) It changes magic constants into parameters. > > So I have a couple comments/questions. Something stylistic: Hi Michael, Thanks for reviewing. > >> +class loop_cand >> +{ >> +public: >> ... >> + friend class tree_loop_interchange; >> +private: > > Just make this all public (and hence a struct, not class). > No need for friends in file local classes. Done. > >> +single_use_in_loop (tree var, struct loop *loop) >> ... >> + FOR_EACH_IMM_USE_FAST (use_p, iterator, var) >> + { >> + stmt = USE_STMT (use_p); >> ... >> + basic_block bb = gimple_bb (stmt); >> + gcc_assert (bb != NULL); > > This pattern reoccurs often in your patch: you check for a bb associated > for a USE_STMT. Uses of SSA names always occur in basic blocks, no need > for checking. Done. > > Then, something about your handling of simple reductions: > >> +void >> +loop_cand::classify_simple_reduction (reduction_p re) >> +{ >> ... >> + /* Require memory references in producer and consumer are the same so >> + that we can undo reduction during interchange. */ >> + if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0)) >> + return; > > Where is it checked that the undoing transformation is legal also > from a data dep point of view? Think code like this: > > sum = X[i]; > for (j ...) > sum += X[j]; > X[i] = sum; > > Moving the store into the inner loop isn't always correct and I don't seem > to find where the above situation is rejected. Yeah. for the old patch, it's possible to have such loop wrongly interchanged; in practice, it's hard to create an example. The pass will give up when computing data dep between references in inner/outer loops. In this updated patch, it's fixed by giving up if there is any dependence between references of inner/outer loops. > > Maybe I'm confused because I also don't see where you even can get into > the above situation (though I do see testcases about this). The thing is, > for an 2d loop nest to contain something like the above reduction it can't > be perfect: > > for (j) { > int sum = X[j]; // 1 > for (i) > sum += Y[j][i]; > X[j] = sum; // 2 > } > > But you do check for perfectness in proper_loop_form_for_interchange and > prepare_perfect_loop_nest, so either you can't get into the situation or > the checking can't be complete, or you define the above to be perfect > nevertheless (probably because the load and store are in outer loop > header/exit blocks?). The latter would mean that you accept also other > code in header/footer of loops from a pure CFG perspective, so where is it > checked that that other code (which aren't simple reductions) isn't > harmful to the transformation? Yes, I used the name perfect loop nest, but the pass can handle special form imperfect loop nest for the simple reduction. I added comments describing this before function prepare_perfect_loop_nest. > > Then, the data dependence part of the new pass: > >> +bool >> +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned outer) >> +{ >> + struct data_dependence_relation *ddr; >> + >> + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i) >> + { >> + /* Skip no-dependence case. */ >> + if (DDR_ARE_DEPENDENT (ddr) == chrec_known) >> + continue; >> + >> + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j) >> + { >> + lambda_vector dist_vect = DDR_DIST_VECT (ddr, j); >> + unsigned level = dependence_level (dist_vect, loop_nest.length ()); >> + >> + /* If there is no carried dependence. */ >> + if (level == 0) >> + continue; >> + >> + level --; >> + /* Skip case which has '>' as the leftmost direction. */ >> + if (!lambda_vector_lexico_pos (dist_vect, level)) >> + return false; > > Shouldn't happen as dist vectors are forced positive via DDR_REVERSED. Done. > >> + /* If dependence is carried by outer loop of the two loops for >> + interchange. */ >> + if (level < outer) >> + continue; >> + >> + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j); >> + /* If directions at both inner/outer levels are the same. */ >> + if (dir_vect[inner] == dir_vect[outer]) >> + continue; >> + >> + /* Be conservative, skip case if either direction at inner/outer >> + levels is not '=' or '<'. */ >> + if (dir_vect[inner] != dir_equal >> + && dir_vect[inner] != dir_positive >> + && dir_vect[inner] != dir_independent >> + && dir_vect[inner] != dir_positive_or_equal) >> + return false; >> + >> + if (dir_vect[outer] != dir_equal >> + && dir_vect[outer] != dir_positive >> + && dir_vect[outer] != dir_independent >> + && dir_vect[outer] != dir_positive_or_equal) >> + return false; > > Checking dir vectors doesn't make much sense in GCC: the elements are only > ever set to dir_positive, dir_negative or dir_equal, exactly when distance > is > > 0, < 0 or == 0. So checking dist vector is enough. (though sameness of > direction checks sameness of sign with zero). Incidentally: Done. > >> +tree_loop_interchange::update_data_deps (unsigned inner, unsigned outer) >> +{ >> + struct data_dependence_relation *ddr; >> + >> + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i) >> + { >> + /* Skip no-dependence case. */ >> + if (DDR_ARE_DEPENDENT (ddr) == chrec_known) >> + continue; >> + >> + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j) >> + { >> + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j); >> + std::swap (dir_vect[inner], dir_vect[outer]); >> + } >> + } >> +} > > Here you swap only the direction but not the distance vector, which can't > be right. I suggest only using (and updating) the distance vector. Yeah, fixed. > > And then your usage and update of DR_ACCESS_FNs: there's quite some > complexity connected with that and I'm not sure how worthwhile it is. > You're basically using the ACCESS_FNs to determine profitability (and not > for validity, and that's good). But e.g. for pointer based accesses like > in fortran with explicit address arithmetic the relation between access-fn > step and stride and actual access stride isn't that easy (e.g. in your > should_interchange_loops function iloop_stride and oloop_stride will > always be one for pointer based accesses). > > Conceptually what you should check is how the access address for each data > ref revolves for each loop, so why not doing this explicitely? What I > mean is: calculate a (complicated) chrec for the DR addresses for the > whole nest at the beginning. It should be in the form like (assume "+" > always): > > {{{init, s1}_l1, s2}_l2, s3}_l3 > > (i.e. all steps should be invariants/constants, and only one non-chrec > init value). Addresses which aren't in this form you're already ignoring > right now, so you could continue doing that. (Or better said, all > non-constant steps you regard as being AVG_DIM_SIZE, which you still can > continue doing). > > Now, with the above form you can form expressions for the difference > between addresses per iteration for each loop (i.e. the address stride per > loop); store these. Then, when interchanging loops you need to merely > swap these expressions like you have to with the distance vector, instead > of fiddling inside the DR_ACCESS_FNs themself. Much code would go away. Yeah. Did similar thing in loop nest distribution pass. See compute_access_range in tree-loop-distribution.c. Actually, I would do the same here if I had implemented this pass after loop nest distribution patches. Done in this updated patch. > > Testcases: given that we had to remove our old separate interchange pass > because it miscompiled stuff all over I'm missing some testcases where > interchange should _not_ happen for validity reasons, like my above > example with an reduction that can't be moved inside. Perhaps you can > think of some more. As mentioned above, it's hard to create test that fail exactly for this reason. I added one that data dependence prevents us from interchanging the loop. > > I hope this is of some help to you :) Thanks again, it's very helpful. I also fixed several bugs of previous implementation, mostly about debug info statements and simple reductions. As for test, I enabled this pass by default, bootstrap and regtest GCC, I also build/run specs. There must be some other latent bugs in it, but guess we have to exercise it by enabling it at some point. So any comments? Thanks, bin 2017-11-15 Bin Cheng * Makefile.in (tree-ssa-loop-interchange.o): New object file. * common.opt (ftree-loop-interchange): New option. * doc/invoke.texi (-ftree-loop-interchange): Document new option. * gimple-iterator.c (gsi_remove): New parameter determining if dbg bind stmt is inserted or not. * gimple-iterator.h (gsi_remove): New parameter in declaration. * params.def (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS): New parameter. (PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): New parameter. * passes.def (pass_linterchange): New pass. * timevar.def (TV_LINTERCHANGE): New time var. * tree-pass.h (make_pass_linterchange): New declaration. * tree-ssa-loop-interchange.cc: New file. * tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external. Record IV before/after increment in new parameters. * tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration. gcc/testsuite 2017-11-15 Bin Cheng * gcc.dg/tree-ssa/loop-interchange-1.c: New test. * gcc.dg/tree-ssa/loop-interchange-2.c: New test. * gcc.dg/tree-ssa/loop-interchange-3.c: New test. * gcc.dg/tree-ssa/loop-interchange-4.c: New test. * gcc.dg/tree-ssa/loop-interchange-5.c: New test. * gcc.dg/tree-ssa/loop-interchange-6.c: New test. * gcc.dg/tree-ssa/loop-interchange-7.c: New test. * gcc.dg/tree-ssa/loop-interchange-8.c: New test. * gcc.dg/tree-ssa/loop-interchange-9.c: New test. * gcc.dg/tree-ssa/loop-interchange-10.c: New test. * gcc.dg/tree-ssa/loop-interchange-11.c: New test. > > > Ciao, > Michael.