Hi, This patch implements a simple loop interchange pass in GCC, as described by its comments: +/* This pass performs loop interchange: for example, the loop nest + + for (int j = 0; j < N; j++) + for (int k = 0; k < N; k++) + for (int i = 0; i < N; i++) + c[i][j] = c[i][j] + a[i][k]*b[k][j]; + + is transformed to + + for (int i = 0; i < N; i++) + for (int j = 0; j < N; j++) + for (int k = 0; k < N; k++) + c[i][j] = c[i][j] + a[i][k]*b[k][j]; + + This pass implements loop interchange in the following steps: + + 1) Find perfect loop nest for each innermost loop and compute data + dependence relations for it. For above example, loop nest is + . + 2) From innermost to outermost loop, this pass tries to interchange + each loop pair. For above case, it firstly tries to interchange + and loop nest becomes . + Then it tries to interchange and loop nest becomes + . The overall effect is to move innermost + loop to the outermost position. For loop pair + to be interchanged, we: + 3) Check if data dependence relations are valid for loop interchange. + 4) Check if both loops can be interchanged in terms of transformation. + 5) Check if interchanging the two loops is profitable. + 6) Interchange the two loops by mapping induction variables. + + This pass also handles reductions in loop nest. So far we only support + simple reduction of inner loop and double reduction of the loop nest. */ Actually, this pass only does loop shift which outermosting inner loop to outer, rather than permutation. Also as a traditional loop optimizer, it only works for perfect loop nest. I put it just after loop distribution thus ideally loop split/distribution could create perfect nest for it. Unfortunately, we don't get any perfect nest from distribution for now because it only works for innermost loop. For example, the motivation case in spec2k6/bwaves is not handled on this pass alone. I have a patch extending distribution for (innermost) loop nest and with that patch bwaves case can be handled. Another point is I deliberately make both the cost model and code transformation (very) conservative. We can support more cases, or more transformations with great care when it is for sure known beneficial. IMHO, we already hit over-baked issues quite often and don't want to introduce more. As for code generation, this patch has an issue that invariant code in outer loop could be moved to inner loop. For the moment, we rely on the last lim pass to handle all INV generated during interchange. In the future, we may need to avoid that in interchange itself, or another lim pass just like the one after graphite optimizations. Boostrap and test on x86_64 and AArch64. Various benchmarks built and run successfully. Note this pass is disabled in patch, while the code is exercised by bootstrap/building programs with it enabled by default. Any comments? Thanks, bin 2017-08-29 Bin Cheng * Makefile.in (tree-ssa-loop-interchange.o): New object file. * common.opt (ftree-loop-interchange): New option. * doc/invoke.texi (-ftree-loop-interchange): Document new option. * passes.def (pass_linterchange): New pass. * timevar.def (TV_LINTERCHANGE): New time var. * tree-pass.h (make_pass_linterchange): New declaration. * tree-ssa-loop-interchange.cc: New file. * tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external. * tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration. gcc/testsuite 2017-08-29 Bin Cheng * gcc.dg/tree-ssa/loop-interchange-1.c: New test. * gcc.dg/tree-ssa/loop-interchange-2.c: New test. * gcc.dg/tree-ssa/loop-interchange-3.c: New test. * gcc.dg/tree-ssa/loop-interchange-4.c: New test. * gcc.dg/tree-ssa/loop-interchange-5.c: New test. * gcc.dg/tree-ssa/loop-interchange-6.c: New test. * gcc.dg/tree-ssa/loop-interchange-7.c: New test. * gcc.dg/tree-ssa/loop-interchange-8.c: New test. * gcc.dg/tree-ssa/loop-interchange-9.c: New test. * gcc.dg/tree-ssa/loop-interchange-10.c: New test.