Hello world, this is a draft patch which interchanges the indices for FORALL and DO CONCURRENT loops for cases like PR 82471, where code like DO CONCURRENT( K=1:N, J=1:M, I=1:L) C(I,J,K) = A(I,J,K) + B(I,J,K) END DO led to very poor code because of stride issues. Currently, Graphite is not able to do this. Without the patch, the code above is translated as i.7 = 1; count.10 = 512; while (1) { if (ANNOTATE_EXPR ) goto L.4; j.6 = 1; count.9 = 512; while (1) { if (ANNOTATE_EXPR ) goto L.3; k.5 = 1; count.8 = 512; while (1) { if (ANNOTATE_EXPR ) goto L.2; (*(real(kind=4)[0:] * restrict) c.data)[((c.offset + (integer(kind=8)) k.5 * c.dim[2].stride) + (integer(kind=8)) j.6 * c.dim[1].stride) + (integer(kind=8)) i.7] = (*(real(kind=4)[0:] * restrict) a.data)[((a.offset + (integer(kind=8)) k.5 * a.dim[2].stride) + (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.7] + (*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) k.5 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + (integer(kind=8)) i.7]; L.1:; k.5 = k.5 + 1; count.8 = count.8 + -1; } L.2:; j.6 = j.6 + 1; count.9 = count.9 + -1; } L.3:; i.7 = i.7 + 1; count.10 = count.10 + -1; } L.4:; so the innermost loop has the biggest stride. With the patch (and front-end optimization turned on), this is turned into k.7 = 1; count.10 = 512; while (1) { if (ANNOTATE_EXPR ) goto L.4; j.6 = 1; count.9 = 512; while (1) { if (ANNOTATE_EXPR ) goto L.3; i.5 = 1; count.8 = 512; while (1) { if (ANNOTATE_EXPR ) goto L.2; (*(real(kind=4)[0:] * restrict) c.data)[((c.offset + (integer(kind=8)) k.7 * c.dim[2].stride) + (integer(kind=8)) j.6 * c.dim[1].stride) + (integer(kind=8)) i.5] = (*(real(kind=4)[0:] * restrict) a.data)[((a.offset + (integer(kind=8)) k.7 * a.dim[2].stride) + (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.5] + (*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) k.7 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + (integer(kind=8)) i.5]; L.1:; i.5 = i.5 + 1; count.8 = count.8 + -1; } L.2:; j.6 = j.6 + 1; count.9 = count.9 + -1; } L.3:; k.7 = k.7 + 1; count.10 = count.10 + -1; } L.4:; so the innermost loop is the one that gets traversed with unity stride (the way it should have been done). Although the algorithm here is quite simple, it resolves the issues raised in the PR so far, and it definitely worth fixing. If we do this kind of thing, it might also be possible to interchange normal DO loops in a similar way (which Graphite also cannot do at the moment, at least not if the bounds are variables). So, comments? Suggestions? Other ideas? Any ideas how to write a test case? Any volunteers to re-implement Graphite in the Fortran front end (didn't think so) or to make Graphite catch this sort of pattern (which it currently doesn't) instead? Regards Thomas 2017-10-27 Thomas Koenig * frontend-passes.c (index_interchange): New funciton, prototype. (optimize_namespace): Call index_interchange. (ind_type): New function. (has_var): New function. (index_cost): New function. (loop_comp): New function.