Jakub,

Currently, automatic loop partitioning assigns axes from the innermost loop outwards; that was the simplest thing to implement. A better algorithm is to assign the outermost loop to the outermost available axis, and then assign the remaining axes from the innermost loop outwards. That way we (generally) get gang partitioning on the outermost loop. Just inside that we'll get non-partitioned loops if the nest is too deep, and the two innermost nested loops will get worker and vector partitioning.

This patch has been on the gomp4 branch for a while. OK for trunk?

nathan
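
For illustration only, here is a minimal sketch of the two assignment orders described above. It is a simplified model, not the code in the patch: it assumes a single perfect loop nest, assumes all three axes (gang, worker, vector) are available, and the function names are hypothetical. Loops are indexed from 0 (outermost) to depth - 1 (innermost), and None marks a non-partitioned loop.

```python
# Simplified model of automatic loop partitioning (illustrative only).
# Assumption: all three axes are free; the real compiler must also
# account for axes already claimed by explicit clauses or outer loops.
AXES = ["gang", "worker", "vector"]  # outermost axis first


def assign_innermost_out(depth):
    """Current behaviour: hand out axes from the innermost loop outwards."""
    result = [None] * depth
    axes = list(AXES)
    for loop in range(depth - 1, -1, -1):
        if not axes:
            break  # nest deeper than the number of axes
        result[loop] = axes.pop()  # innermost remaining axis
    return result


def assign_outermost_first(depth):
    """Proposed behaviour: outermost loop takes the outermost axis,
    then the remaining axes go from the innermost loop outwards."""
    result = [None] * depth
    if depth == 0:
        return result
    axes = list(AXES)
    result[0] = axes.pop(0)  # outermost loop gets the outermost axis
    for loop in range(depth - 1, 0, -1):
        if not axes:
            break  # loops in the middle stay non-partitioned
        result[loop] = axes.pop()
    return result


if __name__ == "__main__":
    for depth in range(1, 5):
        print(depth, assign_innermost_out(depth), assign_outermost_first(depth))
```

Under these assumptions the two orders agree on a three-deep nest and differ elsewhere: for a four-deep nest the first leaves the outermost loop non-partitioned, while the second puts gang on the outermost loop and leaves the loop just inside it non-partitioned.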