I've committed this patch to trunk. It implements a partitioning optimization for a loop partitioned over both the vector and worker axes.

We can elide the inner vector partitioning state propagation if the worker-partitioned outer loop contains no intervening instructions other than the forking and joining. In that case we simply execute the worker propagation on all vectors.

I've been unable to add a testcase for this. The difficulty is that we want to check an RTL dump from the acceleration compiler, and there doesn't appear to be existing machinery for that in the testsuite. Perhaps something to be added later?

nathan
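
For illustration, a loop partitioned over both the worker and vector axes might be written in OpenACC C as below. This is only a sketch: the function name saxpy_wv and its parameters are hypothetical, and it is an assumption, not something stated in the patch, that this exact loop is a candidate for the optimization.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: saxpy_wv and its parameters are hypothetical names.
   Without -fopenacc the pragma is ignored and the loop runs serially.  */
void
saxpy_wv (size_t n, float a, const float *x, float *y)
{
  assert (x != NULL && y != NULL);

  /* A single loop partitioned over the worker and vector axes at once,
     so nothing sits between the worker-level and vector-level
     partitioning in the source.  */
#pragma acc parallel loop worker vector copyin (x[0:n]) copy (y[0:n])
  for (size_t i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```

Built without OpenACC support, this compiles as ordinary C and gives the same results, which makes it easy to sanity-check on the host before looking at what the offload compiler does with it.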