Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to unrolling:

fully_peel_me:
        mov     x1, 5
        ptrue   p1.d, all
        whilelo p0.d, xzr, x1
        ld1d    z0.d, p0/z, [x0]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x0]
        cntd    x2
        addvl   x3, x0, #1
        whilelo p0.d, x2, x1
        beq     .L1
        ld1d    z0.d, p0/z, [x0, #1, mul vl]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x3]
        cntw    x2
        incb    x0, all, mul #2
        whilelo p0.d, x2, x1
        beq     .L1
        ld1d    z0.d, p0/z, [x0]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x0]
.L1:
        ret

In this case, due to the vector-length-agnostic nature of SVE, the compiler doesn't know the loop iteration count. For such loops we don't want to unroll if doing so doesn't eliminate the branches, as that just bloats code size and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that disables cunroll when the loop iteration count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for SVE VLA code, but it also helps some Advanced SIMD cases, where loops with an unknown iteration count are no longer unrolled when unrolling wouldn't eliminate the branches.

So for the above testcase we now generate:

fully_peel_me:
        mov     x2, 5
        mov     x3, x2
        mov     x1, 0
        whilelo p0.d, xzr, x2
        ptrue   p1.d, all
.L2:
        ld1d    z0.d, p0/z, [x0, x1, lsl 3]
        fadd    z0.d, z0.d, z0.d
        st1d    z0.d, p0, [x0, x1, lsl 3]
        incd    x1
        whilelo p0.d, x1, x3
        bne     .L2
        ret

Still not perfect, but it's preferable to the original code.

The new param is enabled by default on aarch64 but disabled for other targets, leaving their behaviour unchanged (until other target maintainers experiment with it and enable it where appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu. Benchmarked SPEC2017 on a Cortex-A57; there are no differences in performance.

Ok for trunk?

Thanks,
Kyrill

2018-11-09  Kyrylo Tkachov

	* params.def (PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY): Define.
	* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Use above to
	disable unrolling on unknown iteration count.
	* config/aarch64/aarch64.c (aarch64_override_options_internal): Set
	PARAM_UNROLL_KNOWN_LOOP_ITERATIONS_ONLY to 1.
	* doc/invoke.texi (--param unroll-known-loop-iterations-only):
	Document.

2018-11-09  Kyrylo Tkachov

	* gcc.target/aarch64/sve/unroll-1.c: New test.
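As a rough model of the trade-off (not the patch itself), the C sketch below counts the exit branches that complete peeling leaves behind when only an upper bound on the trip count is known, and applies the proposed gate. The names loop_model, max_vector_iterations, branches_after_peeling and may_unroll_completely are hypothetical; the real change tests for an unknown (SCEV_NOT_KNOWN) iteration count inside try_unroll_loop_completely. The sketch assumes the minimum SVE vector length of 128 bits, i.e. two doubles per vector.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of what complete unrolling (cunroll) knows about a
   loop; these names are illustrative, not GCC internals.  */
struct loop_model
{
  bool niter_constant;  /* exact trip count known at compile time */
  unsigned max_niter;   /* upper bound on the trip count */
};

/* Upper bound on vector iterations for N elements when each vector holds
   at least MIN_LANES elements: ceil (N / MIN_LANES).  */
static unsigned
max_vector_iterations (unsigned n, unsigned min_lanes)
{
  assert (min_lanes > 0);
  return (n + min_lanes - 1) / min_lanes;
}

/* Exit tests that survive complete peeling: none when the trip count is
   constant, otherwise one after every copy except the last.  */
static unsigned
branches_after_peeling (const struct loop_model *l)
{
  if (l->niter_constant || l->max_niter == 0)
    return 0;
  return l->max_niter - 1;
}

/* Sketch of the proposed gate: with the param set, refuse complete
   unrolling unless the exact trip count is known.  */
static bool
may_unroll_completely (const struct loop_model *l, bool known_iterations_only)
{
  return !(known_iterations_only && !l->niter_constant);
}
```

For fully_peel_me, max_vector_iterations (5, 2) gives 3, so peeling produces three copies with two surviving exit branches. This matches the two beq instructions in the first listing. With the param set, may_unroll_completely returns false for that loop and it stays rolled, as in the second listing.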