Hi Richard,

While digging through aarch64_expand_vector_init, I noticed that it gives
priority to loading a constant first:

  /* Initialise a vector which is part-variable.  We want to first try
     to build those lanes which are constant in the most efficient way we
     can.  */

which results in suboptimal code-gen for the following case:

int16x8_t f_s16(int16_t x)
{
  return (int16x8_t) { x, x, x, x, x, x, x, 1 };
}

code-gen trunk:
f_s16:
        movi    v0.8h, 0x1
        ins     v0.h[0], w0
        ins     v0.h[1], w0
        ins     v0.h[2], w0
        ins     v0.h[3], w0
        ins     v0.h[4], w0
        ins     v0.h[5], w0
        ins     v0.h[6], w0
        ret

The attached patch tweaks the following condition:

  if (n_var == n_elts && n_elts <= 16)
    {
      ...
    }

to pass if maxv >= 80% of n_elts, with 80% being an arbitrary "high enough"
threshold. The intent is to dup the most-repeated variable if its repetition
count is "high enough" and then insert the constants, which should be
"better" than loading the constant first and inserting the variables, as in
the above case.

Alternatively, I suppose we could remove the threshold and, for constants,
generate both sequences and check which one is more efficient?

code-gen with patch:
f_s16:
        dup     v0.8h, w0
        movi    v1.4h, 0x1
        ins     v0.h[7], v1.h[0]
        ret

The patch is lightly tested to verify that the vec[t]-init-*.c tests pass,
with bootstrap+test in progress.
Does this look OK?

Thanks,
Prathamesh
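
For reference, the proposed threshold can be sketched in isolation as below.
This is only an illustration, not the attached patch: prefer_dup_first is a
hypothetical helper name, and it assumes maxv is the number of lanes occupied
by the most-repeated variable. Any other parts of the real condition (such as
the n_elts <= 16 bound) are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.  Returns true when the
   most-repeated variable fills at least 80% of the vector lanes,
   i.e. when dup-then-insert-constants would be preferred under the
   proposed heuristic.  MAXV is assumed to be that repetition count
   and N_ELTS the total number of lanes.  */
static bool
prefer_dup_first (int maxv, int n_elts)
{
  /* Integer form of maxv >= 0.8 * n_elts, avoiding floating point.  */
  return maxv * 5 >= n_elts * 4;
}
```

For f_s16 above, maxv is 7 and n_elts is 8, so the check passes (35 >= 32);
with only six repeated lanes out of eight it would not (30 < 32).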