This patch improves code generation for literal vector construction by
expanding the pattern earlier, which exposes it to the RTL optimizers.
The current implementation delays splitting the pattern until after
reload, which results in poor code generation for the following code:

#include "arm_neon.h"

int16x8_t
foo ()
{
  return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
}

Trunk generates:

foo:
        movi    v1.2s, 0
        movi    v0.4h, 0x8
        dup     d2, v1.d[0]
        ins     v2.d[1], v0.d[0]
        orr     v0.16b, v2.16b, v2.16b
        ret

With the patch we now generate:

foo:
        movi    v1.4h, 0x8
        movi    v0.4s, 0
        ins     v0.d[1], v1.d[0]
        ret

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-06-15  Michael Collison

        * config/aarch64/aarch64-simd.md (aarch64_combine_internal):
        Convert from define_insn_and_split into define_expand.
        * config/aarch64/aarch64.c (aarch64_split_simd_combine):
        Allow register and subreg operands.
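
As a cross-check on both code sequences, the value that foo () is expected to return can be modelled in portable C without arm_neon.h. This is only a sketch of the lane semantics. The model_* types and functions are hypothetical stand-ins invented for illustration, not GCC or ACLE definitions. It assumes vcombine_s16 places its first argument in lanes 0-3 and its second argument in lanes 4-7, which is consistent with the ins into d[1] in the assembly above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for int16x4_t and int16x8_t (illustration only). */
typedef struct { int16_t lane[4]; } model_int16x4;
typedef struct { int16_t lane[8]; } model_int16x8;

/* Models vdup_n_s16: broadcast one scalar to all four lanes. */
static model_int16x4
model_vdup_n_s16 (int16_t value)
{
  model_int16x4 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = value;
  return r;
}

/* Models vcombine_s16: LOW fills lanes 0-3, HIGH fills lanes 4-7.  */
static model_int16x8
model_vcombine_s16 (model_int16x4 low, model_int16x4 high)
{
  model_int16x8 r;
  for (int i = 0; i < 4; i++)
    {
      r.lane[i] = low.lane[i];
      r.lane[i + 4] = high.lane[i];
    }
  return r;
}

/* Portable model of foo () from the test case above.  */
model_int16x8
model_foo (void)
{
  return model_vcombine_s16 (model_vdup_n_s16 (0), model_vdup_n_s16 (8));
}
```

Under this model the low four lanes are 0 and the high four lanes are 8, which is the value both the trunk sequence and the patched sequence should leave in v0.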