I don't think the loop vectorizer can do any more optimization here. GCC passes vec_perm <,,(nunits - 1, nunits, nunits + 1, ...)> to the vec_perm_const target hook to handle that. It's very target dependent, so we can't do more about it in the middle-end.

For RVV, it's better to transform this case into vec_extract + vec_shl_insert. For ARM SVE, however, it's not: ARM SVE has a dedicated instruction to handle that (trn), so it's better to pass vec_perm_const with these permute indices for ARM SVE.

juzhe.zhong@rivai.ai

From: Robin Dapp
Date: 2023-11-23 22:58
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize a special case of VLA SLP

LGTM (and harmless enough), but I'd rather wait for a second look or a maintainer's OK, as we're past stage 1 and it's not a real bugfix. (On top of that, it's Thanksgiving, so not many people will even notice.)

On a related note, this should probably be a middle-end optimization, but before there was a variable-index vec_extract most likely nobody bothered.

Regards
 Robin
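
For illustration only (not part of the patch): a scalar C sketch of the permute discussed in the top reply, assuming vec_perm selects from the concatenation of its two inputs and vec_shl_insert shifts elements away from element 0 and fills element 0 with a scalar. The function names perm_model and extract_shl_insert_model are hypothetical, and a fixed n stands in for the variable nunits.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of vec_perm <a, b, {n-1, n, n+1, ...}>: selector values
   below n pick from a, values n and above pick from b.  */
void perm_model(const int *a, const int *b, int *out, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        size_t sel = (n - 1) + i;
        out[i] = sel < n ? a[sel] : b[sel - n];
    }
}

/* The same result expressed as an extract of the last element of a,
   followed by a shift-left-insert into b.  */
void extract_shl_insert_model(const int *a, const int *b, int *out, size_t n)
{
    assert(n > 0);
    int last = a[n - 1];            /* models vec_extract (a, n - 1) */
    for (size_t i = n - 1; i > 0; i--)
        out[i] = b[i - 1];          /* shift b up by one element */
    out[0] = last;                  /* fill the vacated element 0 */
}
```

Both functions produce the last element of a followed by the first n - 1 elements of b, which is why a target without a cheap general permute may prefer the second form.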