I don't think the loop vectorizer can do more optimization here.

GCC passes vec_perm <,,(nunits - 1, nunits, nunits + 1, ...)> to the
vec_perm_const target hook to handle that. It's very target dependent.
We can't do more about that.

For RVV, it's better to transform this case into vec_extract + vec_shl_insert.
However, for ARM SVE, it's not. ARM SVE has a dedicated instruction to handle that (trn),
so for ARM SVE it's better to pass this permute index to vec_perm_const.
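
To make the equivalence concrete, here is a small scalar model in C. It is
only a sketch: the model_* helpers and the fixed NUNITS are hypothetical and
written for illustration, they are not GCC internals, and real VLA code only
knows nunits at run time. It assumes vec_perm selects from the concatenation
of its two inputs and that vec_shl_insert shifts elements away from element 0
by one and fills element 0 with the scalar.

```c
#include <stddef.h>

/* Assumption: a fixed element count for illustration only.  */
enum { NUNITS = 4 };

/* Hypothetical model of vec_perm <a, b, sel>: sel indexes the
   concatenation of a and b.  */
static void
model_vec_perm (const int *a, const int *b, const size_t *sel, int *out)
{
  for (size_t i = 0; i < NUNITS; i++)
    out[i] = sel[i] < (size_t) NUNITS ? a[sel[i]] : b[sel[i] - NUNITS];
}

/* Hypothetical model of vec_extract: read one element.  */
static int
model_vec_extract (const int *v, size_t idx)
{
  return v[idx];
}

/* Hypothetical model of vec_shl_insert: shift the elements away from
   element 0 by one and put the scalar into element 0.  */
static void
model_vec_shl_insert (const int *v, int scalar, int *out)
{
  out[0] = scalar;
  for (size_t i = 1; i < NUNITS; i++)
    out[i] = v[i - 1];
}

/* Return 1 if the permute and the extract + insert form agree on the
   sample inputs, 0 otherwise.  */
static int
permute_matches_extract_insert (void)
{
  const int a[NUNITS] = { 10, 11, 12, 13 };
  const int b[NUNITS] = { 20, 21, 22, 23 };
  size_t sel[NUNITS];
  int perm[NUNITS], ins[NUNITS];

  /* sel = { nunits - 1, nunits, nunits + 1, ... }  */
  for (size_t i = 0; i < NUNITS; i++)
    sel[i] = NUNITS - 1 + i;

  model_vec_perm (a, b, sel, perm);
  model_vec_shl_insert (b, model_vec_extract (a, NUNITS - 1), ins);

  for (size_t i = 0; i < NUNITS; i++)
    if (perm[i] != ins[i])
      return 0;
  return 1;
}
```

Calling permute_matches_extract_insert () from a test driver checks that both
forms produce the last element of the first vector followed by the leading
elements of the second. Which form is cheaper is still up to the target.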



juzhe.zhong@rivai.ai
 
From: Robin Dapp
Date: 2023-11-23 22:58
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize a special case of VLA SLP
LGTM (and harmless enough) but I'd rather wait for a second look or a
maintainer's OK as we're past stage 1 and it's not a real bugfix.
(On top, it's Thanksgiving so not many people will even notice).
 
On a related note, this should probably be a middle-end optimization,
but before variable-index vec extract existed, most likely nobody bothered.
 
Regards
Robin