Hi,
For the following test-case:

typedef float __attribute__((__vector_size__ (16))) F;

F foo (F a, F b)
{
  F v = (F) { 9 };
  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
}

Compiling with -O2 results in the following ICE:

foo.c: In function ‘foo’:
foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
    6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x7f3185 wi::int_traits >::decompose(long*, unsigned int, std::pair const&)
        ../../gcc/gcc/rtl.h:2314
0x7f3185 wide_int_ref_storage::wide_int_ref_storage >(std::pair const&)
        ../../gcc/gcc/wide-int.h:1089
0x7f3185 generic_wide_int >::generic_wide_int >(std::pair const&)
        ../../gcc/gcc/wide-int.h:847
0x7f3185 poly_int<1u, generic_wide_int > >::poly_int >(poly_int_full, std::pair const&)
        ../../gcc/gcc/poly-int.h:467
0x7f3185 poly_int<1u, generic_wide_int > >::poly_int >(std::pair const&)
        ../../gcc/gcc/poly-int.h:453
0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
        ../../gcc/gcc/rtl.h:2383
0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
        ../../gcc/gcc/rtx-vector-builder.h:122
0xfd4e1b vector_builder::elt(unsigned int) const
        ../../gcc/gcc/vector-builder.h:253
0xfd4d11 rtx_vector_builder::build()
        ../../gcc/gcc/rtx-vector-builder.cc:73
0xc21d9c const_vector_from_tree
        ../../gcc/gcc/expr.cc:13487
0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        ../../gcc/gcc/expr.cc:11059
0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
        ../../gcc/gcc/expr.h:310
0xaee682 expand_return
        ../../gcc/gcc/cfgexpand.cc:3809
0xaee682 expand_gimple_stmt_1
        ../../gcc/gcc/cfgexpand.cc:3918
0xaee682 expand_gimple_stmt
        ../../gcc/gcc/cfgexpand.cc:4044
0xaf28f0 expand_gimple_basic_block
        ../../gcc/gcc/cfgexpand.cc:6100
0xaf4996 execute
        ../../gcc/gcc/cfgexpand.cc:6835

IIUC, the issue is that fold_vec_perm returns a vector with float
element type and res_nelts_per_pattern == 3. The ICE then happens
when rtx_vector_builder::build() tries to derive element v[3], which
is not present in the encoding, while building the rtx vector:

  for (unsigned int i = 0; i < nelts; ++i)
    RTVEC_ELT (v, i) = elt (i);

The attached patch tries to fix this by returning false from
valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and the
input vector has a non-integral element type. So for VLA vectors with
a non-integral element type, it will only build the result with a dup
sequence (nelts_per_pattern < 3).

For VLS vectors, a stepped sel will still be folded, since
fold_vec_perm_cst then falls back to the "VLS exception" and sets:

  res_npattern = res_nelts
  res_nelts_per_pattern = 1

and folds the above case to:

F foo (F a, F b)
{
   [local count: 1073741824]:
  return { 0.0, 9.0e+0, 0.0, 0.0 };
}

But I am not sure if this is entirely correct, since:

  tree res = out_elts.build ();

will canonicalize the encoding and may result in a stepped sequence
(vector_builder::finalize() may reduce npatterns at the cost of
increasing nelts_per_pattern)?

PS: This issue is now latent after the PR111648 fix, since
valid_mask_for_fold_vec_perm_cst with sel = {1, 0, 1, ...} returns
false because the corresponding pattern in arg0 is not a natural
stepped sequence, and the case folds correctly using the VLS exception.
However, I guess the underlying issue of dealing with non-integral
element types in fold_vec_perm_cst still remains?

The patch passes bootstrap+test with and without SVE on
aarch64-linux-gnu, and on x86_64-linux-gnu.

Thanks,
Prathamesh
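
For reference, a minimal integer-only sketch of how an element beyond the encoded ones gets derived from an (npatterns, nelts_per_pattern) encoding, which is the step that trips over float elements above. This is my own simplified model, not the GCC code: model_elt and its parameters are hypothetical names, and it assumes plain long elements where the real builders work on trees or rtxes.

```c
/* Hypothetical model of a vector encoding: ENC holds
   NPATTERNS * NELTS_PER_PATTERN explicitly encoded elements, with the
   patterns interleaved.  Returns element I of the full vector.  */
static long
model_elt (const long *enc, unsigned npatterns,
           unsigned nelts_per_pattern, unsigned i)
{
  unsigned encoded_nelts = npatterns * nelts_per_pattern;

  /* Encoded elements are simply read back.  */
  if (i < encoded_nelts)
    return enc[i];

  /* Find the pattern that contains element I and the last encoded
     element of that pattern.  */
  unsigned pattern = i % npatterns;
  unsigned count = i / npatterns;
  unsigned final_i = encoded_nelts - npatterns + pattern;
  long final = enc[final_i];

  /* Dup sequence (nelts_per_pattern < 3): the last encoded value
     repeats, so no arithmetic on elements is needed.  */
  if (nelts_per_pattern <= 2)
    return final;

  /* Stepped sequence: extrapolate from the last two encoded elements.
     This subtraction is the part that only makes sense for integers.  */
  long prev = enc[final_i - npatterns];
  long step = final - prev;
  return final + (long) (count - 2) * step;
}
```

With the encoding {1, 0, 1}, npatterns = 1 and nelts_per_pattern = 3, model_elt (enc, 1, 3, 3) gives 1 + (1 - 0) = 2, i.e. the sel {1, 0, 1, 2} from the test-case. The folded result carries the same shape with float elements { 0.0, 9.0, 0.0 }, so deriving element 3 would need a float step, which the integer-based step computation cannot provide.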