Hi Richard,

This is a rework of the patch to extend fold_vec_perm to handle VLA vectors.

The attached patch unifies handling of VLS and VLA vector_csts, while using fallback code for ctors.

For VLS vectors, the patch ignores the underlying encoding and uses npatterns = nelts and nelts_per_pattern = 1.

For VLA vectors, if sel has a stepped sequence, then it only chooses elements from a particular pattern of a particular input vector.

To keep things simple, the patch imposes the following constraints:
(a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
(b) The step size of a stepped sequence is a power of 2 and a multiple of the npatterns of the chosen input vector.
(c) The runtime vector length of sel is a multiple of sel_npatterns.
    So, we don't handle sel.length = 2 + 2x with npatterns = 4.

Eg:
op0, op1: npatterns = 2, nelts_per_pattern = 3
op0_len = op1_len = 16 + 16x
sel = { 0, 0, 2, 0, 4, 0, ... }, npatterns = 2, nelts_per_pattern = 3

For the pattern {0, 2, 4, ...}, let:
a1 = 2
S = step size = 2

Let Esel denote the number of elements per pattern in sel at runtime:
Esel = (16 + 16x) / sel_npatterns
     = (16 + 16x) / 2
     = 8 + 8x

So, the last element of the pattern is:
ae = a1 + (Esel - 2) * S
   = 2 + (8 + 8x - 2) * 2
   = 14 + 16x

a1 /trunc op0_len = 2 / (16 + 16x) = 0
ae /trunc op0_len = (14 + 16x) / (16 + 16x) = 0

Since both quotients are equal to 0, we select elements from op0.
Since the step size S is a multiple of npatterns (op0), we select all elements from the same pattern of op0.

res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
              = max (2, max (2, 2))
              = 2

res_nelts_per_pattern = max (op0_nelts_per_pattern, max (op1_nelts_per_pattern, sel_nelts_per_pattern))
                      = max (3, max (3, 3))
                      = 3

So res has an encoding with npatterns = 2, nelts_per_pattern = 3:
res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ...
}

Unfortunately, this results in an issue for poly_int_cst indices. For example:
op0, op1: npatterns = 1, nelts_per_pattern = 3
op0_len = op1_len = 4 + 4x
sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1

In this case:
a1 = 5 + 4x
S = (6 + 4x) - (5 + 4x) = 1
Esel = 4 + 4x

ae = a1 + (Esel - 2) * S
   = (5 + 4x) + (4 + 4x - 2) * 1
   = 7 + 8x

IIUC, 7 + 8x will always be the index of the last element of op1?
If x = 0, len = 4, 7 + 8x = 7.
If x = 1, len = 8, 7 + 8x = 15, etc.
So the stepped sequence will always choose elements from op1, regardless of vector length, for the above case?

However, ae /trunc op0_len = (7 + 8x) / (4 + 4x) cannot be computed as a constant, because 7/4 != 8/4, so we return NULL_TREE. I suppose the expected result would be:
res: { op1[0], op1[1], op1[2], ... } ?

The patch passes bootstrap+test on aarch64-linux-gnu with and without SVE, and on x86_64-unknown-linux-gnu.

I would be grateful for suggestions on how to proceed.

Thanks,
Prathamesh