On Thu, 10 Aug 2023 at 21:27, Richard Sandiford wrote:
>
> Prathamesh Kulkarni writes:
> >> static bool
> >> is_simple_vla_size (poly_uint64 size)
> >> {
> >>   if (size.is_constant ())
> >>     return false;
> >>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
> >>     if (size[i] != (i <= 1 ? size[0] : 0))
> > Just wondering if this should be (i == 1 ? size[0] : 0) since i is
> > initialized to 1 ?
>
> Both work.  I prefer <= 1 because it doesn't depend on the micro
> optimisation to start at coefficient 1.  In a theoretical 3-indeterminate
> poly_int, we want the first 2 coefficients to be nonzero and the rest to
> be zero.
>
> > IIUC, is_simple_vla_size should return true for polynomials of first
> > degree and having same coeff like 4 + 4x ?
>
> FWIW, poly_int only supports first-degree polynomials at the moment.
> coeffs>2 means there is more than one indeterminate, rather than a
> higher power.
Oh OK, thanks for the clarification.
>
> >>       return false;
> >>   return true;
> >> }
> >>
> >>
> >>   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
> >>     {
> >>       auto nunits = GET_MODE_NUNITS (mode);
> >>       if (!is_simple_vla_size (nunits))
> >>         continue;
> >>       if (nunits[0] ...)
> >>         test_... (mode);
> >>       ...
> >>
> >>     }
> >>
> >> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
> >> loop structure above, I think we can apply the test_vnx4si and
> >> test_vnx16qi to more cases.  So the classification isn't the
> >> exact number of elements, but instead a limit.
> >>
> >> I think the nunits[0] conditions for test_vnx4si are as follows
> >> (inspection only, so could be wrong):
> >>
> >> > +/* Test cases where result and input vectors are VNx4SI */
> >> > +
> >> > +static void
> >> > +test_vnx4si (machine_mode vmode)
> >> > +{
> >> > +  /* Case 1: mask = {0, ...} */
> >> > +  {
> >> > +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> > +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> >> > +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> >> > +
> >> > +    vec_perm_builder builder (len, 1, 1);
> >> > +    builder.quick_push (0);
> >> > +    vec_perm_indices sel (builder, 2, len);
> >> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> >> > +
> >> > +    tree expected_res[] = { vector_cst_elt (res, 0) };
> > This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
> >> > +    validate_res (1, 1, res, expected_res);
> >> > +  }
> >>
> >> nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1,
> >> which I think would be better)
> > IIUC, the vectors that can be used for a particular test should have
> > nunits[0] >= res_npatterns,
> > where res_npatterns is as computed in fold_vec_perm_cst without the
> > canonicalization ?
> > For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we
> > require nunits[0] >= 2 ?
> > Which implies we can use above test for vectors with length 2 + 2x, 4 + 4x, etc.
>
> Right, that's what I meant.  With the inputs as they stand it has to be
> nunits[0] >= 2.  We need that to form the inputs correctly.  But if the
> inputs instead had nelts_per_pattern == 1, the test would work for all
> nunits.
In the attached patch, I have reordered the tests based on min or max limit.
For tests where sel_npatterns < 3 (i.e. dup sequence), I have kept input
npatterns = 1, so we can test more vector modes. Input npatterns also
matters only for a stepped sequence in sel (since for a dup pattern we don't
enforce the constraint of selecting elements from the same input pattern).
Does it look OK ?

For the following tests with input vectors having shape (1, 3)
sel = {0, 1, 2, ...}  // (1, 3)
res = { arg0[0], arg0[1], arg0[2], ... } // (1, 3)

and sel = {len, len + 1, len + 2, ... }  // (1, 3)
res = { arg1[0], arg1[1], arg1[2], ... } // (1, 3)

Although res_npatterns = 1, I suppose these will need to be tested with
vectors with length >= 4 + 4x, since index 2 can be ambiguous for length 2 + 2x ?
(In the patch, these are cases 2 and 3 in test_nunits_min_4)

Patch is bootstrapped+tested on aarch64-linux-gnu with and without SVE
and on x86_64-linux-gnu (although I suppose bootstrapping won't be necessary
for changes to unit-tests?)
>
> > Sorry if this sounds like a silly question -- Won't nunits[0] >= 2
> > cover all nunits,
> > since a vector, at a minimum, will contain 2 elements ?
>
> Not necessarily.  VNx1TI makes conceptual sense.  We just don't use it
> currently (although that'll change with SME).  And we do have single-element
> VLS vectors like V1DI and V1DF.
Thanks for the explanation, I wasn't aware of that.

Thanks,
Prathamesh
>
> Thanks,
> Richard
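
For reference, the is_simple_vla_size check and the "nunits[0] as a minimum limit" idea discussed above can be tried outside GCC with a small stand-alone sketch. It is only an illustration: toy_poly is a hypothetical stand-in for poly_uint64 (not the real poly_int class), and usable_with_min_nunits is an assumed helper name that does not appear in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for GCC's poly_uint64 (NOT the real class):
// coeffs[0] is the constant term, coeffs[i] for i >= 1 multiplies the
// i-th runtime indeterminate.
template <std::size_t N>
struct toy_poly
{
  std::array<std::uint64_t, N> coeffs;

  // Constant when every indeterminate has a zero coefficient.
  bool is_constant () const
  {
    for (std::size_t i = 1; i < N; ++i)
      if (coeffs[i] != 0)
        return false;
    return true;
  }

  std::uint64_t operator[] (std::size_t i) const { return coeffs[i]; }
};

// Mirrors the predicate quoted in the thread: true for sizes of the form
// C + C*x, with any further indeterminates having a zero coefficient.
template <std::size_t N>
bool
is_simple_vla_size (const toy_poly<N> &size)
{
  if (size.is_constant ())
    return false;
  for (std::size_t i = 1; i < N; ++i)
    if (size[i] != (i <= 1 ? size[0] : 0))
      return false;
  return true;
}

// Assumed helper (name is illustrative): a test with a minimum limit can
// run on a mode whose nunits is simple and has nunits[0] >= min_nunits.
template <std::size_t N>
bool
usable_with_min_nunits (const toy_poly<N> &nunits, std::uint64_t min_nunits)
{
  return is_simple_vla_size (nunits) && nunits[0] >= min_nunits;
}

// Small usage example with sizes mentioned in the thread.
inline void
check_examples ()
{
  const toy_poly<2> vnx4 = {{4, 4}};  // 4 + 4x
  const toy_poly<2> vnx1 = {{1, 1}};  // 1 + 1x, e.g. a VNx1 mode
  assert (is_simple_vla_size (vnx4));
  assert (usable_with_min_nunits (vnx4, 2));
  assert (!usable_with_min_nunits (vnx1, 2));
}
```

With two coefficients, (i <= 1) and (i == 1) select the same element, so the two spellings only differ once a third coefficient exists: 4 + 4x + 0y is accepted and 4 + 4x + 4y is rejected.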