From: Richard Sandiford
To: Prathamesh Kulkarni
Cc: gcc Patches
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
Date: Mon, 14 Aug 2023 13:53:14 +0100
In-Reply-To: (Prathamesh Kulkarni's message of "Sun, 13 Aug 2023 17:19:58 +0530")

Prathamesh Kulkarni writes:
> On Thu, 10 Aug 2023 at 21:27, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> >> static bool
>> >> is_simple_vla_size (poly_uint64 size)
>> >> {
>> >>   if (size.is_constant ())
>> >>     return false;
>> >>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>> >>     if (size[i] != (i <= 1 ? size[0] : 0))
>> > Just wondering if this should be (i == 1 ? size[0] : 0) since i is
>> > initialized to 1 ?
>>
>> Both work.  I prefer <= 1 because it doesn't depend on the micro
>> optimisation to start at coefficient 1.  In a theoretical 3-indeterminate
>> poly_int, we want the first 2 coefficients to be nonzero and the rest to
>> be zero.
>>
>> > IIUC, is_simple_vla_size should return true for polynomials of first
>> > degree and having same coeff like 4 + 4x ?
>>
>> FWIW, poly_int only supports first-degree polynomials at the moment.
>> coeffs>2 means there is more than one indeterminate, rather than a
>> higher power.
> Oh OK, thanks for the clarification.
>>
>> >>       return false;
>> >>   return true;
>> >> }
>> >>
>> >>
>> >>   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
>> >>     {
>> >>       auto nunits = GET_MODE_NUNITS (mode);
>> >>       if (!is_simple_vla_size (nunits))
>> >>         continue;
>> >>       if (nunits[0] ...)
>> >>         test_... (mode);
>> >>       ...
>> >>
>> >>     }
>> >>
>> >> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
>> >> loop structure above, I think we can apply the test_vnx4si and
>> >> test_vnx16qi to more cases.  So the classification isn't the
>> >> exact number of elements, but instead a limit.
>> >>
>> >> I think the nunits[0] conditions for test_vnx4si are as follows
>> >> (inspection only, so could be wrong):
>> >>
>> >> > +/* Test cases where result and input vectors are VNx4SI  */
>> >> > +
>> >> > +static void
>> >> > +test_vnx4si (machine_mode vmode)
>> >> > +{
>> >> > +  /* Case 1: mask = {0, ...} */
>> >> > +  {
>> >> > +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >> > +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >> > +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> >> > +
>> >> > +    vec_perm_builder builder (len, 1, 1);
>> >> > +    builder.quick_push (0);
>> >> > +    vec_perm_indices sel (builder, 2, len);
>> >> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> >> > +
>> >> > +    tree expected_res[] = { vector_cst_elt (res, 0) };
>> > This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
>> >> > +    validate_res (1, 1, res, expected_res);
>> >> > +  }
>> >>
>> >> nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1,
>> >> which I think would be better)
>> > IIUC, the vectors that can be used for a particular test should have
>> > nunits[0] >= res_npatterns,
>> > where res_npatterns is as computed in fold_vec_perm_cst without the
>> > canonicalization ?
>> > For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we
>> > require nunits[0] >= 2 ?
>> > Which implies we can use above test for vectors with length 2 + 2x, 4 + 4x, etc.
>>
>> Right, that's what I meant.  With the inputs as they stand it has to be
>> nunits[0] >= 2.  We need that to form the inputs correctly.  But if the
>> inputs instead had nelts_per_pattern == 1, the test would work for all
>> nunits.
> In the attached patch, I have reordered the tests based on min or max limit.
> For tests where sel_npatterns < 3 (ie dup sequence), I have kept input
> npatterns = 1,
> so we can test more vector modes, and also input npatterns matter only
> for stepped sequence in sel
> (Since for a dup pattern we don't enforce the constraint of selecting
> elements from same input pattern).
> Does it look OK ?
>
> For the following tests with input vectors having shape (1, 3)
> sel = {0, 1, 2, ...}  // (1, 3)
> res = { arg0[0], arg0[1], arg0[2], ... } // (1, 3)
>
> and sel = {len, len + 1, len + 2, ... } // (1, 3)
> res = { arg1[0], arg1[1], arg1[2], ... } // (1, 3)
>
> Altho res_npatterns = 1, I suppose these will need to be tested with
> vectors with length >= 4 + 4x,
> since index 2 can be ambiguous for length 2 + 2x ?
> (In the patch, these are cases 2 and 3 in test_nunits_min_4)

Ah, yeah, fair point.  I guess that means:

+      /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
+         Test that stepped sequence of the pattern selects from arg0.
+         res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
+      {
+        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+        vec_perm_builder builder (len, 1, 3);
+        poly_uint64 mask_elems[] = { len, 0, 1 };
+        builder_push_elems (builder, mask_elems);
+
+        vec_perm_indices sel (builder, 2, len);
+        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+        tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
+        validate_res (1, 3, res, expected_res);
+      }

needs to be min_2 after all.

Also:

> +/* Helper routine to push multiple elements into BUILDER.  */
> +
> +static void
> +builder_push_elems (vec_perm_builder& builder, poly_uint64 *elems)
> +{
> +  for (unsigned i = 0; i < builder.encoded_nelts (); i++)
> +    builder.quick_push (elems[i]);
> +}

I think it'd be safer to make this:

template<unsigned N>
void
builder_push_elems (vec_perm_builder& builder, poly_uint64 (&elems)[N])
{
  for (unsigned i = 0; i < N; i++)
    builder.quick_push (elems[i]);
}

so that we only push elements that are in the array.

OK for trunk with those changes, thanks.

Richard

> +
> +#define ARG0(index) vector_cst_elt (arg0, index)
> +#define ARG1(index) vector_cst_elt (arg1, index)
> +
> +/* Test cases where result is VNx4SI and input vectors are V4SI.  */
> +
> +static void
> +test_vnx4si_v4si (machine_mode vnx4si_mode, machine_mode v4si_mode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1:
> +         sel = { 0, 4, 1, 5, ... }
> +         res = { arg0[0], arg1[0], arg0[1], arg1[1], ...} // (4, 1)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +        tree arg1 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +
> +        tree inner_type
> +          = lang_hooks.types.type_for_mode (GET_MODE_INNER (vnx4si_mode), 1);
> +        tree res_type = build_vector_type_for_mode (inner_type, vnx4si_mode);
> +
> +        poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +        vec_perm_builder builder (res_len, 4, 1);
> +        poly_uint64 mask_elems[] = { 0, 4, 1, 5 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, res_len);
> +        tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +        validate_res (4, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: Same as case 1, but contains an out of bounds access which
> +         should wrap around.
> +         sel = {0, 8, 4, 12, ...} (4, 1)
> +         res = { arg0[0], arg0[0], arg1[0], arg1[0], ... } (4, 1).  */
> +      {
> +        tree arg0 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +        tree arg1 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +
> +        tree inner_type
> +          = lang_hooks.types.type_for_mode (GET_MODE_INNER (vnx4si_mode), 1);
> +        tree res_type = build_vector_type_for_mode (inner_type, vnx4si_mode);
> +
> +        poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +        vec_perm_builder builder (res_len, 4, 1);
> +        poly_uint64 mask_elems[] = { 0, 8, 4, 12 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, res_len);
> +        tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG0(0), ARG1(0), ARG1(0) };
> +        validate_res (4, 1, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test cases where result is V4SI and input vectors are VNx4SI.  */
> +
> +static void
> +test_v4si_vnx4si (machine_mode v4si_mode, machine_mode vnx4si_mode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1:
> +         sel = { 0, 1, 2, 3}
> +         res = { arg0[0], arg0[1], arg0[2], arg0[3] }.  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +        tree arg1 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +
> +        tree inner_type
> +          = lang_hooks.types.type_for_mode (GET_MODE_INNER (v4si_mode), 1);
> +        tree res_type = build_vector_type_for_mode (inner_type, v4si_mode);
> +
> +        poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +        vec_perm_builder builder (res_len, 4, 1);
> +        poly_uint64 mask_elems[] = {0, 1, 2, 3};
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, res_len);
> +        tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG0(1), ARG0(2), ARG0(3) };
> +        validate_res_vls (res, expected_res, 4);
> +      }
> +
> +      /* Case 2: Same as Case 1, but crossing input vector.
> +         sel = {0, 2, 4, 6}
> +         In this case, the index 4 is ambiguous since len = 4 + 4x.
> +         Since we cannot determine, which vector to choose from during
> +         compile time, should return NULL_TREE.  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +        tree arg1 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +
> +        tree inner_type
> +          = lang_hooks.types.type_for_mode (GET_MODE_INNER (v4si_mode), 1);
> +        tree res_type = build_vector_type_for_mode (inner_type, v4si_mode);
> +
> +        poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +        vec_perm_builder builder (res_len, 4, 1);
> +        poly_uint64 mask_elems[] = {0, 2, 4, 6};
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, res_len);
> +        const char *reason;
> +        tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel, &reason);
> +
> +        ASSERT_TRUE (res == NULL_TREE);
> +        ASSERT_TRUE (!strcmp (reason, "cannot divide selector element by arg len"));
> +      }
> +    }
> +}
> +
> +/* Test all input vectors.  */
> +
> +static void
> +test_all_nunits (machine_mode vmode)
> +{
> +  /* Test with 10 different inputs.  */
> +  for (int i = 0; i < 10; i++)
> +    {
> +      tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +      tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +      poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +      /* Case 1: mask = {0, ...} // (1, 1)
> +         res = { arg0[0], ... } // (1, 1)  */
> +      {
> +        vec_perm_builder builder (len, 1, 1);
> +        builder.quick_push (0);
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +        tree expected_res[] = { ARG0(0) };
> +        validate_res (1, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: mask = {len, ...} // (1, 1)
> +         res = { arg1[0], ... } // (1, 1)  */
> +      {
> +        vec_perm_builder builder (len, 1, 1);
> +        builder.quick_push (len);
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG1(0) };
> +        validate_res (1, 1, res, expected_res);
> +      }
> +
> +      /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
> +         Test that stepped sequence of the pattern selects from arg0.
> +         res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 1, 3);
> +        poly_uint64 mask_elems[] = { len, 0, 1 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> +        validate_res (1, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at-least 2 elements.  */
> +
> +static void
> +test_nunits_min_2 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: mask = { 0, len, ... }  // (2, 1)
> +         res = { arg0[0], arg1[0], ... } // (2, 1)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 2, 1);
> +        poly_uint64 mask_elems[] = { 0, len };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG1(0) };
> +        validate_res (2, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: mask = { 0, len, 1, len+1, ... } // (2, 2)
> +         res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 2, 2);
> +        poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +        validate_res (2, 2, res, expected_res);
> +      }
> +
> +      /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
> +         Test that the stepped sequence of the pattern selects from
> +         same input pattern.  Since input vectors have npatterns = 2,
> +         and step (a2 - a1) = 1, step is not a multiple of npatterns
> +         in input vector.  So return NULL_TREE.  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 1, 3);
> +        poly_uint64 mask_elems[] = { 0, 0, 1 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        const char *reason;
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> +                                      &reason);
> +        ASSERT_TRUE (res == NULL_TREE);
> +        ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at-least 4 elements.  */
> +
> +static void
> +test_nunits_min_4 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: mask = { 0, len, 1, len+1, ... } // (4, 1)
> +         res: { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (4, 1)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 4, 1);
> +        poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +        validate_res (4, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: sel = {0, 1, 2, ...}  // (1, 3)
> +         res: { arg0[0], arg0[1], arg0[2], ... } // (1, 3)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +        poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (arg0_len, 1, 3);
> +        poly_uint64 mask_elems[] = {0, 1, 2};
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, arg0_len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +        tree expected_res[] = { ARG0(0), ARG0(1), ARG0(2) };
> +        validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 3: sel = {len, len+1, len+2, ...} // (1, 3)
> +         res: { arg1[0], arg1[1], arg1[2], ... } // (1, 3)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 1, 3);
> +        poly_uint64 mask_elems[] = {len, len + 1, len + 2};
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +        tree expected_res[] = { ARG1(0), ARG1(1), ARG1(2) };
> +        validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 4:
> +         sel = { len, 0, 2, ... } // (1, 3)
> +         This should return NULL because we cross the input vectors.
> +         Because,
> +         Let's assume len = C + Cx
> +         a1 = 0
> +         S = 2
> +         esel = arg0_len / sel_npatterns = C + Cx
> +         ae = 0 + (esel - 2) * S
> +            = 0 + (C + Cx - 2) * 2
> +            = 2(C-2) + 2Cx
> +
> +         For C >= 4:
> +         Let q1 = a1 / arg0_len = 0 / (C + Cx) = 0
> +         Let qe = ae / arg0_len = (2(C-2) + 2Cx) / (C + Cx) = 1
> +         Since q1 != qe, we cross input vectors.
> +         So return NULL_TREE.  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +        poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (arg0_len, 1, 3);
> +        poly_uint64 mask_elems[] = { arg0_len, 0, 2 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, arg0_len);
> +        const char *reason;
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +        ASSERT_TRUE (res == NULL_TREE);
> +        ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> +      }
> +
> +      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> +         mask = { 0, len, 1, len + 1, ...} // (2, 2)
> +         res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> +
> +         Note that fold_vec_perm_cst will set
> +         res_npatterns = max(4, max(4, 2)) = 4
> +         However after canonicalizing, we will end up with shape (2, 2).  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 4, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 4, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 2, 2);
> +        poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +        tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +        validate_res (2, 2, res, expected_res);
> +      }
> +
> +      /* Case 6: Test combination in sel, where one pattern is dup and other
> +         is stepped sequence.
> +         sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> +         res = { arg0[0], arg0[0], arg0[0],
> +                 arg0[1], arg0[0], arg0[2], ... } // (2, 3)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 2, 3);
> +        poly_uint64 mask_elems[] = { 0, 0, 0, 1, 0, 2 };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG0(0), ARG0(0),
> +                                ARG0(1), ARG0(0), ARG0(2) };
> +        validate_res (2, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at-least 8 elements.  */
> +
> +static void
> +test_nunits_min_8 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: sel_npatterns (4) > input npatterns (2)
> +         sel: { 0, 0, 1, len, 2, 0, 3, len, 4, 0, 5, len, ...} // (4, 3)
> +         res: { arg0[0], arg0[0], arg0[1], arg1[0],
> +                arg0[2], arg0[0], arg0[3], arg1[0],
> +                arg0[4], arg0[0], arg0[5], arg1[0], ... } // (4, 3)  */
> +      {
> +        tree arg0 = build_vec_cst_rand (vmode, 2, 3, 2);
> +        tree arg1 = build_vec_cst_rand (vmode, 2, 3, 2);
> +        poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +        vec_perm_builder builder (len, 4, 3);
> +        poly_uint64 mask_elems[] = { 0, 0, 1, len, 2, 0, 3, len,
> +                                     4, 0, 5, len };
> +        builder_push_elems (builder, mask_elems);
> +
> +        vec_perm_indices sel (builder, 2, len);
> +        tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +        tree expected_res[] = { ARG0(0), ARG0(0), ARG0(1), ARG1(0),
> +                                ARG0(2), ARG0(0), ARG0(3), ARG1(0),
> +                                ARG0(4), ARG0(0), ARG0(5), ARG1(0) };
> +        validate_res (4, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test vectors for which nunits[0] <= 4.  */
> +
> +static void
> +test_nunits_max_4 (machine_mode vmode)
> +{
> +  /* Case 1: mask = {0, 4, ...} // (1, 2)
> +     This should return NULL_TREE because the index 4 may choose
> +     from either arg0 or arg1 depending on vector length.  */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 1, 2);
> +    poly_uint64 mask_elems[] = {0, 4};
> +    builder_push_elems (builder, mask_elems);
> +
> +    vec_perm_indices sel (builder, 2, len);
> +    const char *reason;
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +    ASSERT_TRUE (res == NULL_TREE);
> +    ASSERT_TRUE (reason != NULL);
> +    ASSERT_TRUE (!strcmp (reason, "cannot divide selector element by arg len"));
> +  }
> +}
> +
> +#undef ARG0
> +#undef ARG1
> +
> +/* Return true if SIZE is of the form C + Cx and C is power of 2.  */
> +
> +static bool
> +is_simple_vla_size (poly_uint64 size)
> +{
> +  if (size.is_constant ()
> +      || !pow2p_hwi (size.coeffs[0]))
> +    return false;
> +  for (unsigned i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
> +    if (size.coeffs[i] != (i <= 1 ? size.coeffs[0] : 0))
> +      return false;
> +  return true;
> +}
> +
> +/* Execute fold_vec_perm_cst unit tests.  */
> +
> +static void
> +test ()
> +{
> +  machine_mode vnx4si_mode = E_VOIDmode;
> +  machine_mode v4si_mode = E_VOIDmode;
> +
> +  machine_mode vmode;
> +  FOR_EACH_MODE_IN_CLASS (vmode, MODE_VECTOR_INT)
> +    {
> +      /* Obtain modes corresponding to VNx4SI and V4SI,
> +         to call mixed mode tests below.
> +         FIXME: Is there a better way to do this ?  */
> +      if (GET_MODE_INNER (vmode) == SImode)
> +        {
> +          poly_uint64 nunits = GET_MODE_NUNITS (vmode);
> +          if (is_simple_vla_size (nunits)
> +              && nunits.coeffs[0] == 4)
> +            vnx4si_mode = vmode;
> +          else if (known_eq (nunits, poly_uint64 (4)))
> +            v4si_mode = vmode;
> +        }
> +
> +      if (!is_simple_vla_size (GET_MODE_NUNITS (vmode))
> +          || !targetm.vector_mode_supported_p (vmode))
> +        continue;
> +
> +      poly_uint64 nunits = GET_MODE_NUNITS (vmode);
> +      test_all_nunits (vmode);
> +      if (nunits.coeffs[0] >= 2)
> +        test_nunits_min_2 (vmode);
> +      if (nunits.coeffs[0] >= 4)
> +        test_nunits_min_4 (vmode);
> +      if (nunits.coeffs[0] >= 8)
> +        test_nunits_min_8 (vmode);
> +
> +      if (nunits.coeffs[0] <= 4)
> +        test_nunits_max_4 (vmode);
> +    }
> +
> +  if (vnx4si_mode != E_VOIDmode && v4si_mode != E_VOIDmode
> +      && targetm.vector_mode_supported_p (vnx4si_mode)
> +      && targetm.vector_mode_supported_p (v4si_mode))
> +    {
> +      test_vnx4si_v4si (vnx4si_mode, v4si_mode);
> +      test_v4si_vnx4si (v4si_mode, vnx4si_mode);
> +    }
> +}
> +}; // end of test_fold_vec_perm_cst namespace
> +
>  /* Verify that various binary operations on vectors are folded
>     correctly.  */
>
> @@ -16943,6 +17693,7 @@ fold_const_cc_tests ()
>    test_arithmetic_folding ();
>    test_vector_folding ();
>    test_vec_duplicate_folding ();
> +  test_fold_vec_perm_cst::test ();
>  }
>
>  }  // namespace selftest
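
For reference, the array-reference form of builder_push_elems suggested
in the review can be tried outside GCC.  The following is only a minimal
sketch under stated assumptions: mock_builder is a hypothetical stand-in
for vec_perm_builder (only quick_push is modelled), std::uint64_t stands
in for poly_uint64, and demo_push_elems is an illustrative helper that is
not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for vec_perm_builder: only quick_push is modelled.
struct mock_builder
{
  std::vector<std::uint64_t> elts;
  void quick_push (std::uint64_t x) { elts.push_back (x); }
};

// Array-reference form: N is deduced from the argument's type, so the
// loop bound always matches the number of elements in ELEMS and cannot
// run past the end of the array.
template<std::size_t N>
void
builder_push_elems (mock_builder &builder, const std::uint64_t (&elems)[N])
{
  for (std::size_t i = 0; i < N; i++)
    builder.quick_push (elems[i]);
}

// Illustrative usage: push a 4-element mask and return the number of
// elements the builder received.
inline std::size_t
demo_push_elems ()
{
  mock_builder builder;
  std::uint64_t mask_elems[] = { 0, 8, 4, 12 };
  builder_push_elems (builder, mask_elems);  // N is deduced as 4.
  return builder.elts.size ();
}
```

With this signature a bare pointer argument no longer compiles, and the
loop no longer depends on the builder's encoded element count agreeing
with the size of the array at the call site.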
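
The "ambiguous index" cases discussed in the thread (index 4 for
len = 4 + 4x, index 2 for len = 2 + 2x) can be checked by brute force.
This is a sketch only: vla_len, input_for_index and index_is_ambiguous
are hypothetical names, and the enumeration over x is an illustration of
why no single compile-time answer exists, not a description of how
fold_vec_perm_cst decides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a vector length C + C*x, where x >= 0 is only
// known at run time.
struct vla_len
{
  std::uint64_t c;
  std::uint64_t at (std::uint64_t x) const { return c + c * x; }
};

// Return 0 if selector INDEX reads from arg0 and 1 if it reads from arg1
// when the runtime parameter is X.  Indices wrap modulo 2 * len.
inline unsigned
input_for_index (vla_len len, std::uint64_t index, std::uint64_t x)
{
  std::uint64_t n = len.at (x);
  return static_cast<unsigned> ((index % (2 * n)) / n);
}

// Return true if INDEX selects different inputs for some x in [0, MAX_X]
// than it does for x = 0.
inline bool
index_is_ambiguous (vla_len len, std::uint64_t index, std::uint64_t max_x)
{
  unsigned first = input_for_index (len, index, 0);
  for (std::uint64_t x = 1; x <= max_x; x++)
    if (input_for_index (len, index, x) != first)
      return true;
  return false;
}
```

For len = 4 + 4x, index 4 reads arg1 when x = 0 (len = 4) but arg0 when
x = 1 (len = 8), whereas index 2 always reads arg0.  For len = 2 + 2x,
index 2 is itself ambiguous, which matches the observation that the
{0, 1, 2, ...} selector needs nunits[0] >= 4.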