From mboxrd@z Thu Jan  1 00:00:00 1970
From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Mail-Followup-To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,gcc Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
References: <CAAgBjMnk_N=tgPNBhUu91yt8YN0HcCoWgQQYpshHMqhU=6WgAQ@mail.gmail.com>
	<mptmszkdubp.fsf@arm.com>
	<CAAgBjMmOrVxsNT8jK0Ei-RYrhtNVymd7pmaud2tA6=OAJCbpOg@mail.gmail.com>
	<mpt4jlg2sb3.fsf@arm.com> <mptv8dw1d2l.fsf@arm.com>
	<CAAgBjMmuzg54RiUTXVQE06+AZxCM-4w+ohojFND5TWghfsiV8w@mail.gmail.com>
	<mpto7jmyhhe.fsf@arm.com>
	<CAAgBjMnhiuXreLaq13zYAwq=c2qRxoZzgSz6=nAT+sDbtPoGSQ@mail.gmail.com>
	<mpth6p93lhs.fsf@arm.com>
	<CAAgBjM=yCS5zSFaeZHe6MRFQA-Qv-9qd754s7Pmcz0_n1diQyg@mail.gmail.com>
	<mptjzu2ubys.fsf@arm.com>
	<CAAgBjMm62BEN_wrF4zK7YEEERmh5sJeehJgLhtcWj8HE=JHktA@mail.gmail.com>
Date: Mon, 14 Aug 2023 13:53:14 +0100
In-Reply-To: <CAAgBjMm62BEN_wrF4zK7YEEERmh5sJeehJgLhtcWj8HE=JHktA@mail.gmail.com>
	(Prathamesh Kulkarni's message of "Sun, 13 Aug 2023 17:19:58 +0530")
Message-ID: <mpto7j9rdj9.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: <gcc-patches.gcc.gnu.org>

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Thu, 10 Aug 2023 at 21:27, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> >> static bool
>> >> is_simple_vla_size (poly_uint64 size)
>> >> {
>> >>   if (size.is_constant ())
>> >>     return false;
>> >>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>> >>     if (size[i] != (i <= 1 ? size[0] : 0))
>> > Just wondering if this should be (i == 1 ? size[0] : 0) since i is
>> > initialized to 1 ?
>>
>> Both work.  I prefer <= 1 because it doesn't depend on the micro
>> optimisation to start at coefficient 1.  In a theoretical 3-indeterminate
>> poly_int, we want the first 2 coefficients to be nonzero and the rest to
>> be zero.
>>
>> > IIUC, is_simple_vla_size should return true for polynomials of first
>> > degree and having same coeff like 4 + 4x ?
>>
>> FWIW, poly_int only supports first-degree polynomials at the moment.
>> coeffs>2 means there is more than one indeterminate, rather than a
>> higher power.
> Oh OK, thanks for the clarification.
>>
>> >>       return false;
>> >>   return true;
>> >> }
>> >>
>> >>
>> >>   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
>> >>     {
>> >>       auto nunits = GET_MODE_NUNITS (mode);
>> >>       if (!is_simple_vla_size (nunits))
>> >>         continue;
>> >>       if (nunits[0] ...)
>> >>         test_... (mode);
>> >>       ...
>> >>
>> >>     }
>> >>
>> >> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
>> >> loop structure above, I think we can apply the test_vnx4si and
>> >> test_vnx16qi to more cases.  So the classification isn't the
>> >> exact number of elements, but instead a limit.
>> >>
>> >> I think the nunits[0] conditions for test_vnx4si are as follows
>> >> (inspection only, so could be wrong):
>> >>
>> >> > +/* Test cases where result and input vectors are VNx4SI  */
>> >> > +
>> >> > +static void
>> >> > +test_vnx4si (machine_mode vmode)
>> >> > +{
>> >> > +  /* Case 1: mask = {0, ...} */
>> >> > +  {
>> >> > +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >> > +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> >> > +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> >> > +
>> >> > +    vec_perm_builder builder (len, 1, 1);
>> >> > +    builder.quick_push (0);
>> >> > +    vec_perm_indices sel (builder, 2, len);
>> >> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> >> > +
>> >> > +    tree expected_res[] = { vector_cst_elt (res, 0) };
>> > This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
>> >> > +    validate_res (1, 1, res, expected_res);
>> >> > +  }
>> >>
>> >> nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1,
>> >> which I think would be better)
>> > IIUC, the vectors that can be used for a particular test should have
>> > nunits[0] >= res_npatterns,
>> > where res_npatterns is as computed in fold_vec_perm_cst without the
>> > canonicalization ?
>> > For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we
>> > require nunits[0] >= 2 ?
>> > Which implies we can use above test for vectors with length 2 + 2x, 4 + 4x, etc.
>>
>> Right, that's what I meant.  With the inputs as they stand it has to be
>> nunits[0] >= 2.  We need that to form the inputs correctly.  But if the
>> inputs instead had nelts_per_pattern == 1, the test would work for all
>> nunits.
> In the attached patch, I have reordered the tests based on min or max limit.
> For tests where sel_npatterns < 3 (ie dup sequence), I have kept input
> npatterns = 1,
> so we can test more vector modes, and also input npatterns matter only
> for stepped sequence in sel
> (Since for a dup pattern we don't enforce the constraint of selecting
> elements from same input pattern).
> Does it look OK ?
>
> For the following tests with input vectors having shape (1, 3)
> sel = {0, 1, 2, ...}  // (1, 3)
> res = { arg0[0], arg0[1], arg0[2], ... } // (1, 3)
>
> and sel = {len, len + 1, len + 2, ... }  // (1, 3)
> res = { arg1[0], arg1[1], arg1[2], ... } // (1, 3)
>
> Although res_npatterns = 1, I suppose these will need to be tested with
> vectors with length >= 4 + 4x,
> since index 2 can be ambiguous for length 2 + 2x  ?
> (In the patch, these are cases 2 and 3 in test_nunits_min_4)

Ah, yeah, fair point.  I guess that means:

+      /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
+	 Test that stepped sequence of the pattern selects from arg0.
+	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
+      {
+	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+	vec_perm_builder builder (len, 1, 3);
+	poly_uint64 mask_elems[] = { len, 0, 1 };
+	builder_push_elems (builder, mask_elems);
+
+	vec_perm_indices sel (builder, 2, len);
+	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
+	validate_res (1, 3, res, expected_res);
+      }

needs to be in test_nunits_min_2 after all, since index 1 is similarly
ambiguous for length 1 + 1x.
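To illustrate the ambiguity, here is a rough sketch of the intuition only.
It is not the check that fold_vec_perm_cst actually performs.  The helpers
selects_from and is_ambiguous are made up for this example, the runtime
multiplier x is only sampled over a small range, and wrap-around of indices
beyond both inputs is ignored.  With len = C + C*x, a constant index picks
arg0 when it is below len and arg1 otherwise, and the fold is only valid if
that choice is the same for every x:

```cpp
#include <cstdint>

// Hypothetical helper (not a GCC function): which input (0 = arg0,
// 1 = arg1) a constant selector index refers to when each input has
// c + c*x elements at runtime.  Indices beyond both inputs are ignored.
constexpr int
selects_from (uint64_t index, uint64_t c, uint64_t x)
{
  return index < c + c * x ? 0 : 1;
}

// Hypothetical helper: true if the chosen input differs between x = 0
// and some sampled x in [1, max_x].  Needs C++14 for the constexpr loop.
constexpr bool
is_ambiguous (uint64_t index, uint64_t c, uint64_t max_x = 16)
{
  for (uint64_t x = 1; x <= max_x; ++x)
    if (selects_from (index, c, x) != selects_from (index, c, 0))
      return true;
  return false;
}

// Index 1 with len = 1 + 1x: arg1[0] when x == 0, arg0[1] otherwise.
static_assert (is_ambiguous (1, 1), "index 1 is ambiguous for 1 + 1x");
// Index 1 with len = 2 + 2x: always arg0[1].
static_assert (!is_ambiguous (1, 2), "index 1 is fine for 2 + 2x");
// Index 2 with len = 2 + 2x: arg1[0] when x == 0, arg0[2] otherwise.
static_assert (is_ambiguous (2, 2), "index 2 is ambiguous for 2 + 2x");
// Index 2 with len = 4 + 4x: always arg0[2].
static_assert (!is_ambiguous (2, 4), "index 2 is fine for 4 + 4x");
```

Under those assumptions, a mask containing the constant index 1 only gives
the same answer for every x once nunits[0] >= 2, and one containing index 2
needs nunits[0] >= 4.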

Also:

> +/* Helper routine to push multiple elements into BUILDER.  */
> +
> +static void
> +builder_push_elems (vec_perm_builder& builder, poly_uint64 *elems)
> +{
> +  for (unsigned i = 0; i < builder.encoded_nelts (); i++)
> +    builder.quick_push (elems[i]);
> +}

I think it'd be safer to make this:

template<unsigned N>
static void
builder_push_elems (vec_perm_builder& builder, poly_uint64 (&elems)[N])
{
  for (unsigned i = 0; i < N; i++)
    builder.quick_push (elems[i]);
}

so that we only push elements that are in the array.
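As a standalone illustration of that point, here is a minimal sketch in
which mock_builder is a made-up stand-in for vec_perm_builder (it just
records what was pushed, with an arbitrary capacity of 16) and
unsigned long stands in for poly_uint64.  Taking the array by reference
lets the compiler deduce N from the argument's type, so the loop bound
cannot disagree with the array size:

```cpp
#include <cstddef>

// Hypothetical stand-in for vec_perm_builder: records pushed elements
// in a fixed-size buffer so that it can be used in constant expressions.
struct mock_builder
{
  unsigned long elts[16] = {};
  std::size_t count = 0;

  constexpr void quick_push (unsigned long v) { elts[count++] = v; }
};

// N is deduced from the array argument, so exactly the elements that
// are in the array get pushed.  Passing a bare pointer does not compile.
template<std::size_t N>
constexpr void
push_elems (mock_builder &builder, const unsigned long (&elems)[N])
{
  for (std::size_t i = 0; i < N; i++)
    builder.quick_push (elems[i]);
}

// Push a three-element mask and return the resulting builder.
// Needs C++14 for the relaxed constexpr rules.
constexpr mock_builder
demo ()
{
  mock_builder b{};
  const unsigned long mask[] = { 4, 0, 1 };
  push_elems (b, mask);
  return b;
}

static_assert (demo ().count == 3, "three elements pushed");
```

With the pointer-based signature, the number of elements read would instead
come from the builder's encoded length, and a too-short array would be read
past its end without any diagnostic.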

OK for trunk with those changes, thanks.

Richard

> +
> +#define ARG0(index) vector_cst_elt (arg0, index)
> +#define ARG1(index) vector_cst_elt (arg1, index)
> +
> +/* Test cases where result is VNx4SI and input vectors are V4SI.  */
> +
> +static void
> +test_vnx4si_v4si (machine_mode vnx4si_mode, machine_mode v4si_mode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1:
> +	 sel = { 0, 4, 1, 5, ... }
> +	 res = { arg0[0], arg1[0], arg0[1], arg1[1], ...} // (4, 1)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +	tree arg1 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +
> +	tree inner_type
> +	  = lang_hooks.types.type_for_mode (GET_MODE_INNER (vnx4si_mode), 1);
> +	tree res_type = build_vector_type_for_mode (inner_type, vnx4si_mode);
> +
> +	poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +	vec_perm_builder builder (res_len, 4, 1);
> +	poly_uint64 mask_elems[] = { 0, 4, 1, 5 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, res_len);
> +	tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +	validate_res (4, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: Same as case 1, but contains an out of bounds access which
> +	 should wrap around.
> +	 sel = {0, 8, 4, 12, ...} (4, 1)
> +	 res = { arg0[0], arg0[0], arg1[0], arg1[0], ... } (4, 1).  */
> +      {
> +	tree arg0 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +	tree arg1 = build_vec_cst_rand (v4si_mode, 4, 1, 0);
> +
> +	tree inner_type
> +	  = lang_hooks.types.type_for_mode (GET_MODE_INNER (vnx4si_mode), 1);
> +	tree res_type = build_vector_type_for_mode (inner_type, vnx4si_mode);
> +
> +	poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +	vec_perm_builder builder (res_len, 4, 1);
> +	poly_uint64 mask_elems[] = { 0, 8, 4, 12 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, res_len);
> +	tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG0(0), ARG1(0), ARG1(0) };
> +	validate_res (4, 1, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test cases where result is V4SI and input vectors are VNx4SI.  */
> +
> +static void
> +test_v4si_vnx4si (machine_mode v4si_mode, machine_mode vnx4si_mode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1:
> +	 sel = { 0, 1, 2, 3}
> +	 res = { arg0[0], arg0[1], arg0[2], arg0[3] }.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +	tree arg1 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +
> +	tree inner_type
> +	  = lang_hooks.types.type_for_mode (GET_MODE_INNER (v4si_mode), 1);
> +	tree res_type = build_vector_type_for_mode (inner_type, v4si_mode);
> +
> +	poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +	vec_perm_builder builder (res_len, 4, 1);
> +	poly_uint64 mask_elems[] = {0, 1, 2, 3};
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, res_len);
> +	tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG0(1), ARG0(2), ARG0(3) };
> +	validate_res_vls (res, expected_res, 4);
> +      }
> +
> +      /* Case 2: Same as Case 1, but crossing input vector.
> +	 sel = {0, 2, 4, 6}
> +	 In this case, the index 4 is ambiguous since len = 4 + 4x.
> +	 Since we cannot determine at compile time which vector to
> +	 choose from, fold_vec_perm_cst should return NULL_TREE.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +	tree arg1 = build_vec_cst_rand (vnx4si_mode, 4, 1);
> +
> +	tree inner_type
> +	  = lang_hooks.types.type_for_mode (GET_MODE_INNER (v4si_mode), 1);
> +	tree res_type = build_vector_type_for_mode (inner_type, v4si_mode);
> +
> +	poly_uint64 res_len = TYPE_VECTOR_SUBPARTS (res_type);
> +	vec_perm_builder builder (res_len, 4, 1);
> +	poly_uint64 mask_elems[] = {0, 2, 4, 6};
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, res_len);
> +	const char *reason;
> +	tree res = fold_vec_perm_cst (res_type, arg0, arg1, sel, &reason);
> +
> +	ASSERT_TRUE (res == NULL_TREE);
> +	ASSERT_TRUE (!strcmp (reason, "cannot divide selector element by arg len"));
> +      }
> +    }
> +}
> +
> +/* Test all input vectors.  */
> +
> +static void
> +test_all_nunits (machine_mode vmode)
> +{
> +  /* Test with 10 different inputs.  */
> +  for (int i = 0; i < 10; i++)
> +    {
> +      tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +      tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +      poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +      /* Case 1: mask = {0, ...} // (1, 1)
> +	 res = { arg0[0], ... } // (1, 1)  */
> +      {
> +	vec_perm_builder builder (len, 1, 1);
> +	builder.quick_push (0);
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +	tree expected_res[] = { ARG0(0) };
> +	validate_res (1, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: mask = {len, ...} // (1, 1)
> +	 res = { arg1[0], ... } // (1, 1)  */
> +      {
> +	vec_perm_builder builder (len, 1, 1);
> +	builder.quick_push (len);
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG1(0) };
> +	validate_res (1, 1, res, expected_res);
> +      }
> +
> +      /* Case 3: mask = {len, 0, 1, ...} // (1, 3)
> +	 Test that stepped sequence of the pattern selects from arg0.
> +	 res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { len, 0, 1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG1(0), ARG0(0), ARG0(1) };
> +	validate_res (1, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at least 2 elements.  */
> +
> +static void
> +test_nunits_min_2 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: mask = { 0, len, ... }  // (2, 1)
> +	 res = { arg0[0], arg1[0], ... } // (2, 1)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 2, 1);
> +	poly_uint64 mask_elems[] = { 0, len };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0) };
> +	validate_res (2, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: mask = { 0, len, 1, len+1, ... } // (2, 2)
> +	 res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 2, 2);
> +	poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +	validate_res (2, 2, res, expected_res);
> +      }
> +
> +      /* Case 3: mask = {0, 0, 1, ...} // (1, 3)
> +	 Test that the stepped sequence of the pattern selects from
> +	 same input pattern. Since input vectors have npatterns = 2,
> +	 and step (a2 - a1) = 1, step is not a multiple of npatterns
> +	 in input vector. So return NULL_TREE.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = { 0, 0, 1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	const char *reason;
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
> +				      &reason);
> +	ASSERT_TRUE (res == NULL_TREE);
> +	ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at least 4 elements.  */
> +
> +static void
> +test_nunits_min_4 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: mask = { 0, len, 1, len+1, ... } // (4, 1)
> +	 res: { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (4, 1)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 4, 1);
> +	poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +	validate_res (4, 1, res, expected_res);
> +      }
> +
> +      /* Case 2: sel = {0, 1, 2, ...}  // (1, 3)
> +	 res: { arg0[0], arg0[1], arg0[2], ... } // (1, 3) */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +	poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (arg0_len, 1, 3);
> +	poly_uint64 mask_elems[] = {0, 1, 2};
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, arg0_len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +	tree expected_res[] = { ARG0(0), ARG0(1), ARG0(2) };
> +	validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 3: sel = {len, len+1, len+2, ...} // (1, 3)
> +	 res: { arg1[0], arg1[1], arg1[2], ... } // (1, 3) */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 1, 3);
> +	poly_uint64 mask_elems[] = {len, len + 1, len + 2};
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +	tree expected_res[] = { ARG1(0), ARG1(1), ARG1(2) };
> +	validate_res (1, 3, res, expected_res);
> +      }
> +
> +      /* Case 4:
> +	sel = { len, 0, 2, ... } // (1, 3)
> +	This should return NULL because we cross the input vectors.
> +	Because,
> +	Let's assume len = C + Cx
> +	a1 = 0
> +	S = 2
> +	esel = arg0_len / sel_npatterns = C + Cx
> +	ae = 0 + (esel - 2) * S
> +	   = 0 + (C + Cx - 2) * 2
> +	   = 2(C-2) + 2Cx
> +
> +	For C >= 4:
> +	Let q1 = a1 / arg0_len = 0 / (C + Cx) = 0
> +	Let qe = ae / arg0_len = (2(C-2) + 2Cx) / (C + Cx) = 1
> +	Since q1 != qe, we cross input vectors.
> +	So return NULL_TREE.  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 2);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 2);
> +	poly_uint64 arg0_len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (arg0_len, 1, 3);
> +	poly_uint64 mask_elems[] = { arg0_len, 0, 2 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, arg0_len);
> +	const char *reason;
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +	ASSERT_TRUE (res == NULL_TREE);
> +	ASSERT_TRUE (!strcmp (reason, "crossed input vectors"));
> +      }
> +
> +      /* Case 5: npatterns(arg0) = 4 > npatterns(sel) = 2
> +	 mask = { 0, len, 1, len + 1, ...} // (2, 2)
> +	 res = { arg0[0], arg1[0], arg0[1], arg1[1], ... } // (2, 2)
> +
> +	 Note that fold_vec_perm_cst will set
> +	 res_npatterns = max(4, max(4, 2)) = 4
> +	 However after canonicalizing, we will end up with shape (2, 2).  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 4, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 4, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 2, 2);
> +	poly_uint64 mask_elems[] = { 0, len, 1, len + 1 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +	tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
> +	validate_res (2, 2, res, expected_res);
> +      }
> +
> +      /* Case 6: Test combination in sel, where one pattern is dup and other
> +	 is stepped sequence.
> +	 sel = { 0, 0, 0, 1, 0, 2, ... } // (2, 3)
> +	 res = { arg0[0], arg0[0], arg0[0],
> +		 arg0[1], arg0[0], arg0[2], ... } // (2, 3)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder (len, 2, 3);
> +	poly_uint64 mask_elems[] = { 0, 0, 0, 1, 0, 2 };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG0(0), ARG0(0),
> +				ARG0(1), ARG0(0), ARG0(2) };
> +	validate_res (2, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test all vectors which contain at least 8 elements.  */
> +
> +static void
> +test_nunits_min_8 (machine_mode vmode)
> +{
> +  for (int i = 0; i < 10; i++)
> +    {
> +      /* Case 1: sel_npatterns (4) > input npatterns (2)
> +	 sel: { 0, 0, 1, len, 2, 0, 3, len, 4, 0, 5, len, ...} // (4, 3)
> +	 res: { arg0[0], arg0[0], arg0[1], arg1[0],
> +		arg0[2], arg0[0], arg0[3], arg1[0],
> +		arg0[4], arg0[0], arg0[5], arg1[0], ... } // (4, 3)  */
> +      {
> +	tree arg0 = build_vec_cst_rand (vmode, 2, 3, 2);
> +	tree arg1 = build_vec_cst_rand (vmode, 2, 3, 2);
> +	poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +	vec_perm_builder builder(len, 4, 3);
> +	poly_uint64 mask_elems[] = { 0, 0, 1, len, 2, 0, 3, len,
> +				     4, 0, 5, len };
> +	builder_push_elems (builder, mask_elems);
> +
> +	vec_perm_indices sel (builder, 2, len);
> +	tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
> +
> +	tree expected_res[] = { ARG0(0), ARG0(0), ARG0(1), ARG1(0),
> +				ARG0(2), ARG0(0), ARG0(3), ARG1(0),
> +				ARG0(4), ARG0(0), ARG0(5), ARG1(0) };
> +	validate_res (4, 3, res, expected_res);
> +      }
> +    }
> +}
> +
> +/* Test vectors for which nunits[0] <= 4.  */
> +
> +static void
> +test_nunits_max_4 (machine_mode vmode)
> +{
> +  /* Case 1: mask = {0, 4, ...} // (1, 2)
> +     This should return NULL_TREE because the index 4 may choose
> +     from either arg0 or arg1 depending on vector length.  */
> +  {
> +    tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
> +    tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
> +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
> +
> +    vec_perm_builder builder (len, 1, 2);
> +    poly_uint64 mask_elems[] = {0, 4};
> +    builder_push_elems (builder, mask_elems);
> +
> +    vec_perm_indices sel (builder, 2, len);
> +    const char *reason;
> +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel, &reason);
> +    ASSERT_TRUE (res == NULL_TREE);
> +    ASSERT_TRUE (reason != NULL);
> +    ASSERT_TRUE (!strcmp (reason, "cannot divide selector element by arg len"));
> +  }
> +}
> +
> +#undef ARG0
> +#undef ARG1
> +
> +/* Return true if SIZE is of the form C + Cx and C is power of 2.  */
> +
> +static bool
> +is_simple_vla_size (poly_uint64 size)
> +{
> +  if (size.is_constant ()
> +      || !pow2p_hwi (size.coeffs[0]))
> +    return false;
> +  for (unsigned i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
> +    if (size.coeffs[i] != (i <= 1 ? size.coeffs[0] : 0))
> +      return false;
> +  return true;
> +}
> +
> +/* Execute fold_vec_perm_cst unit tests.  */
> +
> +static void
> +test ()
> +{
> +  machine_mode vnx4si_mode = E_VOIDmode;
> +  machine_mode v4si_mode = E_VOIDmode;
> +
> +  machine_mode vmode;
> +  FOR_EACH_MODE_IN_CLASS (vmode, MODE_VECTOR_INT)
> +    {
> +      /* Obtain modes corresponding to VNx4SI and V4SI,
> +	 to call mixed mode tests below.
> +	 FIXME: Is there a better way to do this ?  */
> +      if (GET_MODE_INNER (vmode) == SImode)
> +	{
> +	  poly_uint64 nunits = GET_MODE_NUNITS (vmode);
> +	  if (is_simple_vla_size (nunits)
> +	      && nunits.coeffs[0] == 4)
> +	    vnx4si_mode = vmode;
> +	  else if (known_eq (nunits, poly_uint64 (4)))
> +	    v4si_mode = vmode;
> +	}
> +
> +      if (!is_simple_vla_size (GET_MODE_NUNITS (vmode))
> +	  || !targetm.vector_mode_supported_p (vmode))
> +	continue;
> +
> +      poly_uint64 nunits = GET_MODE_NUNITS (vmode);
> +      test_all_nunits (vmode);
> +      if (nunits.coeffs[0] >= 2)
> +	test_nunits_min_2 (vmode);
> +      if (nunits.coeffs[0] >= 4)
> +	test_nunits_min_4 (vmode);
> +      if (nunits.coeffs[0] >= 8)
> +	test_nunits_min_8 (vmode);
> +
> +      if (nunits.coeffs[0] <= 4)
> +	test_nunits_max_4 (vmode);
> +    }
> +
> +  if (vnx4si_mode != E_VOIDmode && v4si_mode != E_VOIDmode
> +      && targetm.vector_mode_supported_p (vnx4si_mode)
> +      && targetm.vector_mode_supported_p (v4si_mode))
> +    {
> +      test_vnx4si_v4si (vnx4si_mode, v4si_mode);
> +      test_v4si_vnx4si (v4si_mode, vnx4si_mode);
> +    }
> +}
> +} // end of test_fold_vec_perm_cst namespace
> +
>  /* Verify that various binary operations on vectors are folded
>     correctly.  */
>  
> @@ -16943,6 +17693,7 @@ fold_const_cc_tests ()
>    test_arithmetic_folding ();
>    test_vector_folding ();
>    test_vec_duplicate_folding ();
> +  test_fold_vec_perm_cst::test ();
>  }
>  
>  } // namespace selftest