Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
Date: Thu, 10 Aug 2023 16:57:31 +0100	[thread overview]
Message-ID: <mptjzu2ubys.fsf@arm.com> (raw)
In-Reply-To: <CAAgBjM=yCS5zSFaeZHe6MRFQA-Qv-9qd754s7Pmcz0_n1diQyg@mail.gmail.com> (Prathamesh Kulkarni's message of "Thu, 10 Aug 2023 20:03:19 +0530")

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> static bool
>> is_simple_vla_size (poly_uint64 size)
>> {
>>   if (size.is_constant ())
>>     return false;
>>   for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>>     if (size[i] != (i <= 1 ? size[0] : 0))
> Just wondering is this should be (i == 1 ? size[0] : 0) since i is
> initialized to 1 ?

Both work.  I prefer <= 1 because it doesn't depend on the micro
optimisation to start at coefficient 1.  In a theoretical 3-indeterminate
poly_int, we want the first 2 coefficients to be nonzero and the rest to
be zero.

> IIUC, is_simple_vla_size should return true for polynomials of first
> degree and having same coeff like 4 + 4x ?

FWIW, poly_int only supports first-degree polynomials at the moment.
coeffs>2 means there is more than one indeterminate, rather than a
higher power.

>>       return false;
>>   return true;
>> }
>>
>>
>>   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
>>     {
>>       auto nunits = GET_MODE_NUNITS (mode);
>>       if (!is_simple_vla_size (nunits))
>>         continue;
>>       if (nunits[0] ...)
>>         test_... (mode);
>>       ...
>>
>>     }
>>
>> test_vnx4si_v4si and test_v4si_vnx4si look good.  But with the
>> loop structure above, I think we can apply the test_vnx4si and
>> test_vnx16qi to more cases.  So the classification isn't the
>> exact number of elements, but instead a limit.
>>
>> I think the nunits[0] conditions for test_vnx4si are as follows
>> (inspection only, so could be wrong):
>>
>> > +/* Test cases where result and input vectors are VNx4SI  */
>> > +
>> > +static void
>> > +test_vnx4si (machine_mode vmode)
>> > +{
>> > +  /* Case 1: mask = {0, ...} */
>> > +  {
>> > +    tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > +    tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
>> > +    poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
>> > +
>> > +    vec_perm_builder builder (len, 1, 1);
>> > +    builder.quick_push (0);
>> > +    vec_perm_indices sel (builder, 2, len);
>> > +    tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
>> > +
>> > +    tree expected_res[] = { vector_cst_elt (res, 0) };
> This should be { vector_cst_elt (arg0, 0) }; will fix in next patch.
>> > +    validate_res (1, 1, res, expected_res);
>> > +  }
>>
>> nunits[0] >= 2 (could be all nunits if the inputs had nelts_per_pattern==1,
>> which I think would be better)
> IIUC, the vectors that can be used for a particular test should have
> nunits[0] >= res_npatterns,
> where res_npatterns is as computed in fold_vec_perm_cst without the
> canonicalization ?
> For above test -- res_npatterns = max(2, max (2, 1)) == 2, so we
> require nunits[0] >= 2 ?
> Which implies we can use above test for vectors with length 2 + 2x, 4 + 4x, etc.

Right, that's what I meant.  With the inputs as they stand it has to be
nunits[0] >= 2.  We need that form the inputs correctly.  But if the
inputs instead had nelts_per_pattern == 1, the test would work for all
nunits.

> Sorry if this sounds like a silly question -- Won't nunits[0] >= 2
> cover all nunits,
> since a vector, at a minimum, will contain 2 elements ?

Not necessarily.  VNx1TI makes conceptual sense.  We just don't use it
currently (although that'll change with SME).  And we do have single-element
VLS vectors like V1DI and V1DF.

Thanks,
Richard

next prev parent reply	other threads:[~2023-08-10 15:57 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-17 12:14 Prathamesh Kulkarni
2023-07-25  9:26 ` Prathamesh Kulkarni
2023-07-25 12:55 ` Richard Sandiford
2023-07-28 12:57   ` Prathamesh Kulkarni
2023-08-03 13:01     ` Richard Sandiford
2023-08-03 13:16       ` Richard Sandiford
2023-08-04 10:06         ` Prathamesh Kulkarni
2023-08-04 15:06           ` Richard Sandiford
2023-08-06 12:25             ` Prathamesh Kulkarni
2023-08-08  9:57               ` Richard Sandiford
2023-08-10 14:33                 ` Prathamesh Kulkarni
2023-08-10 15:57                   ` Richard Sandiford [this message]
2023-08-13 11:49                     ` Prathamesh Kulkarni
2023-08-14 12:53                       ` Richard Sandiford
2023-08-15 11:29                         ` Prathamesh Kulkarni
2023-08-16  8:53                           ` Prathamesh Kulkarni
2023-08-16  9:51                             ` Richard Sandiford
2023-08-16 11:28                               ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptjzu2ubys.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=prathamesh.kulkarni@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).