Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>,
	 Richard Biener <richard.guenther@gmail.com>
Subject: Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner
Date: Mon, 31 Oct 2022 09:57:25 +0000	[thread overview]
Message-ID: <mpth6zk2u4a.fsf@arm.com> (raw)
In-Reply-To: <CAAgBjMku5-i3m8eLc2ky1Hb-NBjwGFD5F4278R3GTSFQq=H5KQ@mail.gmail.com> (Prathamesh Kulkarni's message of "Fri, 28 Oct 2022 20:16:07 +0530")

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Wed, 26 Oct 2022 at 21:07, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Sorry for the slow response.  I wanted to find some time to think
>> about this a bit more.
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford
>> > <richard.sandiford@arm.com> wrote:
>> >>
>> >> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> >> >> Sorry to ask a silly question but in which case shall we select 2nd vector ?
>> >> >> For num_poly_int_coeffs == 2,
>> >> >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x)
>> >> >> If a1/trunc n1 succeeds,
>> >> >> 0 / n1.coeffs[1] == a1/n1.coeffs[0] == 0.
>> >> >> So, a1 has to be < n1.coeffs[0] ?
>> >> >
>> >> > Remember that a1 is itself a poly_int.  It's not necessarily a constant.
>> >> >
>> >> > E.g. the TRN1 .D instruction maps to a VEC_PERM_EXPR with the selector:
>> >> >
>> >> >   { 0, 2 + 2x, 1, 4 + 2x, 2, 6 + 2x, ... }
>> >>
>> >> Sorry, should have been:
>> >>
>> >>   { 0, 2 + 2x, 2, 4 + 2x, 4, 6 + 2x, ... }
>> > Hi Richard,
>> > Thanks for the clarifications, and sorry for late reply.
>> > I have attached POC patch that tries to implement the above approach.
>> > Passes bootstrap+test on x86_64-linux-gnu and aarch64-linux-gnu for VLS vectors.
>> >
>> > For VLA vectors, I have only done limited testing so far.
>> > It seems to pass couple of tests written in the patch for
>> > nelts_per_pattern == 3,
>> > and folds the following svld1rq test:
>> > int32x4_t v = {1, 2, 3, 4};
>> > return svld1rq_s32 (svptrue_b8 (), &v[0])
>> > into:
>> > return {1, 2, 3, 4, ...};
>> > I will try to bootstrap+test it on SVE machine to test further for VLA folding.
>> >
>> > I have a couple of questions:
>> > 1] When mask selects elements from same vector but from different patterns:
>> > For eg:
>> > arg0 = {1, 11, 2, 12, 3, 13, ...},
>> > arg1 = {21, 31, 22, 32, 23, 33, ...},
>> > mask = {0, 0, 0, 1, 0, 2, ... },
>> > All have npatterns = 2, nelts_per_pattern = 3.
>> >
>> > With above mask,
>> > Pattern {0, ...} selects arg0[0], ie {1, ...}
>> > Pattern {0, 1, 2, ...} selects arg0[0], arg0[1], arg0[2], ie {1, 11, 2, ...}
>> > While arg0[0] and arg0[2] belong to same pattern, arg0[1] belongs to different
>> > pattern in arg0.
>> > The result is:
>> > res = {1, 1, 1, 11, 1, 2, ...}
>> > In this case, res's 2nd pattern {1, 11, 2, ...} is encoded with:
>> > with a0 = 1, a1 = 11, S = -9.
>> > Is that expected tho ? It seems to create a new encoding which
>> > wasn't present in the input vector. For instance, the next elem in
>> > sequence would be -7,
>> > which is not present originally in arg0.
>>
>> Yeah, you're right, sorry.  Going back to:
>>
>> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px
>>     elements for any integer N.  This extended sequence can be reencoded
>>     as having N*Px patterns, with Ex staying the same.
>>
>> I guess we need to pick an N for the selector such that each new
>> selector pattern (each one out of the N*Px patterns) selects from
>> the *same pattern* of the same data input.
>>
>> So if a particular pattern in the selector has a step S, and the data
>> input it selects from has Pi patterns, N*S must be a multiple of Pi.
>> N must be a multiple of least_common_multiple(S,Pi)/S.
>>
>> I think that means that the total number of patterns in the result
>> (Pr from previous messages) can safely be:
>>
>>   Ps * least_common_multiple(
>>     least_common_multiple(S[1], P[input(1)]) / S[1],
>>     ...
>>     least_common_multiple(S[Ps], P[input(Ps)]) / S[Ps]
>>   )
>>
>> where:
>>
>>   Ps = the number of patterns in the selector
>>   S[I] = the step for selector pattern I (I being 1-based)
>>   input(I) = the data input selected by selector pattern I (I being 1-based)
>>   P[I] = the number of patterns in data input I
>>
>> That's getting quite complicated :-)  If we allow arbitrary P[...]
>> and S[...] then it could also get large.  Perhaps we should finally
>> give up on the general case and limit this to power-of-2 patterns and
>> power-of-2 steps, so that least_common_multiple becomes MAX.  Maybe that
>> simplifies other things as well.
>>
>> What do you think?
> Hi Richard,
> Thanks for the suggestions. Yeah I suppose we can initially add support for
> power-of-2 patterns and power-of-2 steps and try to generalize it in
> follow up patches if possible.
>
> Sorry if this sounds like a silly ques -- if we are going to have
> pattern in selector, select *same pattern from same input vector*,
> instead of re-encoding the selector to have N * Ps patterns, would it
> make sense for elements in selector to denote pattern number itself
> instead of element index
> if input vectors are VLA ?
>
> For eg:
> op0 = {1, 2, 3, 4, 1, 2, 3, 5, 1, 2, 3, 6, ...}
> op1 = {...}
> with npatterns == 4, nelts_per_pattern == 3,
> sel = {0, 3} should pick pattern 0 and pattern 3 from op0,
> so, res = {1, 4, 1, 5, 1, 6, ...}
> Not sure if this is correct tho.

This wouldn't allow us to represent things like a "duplicate one
element", or "copy the leading N elements from the first input and
the other elements from elements N+ of the second input", which we
can with the current scheme.

The restriction about each (unwound) selector pattern selecting from the
same input pattern only applies to case where the selector pattern is
stepped (and only applies to the stepped part of the pattern, not the
leading element).  The restriction is also local to this code; it
doesn't make other VEC_PERM_EXPRs invalid.

Thanks,
Richard

>
> Thanks,
> Prathamesh
>>
>> > I suppose it's fine since if the user defines mask to have pattern {0,
>> > 1, 2, ...}
>> > they intended result to have pattern with above encoding.
>> > Just wanted to confirm if this is correct ?
>> >
>> > 2] Could you please suggest a test-case for S < 0 ?
>> > I am not able to come up with one :/
>>
>> svrev is one way of creating negative steps.
>>
>> Thanks,
>> Richard
>>
>> >
>> > Thanks,
>> > Prathamesh
>> >>
>> >> > which is an interleaving of the two patterns:
>> >> >
>> >> >   { 0, 2, 4, ... }                  a0 = 0, a1 = 2, S = 2
>> >> >   { 2 + 2x, 4 + 2x, 6 + 2x }        a0 = 2 + 2x, a1 = 4 + 2x, S = 2

next prev parent reply	other threads:[~2022-10-31  9:57 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17 12:39 Prathamesh Kulkarni
2022-08-29  6:08 ` Prathamesh Kulkarni
2022-09-05  8:53   ` Prathamesh Kulkarni
2022-09-05 10:21 ` Richard Sandiford
2022-09-09 13:59   ` Prathamesh Kulkarni
2022-09-12 14:27     ` Richard Sandiford
2022-09-15 12:26       ` Prathamesh Kulkarni
2022-09-20 12:39         ` Richard Sandiford
2022-09-23 11:59           ` Prathamesh Kulkarni
2022-09-23 16:03             ` Richard Sandiford
2022-09-26 19:33               ` Prathamesh Kulkarni
2022-09-26 20:29                 ` Richard Sandiford
2022-09-30 14:41                   ` Prathamesh Kulkarni
2022-09-30 16:00                     ` Richard Sandiford
2022-09-30 16:08                       ` Richard Sandiford
2022-10-10 10:48                         ` Prathamesh Kulkarni
2022-10-17 10:32                           ` Prathamesh Kulkarni
2022-10-24  8:12                             ` Prathamesh Kulkarni
2022-10-26 15:37                           ` Richard Sandiford
2022-10-28 14:46                             ` Prathamesh Kulkarni
2022-10-31  9:57                               ` Richard Sandiford [this message]
2022-11-04  8:30                                 ` Prathamesh Kulkarni
2022-11-21  9:07                                   ` Prathamesh Kulkarni
2022-11-28 11:44                                     ` Prathamesh Kulkarni
2022-12-06 15:30                                     ` Richard Sandiford
2022-12-13  6:05                                       ` Prathamesh Kulkarni
2022-12-26  4:26                                         ` Prathamesh Kulkarni
2023-01-17 11:54                                           ` Prathamesh Kulkarni
2023-02-01 10:01                                             ` Prathamesh Kulkarni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpth6zk2u4a.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=prathamesh.kulkarni@linaro.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).