From: Richard Sandiford <richard.sandiford@arm.com>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] [v2] Extend fold_vec_perm to handle VLA vectors
Date: Thu, 03 Aug 2023 14:01:36 +0100 [thread overview]
Message-ID: <mpt4jlg2sb3.fsf@arm.com> (raw)
In-Reply-To: <CAAgBjMmOrVxsNT8jK0Ei-RYrhtNVymd7pmaud2tA6=OAJCbpOg@mail.gmail.com> (Prathamesh Kulkarni's message of "Fri, 28 Jul 2023 18:27:19 +0530")
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Hi,
>>
>> Thanks for the rework and sorry for the slow review.
> Hi Richard,
> Thanks for the suggestions! Please find my responses inline below.
>>
>> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
>> > Hi Richard,
>> > This is reworking of patch to extend fold_vec_perm to handle VLA vectors.
>> > The attached patch unifies handling of VLS and VLA vector_csts, while
>> > using fallback code
>> > for ctors.
>> >
>> > For VLS vector, the patch ignores underlying encoding, and
>> > uses npatterns = nelts, and nelts_per_pattern = 1.
>> >
>> > For VLA patterns, if sel has a stepped sequence, then it
>> > only chooses elements from a particular pattern of a particular
>> > input vector.
>> >
>> > To make things simpler, the patch imposes following constraints:
>> > (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2.
>> > (b) The step size for a stepped sequence is a power of 2, and
>> > multiple of npatterns of chosen input vector.
>> > (c) Runtime vector length of sel is a multiple of sel_npatterns.
>> > So, we don't handle sel.length = 2 + 2x and npatterns = 4.
>> >
>> > Eg:
>> > op0, op1: npatterns = 2, nelts_per_pattern = 3
>> > op0_len = op1_len = 16 + 16x.
>> > sel = { 0, 0, 2, 0, 4, 0, ... }
>> > npatterns = 2, nelts_per_pattern = 3.
>> >
>> > For pattern {0, 2, 4, ...}
>> > Let,
>> > a1 = 2
>> > S = step size = 2
>> >
>> > Let Esel denote number of elements per pattern in sel at runtime.
>> > Esel = (16 + 16x) / npatterns_sel
>> > = (16 + 16x) / 2
>> > = (8 + 8x)
>> >
>> > So, last element of pattern:
>> > ae = a1 + (Esel - 2) * S
>> > = 2 + (8 + 8x - 2) * 2
>> > = 14 + 16x
>> >
>> > a1 /trunc arg0_len = 2 / (16 + 16x) = 0
>> > ae /trunc arg0_len = (14 + 16x) / (16 + 16x) = 0
>> > Since both are equal with quotient = 0, we select elements from op0.
>> >
>> > Since step size (S) is a multiple of npatterns(op0), we select
>> > all elements from same pattern of op0.
>> >
>> > res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns))
>> > = max (2, max (2, 2)
>> > = 2
>> >
>> > res_nelts_per_pattern = max (op0_nelts_per_pattern,
>> > max (op1_nelts_per_pattern,
>> > sel_nelts_per_pattern))
>> > = max (3, max (3, 3))
>> > = 3
>> >
>> > So res has encoding with npatterns = 2, nelts_per_pattern = 3.
>> > res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... }
>> >
>> > Unfortunately, this results in an issue for poly_int_cst index:
>> > For example,
>> > op0, op1: npatterns = 1, nelts_per_pattern = 3
>> > op0_len = op1_len = 4 + 4x
>> >
>> > sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>> >
>> > In this case,
>> > a1 = 5 + 4x
>> > S = (6 + 4x) - (5 + 4x) = 1
>> > Esel = 4 + 4x
>> >
>> > ae = a1 + (esel - 2) * S
>> > = (5 + 4x) + (4 + 4x - 2) * 1
>> > = 7 + 8x
>> >
>> > IIUC, 7 + 8x will always be index for last element of op1 ?
>> > if x = 0, len = 4, 7 + 8x = 7
>> > if x = 1, len = 8, 7 + 8x = 15, etc.
>> > So the stepped sequence will always choose elements
>> > from op1 regardless of vector length for above case ?
>> >
>> > However,
>> > ae /trunc op0_len
>> > = (7 + 8x) / (4 + 4x)
>> > which is not defined because 7/4 != 8/4
>> > and we return NULL_TREE, but I suppose the expected result would be:
>> > res: { op1[0], op1[1], op1[2], ... } ?
>> >
>> > The patch passes bootstrap+test on aarch64-linux-gnu with and without sve,
>> > and on x86_64-unknown-linux-gnu.
>> > I would be grateful for suggestions on how to proceed.
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > index a02ede79fed..8028b3e8e9a 100644
>> > --- a/gcc/fold-const.cc
>> > +++ b/gcc/fold-const.cc
>> > @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3. If not see
>> > #include "vec-perm-indices.h"
>> > #include "asan.h"
>> > #include "gimple-range.h"
>> > +#include <algorithm>
>> > +#include "tree-pretty-print.h"
>> > +#include "gimple-pretty-print.h"
>> > +#include "print-tree.h"
>> >
>> > /* Nonzero if we are folding constants inside an initializer or a C++
>> > manifestly-constant-evaluated context; zero otherwise.
>> > @@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, tree expr)
>> > static bool
>> > vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
>> > {
>> > - unsigned HOST_WIDE_INT i, nunits;
>> > + unsigned HOST_WIDE_INT i;
>> >
>> > - if (TREE_CODE (arg) == VECTOR_CST
>> > - && VECTOR_CST_NELTS (arg).is_constant (&nunits))
>> > - {
>> > - for (i = 0; i < nunits; ++i)
>> > - elts[i] = VECTOR_CST_ELT (arg, i);
>> > - }
>> > - else if (TREE_CODE (arg) == CONSTRUCTOR)
>> > + if (TREE_CODE (arg) == CONSTRUCTOR)
>> > {
>> > constructor_elt *elt;
>> >
>> > @@ -10519,6 +10517,230 @@ vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts)
>> > return true;
>> > }
>> >
>> > +/* Return a vector with (NPATTERNS, NELTS_PER_PATTERN) encoding. */
>> > +
>> > +static tree
>> > +vector_cst_reshape (tree vec, unsigned npatterns, unsigned nelts_per_pattern)
>> > +{
>> > + gcc_assert (pow2p_hwi (npatterns));
>> > +
>> > + if (VECTOR_CST_NPATTERNS (vec) == npatterns
>> > + && VECTOR_CST_NELTS_PER_PATTERN (vec) == nelts_per_pattern)
>> > + return vec;
>> > +
>> > + tree v = make_vector (exact_log2 (npatterns), nelts_per_pattern);
>> > + TREE_TYPE (v) = TREE_TYPE (vec);
>> > +
>> > + unsigned nelts = npatterns * nelts_per_pattern;
>> > + for (unsigned i = 0; i < nelts; i++)
>> > + VECTOR_CST_ENCODED_ELT(v, i) = vector_cst_elt (vec, i);
>> > + return v;
>> > +}
>> > +
>> > +/* Helper routine for fold_vec_perm_vla to check if ARG is a suitable
>> > + operand for VLA vec_perm folding. If arg is VLS, then set
>> > + NPATTERNS = nelts and NELTS_PER_PATTERN = 1. */
>> > +
>> > +static tree
>> > +valid_operand_for_fold_vec_perm_cst_p (tree arg)
>> > +{
>> > + if (TREE_CODE (arg) != VECTOR_CST)
>> > + return NULL_TREE;
>> > +
>> > + unsigned HOST_WIDE_INT nelts;
>> > + unsigned npatterns, nelts_per_pattern;
>> > + if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)).is_constant (&nelts))
>> > + {
>> > + npatterns = nelts;
>> > + nelts_per_pattern = 1;
>> > + }
>> > + else
>> > + {
>> > + npatterns = VECTOR_CST_NPATTERNS (arg);
>> > + nelts_per_pattern = VECTOR_CST_NELTS_PER_PATTERN (arg);
>> > + }
>> > +
>> > + if (!pow2p_hwi (npatterns))
>> > + return NULL_TREE;
>> > +
>> > + return vector_cst_reshape (arg, npatterns, nelts_per_pattern);
>> > +}
>>
>> I don't think we should reshape the vectors for VLS, since it would
>> create more nodes for GC to clean up later. Also, the "compact" encoding
>> is canonical even for VLS, so the reshaping would effectively create
>> noncanonical constants (even if only temporarily).
> Removed in the attached patch.
>>
>> Instead, I think we should change the later:
>>
>> > + if (!valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, sel_npatterns,
>> > + sel_nelts_per_pattern, reason, verbose))
>> > + return NULL_TREE;
>>
>> so that it comes after the computation of res_npatterns and
>> res_nelts_per_pattern. Then, if valid_mask_for_fold_vec_perm_cst_p
>> returns false, and if the result type has a constant number of elements,
>> we should:
>>
>> * set res_npatterns to that number of elements
>> * set res_nelts_per_pattern to 1
>> * continue instead of returning null
> Assuming we don't enforce only VLA or only VLS for input vectors and sel,
> won't that be still an issue if res (and sel) is VLS, and input
> vectors are VLA ?
> For eg:
> arg0, arg1 are type VNx4SI with npatterns = 2, nelts_per_pattern = 3, step = 2
> sel is V4SI constant with encoding { 0, 2, 4, ... }
> and res_type is V4SI.
> In this case, when it comes to index 4, the vector selection becomes ambiguous,
> since it can be arg1 for len = 4 + 4x, and arg0 for lengths > 4 + 4x ?
Ah, right. So the condition is whether the result type and the data
input type have a constant number of elements, rather than just the result.
> In the attached patch if sel is not a suitable mask, and input vectors
> have constant length, then it sets:
> res_npatterns = nelts of input vector
> res_nelts_per_pattern = 1
> Does that look OK ?
> This part of the code has few tests in test_mixed().
res_npatterns should be nelts of the result vector, not the input.
>> The loop that follows will then do the correct thing for each element.
>>
>> The check for a power of 2 would then go in
>> valid_mask_for_fold_vec_perm_cst_p rather than
>> valid_operand_for_fold_vec_perm_cst_p.
>>
>> With that change, I think:
>>
>> > +
>> > +/* Helper routine for fold_vec_perm_cst to check if SEL is a suitable
>> > + mask for VLA vec_perm folding. Set SEL_NPATTERNS and SEL_NELTS_PER_PATTERN
>> > + similarly. */
>> > +
>> > +static bool
>> > +valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > + const vec_perm_indices &sel,
>> > + unsigned& sel_npatterns,
>> > + unsigned& sel_nelts_per_pattern,
>> > + char *reason = NULL,
>> > + bool verbose = false)
>> > +{
>> > + unsigned HOST_WIDE_INT nelts;
>> > + if (sel.length ().is_constant (&nelts))
>> > + {
>> > + sel_npatterns = nelts;
>> > + sel_nelts_per_pattern = 1;
>> > + }
>> > + else
>> > + {
>> > + sel_npatterns = sel.encoding ().npatterns ();
>> > + sel_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
>> > + }
>>
>> ...we should use the "else" code unconditionally.
> Done.
>>
>> The function comment should describe "reason" and "verbose". It looks
>> like these are debug parameters, is that right?
> Yes, verbose is just for printf debugging, and reason is used in unit-tests when
> result is NULL_TREE, to verify that it's NULL for the intended reason,
> and not due to something else.
Could we make it a "const char **" instead, and just store the string
pointers there? It looks like all the reasons are fixed strings rather
than dynamic.
>>
>> > +
>> > + if (!pow2p_hwi (sel_npatterns))
>> > + {
>> > + if (reason)
>> > + strcpy (reason, "sel_npatterns is not power of 2");
>> > + return false;
>> > + }
>> > +
>> > + /* We want to avoid cases where sel.length is not a multiple of npatterns.
>> > + For eg: sel.length = 2 + 2x, and sel npatterns = 4. */
>> > + poly_uint64 esel;
>> > + if (!multiple_p (sel.length (), sel_npatterns, &esel))
>> > + {
>> > + if (reason)
>> > + strcpy (reason, "sel.length is not multiple of sel_npatterns");
>> > + return false;
>> > + }
>> > +
>> > + if (sel_nelts_per_pattern < 3)
>> > + return true;
>> > +
>> > + for (unsigned pattern = 0; pattern < sel_npatterns; pattern++)
>> > + {
>> > + poly_uint64 a1 = sel[pattern + sel_npatterns];
>> > + poly_uint64 a2 = sel[pattern + 2 * sel_npatterns];
>> > +
>> > + poly_uint64 step = a2 - a1;
>> > + if (!step.is_constant ())
>> > + {
>> > + if (reason)
>> > + strcpy (reason, "step is not constant");
>> > + return false;
>> > + }
>> > + int S = step.to_constant ();
>> > + if (S == 0)
>> > + continue;
>>
>> Might be simpler as:
>>
>> HOST_WIDE_INT step;
>> if (!(a2 - a1).is_constant (&step))
>> {
>> ...
>> }
> This resulted in following error:
> ../../gcc/gcc/fold-const.cc:10563:34: error: no matching function for
> call to ‘poly_int<2, long unsigned int>::is_constant(long int*)’
> 10563 | if (!(a2 - a1).is_constant (&S))
> | ~~~~~~~~~~~~~~~~~~~~~~^~~~
OK:
HOST_WIDE_INT step;
if (!poly_int64 (a2 - a1).is_constant (&step))
> Could you please elaborate on the following issue, because it still
> stands in this patch,
> which is in Case 2 of test_fold_vec_perm_cst::test_stepped.
>
> For example,
> op0, op1: npatterns = 1, nelts_per_pattern = 3
> op0_len = op1_len = 4 + 4x
>
> sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1
>
> In this case,
> a1 = 5 + 4x
> S = (6 + 4x) - (5 + 4x) = 1
> Esel = 4 + 4x
>
> ae = a1 + (esel - 2) * S
> = (5 + 4x) + (4 + 4x - 2) * 1
> = 7 + 8x
>
> IIUC, 7 + 8x will always be index for last element of op1 ?
> if x = 0, len = 4, 7 + 8x = 7
> if x = 1, len = 8, 7 + 8x = 15, etc.
> So the stepped sequence will always choose elements
> from op1 regardless of vector length for above case ?
>
> However,
> ae /trunc op0_len
> = (7 + 8x) / (4 + 4x)
> which is not defined because 7/4 != 8/4
Ah, that was me being lazy and not handling a boundary case that
seemed difficult to keep overflow-free. I've now pushed a fix.
Thanks,
Richard
next prev parent reply other threads:[~2023-08-03 13:01 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-17 12:14 Prathamesh Kulkarni
2023-07-25 9:26 ` Prathamesh Kulkarni
2023-07-25 12:55 ` Richard Sandiford
2023-07-28 12:57 ` Prathamesh Kulkarni
2023-08-03 13:01 ` Richard Sandiford [this message]
2023-08-03 13:16 ` Richard Sandiford
2023-08-04 10:06 ` Prathamesh Kulkarni
2023-08-04 15:06 ` Richard Sandiford
2023-08-06 12:25 ` Prathamesh Kulkarni
2023-08-08 9:57 ` Richard Sandiford
2023-08-10 14:33 ` Prathamesh Kulkarni
2023-08-10 15:57 ` Richard Sandiford
2023-08-13 11:49 ` Prathamesh Kulkarni
2023-08-14 12:53 ` Richard Sandiford
2023-08-15 11:29 ` Prathamesh Kulkarni
2023-08-16 8:53 ` Prathamesh Kulkarni
2023-08-16 9:51 ` Richard Sandiford
2023-08-16 11:28 ` Prathamesh Kulkarni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mpt4jlg2sb3.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=prathamesh.kulkarni@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).