From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,
gcc Patches <gcc-patches@gcc.gnu.org>,
Richard Biener <rguenther@suse.de>,
richard.sandiford@arm.com
Subject: Re: PR111754
Date: Wed, 25 Oct 2023 16:29:05 +0530 [thread overview]
Message-ID: <CAAgBjMnJDtdpSqX_gC6UpU6sCTobdpwovbm6-ZHxZV77cz5GDw@mail.gmail.com> (raw)
In-Reply-To: <mptcyx3n20f.fsf@arm.com>
On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Hi,
>
> Sorry the slow review. I clearly didn't think this through properly
> when doing the review of the original patch, so I wanted to spend
> some time working on the code to get a better understanding of
> the problem.
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi,
> > For the following test-case:
> >
> > typedef float __attribute__((__vector_size__ (16))) F;
> > F foo (F a, F b)
> > {
> > F v = (F) { 9 };
> > return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > }
> >
> > Compiling with -O2 results in following ICE:
> > foo.c: In function ‘foo’:
> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > 6 | return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 0x7f3185 wi::int_traits<std::pair<rtx_def*, machine_mode>
> >>::decompose(long*, unsigned int, std::pair<rtx_def*, machine_mode>
> > const&)
> > ../../gcc/gcc/rtl.h:2314
> > 0x7f3185 wide_int_ref_storage<false,
> > false>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode>
> >>(std::pair<rtx_def*, machine_mode> const&)
> > ../../gcc/gcc/wide-int.h:1089
> > 0x7f3185 generic_wide_int<wide_int_ref_storage<false, false>
> >>::generic_wide_int<std::pair<rtx_def*, machine_mode>
> >>(std::pair<rtx_def*, machine_mode> const&)
> > ../../gcc/gcc/wide-int.h:847
> > 0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false,
> > false> > >::poly_int<std::pair<rtx_def*, machine_mode>
> >>(poly_int_full, std::pair<rtx_def*, machine_mode> const&)
> > ../../gcc/gcc/poly-int.h:467
> > 0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false,
> > false> > >::poly_int<std::pair<rtx_def*, machine_mode>
> >>(std::pair<rtx_def*, machine_mode> const&)
> > ../../gcc/gcc/poly-int.h:453
> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > ../../gcc/gcc/rtl.h:2383
> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > ../../gcc/gcc/rtx-vector-builder.h:122
> > 0xfd4e1b vector_builder<rtx_def*, machine_mode,
> > rtx_vector_builder>::elt(unsigned int) const
> > ../../gcc/gcc/vector-builder.h:253
> > 0xfd4d11 rtx_vector_builder::build()
> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > 0xc21d9c const_vector_from_tree
> > ../../gcc/gcc/expr.cc:13487
> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> > ../../gcc/gcc/expr.cc:11059
> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
> > ../../gcc/gcc/expr.h:310
> > 0xaee682 expand_return
> > ../../gcc/gcc/cfgexpand.cc:3809
> > 0xaee682 expand_gimple_stmt_1
> > ../../gcc/gcc/cfgexpand.cc:3918
> > 0xaee682 expand_gimple_stmt
> > ../../gcc/gcc/cfgexpand.cc:4044
> > 0xaf28f0 expand_gimple_basic_block
> > ../../gcc/gcc/cfgexpand.cc:6100
> > 0xaf4996 execute
> > ../../gcc/gcc/cfgexpand.cc:6835
> >
> > IIUC, the issue is that fold_vec_perm returns a vector having float element
> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > to derive element v[3], not present in the encoding, while trying to
> > build rtx vector
> > in rtx_vector_builder::build():
> > for (unsigned int i = 0; i < nelts; ++i)
> > RTVEC_ELT (v, i) = elt (i);
> >
> > The attached patch tries to fix this by returning false from
> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > input vector has non-integral element type, so for VLA vectors, it
> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > non-integral element type.
> >
> > For VLS vectors, this will still work for stepped sequence since it
> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > res_npattern = res_nelts and
> > res_nelts_per_pattern = 1
> >
> > and fold the above case to:
> > F foo (F a, F b)
> > {
> > <bb 2> [local count: 1073741824]:
> > return { 0.0, 9.0e+0, 0.0, 0.0 };
> > }
> >
> > But I am not sure if this is entirely correct, since:
> > tree res = out_elts.build ();
> > will canonicalize the encoding and may result in a stepped sequence
> > (vector_builder::finalize() may reduce npatterns at the cost of increasing
> > nelts_per_pattern) ?
> >
> > PS: This issue is now latent after PR111648 fix, since
> > valid_mask_for_fold_vec_perm_cst with sel = {1, 0, 1, ...} returns
> > false because the corresponding pattern in arg0 is not a natural
> > stepped sequence, and folds correctly using VLS exception. However, I
> > guess the underlying issue of dealing with non-integral element types
> > in fold_vec_perm_cst still remains ?
> >
> > The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu,
> > and on x86_64-linux-gnu.
>
> I think the problem is instead in the way that we're calculating
> res_npatterns and res_nelts_per_pattern.
>
> If the selector is a duplication of { a1, ..., an }, then the
> result will be a duplication of n elements, regardless of the shape
> of the other arguments.
>
> Similarly, if the selector is { a1, ...., an } followed by a
> duplication of { b1, ..., bn }, the result be n elements followed
> by a duplication of n elements, regardless of the shape of the other
> arguments.
>
> So for these two cases, res_npatterns and res_nelts_per_pattern
> can come directly from the selector's encoding.
>
> If:
>
> (1) the selector is an n-pattern stepped sequence
> (2) the stepped part of each pattern selects from the same input pattern
> (3) the stepped part of each pattern does not select the first element
> of the input pattern, or the full input pattern is stepped
> (your previous patch)
>
> then the result is stepped only if one of the inputs is stepped.
> This is because, if an input pattern has 1 or 2 elements, (3) means
> that each element of the stepped sequence will select the same value,
> as if the selector step had been 0.
Hi Richard,
Thanks for the suggestions! I agree when selector is dup of {a1, ... an, ...} or
base elements followed up dup {a1, .. an, b1, ... bn, ...} in that
case we can set
res_nelts_per_pattern from selector's encoding. However even if we provide
more nelts_per_pattern that necessary, I guess vector_builder::finalize() will
canonicalize it to the correct encoding for result ?
>
> So I think the PR could be solved by something like the attached.
> Do you agree? If so, could you base the patch on this instead?
>
> Only tested against the self-tests.
>
> Thanks,
> Richard
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 40767736389..00fce4945a7 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> unsigned res_npatterns, res_nelts_per_pattern;
> unsigned HOST_WIDE_INT res_nelts;
>
> - /* (1) If SEL is a suitable mask as determined by
> - valid_mask_for_fold_vec_perm_cst_p, then:
> - res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> - res_nelts_per_pattern = max of nelts_per_pattern between
> - ARG0, ARG1 and SEL.
> - (2) If SEL is not a suitable mask, and TYPE is VLS then:
> - res_npatterns = nelts in result vector.
> - res_nelts_per_pattern = 1.
> - This exception is made so that VLS ARG0, ARG1 and SEL work as before. */
> - if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> - {
> - res_npatterns
> - = std::max (VECTOR_CST_NPATTERNS (arg0),
> - std::max (VECTOR_CST_NPATTERNS (arg1),
> - sel.encoding ().npatterns ()));
> + /* First try to implement the fold in a VLA-friendly way.
> +
> + (1) If the selector is simply a duplication of N elements, the
> + result is likewise a duplication of N elements.
> +
> + (2) If the selector is N elements followed by a duplication
> + of N elements, the result is too.
>
> - res_nelts_per_pattern
> - = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> - std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> - sel.encoding ().nelts_per_pattern ()));
> + (3) If the selector is N elements followed by an interleaving
> + of N linear series, the situation is more complex.
>
> + valid_mask_for_fold_vec_perm_cst_p detects whether we
> + can handle this case. If we can, then each of the N linear
> + series either (a) selects the same element each time or
> + (b) selects a linear series from one of the input patterns.
> +
> + If (b) holds for one of the linear series, the result
> + will contain a linear series, and so the result will have
> + the same shape as the selector. If (a) holds for all of
> + the lienar series, the result will be the same as (2) above.
> +
> + (b) can only hold if one of the inputs pattern has a
> + stepped encoding. */
> + if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> + {
> + res_npatterns = sel.encoding ().npatterns ();
> + res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> + if (res_nelts_per_pattern == 3
> + && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> + && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> + res_nelts_per_pattern = 2;
Um, in this case, should we set:
res_nelts_per_pattern = max (nelts_per_pattern (arg0), nelts_per_pattern(arg1))
if both have nelts_per_pattern == 1 ?
Also I suppose this matters only for non-integral element type, since
for integral element type,
vector_cst_elt will return the correct value even if the element is
not explicitly encoded and input vector is dup ?
Thanks,
Prathamesh
> res_nelts = res_npatterns * res_nelts_per_pattern;
> }
> else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> --
> 2.25.1
>
next prev parent reply other threads:[~2023-10-25 10:59 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-20 17:25 PR111754 Prathamesh Kulkarni
2023-10-24 21:28 ` PR111754 Richard Sandiford
2023-10-25 8:42 ` PR111754 Richard Sandiford
2023-10-25 10:59 ` Prathamesh Kulkarni [this message]
2023-10-25 22:38 ` PR111754 Richard Sandiford
2023-10-26 4:13 ` PR111754 Prathamesh Kulkarni
2023-11-08 16:27 ` PR111754 Prathamesh Kulkarni
2023-11-15 15:14 ` PR111754 Prathamesh Kulkarni
2023-11-23 11:40 ` PR111754 Prathamesh Kulkarni
2023-11-23 21:43 ` PR111754 Richard Sandiford
2023-11-27 15:13 ` PR111754 Prathamesh Kulkarni
2023-11-27 16:49 ` PR111754 Richard Sandiford
2023-11-28 7:56 PR111754 juzhe.zhong
2023-11-28 8:43 ` PR111754 Jakub Jelinek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAAgBjMnJDtdpSqX_gC6UpU6sCTobdpwovbm6-ZHxZV77cz5GDw@mail.gmail.com \
--to=prathamesh.kulkarni@linaro.org \
--cc=gcc-patches@gcc.gnu.org \
--cc=rguenther@suse.de \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).