From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
To: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>,
gcc Patches <gcc-patches@gcc.gnu.org>,
Richard Biener <rguenther@suse.de>,
richard.sandiford@arm.com
Subject: Re: PR111754
Date: Wed, 8 Nov 2023 21:57:58 +0530
Message-ID: <CAAgBjM=DNuR5n95jm5u1JH-K-ezPASkVOqUjEsHs5GN3XsTCTQ@mail.gmail.com>
In-Reply-To: <CAAgBjM=OtYUFLw=ujQdBLh7my8Jh78Wbe3enWAC1swJ26H-Adg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 14031 bytes --]
On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > > <richard.sandiford@arm.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> Sorry for the slow review. I clearly didn't think this through properly
> > >> when doing the review of the original patch, so I wanted to spend
> > >> some time working on the code to get a better understanding of
> > >> the problem.
> > >>
> > >> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > >> > Hi,
> > >> > For the following test-case:
> > >> >
> > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > >> > F foo (F a, F b)
> > >> > {
> > >> > F v = (F) { 9 };
> > >> > return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > >> > }
> > >> >
> > >> > Compiling with -O2 results in the following ICE:
> > >> > foo.c: In function ‘foo’:
> > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > >> > 6 | return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > >> > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >> > 0x7f3185 wi::int_traits<std::pair<rtx_def*, machine_mode>
> > >> >>::decompose(long*, unsigned int, std::pair<rtx_def*, machine_mode>
> > >> > const&)
> > >> > ../../gcc/gcc/rtl.h:2314
> > >> > 0x7f3185 wide_int_ref_storage<false,
> > >> > false>::wide_int_ref_storage<std::pair<rtx_def*, machine_mode>
> > >> >>(std::pair<rtx_def*, machine_mode> const&)
> > >> > ../../gcc/gcc/wide-int.h:1089
> > >> > 0x7f3185 generic_wide_int<wide_int_ref_storage<false, false>
> > >> >>::generic_wide_int<std::pair<rtx_def*, machine_mode>
> > >> >>(std::pair<rtx_def*, machine_mode> const&)
> > >> > ../../gcc/gcc/wide-int.h:847
> > >> > 0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false,
> > >> > false> > >::poly_int<std::pair<rtx_def*, machine_mode>
> > >> >>(poly_int_full, std::pair<rtx_def*, machine_mode> const&)
> > >> > ../../gcc/gcc/poly-int.h:467
> > >> > 0x7f3185 poly_int<1u, generic_wide_int<wide_int_ref_storage<false,
> > >> > false> > >::poly_int<std::pair<rtx_def*, machine_mode>
> > >> >>(std::pair<rtx_def*, machine_mode> const&)
> > >> > ../../gcc/gcc/poly-int.h:453
> > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > >> > ../../gcc/gcc/rtl.h:2383
> > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > >> > 0xfd4e1b vector_builder<rtx_def*, machine_mode,
> > >> > rtx_vector_builder>::elt(unsigned int) const
> > >> > ../../gcc/gcc/vector-builder.h:253
> > >> > 0xfd4d11 rtx_vector_builder::build()
> > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > >> > 0xc21d9c const_vector_from_tree
> > >> > ../../gcc/gcc/expr.cc:13487
> > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > >> > expand_modifier, rtx_def**, bool)
> > >> > ../../gcc/gcc/expr.cc:11059
> > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
> > >> > ../../gcc/gcc/expr.h:310
> > >> > 0xaee682 expand_return
> > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > >> > 0xaee682 expand_gimple_stmt_1
> > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > >> > 0xaee682 expand_gimple_stmt
> > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > >> > 0xaf28f0 expand_gimple_basic_block
> > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > >> > 0xaf4996 execute
> > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > >> >
> > >> > IIUC, the issue is that fold_vec_perm returns a vector having float element
> > >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > >> > to derive element v[3], not present in the encoding, while trying to
> > >> > build rtx vector
> > >> > in rtx_vector_builder::build():
> > >> > for (unsigned int i = 0; i < nelts; ++i)
> > >> > RTVEC_ELT (v, i) = elt (i);
> > >> >
> > >> > The attached patch tries to fix this by returning false from
> > >> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > >> > input vector has non-integral element type, so for VLA vectors, it
> > >> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > >> > non-integral element type.
> > >> >
> > >> > For VLS vectors, this will still work for stepped sequence since it
> > >> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > >> > res_npattern = res_nelts and
> > >> > res_nelts_per_pattern = 1
> > >> >
> > >> > and fold the above case to:
> > >> > F foo (F a, F b)
> > >> > {
> > >> > <bb 2> [local count: 1073741824]:
> > >> > return { 0.0, 9.0e+0, 0.0, 0.0 };
> > >> > }
> > >> >
> > >> > But I am not sure if this is entirely correct, since:
> > >> > tree res = out_elts.build ();
> > >> > will canonicalize the encoding and may result in a stepped sequence
> > >> > (vector_builder::finalize() may reduce npatterns at the cost of increasing
> > >> > nelts_per_pattern) ?
> > >> >
> > >> > PS: This issue is now latent after PR111648 fix, since
> > >> > valid_mask_for_fold_vec_perm_cst with sel = {1, 0, 1, ...} returns
> > >> > false because the corresponding pattern in arg0 is not a natural
> > >> > stepped sequence, and folds correctly using VLS exception. However, I
> > >> > guess the underlying issue of dealing with non-integral element types
> > >> > in fold_vec_perm_cst still remains ?
> > >> >
> > >> > The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu,
> > >> > and on x86_64-linux-gnu.
> > >>
> > >> I think the problem is instead in the way that we're calculating
> > >> res_npatterns and res_nelts_per_pattern.
> > >>
> > >> If the selector is a duplication of { a1, ..., an }, then the
> > >> result will be a duplication of n elements, regardless of the shape
> > >> of the other arguments.
> > >>
> > >> Similarly, if the selector is { a1, ..., an } followed by a
> > >> duplication of { b1, ..., bn }, the result will be n elements followed
> > >> by a duplication of n elements, regardless of the shape of the other
> > >> arguments.
> > >>
> > >> So for these two cases, res_npatterns and res_nelts_per_pattern
> > >> can come directly from the selector's encoding.
> > >>
> > >> If:
> > >>
> > >> (1) the selector is an n-pattern stepped sequence
> > >> (2) the stepped part of each pattern selects from the same input pattern
> > >> (3) the stepped part of each pattern does not select the first element
> > >> of the input pattern, or the full input pattern is stepped
> > >> (your previous patch)
> > >>
> > >> then the result is stepped only if one of the inputs is stepped.
> > >> This is because, if an input pattern has 1 or 2 elements, (3) means
> > >> that each element of the stepped sequence will select the same value,
> > >> as if the selector step had been 0.
> > > Hi Richard,
> > > Thanks for the suggestions! I agree that when the selector is a dup of
> > > {a1, ... an, ...} or base elements followed by a dup
> > > {a1, .. an, b1, ... bn, ...}, we can set
> > > res_nelts_per_pattern from the selector's encoding. However, even if we provide
> > > more nelts_per_pattern than necessary, I guess vector_builder::finalize() will
> > > canonicalize it to the correct encoding for the result?
> >
> > Right. The elements before finalize is called have to be correct,
> > but they don't need to be canonical or minimal.
> >
> > After the change, we'll build no more elements than we did before.
> >
> > >> So I think the PR could be solved by something like the attached.
> > >> Do you agree? If so, could you base the patch on this instead?
> > >>
> > >> Only tested against the self-tests.
> > >>
> > >> Thanks,
> > >> Richard
> > >>
> > >> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > >> index 40767736389..00fce4945a7 100644
> > >> --- a/gcc/fold-const.cc
> > >> +++ b/gcc/fold-const.cc
> > >> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
> > >> unsigned res_npatterns, res_nelts_per_pattern;
> > >> unsigned HOST_WIDE_INT res_nelts;
> > >>
> > >> - /* (1) If SEL is a suitable mask as determined by
> > >> - valid_mask_for_fold_vec_perm_cst_p, then:
> > >> - res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > >> - res_nelts_per_pattern = max of nelts_per_pattern between
> > >> - ARG0, ARG1 and SEL.
> > >> - (2) If SEL is not a suitable mask, and TYPE is VLS then:
> > >> - res_npatterns = nelts in result vector.
> > >> - res_nelts_per_pattern = 1.
> > >> - This exception is made so that VLS ARG0, ARG1 and SEL work as before. */
> > >> - if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > >> - {
> > >> - res_npatterns
> > >> - = std::max (VECTOR_CST_NPATTERNS (arg0),
> > >> - std::max (VECTOR_CST_NPATTERNS (arg1),
> > >> - sel.encoding ().npatterns ()));
> > >> + /* First try to implement the fold in a VLA-friendly way.
> > >> +
> > >> + (1) If the selector is simply a duplication of N elements, the
> > >> + result is likewise a duplication of N elements.
> > >> +
> > >> + (2) If the selector is N elements followed by a duplication
> > >> + of N elements, the result is too.
> > >>
> > >> - res_nelts_per_pattern
> > >> - = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> > >> - std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> > >> - sel.encoding ().nelts_per_pattern ()));
> > >> + (3) If the selector is N elements followed by an interleaving
> > >> + of N linear series, the situation is more complex.
> > >>
> > >> + valid_mask_for_fold_vec_perm_cst_p detects whether we
> > >> + can handle this case. If we can, then each of the N linear
> > >> + series either (a) selects the same element each time or
> > >> + (b) selects a linear series from one of the input patterns.
> > >> +
> > >> + If (b) holds for one of the linear series, the result
> > >> + will contain a linear series, and so the result will have
> > >> + the same shape as the selector. If (a) holds for all of
> > >> + the linear series, the result will be the same as (2) above.
> > >> +
> > >> + (b) can only hold if one of the input patterns has a
> > >> + stepped encoding. */
> > >> + if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > >> + {
> > >> + res_npatterns = sel.encoding ().npatterns ();
> > >> + res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
> > >> + if (res_nelts_per_pattern == 3
> > >> + && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
> > >> + && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
> > >> + res_nelts_per_pattern = 2;
> > > Um, in this case, should we set:
> > > res_nelts_per_pattern = max (nelts_per_pattern (arg0), nelts_per_pattern(arg1))
> > > if both have nelts_per_pattern == 1 ?
> >
> > No, it still needs to be 2 even if arg0 and arg1 are duplicates.
> > E.g. consider a selector that picks the first element of arg0
> > followed by a duplicate of the first element of arg1.
> >
> > > Also I suppose this matters only for non-integral element type, since
> > > for integral element type,
> > > vector_cst_elt will return the correct value even if the element is
> > > not explicitly encoded and input vector is dup ?
> >
> > Yeah, but it might help even for integers. If we build fewer
> > elements explicitly, and so read fewer implicitly-encoded inputs,
> > there's less risk of running into:
> >
> > if (!can_div_trunc_p (sel[i], len, &q, &r))
> > {
> > if (reason)
> > *reason = "cannot divide selector element by arg len";
> > return NULL_TREE;
> > }
> Ah right, thanks for the clarification!
> I am currently away on vacation and will return next Thursday, and
> will post a follow up patch based on your patch.
> Sorry for the delay.
Hi,
Sorry for the slow response. I have rebased your patch and added a couple of tests.
The attached patch resulted in fallout for aarch64/sve/slp_3.c and
aarch64/sve/slp_4.c.
Specifically for slp_3.c, we didn't fold the following case:
arg0, arg1 are dup vectors.
sel = { 0, len, 1, len + 1, 2, len + 2, ... } // (npatterns = 2,
nelts_per_pattern = 3)
because res_nelts_per_pattern was set to 3, and upon encountering the
selector element 2, fold_vec_perm_cst returned NULL_TREE.
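To make the (2, 3) selector encoding above concrete, here is a rough standalone sketch (not GCC code) of how an (npatterns, nelts_per_pattern) encoding expands into elements. The `Encoding` struct and the `elt` function are hypothetical names used only for illustration; `elt` is loosely modelled on what vector_builder::elt does, and it assumes integer elements, since the stepped case needs a subtraction to compute the step.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an encoded constant vector (not a GCC type).
struct Encoding {
  unsigned npatterns;
  unsigned nelts_per_pattern;    // 1 (dup), 2 (base + dup) or 3 (stepped)
  std::vector<int64_t> encoded;  // npatterns * nelts_per_pattern values
};

// Return element I of the full vector described by E.
int64_t elt(const Encoding &e, unsigned i) {
  const unsigned encoded_nelts = static_cast<unsigned>(e.encoded.size());
  if (i < encoded_nelts)
    return e.encoded[i];

  const unsigned pattern = i % e.npatterns;
  const unsigned count = i / e.npatterns;
  const unsigned final_i = encoded_nelts - e.npatterns + pattern;
  const int64_t final_elt = e.encoded[final_i];

  // With 1 or 2 elements per pattern, the tail repeats the last element.
  if (e.nelts_per_pattern <= 2)
    return final_elt;

  // With 3, the tail continues the series set by the last two elements.
  const int64_t step = final_elt - e.encoded[final_i - e.npatterns];
  return final_elt + static_cast<int64_t>(count - 2) * step;
}
```

For example, with an assumed len of 8, the selector { 0, 8, 1, 9, 2, 10 } encoded as (2, 3) continues with 3 and 11, whereas a dup vector encoded as (1, 1) yields its single encoded value at every index.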
With the patch, we set res_nelts_per_pattern = 2 (since the input vectors
are dup), and thus it gets folded to:
res = { arg0[0], arg1[0], ... } // (2, 1)
This results in using ld1rqd to load the result instead of doing
the permutation at runtime with mov and zip1.
I have adjusted the tests for the new code-gen.
Does it look OK?
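As a sanity check of the reasoning for the slp_3.c case, here is a small standalone model (again not GCC code). `Shape`, `result_shape` and `permute` are hypothetical helpers: `result_shape` mirrors only the res_npatterns / res_nelts_per_pattern choice from the patch, and `permute` is a brute-force fixed-length permute in which an index below len selects from arg0 and any other index selects from arg1. The length 8 and the values 41 and 25 are arbitrary.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pair describing an encoding's shape.
struct Shape {
  unsigned npatterns;
  unsigned nelts_per_pattern;
};

// Start from the selector's shape; if the selector is stepped but
// neither input is, the result cannot be stepped, so 2 elements per
// pattern are enough.
Shape result_shape(Shape sel, unsigned arg0_nelts_per_pattern,
                   unsigned arg1_nelts_per_pattern) {
  Shape res = sel;
  if (res.nelts_per_pattern == 3
      && arg0_nelts_per_pattern < 3
      && arg1_nelts_per_pattern < 3)
    res.nelts_per_pattern = 2;
  return res;
}

// Brute-force permute of two equal-length vectors.
std::vector<int> permute(const std::vector<int> &arg0,
                         const std::vector<int> &arg1,
                         const std::vector<unsigned> &sel) {
  const unsigned len = static_cast<unsigned>(arg0.size());
  std::vector<int> res;
  res.reserve(sel.size());
  for (unsigned idx : sel)
    res.push_back(idx < len ? arg0[idx] : arg1[idx - len]);
  return res;
}
```

With two dup inputs and sel = { 0, 8, 1, 9, 2, 10, 3, 11 }, `permute` gives an alternating { arg0[0], arg1[0], ... }, and `result_shape({2, 3}, 1, 1)` gives (2, 2). The repeated elements are then what lets the builder's finalize step canonicalize the result to (2, 1).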
There's also this strange failure observed on x86_64, as well as on aarch64:
New tests that FAIL (1 tests):
libitm.c++/dropref.C -B
/home/prathamesh.kulkarni/gnu-toolchain/gcc/gnu-964-5/bootstrap-build-after/aarch64-unknown-linux-gnu/./libitm/../libstdc++-v3/src/.libs
execution test
Looking at dropref.C:
/* { dg-xfail-run-if "unsupported" { *-*-* } } */
#include <libitm.h>
char *pp;
int main()
{
__transaction_atomic {
_ITM_dropReferences (pp, 555);
}
return 0;
}
This doesn't seem relevant to VEC_PERM_EXPR folding?
The patch otherwise passes bootstrap+test on aarch64-linux-gnu with
and without SVE, and on x86_64-linux-gnu.
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard
[-- Attachment #2: gnu-964-6.txt --]
[-- Type: text/plain, Size: 9494 bytes --]
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 40767736389..75410869796 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10743,27 +10743,38 @@ fold_vec_perm_cst (tree type, tree arg0, tree arg1, const vec_perm_indices &sel,
unsigned res_npatterns, res_nelts_per_pattern;
unsigned HOST_WIDE_INT res_nelts;
- /* (1) If SEL is a suitable mask as determined by
- valid_mask_for_fold_vec_perm_cst_p, then:
- res_npatterns = max of npatterns between ARG0, ARG1, and SEL
- res_nelts_per_pattern = max of nelts_per_pattern between
- ARG0, ARG1 and SEL.
- (2) If SEL is not a suitable mask, and TYPE is VLS then:
- res_npatterns = nelts in result vector.
- res_nelts_per_pattern = 1.
- This exception is made so that VLS ARG0, ARG1 and SEL work as before. */
- if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
- {
- res_npatterns
- = std::max (VECTOR_CST_NPATTERNS (arg0),
- std::max (VECTOR_CST_NPATTERNS (arg1),
- sel.encoding ().npatterns ()));
+ /* First try to implement the fold in a VLA-friendly way.
+
+ (1) If the selector is simply a duplication of N elements, the
+ result is likewise a duplication of N elements.
+
+ (2) If the selector is N elements followed by a duplication
+ of N elements, the result is too.
+
+ (3) If the selector is N elements followed by an interleaving
+ of N linear series, the situation is more complex.
+
+ valid_mask_for_fold_vec_perm_cst_p detects whether we
+ can handle this case. If we can, then each of the N linear
+ series either (a) selects the same element each time or
+ (b) selects a linear series from one of the input patterns.
+
+ If (b) holds for one of the linear series, the result
+ will contain a linear series, and so the result will have
+ the same shape as the selector. If (a) holds for all of
+ the linear series, the result will be the same as (2) above.
- res_nelts_per_pattern
- = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
- std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
- sel.encoding ().nelts_per_pattern ()));
+ (b) can only hold if one of the input patterns has a
+ stepped encoding. */
+ if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
+ {
+ res_npatterns = sel.encoding ().npatterns ();
+ res_nelts_per_pattern = sel.encoding ().nelts_per_pattern ();
+ if (res_nelts_per_pattern == 3
+ && VECTOR_CST_NELTS_PER_PATTERN (arg0) < 3
+ && VECTOR_CST_NELTS_PER_PATTERN (arg1) < 3)
+ res_nelts_per_pattern = 2;
res_nelts = res_npatterns * res_nelts_per_pattern;
}
else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
@@ -17562,6 +17573,29 @@ test_nunits_min_2 (machine_mode vmode)
tree expected_res[] = { ARG0(0), ARG1(0), ARG1(1) };
validate_res (1, 3, res, expected_res);
}
+
+ /* Case 8: Same as aarch64/sve/slp_3.c:
+ arg0, arg1 are dup vectors.
+ sel = { 0, len, 1, len+1, 2, len+2, ... } // (2, 3)
+ So res = { arg0[0], arg1[0], ... } // (2, 1)
+
+ In this case, since the input vectors are dup, only the first two
+ elements per pattern in sel are considered significant. */
+ {
+ tree arg0 = build_vec_cst_rand (vmode, 1, 1);
+ tree arg1 = build_vec_cst_rand (vmode, 1, 1);
+ poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+ vec_perm_builder builder (len, 2, 3);
+ poly_uint64 mask_elems[] = { 0, len, 1, len + 1, 2, len + 2 };
+ builder_push_elems (builder, mask_elems);
+
+ vec_perm_indices sel (builder, 2, len);
+ tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+ tree expected_res[] = { ARG0(0), ARG1(0) };
+ validate_res (2, 1, res, expected_res);
+ }
}
}
@@ -17730,6 +17764,45 @@ test_nunits_min_4 (machine_mode vmode)
ASSERT_TRUE (res == NULL_TREE);
ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
}
+
+ /* Case 8: PR111754: When input vector is not a stepped sequence,
+ check that the result is not a stepped sequence either, even
+ if sel has a stepped sequence. */
+ {
+ tree arg0 = build_vec_cst_rand (vmode, 1, 2);
+ tree arg1 = build_vec_cst_rand (vmode, 1, 2);
+ poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+ vec_perm_builder builder (len, 1, 3);
+ poly_uint64 mask_elems[] = { 0, 1, 2 };
+ builder_push_elems (builder, mask_elems);
+
+ vec_perm_indices sel (builder, 2, len);
+ tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+ tree expected_res[] = { ARG0(0), ARG0(1) };
+ validate_res (sel.encoding ().npatterns (), 2, res, expected_res);
+ }
+
+ /* Case 9: If sel doesn't contain a stepped sequence,
+ check that the result has same encoding as sel, irrespective
+ of shape of input vectors. */
+ {
+ tree arg0 = build_vec_cst_rand (vmode, 1, 3, 1);
+ tree arg1 = build_vec_cst_rand (vmode, 1, 3, 1);
+ poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+
+ vec_perm_builder builder (len, 1, 2);
+ poly_uint64 mask_elems[] = { 0, len };
+ builder_push_elems (builder, mask_elems);
+
+ vec_perm_indices sel (builder, 2, len);
+ tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel);
+
+ tree expected_res[] = { ARG0(0), ARG1(0) };
+ validate_res (sel.encoding ().npatterns (),
+ sel.encoding ().nelts_per_pattern (), res, expected_res);
+ }
}
}
diff --git a/gcc/testsuite/gcc.dg/vect/pr111754.c b/gcc/testsuite/gcc.dg/vect/pr111754.c
new file mode 100644
index 00000000000..7c1c16875c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111754.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef float __attribute__((__vector_size__ (16))) F;
+
+F foo (F a, F b)
+{
+ F v = (F) { 9 };
+ return __builtin_shufflevector (v, v, 1, 0, 1, 2);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "return \{ 0.0, 9.0e\\+0, 0.0, 0.0 \}" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_3.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
index 82dd43a4d98..cb649bc1aa9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
@@ -33,21 +33,15 @@ TEST_ALL (VEC_PERM)
/* 1 for each 8-bit type. */
/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */
-/* 1 for each 16-bit type plus 1 for double. */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 } } */
+/* 1 for each 16-bit type */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
/* 1 for each 32-bit type. */
/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #25\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #31\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #62\n} 2 } } */
-/* 3 for double. */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, x[0-9]+\n} 3 } } */
/* The 64-bit types need:
ZIP1 ZIP1 (2 ZIP2s optimized away)
ZIP1 ZIP2. */
-/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
/* The loop should be fully-masked. The 64-bit types need two loads
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_4.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_4.c
index b1fa5e3cf68..ce940a28597 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_4.c
@@ -35,20 +35,10 @@ vec_slp_##TYPE (TYPE *restrict a, int n) \
TEST_ALL (VEC_PERM)
-/* 1 for each 8-bit type, 4 for each 32-bit type and 4 for double. */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 18 } } */
+/* 1 for each 8-bit type */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 2 } } */
/* 1 for each 16-bit type. */
/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #11\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #17\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #80\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #63\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #37\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #24\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #81\n} 2 } } */
-/* 4 for double. */
-/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, x[0-9]+\n} 4 } } */
/* The 32-bit types need:
ZIP1 ZIP1 (2 ZIP2s optimized away)
@@ -59,7 +49,7 @@ TEST_ALL (VEC_PERM)
ZIP1 ZIP1 ZIP1 ZIP1 (4 ZIP2s optimized away)
ZIP1 ZIP2 ZIP1 ZIP2
ZIP1 ZIP2 ZIP1 ZIP2. */
-/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 33 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 15 } } */
/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 15 } } */
/* The loop should be fully-masked. The 32-bit types need two loads