* [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen
@ 2023-09-13 14:50 Prathamesh Kulkarni
2023-09-17 14:41 ` Richard Sandiford
From: Prathamesh Kulkarni @ 2023-09-13 14:50 UTC (permalink / raw)
To: Richard Sandiford, Adhemerval Zanella, gcc Patches
Hi,
After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following regression:
FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
This happens because for the following function from vect_copy_lane_1.c:
float32x2_t
__attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
float32x2_t b)
{
return vcopy_lane_f32 (a, 1, b, 0);
}
Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90,
it got lowered to the following sequence in the .optimized dump:
<bb 2> [local count: 1073741824]:
_4 = BIT_FIELD_REF <b_3(D), 32, 0>;
__a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
return __a_5;
The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR
to a vector permutation, so the function now gets lowered to:
<bb 2> [local count: 1073741824]:
__a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
return __a_4;
Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
in aarch64_expand_vec_perm_const_1, it now generates:
test_copy_lane_f32:
zip1 v0.2s, v0.2s, v1.2s
ret
Similarly for test_copy_lane_[us]32.
The attached patch adjusts the tests to reflect the change in code-gen,
and the tests pass.
OK to commit?
Thanks,
Prathamesh
[-- Attachment #2: gnu-890-1.txt --]
[-- Type: text/plain, Size: 826 bytes --]
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
index 2848be564d5..811dc678b92 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
@@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
-/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)
* Re: [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen
2023-09-13 14:50 [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen Prathamesh Kulkarni
@ 2023-09-17 14:41 ` Richard Sandiford
2023-09-19 5:55 ` Prathamesh Kulkarni
From: Richard Sandiford @ 2023-09-17 14:41 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Adhemerval Zanella, gcc Patches
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi,
> After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's a following regression:
> FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
> ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
>
> This happens because for the following function from vect_copy_lane_1.c:
> float32x2_t
> __attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
> float32x2_t b)
> {
> return vcopy_lane_f32 (a, 1, b, 0);
> }
>
> Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90,
> it got lowered to following sequence in .optimized dump:
> <bb 2> [local count: 1073741824]:
> _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
> __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
> return __a_5;
>
> The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR
> to vector permutation and now thus gets lowered to:
>
> <bb 2> [local count: 1073741824]:
> __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
> return __a_4;
>
> Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
> in aarch64_expand_vec_perm_const_1, it now generates:
>
> test_copy_lane_f32:
> zip1 v0.2s, v0.2s, v1.2s
> ret
>
> Similarly for test_copy_lane_[us]32.
Yeah, I suppose this choice is at least as good as INS. It has the advantage
that the source and destination don't need to be tied. For example:
int32x2_t f(int32x2_t a, int32x2_t b, int32x2_t c) {
return vcopy_lane_s32 (b, 1, c, 0);
}
used to be:
f:
mov v0.8b, v1.8b
ins v0.s[1], v2.s[0]
ret
but is now:
f:
zip1 v0.2s, v1.2s, v2.2s
ret
> The attached patch adjusts the tests to reflect the change in code-gen
> and the tests pass.
> OK to commit ?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> index 2848be564d5..811dc678b92 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
> BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
> -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
> BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
> BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
> BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)
OK, thanks.
Richard
* Re: [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen
2023-09-17 14:41 ` Richard Sandiford
@ 2023-09-19 5:55 ` Prathamesh Kulkarni
From: Prathamesh Kulkarni @ 2023-09-19 5:55 UTC (permalink / raw)
To: Prathamesh Kulkarni, Adhemerval Zanella, gcc Patches, richard.sandiford
On Sun, 17 Sept 2023 at 20:11, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi,
> > After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's a following regression:
> > FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
> > ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
> >
> > This happens because for the following function from vect_copy_lane_1.c:
> > float32x2_t
> > __attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
> > float32x2_t b)
> > {
> > return vcopy_lane_f32 (a, 1, b, 0);
> > }
> >
> > Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90,
> > it got lowered to following sequence in .optimized dump:
> > <bb 2> [local count: 1073741824]:
> > _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
> > __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
> > return __a_5;
> >
> > The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR
> > to vector permutation and now thus gets lowered to:
> >
> > <bb 2> [local count: 1073741824]:
> > __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
> > return __a_4;
> >
> > Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
> > in aarch64_expand_vec_perm_const_1, it now generates:
> >
> > test_copy_lane_f32:
> > zip1 v0.2s, v0.2s, v1.2s
> > ret
> >
> > Similarly for test_copy_lane_[us]32.
>
> Yeah, I suppose this choice is at least as good as INS. It has the advantage
> that the source and destination don't need to be tied. For example:
>
> int32x2_t f(int32x2_t a, int32x2_t b, int32x2_t c) {
> return vcopy_lane_s32 (b, 1, c, 0);
> }
>
> used to be:
>
> f:
> mov v0.8b, v1.8b
> ins v0.s[1], v2.s[0]
> ret
>
> but is now:
>
> f:
> zip1 v0.2s, v1.2s, v2.2s
> ret
>
> > The attached patch adjusts the tests to reflect the change in code-gen
> > and the tests pass.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> > index 2848be564d5..811dc678b92 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> > @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
> > BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> > BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
> > BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
> > -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> > +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
> > BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
> > BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
> > BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)
>
> OK, thanks.
Thanks, committed to trunk in 98c25cfc79a21886de7342fb563c4eb3c3d5f4e9.
Thanks,
Prathamesh
>
> Richard