* [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen
@ 2023-09-13 14:50 Prathamesh Kulkarni
  2023-09-17 14:41 ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread

From: Prathamesh Kulkarni @ 2023-09-13 14:50 UTC (permalink / raw)
To: Richard Sandiford, Adhemerval Zanella, gcc Patches

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

Hi,
After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following
regression:
FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
ins\\tv0.s\\[1\\], v1.s\\[0\\] 3

This happens because, for the following function from vect_copy_lane_1.c:

float32x2_t
__attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
						       float32x2_t b)
{
  return vcopy_lane_f32 (a, 1, b, 0);
}

before 27de9aa152141e7f3ee66372647d0f2cd94c4b90 it was lowered to the
following sequence in the .optimized dump:

  <bb 2> [local count: 1073741824]:
  _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
  __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
  return __a_5;

The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR to a vector
permutation, so it is now lowered to:

  <bb 2> [local count: 1073741824]:
  __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
  return __a_4;

Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
in aarch64_expand_vec_perm_const_1, it now generates:

test_copy_lane_f32:
	zip1	v0.2s, v0.2s, v1.2s
	ret

Similarly for test_copy_lane_[us]32.

The attached patch adjusts the tests to reflect the change in code-gen,
and the tests pass.
OK to commit?
Thanks,
Prathamesh

[-- Attachment #2: gnu-890-1.txt --]
[-- Type: text/plain, Size: 826 bytes --]

diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
index 2848be564d5..811dc678b92 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
@@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
 BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
 BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
 BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
-/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
 BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
 BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
 BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)
* Re: [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen
  2023-09-13 14:50 [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen Prathamesh Kulkarni
@ 2023-09-17 14:41 ` Richard Sandiford
  2023-09-19  5:55   ` Prathamesh Kulkarni
  0 siblings, 1 reply; 3+ messages in thread

From: Richard Sandiford @ 2023-09-17 14:41 UTC (permalink / raw)
To: Prathamesh Kulkarni; +Cc: Adhemerval Zanella, gcc Patches

Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi,
> After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following
> regression:
> FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
> ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
>
> This happens because, for the following function from vect_copy_lane_1.c:
>
> float32x2_t
> __attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
>                                                        float32x2_t b)
> {
>   return vcopy_lane_f32 (a, 1, b, 0);
> }
>
> before 27de9aa152141e7f3ee66372647d0f2cd94c4b90 it was lowered to the
> following sequence in the .optimized dump:
>
>   <bb 2> [local count: 1073741824]:
>   _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
>   __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
>   return __a_5;
>
> The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR to a vector
> permutation, so it is now lowered to:
>
>   <bb 2> [local count: 1073741824]:
>   __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
>   return __a_4;
>
> Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
> in aarch64_expand_vec_perm_const_1, it now generates:
>
> test_copy_lane_f32:
>         zip1    v0.2s, v0.2s, v1.2s
>         ret
>
> Similarly for test_copy_lane_[us]32.

Yeah, I suppose this choice is at least as good as INS.  It has the
advantage that the source and destination don't need to be tied.
For example:

  int32x2_t f(int32x2_t a, int32x2_t b, int32x2_t c) {
    return vcopy_lane_s32 (b, 1, c, 0);
  }

used to be:

f:
	mov	v0.8b, v1.8b
	ins	v0.s[1], v2.s[0]
	ret

but is now:

f:
	zip1	v0.2s, v1.2s, v2.2s
	ret

> The attached patch adjusts the tests to reflect the change in code-gen
> and the tests pass.
> OK to commit ?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> index 2848be564d5..811dc678b92 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
>  BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>  BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
>  BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
> -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
>  BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
>  BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
>  BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)

OK, thanks.

Richard
* Re: [AArch64][testsuite] Adjust vect_copy_lane_1.c for new code-gen
  2023-09-17 14:41 ` Richard Sandiford
@ 2023-09-19  5:55   ` Prathamesh Kulkarni
  0 siblings, 0 replies; 3+ messages in thread

From: Prathamesh Kulkarni @ 2023-09-19 5:55 UTC (permalink / raw)
To: Prathamesh Kulkarni, Adhemerval Zanella, gcc Patches, richard.sandiford

On Sun, 17 Sept 2023 at 20:11, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> > Hi,
> > After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following
> > regression:
> > FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times
> > ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
> >
> > This happens because, for the following function from vect_copy_lane_1.c:
> >
> > float32x2_t
> > __attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
> >                                                        float32x2_t b)
> > {
> >   return vcopy_lane_f32 (a, 1, b, 0);
> > }
> >
> > before 27de9aa152141e7f3ee66372647d0f2cd94c4b90 it was lowered to the
> > following sequence in the .optimized dump:
> >
> >   <bb 2> [local count: 1073741824]:
> >   _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
> >   __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
> >   return __a_5;
> >
> > The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR to a vector
> > permutation, so it is now lowered to:
> >
> >   <bb 2> [local count: 1073741824]:
> >   __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
> >   return __a_4;
> >
> > Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
> > in aarch64_expand_vec_perm_const_1, it now generates:
> >
> > test_copy_lane_f32:
> >         zip1    v0.2s, v0.2s, v1.2s
> >         ret
> >
> > Similarly for test_copy_lane_[us]32.
>
> Yeah, I suppose this choice is at least as good as INS.  It has the
> advantage that the source and destination don't need to be tied.
> For example:
>
>   int32x2_t f(int32x2_t a, int32x2_t b, int32x2_t c) {
>     return vcopy_lane_s32 (b, 1, c, 0);
>   }
>
> used to be:
>
> f:
>         mov     v0.8b, v1.8b
>         ins     v0.s[1], v2.s[0]
>         ret
>
> but is now:
>
> f:
>         zip1    v0.2s, v1.2s, v2.2s
>         ret
>
> > The attached patch adjusts the tests to reflect the change in code-gen
> > and the tests pass.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> > index 2848be564d5..811dc678b92 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> > @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
> >  BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> >  BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
> >  BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
> > -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> > +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
> >  BUILD_TEST (int64x1_t, int64x1_t, , , s64, 0, 0)
> >  BUILD_TEST (uint64x1_t, uint64x1_t, , , u64, 0, 0)
> >  BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)
>
> OK, thanks.
Thanks, committed to trunk in 98c25cfc79a21886de7342fb563c4eb3c3d5f4e9.

Thanks,
Prathamesh
>
> Richard