* [PATCH] simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector
From: Kyrylo Tkachov @ 2023-06-16 9:06 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1514 bytes --]
Hi all,
In the testcase for this patch we try to vec_concat the lowpart and highpart of a vector, but the lowpart is expressed as a subreg.
simplify-rtx.cc does not recognise this and combine ends up trying to match:
Trying 7 -> 8:
7: r93:V2SI=vec_select(r95:V4SI,parallel)
8: r97:V4SI=vec_concat(r95:V4SI#0,r93:V2SI)
REG_DEAD r95:V4SI
REG_DEAD r93:V2SI
Failed to match this instruction:
(set (reg:V4SI 97)
    (vec_concat:V4SI (subreg:V2SI (reg/v:V4SI 95 [ a ]) 0)
        (vec_select:V2SI (reg/v:V4SI 95 [ a ])
            (parallel:V4SI [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                ]))))
This should be just (set (reg:V4SI 97) (reg:V4SI 95)). This patch adds such a simplification.
The testcase is a bit artificial, but I do have other aarch64-specific patterns that I want to optimise later
that rely on this simplification happening.
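To make the shape of the check concrete, here is a minimal standalone sketch of the same rule on a toy expression tree. It is only an illustration, not the GCC code: `Node`, `Code`, `make_reg`, `make_lowpart`, `make_select`, `selects_high_half` and `simplify_vec_concat` are all hypothetical names invented for this example, and the toy assumes little-endian lane numbering (the high half of an N-element vector is elements N/2 .. N-1). The real patch relies on `subreg_lowpart_p` and `vec_series_highpart_p`, which also handle endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins for RTL; these are NOT GCC's rtx types or codes.
enum class Code { REG, SUBREG_LOWPART, VEC_SELECT };

struct Node {
  Code code = Code::REG;
  std::size_t nelts = 0;         // number of elements this node produces
  const Node *inner = nullptr;   // operand of SUBREG_LOWPART / VEC_SELECT
  std::vector<std::size_t> sel;  // VEC_SELECT only: selected element indices
};

static Node make_reg (std::size_t nelts)
{
  Node n;
  n.nelts = nelts;
  return n;
}

static Node make_lowpart (const Node *inner, std::size_t nelts)
{
  Node n;
  n.code = Code::SUBREG_LOWPART;
  n.nelts = nelts;
  n.inner = inner;
  return n;
}

static Node make_select (const Node *inner, std::vector<std::size_t> sel)
{
  Node n;
  n.code = Code::VEC_SELECT;
  n.nelts = sel.size ();
  n.inner = inner;
  n.sel = std::move (sel);
  return n;
}

// Hypothetical little-endian analogue of the high-part check: true if SEL
// is the consecutive series ending at the last element of the full vector.
static bool selects_high_half (const std::vector<std::size_t> &sel,
                               std::size_t full_nelts)
{
  if (sel.empty () || sel.size () > full_nelts)
    return false;
  std::size_t base = full_nelts - sel.size ();
  for (std::size_t i = 0; i < sel.size (); ++i)
    if (sel[i] != base + i)
      return false;
  return true;
}

// Returns the original vector if (vec_concat OP0 OP1) is just that vector,
// otherwise nullptr (meaning "no simplification found").
static const Node *simplify_vec_concat (const Node &op0, const Node &op1)
{
  if (op0.code == Code::SUBREG_LOWPART
      && op1.code == Code::VEC_SELECT
      && op1.inner != nullptr
      && op0.inner == op1.inner  // same source, compared by pointer identity
      && op0.nelts + op1.nelts == op1.inner->nelts
      && selects_high_half (op1.sel, op1.inner->nelts))
    return op1.inner;
  return nullptr;
}
```

With a 4-element toy register standing in for r95, a 2-element lowpart of it and a select of elements {2, 3} from it, `simplify_vec_concat` returns a pointer to the register node. A select of {0, 1}, or a select from a different register, returns nullptr.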
Without this patch for the testcase we generate:
foo:
        dup     d31, v0.d[1]
        ins     v0.d[1], v31.d[0]
        ret
whereas we should generate nothing beyond the ret, as the operation is ultimately a no-op.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Ok for trunk?
Thanks,
Kyrill
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify vec_concat of lowpart subreg and high part vec_select.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/low-high-combine_1.c: New test.
[-- Attachment #2: concat-subreg.patch --]
[-- Type: application/octet-stream, Size: 1645 bytes --]
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 21b7eb484d05818bb563e086e07a6152a3a3c6b7..9c68d36067236a8b14ddefae3455f12fe30e35d2 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4860,6 +4860,17 @@ simplify_ashift:
return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0),
gen_rtx_PARALLEL (VOIDmode, vec));
}
+ /* (vec_concat:
+ (subreg_lowpart:N OP)
+ (vec_select:N OP P)) --> OP when P selects the high half
+ of the OP. */
+ if (GET_CODE (trueop0) == SUBREG
+ && subreg_lowpart_p (trueop0)
+ && GET_CODE (trueop1) == VEC_SELECT
+ && SUBREG_REG (trueop0) == XEXP (trueop1, 0)
+ && !side_effects_p (XEXP (trueop1, 0))
+ && vec_series_highpart_p (op1_mode, mode, XEXP (trueop1, 1)))
+ return XEXP (trueop1, 0);
}
return 0;
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/low-high-combine_1.c b/gcc/testsuite/gcc.target/aarch64/simd/low-high-combine_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..0b502d593f2be387270ae929e75b6a6c2efc5f2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/low-high-combine_1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <arm_neon.h>
+
+/*
+** foo_le: { target aarch64_little_endian }
+** ret
+*/
+
+int32x4_t
+foo_le (int32x4_t a)
+{
+ return vcombine_s32 (vget_low_s32 (a), vget_high_s32 (a));
+}
+
+/*
+** foo_be: { target aarch64_big_endian }
+** ret
+*/
+
+int32x4_t
+foo_be (int32x4_t a)
+{
+ return vcombine_s32 (vget_high_s32 (a), vget_low_s32 (a));
+}
+
* Re: [PATCH] simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector
From: Jeff Law @ 2023-06-17 2:31 UTC (permalink / raw)
To: Kyrylo Tkachov, gcc-patches
OK.
Jeff