* [PATCH] aarch64: Fix invalid nested subregs [PR115464]
@ 2024-06-13 9:36 Richard Sandiford
2024-06-13 11:46 ` Richard Biener
0 siblings, 1 reply; 2+ messages in thread
From: Richard Sandiford @ 2024-06-13 9:36 UTC (permalink / raw)
To: gcc-patches; +Cc: rguenther, jlaw
The testcase extracts one arm_neon.h vector from a pair (one subreg)
and then reinterprets the result as an SVE vector (another subreg).
Each subreg makes sense individually, but we can't fold them together
into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes,
but the interpretation of 32 bytes -> 16*N bytes depends on
whether N==1 or N>1.
Since the second subreg makes sense individually, simplify_subreg
should bail out rather than ICE on it. simplify_gen_subreg will
then do the same (because it already checks validate_subreg).
This leaves simplify_gen_subreg returning null, requiring the
caller to take appropriate action.
I think this is relatively likely to occur elsewhere, so the patch
adds a helper for forcing a subreg, allowing a temporary pseudo to
be created where necessary.
I'll follow up by using force_subreg in more places. This patch
is intended to be a minimal backportable fix for the PR.
Bootstrapped & regression tested on aarch64-linux-gnu. OK for trunk
and GCC 14 branch?
Richard
gcc/
PR target/115464
* simplify-rtx.cc (simplify_context::simplify_subreg): Don't try
to fold two subregs together if their relationship isn't known
at compile time.
* explow.h (force_subreg): Declare.
* explow.cc (force_subreg): New function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Use it instead of simplify_gen_subreg.
gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464.c: New test.
---
gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +-
gcc/explow.cc | 15 +++++++++++++++
gcc/explow.h | 2 ++
gcc/simplify-rtx.cc | 5 +++++
.../aarch64/sve/acle/general/pr115464.c | 13 +++++++++++++
5 files changed, 36 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..c9182594bc1 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1174,7 +1174,7 @@ public:
Advanced SIMD argument as an SVE vector. */
if (!BYTES_BIG_ENDIAN
&& is_undef (CALL_EXPR_ARG (e.call_expr, 0)))
- return simplify_gen_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0);
+ return force_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0);
rtx_vector_builder builder (VNx16BImode, 16, 2);
for (unsigned int i = 0; i < 16; i++)
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 8e5f6b8e680..f6843398c4b 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -745,6 +745,21 @@ force_reg (machine_mode mode, rtx x)
return temp;
}
+/* Like simplify_gen_subreg, but force OP into a new register if the
+ subreg cannot be formed directly. */
+
+rtx
+force_subreg (machine_mode outermode, rtx op,
+ machine_mode innermode, poly_uint64 byte)
+{
+ rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
+ if (x)
+ return x;
+
+ op = copy_to_mode_reg (innermode, op);
+ return simplify_gen_subreg (outermode, op, innermode, byte);
+}
+
/* If X is a memory ref, copy its contents to a new temp reg and return
that reg. Otherwise, return X. */
diff --git a/gcc/explow.h b/gcc/explow.h
index 16aa02cfb68..cbd1fcb7eb3 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -42,6 +42,8 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode);
Args are mode (in case value is a constant) and the value. */
extern rtx force_reg (machine_mode, rtx);
+extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+
/* Return given rtx, copied into a new temp reg if it was in memory. */
extern rtx force_not_mem (rtx);
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 9bc3ef9ad9f..b6bb7e1f9e9 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7735,6 +7735,11 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,
poly_uint64 innermostsize = GET_MODE_SIZE (innermostmode);
rtx newx;
+ /* Make sure that the relationship between the two subregs is
+ known at compile time. */
+ if (!ordered_p (outersize, innermostsize))
+ return NULL_RTX;
+
if (outermode == innermostmode
&& known_eq (byte, 0U)
&& known_eq (SUBREG_BYTE (op), 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
new file mode 100644
index 00000000000..d728d1325ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
@@ -0,0 +1,13 @@
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+#include <arm_neon_sve_bridge.h>
+
+svuint16_t
+convolve4_4_x (uint16x8x2_t permute_tbl)
+{
+ return svset_neonq_u16 (svundef_u16 (), permute_tbl.val[1]);
+}
+
+/* { dg-final { scan-assembler {\tmov\tz0\.d, z1\.d\n} } } */
--
2.25.1
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH] aarch64: Fix invalid nested subregs [PR115464]
2024-06-13 9:36 [PATCH] aarch64: Fix invalid nested subregs [PR115464] Richard Sandiford
@ 2024-06-13 11:46 ` Richard Biener
0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2024-06-13 11:46 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, jlaw
On Thu, 13 Jun 2024, Richard Sandiford wrote:
> The testcase extracts one arm_neon.h vector from a pair (one subreg)
> and then reinterprets the result as an SVE vector (another subreg).
> Each subreg makes sense individually, but we can't fold them together
> into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes,
> but the interpretation of 32 bytes -> 16*N bytes depends on
> whether N==1 or N>1.
>
> Since the second subreg makes sense individually, simplify_subreg
> should bail out rather than ICE on it. simplify_gen_subreg will
> then do the same (because it already checks validate_subreg).
> This leaves simplify_gen_subreg returning null, requiring the
> caller to take appropriate action.
>
> I think this is relatively likely to occur elsewhere, so the patch
> adds a helper for forcing a subreg, allowing a temporary pseudo to
> be created where necessary.
>
> I'll follow up by using force_subreg in more places. This patch
> is intended to be a minimal backportable fix for the PR.
>
> Bootstrapped & regression tested on aarch64-linux-gnu. OK for trunk
> and GCC 14 branch?
OK for trunk and branch after it settles for a while.
Richard.
> Richard
>
>
> gcc/
> PR target/115464
> * simplify-rtx.cc (simplify_context::simplify_subreg): Don't try
> to fold two subregs together if their relationship isn't known
> at compile time.
> * explow.h (force_subreg): Declare.
> * explow.cc (force_subreg): New function.
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svset_neonq_impl::expand): Use it instead of simplify_gen_subreg.
>
> gcc/testsuite/
> PR target/115464
> * gcc.target/aarch64/sve/acle/general/pr115464.c: New test.
> ---
> gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +-
> gcc/explow.cc | 15 +++++++++++++++
> gcc/explow.h | 2 ++
> gcc/simplify-rtx.cc | 5 +++++
> .../aarch64/sve/acle/general/pr115464.c | 13 +++++++++++++
> 5 files changed, 36 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..c9182594bc1 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1174,7 +1174,7 @@ public:
> Advanced SIMD argument as an SVE vector. */
> if (!BYTES_BIG_ENDIAN
> && is_undef (CALL_EXPR_ARG (e.call_expr, 0)))
> - return simplify_gen_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0);
> + return force_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0);
>
> rtx_vector_builder builder (VNx16BImode, 16, 2);
> for (unsigned int i = 0; i < 16; i++)
> diff --git a/gcc/explow.cc b/gcc/explow.cc
> index 8e5f6b8e680..f6843398c4b 100644
> --- a/gcc/explow.cc
> +++ b/gcc/explow.cc
> @@ -745,6 +745,21 @@ force_reg (machine_mode mode, rtx x)
> return temp;
> }
>
> +/* Like simplify_gen_subreg, but force OP into a new register if the
> + subreg cannot be formed directly. */
> +
> +rtx
> +force_subreg (machine_mode outermode, rtx op,
> + machine_mode innermode, poly_uint64 byte)
> +{
> + rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
> + if (x)
> + return x;
> +
> + op = copy_to_mode_reg (innermode, op);
> + return simplify_gen_subreg (outermode, op, innermode, byte);
> +}
> +
> /* If X is a memory ref, copy its contents to a new temp reg and return
> that reg. Otherwise, return X. */
>
> diff --git a/gcc/explow.h b/gcc/explow.h
> index 16aa02cfb68..cbd1fcb7eb3 100644
> --- a/gcc/explow.h
> +++ b/gcc/explow.h
> @@ -42,6 +42,8 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode);
> Args are mode (in case value is a constant) and the value. */
> extern rtx force_reg (machine_mode, rtx);
>
> +extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
> +
> /* Return given rtx, copied into a new temp reg if it was in memory. */
> extern rtx force_not_mem (rtx);
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 9bc3ef9ad9f..b6bb7e1f9e9 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -7735,6 +7735,11 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,
> poly_uint64 innermostsize = GET_MODE_SIZE (innermostmode);
> rtx newx;
>
> + /* Make sure that the relationship between the two subregs is
> + known at compile time. */
> + if (!ordered_p (outersize, innermostsize))
> + return NULL_RTX;
> +
> if (outermode == innermostmode
> && known_eq (byte, 0U)
> && known_eq (SUBREG_BYTE (op), 0))
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
> new file mode 100644
> index 00000000000..d728d1325ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
> @@ -0,0 +1,13 @@
> +/* { dg-options "-O2" } */
> +
> +#include <arm_neon.h>
> +#include <arm_sve.h>
> +#include <arm_neon_sve_bridge.h>
> +
> +svuint16_t
> +convolve4_4_x (uint16x8x2_t permute_tbl)
> +{
> + return svset_neonq_u16 (svundef_u16 (), permute_tbl.val[1]);
> +}
> +
> +/* { dg-final { scan-assembler {\tmov\tz0\.d, z1\.d\n} } } */
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2024-06-13 11:46 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-13 9:36 [PATCH] aarch64: Fix invalid nested subregs [PR115464] Richard Sandiford
2024-06-13 11:46 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).