* [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]
@ 2022-12-09 13:32 Stam Markianos-Wright
2022-12-12 13:42 ` Kyrylo Tkachov
0 siblings, 1 reply; 3+ messages in thread
From: Stam Markianos-Wright @ 2022-12-09 13:32 UTC (permalink / raw)
To: gcc-patches; +Cc: Kyrylo Tkachov, Richard Earnshaw, Ramana Radhakrishnan, nickc
[-- Attachment #1: Type: text/plain, Size: 1364 bytes --]
Hi all,
In the M-Class Arm-ARM:
https://developer.arm.com/documentation/ddi0553/bu/?lang=en
these MVE instructions only have a '!' writeback variant, and at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714
we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.
Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT) only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).
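As a rough illustration of the rule described above, here is a minimal toy model in C. It is only a sketch: the enum and the `toy_` names are hypothetical and do not exist in GCC, which inspects rtx codes rather than an enum. It captures just the accept/reject decision: a plain base register or a post-increment is allowed, anything else is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for address shapes; GCC itself
   inspects rtx codes such as REG and POST_INC, not this enum.  */
enum toy_addr_kind {
  TOY_ADDR_REG,          /* base register only, no writeback         */
  TOY_ADDR_POST_INC,     /* '!' writeback by the size of the access  */
  TOY_ADDR_POST_MODIFY,  /* writeback by a register offset           */
  TOY_ADDR_PLUS_IMM      /* base register plus immediate offset      */
};

/* Mirrors the shape of the patch's check: accept a plain base register
   or a post-increment, reject every other address form.  */
bool
toy_mve_struct_mem_operand (enum toy_addr_kind kind)
{
  switch (kind)
    {
    case TOY_ADDR_REG:
    case TOY_ADDR_POST_INC:
      return true;
    default:
      return false;
    }
}

/* Small self-check of the model.  */
void
toy_mve_struct_self_check (void)
{
  assert (toy_mve_struct_mem_operand (TOY_ADDR_REG));
  assert (toy_mve_struct_mem_operand (TOY_ADDR_POST_INC));
  assert (!toy_mve_struct_mem_operand (TOY_ADDR_POST_MODIFY));
  assert (!toy_mve_struct_mem_operand (TOY_ADDR_PLUS_IMM));
}
```

In this model the register-offset writeback case is the one reported in the PR as reaching the assembler; the new check simply has no branch that accepts it.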
No regressions in arm-none-eabi with MVE and MVE.FP.
Ok for trunk, and backport to GCC11 and GCC12 (testing pending)?
Thanks,
Stam
gcc/ChangeLog:
PR target/107714
* config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
* config/arm/arm.cc (mve_struct_mem_operand): New function.
* config/arm/constraints.md (Ug): New constraint.
* config/arm/mve.md (mve_vst4q<mode>): Change constraint.
(mve_vst2q<mode>): Likewise.
(mve_vld4q<mode>): Likewise.
(mve_vld2q<mode>): Likewise.
* config/arm/predicates.md (mve_struct_operand): New predicate.
gcc/testsuite/ChangeLog:
PR target/107714
* gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.
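The new test steps the pointers by 32, 64, 42, a negative amount, or a runtime `width` between two accesses. A sketch of the arithmetic behind which of those offsets line up with a `!` writeback follows; the `toy_` helper names are hypothetical, and the sketch assumes `uint8_t` pointers (as in the test) so that element offsets equal byte offsets. Whether the compiler actually chooses the writeback form remains its own decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One MVE Q register holds 128 bits, i.e. 16 bytes.  */
enum { TOY_MVE_VECTOR_BYTES = 16 };

/* Bytes transferred by a vld2q/vst2q (n_vectors == 2) or a
   vld4q/vst4q (n_vectors == 4).  */
size_t
toy_struct_bytes (unsigned n_vectors)
{
  assert (n_vectors == 2 || n_vectors == 4);
  return (size_t) n_vectors * TOY_MVE_VECTOR_BYTES;
}

/* True when a byte offset between two consecutive accesses equals the
   amount a '!' writeback adds to the base register.  */
bool
toy_offset_matches_writeback (unsigned n_vectors, long offset)
{
  return offset >= 0 && (size_t) offset == toy_struct_bytes (n_vectors);
}
```

Under these assumptions only `vld2q`/`vst2q` with `+ 32` and `vld4q`/`vst4q` with `+ 64` match, which corresponds to the two test functions (test2 and test8) whose expected assembly contains a `!`.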
[-- Attachment #2: rb16665.patch --]
[-- Type: text/x-patch, Size: 15172 bytes --]
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 550272facd12e60a49bf8a3b20f811cc13765b3a..8ea38118b05769bd6fcb1d22d902a50979cfd953 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -122,6 +122,7 @@ extern int arm_coproc_mem_operand_wb (rtx, int);
extern int neon_vector_mem_operand (rtx, int, bool);
extern int mve_vector_mem_operand (machine_mode, rtx, bool);
extern int neon_struct_mem_operand (rtx);
+extern int mve_struct_mem_operand (rtx);
extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index b587561eebea921bdc68016922d37948e2870ce2..31f2a7b9d4688dde69d1435e24cf885e8544be71 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -13737,6 +13737,24 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
return FALSE;
}
+/* Return TRUE if OP is a mem suitable for loading/storing an MVE struct
+ type. */
+int
+mve_struct_mem_operand (rtx op)
+{
+ rtx ind = XEXP (op, 0);
+
+ /* Match: (mem (reg)). */
+ if (REG_P (ind))
+ return arm_address_register_rtx_p (ind, 0);
+
+ /* Allow only post-increment by the mode size. */
+ if (GET_CODE (ind) == POST_INC)
+ return arm_address_register_rtx_p (XEXP (ind, 0), 0);
+
+ return FALSE;
+}
+
/* Return TRUE if OP is a mem suitable for loading/storing a Neon struct
type. */
int
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -474,6 +474,12 @@
(and (match_code "mem")
(match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
+(define_memory_constraint "Ug"
+ "@internal
+ In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem")
+ (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
+
(define_memory_constraint "Uj"
"@internal
In ARM/Thumb-2 state a VFP load/store address that supports writeback
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index b5e6da4b1335818a3e8815de59850e845a2d0400..847bc032afa2c3977c05725562a14940beb282d4 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -99,7 +99,7 @@
;; [vst4q])
;;
(define_insn "mve_vst4q<mode>"
- [(set (match_operand:XI 0 "neon_struct_operand" "=Um")
+ [(set (match_operand:XI 0 "mve_struct_operand" "=Ug")
(unspec:XI [(match_operand:XI 1 "s_register_operand" "w")
(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
VST4Q))
@@ -9959,7 +9959,7 @@
;; [vst2q])
;;
(define_insn "mve_vst2q<mode>"
- [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
+ [(set (match_operand:OI 0 "mve_struct_operand" "=Ug")
(unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
VST2Q))
@@ -9988,7 +9988,7 @@
;;
(define_insn "mve_vld2q<mode>"
[(set (match_operand:OI 0 "s_register_operand" "=w")
- (unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
+ (unspec:OI [(match_operand:OI 1 "mve_struct_operand" "Ug")
(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
VLD2Q))
]
@@ -10016,7 +10016,7 @@
;;
(define_insn "mve_vld4q<mode>"
[(set (match_operand:XI 0 "s_register_operand" "=w")
- (unspec:XI [(match_operand:XI 1 "neon_struct_operand" "Um")
+ (unspec:XI [(match_operand:XI 1 "mve_struct_operand" "Ug")
(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
VLD4Q))
]
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index aab5a91ad4ddc6a7a02611d05442d6de63841a7c..67f2fdb4f8f607ceb50871e1bc17dbdb9b987c2c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -876,6 +876,10 @@
(and (match_code "mem")
(match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2, true)")))
+(define_predicate "mve_struct_operand"
+ (and (match_code "mem")
+ (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
+
(define_predicate "neon_permissive_struct_operand"
(and (match_code "mem")
(match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2, false)")))
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c
new file mode 100644
index 0000000000000000000000000000000000000000..d028b91e81aed97e4b30978b6d130a6f97f1cbc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c
@@ -0,0 +1,300 @@
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O1" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_mve.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+**test:
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test(const uint8_t * in, uint8_t * out, int width)
+{
+ uint8x16x2_t rg = vld2q(in);
+ uint8x16x2_t gb = vld2q(in + width);
+ vst2q (out, rg);
+ vst2q (out + width, gb);
+}
+
+/*
+**test2:
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]!
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]!
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test2(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x2_t rg = vld2q(in);
+ uint8x16x2_t gb = vld2q(in + 32);
+ vst2q (out, rg);
+ vst2q (out + 32, gb);
+}
+
+/*
+**test3:
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test3(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x2_t rg = vld2q(in);
+ uint8x16x2_t gb = vld2q(in - 32);
+ vst2q (out, rg);
+ vst2q (out - 32, gb);
+}
+
+/*
+**test4:
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test4(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x2_t rg = vld2q(in);
+ uint8x16x2_t gb = vld2q(in + 64);
+ vst2q (out, rg);
+ vst2q (out + 64, gb);
+}
+
+/*
+**test5:
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst20.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst21.8 {q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test5(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x2_t rg = vld2q(in);
+ uint8x16x2_t gb = vld2q(in + 42);
+ vst2q (out, rg);
+ vst2q (out + 42, gb);
+}
+
+/*
+**test6:
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test6(const uint8_t * in, uint8_t * out, int width)
+{
+ uint8x16x4_t rg = vld4q(in);
+ uint8x16x4_t gb = vld4q(in + width);
+ vst4q (out, rg);
+ vst4q (out + width, gb);
+}
+
+/*
+**test7:
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test7(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x4_t rg = vld4q(in);
+ uint8x16x4_t gb = vld4q(in + 32);
+ vst4q (out, rg);
+ vst4q (out + 32, gb);
+}
+
+/*
+**test8:
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]!
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]!
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test8(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x4_t rg = vld4q(in);
+ uint8x16x4_t gb = vld4q(in + 64);
+ vst4q (out, rg);
+ vst4q (out + 64, gb);
+}
+
+/*
+**test9:
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test9(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x4_t rg = vld4q(in);
+ uint8x16x4_t gb = vld4q(in - 64);
+ vst4q (out, rg);
+ vst4q (out - 64, gb);
+}
+
+/*
+**test10:
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vld40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vld43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+** vst40.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst41.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst42.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** vst43.8 {q[0-9]+, q[0-9]+, q[0-9]+, q[0-9]+}, \[(?:ip|fp|r[0-9]+)\]
+** ...
+*/
+void
+test10(const uint8_t * in, uint8_t * out)
+{
+ uint8x16x4_t rg = vld4q(in);
+ uint8x16x4_t gb = vld4q(in + 42);
+ vst4q (out, rg);
+ vst4q (out + 42, gb);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
^ permalink raw reply [flat|nested] 3+ messages in thread
* RE: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]
2022-12-09 13:32 [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714] Stam Markianos-Wright
@ 2022-12-12 13:42 ` Kyrylo Tkachov
2023-01-10 14:01 ` Stam Markianos-Wright
0 siblings, 1 reply; 3+ messages in thread
From: Kyrylo Tkachov @ 2022-12-12 13:42 UTC (permalink / raw)
To: Stam Markianos-Wright, gcc-patches
Cc: Richard Earnshaw, Ramana Radhakrishnan, nickc
Hi Stam,
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -474,6 +474,12 @@
(and (match_code "mem")
(match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
+(define_memory_constraint "Ug"
+ "@internal
+ In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem")
+ (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
+
I think you can define the constraints in terms of the new mve_struct_operand predicate directly (see how we define the "Ua" constraint, for example).
Ok if that works (and testing passes of course).
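One way to picture the suggestion, as a toy C sketch rather than machine-description syntax: the constraint stops repeating the test and simply delegates to the predicate, so the condition lives in one place. All names below are hypothetical stand-ins, and the three boolean parameters merely model "is a mem", "TARGET_HAVE_MVE" and "address accepted by mve_struct_mem_operand".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the predicate: all three conditions must hold.  */
bool
toy_mve_struct_operand (bool is_mem, bool have_mve, bool address_ok)
{
  return is_mem && have_mve && address_ok;
}

/* Toy model of the constraint: no test of its own, it only forwards
   to the predicate so the two cannot drift apart.  */
bool
toy_constraint_ug (bool is_mem, bool have_mve, bool address_ok)
{
  return toy_mve_struct_operand (is_mem, have_mve, address_ok);
}

/* Small self-check: the constraint agrees with the predicate.  */
void
toy_constraint_self_check (void)
{
  assert (toy_constraint_ug (true, true, true));
  assert (!toy_constraint_ug (true, false, true));
}
```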
Thanks,
Kyrill
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]
2022-12-12 13:42 ` Kyrylo Tkachov
@ 2023-01-10 14:01 ` Stam Markianos-Wright
0 siblings, 0 replies; 3+ messages in thread
From: Stam Markianos-Wright @ 2023-01-10 14:01 UTC (permalink / raw)
To: Kyrylo Tkachov, gcc-patches; +Cc: Richard Earnshaw, Ramana Radhakrishnan, nickc
On 12/12/2022 13:42, Kyrylo Tkachov wrote:
> Hi Stam,
>
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -474,6 +474,12 @@
> (and (match_code "mem")
> (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
>
> +(define_memory_constraint "Ug"
> + "@internal
> + In Thumb-2 state a valid MVE struct load/store address."
> + (and (match_code "mem")
> + (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
> +
>
> I think you can define the constraints in terms of the new mve_struct_operand predicate directly (see how we define the "Ua" constraint, for example).
> Ok if that works (and testing passes of course).
Done as discussed and re-tested on all branches. Pushed as:
4269a6567eb991e6838f40bda5be9e3a7972530c to trunk
25edc76f2afba0b4eaf22174d42de042a6969dbe to gcc-12
08842ad274f5e2630994f7c6e70b2d31768107ea to gcc-11
Thank you!
Stam
> Thanks,
> Kyrill
>
^ permalink raw reply [flat|nested] 3+ messages in thread