public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550]
@ 2024-01-24  7:01 Andrew Pinski
  2024-01-25 12:08 ` Richard Sandiford
  0 siblings, 1 reply; 2+ messages in thread
From: Andrew Pinski @ 2024-01-24  7:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

The split for movv8di is not ready to handle the case where the setting
register overlaps with the address of the memory that is being load.
Fixing the split than just making the output constraint as an early clobber
for this alternative. The split would first need to figure out which register
is overlapping with the address and then only emit that move last.

Build and tested for aarch64-linux-gnu with no regressions

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (*aarch64_movv8di): Mark the last
	alternative's output constraint as an early clobber.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 662ef696630..ba079298b84 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7985,7 +7985,7 @@ (define_insn "*aarch64_mov<mode>"
 )
 
 (define_insn "*aarch64_movv8di"
-  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r")
+  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,&r")
 	(match_operand:V8DI 1 "general_operand" " r,r,m"))]
   "(register_operand (operands[0], V8DImode)
     || register_operand (operands[1], V8DImode))"
-- 
2.39.3


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550]
  2024-01-24  7:01 [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550] Andrew Pinski
@ 2024-01-25 12:08 ` Richard Sandiford
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Sandiford @ 2024-01-25 12:08 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches

Andrew Pinski <quic_apinski@quicinc.com> writes:
> The split for movv8di is not ready to handle the case where the setting
> register overlaps with the address of the memory that is being load.
> Fixing the split than just making the output constraint as an early clobber
> for this alternative. The split would first need to figure out which register
> is overlapping with the address and then only emit that move last.

I was curious how strained that detection would be in practice, and in
the end it didn't seem too bad.  I pushed the following variant after
testing on aarch64-linux-gnu.

Thanks,
Richard


The LS64 movv8di pattern didn't handle loads that overlapped with
the address register (unless the overlap happened to be in the
last subload).

gcc/
	PR target/113550
	* config/aarch64/aarch64-simd.md: In the movv8di splitter, check
	whether each split instruction is a load that clobbers the source
	address.  Emit that instruction last if so.

gcc/testsuite/
	PR target/113550
	* gcc.target/aarch64/pr113550.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md          | 18 ++++++--
 gcc/testsuite/gcc.target/aarch64/pr113550.c | 48 +++++++++++++++++++++
 2 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113550.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 48f0741e7d0..f036f6ce997 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8221,11 +8221,21 @@ (define_split
 	   || (memory_operand (operands[0], V8DImode)
 	       && register_operand (operands[1], V8DImode)))
     {
+      std::pair<rtx, rtx> last_pair = {};
       for (int offset = 0; offset < 64; offset += 16)
-	emit_move_insn (simplify_gen_subreg (TImode, operands[0],
-					     V8DImode, offset),
-			simplify_gen_subreg (TImode, operands[1],
-					     V8DImode, offset));
+        {
+	  std::pair<rtx, rtx> pair = {
+	    simplify_gen_subreg (TImode, operands[0], V8DImode, offset),
+	    simplify_gen_subreg (TImode, operands[1], V8DImode, offset)
+	  };
+	  if (register_operand (pair.first, TImode)
+	      && reg_overlap_mentioned_p (pair.first, pair.second))
+	    last_pair = pair;
+	  else
+	    emit_move_insn (pair.first, pair.second);
+        }
+      if (last_pair.first)
+	emit_move_insn (last_pair.first, last_pair.second);
       DONE;
     }
   else
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113550.c b/gcc/testsuite/gcc.target/aarch64/pr113550.c
new file mode 100644
index 00000000000..0ff3c7b5c00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113550.c
@@ -0,0 +1,48 @@
+/* { dg-options "-O" } */
+/* { dg-do run } */
+
+#pragma GCC push_options
+#pragma GCC target "+ls64"
+#pragma GCC aarch64 "arm_acle.h"
+#pragma GCC pop_options
+
+#define DEF_FUNCTION(NAME, ARGS)		\
+  __attribute__((noipa))			\
+  __arm_data512_t				\
+  NAME ARGS					\
+  {						\
+    return *ptr;				\
+  }
+
+DEF_FUNCTION (f0, (__arm_data512_t *ptr))
+DEF_FUNCTION (f1, (int x0, __arm_data512_t *ptr))
+DEF_FUNCTION (f2, (int x0, int x1, __arm_data512_t *ptr))
+DEF_FUNCTION (f3, (int x0, int x1, int x2, __arm_data512_t *ptr))
+DEF_FUNCTION (f4, (int x0, int x1, int x2, int x3, __arm_data512_t *ptr))
+DEF_FUNCTION (f5, (int x0, int x1, int x2, int x3, int x4,
+		   __arm_data512_t *ptr))
+DEF_FUNCTION (f6, (int x0, int x1, int x2, int x3, int x4, int x5,
+		   __arm_data512_t *ptr))
+DEF_FUNCTION (f7, (int x0, int x1, int x2, int x3, int x4, int x5, int x6,
+		   __arm_data512_t *ptr))
+
+int
+main (void)
+{
+  __arm_data512_t x = { 0, 10, 20, 30, 40, 50, 60, 70 };
+  __arm_data512_t res[8] =
+  {
+    f0 (&x),
+    f1 (0, &x),
+    f2 (0, 1, &x),
+    f3 (0, 1, 2, &x),
+    f4 (0, 1, 2, 3, &x),
+    f5 (0, 1, 2, 3, 4, &x),
+    f6 (0, 1, 2, 3, 4, 5, &x),
+    f7 (0, 1, 2, 3, 4, 5, 6, &x)
+  };
+  for (int i = 0; i < 8; ++i)
+    if (__builtin_memcmp (&x, &res[i], sizeof (x)) != 0)
+      __builtin_abort ();
+  return 0;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-01-25 12:08 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-24  7:01 [PATCH] aarch64: Fix movv8di for overlapping register and memory load [PR113550] Andrew Pinski
2024-01-25 12:08 ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).