public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/4] RISC-V: Enhance unaligned/overlapping codegen
@ 2024-05-08  5:17 Christoph Müllner
  2024-05-08  5:17 ` [PATCH 1/4] RISC-V: Add test cases for cpymem expansion Christoph Müllner
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Christoph Müllner @ 2024-05-08  5:17 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

I mentioned some improvements for unaligned and overlapping code
generation in the RISC-V call a few weeks ago.  I'm sending these
patches now, as the release is out.

Christoph Müllner (4):
  RISC-V: Add test cases for cpymem expansion
  RISC-V: Allow unaligned accesses in cpymemsi expansion
  RISC-V: tune: Add setting for overlapping mem ops to tuning struct
  RISC-V: Allow by-pieces to do overlapping accesses in
    block_move_straight

 gcc/config/riscv/riscv-string.cc              |  59 +++++---
 gcc/config/riscv/riscv.cc                     |  20 +++
 .../gcc.target/riscv/cpymem-32-ooo.c          | 137 ++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/cpymem-32.c    | 136 +++++++++++++++++
 .../gcc.target/riscv/cpymem-64-ooo.c          | 130 +++++++++++++++++
 gcc/testsuite/gcc.target/riscv/cpymem-64.c    | 135 +++++++++++++++++
 6 files changed, 593 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64.c

-- 
2.44.0



* [PATCH 1/4] RISC-V: Add test cases for cpymem expansion
  2024-05-08  5:17 [PATCH 0/4] RISC-V: Enhance unaligned/overlapping codegen Christoph Müllner
@ 2024-05-08  5:17 ` Christoph Müllner
  2024-05-09 21:15   ` Jeff Law
  2024-05-08  5:17 ` [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion Christoph Müllner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Müllner @ 2024-05-08  5:17 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

We have two mechanisms in the RISC-V backend that expand the
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule of thumb, by-pieces emits alternating load/store sequences
and the cpymem expansion in the backend emits a sequence of loads
followed by a sequence of stores.
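
To illustrate the difference, a 16-byte copy might expand roughly as
follows under each strategy (hypothetical RV64 output; register
allocation and scheduling are illustrative, not actual compiler output):

  # by-pieces: alternating load/store pairs
  ld   a5,0(a1)
  sd   a5,0(a0)
  ld   a5,8(a1)
  sd   a5,8(a0)

  # backend expansion: all loads first, then all stores
  ld   a5,0(a1)
  ld   a4,8(a1)
  sd   a5,0(a0)
  sd   a4,8(a0)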

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: New test.
	* gcc.target/riscv/cpymem-32.c: New test.
	* gcc.target/riscv/cpymem-64-ooo.c: New test.
	* gcc.target/riscv/cpymem-64.c: New test.
---
 .../gcc.target/riscv/cpymem-32-ooo.c          | 131 +++++++++++++++++
 gcc/testsuite/gcc.target/riscv/cpymem-32.c    | 138 ++++++++++++++++++
 .../gcc.target/riscv/cpymem-64-ooo.c          | 129 ++++++++++++++++
 gcc/testsuite/gcc.target/riscv/cpymem-64.c    | 138 ++++++++++++++++++
 4 files changed, 536 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64.c

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
new file mode 100644
index 00000000000..33fb9891d82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -0,0 +1,131 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=generic-ooo" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)					\
+void copy_##N (void *to, void *from)			\
+{							\
+  __builtin_memcpy (to, from, N);			\
+}
+
+#define COPY_ALIGNED_N(N)				\
+void copy_aligned_##N (void *to, void *from)		\
+{							\
+  to = __builtin_assume_aligned(to, sizeof(long));	\
+  from = __builtin_assume_aligned(from, sizeof(long));	\
+  __builtin_memcpy (to, from, N);			\
+}
+
+/*
+**copy_7:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**    ...
+**    lw\ta[0-9],0\(a[0-9]\)
+**    sw\ta[0-9],0\(a[0-9]\)
+**    ...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**    ...
+**    lw\ta[0-9],0\(a[0-9]\)
+**    sw\ta[0-9],0\(a[0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],14\([at][0-9]\)
+**    sb\t[at][0-9],14\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**    ...
+**    lw\t[at][0-9],20\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],20\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],26\([at][0-9]\)
+**    sb\t[at][0-9],26\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
new file mode 100644
index 00000000000..44ba14a1d51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -0,0 +1,138 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=rocket" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)					\
+void copy_##N (void *to, void *from)			\
+{							\
+  __builtin_memcpy (to, from, N);			\
+}
+
+#define COPY_ALIGNED_N(N)				\
+void copy_aligned_##N (void *to, void *from)		\
+{							\
+  to = __builtin_assume_aligned(to, sizeof(long));	\
+  from = __builtin_assume_aligned(from, sizeof(long));	\
+  __builtin_memcpy (to, from, N);			\
+}
+
+/*
+**copy_7:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],7\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],7\([at][0-9]\)
+**    ...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**    ...
+**    lw\ta[0-9],0\(a[0-9]\)
+**    sw\ta[0-9],0\(a[0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],14\([at][0-9]\)
+**    sb\t[at][0-9],14\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**    ...
+**    lw\t[at][0-9],20\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],20\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],26\([at][0-9]\)
+**    sb\t[at][0-9],26\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
new file mode 100644
index 00000000000..8e40e52fa91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -0,0 +1,129 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -mtune=generic-ooo" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)					\
+void copy_##N (void *to, void *from)			\
+{							\
+  __builtin_memcpy (to, from, N);			\
+}
+
+#define COPY_ALIGNED_N(N)				\
+void copy_aligned_##N (void *to, void *from)		\
+{							\
+  to = __builtin_assume_aligned(to, sizeof(long));	\
+  from = __builtin_assume_aligned(from, sizeof(long));	\
+  __builtin_memcpy (to, from, N);			\
+}
+
+/*
+**copy_7:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**    ...
+**    ld\ta[0-9],0\(a[0-9]\)
+**    sd\ta[0-9],0\(a[0-9]\)
+**    ...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**    ...
+**    ld\ta[0-9],0\(a[0-9]\)
+**    sd\ta[0-9],0\(a[0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**    ...
+**    ld\t[at][0-9],0\([at][0-9]\)
+**    sd\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**    ...
+**    ld\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**    ...
+**    ld\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],14\([at][0-9]\)
+**    sb\t[at][0-9],14\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**    ...
+**    ld\t[at][0-9],16\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],16\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],26\([at][0-9]\)
+**    sb\t[at][0-9],26\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64.c b/gcc/testsuite/gcc.target/riscv/cpymem-64.c
new file mode 100644
index 00000000000..bdfaca0d46a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64.c
@@ -0,0 +1,138 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -mtune=rocket" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)					\
+void copy_##N (void *to, void *from)			\
+{							\
+  __builtin_memcpy (to, from, N);			\
+}
+
+#define COPY_ALIGNED_N(N)				\
+void copy_aligned_##N (void *to, void *from)		\
+{							\
+  to = __builtin_assume_aligned(to, sizeof(long));	\
+  from = __builtin_assume_aligned(from, sizeof(long));	\
+  __builtin_memcpy (to, from, N);			\
+}
+
+/*
+**copy_7:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**    ...
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    sb\t[at][0-9],6\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],7\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],7\([at][0-9]\)
+**    ...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**    ...
+**    ld\ta[0-9],0\(a[0-9]\)
+**    sd\ta[0-9],0\(a[0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**    ...
+**    lbu\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**    ...
+**    ld\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
+**    sb\t[at][0-9],10\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**    ...
+**    ld\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],14\([at][0-9]\)
+**    sb\t[at][0-9],14\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**    ...
+**    (call|tail)\tmemcpy
+**    ...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**    ...
+**    ld\t[at][0-9],16\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],16\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],26\([at][0-9]\)
+**    sb\t[at][0-9],26\([at][0-9]\)
+**    ...
+*/
+COPY_ALIGNED_N(27)
-- 
2.44.0



* [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion
  2024-05-08  5:17 [PATCH 0/4] RISC-V: Enhance unaligned/overlapping codegen Christoph Müllner
  2024-05-08  5:17 ` [PATCH 1/4] RISC-V: Add test cases for cpymem expansion Christoph Müllner
@ 2024-05-08  5:17 ` Christoph Müllner
  2024-05-10 22:32   ` Jeff Law
  2024-05-08  5:17 ` [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct Christoph Müllner
  2024-05-08  5:17 ` [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight Christoph Müllner
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Müllner @ 2024-05-08  5:17 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion, and that is what this
patch does.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.
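
For example, copy_15 on RV64 with -mtune=generic-ooo previously ended in
a call to memcpy; with this patch it expands to something along these
lines (the middle steps and register choices are illustrative):

  ld   a5,0(a1)
  sd   a5,0(a0)
  lw   a5,8(a1)
  sw   a5,8(a0)
  lhu  a5,12(a1)
  sh   a5,12(a0)
  lbu  a5,14(a1)
  sb   a5,14(a0)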

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
	parameter align.
	(riscv_adjust_block_mem): Replace parameter length by align.
	(riscv_block_move_loop): Add parameter align.
	(riscv_expand_block_move_scalar): Set alignment properly if the
	target has fast unaligned access.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv-string.cc              | 53 +++++++++++--------
 .../gcc.target/riscv/cpymem-32-ooo.c          | 20 +++++--
 .../gcc.target/riscv/cpymem-64-ooo.c          | 14 ++++-
 3 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..8fc0877772f 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -610,11 +610,13 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
   return false;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST
+   with accesses that are ALIGN bytes aligned.
    Assume that the areas do not overlap.  */
 
 static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+			   unsigned HOST_WIDE_INT align)
 {
   unsigned HOST_WIDE_INT offset, delta;
   unsigned HOST_WIDE_INT bits;
@@ -622,8 +624,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   enum machine_mode mode;
   rtx *regs;
 
-  bits = MAX (BITS_PER_UNIT,
-	      MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
 
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
@@ -648,21 +649,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
     {
       src = adjust_address (src, BLKmode, offset);
       dest = adjust_address (dest, BLKmode, offset);
-      move_by_pieces (dest, src, length - offset,
-		      MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+      move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
     }
 }
 
 /* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
+   reference MEM.
 
    Create a new base register for use within the loop and point it to
    the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+   register and has an alignment of ALIGN.  Store them in *LOOP_REG
+   and *LOOP_MEM respectively.  */
 
 static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
+riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT align,
 			rtx *loop_reg, rtx *loop_mem)
 {
   *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
@@ -670,15 +670,17 @@ riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
   /* Although the new mem does not refer to a known location,
      it does keep up to LENGTH bytes of alignment.  */
   *loop_mem = change_address (mem, BLKmode, *loop_reg);
-  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+  set_mem_align (*loop_mem, align);
 }
 
 /* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
-   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
-   the memory regions do not overlap.  */
+   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  The alignment
+   of the access can be set by ALIGN.  Assume that the memory regions do not
+   overlap.  */
 
 static void
 riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+		       unsigned HOST_WIDE_INT align,
 		       unsigned HOST_WIDE_INT bytes_per_iter)
 {
   rtx label, src_reg, dest_reg, final_src, test;
@@ -688,8 +690,8 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   length -= leftover;
 
   /* Create registers and memory references for use within the loop.  */
-  riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
-  riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
+  riscv_adjust_block_mem (src, align, &src_reg, &src);
+  riscv_adjust_block_mem (dest, align, &dest_reg, &dest);
 
   /* Calculate the value that SRC_REG should have after the last iteration
      of the loop.  */
@@ -701,7 +703,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   emit_label (label);
 
   /* Emit the loop body.  */
-  riscv_block_move_straight (dest, src, bytes_per_iter);
+  riscv_block_move_straight (dest, src, bytes_per_iter, align);
 
   /* Move on to the next block.  */
   riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter));
@@ -713,7 +715,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
 
   /* Mop up any left-over bytes.  */
   if (leftover)
-    riscv_block_move_straight (dest, src, leftover);
+    riscv_block_move_straight (dest, src, leftover, align);
   else
     emit_insn(gen_nop ());
 }
@@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
   unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
   unsigned HOST_WIDE_INT factor, align;
 
-  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
-  factor = BITS_PER_WORD / align;
+  if (riscv_slow_unaligned_access_p)
+    {
+      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+      factor = BITS_PER_WORD / align;
+    }
+  else
+    {
+      align = hwi_length * BITS_PER_UNIT;
+      factor = 1;
+    }
 
   if (optimize_function_for_size_p (cfun)
       && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
@@ -739,7 +749,7 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
 
   if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
     {
-      riscv_block_move_straight (dest, src, INTVAL (length));
+      riscv_block_move_straight (dest, src, hwi_length, align);
       return true;
     }
   else if (optimize && align >= BITS_PER_WORD)
@@ -759,7 +769,8 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
 	    iter_words = i;
 	}
 
-      riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+      riscv_block_move_loop (dest, src, bytes, align,
+			     iter_words * UNITS_PER_WORD);
       return true;
     }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 33fb9891d82..946a773f77a 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -64,12 +64,12 @@ COPY_ALIGNED_N(8)
 /*
 **copy_11:
 **    ...
-**    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
+**    lw\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    sb\t[at][0-9],0\([at][0-9]\)
+**    sw\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
 **    sb\t[at][0-9],10\([at][0-9]\)
 **    ...
 */
@@ -91,7 +91,12 @@ COPY_ALIGNED_N(11)
 /*
 **copy_15:
 **    ...
-**    (call|tail)\tmemcpy
+**    lw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],14\([at][0-9]\)
+**    sb\t[at][0-9],14\([at][0-9]\)
 **    ...
 */
 COPY_N(15)
@@ -112,7 +117,12 @@ COPY_ALIGNED_N(15)
 /*
 **copy_27:
 **    ...
-**    (call|tail)\tmemcpy
+**    lw\t[at][0-9],20\([at][0-9]\)
+**    ...
+**    sw\t[at][0-9],20\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],26\([at][0-9]\)
+**    sb\t[at][0-9],26\([at][0-9]\)
 **    ...
 */
 COPY_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
index 8e40e52fa91..08a927b9483 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -89,7 +89,12 @@ COPY_ALIGNED_N(11)
 /*
 **copy_15:
 **    ...
-**    (call|tail)\tmemcpy
+**    ld\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],0\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],14\([at][0-9]\)
+**    sb\t[at][0-9],14\([at][0-9]\)
 **    ...
 */
 COPY_N(15)
@@ -110,7 +115,12 @@ COPY_ALIGNED_N(15)
 /*
 **copy_27:
 **    ...
-**    (call|tail)\tmemcpy
+**    ld\t[at][0-9],16\([at][0-9]\)
+**    ...
+**    sd\t[at][0-9],16\([at][0-9]\)
+**    ...
+**    lbu\t[at][0-9],26\([at][0-9]\)
+**    sb\t[at][0-9],26\([at][0-9]\)
 **    ...
 */
 COPY_N(27)
-- 
2.44.0



* [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct
  2024-05-08  5:17 [PATCH 0/4] RISC-V: Enhance unaligned/overlapping codegen Christoph Müllner
  2024-05-08  5:17 ` [PATCH 1/4] RISC-V: Add test cases for cpymem expansion Christoph Müllner
  2024-05-08  5:17 ` [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion Christoph Müllner
@ 2024-05-08  5:17 ` Christoph Müllner
  2024-05-10 22:40   ` Jeff Law
  2024-05-08  5:17 ` [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight Christoph Müllner
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Müllner @ 2024-05-08  5:17 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

This patch adds the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.

The new property is set to false in all tune structs except for
generic-ooo.

The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization: only
by-pieces cpymem expansions emit overlapping memory accesses.
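
For example, copy_7 with -mtune=generic-ooo now expands to two
overlapping 4-byte accesses instead of a word access plus a byte-wise
tail (registers illustrative):

  lw   a5,0(a1)   # bytes 0..3
  sw   a5,0(a0)
  lw   a4,3(a1)   # bytes 3..6, overlapping byte 3
  sw   a4,3(a0)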

gcc/ChangeLog:

	* config/riscv/riscv.cc (struct riscv_tune_param): New field
	overlap_op_by_pieces.
	(riscv_overlap_op_by_pieces): New function.
	(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
	riscv_overlap_op_by_pieces.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
	access.
	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv.cc                     | 20 +++++++++++
 .../gcc.target/riscv/cpymem-32-ooo.c          | 20 +++++------
 .../gcc.target/riscv/cpymem-64-ooo.c          | 33 +++++++------------
 3 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 44945d47fd6..793ec3155b9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -286,6 +286,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool overlap_op_by_pieces;
   bool use_divmod_expansion;
   unsigned int fusible_ops;
   const struct cpu_vector_cost *vec_costs;
@@ -425,6 +426,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,						/* memory_cost */
   8,						/* fmv_cost */
   true,						/* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
   NULL,						/* vector cost */
@@ -442,6 +444,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   3,						/* memory_cost */
   8,						/* fmv_cost */
   true,						/* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
   NULL,						/* vector cost */
@@ -459,6 +462,7 @@ static const struct riscv_tune_param sifive_p400_tune_info = {
   3,						/* memory_cost */
   4,						/* fmv_cost */
   true,						/* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   &generic_vector_cost,				/* vector cost */
@@ -476,6 +480,7 @@ static const struct riscv_tune_param sifive_p600_tune_info = {
   3,						/* memory_cost */
   4,						/* fmv_cost */
   true,						/* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI,  /* fusible_ops */
   &generic_vector_cost,				/* vector cost */
@@ -493,6 +498,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   5,            /* memory_cost */
   8,		/* fmv_cost */
   false,            /* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,	/* use_divmod_expansion */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
   NULL,						/* vector cost */
@@ -510,6 +516,7 @@ static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
   3,						/* memory_cost */
   3,						/* fmv_cost */
   true,						/* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,          /* fusible_ops */
   NULL,						/* vector cost */
@@ -527,6 +534,7 @@ static const struct riscv_tune_param generic_ooo_tune_info = {
   4,						/* memory_cost */
   4,						/* fmv_cost */
   false,					/* slow_unaligned_access */
+  true,						/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
   &generic_vector_cost,				/* vector cost */
@@ -544,6 +552,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   2,						/* memory_cost */
   8,						/* fmv_cost */
   false,					/* slow_unaligned_access */
+  false,					/* overlap_op_by_pieces */
   false,					/* use_divmod_expansion */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
   NULL,						/* vector cost */
@@ -9923,6 +9932,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
   return riscv_slow_unaligned_access_p;
 }
 
+/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
+
+static bool
+riscv_overlap_op_by_pieces (void)
+{
+  return tune_param->overlap_op_by_pieces;
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
 
 static bool
@@ -11340,6 +11357,9 @@ riscv_get_raw_result_mode (int regno)
 #undef TARGET_SLOW_UNALIGNED_ACCESS
 #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
 
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
+
 #undef TARGET_SECONDARY_MEMORY_NEEDED
 #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
 
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 946a773f77a..947d58c30fa 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from)		\
 **    ...
 **    lw\t[at][0-9],0\([at][0-9]\)
 **    sw\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],6\([at][0-9]\)
-**    sb\t[at][0-9],6\([at][0-9]\)
+**    lw\t[at][0-9],3\([at][0-9]\)
+**    sw\t[at][0-9],3\([at][0-9]\)
 **    ...
 */
 COPY_N(7)
@@ -36,9 +35,8 @@ COPY_N(7)
 **    ...
 **    lw\t[at][0-9],0\([at][0-9]\)
 **    sw\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],6\([at][0-9]\)
-**    sb\t[at][0-9],6\([at][0-9]\)
+**    lw\t[at][0-9],3\([at][0-9]\)
+**    sw\t[at][0-9],3\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(7)
@@ -66,11 +64,10 @@ COPY_ALIGNED_N(8)
 **    ...
 **    ...
 **    lw\t[at][0-9],0\([at][0-9]\)
-**    ...
 **    sw\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
-**    sb\t[at][0-9],10\([at][0-9]\)
+**    lw\t[at][0-9],7\([at][0-9]\)
+**    sw\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
 COPY_N(11)
@@ -79,11 +76,10 @@ COPY_N(11)
 **copy_aligned_11:
 **    ...
 **    lw\t[at][0-9],0\([at][0-9]\)
-**    ...
 **    sw\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
-**    sb\t[at][0-9],10\([at][0-9]\)
+**    lw\t[at][0-9],7\([at][0-9]\)
+**    sw\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(11)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
index 08a927b9483..108748690cd 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from)		\
 **    ...
 **    lw\t[at][0-9],0\([at][0-9]\)
 **    sw\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],6\([at][0-9]\)
-**    sb\t[at][0-9],6\([at][0-9]\)
+**    lw\t[at][0-9],3\([at][0-9]\)
+**    sw\t[at][0-9],3\([at][0-9]\)
 **    ...
 */
 COPY_N(7)
@@ -36,9 +35,8 @@ COPY_N(7)
 **    ...
 **    lw\t[at][0-9],0\([at][0-9]\)
 **    sw\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],6\([at][0-9]\)
-**    sb\t[at][0-9],6\([at][0-9]\)
+**    lw\t[at][0-9],3\([at][0-9]\)
+**    sw\t[at][0-9],3\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(7)
@@ -66,9 +64,8 @@ COPY_ALIGNED_N(8)
 **    ...
 **    ld\t[at][0-9],0\([at][0-9]\)
 **    sd\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
-**    sb\t[at][0-9],10\([at][0-9]\)
+**    lw\t[at][0-9],7\([at][0-9]\)
+**    sw\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
 COPY_N(11)
@@ -77,11 +74,9 @@ COPY_N(11)
 **copy_aligned_11:
 **    ...
 **    ld\t[at][0-9],0\([at][0-9]\)
-**    ...
 **    sd\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
-**    sb\t[at][0-9],10\([at][0-9]\)
+**    lw\t[at][0-9],7\([at][0-9]\)
+**    sw\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(11)
@@ -90,11 +85,9 @@ COPY_ALIGNED_N(11)
 **copy_15:
 **    ...
 **    ld\t[at][0-9],0\([at][0-9]\)
-**    ...
 **    sd\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],14\([at][0-9]\)
-**    sb\t[at][0-9],14\([at][0-9]\)
+**    ld\t[at][0-9],7\([at][0-9]\)
+**    sd\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
 COPY_N(15)
@@ -103,11 +96,9 @@ COPY_N(15)
 **copy_aligned_15:
 **    ...
 **    ld\t[at][0-9],0\([at][0-9]\)
-**    ...
 **    sd\t[at][0-9],0\([at][0-9]\)
-**    ...
-**    lbu\t[at][0-9],14\([at][0-9]\)
-**    sb\t[at][0-9],14\([at][0-9]\)
+**    ld\t[at][0-9],7\([at][0-9]\)
+**    sd\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(15)
-- 
2.44.0



* [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight
  2024-05-08  5:17 [PATCH 0/4] RISC-V: Enhance unaligned/overlapping codegen Christoph Müllner
                   ` (2 preceding siblings ...)
  2024-05-08  5:17 ` [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct Christoph Müllner
@ 2024-05-08  5:17 ` Christoph Müllner
  2024-05-10 22:42   ` Jeff Law
  3 siblings, 1 reply; 10+ messages in thread
From: Christoph Müllner @ 2024-05-08  5:17 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8 bytes for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).

Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).

This patch changes the implementation of riscv_block_move_straight()
such that it utilizes the by-pieces framework if the remaining data
is less than 2*XLEN bytes, which is sufficient to enable overlapping
memory accesses (if the requirements for them are met).
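
For example, for a 27-byte copy on RV64 the backend now emits 8-byte
moves only for offsets 0 and 8 and hands the remaining 11 bytes to
move_by_pieces(), which can cover them with an 8-byte access at offset
16 and an overlapping 4-byte access at offset 23 (sketch, registers
illustrative):

  ld   a5,0(a1)    # backend: full-width moves
  ld   a4,8(a1)
  sd   a5,0(a0)
  sd   a4,8(a0)
  ld   a5,16(a1)   # by-pieces: one more 8-byte move ...
  sd   a5,16(a0)
  lw   a4,23(a1)   # ... plus an overlapping 4-byte tail
  sw   a4,23(a0)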

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight):
	Hand over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
	access.
	* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
	by-pieces.
	* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
	access.
	* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
	by-pieces.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv-string.cc               |  6 +++---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 16 ++++++++--------
 gcc/testsuite/gcc.target/riscv/cpymem-32.c     | 10 ++++------
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c |  8 ++++----
 gcc/testsuite/gcc.target/riscv/cpymem-64.c     |  9 +++------
 5 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 8fc0877772f..38cf60eb9cf 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -630,18 +630,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
      the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     {
       regs[i] = gen_reg_rtx (mode);
       riscv_emit_move (regs[i], adjust_address (src, mode, offset));
     }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 947d58c30fa..2a48567353a 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -91,8 +91,8 @@ COPY_ALIGNED_N(11)
 **    ...
 **    sw\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],14\([at][0-9]\)
-**    sb\t[at][0-9],14\([at][0-9]\)
+**    lw\t[at][0-9],11\([at][0-9]\)
+**    sw\t[at][0-9],11\([at][0-9]\)
 **    ...
 */
 COPY_N(15)
@@ -104,8 +104,8 @@ COPY_N(15)
 **    ...
 **    sw\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],14\([at][0-9]\)
-**    sb\t[at][0-9],14\([at][0-9]\)
+**    lw\t[at][0-9],11\([at][0-9]\)
+**    sw\t[at][0-9],11\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(15)
@@ -117,8 +117,8 @@ COPY_ALIGNED_N(15)
 **    ...
 **    sw\t[at][0-9],20\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],26\([at][0-9]\)
-**    sb\t[at][0-9],26\([at][0-9]\)
+**    lw\t[at][0-9],23\([at][0-9]\)
+**    sw\t[at][0-9],23\([at][0-9]\)
 **    ...
 */
 COPY_N(27)
@@ -130,8 +130,8 @@ COPY_N(27)
 **    ...
 **    sw\t[at][0-9],20\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],26\([at][0-9]\)
-**    sb\t[at][0-9],26\([at][0-9]\)
+**    lw\t[at][0-9],23\([at][0-9]\)
+**    sw\t[at][0-9],23\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
index 44ba14a1d51..2030a39ca97 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -24,10 +24,10 @@ void copy_aligned_##N (void *to, void *from)		\
 **    ...
 **    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],6\([at][0-9]\)
-**    ...
 **    sb\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
+**    ...
 **    sb\t[at][0-9],6\([at][0-9]\)
 **    ...
 */
@@ -50,10 +50,9 @@ COPY_ALIGNED_N(7)
 **    ...
 **    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],7\([at][0-9]\)
-**    ...
 **    sb\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],7\([at][0-9]\)
 **    sb\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
@@ -73,10 +72,9 @@ COPY_ALIGNED_N(8)
 **    ...
 **    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
-**    ...
 **    sb\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
 **    sb\t[at][0-9],10\([at][0-9]\)
 **    ...
 */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
index 108748690cd..147324093cb 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -110,8 +110,8 @@ COPY_ALIGNED_N(15)
 **    ...
 **    sd\t[at][0-9],16\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],26\([at][0-9]\)
-**    sb\t[at][0-9],26\([at][0-9]\)
+**    lw\t[at][0-9],23\([at][0-9]\)
+**    sw\t[at][0-9],23\([at][0-9]\)
 **    ...
 */
 COPY_N(27)
@@ -123,8 +123,8 @@ COPY_N(27)
 **    ...
 **    sd\t[at][0-9],16\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],26\([at][0-9]\)
-**    sb\t[at][0-9],26\([at][0-9]\)
+**    lw\t[at][0-9],23\([at][0-9]\)
+**    sw\t[at][0-9],23\([at][0-9]\)
 **    ...
 */
 COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64.c b/gcc/testsuite/gcc.target/riscv/cpymem-64.c
index bdfaca0d46a..37b8ef0e020 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64.c
@@ -24,10 +24,9 @@ void copy_aligned_##N (void *to, void *from)		\
 **    ...
 **    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],6\([at][0-9]\)
-**    ...
 **    sb\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],6\([at][0-9]\)
 **    sb\t[at][0-9],6\([at][0-9]\)
 **    ...
 */
@@ -50,10 +49,9 @@ COPY_ALIGNED_N(7)
 **    ...
 **    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],7\([at][0-9]\)
-**    ...
 **    sb\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],7\([at][0-9]\)
 **    sb\t[at][0-9],7\([at][0-9]\)
 **    ...
 */
@@ -73,10 +71,9 @@ COPY_ALIGNED_N(8)
 **    ...
 **    lbu\t[at][0-9],0\([at][0-9]\)
 **    ...
-**    lbu\t[at][0-9],10\([at][0-9]\)
-**    ...
 **    sb\t[at][0-9],0\([at][0-9]\)
 **    ...
+**    lbu\t[at][0-9],10\([at][0-9]\)
 **    sb\t[at][0-9],10\([at][0-9]\)
 **    ...
 */
-- 
2.44.0



* Re: [PATCH 1/4] RISC-V: Add test cases for cpymem expansion
  2024-05-08  5:17 ` [PATCH 1/4] RISC-V: Add test cases for cpymem expansion Christoph Müllner
@ 2024-05-09 21:15   ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2024-05-09 21:15 UTC (permalink / raw)
  To: Christoph Müllner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 5/7/24 11:17 PM, Christoph Müllner wrote:
> We have two mechanisms in the RISC-V backend that expand the
> cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
> in riscv-string.cc. The by-pieces framework has higher priority
> and emits a sequence of up to 15 instructions
> (see use_by_pieces_infrastructure_p() for more details).
> 
> As a rule of thumb, by-pieces emits alternating load/store sequences
> and the cpymem expansion in the backend emits a sequence of loads
> followed by a sequence of stores.
> 
> Let's add some test cases to document the current behaviour
> and to have tests to identify regressions.
> 
> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/cpymem-32-ooo.c: New test.
> 	* gcc.target/riscv/cpymem-32.c: New test.
> 	* gcc.target/riscv/cpymem-64-ooo.c: New test.
> 	* gcc.target/riscv/cpymem-64.c: New test.
It looks like those function body tests are fairly generic.  So OK.

Jeff



* Re: [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion
  2024-05-08  5:17 ` [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion Christoph Müllner
@ 2024-05-10 22:32   ` Jeff Law
  2024-05-15 11:02     ` Christoph Müllner
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2024-05-10 22:32 UTC (permalink / raw)
  To: Christoph Müllner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 5/7/24 11:17 PM, Christoph Müllner wrote:
> The RISC-V cpymemsi expansion is called whenever the by-pieces
> infrastructure will not take care of the builtin expansion.
> The code emitted by the by-pieces infrastructure may include
> unaligned accesses if riscv_slow_unaligned_access_p is false.
> 
> The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
> The current implementation of this function does not check
> riscv_slow_unaligned_access_p and never emits unaligned accesses.
> 
> Since by-pieces emits unaligned accesses, it is reasonable to implement
> the same behaviour in the cpymemsi expansion, and that is what this
> patch does.
> 
> The patch checks riscv_slow_unaligned_access_p at the entry and sets
> the allowed alignment accordingly. This alignment is then propagated
> down to the routines that emit the actual instructions.
> 
> The changes introduced by this patch can be seen in the adjustments
> of the cpymem tests.
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
> 	parameter align.
> 	(riscv_adjust_block_mem): Replace parameter length by align.
> 	(riscv_block_move_loop): Add parameter align.
> 	(riscv_expand_block_move_scalar): Set alignment properly if the
> 	target has fast unaligned access.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
> 	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
Mostly ok.  One concern noted below.


> 
> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> ---
>   gcc/config/riscv/riscv-string.cc              | 53 +++++++++++--------
>   .../gcc.target/riscv/cpymem-32-ooo.c          | 20 +++++--
>   .../gcc.target/riscv/cpymem-64-ooo.c          | 14 ++++-
>   3 files changed, 59 insertions(+), 28 deletions(-)
> 
> @@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
>     unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
>     unsigned HOST_WIDE_INT factor, align;
>   
> -  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> -  factor = BITS_PER_WORD / align;
> +  if (riscv_slow_unaligned_access_p)
> +    {
> +      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> +      factor = BITS_PER_WORD / align;
> +    }
> +  else
> +    {
> +      align = hwi_length * BITS_PER_UNIT;
> +      factor = 1;
> +    }
Not sure why you're using hwi_length here.  That's a property of the 
host, not the target.  ISTM you wanted BITS_PER_WORD here to encourage 
word-sized moves irrespective of alignment.

OK with that change after a fresh round of testing.

jeff


* Re: [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct
  2024-05-08  5:17 ` [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct Christoph Müllner
@ 2024-05-10 22:40   ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2024-05-10 22:40 UTC (permalink / raw)
  To: Christoph Müllner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 5/7/24 11:17 PM, Christoph Müllner wrote:
> This patch adds the field overlap_op_by_pieces to the struct
> riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
> hook. This hook is used by the by-pieces infrastructure to decide
> if overlapping memory accesses should be emitted.
> 
> The new property is set to false in all tune structs except for
> generic-ooo.
> 
> The changes in the expansion can be seen in the adjustments of the
> cpymem test cases. These tests also reveal a limitation in the
> RISC-V cpymem expansion that prevents this optimization: only
> by-pieces cpymem expansions emit overlapping memory accesses.
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.cc (struct riscv_tune_param): New field
> 	overlap_op_by_pieces.
> 	(riscv_overlap_op_by_pieces): New function.
> 	(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
> 	riscv_overlap_op_by_pieces.
I think these are redundant with the changes I installed earlier this 
week :-)

> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
> 	access.
> 	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
OK once prereqs are in.

jeff



* Re: [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight
  2024-05-08  5:17 ` [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight Christoph Müllner
@ 2024-05-10 22:42   ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2024-05-10 22:42 UTC (permalink / raw)
  To: Christoph Müllner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 5/7/24 11:17 PM, Christoph Müllner wrote:
> The current implementation of riscv_block_move_straight() emits a couple
> of loads/stores with maximum width (e.g. 8 bytes for RV64).
> The remainder is handed over to move_by_pieces().
> The by-pieces framework utilizes target hooks to decide about the emitted
> instructions (e.g. unaligned accesses or overlapping accesses).
> 
> Since the current implementation will always request less than XLEN bytes
> to be handled by the by-pieces infrastructure, it is impossible that
> overlapping memory accesses can ever be emitted (the by-pieces code does
> not know of any previous instructions that were emitted by the backend).
> 
> This patch changes the implementation of riscv_block_move_straight()
> such that it utilizes the by-pieces framework if the remaining data
> is less than 2*XLEN bytes, which is sufficient to enable overlapping
> memory accesses (if the requirements for them are met).
> 
> The changes in the expansion can be seen in the adjustments of the
> cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
> caused by the different instruction ordering of the code emitted
> by the by-pieces infrastructure, which emits alternating load/store
> sequences.
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv-string.cc (riscv_block_move_straight):
> 	Hand over up to 2xXLEN bytes to move_by_pieces().
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
> 	access.
> 	* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
> 	by-pieces.
> 	* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
> 	access.
> 	* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
> 	by-pieces.
OK once any prereqs are in.

jeff



* Re: [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion
  2024-05-10 22:32   ` Jeff Law
@ 2024-05-15 11:02     ` Christoph Müllner
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Müllner @ 2024-05-15 11:02 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Vineet Gupta

On Sat, May 11, 2024 at 12:32 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/7/24 11:17 PM, Christoph Müllner wrote:
> > The RISC-V cpymemsi expansion is called whenever the by-pieces
> > infrastructure will not take care of the builtin expansion.
> > The code emitted by the by-pieces infrastructure may include
> > unaligned accesses if riscv_slow_unaligned_access_p is false.
> >
> > The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
> > The current implementation of this function does not check
> > riscv_slow_unaligned_access_p and never emits unaligned accesses.
> >
> > Since by-pieces emits unaligned accesses, it is reasonable to implement
> > the same behaviour in the cpymemsi expansion, and that is what this
> > patch does.
> >
> > The patch checks riscv_slow_unaligned_access_p at the entry and sets
> > the allowed alignment accordingly. This alignment is then propagated
> > down to the routines that emit the actual instructions.
> >
> > The changes introduced by this patch can be seen in the adjustments
> > of the cpymem tests.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-string.cc (riscv_block_move_straight): Add
> >       parameter align.
> >       (riscv_adjust_block_mem): Replace parameter length by align.
> >       (riscv_block_move_loop): Add parameter align.
> >       (riscv_expand_block_move_scalar): Set alignment properly if the
> >       target has fast unaligned access.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
> >       * gcc.target/riscv/cpymem-64-ooo.c: Likewise.
> Mostly ok.  One concern noted below.
>
>
> >
> > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> > ---
> >   gcc/config/riscv/riscv-string.cc              | 53 +++++++++++--------
> >   .../gcc.target/riscv/cpymem-32-ooo.c          | 20 +++++--
> >   .../gcc.target/riscv/cpymem-64-ooo.c          | 14 ++++-
> >   3 files changed, 59 insertions(+), 28 deletions(-)
> >
> > @@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
> >     unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
> >     unsigned HOST_WIDE_INT factor, align;
> >
> > -  align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> > -  factor = BITS_PER_WORD / align;
> > +  if (riscv_slow_unaligned_access_p)
> > +    {
> > +      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
> > +      factor = BITS_PER_WORD / align;
> > +    }
> > +  else
> > +    {
> > +      align = hwi_length * BITS_PER_UNIT;
> > +      factor = 1;
> > +    }
> Not sure why you're using hwi_length here.  That's a property of the
> host, not the target.  ISTM you wanted BITS_PER_WORD here to encourage
> word-sized moves irrespective of alignment.

We set 'align' here to pretend proper alignment to force unaligned
accesses (if needed).
'hwi_length' is defined above as:
  unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
So, it is not a host property, but the number of bytes to copy.

Setting 'align' to BITS_PER_WORD does the same but is indeed the better choice.
I'll also add the comment "Pretend alignment" to make readers aware
that we ignore the actual alignment.
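
The adjusted hunk presumably ends up along these lines (reconstructed
from this discussion, not copied from the committed patch):

  else
    {
      /* Pretend alignment.  */
      align = BITS_PER_WORD;
      factor = 1;
    }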

> OK with that change after a fresh round of testing.

Pushed after adjusting as stated above and retesting:
* rv32 & rv64: GCC regression tests
* rv64: SPEC CPU 2017 intrate

