public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] RISC-V: vectorised memory operations
@ 2023-12-11  9:47 Sergei Lewis
  2023-12-11  9:47 ` [PATCH 1/3] RISC-V: movmem for RISCV with V extension Sergei Lewis
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Sergei Lewis @ 2023-12-11  9:47 UTC (permalink / raw)
  To: gcc-patches

This patchset permits generation of inlined vectorised code for movmem, 
setmem and cmpmem, if and only if the operation size is 
at least one and at most eight vector registers' worth of data.

Further vectorisation rapidly becomes debatable due to code size concerns;
however, for these simple cases we do have an unambiguous performance win 
without sacrificing too much code size compared to a libc call.

Signed-off-by: Sergei Lewis <slewis@rivosinc.com>

---

Sergei Lewis (3):
  RISC-V: movmem for RISCV with V extension
  RISC-V: setmem for RISCV with V extension
  RISC-V: cmpmem for RISCV with V extension

 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-string.cc              | 193 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  51 +++++
 .../gcc.target/riscv/rvv/base/cmpmem-1.c      |  85 ++++++++
 .../gcc.target/riscv/rvv/base/cmpmem-2.c      |  69 +++++++
 .../gcc.target/riscv/rvv/base/movmem-1.c      |  59 ++++++
 .../gcc.target/riscv/rvv/base/setmem-1.c      |  99 +++++++++
 7 files changed, 558 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] RISC-V: movmem for RISCV with V extension
  2023-12-11  9:47 [PATCH 0/3] RISC-V: vectorised memory operations Sergei Lewis
@ 2023-12-11  9:47 ` Sergei Lewis
  2023-12-11 10:08   ` Robin Dapp
  2023-12-11 10:21   ` Robin Dapp
  2023-12-11  9:47 ` [PATCH 2/3] RISC-V: setmem " Sergei Lewis
  2023-12-11  9:47 ` [PATCH 3/3] RISC-V: cmpmem " Sergei Lewis
  2 siblings, 2 replies; 9+ messages in thread
From: Sergei Lewis @ 2023-12-11  9:47 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

    * config/riscv/riscv.md (movmem<mode>): Use
    riscv_vector::expand_block_move if and only if we know the entire
    operation can be performed using one vector load followed by one
    vector store.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/movmem-1.c: New test.
---
 gcc/config/riscv/riscv.md                     | 22 +++++++
 .../gcc.target/riscv/rvv/base/movmem-1.c      | 59 +++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index eed997116b0..88fde290a8a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2359,6 +2359,28 @@
     FAIL;
 })
 
+;; Inlining a general memmove is a pessimisation: we cannot avoid having
+;; to decide at runtime which direction to copy, which is costly in
+;; instruction count.  However, when the entire move fits in one vector
+;; operation we can do all reads before doing any writes, so the
+;; direction does not matter; generate inline vector code in that case.
+;; NB: prefer the scalar path for tiny memmoves.
+(define_expand "movmem<mode>"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+      (match_operand:BLK 1 "general_operand"))
+	      (use (match_operand:P 2 ""))
+	      (use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if (CONST_INT_P (operands[2])
+	&& IN_RANGE (INTVAL (operands[2]), TARGET_MIN_VLEN / 8,
+		     TARGET_MIN_VLEN)
+	&& riscv_vector::expand_block_move (operands[0], operands[1],
+	     operands[2]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
new file mode 100644
index 00000000000..b930241ae5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <string.h>
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
+/* Tiny memmoves should not be vectorised.
+** f1:
+**  li\s+a2,15
+**  tail\s+memmove
+*/
+char * f1 (char *a, char const *b)
+{
+  return memmove (a, b, 15);
+}
+
+/* Vectorise and inline at the minimum vector register width with LMUL=1.
+** f2:
+**  (
+**  vsetivli\s+zero,16,e8,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma
+**  )
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char * f2 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES);
+}
+
+/* Vectorise and inline up to LMUL=8.
+** f3:
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char * f3 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES*8);
+}
+
+/* Don't vectorise if the move is too large for one operation.
+** f4:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char * f4 (char *a, char const *b)
+{
+  return memmove (a, b, MIN_VECTOR_BYTES*8+1);
+}
+
-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/3] RISC-V: setmem for RISCV with V extension
  2023-12-11  9:47 [PATCH 0/3] RISC-V: vectorised memory operations Sergei Lewis
  2023-12-11  9:47 ` [PATCH 1/3] RISC-V: movmem for RISCV with V extension Sergei Lewis
@ 2023-12-11  9:47 ` Sergei Lewis
  2023-12-11 13:38   ` Kito Cheng
  2023-12-11  9:47 ` [PATCH 3/3] RISC-V: cmpmem " Sergei Lewis
  2 siblings, 1 reply; 9+ messages in thread
From: Sergei Lewis @ 2023-12-11  9:47 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New
    function declaration.

    * config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New
    function; this generates an inline vectorised memory set, if and only
    if we know the entire operation can be performed in a single vector
    store.

    * config/riscv/riscv.md (setmemsi): Try
    riscv_vector::expand_vec_setmem for constant lengths.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/setmem-1.c: New tests.
---
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv-string.cc              | 82 +++++++++++++++
 gcc/config/riscv/riscv.md                     | 14 +++
 .../gcc.target/riscv/rvv/base/setmem-1.c      | 99 +++++++++++++++++++
 4 files changed, 196 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 20bbb5b859c..950cb65c910 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -560,6 +560,7 @@ void expand_popcount (rtx *);
 void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, poly_int64);
+bool expand_vec_setmem (rtx, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 11c1f74d0b3..0abbd5f8b28 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1247,4 +1247,86 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
   return true;
 }
 
+
+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+			 HOST_WIDE_INT &lmul_out)
+{
+  /* If the operation is tiny, the default expansion is likely better;
+     fractional LMUL may be worth considering in the future as well.  */
+  if (length_in < (TARGET_MIN_VLEN / 8))
+    return false;
+
+  /* Find the smallest LMUL large enough for the entire operation.  */
+  HOST_WIDE_INT lmul = 1;
+  while (lmul <= 8 && length_in > (lmul * TARGET_MIN_VLEN) / 8)
+    {
+      lmul <<= 1;
+    }
+
+  if (lmul > 8)
+    return false;
+
+  lmul_out = lmul;
+  return true;
+}
+
+/* Used by setmemsi in riscv.md.  */
+bool
+expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in,
+	  rtx alignment_in)
+{
+  /* Bail out if the V extension is not available.  */
+  if (!TARGET_VECTOR)
+    return false;
+  /* If we cannot reason about the length, let libc handle the operation.  */
+  if (!CONST_INT_P (length_in))
+    return false;
+
+  HOST_WIDE_INT length = INTVAL (length_in);
+  HOST_WIDE_INT lmul;
+
+  /* Select an LMUL such that the data just fits into one vector
+     operation; bail out if we cannot.  */
+  if (!select_appropriate_lmul (length, lmul))
+    return false;
+
+  machine_mode vmode = riscv_vector::get_vector_mode (QImode,
+	  BYTES_PER_RISCV_VECTOR * lmul).require ();
+  rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
+  rtx dst = change_address (dst_in, vmode, dst_addr);
+
+  rtx fill_value = gen_reg_rtx (vmode);
+  rtx broadcast_ops[] = {fill_value, fill_value_in};
+
+  /* If the length is exactly vlmax for the selected mode, do that.
+     Otherwise, use a predicated store.  */
+  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+    {
+      emit_vlmax_insn (code_for_pred_broadcast (vmode),
+	      UNARY_OP, broadcast_ops);
+      emit_move_insn (dst, fill_value);
+    }
+  else
+    {
+      if (!satisfies_constraint_K (length_in))
+	length_in = force_reg (Pmode, length_in);
+      emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP,
+	      broadcast_ops, length_in);
+      machine_mode mask_mode = riscv_vector::get_vector_mode
+	      (BImode, GET_MODE_NUNITS (vmode)).require ();
+      rtx mask = CONSTM1_RTX (mask_mode);
+      emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in,
+	      get_avl_type_rtx (riscv_vector::NONVLMAX)));
+    }
+
+  return true;
+}
+
 }
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 88fde290a8a..29d3b1aa342 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2381,6 +2381,20 @@
     FAIL;
 })
 
+(define_expand "setmemsi"
+  [(set (match_operand:BLK 0 "memory_operand")     ;; Dest
+	      (match_operand:QI  2 "nonmemory_operand")) ;; Value
+   (use (match_operand:SI  1 "const_int_operand")) ;; Length
+   (match_operand:SI       3 "const_int_operand")] ;; Align
+  "TARGET_VECTOR"
+{
+  if (riscv_vector::expand_vec_setmem (operands[0], operands[1], operands[2],
+      operands[3]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c
new file mode 100644
index 00000000000..d1a5ff002a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c
@@ -0,0 +1,99 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <string.h>
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
+/* Tiny memsets should use scalar ops.
+** f1:
+**  sb\s+a1,0\(a0\)
+**  ret
+*/
+void * f1 (void *a, int const b)
+{
+  return memset (a, b, 1);
+}
+
+/* Tiny memsets should use scalar ops.
+** f2:
+**  sb\s+a1,0\(a0\)
+**  sb\s+a1,1\(a0\)
+**  ret
+*/
+void * f2 (void *a, int const b)
+{
+  return memset (a, b, 2);
+}
+
+/* Tiny memsets should use scalar ops.
+** f3:
+**  sb\s+a1,0\(a0\)
+**  sb\s+a1,1\(a0\)
+**  sb\s+a1,2\(a0\)
+**  ret
+*/
+void * f3 (void *a, int const b)
+{
+  return memset (a, b, 3);
+}
+
+/* Vectorise and inline at the minimum vector register width with LMUL=1.
+** f4:
+**  (
+**  vsetivli\s+zero,\d+,e8,m1,ta,ma
+**  |
+**  li\s+a\d+,\d+
+**  vsetvli\s+zero,a\d+,e8,m1,ta,ma
+**  )
+**  vmv\.v\.x\s+v\d+,a1
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+void * f4 (void *a, int const b)
+{
+  return memset (a, b, MIN_VECTOR_BYTES);
+}
+
+/* Vectorised code should use the smallest LMUL known to fit the length.
+** f5:
+**  (
+**  vsetivli\s+zero,\d+,e8,m2,ta,ma
+**  |
+**  li\s+a\d+,\d+
+**  vsetvli\s+zero,a\d+,e8,m2,ta,ma
+**  )
+**  vmv\.v\.x\s+v\d+,a1
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+void * f5 (void *a, int const b)
+{
+  return memset (a, b, MIN_VECTOR_BYTES+1);
+}
+
+/* Vectorise and inline up to LMUL=8.
+** f6:
+**  li\s+a\d+,\d+
+**  vsetvli\s+zero,a\d+,e8,m8,ta,ma
+**  vmv\.v\.x\s+v\d+,a1
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+void * f6 (void *a, int const b)
+{
+  return memset (a, b, MIN_VECTOR_BYTES*8);
+}
+
+/* Don't vectorise if the set is too large for one operation.
+** f7:
+**  li\s+a2,\d+
+**  tail\s+memset
+*/
+void * f7 (void *a, int const b)
+{
+  return memset (a, b, MIN_VECTOR_BYTES*8+1);
+}
+
-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/3] RISC-V: cmpmem for RISCV with V extension
  2023-12-11  9:47 [PATCH 0/3] RISC-V: vectorised memory operations Sergei Lewis
  2023-12-11  9:47 ` [PATCH 1/3] RISC-V: movmem for RISCV with V extension Sergei Lewis
  2023-12-11  9:47 ` [PATCH 2/3] RISC-V: setmem " Sergei Lewis
@ 2023-12-11  9:47 ` Sergei Lewis
  2 siblings, 0 replies; 9+ messages in thread
From: Sergei Lewis @ 2023-12-11  9:47 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New
    function declaration.

    * config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): New
    function; this generates an inline vectorised memory compare, if and
    only if we know the entire operation can be performed in a single
    vector load per input.

    * config/riscv/riscv.md (cmpmemsi): Try
    riscv_vector::expand_vec_cmpmem for constant lengths.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests.
    * gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests.
---
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv-string.cc              | 111 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  15 +++
 .../gcc.target/riscv/rvv/base/cmpmem-1.c      |  85 ++++++++++++++
 .../gcc.target/riscv/rvv/base/cmpmem-2.c      |  69 +++++++++++
 5 files changed, 281 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 950cb65c910..72378438552 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -561,6 +561,7 @@ void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, poly_int64);
 bool expand_vec_setmem (rtx, rtx, rtx, rtx);
+bool expand_vec_cmpmem (rtx, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 0abbd5f8b28..6128565310b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1329,4 +1329,115 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in,
   return true;
 }
 
+
+/* Used by cmpmemsi in riscv.md.  */
+
+bool
+expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in)
+{
+  /* Bail out if the V extension is not available.  */
+  if (!TARGET_VECTOR)
+    return false;
+  /* If we cannot reason about the length, let libc handle the operation.  */
+  if (!CONST_INT_P (length_in))
+    return false;
+
+  HOST_WIDE_INT length = INTVAL (length_in);
+  HOST_WIDE_INT lmul;
+
+  /* Select an LMUL such that the data just fits into one vector
+     operation; bail out if we cannot.  */
+  if (!select_appropriate_lmul (length, lmul))
+    return false;
+
+  /* Strategy:
+     load the entire blocks at a and b into vector registers;
+     generate a mask of the bytes that differ;
+     find the offset of the first set bit in the mask, using offset 0
+     if no bit is set;
+     the result is then
+     ((const char *) a)[offset] - ((const char *) b)[offset].  */
+
+  machine_mode vmode = riscv_vector::get_vector_mode (QImode,
+	    BYTES_PER_RISCV_VECTOR * lmul).require ();
+  rtx blk_a_addr = copy_addr_to_reg (XEXP (blk_a_in, 0));
+  rtx blk_a = change_address (blk_a_in, vmode, blk_a_addr);
+  rtx blk_b_addr = copy_addr_to_reg (XEXP (blk_b_in, 0));
+  rtx blk_b = change_address (blk_b_in, vmode, blk_b_addr);
+
+  rtx vec_a = gen_reg_rtx (vmode);
+  rtx vec_b = gen_reg_rtx (vmode);
+
+  machine_mode mask_mode = get_mask_mode (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+  rtx mismatch_ofs = gen_reg_rtx (Pmode);
+
+  rtx ne = gen_rtx_NE (mask_mode, vec_a, vec_b);
+  rtx vmsops[] = {mask, ne, vec_a, vec_b};
+  rtx vfops[] = {mismatch_ofs, mask};
+
+  /* If the length is exactly vlmax for the selected mode, use unpredicated
+     operations; otherwise, predicate each operation on the length.  */
+
+  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+    {
+      emit_move_insn (vec_a, blk_a);
+      emit_move_insn (vec_b, blk_b);
+      emit_vlmax_insn (code_for_pred_cmp (vmode),
+	      riscv_vector::COMPARE_OP, vmsops);
+
+      emit_vlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+	      riscv_vector::CPOP_OP, vfops);
+    }
+  else
+    {
+      if (!satisfies_constraint_K (length_in))
+	length_in = force_reg (Pmode, length_in);
+
+      rtx memmask = CONSTM1_RTX (mask_mode);
+
+      rtx m_ops_a[] = {vec_a, memmask, blk_a};
+      rtx m_ops_b[] = {vec_b, memmask, blk_b};
+
+      emit_nonvlmax_insn (code_for_pred_mov (vmode),
+	      riscv_vector::UNARY_OP_TAMA, m_ops_a, length_in);
+      emit_nonvlmax_insn (code_for_pred_mov (vmode),
+	      riscv_vector::UNARY_OP_TAMA, m_ops_b, length_in);
+
+      emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+	      riscv_vector::COMPARE_OP, vmsops, length_in);
+
+      emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+	      riscv_vector::CPOP_OP, vfops, length_in);
+    }
+
+  /* mismatch_ofs is -1 if the blocks match, or the offset of the first
+     mismatch otherwise.  */
+  rtx ltz = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LT, Xmode, Xmode, ltz, mismatch_ofs, const0_rtx));
+  /* mismatch_ofs += (mismatch_ofs < 0) ? 1 : 0.  */
+  emit_insn (gen_rtx_SET (mismatch_ofs, gen_rtx_PLUS (Pmode,
+	  mismatch_ofs, ltz)));
+
+  /* Unconditionally load the bytes at mismatch_ofs and subtract them
+     to get our result.  */
+  emit_insn (gen_rtx_SET (blk_a_addr, gen_rtx_PLUS (Pmode,
+	  mismatch_ofs, blk_a_addr)));
+  emit_insn (gen_rtx_SET (blk_b_addr, gen_rtx_PLUS (Pmode,
+	  mismatch_ofs, blk_b_addr)));
+
+  blk_a = change_address (blk_a, QImode, blk_a_addr);
+  blk_b = change_address (blk_b, QImode, blk_b_addr);
+
+  rtx byte_a = gen_reg_rtx (SImode);
+  rtx byte_b = gen_reg_rtx (SImode);
+  do_zero_extendqi2 (byte_a, blk_a);
+  do_zero_extendqi2 (byte_b, blk_b);
+
+  emit_insn (gen_rtx_SET (result_out, gen_rtx_MINUS (SImode,
+	  byte_a, byte_b)));
+
+  return true;
+}
 }
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 29d3b1aa342..39829c8566c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2395,6 +2395,21 @@
     FAIL;
 })
 
+(define_expand "cmpmemsi"
+  [(set (match_operand:SI 0 "register_operand")
+	(compare:SI (match_operand:BLK 1 "memory_operand")
+		    (match_operand:BLK 2 "memory_operand")))
+   (use (match_operand:SI 3 "general_operand"))
+   (use (match_operand:SI 4 ""))]
+  "TARGET_VECTOR"
+{
+  if (riscv_vector::expand_vec_cmpmem (operands[0], operands[1],
+				       operands[2], operands[3]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c
new file mode 100644
index 00000000000..686ac6d6b0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <string.h>
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
+/* Trivial memcmp should use inline scalar ops.
+** f1:
+**  lbu\s+a\d+,0\(a0\)
+**  lbu\s+a\d+,0\(a1\)
+**  subw\s+a0,a\d+,a\d+
+**  ret
+*/
+int f1 (void * a, void * b)
+{
+  return memcmp (a, b, 1);
+}
+
+/* Tiny memcmp should use libc.
+** f2:
+**  li\s+a2,\d+
+**  tail\s+memcmp
+*/
+int f2 (void * a, void * b)
+{
+  return memcmp (a, b, MIN_VECTOR_BYTES-1);
+}
+
+/* Vectorise and inline at the minimum vector register width with LMUL=1.
+** f3:
+**  (
+**  vsetivli\s+zero,\d+,e8,m1,ta,ma
+**  |
+**  li\s+a\d+,\d+
+**  vsetvli\s+zero,a\d+,e8,m1,ta,ma
+**  )
+**  ...
+**  ret
+*/
+int f3 (void * a, void * b)
+{
+  return memcmp (a, b, MIN_VECTOR_BYTES);
+}
+
+/* Vectorised code should use the smallest LMUL known to fit the length.
+** f4:
+**  (
+**  vsetivli\s+zero,\d+,e8,m2,ta,ma
+**  |
+**  li\s+a\d+,\d+
+**  vsetvli\s+zero,a\d+,e8,m2,ta,ma
+**  )
+**  ...
+**  ret
+*/
+int f4 (void * a, void * b)
+{
+  return memcmp (a, b, MIN_VECTOR_BYTES+1);
+}
+
+/* Vectorise and inline up to LMUL=8.
+** f5:
+**  li\s+a\d+,\d+
+**  vsetvli\s+zero,a\d+,e8,m8,ta,ma
+**  ...
+**  ret
+*/
+int f5 (void * a, void * b)
+{
+  return memcmp (a, b, MIN_VECTOR_BYTES*8);
+}
+
+/* Don't inline if the length is too large for one operation.
+** f6:
+**  li\s+a2,\d+
+**  tail\s+memcmp
+*/
+int f6 (void * a, void * b)
+{
+  return memcmp (a, b, MIN_VECTOR_BYTES*8+1);
+}
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c
new file mode 100644
index 00000000000..eedd23d4db0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c
@@ -0,0 +1,69 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O2" } */
+
+#include <string.h>
+#include <stdlib.h>
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
+static inline __attribute__ ((always_inline)) void
+do_one_test (int const size, int const diff_offset, int const diff_dir)
+{
+  unsigned char A[size];
+  unsigned char B[size];
+  unsigned char const fill_value = 0x55;
+  memset (A, fill_value, size);
+  memset (B, fill_value, size);
+
+  if (diff_dir < 0)
+    A[diff_offset] = fill_value - 1;
+  else if (diff_dir > 0)
+    A[diff_offset] = fill_value + 1;
+
+  /* Compare only the sign of the result: memcmp need not return
+     exactly -1/0/1 when the libcall path is taken.  */
+  int const res = memcmp (A, B, size);
+  if ((res > 0) - (res < 0) != diff_dir)
+    abort ();
+}
+
+int main()
+{
+  do_one_test( 0, 0, 0  );
+
+  do_one_test( 1, 0, -1 );
+  do_one_test( 1, 0,  0 );
+  do_one_test( 1, 0,  1 );
+
+  do_one_test( MIN_VECTOR_BYTES-1, 0, -1 );
+  do_one_test( MIN_VECTOR_BYTES-1, 0,  0 );
+  do_one_test( MIN_VECTOR_BYTES-1, 0,  1 );
+  do_one_test( MIN_VECTOR_BYTES-1, 1, -1 );
+  do_one_test( MIN_VECTOR_BYTES-1, 1,  0 );
+  do_one_test( MIN_VECTOR_BYTES-1, 1,  1 );
+
+  do_one_test( MIN_VECTOR_BYTES, 0, -1 );
+  do_one_test( MIN_VECTOR_BYTES, 0,  0 );
+  do_one_test( MIN_VECTOR_BYTES, 0,  1 );
+  do_one_test( MIN_VECTOR_BYTES, MIN_VECTOR_BYTES-1, -1 );
+  do_one_test( MIN_VECTOR_BYTES, MIN_VECTOR_BYTES-1,  0 );
+  do_one_test( MIN_VECTOR_BYTES, MIN_VECTOR_BYTES-1,  1 );
+
+  do_one_test( MIN_VECTOR_BYTES+1, 0, -1 );
+  do_one_test( MIN_VECTOR_BYTES+1, 0,  0 );
+  do_one_test( MIN_VECTOR_BYTES+1, 0,  1 );
+  do_one_test( MIN_VECTOR_BYTES+1, MIN_VECTOR_BYTES, -1 );
+  do_one_test( MIN_VECTOR_BYTES+1, MIN_VECTOR_BYTES,  0 );
+  do_one_test( MIN_VECTOR_BYTES+1, MIN_VECTOR_BYTES,  1 );
+
+  do_one_test( MIN_VECTOR_BYTES*8, 0, -1 );
+  do_one_test( MIN_VECTOR_BYTES*8, 0,  0 );
+  do_one_test( MIN_VECTOR_BYTES*8, 0,  1 );
+  do_one_test( MIN_VECTOR_BYTES*8, MIN_VECTOR_BYTES*8-1, -1 );
+  do_one_test( MIN_VECTOR_BYTES*8, MIN_VECTOR_BYTES*8-1,  0 );
+  do_one_test( MIN_VECTOR_BYTES*8, MIN_VECTOR_BYTES*8-1,  1 );
+
+  return 0;
+}
-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] RISC-V: movmem for RISCV with V extension
  2023-12-11  9:47 ` [PATCH 1/3] RISC-V: movmem for RISCV with V extension Sergei Lewis
@ 2023-12-11 10:08   ` Robin Dapp
  2023-12-11 10:21   ` Robin Dapp
  1 sibling, 0 replies; 9+ messages in thread
From: Robin Dapp @ 2023-12-11 10:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc

Hi Sergei,

thanks for contributing this!

Small general remarks/nits upfront:

The code looks like it hasn't been run through clang-format or
similar.  Please make sure that it adheres to the GNU coding
conventions.  The same applies to comments.  Some of them start
in lowercase.

As you rely on the vector length, please make sure to test various
combinations (also "exotic" ones) like zve32 and zve64.
Also, please specify which configurations it has been tested on. 

>     * config/riscv/riscv.md (movmem<mode>): Use riscv_vector::expand_block_move,
>     if and only if we know the entire operation can be performed using one vector
>     load followed by one vector store
> 
> gcc/testsuite/ChangeLog
> 
>     * gcc.target/riscv/rvv/base/movmem-1.c: New test

Please add a PR target/112109 here.  I believe after these
patches have landed we can close that bug.

> ---
>  gcc/config/riscv/riscv.md                     | 22 +++++++
>  .../gcc.target/riscv/rvv/base/movmem-1.c      | 59 +++++++++++++++++++
>  2 files changed, 81 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
> 
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index eed997116b0..88fde290a8a 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2359,6 +2359,28 @@
>      FAIL;
>  })
>  
> +;; inlining general memmove is a pessimisation: we can't avoid having to decide
> +;; which direction to go at runtime, which is costly in instruction count
> +;; however for situations where the entire move fits in one vector operation
> +;; we can do all reads before doing any writes so we don't have to worry
> +;; so generate the inline vector code in such situations
> +;; nb. prefer scalar path for tiny memmoves
> +(define_expand "movmem<mode>"
> +  [(parallel [(set (match_operand:BLK 0 "general_operand")
> +      (match_operand:BLK 1 "general_operand"))
> +	      (use (match_operand:P 2 ""))
> +	      (use (match_operand:SI 3 "const_int_operand"))])]
> +  "TARGET_VECTOR"
> +{
> +  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)

If operands[2] is used as an int we need to make sure the constraint
says so.  Shouldn't operand[1] be a memory_operand?

> +	&& (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
> +	&& riscv_vector::expand_block_move (operands[0], operands[1],
> +	     operands[2]))
> +    DONE;
> +  else
> +    FAIL;
> +})
> +

> +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8)
> +
> +/* tiny memmoves should not be vectorised
> +** f1:
> +**  li\s+a2,15
> +**  tail\s+memmove
> +*/
> +char * f1 (char *a, char const *b)
> +{
> +  return memmove (a, b, 15);
> +}

The < 16 assumption might not hold on embedded targets.
Same with the other tests.

Regards
 Robin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] RISC-V: movmem for RISCV with V extension
  2023-12-11  9:47 ` [PATCH 1/3] RISC-V: movmem for RISCV with V extension Sergei Lewis
  2023-12-11 10:08   ` Robin Dapp
@ 2023-12-11 10:21   ` Robin Dapp
  1 sibling, 0 replies; 9+ messages in thread
From: Robin Dapp @ 2023-12-11 10:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: rdapp.gcc

Ah, please also ensure to include (and follow) the stringop_strategy
checks (LIBCALL, VECTOR).
The naming is a bit unfortunate still, but that need not be fixed
in this patch.

Regards
 Robin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
  2023-12-11  9:47 ` [PATCH 2/3] RISC-V: setmem " Sergei Lewis
@ 2023-12-11 13:38   ` Kito Cheng
  0 siblings, 0 replies; 9+ messages in thread
From: Kito Cheng @ 2023-12-11 13:38 UTC (permalink / raw)
  To: Sergei Lewis; +Cc: gcc-patches

On Mon, Dec 11, 2023 at 5:48 PM Sergei Lewis <slewis@rivosinc.com> wrote:
>
> gcc/ChangeLog
>
>     * config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New function
>     declaration.
>
>     * config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New
>     function: this generates an inline vectorised memory set, if and only if we
>     know the entire operation can be performed in a single vector store
>
>     * config/riscv/riscv.md (setmem<mode>): Try riscv_vector::expand_vec_setmem
>     for constant lengths
>
> gcc/testsuite/ChangeLog
>     * gcc.target/riscv/rvv/base/setmem-1.c: New tests
> ---
>  gcc/config/riscv/riscv-protos.h               |  1 +
>  gcc/config/riscv/riscv-string.cc              | 82 +++++++++++++++
>  gcc/config/riscv/riscv.md                     | 14 +++
>  .../gcc.target/riscv/rvv/base/setmem-1.c      | 99 +++++++++++++++++++
>  4 files changed, 196 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 20bbb5b859c..950cb65c910 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -560,6 +560,7 @@ void expand_popcount (rtx *);
>  void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
>  bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
>  void emit_vec_extract (rtx, rtx, poly_int64);
> +bool expand_vec_setmem (rtx, rtx, rtx, rtx);
>
>  /* Rounding mode bitfield for fixed point VXRM.  */
>  enum fixed_point_rounding_mode
> diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
> index 11c1f74d0b3..0abbd5f8b28 100644
> --- a/gcc/config/riscv/riscv-string.cc
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -1247,4 +1247,86 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
>    return true;
>  }
>
> +
> +/* Select appropriate LMUL for a single vector operation based on
> +   byte size of data to be processed.
> +   On success, return true and populate lmul_out.
> +   If length_in is too wide for a single vector operation, return false
> +   and leave lmul_out unchanged.  */
> +
> +static bool
> +select_appropriate_lmul (HOST_WIDE_INT length_in,
> +    HOST_WIDE_INT &lmul_out)
> +{
> +  /* if it's tiny, default operation is likely better; maybe worth
> +     considering fractional lmul in the future as well.  */
> +  if (length_in < (TARGET_MIN_VLEN/8))

(TARGET_MIN_VLEN / 8)

> +    return false;
> +
> +  /* find smallest lmul large enough for entire op.  */
> +  HOST_WIDE_INT lmul = 1;
> +  while ((lmul <= 8) && (length_in > ((lmul*TARGET_MIN_VLEN)/8)))

 ((lmul * TARGET_MIN_VLEN) / 8)))

> +    {
> +      lmul <<= 1;
> +    }
> +
> +  if (lmul > 8)
> +    return false;
> +
> +  lmul_out = lmul;
> +  return true;
> +}
> +
> +/* Used by setmemdi in riscv.md.  */
> +bool
> +expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in,
> +         rtx alignment_in)
> +{
> +  /* we're generating vector code.  */
> +  if (!TARGET_VECTOR)
> +    return false;
> +  /* if we can't reason about the length, let libc handle the operation.  */
> +  if (!CONST_INT_P (length_in))
> +    return false;
> +
> +  HOST_WIDE_INT length = INTVAL (length_in);
> +  HOST_WIDE_INT lmul;
> +
> +  /* select an lmul such that the data just fits into one vector operation;
> +     bail if we can't.  */
> +  if (!select_appropriate_lmul (length, lmul))
> +    return false;
> +
> +  machine_mode vmode = riscv_vector::get_vector_mode (QImode,
> +         BYTES_PER_RISCV_VECTOR * lmul).require ();
> +  rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
> +  rtx dst = change_address (dst_in, vmode, dst_addr);
> +
> +  rtx fill_value = gen_reg_rtx (vmode);
> +  rtx broadcast_ops[] = {fill_value, fill_value_in};
> +
> +  /* If the length is exactly vlmax for the selected mode, do that.
> +     Otherwise, use a predicated store.  */
> +  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
> +    {
> +      emit_vlmax_insn (code_for_pred_broadcast (vmode),
> +             UNARY_OP, broadcast_ops);
> +      emit_move_insn (dst, fill_value);
> +    }
> +  else
> +    {
> +      if (!satisfies_constraint_K (length_in))
> +             length_in= force_reg (Pmode, length_in);
> +      emit_nonvlmax_insn (code_for_pred_broadcast (vmode), UNARY_OP,
> +             broadcast_ops, length_in);
> +      machine_mode mask_mode = riscv_vector::get_vector_mode
> +             (BImode, GET_MODE_NUNITS (vmode)).require ();
> +      rtx mask =  CONSTM1_RTX (mask_mode);
> +      emit_insn (gen_pred_store (vmode, dst, mask, fill_value, length_in,
> +             get_avl_type_rtx (riscv_vector::NONVLMAX)));
> +    }
> +
> +  return true;
> +}
> +
>  }
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 88fde290a8a..29d3b1aa342 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2381,6 +2381,20 @@
>      FAIL;
>  })
>
> +(define_expand "setmemsi"
> +  [(set (match_operand:BLK 0 "memory_operand")     ;; Dest
> +             (match_operand:QI  2 "nonmemory_operand")) ;; Value
> +   (use (match_operand:SI  1 "const_int_operand")) ;; Length
> +   (match_operand:SI       3 "const_int_operand")] ;; Align
> +  "TARGET_VECTOR"
> +{
> +  if (riscv_vector::expand_vec_setmem (operands[0], operands[1], operands[2],
> +      operands[3]))
> +    DONE;
> +  else
> +    FAIL;
> +})
> +
>  ;; Expand in-line code to clear the instruction cache between operand[0] and
>  ;; operand[1].
>  (define_expand "clear_cache"
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c
> new file mode 100644
> index 00000000000..d1a5ff002a9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c
> @@ -0,0 +1,99 @@
> +/* { dg-do compile } */
> +/* { dg-add-options riscv_v } */
> +/* { dg-additional-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include <string.h>

Drop this include to prevent multilib testing issues.

> +
> +#define MIN_VECTOR_BYTES (__riscv_v_min_vlen/8)
> +
> +/* tiny memsets should use scalar ops
> +** f1:
> +**  sb\s+a1,0\(a0\)
> +**  ret
> +*/
> +void * f1 (void *a, int const b)
> +{
> +  return memset (a, b, 1);

Use __builtin_memset instead of memset.

> +}
> +
> +/* tiny memsets should use scalar ops
> +** f2:
> +**  sb\s+a1,0\(a0\)
> +**  sb\s+a1,1\(a0\)
> +**  ret
> +*/
> +void * f2 (void *a, int const b)
> +{
> +  return memset (a, b, 2);

Ditto.

> +}
> +
> +/* tiny memsets should use scalar ops
> +** f3:
> +**  sb\s+a1,0\(a0\)
> +**  sb\s+a1,1\(a0\)
> +**  sb\s+a1,2\(a0\)
> +**  ret
> +*/
> +void * f3 (void *a, int const b)
> +{
> +  return memset (a, b, 3);

Ditto.

> +}
> +
> +/* vectorise+inline minimum vector register width with LMUL=1
> +** f4:
> +**  (
> +**  vsetivli\s+zero,\d+,e8,m1,ta,ma
> +**  |
> +**  li\s+a\d+,\d+
> +**  vsetvli\s+zero,a\d+,e8,m1,ta,ma
> +**  )
> +**  vmv\.v\.x\s+v\d+,a1
> +**  vse8\.v\s+v\d+,0\(a0\)
> +**  ret
> +*/
> +void * f4 (void *a, int const b)
> +{
> +  return memset (a, b, MIN_VECTOR_BYTES);

Ditto.

> +}
> +
> +/* vectorised code should use smallest lmul known to fit length
> +** f5:
> +**  (
> +**  vsetivli\s+zero,\d+,e8,m2,ta,ma
> +**  |
> +**  li\s+a\d+,\d+
> +**  vsetvli\s+zero,a\d+,e8,m2,ta,ma
> +**  )
> +**  vmv\.v\.x\s+v\d+,a1
> +**  vse8\.v\s+v\d+,0\(a0\)
> +**  ret
> +*/
> +void * f5 (void *a, int const b)
> +{
> +  return memset (a, b, MIN_VECTOR_BYTES+1);

Ditto.

> +}
> +
> +/* vectorise+inline up to LMUL=8
> +** f6:
> +**  li\s+a\d+,\d+
> +**  vsetvli\s+zero,a\d+,e8,m8,ta,ma
> +**  vmv\.v\.x\s+v\d+,a1
> +**  vse8\.v\s+v\d+,0\(a0\)
> +**  ret
> +*/
> +void * f6 (void *a, int const b)
> +{
> +  return memset (a, b, MIN_VECTOR_BYTES*8);

Ditto.

> +}
> +
> +/* don't vectorise if the move is too large for one operation
> +** f7:
> +**  li\s+a2,\d+
> +**  tail\s+memset
> +*/
> +void * f7 (void *a, int const b)
> +{
> +  return memset (a, b, MIN_VECTOR_BYTES*8+1);

Ditto.

> +}
> +
> --
> 2.34.1
>


* Re: [PATCH 2/3] RISC-V: setmem for RISCV with V extension
  2023-12-11 10:05 [PATCH 2/3] RISC-V: setmem " juzhe.zhong
@ 2023-12-11 14:58 ` Sergei Lewis
  0 siblings, 0 replies; 9+ messages in thread
From: Sergei Lewis @ 2023-12-11 14:58 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, Robin Dapp, Kito.cheng, jeffreyalaw


The thinking here is that using the largest possible LMUL when the operation
is known to fit in fewer registers potentially leaves performance on the
table: indirectly, through unnecessarily increased register pressure, and
directly, depending on the implementation.

On Mon, Dec 11, 2023 at 10:05 AM juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
wrote:

> Hi, Thanks for contributing this.
>
> +/* Select appropriate LMUL for a single vector operation based on
> +   byte size of data to be processed.
> +   On success, return true and populate lmul_out.
> +   If length_in is too wide for a single vector operation, return false
> +   and leave lmul_out unchanged.  */
> +
> +static bool
> +select_appropriate_lmul (HOST_WIDE_INT length_in,
> +    HOST_WIDE_INT &lmul_out)
> +{
>
> I don't think we need this, you only need to use TARGET_MAX_LMUL
>
>
> ------------------------------
> juzhe.zhong@rivai.ai
>


* [PATCH 2/3] RISC-V: setmem for RISCV with V extension
@ 2023-12-11 10:05 juzhe.zhong
  2023-12-11 14:58 ` Sergei Lewis
  0 siblings, 1 reply; 9+ messages in thread
From: juzhe.zhong @ 2023-12-11 10:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: Robin Dapp, Kito.cheng, jeffreyalaw


Hi, Thanks for contributing this.

+/* Select appropriate LMUL for a single vector operation based on
+   byte size of data to be processed.
+   On success, return true and populate lmul_out.
+   If length_in is too wide for a single vector operation, return false
+   and leave lmul_out unchanged.  */
+
+static bool
+select_appropriate_lmul (HOST_WIDE_INT length_in,
+    HOST_WIDE_INT &lmul_out)
+{
I don't think we need this, you only need to use TARGET_MAX_LMUL




juzhe.zhong@rivai.ai


end of thread, other threads:[~2023-12-11 14:58 UTC | newest]

Thread overview: 9+ messages
2023-12-11  9:47 [PATCH 0/3] RISC-V: vectorised memory operations Sergei Lewis
2023-12-11  9:47 ` [PATCH 1/3] RISC-V: movmem for RISCV with V extension Sergei Lewis
2023-12-11 10:08   ` Robin Dapp
2023-12-11 10:21   ` Robin Dapp
2023-12-11  9:47 ` [PATCH 2/3] RISC-V: setmem " Sergei Lewis
2023-12-11 13:38   ` Kito Cheng
2023-12-11  9:47 ` [PATCH 3/3] RISC-V: cmpmem " Sergei Lewis
2023-12-11 10:05 [PATCH 2/3] RISC-V: setmem " juzhe.zhong
2023-12-11 14:58 ` Sergei Lewis
