public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/7] riscv: Improve builtins expansion
@ 2022-11-13 23:05 Christoph Muellner
  2022-11-13 23:05 ` [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec Christoph Muellner
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This patchset includes patches that improve the following builtin
expansions:

* cpymemsi: Allow by-pieces to generate overlapping memory accesses
* cmpstrsi: Add expansion for strcmp and strncmp for aligned strings when zbb/orc.b is available
* strlen<mode>: Add expansion for strlen when zbb/orc.b is available

These changes were largely inspired by the PowerPC backend
(e.g. moving the expansion code into riscv-string.cc and emitting an
unrolled loop which eventually calls the C runtime if necessary).
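
To illustrate the first point: allowing overlapping accesses lets a
15-byte copy be done with two overlapping 8-byte accesses instead of
non-overlapping 8+4+2+1 pieces.  A minimal C sketch of the idea (the
function names here are illustrative, not part of the patches; the real
decision is made by the by-pieces infrastructure):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy 15 bytes with two overlapping 8-byte accesses (bytes 0..7 and
   7..14) instead of non-overlapping 8+4+2+1 pieces.  */
static void copy15 (char *dst, const char *src)
{
  uint64_t a, b;
  memcpy (&a, src, 8);      /* bytes 0..7 */
  memcpy (&b, src + 7, 8);  /* bytes 7..14, overlapping byte 7 */
  memcpy (dst, &a, 8);
  memcpy (dst + 7, &b, 8);
}

/* Self-check: the overlapping copy must match a plain memcpy.  */
static int copy15_ok (void)
{
  const char src[16] = "ABCDEFGHIJKLMNO";
  char dst[16] = { 0 };
  copy15 (dst, src);
  return memcmp (dst, src, 15) == 0;
}
```

The overlap is harmless for a copy (the overlapping byte is written
twice with the same value) and saves instructions on targets where
unaligned accesses are cheap.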

This series is meant to be applied on top of the VT1 support series:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605959.html

The patches come with their own tests and show no regressions in
existing tests.  Furthermore, the series has been tested with all
SPEC CPU2017 benchmarks.

Christoph Müllner (6):
  riscv: bitmanip/zbb: Add prefix/postfix and enable visibility
  riscv: Enable overlap-by-pieces via tune param
  riscv: Move riscv_block_move_loop to separate file
  riscv: Use by-pieces to do overlapping accesses in block_move_straight
  riscv: Add support for strlen inline expansion
  riscv: Add support for str(n)cmp inline expansion

Philipp Tomsich (1):
  riscv: bitmanip: add orc.b as an unspec

 gcc/config.gcc                                |   3 +-
 gcc/config/riscv/bitmanip.md                  |  12 +-
 gcc/config/riscv/riscv-protos.h               |   7 +-
 gcc/config/riscv/riscv-string.cc              | 687 ++++++++++++++++++
 gcc/config/riscv/riscv.cc                     | 172 +----
 gcc/config/riscv/riscv.md                     | 105 ++-
 gcc/config/riscv/riscv.opt                    |   5 +
 gcc/config/riscv/t-riscv                      |   4 +
 .../gcc.target/riscv/memcpy-nonoverlapping.c  |  54 ++
 .../gcc.target/riscv/memcpy-overlapping.c     |  47 ++
 .../gcc.target/riscv/memset-nonoverlapping.c  |  45 ++
 .../gcc.target/riscv/memset-overlapping.c     |  43 ++
 .../gcc.target/riscv/zbb-strcmp-unaligned.c   |  36 +
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c   |  55 ++
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  13 +
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  18 +
 16 files changed, 1132 insertions(+), 174 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

-- 
2.38.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14 16:51   ` Jeff Law
  2022-11-13 23:05 ` [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visibility Christoph Muellner
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta

From: Philipp Tomsich <philipp.tomsich@vrull.eu>

As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.
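
For reference, orc.b (bytewise OR-combine) sets every byte of the
result to 0xff if the corresponding input byte is non-zero, and to
0x00 otherwise.  A minimal C model of these semantics (the helper
names are illustrative, not GCC's):

```c
#include <assert.h>
#include <stdint.h>

/* C model of Zbb's orc.b: every byte of the result is 0xff if the
   corresponding input byte is non-zero, and 0x00 otherwise.  */
static uint64_t orcb64 (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (i * 8)) & 0xff)
      r |= (uint64_t) 0xff << (i * 8);
  return r;
}

/* An inlined strlen/strcmp can test a whole word at once: a NUL byte
   in the word shows up as a 0x00 byte in the orc.b result.  */
static int has_zero_byte (uint64_t word)
{
  return orcb64 (word) != UINT64_MAX;
}
```

This word-at-a-time NUL detection is what makes orc.b useful as a
building block for the string expansions later in this series.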

gcc/ChangeLog:

	* config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an
	unspec.
	* config/riscv/riscv.md: Add UNSPEC_ORC_B.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---
 gcc/config/riscv/bitmanip.md | 8 ++++++++
 gcc/config/riscv/riscv.md    | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index b44fb9517e7..3dbe6002974 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -242,6 +242,14 @@ (define_insn "rotlsi3_sext"
   "rolw\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; orc.b (or-combine) is added as an unspec for the benefit of the support
+;; for optimized string functions (such as strcmp).
+(define_insn "orcb<mode>2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+	(unspec:X [(match_operand:X 1 "register_operand" "r")] UNSPEC_ORC_B))]
+  "TARGET_ZBB"
+  "orc.b\t%0,%1")
+
 (define_insn "bswap<mode>2"
   [(set (match_operand:X 0 "register_operand" "=r")
         (bswap:X (match_operand:X 1 "register_operand" "r")))]
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 798f7370a08..532289dd178 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -62,6 +62,9 @@ (define_c_enum "unspec" [
 
   ;; Stack tie
   UNSPEC_TIE
+
+  ;; OR-COMBINE
+  UNSPEC_ORC_B
 ])
 
 (define_c_enum "unspecv" [
-- 
2.38.1



* [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visibility
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
  2022-11-13 23:05 ` [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14 16:55   ` Jeff Law
  2022-11-13 23:05 ` [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param Christoph Muellner
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

Named INSN patterns are usually suffixed with a number representing the
operand count.  Given the instructions will be used in a later commit,
let's make them visible (i.e. drop the "*" prefix so generator
functions are emitted), but add a "riscv_" prefix to avoid conflicts
with standard pattern names.

gcc/ChangeLog:

	* config/riscv/bitmanip.md (*<optab>_not<mode>): Rename INSN.
	(riscv_<optab>_not<mode>3): Rename INSN.
	(*xor_not<mode>): Rename INSN.
	(xor_not<mode>3): Rename INSN.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/bitmanip.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 3dbe6002974..d6d94e5cdf8 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -119,7 +119,7 @@ (define_insn "*slliuw"
 
 ;; ZBB extension.
 
-(define_insn "*<optab>_not<mode>"
+(define_insn "riscv_<optab>_not<mode>3"
   [(set (match_operand:X 0 "register_operand" "=r")
         (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
                             (match_operand:X 2 "register_operand" "r")))]
@@ -128,7 +128,7 @@ (define_insn "*<optab>_not<mode>"
   [(set_attr "type" "bitmanip")
    (set_attr "mode" "<X:MODE>")])
 
-(define_insn "*xor_not<mode>"
+(define_insn "riscv_xor_not<mode>3"
   [(set (match_operand:X 0 "register_operand" "=r")
         (not:X (xor:X (match_operand:X 1 "register_operand" "r")
                       (match_operand:X 2 "register_operand" "r"))))]
-- 
2.38.1



* [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
  2022-11-13 23:05 ` [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec Christoph Muellner
  2022-11-13 23:05 ` [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visibility Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14  2:48   ` Vineet Gupta
  2022-11-13 23:05 ` [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file Christoph Muellner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This patch adds the field overlap_op_by_pieces to the struct
riscv_tune_param, which allows enabling the overlap_op_by_pieces
infrastructure.
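
With the tune param enabled, a fixed-size memset can also use
overlapping stores.  A rough C sketch of the shape the ZERO_N(31)
testcase in this patch expects on VT1 (4x sd); the helper names are
hypothetical, the codegen decision is made by the by-pieces
infrastructure:

```c
#include <assert.h>
#include <string.h>

/* Zero 31 bytes with four 8-byte stores at offsets 0, 8, 16 and 23
   (the last two overlap at byte 23), instead of non-overlapping
   8+8+8+4+2+1 stores.  */
static void zero31 (char *dst)
{
  memset (dst, 0, 8);       /* bytes  0..7  */
  memset (dst + 8, 0, 8);   /* bytes  8..15 */
  memset (dst + 16, 0, 8);  /* bytes 16..23 */
  memset (dst + 23, 0, 8);  /* bytes 23..30 */
}

/* Self-check: exactly bytes 0..30 are zeroed, byte 31 is untouched.  */
static int zero31_ok (void)
{
  char buf[32];
  memset (buf, 0xaa, sizeof buf);
  zero31 (buf);
  for (int i = 0; i < 31; i++)
    if (buf[i] != 0)
      return 0;
  return buf[31] == (char) 0xaa;
}
```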

gcc/ChangeLog:

	* config/riscv/riscv.cc (struct riscv_tune_param): New field.
	(riscv_overlap_op_by_pieces): New function.
	(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
	riscv_overlap_op_by_pieces.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/memcpy-nonoverlapping.c: New test.
	* gcc.target/riscv/memcpy-overlapping.c: New test.
	* gcc.target/riscv/memset-nonoverlapping.c: New test.
	* gcc.target/riscv/memset-overlapping.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv.cc                     | 17 +++++-
 .../gcc.target/riscv/memcpy-nonoverlapping.c  | 54 +++++++++++++++++++
 .../gcc.target/riscv/memcpy-overlapping.c     | 50 +++++++++++++++++
 .../gcc.target/riscv/memset-nonoverlapping.c  | 45 ++++++++++++++++
 .../gcc.target/riscv/memset-overlapping.c     | 43 +++++++++++++++
 5 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/memset-overlapping.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a0c00cfb66f..7357cf51cdf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -243,6 +243,7 @@ struct riscv_tune_param
   unsigned short fmv_cost;
   bool slow_unaligned_access;
   unsigned int fusible_ops;
+  bool overlap_op_by_pieces;
 };
 
 /* Information about one micro-arch we know about.  */
@@ -331,6 +332,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   8,						/* fmv_cost */
   true,						/* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,					/* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Sifive 7 Series.  */
@@ -346,6 +348,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   8,						/* fmv_cost */
   true,						/* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,					/* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for T-HEAD c906.  */
@@ -361,6 +364,7 @@ static const struct riscv_tune_param thead_c906_tune_info = {
   8,		/* fmv_cost */
   false,            /* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,					/* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for size.  */
@@ -376,6 +380,7 @@ static const struct riscv_tune_param optimize_size_tune_info = {
   8,						/* fmv_cost */
   false,					/* slow_unaligned_access */
   RISCV_FUSE_NOTHING,                           /* fusible_ops */
+  false,					/* overlap_op_by_pieces */
 };
 
 /* Costs to use when optimizing for Ventana Micro VT1.  */
@@ -393,7 +398,8 @@ static const struct riscv_tune_param ventana_vt1_tune_info = {
   ( RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH |       /* fusible_ops */
     RISCV_FUSE_ZEXTWS | RISCV_FUSE_LDINDEXED |
     RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI |
-    RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD )
+    RISCV_FUSE_LUI_LD | RISCV_FUSE_AUIPC_LD ),
+  true,						/* overlap_op_by_pieces */
 };
 
 static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
@@ -6444,6 +6450,12 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
   return riscv_slow_unaligned_access_p;
 }
 
+static bool
+riscv_overlap_op_by_pieces (void)
+{
+  return tune_param->overlap_op_by_pieces;
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
 
 static bool
@@ -6974,6 +6986,9 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
 #undef TARGET_SLOW_UNALIGNED_ACCESS
 #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
 
+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
+
 #undef TARGET_SECONDARY_MEMORY_NEEDED
 #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
 
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
new file mode 100644
index 00000000000..1c99e13fc26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-nonoverlapping.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=sifive-u74 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+
+#define COPY_N(N)				\
+void copy##N (char *src, char *dst)		\
+{						\
+  dst = __builtin_assume_aligned (dst, 4096);	\
+  src = __builtin_assume_aligned (src, 4096);	\
+  __builtin_memcpy (dst, src, N);		\
+}
+
+/* Emits 1x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(11)
+
+/* Emits 1x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+COPY_N(13)
+
+/* Emits 1x {ld,sd} and 1x {lw,lhu,sw,sh}.  */
+COPY_N(14)
+
+/* Emits 1x {ld,sd} and 1x {lw,lhu,lbu,sw,sh,sb}.  */
+COPY_N(15)
+
+/* Emits 2x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(19)
+
+/* Emits 2x {ld,sd} and 1x {lw,lhu,lbu,sw,sh,sb}.  */
+COPY_N(23)
+
+/* The by-pieces infrastructure handles up to 24 bytes.
+   So the code below is emitted via cpymemsi/block_move_straight.  */
+
+/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(27)
+
+/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+COPY_N(29)
+
+/* Emits 3x {ld,sd} and 1x {lw,lhu,lbu,sw,sh,sb}.  */
+COPY_N(31)
+
+/* { dg-final { scan-assembler-times "ld\t" 17 } } */
+/* { dg-final { scan-assembler-times "sd\t" 17 } } */
+
+/* { dg-final { scan-assembler-times "lw\t" 6 } } */
+/* { dg-final { scan-assembler-times "sw\t" 6 } } */
+
+/* { dg-final { scan-assembler-times "lhu\t" 7 } } */
+/* { dg-final { scan-assembler-times "sh\t" 7 } } */
+
+/* { dg-final { scan-assembler-times "lbu\t" 8 } } */
+/* { dg-final { scan-assembler-times "sb\t" 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
new file mode 100644
index 00000000000..ffb7248bfd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ventana-vt1 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+#define COPY_N(N)				\
+void copy##N (char *src, char *dst)		\
+{						\
+  dst = __builtin_assume_aligned (dst, 4096);	\
+  src = __builtin_assume_aligned (src, 4096);	\
+  __builtin_memcpy (dst, src, N);		\
+}
+
+/* Emits 1x {ld,sd} and 1x {lw,sw}.  */
+COPY_N(11)
+
+/* Emits 2x {ld,sd}.  */
+COPY_N(13)
+
+/* Emits 2x {ld,sd}.  */
+COPY_N(14)
+
+/* Emits 2x {ld,sd}.  */
+COPY_N(15)
+
+/* Emits 2x {ld,sd} and 1x {lw,sw}.  */
+COPY_N(19)
+
+/* Emits 3x ld and 3x sd.  */
+COPY_N(23)
+
+/* The by-pieces infrastructure handles up to 24 bytes.
+   So the code below is emitted via cpymemsi/block_move_straight.  */
+
+/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+COPY_N(27)
+
+/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+COPY_N(29)
+
+/* Emits 3x {ld,sd} and 2x {lw,sw}.  */
+COPY_N(31)
+
+/* { dg-final { scan-assembler-times "ld\t" 21 } } */
+/* { dg-final { scan-assembler-times "sd\t" 21 } } */
+
+/* { dg-final { scan-assembler-times "lw\t" 5 } } */
+/* { dg-final { scan-assembler-times "sw\t" 5 } } */
+
+/* { dg-final { scan-assembler-times "lbu\t" 2 } } */
+/* { dg-final { scan-assembler-times "sb\t" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c b/gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
new file mode 100644
index 00000000000..c4311c7a8d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memset-nonoverlapping.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=sifive-u74 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+#define ZERO_N(N)				\
+void zero##N (char *dst)			\
+{						\
+  dst = __builtin_assume_aligned (dst, 4096);	\
+  __builtin_memset (dst, 0, N);			\
+}
+
+/* Emits 1x sd and 1x {sh,sb}.  */
+ZERO_N(11)
+
+/* Emits 1x sd and 1x {sw,sb}.  */
+ZERO_N(13)
+
+/* Emits 1x sd and 1x {sw,sh}.  */
+ZERO_N(14)
+
+/* Emits 1x sd and 1x {sw,sh,sb}.  */
+ZERO_N(15)
+
+/* Emits 2x sd and 1x {sh,sb}.  */
+ZERO_N(19)
+
+/* Emits 2x sd and 1x {sw,sh,sb}.  */
+ZERO_N(23)
+
+/* The by-pieces infrastructure handles up to 24 bytes.
+   So the code below is emitted via cpymemsi/block_move_straight.  */
+
+/* Emits 3x sd and 1x {sh,sb}.  */
+ZERO_N(27)
+
+/* Emits 3x sd and 1x {sw,sb}.  */
+ZERO_N(29)
+
+/* Emits 3x sd and 1x {sw,sh,sb}.  */
+ZERO_N(31)
+
+/* { dg-final { scan-assembler-times "sd\t" 17 } } */
+/* { dg-final { scan-assembler-times "sw\t" 6 } } */
+/* { dg-final { scan-assembler-times "sh\t" 7 } } */
+/* { dg-final { scan-assembler-times "sb\t" 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/memset-overlapping.c b/gcc/testsuite/gcc.target/riscv/memset-overlapping.c
new file mode 100644
index 00000000000..793766b5262
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/memset-overlapping.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ventana-vt1 -march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+#define ZERO_N(N)				\
+void zero##N (char *dst)			\
+{						\
+  dst = __builtin_assume_aligned (dst, 4096);	\
+  __builtin_memset (dst, 0, N);			\
+}
+
+/* Emits 1x sd and 1x sw.  */
+ZERO_N(11)
+
+/* Emits 2x sd.  */
+ZERO_N(13)
+
+/* Emits 2x sd.  */
+ZERO_N(14)
+
+/* Emits 2x sd.  */
+ZERO_N(15)
+
+/* Emits 2x sd and 1x sw.  */
+ZERO_N(19)
+
+/* Emits 3x sd.  */
+ZERO_N(23)
+
+/* The by-pieces infrastructure handles up to 24 bytes.
+   So the code below is emitted via cpymemsi/block_move_straight.  */
+
+/* Emits 3x sd and 1x sw.  */
+ZERO_N(27)
+
+/* Emits 4x sd.  */
+ZERO_N(29)
+
+/* Emits 4x sd.  */
+ZERO_N(31)
+
+/* { dg-final { scan-assembler-times "sd\t" 23 } } */
+/* { dg-final { scan-assembler-times "sw\t" 3 } } */
-- 
2.38.1



* [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
                   ` (2 preceding siblings ...)
  2022-11-13 23:05 ` [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14 16:56   ` Jeff Law
  2022-11-13 23:05 ` [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight Christoph Muellner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

Let's avoid accumulating too much functionality in one single file,
as that does not help with maintaining or extending the code.
So, in order to add more functionality similar to riscv_block_move_loop,
let's move this function to a separate file.

This change introduces no functional changes.
It only modifies a single line of the existing code,
which check_GNU_style.py complained about.

gcc/ChangeLog:

	* config.gcc: Add new object riscv-string.o.
	* config/riscv/riscv-protos.h (riscv_expand_block_move): Remove
	duplicated prototype and move to new section for riscv-string.cc.
	* config/riscv/riscv.cc (riscv_block_move_straight): Remove function.
	(riscv_adjust_block_mem): Likewise.
	(riscv_block_move_loop): Likewise.
	(riscv_expand_block_move): Likewise.
	* config/riscv/riscv.md (cpymemsi): Move to new section for
	riscv-string.cc.
	* config/riscv/t-riscv: Add compile rule for riscv-string.o.
	* config/riscv/riscv-string.cc: New file.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config.gcc                   |   3 +-
 gcc/config/riscv/riscv-protos.h  |   5 +-
 gcc/config/riscv/riscv-string.cc | 194 +++++++++++++++++++++++++++++++
 gcc/config/riscv/riscv.cc        | 155 ------------------------
 gcc/config/riscv/riscv.md        |  28 ++---
 gcc/config/riscv/t-riscv         |   4 +
 6 files changed, 218 insertions(+), 171 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b5eda046033..fc9e582e713 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -518,7 +518,8 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o"
+	extra_objs="${extra_objs} riscv-string.o riscv-v.o"
 	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	d_target_objs="riscv-d.o"
 	extra_headers="riscv_vector.h"
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a718bb62b4..344515dbaf4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -62,7 +62,6 @@ extern void riscv_expand_conditional_move (rtx, rtx, rtx, rtx_code, rtx, rtx);
 #endif
 extern rtx riscv_legitimize_call_address (rtx);
 extern void riscv_set_return_address (rtx, rtx);
-extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern rtx riscv_return_addr (int, rtx);
 extern poly_int64 riscv_initial_elimination_offset (int, int);
 extern void riscv_expand_prologue (void);
@@ -70,7 +69,6 @@ extern void riscv_expand_epilogue (int);
 extern bool riscv_epilogue_uses (unsigned int);
 extern bool riscv_can_use_return_insn (void);
 extern rtx riscv_function_value (const_tree, const_tree, enum machine_mode);
-extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
 extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
@@ -96,6 +94,9 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 
+/* Routines implemented in riscv-string.cc.  */
+extern bool riscv_expand_block_move (rtx, rtx, rtx);
+
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
   /* This CPU's canonical name.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
new file mode 100644
index 00000000000..6882f0be269
--- /dev/null
+++ b/gcc/config/riscv/riscv-string.cc
@@ -0,0 +1,194 @@
+/* Subroutines used to expand string and block move, clear,
+   compare and other operations for RISC-V.
+   Copyright (C) 2011-2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "target.h"
+#include "predict.h"
+#include "optabs.h"
+
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+   Assume that the areas do not overlap.  */
+
+static void
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+{
+  unsigned HOST_WIDE_INT offset, delta;
+  unsigned HOST_WIDE_INT bits;
+  int i;
+  enum machine_mode mode;
+  rtx *regs;
+
+  bits = MAX (BITS_PER_UNIT,
+	      MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+
+  mode = mode_for_size (bits, MODE_INT, 0).require ();
+  delta = bits / BITS_PER_UNIT;
+
+  /* Allocate a buffer for the temporary registers.  */
+  regs = XALLOCAVEC (rtx, length / delta);
+
+  /* Load as many BITS-sized chunks as possible.  Use a normal load if
+     the source has enough alignment, otherwise use left/right pairs.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+    {
+      regs[i] = gen_reg_rtx (mode);
+      riscv_emit_move (regs[i], adjust_address (src, mode, offset));
+    }
+
+  /* Copy the chunks to the destination.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+    riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+
+  /* Mop up any left-over bytes.  */
+  if (offset < length)
+    {
+      src = adjust_address (src, BLKmode, offset);
+      dest = adjust_address (dest, BLKmode, offset);
+      move_by_pieces (dest, src, length - offset,
+		      MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+    }
+}
+
+/* Helper function for doing a loop-based block operation on memory
+   reference MEM.  Each iteration of the loop will operate on LENGTH
+   bytes of MEM.
+
+   Create a new base register for use within the loop and point it to
+   the start of MEM.  Create a new memory reference that uses this
+   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+
+static void
+riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
+			rtx *loop_reg, rtx *loop_mem)
+{
+  *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
+
+  /* Although the new mem does not refer to a known location,
+     it does keep up to LENGTH bytes of alignment.  */
+  *loop_mem = change_address (mem, BLKmode, *loop_reg);
+  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+}
+
+/* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
+   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
+   the memory regions do not overlap.  */
+
+static void
+riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+		       unsigned HOST_WIDE_INT bytes_per_iter)
+{
+  rtx label, src_reg, dest_reg, final_src, test;
+  unsigned HOST_WIDE_INT leftover;
+
+  leftover = length % bytes_per_iter;
+  length -= leftover;
+
+  /* Create registers and memory references for use within the loop.  */
+  riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
+  riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
+
+  /* Calculate the value that SRC_REG should have after the last iteration
+     of the loop.  */
+  final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length),
+				   0, 0, OPTAB_WIDEN);
+
+  /* Emit the start of the loop.  */
+  label = gen_label_rtx ();
+  emit_label (label);
+
+  /* Emit the loop body.  */
+  riscv_block_move_straight (dest, src, bytes_per_iter);
+
+  /* Move on to the next block.  */
+  riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter));
+  riscv_emit_move (dest_reg, plus_constant (Pmode, dest_reg, bytes_per_iter));
+
+  /* Emit the loop condition.  */
+  test = gen_rtx_NE (VOIDmode, src_reg, final_src);
+  emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label));
+
+  /* Mop up any left-over bytes.  */
+  if (leftover)
+    riscv_block_move_straight (dest, src, leftover);
+  else
+    emit_insn (gen_nop ());
+}
+
+/* Expand a cpymemsi instruction, which copies LENGTH bytes from
+   memory reference SRC to memory reference DEST.  */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+  if (CONST_INT_P (length))
+    {
+      unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+      unsigned HOST_WIDE_INT factor, align;
+
+      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+      factor = BITS_PER_WORD / align;
+
+      if (optimize_function_for_size_p (cfun)
+	  && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+	return false;
+
+      if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+	{
+	  riscv_block_move_straight (dest, src, INTVAL (length));
+	  return true;
+	}
+      else if (optimize && align >= BITS_PER_WORD)
+	{
+	  unsigned min_iter_words
+	    = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+	  unsigned iter_words = min_iter_words;
+	  unsigned HOST_WIDE_INT bytes = hwi_length;
+	  unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+	  /* Lengthen the loop body if it shortens the tail.  */
+	  for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
+	    {
+	      unsigned cur_cost = iter_words + words % iter_words;
+	      unsigned new_cost = i + words % i;
+	      if (new_cost <= cur_cost)
+		iter_words = i;
+	    }
+
+	  riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+	  return true;
+	}
+    }
+  return false;
+}
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7357cf51cdf..fab40c6f8dc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3949,161 +3949,6 @@ riscv_legitimize_call_address (rtx addr)
   return addr;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
-   Assume that the areas do not overlap.  */
-
-static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
-{
-  unsigned HOST_WIDE_INT offset, delta;
-  unsigned HOST_WIDE_INT bits;
-  int i;
-  enum machine_mode mode;
-  rtx *regs;
-
-  bits = MAX (BITS_PER_UNIT,
-	      MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
-
-  mode = mode_for_size (bits, MODE_INT, 0).require ();
-  delta = bits / BITS_PER_UNIT;
-
-  /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
-
-  /* Load as many BITS-sized chunks as possible.  Use a normal load if
-     the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
-    {
-      regs[i] = gen_reg_rtx (mode);
-      riscv_emit_move (regs[i], adjust_address (src, mode, offset));
-    }
-
-  /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
-    riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
-
-  /* Mop up any left-over bytes.  */
-  if (offset < length)
-    {
-      src = adjust_address (src, BLKmode, offset);
-      dest = adjust_address (dest, BLKmode, offset);
-      move_by_pieces (dest, src, length - offset,
-		      MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
-    }
-}
-
-/* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
-
-   Create a new base register for use within the loop and point it to
-   the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
-
-static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
-			rtx *loop_reg, rtx *loop_mem)
-{
-  *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
-
-  /* Although the new mem does not refer to a known location,
-     it does keep up to LENGTH bytes of alignment.  */
-  *loop_mem = change_address (mem, BLKmode, *loop_reg);
-  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
-}
-
-/* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER
-   bytes at a time.  LENGTH must be at least BYTES_PER_ITER.  Assume that
-   the memory regions do not overlap.  */
-
-static void
-riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
-		       unsigned HOST_WIDE_INT bytes_per_iter)
-{
-  rtx label, src_reg, dest_reg, final_src, test;
-  unsigned HOST_WIDE_INT leftover;
-
-  leftover = length % bytes_per_iter;
-  length -= leftover;
-
-  /* Create registers and memory references for use within the loop.  */
-  riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src);
-  riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest);
-
-  /* Calculate the value that SRC_REG should have after the last iteration
-     of the loop.  */
-  final_src = expand_simple_binop (Pmode, PLUS, src_reg, GEN_INT (length),
-				   0, 0, OPTAB_WIDEN);
-
-  /* Emit the start of the loop.  */
-  label = gen_label_rtx ();
-  emit_label (label);
-
-  /* Emit the loop body.  */
-  riscv_block_move_straight (dest, src, bytes_per_iter);
-
-  /* Move on to the next block.  */
-  riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter));
-  riscv_emit_move (dest_reg, plus_constant (Pmode, dest_reg, bytes_per_iter));
-
-  /* Emit the loop condition.  */
-  test = gen_rtx_NE (VOIDmode, src_reg, final_src);
-  emit_jump_insn (gen_cbranch4 (Pmode, test, src_reg, final_src, label));
-
-  /* Mop up any left-over bytes.  */
-  if (leftover)
-    riscv_block_move_straight (dest, src, leftover);
-  else
-    emit_insn(gen_nop ());
-}
-
-/* Expand a cpymemsi instruction, which copies LENGTH bytes from
-   memory reference SRC to memory reference DEST.  */
-
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
-{
-  if (CONST_INT_P (length))
-    {
-      unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
-      unsigned HOST_WIDE_INT factor, align;
-
-      align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
-      factor = BITS_PER_WORD / align;
-
-      if (optimize_function_for_size_p (cfun)
-	  && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
-	return false;
-
-      if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
-	{
-	  riscv_block_move_straight (dest, src, INTVAL (length));
-	  return true;
-	}
-      else if (optimize && align >= BITS_PER_WORD)
-	{
-	  unsigned min_iter_words
-	    = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
-	  unsigned iter_words = min_iter_words;
-	  unsigned HOST_WIDE_INT bytes = hwi_length;
-	  unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
-	  /* Lengthen the loop body if it shortens the tail.  */
-	  for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
-	    {
-	      unsigned cur_cost = iter_words + words % iter_words;
-	      unsigned new_cost = i + words % i;
-	      if (new_cost <= cur_cost)
-		iter_words = i;
-	    }
-
-	  riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
-	  return true;
-	}
-    }
-  return false;
-}
-
 /* Print symbolic operand OP, which is part of a HIGH or LO_SUM
    in context CONTEXT.  HI_RELOC indicates a high-part reloc.  */
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 532289dd178..43b97f1181e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1872,19 +1872,6 @@ (define_split
   DONE;
 })
 
-(define_expand "cpymemsi"
-  [(parallel [(set (match_operand:BLK 0 "general_operand")
-		   (match_operand:BLK 1 "general_operand"))
-	      (use (match_operand:SI 2 ""))
-	      (use (match_operand:SI 3 "const_int_operand"))])]
-  ""
-{
-  if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
-    DONE;
-  else
-    FAIL;
-})
-
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
@@ -3005,6 +2992,21 @@ (define_insn "riscv_prefetchi_<mode>"
   "prefetch.i\t%a0"
 )
 
+;; Expansions from riscv-string.cc
+
+(define_expand "cpymemsi"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+		   (match_operand:BLK 1 "general_operand"))
+	      (use (match_operand:SI 2 ""))
+	      (use (match_operand:SI 3 "const_int_operand"))])]
+  ""
+{
+  if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+    DONE;
+  else
+    FAIL;
+})
+
 (include "bitmanip.md")
 (include "sync.md")
 (include "peephole.md")
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 7997db3d898..5cb58a74a53 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -63,6 +63,10 @@ riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-string.o: $(srcdir)/config/riscv/riscv-string.cc
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc
 	$(COMPILE) $<
 	$(POSTCOMPILE)
-- 
2.38.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
                   ` (3 preceding siblings ...)
  2022-11-13 23:05 ` [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14 17:16   ` Jeff Law
  2022-11-13 23:05 ` [PATCH 6/7] riscv: Add support for strlen inline expansion Christoph Muellner
  2022-11-13 23:05 ` [PATCH 7/7] riscv: Add support for str(n)cmp " Christoph Muellner
  6 siblings, 1 reply; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

The current implementation of riscv_block_move_straight() emits a sequence
of load-store pairs with maximum width (e.g. 8 bytes for RV64).
The remainder is handed over to move_by_pieces(), which emits code based
on target settings like slow_unaligned_access and overlap_op_by_pieces.

move_by_pieces() will emit overlapping memory accesses with maximum
width only if the given length exceeds the size of one access
(e.g. 15-bytes for 8-byte accesses).

This patch changes the implementation of riscv_block_move_straight()
such that it preserves a remainder within the interval
[delta..2*delta) instead of [0..delta), so that overlapping memory
accesses may be emitted (if the requirements for them are met).

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight):
	  Adjust range for emitted load/store pairs.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv-string.cc              |  8 ++++----
 .../gcc.target/riscv/memcpy-overlapping.c     | 19 ++++++++-----------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 6882f0be269..1137df475be 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -57,18 +57,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
      the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     {
       regs[i] = gen_reg_rtx (mode);
       riscv_emit_move (regs[i], adjust_address (src, mode, offset));
     }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
     riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
@@ -166,7 +166,7 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
 
       if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
 	{
-	  riscv_block_move_straight (dest, src, INTVAL (length));
+	  riscv_block_move_straight (dest, src, hwi_length);
 	  return true;
 	}
       else if (optimize && align >= BITS_PER_WORD)
diff --git a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
index ffb7248bfd1..ef95bfb879b 100644
--- a/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
+++ b/gcc/testsuite/gcc.target/riscv/memcpy-overlapping.c
@@ -25,26 +25,23 @@ COPY_N(15)
 /* Emits 2x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(19)
 
-/* Emits 3x ld and 3x sd.  */
+/* Emits 3x {ld,sd}.  */
 COPY_N(23)
 
 /* The by-pieces infrastructure handles up to 24 bytes.
    So the code below is emitted via cpymemsi/block_move_straight.  */
 
-/* Emits 3x {ld,sd} and 1x {lhu,lbu,sh,sb}.  */
+/* Emits 3x {ld,sd} and 1x {lw,sw}.  */
 COPY_N(27)
 
-/* Emits 3x {ld,sd} and 1x {lw,lbu,sw,sb}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(29)
 
-/* Emits 3x {ld,sd} and 2x {lw,sw}.  */
+/* Emits 4x {ld,sd}.  */
 COPY_N(31)
 
-/* { dg-final { scan-assembler-times "ld\t" 21 } } */
-/* { dg-final { scan-assembler-times "sd\t" 21 } } */
+/* { dg-final { scan-assembler-times "ld\t" 23 } } */
+/* { dg-final { scan-assembler-times "sd\t" 23 } } */
 
-/* { dg-final { scan-assembler-times "lw\t" 5 } } */
-/* { dg-final { scan-assembler-times "sw\t" 5 } } */
-
-/* { dg-final { scan-assembler-times "lbu\t" 2 } } */
-/* { dg-final { scan-assembler-times "sb\t" 2 } } */
+/* { dg-final { scan-assembler-times "lw\t" 3 } } */
+/* { dg-final { scan-assembler-times "sw\t" 3 } } */
-- 
2.38.1



* [PATCH 6/7] riscv: Add support for strlen inline expansion
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
                   ` (4 preceding siblings ...)
  2022-11-13 23:05 ` [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14 18:17   ` Jeff Law
  2022-11-13 23:05 ` [PATCH 7/7] riscv: Add support for str(n)cmp " Christoph Muellner
  6 siblings, 1 reply; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This patch implements the expansion of the strlen builtin
for aligned strings using Zbb instructions (if available).
The expansion emits the following sequence:

      li      a3,-1
      addi    a4,a0,8
.L2:  ld      a5,0(a0)
      addi    a0,a0,8
      orc.b   a5,a5
      beq     a5,a3,6 <.L2>
      not     a5,a5
      ctz     a5,a5
      srli    a5,a5,0x3
      add     a0,a0,a5
      sub     a0,a0,a4

This allows inlining calls to strlen() with optimized code for
determining the length of a string.
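
For illustration, the algorithm behind this sequence can be written as a
C sketch.  Here orc_b() is a portable emulation of the Zbb instruction;
strlen_zbb_sketch() and its assumptions (little-endian target, 8-byte
alignment, zero-padded buffer, GCC's __builtin_ctzll) are illustrative
only and not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Portable emulation of the Zbb orc.b instruction: each byte of X
   becomes 0xff if it was non-zero, 0x00 otherwise.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Sketch of the expanded loop for an 8-byte-aligned, zero-padded string
   on a little-endian target: scan one word per iteration until orc.b
   reports a zero byte, then locate that byte by counting trailing zero
   bits of the complemented mask (this mirrors the not/ctz/srli/add/sub
   epilogue of the emitted sequence).  */
static size_t
strlen_zbb_sketch (const char *s)
{
  const uint64_t *p = (const uint64_t *) s;
  uint64_t w;

  while ((w = orc_b (*p)) == UINT64_MAX)
    p++;

  /* ~w has 0xff exactly at the NUL-byte positions; ctz/8 yields the
     byte index of the first one within the word.  */
  return ((const char *) p - s) + (size_t) (__builtin_ctzll (~w) / 8);
}
```

A big-endian target would count leading zeros (clz) instead of trailing
zeros, as the expansion does.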

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (riscv_expand_strlen): New
	  prototype.
	* config/riscv/riscv-string.cc (riscv_emit_unlikely_jump): New
	  function.
	(GEN_EMIT_HELPER2): New helper macro.
	(GEN_EMIT_HELPER3): New helper macro.
	(do_load_from_addr): New helper function.
	(riscv_expand_strlen_zbb): New function.
	(riscv_expand_strlen): New function.
	* config/riscv/riscv.md (strlen<mode>): Invoke expansion
	  functions for strlen.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv-string.cc              | 149 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  28 ++++
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  13 ++
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  18 +++
 5 files changed, 209 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 344515dbaf4..18187e3bd78 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -96,6 +96,7 @@ rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
+extern bool riscv_expand_strlen (rtx[]);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 1137df475be..bf96522b608 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -38,6 +38,81 @@
 #include "predict.h"
 #include "optabs.h"
 
+/* Emit unlikely jump instruction.  */
+
+static rtx_insn *
+riscv_emit_unlikely_jump (rtx insn)
+{
+  rtx_insn *jump = emit_jump_insn (insn);
+  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
+  return jump;
+}
+
+/* Emit proper instruction depending on type of dest.  */
+
+#define GEN_EMIT_HELPER2(name)				\
+static rtx_insn *					\
+do_## name ## 2(rtx dest, rtx src)			\
+{							\
+  rtx_insn *insn;					\
+  if (GET_MODE (dest) == DImode)			\
+    insn = emit_insn (gen_ ## name ## di2 (dest, src));	\
+  else							\
+    insn = emit_insn (gen_ ## name ## si2 (dest, src));	\
+  return insn;						\
+}
+
+/* Emit proper instruction depending on type of dest.  */
+
+#define GEN_EMIT_HELPER3(name)					\
+static rtx_insn *						\
+do_## name ## 3(rtx dest, rtx src1, rtx src2)			\
+{								\
+  rtx_insn *insn;						\
+  if (GET_MODE (dest) == DImode)				\
+    insn = emit_insn (gen_ ## name ## di3 (dest, src1, src2));	\
+  else								\
+    insn = emit_insn (gen_ ## name ## si3 (dest, src1, src2));	\
+  return insn;							\
+}
+
+GEN_EMIT_HELPER3(add) /* do_add3  */
+GEN_EMIT_HELPER3(sub) /* do_sub3  */
+GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
+GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
+GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
+GEN_EMIT_HELPER2(clz) /* do_clz2  */
+GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
+GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+
+/* Helper function to load a byte or a Pmode register.
+
+   MODE is the mode to use for the load (QImode or Pmode).
+   DEST is the destination register for the data.
+   ADDR_REG is the register that holds the address.
+   ADDR is the address expression to load from.
+
+   This function returns ADDR_REG, i.e. the register that
+   holds the address.  */
+
+static rtx
+do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr_reg);
+  MEM_COPY_ATTRIBUTES (mem, addr);
+  set_mem_size (mem, GET_MODE_SIZE (mode));
+
+  if (mode == QImode)
+    do_zero_extendqi2 (dest, mem);
+  else if (mode == Pmode)
+    emit_move_insn (dest, mem);
+  else
+    gcc_unreachable ();
+
+  return addr_reg;
+}
+
+
 /* Emit straight-line code to move LENGTH bytes from SRC to DEST.
    Assume that the areas do not overlap.  */
 
@@ -192,3 +267,77 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
     }
   return false;
 }
+
+/* If the provided string is aligned, then read XLEN bytes
+   in a loop and use orc.b to find NUL-bytes.  */
+
+static bool
+riscv_expand_strlen_zbb (rtx result, rtx src, rtx align)
+{
+  rtx m1, addr, addr_plus_regsz, word, zeros;
+  rtx loop_label, cond;
+
+  gcc_assert (TARGET_ZBB);
+
+  /* The alignment needs to be known and big enough.  */
+  if (!CONST_INT_P (align) || UINTVAL (align) < GET_MODE_SIZE (Pmode))
+    return false;
+
+  m1 = gen_reg_rtx (Pmode);
+  addr = copy_addr_to_reg (XEXP (src, 0));
+  addr_plus_regsz = gen_reg_rtx (Pmode);
+  word = gen_reg_rtx (Pmode);
+  zeros = gen_reg_rtx (Pmode);
+
+  emit_insn (gen_rtx_SET (m1, constm1_rtx));
+  do_add3 (addr_plus_regsz, addr, GEN_INT (UNITS_PER_WORD));
+
+  loop_label = gen_label_rtx ();
+  emit_label (loop_label);
+
+  /* Load a word and use orc.b to find a zero-byte.  */
+  do_load_from_addr (Pmode, word, addr, src);
+  do_add3 (addr, addr, GEN_INT (UNITS_PER_WORD));
+  do_orcb2 (word, word);
+  cond = gen_rtx_EQ (VOIDmode, word, m1);
+  riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond,
+			    word, m1, loop_label));
+
+  /* Calculate the return value by counting zero-bits.  */
+  do_one_cmpl2 (word, word);
+  if (TARGET_BIG_ENDIAN)
+    do_clz2 (zeros, word);
+  else
+    do_ctz2 (zeros, word);
+
+  do_lshr3 (zeros, zeros, GEN_INT (exact_log2 (BITS_PER_UNIT)));
+  do_add3 (addr, addr, zeros);
+  do_sub3 (result, addr, addr_plus_regsz);
+
+  return true;
+}
+
+/* Expand a strlen operation and return true if successful.
+   Return false if we should let the compiler generate normal
+   code, probably a strlen call.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the source.
+   OPERANDS[2] is the search byte (must be 0).
+   OPERANDS[3] is the alignment in bytes.  */
+
+bool
+riscv_expand_strlen (rtx operands[])
+{
+  rtx result = operands[0];
+  rtx src = operands[1];
+  rtx search_char = operands[2];
+  rtx align = operands[3];
+
+  gcc_assert (search_char == const0_rtx);
+
+  if (TARGET_ZBB)
+    return riscv_expand_strlen_zbb (result, src, align);
+
+  return false;
+}
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 43b97f1181e..f05c764c3d4 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -65,6 +65,9 @@ (define_c_enum "unspec" [
 
   ;; OR-COMBINE
   UNSPEC_ORC_B
+
+  ;; ZBB STRLEN
+  UNSPEC_STRLEN
 ])
 
 (define_c_enum "unspecv" [
@@ -3007,6 +3010,31 @@ (define_expand "cpymemsi"
     FAIL;
 })
 
+;; Search character in string (generalization of strlen).
+;; Argument 0 is the resulting offset
+;; Argument 1 is the string
+;; Argument 2 is the search character
+;; Argument 3 is the alignment
+
+(define_expand "strlen<mode>"
+  [(set (match_operand:X 0 "register_operand")
+	(unspec:X [(match_operand:BLK 1 "general_operand")
+		     (match_operand:SI 2 "const_int_operand")
+		     (match_operand:SI 3 "const_int_operand")]
+		  UNSPEC_STRLEN))]
+  ""
+{
+  rtx search_char = operands[2];
+
+  if (optimize_insn_for_size_p () || search_char != const0_rtx)
+    FAIL;
+
+  if (riscv_expand_strlen (operands))
+    DONE;
+  else
+    FAIL;
+})
+
 (include "bitmanip.md")
 (include "sync.md")
 (include "peephole.md")
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
new file mode 100644
index 00000000000..39da70a5021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zbb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-Os" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
new file mode 100644
index 00000000000..d01b7fc552d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zbb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-Os" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  s = __builtin_assume_aligned (s, 4096);
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler "orc.b\t" } } */
+/* { dg-final { scan-assembler-not "jalr" } } */
+/* { dg-final { scan-assembler-not "call" } } */
+/* { dg-final { scan-assembler-not "jr" } } */
+/* { dg-final { scan-assembler-not "tail" } } */
-- 
2.38.1



* [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
                   ` (5 preceding siblings ...)
  2022-11-13 23:05 ` [PATCH 6/7] riscv: Add support for strlen inline expansion Christoph Muellner
@ 2022-11-13 23:05 ` Christoph Muellner
  2022-11-14 19:28   ` Jeff Law
  2022-11-15  0:46   ` Kito Cheng
  6 siblings, 2 replies; 33+ messages in thread
From: Christoph Muellner @ 2022-11-13 23:05 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This patch implements expansions for the cmpstrsi and the cmpstrnsi
builtins using Zbb instructions (if available).
This allows inlining calls to strcmp() and strncmp().

The expansion emits a peeled comparison sequence (i.e. an unrolled
comparison loop) that compares XLEN bits per step where possible.

The emitted sequence can be controlled by setting the maximum number
of compared bytes (-mstring-compare-inline-limit).
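
The core of the peeled comparison can be illustrated with a C sketch.
orc_b() is a portable emulation of the Zbb instruction, and
strcmp_zbb_sketch() is illustrative only: it resolves a mismatch byte by
byte, while the actual expansion computes the result branchlessly from a
syndrome word (xor/orc.b/ior-not/ctz/bswap/shift/sub).  Like the real
expansion, it assumes sufficient alignment so that the word-sized loads
cannot cross into an unmapped page:

```c
#include <stdint.h>
#include <string.h>

/* Portable emulation of the Zbb orc.b instruction: each byte of X
   becomes 0xff if it was non-zero, 0x00 otherwise.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Sketch of the peeled word-wise comparison: compare 8 bytes per step;
   once the words differ or one of them contains a NUL byte (detected
   via orc.b), resolve the result byte by byte.  */
static int
strcmp_zbb_sketch (const char *a, const char *b)
{
  for (;;)
    {
      uint64_t wa, wb;
      memcpy (&wa, a, 8);
      memcpy (&wb, b, 8);
      if (wa != wb || orc_b (wa) != UINT64_MAX)
	{
	  /* Mismatch or embedded NUL: finish byte-wise.  */
	  while (*a && *a == *b)
	    a++, b++;
	  return (unsigned char) *a - (unsigned char) *b;
	}
      a += 8;
      b += 8;
    }
}
```

The strncmp variant additionally bounds the number of peeled iterations
and hands the remaining bytes over to the libc routine, as the patch does
when -mstring-compare-inline-limit is exceeded.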

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
	  prototype.
	* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
	  macros.
	(GEN_EMIT_HELPER2): New helper macros.
	(expand_strncmp_zbb_sequence): New function.
	(riscv_emit_str_compare_zbb): New function.
	(riscv_expand_strn_compare): New function.
	* config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
	  for strn_compare.
	(cmpstrsi): Invoke expansion functions for strn_compare.
	* config/riscv/riscv.opt: Add new parameter
	  '-mstring-compare-inline-limit'.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv-string.cc              | 344 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  46 +++
 gcc/config/riscv/riscv.opt                    |   5 +
 .../gcc.target/riscv/zbb-strcmp-unaligned.c   |  36 ++
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c   |  55 +++
 6 files changed, 487 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 18187e3bd78..7f334be333c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -97,6 +97,7 @@ rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 /* Routines implemented in riscv-string.c.  */
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_expand_strlen (rtx[]);
+extern bool riscv_expand_strn_compare (rtx[], int);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index bf96522b608..f157e04ac0c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -84,6 +84,11 @@ GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
 GEN_EMIT_HELPER2(clz) /* do_clz2  */
 GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
 GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+GEN_EMIT_HELPER3(xor) /* do_xor3  */
+GEN_EMIT_HELPER3(ashl) /* do_ashl3  */
+GEN_EMIT_HELPER2(bswap) /* do_bswap2  */
+GEN_EMIT_HELPER3(riscv_ior_not) /* do_riscv_ior_not3  */
+GEN_EMIT_HELPER3(riscv_and_not) /* do_riscv_and_not3  */
 
 /* Helper function to load a byte or a Pmode register.
 
@@ -268,6 +273,345 @@ riscv_expand_block_move (rtx dest, rtx src, rtx length)
   return false;
 }
 
+/* Generate the sequence of compares for strcmp/strncmp using zbb instructions.
+   BYTES_TO_COMPARE is the number of bytes to be compared.
+   BASE_ALIGN is the smaller of the alignment of the two strings.
+   ORIG_SRC1 is the unmodified rtx for the first string.
+   ORIG_SRC2 is the unmodified rtx for the second string.
+   DATA1 is the register for loading the first string.
+   DATA2 is the register for loading the second string.
+   HAS_NUL is the register holding non-NUL bytes for NUL-bytes in the string.
+   TARGET is the rtx for the result register (SImode)
+   EQUALITY_COMPARE_REST if set, then we hand over to libc if string matches.
+   END_LABEL is the location before the calculation of the result value.
+   FINAL_LABEL is the location after the calculation of the result value.  */
+
+static void
+expand_strncmp_zbb_sequence (unsigned HOST_WIDE_INT bytes_to_compare,
+			     rtx src1, rtx src2, rtx data1, rtx data2,
+			     rtx target, rtx orc, bool equality_compare_rest,
+			     rtx end_label, rtx final_label)
+{
+  const unsigned HOST_WIDE_INT p_mode_size = GET_MODE_SIZE (Pmode);
+  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+  unsigned HOST_WIDE_INT offset = 0;
+
+  rtx m1 = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (m1, constm1_rtx));
+
+  /* Generate a compare sequence.  */
+  while (bytes_to_compare > 0)
+    {
+      machine_mode load_mode = QImode;
+      unsigned HOST_WIDE_INT load_mode_size = 1;
+      if (bytes_to_compare > 1)
+	{
+	  load_mode = Pmode;
+	  load_mode_size = p_mode_size;
+	}
+      unsigned HOST_WIDE_INT cmp_bytes = 0;
+
+      if (bytes_to_compare >= load_mode_size)
+	cmp_bytes = load_mode_size;
+      else
+	cmp_bytes = bytes_to_compare;
+
+      unsigned HOST_WIDE_INT remain = bytes_to_compare - cmp_bytes;
+
+      /* load_mode_size...bytes we will read
+	 cmp_bytes...bytes we will compare (might be less than load_mode_size)
+	 bytes_to_compare...bytes we will compare (incl. cmp_bytes)
+	 remain...bytes left to compare (excl. cmp_bytes)  */
+
+      rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
+      rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
+
+      do_load_from_addr (load_mode, data1, addr1, src1);
+      do_load_from_addr (load_mode, data2, addr2, src2);
+
+      if (load_mode_size == 1)
+	{
+	  /* Special case for comparing just single (last) byte.  */
+	  gcc_assert (remain == 0);
+
+	  if (!equality_compare_rest)
+	    {
+	      /* Calculate difference and jump to final_label.  */
+	      rtx result = gen_reg_rtx (Pmode);
+	      do_sub3 (result, data1, data2);
+	      emit_insn (gen_movsi (target, gen_lowpart (SImode, result)));
+	      emit_jump_insn (gen_jump (final_label));
+	    }
+	  else
+	    {
+	      /* Compare both bytes and jump to final_label if not equal.  */
+	      rtx result = gen_reg_rtx (Pmode);
+	      do_sub3 (result, data1, data2);
+	      emit_insn (gen_movsi (target, gen_lowpart (SImode, result)));
+	      /* Check if str1[i] is NULL.  */
+	      rtx cond1 = gen_rtx_EQ (VOIDmode, data1, const0_rtx);
+	      riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond1,
+					data1, const0_rtx, final_label));
+	      /* Check if str1[i] == str2[i].  */
+	      rtx cond2 = gen_rtx_NE (VOIDmode, data1, data2);
+	      riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond2,
+					data1, data2, final_label));
+	      /* Processing will fall through to libc calls.  */
+	    }
+	}
+      else
+	{
+	  /* Eliminate irrelevant data (behind the N-th character).  */
+	  if (bytes_to_compare < p_mode_size)
+	    {
+	      gcc_assert (remain == 0);
+	     /* Set a NUL-byte after the relevant data (behind the string).  */
+	      unsigned long im = 0xffUL;
+	      rtx imask = gen_rtx_CONST_INT (Pmode, im);
+	      rtx m_reg = gen_reg_rtx (Pmode);
+	      emit_insn (gen_rtx_SET (m_reg, imask));
+	      do_ashl3 (m_reg, m_reg, GEN_INT (cmp_bytes * BITS_PER_UNIT));
+	      do_riscv_and_not3 (data1, m_reg, data1);
+	      do_riscv_and_not3 (data2, m_reg, data2);
+	      do_orcb2 (orc, data1);
+	      emit_jump_insn (gen_jump (end_label));
+	    }
+	  else
+	    {
+	      /* Check if data1 contains a NUL character.  */
+	      do_orcb2 (orc, data1);
+	      rtx cond1 = gen_rtx_NE (VOIDmode, orc, m1);
+	      riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond1, orc, m1,
+						      end_label));
+
+	      /* Break out if data1 != data2.  */
+	      rtx cond2 = gen_rtx_NE (VOIDmode, data1, data2);
+	      riscv_emit_unlikely_jump (gen_cbranch4 (Pmode, cond2, data1,
+						      data2, end_label));
+
+	      /* Fast-exit for complete and equal strings.  */
+	      if (remain == 0 && !equality_compare_rest)
+		{
+		  /* All compared and everything was equal.  */
+		  emit_insn (gen_rtx_SET (target, gen_rtx_CONST_INT (SImode, 0)));
+		  emit_jump_insn (gen_jump (final_label));
+		}
+	    }
+	}
+
+      offset += cmp_bytes;
+      bytes_to_compare -= cmp_bytes;
+    }
+  /* Processing will fall through to libc calls.  */
+}
+
+/* Emit a string comparison sequence using Zbb instruction.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the first source.
+   OPERANDS[2] is the second source.
+   If NO_LENGTH is zero, then:
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the alignment in bytes.
+   If NO_LENGTH is nonzero, then:
+   OPERANDS[3] is the alignment in bytes.
+   BYTES_TO_COMPARE is the maximum number of bytes to compare.
+   EQUALITY_COMPARE_REST defines if str(n)cmp should be called on equality.
+ */
+
+static bool
+riscv_emit_str_compare_zbb (rtx operands[], int no_length,
+			    unsigned HOST_WIDE_INT bytes_to_compare,
+			    bool equality_compare_rest)
+{
+  const unsigned HOST_WIDE_INT p_mode_size = GET_MODE_SIZE (Pmode);
+  rtx target = operands[0];
+  rtx src1 = operands[1];
+  rtx src2 = operands[2];
+  rtx bytes_rtx = NULL;
+  rtx align_rtx = operands[3];
+
+  if (!no_length)
+    {
+      bytes_rtx = operands[3];
+      align_rtx = operands[4];
+    }
+
+  gcc_assert (TARGET_ZBB);
+
+  /* Enable only if we can access at least one XLEN-register.  */
+  if (bytes_to_compare < p_mode_size)
+    return false;
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (bytes_to_compare > IMM_REACH)
+    return false;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
+  /* We need to know the alignment.  */
+  if (!CONST_INT_P (align_rtx))
+    return false;
+
+  unsigned HOST_WIDE_INT base_align = UINTVAL (align_rtx);
+  unsigned HOST_WIDE_INT required_align = p_mode_size;
+  if (base_align < required_align)
+    return false;
+
+  rtx data1 = gen_reg_rtx (Pmode);
+  rtx data2 = gen_reg_rtx (Pmode);
+  rtx orc = gen_reg_rtx (Pmode);
+  rtx end_label = gen_label_rtx ();
+  rtx final_label = gen_label_rtx ();
+
+  /* Generate a sequence of zbb instructions to compare out
+     to the length specified.  */
+  expand_strncmp_zbb_sequence (bytes_to_compare, src1, src2, data1, data2,
+			       target, orc, equality_compare_rest,
+			       end_label, final_label);
+
+  if (equality_compare_rest)
+    {
+      /* Update pointers past what has been compared already.  */
+      rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+      rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+      unsigned HOST_WIDE_INT offset = bytes_to_compare;
+      rtx src1 = force_reg (Pmode,
+			    gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset)));
+      rtx src2 = force_reg (Pmode,
+			    gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset)));
+
+      /* Construct call to strcmp/strncmp to compare the rest of the string.  */
+      if (no_length)
+	{
+	  tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
+	  emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+				   target, LCT_NORMAL, GET_MODE (target),
+				   src1, Pmode, src2, Pmode);
+	}
+      else
+	{
+	  unsigned HOST_WIDE_INT bytes = UINTVAL (bytes_rtx);
+	  unsigned HOST_WIDE_INT delta = bytes - bytes_to_compare;
+	  gcc_assert (delta > 0);
+	  rtx len_rtx = gen_reg_rtx (Pmode);
+	  emit_move_insn (len_rtx, gen_int_mode (delta, Pmode));
+	  tree fun = builtin_decl_explicit (BUILT_IN_STRNCMP);
+	  emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+				   target, LCT_NORMAL, GET_MODE (target),
+				   src1, Pmode, src2, Pmode, len_rtx, Pmode);
+	}
+
+      emit_jump_insn (gen_jump (final_label));
+    }
+
+  emit_barrier (); /* No fall-through.  */
+
+  emit_label (end_label);
+
+  /* Convert non-equal bytes into non-NUL bytes.  */
+  rtx diff = gen_reg_rtx (Pmode);
+  do_xor3 (diff, data1, data2);
+  do_orcb2 (diff, diff);
+
+  /* Convert non-equal or NUL-bytes into non-NUL bytes.  */
+  rtx syndrome = gen_reg_rtx (Pmode);
+  do_riscv_ior_not3 (syndrome, orc, diff);
+
+  /* Count the number of equal bits from the beginning of the word.  */
+  rtx shift = gen_reg_rtx (Pmode);
+  do_ctz2 (shift, syndrome);
+
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+
+  /* The most-significant-non-zero bit of the syndrome marks either the
+     first bit that is different, or the top bit of the first zero byte.
+     Shifting left now will bring the critical information into the
+     top bits.  */
+  do_ashl3 (data1, data1, gen_lowpart (QImode, shift));
+  do_ashl3 (data2, data2, gen_lowpart (QImode, shift));
+
+  /* But we need to zero-extend (char is unsigned) the value and then
+     perform a signed 32-bit subtraction.  */
+  unsigned int shiftr = p_mode_size * BITS_PER_UNIT - 8;
+  do_lshr3 (data1, data1, GEN_INT (shiftr));
+  do_lshr3 (data2, data2, GEN_INT (shiftr));
+
+  rtx result = gen_reg_rtx (Pmode);
+  do_sub3 (result, data1, data2);
+  emit_insn (gen_movsi (target, gen_lowpart (SImode, result)));
+
+  /* And we are done.  */
+  emit_label (final_label);
+  return true;
+}
+
+/* Expand a string compare operation with length, and return
+   true if successful.  Return false if we should let the
+   compiler generate normal code, probably a strncmp call.
+   If NO_LENGTH is nonzero, there is no upper bound on the length of
+   the strings.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the first source.
+   OPERANDS[2] is the second source.
+   If NO_LENGTH is zero, then:
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the alignment in bytes.
+   If NO_LENGTH is nonzero, then:
+   OPERANDS[3] is the alignment in bytes.  */
+
+bool
+riscv_expand_strn_compare (rtx operands[], int no_length)
+{
+  rtx bytes_rtx = NULL;
+  const unsigned HOST_WIDE_INT compare_max = riscv_string_compare_inline_limit;
+  unsigned HOST_WIDE_INT compare_length; /* How much to compare inline.  */
+  bool equality_compare_rest = false; /* Call libc to compare remainder.  */
+
+  if (riscv_string_compare_inline_limit == 0)
+    return false;
+
+  /* Decide how many bytes to compare inline and what to do if there is
+     no difference detected at the end of the compared bytes.
+     We might call libc to continue the comparison.  */
+  if (no_length)
+    {
+      compare_length = compare_max;
+      equality_compare_rest = true;
+    }
+  else
+    {
+      /* If we have a length, it must be constant.  */
+      bytes_rtx = operands[3];
+      if (!CONST_INT_P (bytes_rtx))
+	return false;
+
+      unsigned HOST_WIDE_INT bytes = UINTVAL (bytes_rtx);
+      if (bytes <= compare_max)
+	{
+	  compare_length = bytes;
+	  equality_compare_rest = false;
+	}
+      else
+	{
+	  compare_length = compare_max;
+	  equality_compare_rest = true;
+	}
+    }
+
+  if (TARGET_ZBB)
+    {
+      return riscv_emit_str_compare_zbb (operands, no_length, compare_length,
+					 equality_compare_rest);
+    }
+
+  return false;
+}
+
 /* If the provided string is aligned, then read XLEN bytes
    in a loop and use orc.b to find NUL-bytes.  */
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index f05c764c3d4..dce33a4b638 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3010,6 +3010,52 @@ (define_expand "cpymemsi"
     FAIL;
 })
 
+;; String compare N insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the source1
+;; Argument 2 is the source2
+;; Argument 3 is the length
+;; Argument 4 is the alignment
+
+(define_expand "cmpstrnsi"
+  [(parallel [(set (match_operand:SI 0)
+	      (compare:SI (match_operand:BLK 1)
+			  (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))
+	      (use (match_operand:SI 4))])]
+  ""
+{
+  if (optimize_insn_for_size_p ())
+    FAIL;
+
+  if (riscv_expand_strn_compare (operands, 0))
+    DONE;
+  else
+    FAIL;
+})
+
+;; String compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the first source
+;; Argument 2 is the second source
+;; Argument 3 is the alignment
+
+(define_expand "cmpstrsi"
+  [(parallel [(set (match_operand:SI 0)
+	      (compare:SI (match_operand:BLK 1)
+			  (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))])]
+  ""
+{
+  if (optimize_insn_for_size_p ())
+    FAIL;
+
+  if (riscv_expand_strn_compare (operands, 1))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Search character in string (generalization of strlen).
 ;; Argument 0 is the resulting offset
 ;; Argument 1 is the string
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 7c3ca48d1cc..fdf768ae9a7 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -249,3 +249,8 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+mstring-compare-inline-limit=
+Target Var(riscv_string_compare_inline_limit) Init(64) RejectNegative Joined UInteger Save
+Maximum number of bytes to compare inline for string comparisons.
+
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
new file mode 100644
index 00000000000..2126c849e0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -mstring-compare-inline-limit=64" } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
new file mode 100644
index 00000000000..3465e7ffee3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -mstring-compare-inline-limit=64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Oz" "-Og" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 8+1 orc.b instructions.  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Not expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "orc.b\t" 32 } } */
-- 
2.38.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
  2022-11-13 23:05 ` [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param Christoph Muellner
@ 2022-11-14  2:48   ` Vineet Gupta
  2022-11-14  7:59     ` Philipp Tomsich
  0 siblings, 1 reply; 33+ messages in thread
From: Vineet Gupta @ 2022-11-14  2:48 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law



On 11/13/22 15:05, Christoph Muellner wrote:
>   
> +static bool
> +riscv_overlap_op_by_pieces (void)
> +{
> +  return tune_param->overlap_op_by_pieces;

Does this not need to be gated on unaligned access enabled as well.

-Vineet



* Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
  2022-11-14  2:48   ` Vineet Gupta
@ 2022-11-14  7:59     ` Philipp Tomsich
  2022-11-14  8:29       ` Christoph Müllner
  0 siblings, 1 reply; 33+ messages in thread
From: Philipp Tomsich @ 2022-11-14  7:59 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Jeff Law

On Mon, 14 Nov 2022 at 03:48, Vineet Gupta <vineetg@rivosinc.com> wrote:
>
>
>
> On 11/13/22 15:05, Christoph Muellner wrote:
> >
> > +static bool
> > +riscv_overlap_op_by_pieces (void)
> > +{
> > +  return tune_param->overlap_op_by_pieces;
>
> Does this not need to be gated on unaligned access enabled as well.

I assume you mean "&& !STRICT_ALIGNMENT"?

Philipp.


* Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
  2022-11-14  7:59     ` Philipp Tomsich
@ 2022-11-14  8:29       ` Christoph Müllner
  2022-11-14 19:04         ` Jeff Law
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Müllner @ 2022-11-14  8:29 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Vineet Gupta, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Jeff Law


On Mon, Nov 14, 2022 at 8:59 AM Philipp Tomsich <philipp.tomsich@vrull.eu>
wrote:

> On Mon, 14 Nov 2022 at 03:48, Vineet Gupta <vineetg@rivosinc.com> wrote:
> >
> >
> >
> > On 11/13/22 15:05, Christoph Muellner wrote:
> > >
> > > +static bool
> > > +riscv_overlap_op_by_pieces (void)
> > > +{
> > > +  return tune_param->overlap_op_by_pieces;
> >
> > Does this not need to be gated on unaligned access enabled as well.
>
> I assume you mean "&& !STRICT_ALIGNMENT"?
>

I think the case that slow_unaligned_access and overlap_op_by_pieces will
both be set will not occur (we can defer the discussion about that until
then).
Gating overlap_op_by_pieces with !TARGET_STRICT_ALIGN is a good idea.
It will be fixed for a v2.

Thanks,
Christoph



>
> Philipp.
>


* Re: [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec
  2022-11-13 23:05 ` [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec Christoph Muellner
@ 2022-11-14 16:51   ` Jeff Law
  2022-11-14 17:53     ` Jeff Law
  2022-11-14 19:05     ` Philipp Tomsich
  0 siblings, 2 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-14 16:51 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Philipp Tomsich <philipp.tomsich@vrull.eu>
>
> As a basis for optimized string functions (e.g., the by-pieces
> implementations), we need orc.b available.  This adds orc.b as an
> unspec, so we can expand to it.
>
> gcc/ChangeLog:
>
>          * config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an
> 	  unspec.
>          * config/riscv/riscv.md: Add UNSPEC_ORC_B.
In general, we should prefer to express things as "real" RTL rather than 
UNSPECS.  In this particular case expressing the orc could be done with 
a handful of IOR expressions, though they'd probably need to reference 
byte SUBREGs of the input and I dislike explicit SUBREGs in the md file 
even more than UNSPECs.  So....

OK.


Jeff


ps.  We could consider this a reduc_ior_scal insn, but that may be 
actively harmful.  Having vector ops on the general and vector registers 
is a wart I hope we can avoid.
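[For reference: orc.b (OR-Combine, byte granularity) maps every nonzero byte of its source to 0xff and every zero byte to 0x00, which is exactly why a handful of IORs over byte SUBREGs could model it.  A minimal C sketch of that semantics (the helper name is ours; XLEN == 64 assumed):

```c
#include <stdint.h>

/* Reference model of orc.b: every nonzero byte of the input becomes
   0xff, every zero byte stays 0x00.  Hypothetical helper name; an
   XLEN of 64 is assumed.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}
```

After orc.b, a word is all-ones exactly when it contained no zero byte, which is what makes it useful for NUL detection in the string expansions.]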




* Re: [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visiblity
  2022-11-13 23:05 ` [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visiblity Christoph Muellner
@ 2022-11-14 16:55   ` Jeff Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-14 16:55 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> INSNs are usually postfixed by a number representing the argument count.
> Given the instructions will be used in a later commit, let's make them
> visible, but add a "riscv_" prefix to avoid conflicts with standard
> INSNs.
>
> gcc/ChangeLog:
>
> 	* config/riscv/bitmanip.md (*<optab>_not<mode>): Rename INSN.
> 	(riscv_<optab>_not<mode>3): Rename INSN.
> 	(*xor_not<mode>): Rename INSN.
> 	(xor_not<mode>3): Rename INSN.

Not strictly necessary, but given how often I've seen ports expose an 
insn with a standard name yet ever so slightly different semantics, and 
the ensuing code correctness issues, I like the idea of prefixing.


OK

jeff




* Re: [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file
  2022-11-13 23:05 ` [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file Christoph Muellner
@ 2022-11-14 16:56   ` Jeff Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-14 16:56 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> Let's try to not accumulate too much functionality in one single file
> as this does not really help maintaining or extending the code.
> So in order to add more similar functionality like riscv_block_move_loop
> let's move this function to a separate file.
>
> This change does not do any functional changes.
It does modify a single line in the existing code that
check_GNU_style.py complained about.
>
> gcc/ChangeLog:
>
> 	* config.gcc: Add new object riscv-string.o
> 	* config/riscv/riscv-protos.h (riscv_expand_block_move): Remove
> 	  duplicated prototype and move to new section for
> 	  riscv-string.cc.
> 	* config/riscv/riscv.cc (riscv_block_move_straight): Remove function.
> 	(riscv_adjust_block_mem): Likewise.
> 	(riscv_block_move_loop): Likewise.
> 	(riscv_expand_block_move): Likewise.
> 	* config/riscv/riscv.md (cpymemsi): Move to new section for
> 	  riscv-string.cc.
> 	* config/riscv/t-riscv: Add compile rule for riscv-string.o
> 	* config/riscv/riscv-string.c: New file.

OK.  Note I suspect the commit hooks are going to complain about your 
ChangeLog formatting.


jeff




* Re: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight
  2022-11-13 23:05 ` [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight Christoph Muellner
@ 2022-11-14 17:16   ` Jeff Law
  2022-11-14 19:01     ` Christoph Müllner
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff Law @ 2022-11-14 17:16 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> The current implementation of riscv_block_move_straight() emits a couple
> of load-store pairs with maximum width (e.g. 8-byte for RV64).
> The remainder is handed over to move_by_pieces(), which emits code based
> on target settings like slow_unaligned_access and overlap_op_by_pieces.
>
> move_by_pieces() will emit overlapping memory accesses with maximum
> width only if the given length exceeds the size of one access
> (e.g. 15-bytes for 8-byte accesses).
>
> This patch changes the implementation of riscv_block_move_straight()
> such, that it preserves a remainder within the interval
> [delta..2*delta) instead of [0..delta), so that overlapping memory
> access may be emitted (if the requirements for them are given).
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv-string.c (riscv_block_move_straight):
> 	  Adjust range for emitted load/store pairs.

The change to riscv_expand_block_move isn't noted in the ChangeLog.  OK 
with that fixed (I'm assuming you want to attempt to use overlapping 
word ops for that case).


jeff
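[To illustrate the [delta..2*delta) remainder handling discussed above: with delta == 8, a tail of n bytes (8 <= n <= 16) can be covered by two full-width accesses, the second starting at n - 8 so it overlaps the first whenever n < 16.  A sketch with our own names, where memcpy stands in for the XLEN-wide load/store pair the backend would emit:

```c
#include <string.h>

/* Overlapping tail move for 8 <= n <= 16 (delta == 8): one access at
   offset 0 and one at offset n - 8; the two overlap when n < 16.
   memcpy stands in for the XLEN load/store pair; names are ours.  */
static void
copy_tail_overlapping (char *dst, const char *src, size_t n)
{
  memcpy (dst, src, 8);                  /* leading full-width access */
  memcpy (dst + n - 8, src + n - 8, 8);  /* overlapping trailing access */
}

/* Self-check against a byte-exact reference for every n in [8, 16].  */
static int
copy_tail_ok (void)
{
  static const char src[17] = "0123456789abcdef";
  for (size_t n = 8; n <= 16; n++)
    {
      char dst[16] = { 0 };
      copy_tail_overlapping (dst, src, n);
      if (memcmp (dst, src, n) != 0)
	return 0;
    }
  return 1;
}
```

The point of keeping the remainder in [delta..2*delta) is precisely that two such accesses always suffice, instead of a shrinking 4/2/1-byte sequence.]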




* Re: [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec
  2022-11-14 16:51   ` Jeff Law
@ 2022-11-14 17:53     ` Jeff Law
  2022-11-14 19:05     ` Philipp Tomsich
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-14 17:53 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/14/22 09:51, Jeff Law wrote:
>
> On 11/13/22 16:05, Christoph Muellner wrote:
>> From: Philipp Tomsich <philipp.tomsich@vrull.eu>
>>
>> As a basis for optimized string functions (e.g., the by-pieces
>> implementations), we need orc.b available.  This adds orc.b as an
>> unspec, so we can expand to it.
>>
>> gcc/ChangeLog:
>>
>>          * config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an
>>       unspec.
>>          * config/riscv/riscv.md: Add UNSPEC_ORC_B.
> In general, we should prefer to express things as "real" RTL rather 
> than UNSPECS.  In this particular case expressing the orc could be 
> done with a handful of IOR expressions, though they'd probably need to 
> reference byte SUBREGs of the input and I dislike explicit SUBREGs in 
> the md file even more than UNSPECs. 

Mis-read the specs on orc.  So ignore the comment about expressing this 
as a handful of IORs and about it being reduc_ior_scal.

jeff




* Re: [PATCH 6/7] riscv: Add support for strlen inline expansion
  2022-11-13 23:05 ` [PATCH 6/7] riscv: Add support for strlen inline expansion Christoph Muellner
@ 2022-11-14 18:17   ` Jeff Law
  2022-11-14 21:07     ` Christoph Müllner
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff Law @ 2022-11-14 18:17 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> This patch implements the expansion of the strlen builtin
> using Zbb instructions (if available) for aligned strings
> using the following sequence:
>
>        li      a3,-1
>        addi    a4,a0,8
> .L2:  ld      a5,0(a0)
>        addi    a0,a0,8
>        orc.b   a5,a5
>        beq     a5,a3,6 <.L2>
>        not     a5,a5
>        ctz     a5,a5
>        srli    a5,a5,0x3
>        add     a0,a0,a5
>        sub     a0,a0,a4
>
> This allows to inline calls to strlen(), with optimized code for
> determining the length of a string.
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv-protos.h (riscv_expand_strlen): New
> 	  prototype.
> 	* config/riscv/riscv-string.cc (riscv_emit_unlikely_jump): New
> 	  function.
> 	(GEN_EMIT_HELPER2): New helper macro.
> 	(GEN_EMIT_HELPER3): New helper macro.
> 	(do_load_from_addr): New helper function.
> 	(riscv_expand_strlen_zbb): New function.
> 	(riscv_expand_strlen): New function.
> 	* config/riscv/riscv.md (strlen<mode>): Invoke expansion
> 	  functions for strlen.
>
>
> +extern bool riscv_expand_strlen (rtx[]);

Consider adding the number of elements in the RTX array here. Martin S's 
work from a little while ago will make use of it to try and catch 
over-reads and over-writes if the data is available.


>   
>   /* Information about one CPU we know about.  */
>   struct riscv_cpu_info {
> diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
> index 1137df475be..bf96522b608 100644
> --- a/gcc/config/riscv/riscv-string.cc
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -38,6 +38,81 @@
>   #include "predict.h"
>   #include "optabs.h"
>   
> +/* Emit unlikely jump instruction.  */
> +
> +static rtx_insn *
> +riscv_emit_unlikely_jump (rtx insn)
> +{
> +  rtx_insn *jump = emit_jump_insn (insn);
> +  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
> +  return jump;
> +}

I was a bit surprised that we didn't have this as a generic routine.   
Consider adding this to emit-rtl.cc along with its companion 
emit_likely_jump.  Not a requirement to move forward, but it seems like 
the right thing to do.




> +
> +/* Emit proper instruction depending on type of dest.  */

s/type/mode/



> +
> +/* Emit proper instruction depending on type of dest.  */

s/type/mode/


You probably want to undefine GEN_EMIT_HELPER once you're done with 
them.  That's become fairly standard practice for these kinds of helper 
macros.

OK with the nits fixed.  Your call on whether or not to move the 
implementation of emit_likely_jump and emit_unlikely_jump into emit-rtl.cc.


Jeff
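[The quoted sequence is a word-at-a-time strlen: load XLEN bytes, apply orc.b, and loop while the result is all-ones (no NUL in the word); afterwards ctz of the inverted value, divided by 8, gives the byte index of the NUL.  A little-endian C model under stated assumptions (memcpy stands in for the aligned XLEN load; helper names are ours, not GCC's):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-granular OR-combine, as orc.b does: nonzero byte -> 0xff.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Model of the quoted Zbb strlen loop.  Assumes little endian and a
   string whose allocation is padded to a multiple of 8 bytes, so the
   whole-word reads past the NUL stay in bounds (the real expansion
   relies on alignment for this).  */
static size_t
strlen_zbb_model (const char *s)
{
  size_t i = 0;
  for (;;)
    {
      uint64_t w;
      memcpy (&w, s + i, 8);		/* stands in for the XLEN load */
      uint64_t o = orc_b (w);
      if (o != ~(uint64_t) 0)		/* some byte was NUL */
	/* ~o has its lowest set bit inside the first NUL byte.  */
	return i + ((size_t) __builtin_ctzll (~o) >> 3);
      i += 8;
    }
}

/* Padded sample strings for the model.  */
static const char ex_hello[16] = "hello";
static const char ex_long[24] = "0123456789abcdefg";
```

This mirrors the quoted assembly: the `beq a5,a3` back-edge is the all-ones test, and the `not`/`ctz`/`srli` tail is the index computation.]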




* Re: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight
  2022-11-14 17:16   ` Jeff Law
@ 2022-11-14 19:01     ` Christoph Müllner
  2022-11-14 19:05       ` Jeff Law
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Müllner @ 2022-11-14 19:01 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Vineet Gupta


On Mon, Nov 14, 2022 at 6:16 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> > The current implementation of riscv_block_move_straight() emits a couple
> > of load-store pairs with maximum width (e.g. 8-byte for RV64).
> > The remainder is handed over to move_by_pieces(), which emits code based
> > on target settings like slow_unaligned_access and overlap_op_by_pieces.
> >
> > move_by_pieces() will emit overlapping memory accesses with maximum
> > width only if the given length exceeds the size of one access
> > (e.g. 15-bytes for 8-byte accesses).
> >
> > This patch changes the implementation of riscv_block_move_straight()
> > such, that it preserves a remainder within the interval
> > [delta..2*delta) instead of [0..delta), so that overlapping memory
> > access may be emitted (if the requirements for them are given).
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-string.c (riscv_block_move_straight):
> >         Adjust range for emitted load/store pairs.
>
> The change to riscv_expand_block_move isn't noted in the ChangeLog.  OK
> with that fixed (I'm assuming you want to attempt to use overlapping
> word ops for that case).
>

The change in riscv_expand_block_move is a code cleanup.
At the beginning of riscv_expand_block_move we do the following:
  unsigned HOST_WIDE_INT length = UINTVAL (length);
The signature of riscv_block_move_straight wants an "unsigned HOST_WIDE_INT
length".
So we can simply reuse length instead of doing "INTVAL (length)", which is
redundant and uses the wrong signedness (INTVAL vs UINTVAL).

Also, the ChangeLog entry for the test was missing.

Thanks,
Christoph


>
>
> jeff
>
>
>


* Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
  2022-11-14  8:29       ` Christoph Müllner
@ 2022-11-14 19:04         ` Jeff Law
  2022-11-14 19:07           ` Christoph Müllner
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff Law @ 2022-11-14 19:04 UTC (permalink / raw)
  To: Christoph Müllner, Philipp Tomsich
  Cc: Vineet Gupta, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman



On 11/14/22 01:29, Christoph Müllner wrote:
>
>
> On Mon, Nov 14, 2022 at 8:59 AM Philipp Tomsich 
> <philipp.tomsich@vrull.eu> wrote:
>
>     On Mon, 14 Nov 2022 at 03:48, Vineet Gupta <vineetg@rivosinc.com>
>     wrote:
>     >
>     >
>     >
>     > On 11/13/22 15:05, Christoph Muellner wrote:
>     > >
>     > > +static bool
>     > > +riscv_overlap_op_by_pieces (void)
>     > > +{
>     > > +  return tune_param->overlap_op_by_pieces;
>     >
>     > Does this not need to be gated on unaligned access enabled as well.
>
>     I assume you mean "&& !STRICT_ALIGNMENT"?
>
>
> I think the case that slow_unaligned_access and overlap_op_by_pieces will
> both be set will not occur (we can defer the discussion about that 
> until then).
> Gating overlap_op_by_pieces with !TARGET_STRICT_ALIGN is a good idea.
> It will be fixed for a v2.

OK with that change.


I'm still working through 7/7.  I may not have enough left in my tank to 
get through that one today.


jeff



* Re: [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight
  2022-11-14 19:01     ` Christoph Müllner
@ 2022-11-14 19:05       ` Jeff Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-14 19:05 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 11/14/22 12:01, Christoph Müllner wrote:
>
>
> On Mon, Nov 14, 2022 at 6:16 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>     On 11/13/22 16:05, Christoph Muellner wrote:
>     > From: Christoph Müllner <christoph.muellner@vrull.eu>
>     >
>     > The current implementation of riscv_block_move_straight() emits
>     a couple
>     > of load-store pairs with maximum width (e.g. 8-byte for RV64).
>     > The remainder is handed over to move_by_pieces(), which emits
>     > code based on target settings like slow_unaligned_access and
>     > overlap_op_by_pieces.
>     >
>     > move_by_pieces() will emit overlapping memory accesses with maximum
>     > width only if the given length exceeds the size of one access
>     > (e.g. 15-bytes for 8-byte accesses).
>     >
>     > This patch changes the implementation of riscv_block_move_straight()
>     > such, that it preserves a remainder within the interval
>     > [delta..2*delta) instead of [0..delta), so that overlapping memory
>     > access may be emitted (if the requirements for them are given).
>     >
>     > gcc/ChangeLog:
>     >
>     >       * config/riscv/riscv-string.c (riscv_block_move_straight):
>     >         Adjust range for emitted load/store pairs.
>
>     The change to riscv_expand_block_move isn't noted in the
>     ChangeLog.  OK
>     with that fixed (I'm assuming you want to attempt to use overlapping
>     word ops for that case).
>
>
> The change in riscv_expand_block_move is a code cleanup.
> At the beginning of riscv_expand_block_move we do the following:
>   unsigned HOST_WIDE_INT length = UINTVAL (length);

AH, missed that.


Thanks,

Jeff


* Re: [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec
  2022-11-14 16:51   ` Jeff Law
  2022-11-14 17:53     ` Jeff Law
@ 2022-11-14 19:05     ` Philipp Tomsich
  1 sibling, 0 replies; 33+ messages in thread
From: Philipp Tomsich @ 2022-11-14 19:05 UTC (permalink / raw)
  To: Jeff Law
  Cc: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Vineet Gupta

On Mon, 14 Nov 2022 at 17:51, Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Philipp Tomsich <philipp.tomsich@vrull.eu>
> >
> > As a basis for optimized string functions (e.g., the by-pieces
> > implementations), we need orc.b available.  This adds orc.b as an
> > unspec, so we can expand to it.
> >
> > gcc/ChangeLog:
> >
> >          * config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an
> >         unspec.
> >          * config/riscv/riscv.md: Add UNSPEC_ORC_B.
> In general, we should prefer to express things as "real" RTL rather than
> UNSPECS.  In this particular case expressing the orc could be done with
> a handful of IOR expressions, though they'd probably need to reference
> byte SUBREGs of the input and I dislike explicit SUBREGs in the md file
> even more than UNSPECs.  So....
>
> OK.

Applied to master. Thanks!
(After using emacs' whitespace-cleanup to fix the damage that
Christoph's vim did to the ChangeLog...)

Philipp.

>
>
> Jeff
>
>
> ps.  We could consider this a reduc_ior_scal insn, but that may be
> actively harmful.  Having vector ops on the general and vector registers
> is a wart I hope we can avoid.
>
>


* Re: [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param
  2022-11-14 19:04         ` Jeff Law
@ 2022-11-14 19:07           ` Christoph Müllner
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Müllner @ 2022-11-14 19:07 UTC (permalink / raw)
  To: Jeff Law
  Cc: Philipp Tomsich, Vineet Gupta, gcc-patches, Kito Cheng,
	Jim Wilson, Palmer Dabbelt, Andrew Waterman


On Mon, Nov 14, 2022 at 8:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
> On 11/14/22 01:29, Christoph Müllner wrote:
>
>
>
> On Mon, Nov 14, 2022 at 8:59 AM Philipp Tomsich <philipp.tomsich@vrull.eu>
> wrote:
>
>> On Mon, 14 Nov 2022 at 03:48, Vineet Gupta <vineetg@rivosinc.com> wrote:
>> >
>> >
>> >
>> > On 11/13/22 15:05, Christoph Muellner wrote:
>> > >
>> > > +static bool
>> > > +riscv_overlap_op_by_pieces (void)
>> > > +{
>> > > +  return tune_param->overlap_op_by_pieces;
>> >
>> > Does this not need to be gated on unaligned access enabled as well.
>>
>> I assume you mean "&& !STRICT_ALIGNMENT"?
>>
>
> I think the case that slow_unaligned_access and overlap_op_by_pieces will
> both be set will not occur (we can defer the discussion about that until
> then).
> Gating overlap_op_by_pieces with !TARGET_STRICT_ALIGN is a good idea.
> It will be fixed for a v2.
>
> OK with that change.
>
>
> I'm still working through 7/7.  I may not have enough left in my tank to
> get through that one today.
>

No worries!
Thank you very much for the fast reviews!


>
> jeff
>
>


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-13 23:05 ` [PATCH 7/7] riscv: Add support for str(n)cmp " Christoph Muellner
@ 2022-11-14 19:28   ` Jeff Law
  2022-11-14 21:49     ` Christoph Müllner
  2022-11-15  0:46   ` Kito Cheng
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff Law @ 2022-11-14 19:28 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta


On 11/13/22 16:05, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> This patch implements expansions for the cmpstrsi and the cmpstrnsi
> builtins using Zbb instructions (if available).
> This allows to inline calls to strcmp() and strncmp().
>
> The expansion basically emits a peeled comparison sequence (i.e. a peeled
> comparison loop) which compares XLEN bits per step if possible.
>
> The emitted sequence can be controlled, by setting the maximum number
> of compared bytes (-mstring-compare-inline-limit).
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> 	  prototype.
> 	* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> 	  macros.
> 	(GEN_EMIT_HELPER2): New helper macros.
> 	(expand_strncmp_zbb_sequence): New function.
> 	(riscv_emit_str_compare_zbb): New function.
> 	(riscv_expand_strn_compare): New function.
> 	* config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> 	  for strn_compare.
> 	(cmpstrsi): Invoke expansion functions for strn_compare.
> 	* config/riscv/riscv.opt: Add new parameter
> 	  '-mstring-compare-inline-limit'.

Presumably the hybrid inline + out of line approach is to capture the 
fact that most strings compare unequal early, then punt out to the 
library if they don't follow that model?  It looks like we're structured 
for that case by peeling iterations rather than having a fully inlined 
approach.  Just want to confirm...


I was a bit worried about the "readahead" problem that arises when 
reading more than a byte and a NUL is found in the first string.  If 
you're not careful, the readahead of the second string could fault.  But 
it looks like we avoid that by requiring word alignment on both strings.
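To spell the hazard out: a word-wide load that starts before the NUL can extend past it, and on the second string those trailing bytes may be unmapped. An aligned load never crosses a page boundary, so if the NUL is inside the loaded word, every byte of that load is on a mapped page. For reference, a portable zero-byte test (the job orc.b does in the Zbb sequence) looks like this (illustrative; the name is made up):

```c
#include <stdint.h>

/* Nonzero iff some byte of W is zero.  This is the classic exact
   zero-byte test; in the Zbb expansion, orc.b plus a compare against
   all-ones serves the same purpose.  */
static int
word_has_zero_byte (uintptr_t w)
{
  const uintptr_t ones = (uintptr_t) -1 / 0xff;  /* 0x0101...01 */
  const uintptr_t highs = ones << 7;             /* 0x8080...80 */
  return ((w - ones) & ~w & highs) != 0;
}
```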


> +
> +/* Emit a string comparison sequence using Zbb instruction.
> +
> +   OPERANDS[0] is the target (result).
> +   OPERANDS[1] is the first source.
> +   OPERANDS[2] is the second source.
> +   If NO_LENGTH is zero, then:
> +   OPERANDS[3] is the length.
> +   OPERANDS[4] is the alignment in bytes.
> +   If NO_LENGTH is nonzero, then:
> +   OPERANDS[3] is the alignment in bytes.

Ugh.  I guess it's inevitable unless we want to drop the array and pass 
each element individually (in which case we'd pass a NULL_RTX in the 
case we don't have a length argument).


I'd like to give others a chance to chime in here.  Everything looks 
sensible here, but I may have missed something.  So give the other 
maintainers a couple days to chime in before committing.


Jeff



* Re: [PATCH 6/7] riscv: Add support for strlen inline expansion
  2022-11-14 18:17   ` Jeff Law
@ 2022-11-14 21:07     ` Christoph Müllner
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Müllner @ 2022-11-14 21:07 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Vineet Gupta

[-- Attachment #1: Type: text/plain, Size: 3321 bytes --]

On Mon, Nov 14, 2022 at 7:17 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> > This patch implements the expansion of the strlen builtin
> > using Zbb instructions (if available) for aligned strings
> > using the following sequence:
> >
> >        li      a3,-1
> >        addi    a4,a0,8
> > .L2:  ld      a5,0(a0)
> >        addi    a0,a0,8
> >        orc.b   a5,a5
> >        beq     a5,a3,.L2
> >        not     a5,a5
> >        ctz     a5,a5
> >        srli    a5,a5,0x3
> >        add     a0,a0,a5
> >        sub     a0,a0,a4
> >
> > This allows to inline calls to strlen(), with optimized code for
> > determining the length of a string.
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-protos.h (riscv_expand_strlen): New
> >         prototype.
> >       * config/riscv/riscv-string.cc (riscv_emit_unlikely_jump): New
> >         function.
> >       (GEN_EMIT_HELPER2): New helper macro.
> >       (GEN_EMIT_HELPER3): New helper macro.
> >       (do_load_from_addr): New helper function.
> >       (riscv_expand_strlen_zbb): New function.
> >       (riscv_expand_strlen): New function.
> >       * config/riscv/riscv.md (strlen<mode>): Invoke expansion
> >         functions for strlen.
> >
> >
> > +extern bool riscv_expand_strlen (rtx[]);
>
> Consider adding the number of elements in the RTX array here. Martin S's
> work from a little while ago will make use of it to try and catch
> over-reads and over-writes if the data is available.
>

Done.


>
>
> >
> >   /* Information about one CPU we know about.  */
> >   struct riscv_cpu_info {
> > diff --git a/gcc/config/riscv/riscv-string.cc
> b/gcc/config/riscv/riscv-string.cc
> > index 1137df475be..bf96522b608 100644
> > --- a/gcc/config/riscv/riscv-string.cc
> > +++ b/gcc/config/riscv/riscv-string.cc
> > @@ -38,6 +38,81 @@
> >   #include "predict.h"
> >   #include "optabs.h"
> >
> > +/* Emit unlikely jump instruction.  */
> > +
> > +static rtx_insn *
> > +riscv_emit_unlikely_jump (rtx insn)
> > +{
> > +  rtx_insn *jump = emit_jump_insn (insn);
> > +  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
> > +  return jump;
> > +}
>
> I was a bit surprised that we didn't have this as a generic routine.
> Consider adding this to emit-rtl.cc along with its companion
> emit_likely_jump.  Not a requirement to move forward, but it seems like
> the right thing to do.
>

I created both and called them emit_[un]likely_jump_insn() to match
emit_jump_insn().


>
>
>
>
> > +
> > +/* Emit proper instruction depending on type of dest.  */
>
> s/type/mode/
>

Done.


>
>
>
> > +
> > +/* Emit proper instruction depending on type of dest.  */
>
> s/type/mode/
>

Done.


>
>
> You probably want to undefine GEN_EMIT_HELPER once you're done with
> them.  That's become fairly standard practice for these kinds of helper
> macros.
>

Done.


>
> OK with the nits fixed.  Your call on whether or not to move the
> implementation of emit_likely_jump and emit_unlikely_jump into emit-rtl.cc.
>

I've made all the requested and suggested changes and tested again.
Thanks!
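As an aside for readers: the inlined Zbb strlen sequence in this patch corresponds roughly to the following C logic (an illustrative sketch, not the expander output; orc.b is modeled with a portable zero-byte test, and the final not/ctz/srli step is replaced by a byte scan):

```c
#include <stdint.h>
#include <string.h>

/* Word-wise strlen model.  Assumes S points into a buffer large
   enough that word-sized reads within the word containing the NUL
   stay in bounds (the real expansion guarantees this via alignment).  */
static size_t
strlen_wordwise (const char *s)
{
  const uintptr_t ones = (uintptr_t) -1 / 0xff;  /* 0x0101...01 */
  const uintptr_t highs = ones << 7;             /* 0x8080...80 */
  const char *p = s;
  uintptr_t w;

  for (;;)
    {
      memcpy (&w, p, sizeof w);               /* ld */
      if (((w - ones) & ~w & highs) != 0)     /* orc.b + bne */
	break;
      p += sizeof w;
    }
  /* The hardware sequence uses not/ctz/srli to locate the zero byte;
     a byte scan is the portable equivalent.  */
  while (*p)
    p++;
  return (size_t) (p - s);
}
```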


>
>
> Jeff
>
>
>


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-14 19:28   ` Jeff Law
@ 2022-11-14 21:49     ` Christoph Müllner
  2022-11-15  0:22       ` Jeff Law
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Müllner @ 2022-11-14 21:49 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Vineet Gupta

[-- Attachment #1: Type: text/plain, Size: 3872 bytes --]

On Mon, Nov 14, 2022 at 8:28 PM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
> On 11/13/22 16:05, Christoph Muellner wrote:
> > From: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> > This patch implements expansions for the cmpstrsi and the cmpstrnsi
> > builtins using Zbb instructions (if available).
> > This allows to inline calls to strcmp() and strncmp().
> >
> > The expansion basically emits a peeled comparison sequence (i.e. a peeled
> > comparison loop) which compares XLEN bits per step if possible.
> >
> > The emitted sequence can be controlled, by setting the maximum number
> > of compared bytes (-mstring-compare-inline-limit).
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >         prototype.
> >       * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >         macros.
> >       (GEN_EMIT_HELPER2): New helper macros.
> >       (expand_strncmp_zbb_sequence): New function.
> >       (riscv_emit_str_compare_zbb): New function.
> >       (riscv_expand_strn_compare): New function.
> >       * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >         for strn_compare.
> >       (cmpstrsi): Invoke expansion functions for strn_compare.
> >       * config/riscv/riscv.opt: Add new parameter
> >         '-mstring-compare-inline-limit'.
>
> Presumably the hybrid inline + out of line approach is to capture the
> fact that most strings compare unequal early, then punt out to the
> library if they don't follow that model?  It looks like we're structured
> for that case by peeling iterations rather than having a fully inlined
> approach.  Just want to confirm...
>

Yes, this was inspired by gcc/config/rs6000/rs6000-string.cc
(e.g. expand_strncmp_gpr_sequence).

The current implementation emits an unrolled loop to process up to N
characters.
For longer strings, we do a handover to libc to process the remainder there.
The hand-over implies a call overhead and, of course, a well-optimized
str(n)cmp
implementation would be beneficial (once we have the information in user
space for ifuncs).

We can take this further, but then the following questions pop up:
* how much data processing per loop iteration?
* what about unaligned strings?

Happy to get suggestions/opinions for improvement.
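Roughly, the strategy described above can be modeled in C like this (illustrative sketch; PEELED_CHUNKS stands in for -mstring-compare-inline-limit, the helper name is made up, and the real expansion resolves a difference inline with Zbb instructions rather than calling strcmp):

```c
#include <stdint.h>
#include <string.h>

#define PEELED_CHUNKS 4  /* models -mstring-compare-inline-limit */

/* Peeled strcmp model for word-aligned strings: up to PEELED_CHUNKS
   word-wide comparison steps are "inlined"; if neither a difference
   nor a NUL is seen, hand over to the C runtime for the rest.  */
static int
strcmp_peeled (const char *s1, const char *s2)
{
  const uintptr_t ones = (uintptr_t) -1 / 0xff;  /* 0x0101...01 */
  const uintptr_t highs = ones << 7;             /* 0x8080...80 */

  for (int i = 0; i < PEELED_CHUNKS; i++)
    {
      uintptr_t w1, w2;
      memcpy (&w1, s1, sizeof w1);
      memcpy (&w2, s2, sizeof w2);
      if (w1 != w2 || ((w1 - ones) & ~w1 & highs))
	/* Difference or NUL inside this chunk: resolve it.  */
	return strcmp (s1, s2);
      s1 += sizeof w1;
      s2 += sizeof w2;
    }
  return strcmp (s1, s2);  /* handover to the C runtime */
}
```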


> I was a bit worried about the "readahead" problem that arises when
> reading more than a byte and a NUL is found in the first string.  If
> you're not careful, the readahead of the second string could fault.  But
> it looks like we avoid that by requiring word alignment on both strings.
>

Yes, aligned strings are not affected by the readahead.

I wonder if we should add dynamic alignment tests for cases where the
compiler cannot derive XLEN alignment, so we capture more cases
(e.g. character arrays have a guaranteed alignment of 1, but may be
allocated with a higher actual alignment on the stack).


> > +
> > +/* Emit a string comparison sequence using Zbb instruction.
> > +
> > +   OPERANDS[0] is the target (result).
> > +   OPERANDS[1] is the first source.
> > +   OPERANDS[2] is the second source.
> > +   If NO_LENGTH is zero, then:
> > +   OPERANDS[3] is the length.
> > +   OPERANDS[4] is the alignment in bytes.
> > +   If NO_LENGTH is nonzero, then:
> > +   OPERANDS[3] is the alignment in bytes.
>
> Ugh.  I guess it's inevitable unless we want to drop the array and pass
> each element individually (in which case we'd pass a NULL_RTX in the
> case we don't have a length argument).
>

I will split the array into individual rtx arguments as suggested.


> I'd like to give others a chance to chime in here.  Everything looks
> sensible here, but I may have missed something.  So give the other
> maintainers a couple days to chime in before committing.
>
>
> Jeff
>
>


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-14 21:49     ` Christoph Müllner
@ 2022-11-15  0:22       ` Jeff Law
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-15  0:22 UTC (permalink / raw)
  To: Christoph Müllner
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Vineet Gupta

[-- Attachment #1: Type: text/plain, Size: 805 bytes --]


On 11/14/22 14:49, Christoph Müllner wrote:
>
>
> We can take this further, but then the following questions pop up:
> * how much data processing per loop iteration?

I have no idea because I don't have any real data.  Last time I gathered 
any data on this issue was circa 1988 :-)


> * what about unaligned strings?

I'd punt.  I don't think we can depend on having high-performance 
unaligned access.  You could do a dynamic check of alignment, but you'd 
really need to know that the strings are aligned often enough that the 
cost of the dynamic check can be recovered.
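Such a dynamic test would amount to something like the following (illustrative sketch, not from the patch):

```c
#include <stdint.h>

/* Nonzero iff both pointers are word-aligned, i.e. the word-wise
   inline path may be used without any load crossing a page boundary.  */
static int
can_use_inline_path (const void *s1, const void *s2)
{
  return (((uintptr_t) s1 | (uintptr_t) s2)
	  & (sizeof (uintptr_t) - 1)) == 0;
}
```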


>
> Happy to get suggestions/opinions for improvement.

I think this is pretty good without additional data that would indicate 
that handling unaligned cases or a different number of loop peels would 
be a notable improvement.

Jeff


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-13 23:05 ` [PATCH 7/7] riscv: Add support for str(n)cmp " Christoph Muellner
  2022-11-14 19:28   ` Jeff Law
@ 2022-11-15  0:46   ` Kito Cheng
  2022-11-15  0:53     ` Palmer Dabbelt
                       ` (2 more replies)
  1 sibling, 3 replies; 33+ messages in thread
From: Kito Cheng @ 2022-11-15  0:46 UTC (permalink / raw)
  To: Christoph Muellner
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta

Hi Christoph:

> This patch implements expansions for the cmpstrsi and the cmpstrnsi
> builtins using Zbb instructions (if available).
> This allows to inline calls to strcmp() and strncmp().
>
> The expansion basically emits a peeled comparison sequence (i.e. a peeled
> comparison loop) which compares XLEN bits per step if possible.
>
> The emitted sequence can be controlled, by setting the maximum number
> of compared bytes (-mstring-compare-inline-limit).

I would like to have a unified option interface,
maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
And add some option like this:
-minline-str[n]cmp=[bitmanip|vector|auto] in future,
since I assume we'll have different versions of those things.

>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
>           prototype.
>         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
>           macros.
>         (GEN_EMIT_HELPER2): New helper macros.
>         (expand_strncmp_zbb_sequence): New function.
>         (riscv_emit_str_compare_zbb): New function.
>         (riscv_expand_strn_compare): New function.
>         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
>           for strn_compare.
>         (cmpstrsi): Invoke expansion functions for strn_compare.
>         * config/riscv/riscv.opt: Add new parameter
>           '-mstring-compare-inline-limit'.

We need to document this option.


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-15  0:46   ` Kito Cheng
@ 2022-11-15  0:53     ` Palmer Dabbelt
  2022-11-15  1:55       ` Kito Cheng
  2022-11-15  3:41       ` Jeff Law
  2022-11-15 22:22     ` Christoph Müllner
  2022-11-16  0:15     ` Philipp Tomsich
  2 siblings, 2 replies; 33+ messages in thread
From: Palmer Dabbelt @ 2022-11-15  0:53 UTC (permalink / raw)
  To: Kito Cheng
  Cc: christoph.muellner, gcc-patches, kito.cheng, Jim Wilson,
	Andrew Waterman, philipp.tomsich, jeffreyalaw, Vineet Gupta

On Mon, 14 Nov 2022 16:46:37 PST (-0800), Kito Cheng wrote:
> Hi Christoph:
>
>> This patch implements expansions for the cmpstrsi and the cmpstrnsi
>> builtins using Zbb instructions (if available).
>> This allows to inline calls to strcmp() and strncmp().
>>
>> The expansion basically emits a peeled comparison sequence (i.e. a peeled
>> comparison loop) which compares XLEN bits per step if possible.
>>
>> The emitted sequence can be controlled, by setting the maximum number
>> of compared bytes (-mstring-compare-inline-limit).
>
> I would like to have a unified option interface,
> maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
> And add some option like this:
> -minline-str[n]cmp=[bitmanip|vector|auto] in future,
> since I assume we'll have different versions of those things.

Can we just decide that from mtune?  We'll probably have uarch-specific 
string functions at some point, might as well start planning for it now.

>> gcc/ChangeLog:
>>
>>         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
>>           prototype.
>>         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
>>           macros.
>>         (GEN_EMIT_HELPER2): New helper macros.
>>         (expand_strncmp_zbb_sequence): New function.
>>         (riscv_emit_str_compare_zbb): New function.
>>         (riscv_expand_strn_compare): New function.
>>         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
>>           for strn_compare.
>>         (cmpstrsi): Invoke expansion functions for strn_compare.
>>         * config/riscv/riscv.opt: Add new parameter
>>           '-mstring-compare-inline-limit'.
>
> We need to document this option.


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-15  0:53     ` Palmer Dabbelt
@ 2022-11-15  1:55       ` Kito Cheng
  2022-11-15  3:41       ` Jeff Law
  1 sibling, 0 replies; 33+ messages in thread
From: Kito Cheng @ 2022-11-15  1:55 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Kito Cheng, christoph.muellner, gcc-patches, Jim Wilson,
	Andrew Waterman, philipp.tomsich, jeffreyalaw, Vineet Gupta

On Tue, Nov 15, 2022 at 8:53 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Mon, 14 Nov 2022 16:46:37 PST (-0800), Kito Cheng wrote:
> > Hi Christoph:
> >
> >> This patch implements expansions for the cmpstrsi and the cmpstrnsi
> >> builtins using Zbb instructions (if available).
> >> This allows to inline calls to strcmp() and strncmp().
> >>
> >> The expansion basically emits a peeled comparison sequence (i.e. a peeled
> >> comparison loop) which compares XLEN bits per step if possible.
> >>
> >> The emitted sequence can be controlled, by setting the maximum number
> >> of compared bytes (-mstring-compare-inline-limit).
> >
> > I would like to have a unified option interface,
> > maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
> > And add some option like this:
> > -minline-str[n]cmp=[bitmanip|vector|auto] in future,
> > since I assume we'll have different versions of those things.
>
> Can we just decide that from mtune?  We'll probably have uarch-specific
> string functions at some point, might as well start planning for it now.

I assume you mean the -minline-str[n]cmp=[bitmanip|vector|auto] part?
I think this part needs more discussion, and we could defer it until
we reach consensus.

But as for the -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit part,
I favor having those two options to disable and/or fine-tune the expansion.

>
> >> gcc/ChangeLog:
> >>
> >>         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >>           prototype.
> >>         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >>           macros.
> >>         (GEN_EMIT_HELPER2): New helper macros.
> >>         (expand_strncmp_zbb_sequence): New function.
> >>         (riscv_emit_str_compare_zbb): New function.
> >>         (riscv_expand_strn_compare): New function.
> >>         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >>           for strn_compare.
> >>         (cmpstrsi): Invoke expansion functions for strn_compare.
> >>         * config/riscv/riscv.opt: Add new parameter
> >>           '-mstring-compare-inline-limit'.
> >
> > We need to document this option.


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-15  0:53     ` Palmer Dabbelt
  2022-11-15  1:55       ` Kito Cheng
@ 2022-11-15  3:41       ` Jeff Law
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff Law @ 2022-11-15  3:41 UTC (permalink / raw)
  To: Palmer Dabbelt, Kito Cheng
  Cc: christoph.muellner, gcc-patches, kito.cheng, Jim Wilson,
	Andrew Waterman, philipp.tomsich, Vineet Gupta


On 11/14/22 17:53, Palmer Dabbelt wrote:
> On Mon, 14 Nov 2022 16:46:37 PST (-0800), Kito Cheng wrote:
>> Hi Christoph:
>>
>>> This patch implements expansions for the cmpstrsi and the cmpstrnsi
>>> builtins using Zbb instructions (if available).
>>> This allows to inline calls to strcmp() and strncmp().
>>>
>>> The expansion basically emits a peeled comparison sequence (i.e. a 
>>> peeled
>>> comparison loop) which compares XLEN bits per step if possible.
>>>
>>> The emitted sequence can be controlled, by setting the maximum number
>>> of compared bytes (-mstring-compare-inline-limit).
>>
>> I would like to have a unified option interface,
>> maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
>> And add some option like this:
>> -minline-str[n]cmp=[bitmanip|vector|auto] in future,
>> since I assume we'll have different versions of those things.
>
> Can we just decide that from mtune?  We'll probably have 
> uarch-specific string functions at some point, might as well start 
> planning for it now.

Sure, though the implementation isn't terribly tied to any uarch at the 
moment, and I doubt uarch approaches would have a significant impact here 
-- we're peeling off some small number of iterations fairly 
generically.  The uarch-specific stuff would be the code in glibc 
selected by an ifunc.  uarch variants for block copiers seem inevitable 
though :-)


I don't have strong opinions here, so if we want to key off mtune, 
sure.  If we want to have variants for scalar vs. vector, that's quite 
reasonable too.  Or if we want to go all the way to uarch-specific 
implementations, I won't object.






>>> gcc/ChangeLog:
>>>
>>>         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
>>>           prototype.
>>>         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
>>>           macros.
>>>         (GEN_EMIT_HELPER2): New helper macros.
>>>         (expand_strncmp_zbb_sequence): New function.
>>>         (riscv_emit_str_compare_zbb): New function.
>>>         (riscv_expand_strn_compare): New function.
>>>         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
>>>           for strn_compare.
>>>         (cmpstrsi): Invoke expansion functions for strn_compare.
>>>         * config/riscv/riscv.opt: Add new parameter
>>>           '-mstring-compare-inline-limit'.
>>
>> We need to document this option.

Yes, definitely needs documentation.  Thanks for catching that.


jeff



* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-15  0:46   ` Kito Cheng
  2022-11-15  0:53     ` Palmer Dabbelt
@ 2022-11-15 22:22     ` Christoph Müllner
  2022-11-16  0:15     ` Philipp Tomsich
  2 siblings, 0 replies; 33+ messages in thread
From: Christoph Müllner @ 2022-11-15 22:22 UTC (permalink / raw)
  To: Kito Cheng
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta

[-- Attachment #1: Type: text/plain, Size: 1727 bytes --]

On Tue, Nov 15, 2022 at 1:46 AM Kito Cheng <kito.cheng@gmail.com> wrote:

> Hi Christoph:
>
> > This patch implements expansions for the cmpstrsi and the cmpstrnsi
> > builtins using Zbb instructions (if available).
> > This allows to inline calls to strcmp() and strncmp().
> >
> > The expansion basically emits a peeled comparison sequence (i.e. a peeled
> > comparison loop) which compares XLEN bits per step if possible.
> >
> > The emitted sequence can be controlled, by setting the maximum number
> > of compared bytes (-mstring-compare-inline-limit).
>
> I would like to have a unified option interface,
> maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
>

Ok, I don't mind (in fact, I thought about this as well).
The reason why it is how it is: I took inspiration from the rs6000 backend.


> And add some option like this:
> -minline-str[n]cmp=[bitmanip|vector|auto] in future,
> since I assume we'll have different versions of those things.
>
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >           prototype.
> >         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >           macros.
> >         (GEN_EMIT_HELPER2): New helper macros.
> >         (expand_strncmp_zbb_sequence): New function.
> >         (riscv_emit_str_compare_zbb): New function.
> >         (riscv_expand_strn_compare): New function.
> >         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >           for strn_compare.
> >         (cmpstrsi): Invoke expansion functions for strn_compare.
> >         * config/riscv/riscv.opt: Add new parameter
> >           '-mstring-compare-inline-limit'.
>
> We need to document this option.
>


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-15  0:46   ` Kito Cheng
  2022-11-15  0:53     ` Palmer Dabbelt
  2022-11-15 22:22     ` Christoph Müllner
@ 2022-11-16  0:15     ` Philipp Tomsich
  2022-11-21  3:24       ` Kito Cheng
  2 siblings, 1 reply; 33+ messages in thread
From: Philipp Tomsich @ 2022-11-16  0:15 UTC (permalink / raw)
  To: Kito Cheng
  Cc: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Jeff Law, Vineet Gupta

On Tue, 15 Nov 2022 at 01:46, Kito Cheng <kito.cheng@gmail.com> wrote:
>
> Hi Christoph:
>
> > This patch implements expansions for the cmpstrsi and the cmpstrnsi
> > builtins using Zbb instructions (if available).
> > This allows to inline calls to strcmp() and strncmp().
> >
> > The expansion basically emits a peeled comparison sequence (i.e. a peeled
> > comparison loop) which compares XLEN bits per step if possible.
> >
> > The emitted sequence can be controlled, by setting the maximum number
> > of compared bytes (-mstring-compare-inline-limit).
>
> I would like to have a unified option interface,
> maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.

For the basic option (-m[no-]inline-str[n]cmp), I would punt to
-fno-builtin-str[n]cmp.
The limit option sounds more like a --param?

> And add some option like this:
> -minline-str[n]cmp=[bitmanip|vector|auto] in future,

If we want to follow the lead of others, then x86 has a -mstringop-strategy=alg


> since I assume we'll have different versions of those things.
>
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> >           prototype.
> >         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >           macros.
> >         (GEN_EMIT_HELPER2): New helper macros.
> >         (expand_strncmp_zbb_sequence): New function.
> >         (riscv_emit_str_compare_zbb): New function.
> >         (riscv_expand_strn_compare): New function.
> >         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> >           for strn_compare.
> >         (cmpstrsi): Invoke expansion functions for strn_compare.
> >         * config/riscv/riscv.opt: Add new parameter
> >           '-mstring-compare-inline-limit'.
>
> We need to document this option.


* Re: [PATCH 7/7] riscv: Add support for str(n)cmp inline expansion
  2022-11-16  0:15     ` Philipp Tomsich
@ 2022-11-21  3:24       ` Kito Cheng
  0 siblings, 0 replies; 33+ messages in thread
From: Kito Cheng @ 2022-11-21  3:24 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Jeff Law, Vineet Gupta

> > I would like to have a unified option interface,
> > maybe -m[no-]inline-str[n]cmp and -minline-str[n]cmp-limit.
>
> For the basic option (-m[no-]inline-str[n]cmp), I would punt to
> -fno-builtin-str[n]cmp.

-fno-builtin-* will also suppress middle-end optimizations for those builtins.

see:
https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-strlen.cc#L5372

and
https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-loop-distribution.cc
for -fno-builtin-memcpy/memset/memmove

> The limit-one sounds more like a --param?

Using --param=inline-*-limit= sounds like a good idea; aarch64 and x86
have a few options like that.

>
> > And add some option like this:
> > -minline-str[n]cmp=[bitmanip|vector|auto] in future,
>
> If we want to follow the lead of others, then x86 has a -mstringop-strategy=alg

Using the same option name as x86 SGTM.

> > since I assume we'll have different versions of those things.
> >
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/riscv-protos.h (riscv_expand_strn_compare): New
> > >           prototype.
> > >         * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> > >           macros.
> > >         (GEN_EMIT_HELPER2): New helper macros.
> > >         (expand_strncmp_zbb_sequence): New function.
> > >         (riscv_emit_str_compare_zbb): New function.
> > >         (riscv_expand_strn_compare): New function.
> > >         * config/riscv/riscv.md (cmpstrnsi): Invoke expansion functions
> > >           for strn_compare.
> > >         (cmpstrsi): Invoke expansion functions for strn_compare.
> > >         * config/riscv/riscv.opt: Add new parameter
> > >           '-mstring-compare-inline-limit'.
> >
> > We need to document this option.


end of thread, other threads:[~2022-11-21  3:24 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-13 23:05 [PATCH 0/7] riscv: Improve builtins expansion Christoph Muellner
2022-11-13 23:05 ` [PATCH 1/7] riscv: bitmanip: add orc.b as an unspec Christoph Muellner
2022-11-14 16:51   ` Jeff Law
2022-11-14 17:53     ` Jeff Law
2022-11-14 19:05     ` Philipp Tomsich
2022-11-13 23:05 ` [PATCH 2/7] riscv: bitmanip/zbb: Add prefix/postfix and enable visiblity Christoph Muellner
2022-11-14 16:55   ` Jeff Law
2022-11-13 23:05 ` [PATCH 3/7] riscv: Enable overlap-by-pieces via tune param Christoph Muellner
2022-11-14  2:48   ` Vineet Gupta
2022-11-14  7:59     ` Philipp Tomsich
2022-11-14  8:29       ` Christoph Müllner
2022-11-14 19:04         ` Jeff Law
2022-11-14 19:07           ` Christoph Müllner
2022-11-13 23:05 ` [PATCH 4/7] riscv: Move riscv_block_move_loop to separate file Christoph Muellner
2022-11-14 16:56   ` Jeff Law
2022-11-13 23:05 ` [PATCH 5/7] riscv: Use by-pieces to do overlapping accesses in block_move_straight Christoph Muellner
2022-11-14 17:16   ` Jeff Law
2022-11-14 19:01     ` Christoph Müllner
2022-11-14 19:05       ` Jeff Law
2022-11-13 23:05 ` [PATCH 6/7] riscv: Add support for strlen inline expansion Christoph Muellner
2022-11-14 18:17   ` Jeff Law
2022-11-14 21:07     ` Christoph Müllner
2022-11-13 23:05 ` [PATCH 7/7] riscv: Add support for str(n)cmp " Christoph Muellner
2022-11-14 19:28   ` Jeff Law
2022-11-14 21:49     ` Christoph Müllner
2022-11-15  0:22       ` Jeff Law
2022-11-15  0:46   ` Kito Cheng
2022-11-15  0:53     ` Palmer Dabbelt
2022-11-15  1:55       ` Kito Cheng
2022-11-15  3:41       ` Jeff Law
2022-11-15 22:22     ` Christoph Müllner
2022-11-16  0:15     ` Philipp Tomsich
2022-11-21  3:24       ` Kito Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).