public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/2] riscv: Introduce strlen/strcmp/strncmp inline expansion
@ 2023-09-06 16:07 Christoph Muellner
  2023-09-06 16:07 ` [PATCH v2 1/2] riscv: Add support for strlen " Christoph Muellner
  2023-09-06 16:07 ` [PATCH v2 2/2] riscv: Add support for str(n)cmp " Christoph Muellner
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Muellner @ 2023-09-06 16:07 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This series introduces strlen/strcmp/strncmp inline expansion for Zbb/XTheadBb.

In recent months, both glibc and the Linux kernel have merged optimized
string routines for RISC-V.  The instruction that enables these routines
is Zbb's orc.b (or T-Head's th.tstnbz).
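
For readers unfamiliar with orc.b, its effect on a 64-bit register can be
modeled in plain C as shown below.  This is only an illustrative sketch,
not code from this series; the function names orc_b_model and has_nul_byte
are made up for the example.  th.tstnbz produces the inverse pattern
(0xff for zero bytes, 0x00 otherwise).

```c
#include <stdint.h>

/* Software model of Zbb's orc.b on a 64-bit register (hypothetical
   helper name): every result byte is 0xff if the corresponding input
   byte is non-zero, and 0x00 otherwise.  */
static uint64_t
orc_b_model (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* A word contains a NUL byte if and only if orc.b does not yield
   all-ones.  This is the test the inlined loops rely on.  */
static int
has_nul_byte (uint64_t x)
{
  return orc_b_model (x) != UINT64_MAX;
}
```

A single compare against all-ones therefore replaces eight per-byte tests.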

This patch attempts to add optimized string processing to GCC with the
following properties:
* strlen: inline a loop if the string is xlen-aligned
* strcmp/strncmp: inline a peeled comparison loop sequence if both strings
  are xlen-aligned

I've already posted the idea in a previous series last November
(therefore, this series is called 'v2'):
* https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605996.html
* https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605998.html

The review comments from back then have been addressed, and the
str(n)cmp patch has additionally been restructured to make the code
easier to digest.  In total, the following changes were made:
* Address Jeff's comments for the strlen patch
* Change str(n)cmp flags according to Kito's comments
* Ensure that all flags are documented
* Break str(n)cmp expansion into several functions
* Add support for XTheadBb's th.tstnbz

I have not introduced "-minline-str[n]cmp=[bitmanip|vector|auto]"
or "-mstringop-strategy=alg" because we only have one bitmanip/scalar
expansion.  Such an option could be added in the future; alternatively,
the choice could be derived from -mtune.

By default all optimizations are disabled, so there should be no risk
of regressions.

Testing was done using the following strategy:
* Enablement/flag tests are part of the patches
* Correctness was tested using qemu-user with glibc's string tests compiled for:
** rv64gc (baseline) QEMU_CPU=rv64
** rv64gc_zbb (limit=64) QEMU_CPU=rv64,zbb=false (must fail)
** rv64gc_zbb (limit=64) QEMU_CPU=rv64,zbb=true
** rv64gc_zbb (limit=32) QEMU_CPU=rv64,zbb=true
** rv64gc_xtheadbb (limit=64) QEMU_CPU=rv64 (must fail)
** rv64gc_xtheadbb (limit=64) QEMU_CPU=thead-c906
** rv64gc_xtheadbb (limit=8) QEMU_CPU=thead-c906
** rv32gc_zbb (limit=64) QEMU_CPU=rv32,zbb=true
* SPEC CPU 2017 intrate base/peak with LTO

Christoph Müllner (2):
  riscv: Add support for strlen inline expansion
  riscv: Add support for str(n)cmp inline expansion

 gcc/config.gcc                                |   3 +-
 gcc/config/riscv/bitmanip.md                  |   2 +-
 gcc/config/riscv/riscv-protos.h               |   4 +
 gcc/config/riscv/riscv-string.cc              | 594 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  72 ++-
 gcc/config/riscv/riscv.opt                    |  16 +
 gcc/config/riscv/t-riscv                      |   6 +
 gcc/config/riscv/thead.md                     |   9 +-
 gcc/doc/invoke.texi                           |  29 +-
 gcc/emit-rtl.cc                               |  24 +
 gcc/rtl.h                                     |   2 +
 .../gcc.target/riscv/xtheadbb-strcmp.c        |  57 ++
 .../riscv/xtheadbb-strlen-unaligned.c         |  14 +
 .../gcc.target/riscv/xtheadbb-strlen.c        |  19 +
 .../gcc.target/riscv/zbb-strcmp-disabled-2.c  |  38 ++
 .../gcc.target/riscv/zbb-strcmp-disabled.c    |  38 ++
 .../gcc.target/riscv/zbb-strcmp-limit.c       |  57 ++
 .../gcc.target/riscv/zbb-strcmp-unaligned.c   |  38 ++
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c   |  57 ++
 .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 +
 .../gcc.target/riscv/zbb-strlen-disabled.c    |  15 +
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 +
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 +
 23 files changed, 1137 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

-- 
2.41.0



* [PATCH v2 1/2] riscv: Add support for strlen inline expansion
  2023-09-06 16:07 [PATCH v2 0/2] riscv: Introduce strlen/strcmp/strncmp inline expansion Christoph Muellner
@ 2023-09-06 16:07 ` Christoph Muellner
  2023-09-06 16:22   ` Palmer Dabbelt
                     ` (2 more replies)
  2023-09-06 16:07 ` [PATCH v2 2/2] riscv: Add support for str(n)cmp " Christoph Muellner
  1 sibling, 3 replies; 10+ messages in thread
From: Christoph Muellner @ 2023-09-06 16:07 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:

rv32gc_zbb (RV64 is similar):
      add     a3,a0,4
      li      a4,-1
.L1:  lw      a5,0(a0)
      add     a0,a0,4
      orc.b   a5,a5
      beq     a5,a4,.L1
      not     a5,a5
      ctz     a5,a5
      srl     a5,a5,0x3
      add     a0,a0,a5
      sub     a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
      add       a4,a0,8
.L2:  ld        a5,0(a0)
      add       a0,a0,8
      th.tstnbz a5,a5
      beqz      a5,.L2
      th.rev    a5,a5
      th.ff1    a5,a5
      srl       a5,a5,0x3
      add       a0,a0,a5
      sub       a0,a0,a4
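
To make the Zbb sequence easier to follow, here is a C model of its RV64
variant.  It is a sketch under stated assumptions, not part of the patch:
it assumes a little-endian host, an 8-byte-aligned string, and a compiler
that provides __builtin_ctzll (GCC or Clang).  The names orc_b_model and
strlen_word_model are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software model of orc.b (hypothetical helper): 0xff for every
   non-zero input byte, 0x00 for every zero byte.  */
static uint64_t
orc_b_model (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Model of the inlined loop.  S must be 8-byte aligned, so reading the
   whole word that holds the NUL byte stays inside that aligned word.  */
static size_t
strlen_word_model (const char *s)
{
  const char *p = s;
  uint64_t word;

  do
    {
      memcpy (&word, p, sizeof word);	/* ld    a5,0(a0)  */
      p += sizeof word;			/* add   a0,a0,8  */
      word = orc_b_model (word);	/* orc.b a5,a5  */
    }
  while (word == UINT64_MAX);		/* beq   a5,a4,.L1  */

  word = ~word;				/* not   a5,a5  */
  /* ctz followed by srl 3: byte index of the first NUL in the word.  */
  size_t nul_index = (size_t) __builtin_ctzll (word) >> 3;

  /* add a0,a0,a5 ; sub a0,a0,a3 (a3 holds the start address plus 8).  */
  return (size_t) ((p + nul_index) - (s + sizeof word));
}
```

The start address plus the word size is subtracted at the end because the
pointer has already been advanced past the word that contains the NUL byte.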

This allows inlining calls to strlen() with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.
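
As a usage illustration (modeled on the tests in this patch, with a
hypothetical function name), the expansion can only trigger when the
compiler knows the alignment, e.g. via __builtin_assume_aligned:

```c
#include <stddef.h>

/* Hypothetical example function.  The alignment promise is what lets
   the expander see a known, sufficiently large alignment; without it
   the call to strlen is left as a library call.  */
size_t
aligned_strlen (const char *s)
{
  const char *p = __builtin_assume_aligned (s, 8);
  return __builtin_strlen (p);
}
```

The tests in this patch compile such code with '-minline-strlen' and
'-march=rv64gc_zbb' (or 'rv64gc_xtheadbb') at an optimization level other
than -O0, -Os, -Og, or -Oz.  Passing a pointer that is not actually
8-byte aligned to this function is undefined behavior.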

Tested using the glibc string tests.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/ChangeLog:

	* config.gcc: Add new object riscv-string.o.
	* config/riscv/riscv-protos.h (riscv_expand_strlen):
	New function.
	* config/riscv/riscv.md (strlen<mode>): New expand INSN.
	* config/riscv/riscv.opt: New flag 'minline-strlen'.
	* config/riscv/t-riscv: Add new object riscv-string.o.
	* config/riscv/thead.md (*th_rev<mode>2): Export INSN name.
	(th_rev<mode>2): Likewise.
	(th_tstnbz<mode>2): New INSN.
	* doc/invoke.texi: Document '-minline-strlen'.
	* emit-rtl.cc (emit_likely_jump_insn): New helper function.
	(emit_unlikely_jump_insn): Likewise.
	* rtl.h (emit_likely_jump_insn): New prototype.
	(emit_unlikely_jump_insn): Likewise.
	* config/riscv/riscv-string.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
	* gcc.target/riscv/xtheadbb-strlen.c: New test.
	* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
	* gcc.target/riscv/zbb-strlen-disabled.c: New test.
	* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
	* gcc.target/riscv/zbb-strlen.c: New test.
---
 gcc/config.gcc                                |   3 +-
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-string.cc              | 183 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  28 +++
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/t-riscv                      |   6 +
 gcc/config/riscv/thead.md                     |   9 +-
 gcc/doc/invoke.texi                           |  11 +-
 gcc/emit-rtl.cc                               |  24 +++
 gcc/rtl.h                                     |   2 +
 .../riscv/xtheadbb-strlen-unaligned.c         |  14 ++
 .../gcc.target/riscv/xtheadbb-strlen.c        |  19 ++
 .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 ++
 .../gcc.target/riscv/zbb-strlen-disabled.c    |  15 ++
 .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 ++
 gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 ++
 16 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-string.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2fe7c7ceef..aff6b6a5601 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,8 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
+	extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
 	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	extra_objs="${extra_objs} thead.o"
 	d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 6dbf6b9f943..b060d047f01 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -517,6 +517,9 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
 
+/* Routines implemented in riscv-string.cc.  */
+extern bool riscv_expand_strlen (rtx, rtx, rtx, rtx);
+
 /* Routines implemented in thead.cc.  */
 extern bool th_mempair_operands_p (rtx[4], bool, machine_mode);
 extern void th_mempair_order_operands (rtx[4], bool, machine_mode);
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
new file mode 100644
index 00000000000..086900a6083
--- /dev/null
+++ b/gcc/config/riscv/riscv-string.cc
@@ -0,0 +1,183 @@
+/* Subroutines used to expand string operations for RISC-V.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "target.h"
+#include "predict.h"
+#include "optabs.h"
+
+/* Emit proper instruction depending on mode of dest.  */
+
+#define GEN_EMIT_HELPER2(name)				\
+static rtx_insn *					\
+do_## name ## 2(rtx dest, rtx src)			\
+{							\
+  rtx_insn *insn;					\
+  if (GET_MODE (dest) == DImode)			\
+    insn = emit_insn (gen_ ## name ## di2 (dest, src));	\
+  else							\
+    insn = emit_insn (gen_ ## name ## si2 (dest, src));	\
+  return insn;						\
+}
+
+/* Emit proper instruction depending on mode of dest.  */
+
+#define GEN_EMIT_HELPER3(name)					\
+static rtx_insn *						\
+do_## name ## 3(rtx dest, rtx src1, rtx src2)			\
+{								\
+  rtx_insn *insn;						\
+  if (GET_MODE (dest) == DImode)				\
+    insn = emit_insn (gen_ ## name ## di3 (dest, src1, src2));	\
+  else								\
+    insn = emit_insn (gen_ ## name ## si3 (dest, src1, src2));	\
+  return insn;							\
+}
+
+GEN_EMIT_HELPER3(add) /* do_add3  */
+GEN_EMIT_HELPER2(clz) /* do_clz2  */
+GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
+GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
+GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
+GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
+GEN_EMIT_HELPER3(sub) /* do_sub3  */
+GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
+GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
+GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
+
+#undef GEN_EMIT_HELPER2
+#undef GEN_EMIT_HELPER3
+
+/* Helper function to load a byte or a Pmode register.
+
+   MODE is the mode to use for the load (QImode or Pmode).
+   DEST is the destination register for the data.
+   ADDR_REG is the register that holds the address.
+   ADDR is the address expression to load from.
+
+   This function returns an rtx containing the register,
+   where the ADDR is stored.  */
+
+static rtx
+do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr_reg);
+  MEM_COPY_ATTRIBUTES (mem, addr);
+  set_mem_size (mem, GET_MODE_SIZE (mode));
+
+  if (mode == QImode)
+    do_zero_extendqi2 (dest, mem);
+  else if (mode == Xmode)
+    emit_move_insn (dest, mem);
+  else
+    gcc_unreachable ();
+
+  return addr_reg;
+}
+
+/* If the provided string is aligned, then read XLEN bytes
+   in a loop and use orc.b to find NUL-bytes.  */
+
+static bool
+riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
+{
+  rtx testval, addr, addr_plus_regsz, word, zeros;
+  rtx loop_label, cond;
+
+  gcc_assert (TARGET_ZBB || TARGET_XTHEADBB);
+
+  /* The alignment needs to be known and big enough.  */
+  if (!CONST_INT_P (align) || UINTVAL (align) < GET_MODE_SIZE (Xmode))
+    return false;
+
+  testval = gen_reg_rtx (Xmode);
+  addr = copy_addr_to_reg (XEXP (src, 0));
+  addr_plus_regsz = gen_reg_rtx (Pmode);
+  word = gen_reg_rtx (Xmode);
+  zeros = gen_reg_rtx (Xmode);
+
+  if (TARGET_ZBB)
+    emit_insn (gen_rtx_SET (testval, constm1_rtx));
+  else
+    emit_insn (gen_rtx_SET (testval, const0_rtx));
+
+  do_add3 (addr_plus_regsz, addr, GEN_INT (UNITS_PER_WORD));
+
+  loop_label = gen_label_rtx ();
+  emit_label (loop_label);
+
+  /* Load a word and use orc.b/th.tstnbz to find a zero-byte.  */
+  do_load_from_addr (Xmode, word, addr, src);
+  do_add3 (addr, addr, GEN_INT (UNITS_PER_WORD));
+  if (TARGET_ZBB)
+    do_orcb2 (word, word);
+  else
+    do_th_tstnbz2 (word, word);
+  cond = gen_rtx_EQ (VOIDmode, word, testval);
+  emit_unlikely_jump_insn (gen_cbranch4 (Xmode, cond, word, testval, loop_label));
+
+  /* Calculate the return value by counting zero-bits.  */
+  if (TARGET_ZBB)
+    do_one_cmpl2 (word, word);
+  if (TARGET_BIG_ENDIAN)
+    do_clz2 (zeros, word);
+  else if (TARGET_ZBB)
+    do_ctz2 (zeros, word);
+  else
+    {
+      do_th_rev2 (word, word);
+      do_clz2 (zeros, word);
+    }
+
+  do_lshr3 (zeros, zeros, GEN_INT (exact_log2 (BITS_PER_UNIT)));
+  do_add3 (addr, addr, zeros);
+  do_sub3 (result, addr, addr_plus_regsz);
+
+  return true;
+}
+
+/* Expand a strlen operation and return true if successful.
+   Return false if we should let the compiler generate normal
+   code, probably a strlen call.  */
+
+bool
+riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
+{
+  gcc_assert (search_char == const0_rtx);
+
+  if (TARGET_ZBB || TARGET_XTHEADBB)
+    return riscv_expand_strlen_scalar (result, src, align);
+
+  return false;
+}
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 9da2a9f1c42..e078ebc43cb 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -82,6 +82,9 @@ (define_c_enum "unspec" [
 
   ;; the calling convention of callee
   UNSPEC_CALLEE_CC
+
+  ;; String unspecs
+  UNSPEC_STRLEN
 ])
 
 (define_c_enum "unspecv" [
@@ -3500,6 +3503,31 @@ (define_expand "msubhisi4"
   "TARGET_XTHEADMAC"
 )
 
+;; Search character in string (generalization of strlen).
+;; Argument 0 is the resulting offset
+;; Argument 1 is the string
+;; Argument 2 is the search character
+;; Argument 3 is the alignment
+
+(define_expand "strlen<mode>"
+  [(set (match_operand:X 0 "register_operand")
+	(unspec:X [(match_operand:BLK 1 "general_operand")
+		     (match_operand:SI 2 "const_int_operand")
+		     (match_operand:SI 3 "const_int_operand")]
+		  UNSPEC_STRLEN))]
+  "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+{
+  rtx search_char = operands[2];
+
+  if (search_char != const0_rtx)
+    FAIL;
+
+  if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
+    DONE;
+  else
+    FAIL;
+})
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 98f342348b7..2491b335aef 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -278,6 +278,10 @@ minline-atomics
 Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
 Always inline subword atomic operations.
 
+minline-strlen
+Target Bool Var(riscv_inline_strlen) Init(0)
+Inline strlen calls if possible.
+
 Enum
 Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
 Valid arguments to -param=riscv-autovec-preference=:
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index b1f80d1d87c..c012ac0cf33 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -91,6 +91,12 @@ riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc \
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-string.o: $(srcdir)/config/riscv/riscv-string.cc \
+  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
+  memmodel.h $(EMIT_RTL_H) poly-int.h output.h
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
   $(TM_P_H) $(TARGET_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 29f98dec3a8..982b048cb65 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -110,7 +110,7 @@ (define_insn "*th_clz<mode>2"
   [(set_attr "type" "bitmanip")
    (set_attr "mode" "<X:MODE>")])
 
-(define_insn "*th_rev<mode>2"
+(define_insn "th_rev<mode>2"
   [(set (match_operand:GPR 0 "register_operand" "=r")
 	(bswap:GPR (match_operand:GPR 1 "register_operand" "r")))]
   "TARGET_XTHEADBB && (TARGET_64BIT || <MODE>mode == SImode)"
@@ -121,6 +121,13 @@ (define_insn "*th_rev<mode>2"
   [(set_attr "type" "bitmanip")
    (set_attr "mode" "<GPR:MODE>")])
 
+(define_insn "th_tstnbz<mode>2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+	(unspec:X [(match_operand:X 1 "register_operand" "r")] UNSPEC_ORC_B))]
+  "TARGET_XTHEADBB"
+  "th.tstnbz\t%0,%1"
+  [(set_attr "type" "bitmanip")])
+
 ;; XTheadBs
 
 (define_insn "*th_tst<mode>3"
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 33befee7d6b..4a9e385d009 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1236,7 +1236,8 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}
 -mcsr-check -mno-csr-check
--minline-atomics  -mno-inline-atomics}
+-minline-atomics  -mno-inline-atomics
+-minline-strlen  -mno-inline-strlen}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29359,6 +29360,14 @@ Do or don't use smaller but slower subword atomic emulation code that uses
 libatomic function calls.  The default is to use fast inline subword atomics
 that do not require libatomic.
 
+@opindex minline-strlen
+@item -minline-strlen
+@itemx -mno-inline-strlen
+Do or do not attempt to inline strlen calls if possible.
+Inlining will only be done if the string is properly aligned
+and instructions for accelerated processing are available.
+The default is to not inline strlen calls.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f6276a2d0b6..8bd623dcd0e 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -5168,6 +5168,30 @@ emit_jump_insn (rtx x)
   return last;
 }
 
+/* Make an insn of code JUMP_INSN with pattern X,
+   add a REG_BR_PROB note that indicates very likely probability,
+   and add it to the end of the doubly-linked list.  */
+
+rtx_insn *
+emit_likely_jump_insn (rtx x)
+{
+  rtx_insn *jump = emit_jump_insn (x);
+  add_reg_br_prob_note (jump, profile_probability::very_likely ());
+  return jump;
+}
+
+/* Make an insn of code JUMP_INSN with pattern X,
+   add a REG_BR_PROB note that indicates very unlikely probability,
+   and add it to the end of the doubly-linked list.  */
+
+rtx_insn *
+emit_unlikely_jump_insn (rtx x)
+{
+  rtx_insn *jump = emit_jump_insn (x);
+  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
+  return jump;
+}
+
 /* Make an insn of code CALL_INSN with pattern X
    and add it to the end of the doubly-linked list.  */
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 0e9491b89b4..102ad9b57a6 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3347,6 +3347,8 @@ extern rtx_note *emit_note_after (enum insn_note, rtx_insn *);
 extern rtx_insn *emit_insn (rtx);
 extern rtx_insn *emit_debug_insn (rtx);
 extern rtx_insn *emit_jump_insn (rtx);
+extern rtx_insn *emit_likely_jump_insn (rtx);
+extern rtx_insn *emit_unlikely_jump_insn (rtx);
 extern rtx_insn *emit_call_insn (rtx);
 extern rtx_code_label *emit_label (rtx);
 extern rtx_jump_table_data *emit_jump_table_data (rtx);
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
new file mode 100644
index 00000000000..57a6b5ea66a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strlen -march=rv32gc_xtheadbb" { target { rv32 } } } */
+/* { dg-options "-minline-strlen -march=rv64gc_xtheadbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-not "th.tstnbz\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
new file mode 100644
index 00000000000..dbc8d1e7da7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strlen -march=rv32gc_xtheadbb" { target { rv32 } } } */
+/* { dg-options "-minline-strlen -march=rv64gc_xtheadbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  s = __builtin_assume_aligned (s, 4096);
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler "th.tstnbz\t" } } */
+/* { dg-final { scan-assembler-not "jalr" } } */
+/* { dg-final { scan-assembler-not "call" } } */
+/* { dg-final { scan-assembler-not "jr" } } */
+/* { dg-final { scan-assembler-not "tail" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
new file mode 100644
index 00000000000..a481068aa0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  s = __builtin_assume_aligned (s, 4096);
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
new file mode 100644
index 00000000000..1295aeb0086
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-mno-inline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  s = __builtin_assume_aligned (s, 4096);
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
new file mode 100644
index 00000000000..326fef885d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
new file mode 100644
index 00000000000..19ebfaef16f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+size_t
+my_str_len (const char *s)
+{
+  s = __builtin_assume_aligned (s, 4096);
+  return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler "orc.b\t" } } */
+/* { dg-final { scan-assembler-not "jalr" } } */
+/* { dg-final { scan-assembler-not "call" } } */
+/* { dg-final { scan-assembler-not "jr" } } */
+/* { dg-final { scan-assembler-not "tail" } } */
-- 
2.41.0



* [PATCH v2 2/2] riscv: Add support for str(n)cmp inline expansion
  2023-09-06 16:07 [PATCH v2 0/2] riscv: Introduce strlen/strcmp/strncmp inline expansion Christoph Muellner
  2023-09-06 16:07 ` [PATCH v2 1/2] riscv: Add support for strlen " Christoph Muellner
@ 2023-09-06 16:07 ` Christoph Muellner
  2023-09-12  3:34   ` Jeff Law
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Muellner @ 2023-09-06 16:07 UTC (permalink / raw)
  To: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
  Cc: Christoph Müllner

From: Christoph Müllner <christoph.muellner@vrull.eu>

This patch implements expansions for the cmpstrsi and cmpstrnsi
builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
instructions are available.  The expansion basically emits a comparison
sequence which compares XLEN bits per step if possible.

This allows inlining calls to strcmp() and strncmp() if both strings
are xlen-aligned.  For strncmp(), the length parameter needs to be known
at compile time.
The benefits over calls to libc are:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment tests

The inlining mechanism is gated by new switches ('-minline-strcmp' and
'-minline-strncmp') and by the variable 'optimize_size'.
The number of emitted unrolled loop iterations can be controlled by the
parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.

The comparison sequence is inspired by the strcmp example
in the appendix of the Bitmanip specification (incl. the fast
result calculation in case the first word does not contain
a NUL byte).  Additional inspiration comes from rs6000-string.c.
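
The idea behind the word-wise comparison and the fast result calculation
can be sketched in C as follows.  This is a simplified model, not the
emitted sequence: it loops instead of emitting a bounded, unrolled
sequence with a libc fallback, it handles the NUL case byte by byte, and
it assumes a little-endian host, 8-byte-aligned strings, and
__builtin_bswap64 (GCC or Clang).  All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Software model of orc.b (hypothetical helper): 0xff for every
   non-zero input byte, 0x00 for every zero byte.  */
static uint64_t
orc_b_model (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Word-wise strcmp model.  Both strings must be 8-byte aligned.  */
static int
strcmp_word_model (const char *s1, const char *s2)
{
  for (;;)
    {
      uint64_t w1, w2;
      memcpy (&w1, s1, sizeof w1);
      memcpy (&w2, s2, sizeof w2);

      if (orc_b_model (w1) != UINT64_MAX)
	{
	  /* The first string ends in this word: compare byte by byte up
	     to and including its NUL.  This loop always returns.  */
	  for (size_t i = 0; i < sizeof w1; i++)
	    {
	      unsigned char c1 = (unsigned char) s1[i];
	      unsigned char c2 = (unsigned char) s2[i];
	      if (c1 != c2)
		return c1 < c2 ? -1 : 1;
	      if (c1 == 0)
		return 0;
	    }
	}

      if (w1 != w2)
	{
	  /* Fast path: no NUL in w1.  After a byte swap the first string
	     byte is the most significant one, so an unsigned compare is
	     decided by the first differing byte.  */
	  uint64_t r1 = __builtin_bswap64 (w1);
	  uint64_t r2 = __builtin_bswap64 (w2);
	  return r1 < r2 ? -1 : 1;
	}

      s1 += sizeof w1;
      s2 += sizeof w2;
    }
}
```

The fast path is also correct if only the second string ends in the
current word: its NUL byte then differs from a non-zero byte of the first
string, so the first difference lies at or before that NUL.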

The emitted sequence cannot trigger read-ahead page faults, because
aligned strings are only accessed with aligned xlen-wide loads, and an
aligned load never crosses a page boundary.
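
The page-boundary argument can be checked with a few lines of C.  This is
only a sanity-check sketch; the helper names are hypothetical and the
4096-byte page size used in the note below is an assumption (the argument
holds for any page size that is a multiple of the load size).

```c
#include <stdint.h>

/* Return non-zero if a load of SIZE bytes at ADDR touches two
   different pages of PAGE_SIZE bytes (hypothetical helper).  */
static int
load_crosses_page (uintptr_t addr, uintptr_t size, uintptr_t page_size)
{
  return (addr / page_size) != ((addr + size - 1) / page_size);
}

/* Count how many of COUNT consecutive SIZE-byte loads, starting at
   FIRST, cross a page boundary.  */
static unsigned
count_page_crossings (uintptr_t first, uintptr_t count, uintptr_t size,
		      uintptr_t page_size)
{
  unsigned n = 0;
  for (uintptr_t i = 0; i < count; i++)
    n += load_crosses_page (first + i * size, size, page_size);
  return n;
}
```

With 8-byte loads and 4096-byte pages, no load starting at a multiple of 8
crosses a page, whereas an unaligned load at address 4093 does.  The bytes
read beyond the NUL terminator therefore always lie on a page that the
string itself already occupies.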

This patch has been tested using the glibc string tests on QEMU:
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
* rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/ChangeLog:

	* config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
	(<optab>_not<mode>3): Likewise.
	* config/riscv/riscv-protos.h (riscv_expand_strcmp): New
	prototype.
	* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
	macros.
	(GEN_EMIT_HELPER2): Likewise.
	(emit_strcmp_scalar_compare_byte): New function.
	(emit_strcmp_scalar_compare_subword): Likewise.
	(emit_strcmp_scalar_compare_word): Likewise.
	(emit_strcmp_scalar_load_and_compare): Likewise.
	(emit_strcmp_scalar_call_to_libc): Likewise.
	(emit_strcmp_scalar_result_calculation_nonul): Likewise.
	(emit_strcmp_scalar_result_calculation): Likewise.
	(riscv_expand_strcmp_scalar): Likewise.
	(riscv_expand_strcmp): Likewise.
	* config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
	INSN name.
	(@slt<u>_<X:mode><GPR:mode>3): Likewise.
	(cmpstrnsi): Invoke expansion function for str(n)cmp.
	(cmpstrsi): Likewise.
	* config/riscv/riscv.opt: Add new flags '-minline-strcmp' and
	'-minline-strncmp' and new parameter
	'--param=riscv-strcmp-inline-limit'.
	* doc/invoke.texi: Document the new flags and the new parameter.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/xtheadbb-strcmp.c: New test.
	* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
	* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
	* gcc.target/riscv/zbb-strcmp-limit.c: New test.
	* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
	* gcc.target/riscv/zbb-strcmp.c: New test.
---
 gcc/config/riscv/bitmanip.md                  |   2 +-
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv-string.cc              | 411 ++++++++++++++++++
 gcc/config/riscv/riscv.md                     |  44 +-
 gcc/config/riscv/riscv.opt                    |  12 +
 gcc/doc/invoke.texi                           |  20 +-
 .../gcc.target/riscv/xtheadbb-strcmp.c        |  57 +++
 .../gcc.target/riscv/zbb-strcmp-disabled-2.c  |  38 ++
 .../gcc.target/riscv/zbb-strcmp-disabled.c    |  38 ++
 .../gcc.target/riscv/zbb-strcmp-limit.c       |  57 +++
 .../gcc.target/riscv/zbb-strcmp-unaligned.c   |  38 ++
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c   |  57 +++
 12 files changed, 772 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 1544ef4e125..1e90636dd60 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -206,7 +206,7 @@ (define_expand "popcount<mode>2"
 	(popcount:GPR (match_operand:GPR 1 "register_operand")))]
   "TARGET_ZBB")
 
-(define_insn "*<optab>_not<mode>"
+(define_insn "<optab>_not<mode>3"
   [(set (match_operand:X 0 "register_operand" "=r")
         (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
                             (match_operand:X 2 "register_operand" "r")))]
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b060d047f01..0006fe0564e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -518,6 +518,7 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
 
 /* Routines implemented in riscv-string.cc.  */
+extern bool riscv_expand_strcmp (rtx, rtx, rtx, rtx, rtx);
 extern bool riscv_expand_strlen (rtx, rtx, rtx, rtx);
 
 /* Routines implemented in thead.cc.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 086900a6083..2bdff0374e8 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -66,14 +66,22 @@ do_## name ## 3(rtx dest, rtx src1, rtx src2)			\
 }
 
 GEN_EMIT_HELPER3(add) /* do_add3  */
+GEN_EMIT_HELPER3(and) /* do_and3  */
+GEN_EMIT_HELPER3(ashl) /* do_ashl3  */
+GEN_EMIT_HELPER2(bswap) /* do_bswap2  */
 GEN_EMIT_HELPER2(clz) /* do_clz2  */
 GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
+GEN_EMIT_HELPER3(ior) /* do_ior3  */
+GEN_EMIT_HELPER3(ior_not) /* do_ior_not3  */
 GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
+GEN_EMIT_HELPER2(neg) /* do_neg2  */
 GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
 GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
+GEN_EMIT_HELPER3(rotr) /* do_rotr3  */
 GEN_EMIT_HELPER3(sub) /* do_sub3  */
 GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
+GEN_EMIT_HELPER3(xor) /* do_xor3  */
 GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
 
 #undef GEN_EMIT_HELPER2
@@ -106,6 +114,409 @@ do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
   return addr_reg;
 }
 
+/* Generate a sequence to compare single characters in data1 and data2.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   DATA1 is a register which contains character1.
+   DATA2 is a register which contains character2.
+   FINAL_LABEL is the location after the calculation of the return value.  */
+
+static void
+emit_strcmp_scalar_compare_byte (rtx result, rtx data1, rtx data2,
+				 rtx final_label)
+{
+  rtx tmp = gen_reg_rtx (Xmode);
+  do_sub3 (tmp, data1, data2);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  emit_jump_insn (gen_jump (final_label));
+  emit_barrier (); /* No fall-through.  */
+}
+
+/* Generate a sequence to compare two strings in data1 and data2.
+
+   DATA1 is a register which contains string1.
+   DATA2 is a register which contains string2.
+   ORC1 is a register where orc.b(data1) will be stored.
+   CMP_BYTES is the length of the strings.
+   END_LABEL is the location of the code that calculates the return value.  */
+
+static void
+emit_strcmp_scalar_compare_subword (rtx data1, rtx data2, rtx orc1,
+				    unsigned HOST_WIDE_INT cmp_bytes,
+				    rtx end_label)
+{
+  /* Set a NUL-byte after the relevant data (behind the string).  */
+  long long im = -256ll;
+  rtx imask = gen_rtx_CONST_INT (Xmode, im);
+  rtx m_reg = gen_reg_rtx (Xmode);
+  emit_insn (gen_rtx_SET (m_reg, imask));
+  do_rotr3 (m_reg, m_reg, GEN_INT (64 - cmp_bytes * BITS_PER_UNIT));
+  do_and3 (data1, m_reg, data1);
+  do_and3 (data2, m_reg, data2);
+  if (TARGET_ZBB)
+    do_orcb2 (orc1, data1);
+  else
+    do_th_tstnbz2 (orc1, data1);
+  emit_jump_insn (gen_jump (end_label));
+  emit_barrier (); /* No fall-through.  */
+}
+
+/* Generate a sequence to compare two strings in data1 and data2.
+
+   DATA1 is a register which contains string1.
+   DATA2 is a register which contains string2.
+   ORC1 is a register where orc.b(data1) will be stored.
+   TESTVAL is the value to test ORC1 against.
+   END_LABEL is the location of the code that calculates the return value.
+   NONUL_END_LABEL is the location of the code that calculates the return value
+   in case the first string does not contain a NULL-byte.  */
+
+static void
+emit_strcmp_scalar_compare_word (rtx data1, rtx data2, rtx orc1, rtx testval,
+				 rtx end_label, rtx nonul_end_label)
+{
+  /* Check if data1 contains a NUL character.  */
+  if (TARGET_ZBB)
+    do_orcb2 (orc1, data1);
+  else
+    do_th_tstnbz2 (orc1, data1);
+  rtx cond1 = gen_rtx_NE (VOIDmode, orc1, testval);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond1, orc1, testval,
+					  end_label));
+  /* Break out if u1 != u2 */
+  rtx cond2 = gen_rtx_NE (VOIDmode, data1, data2);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond2, data1,
+					 data2, nonul_end_label));
+  /* Fall-through on equality.  */
+}
+
+/* Generate the sequence of compares for strcmp/strncmp using zbb instructions.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The strings are referenced by SRC1 and SRC2.
+   The number of bytes to compare is defined by NBYTES.
+   DATA1 is a register where string1 will be stored.
+   DATA2 is a register where string2 will be stored.
+   ORC1 is a register where orc.b(data1) will be stored.
+   END_LABEL is the location of the code that calculates the return value.
+   NONUL_END_LABEL is the location of the code that calculates the return value
+   in case the first string does not contain a NULL-byte.
+   FINAL_LABEL is the location of the code that comes after the calculation
+   of the return value.  */
+
+static void
+emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
+				     unsigned HOST_WIDE_INT nbytes,
+				     rtx data1, rtx data2, rtx orc1,
+				     rtx end_label, rtx nonul_end_label,
+				     rtx final_label)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+  unsigned HOST_WIDE_INT offset = 0;
+
+  rtx testval = gen_reg_rtx (Xmode);
+  if (TARGET_ZBB)
+    emit_insn (gen_rtx_SET (testval, constm1_rtx));
+  else
+    emit_insn (gen_rtx_SET (testval, const0_rtx));
+
+  while (nbytes > 0)
+    {
+      unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes;
+      machine_mode load_mode;
+      if (cmp_bytes == 1)
+	load_mode = QImode;
+      else
+	load_mode = Xmode;
+
+      rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
+      do_load_from_addr (load_mode, data1, addr1, src1);
+      rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
+      do_load_from_addr (load_mode, data2, addr2, src2);
+
+      if (cmp_bytes == 1)
+	{
+	  emit_strcmp_scalar_compare_byte (result, data1, data2, final_label);
+	  return;
+	}
+      else if (cmp_bytes < xlen)
+	{
+	  emit_strcmp_scalar_compare_subword (data1, data2, orc1,
+					      cmp_bytes, end_label);
+	  return;
+	}
+      else
+	emit_strcmp_scalar_compare_word (data1, data2, orc1, testval,
+					 end_label, nonul_end_label);
+
+      offset += cmp_bytes;
+      nbytes -= cmp_bytes;
+    }
+}
+
+/* Fixup pointers and generate a call to strcmp.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The strings are referenced by SRC1 and SRC2.
+   The number of already compared bytes is defined by NBYTES.  */
+
+static void
+emit_strcmp_scalar_call_to_libc (rtx result, rtx src1, rtx src2,
+				 unsigned HOST_WIDE_INT nbytes)
+{
+  /* Update pointers past what has been compared already.  */
+  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+  rtx src1_new = force_reg (Pmode,
+			    gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (nbytes)));
+  rtx src2_new = force_reg (Pmode,
+			    gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (nbytes)));
+
+  /* Construct call to strcmp to compare the rest of the string.  */
+  tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
+  emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+			   result, LCT_NORMAL, GET_MODE (result),
+			   src1_new, Pmode, src2_new, Pmode);
+}
+
+/* Fast strcmp-result calculation if no NULL-byte in string1.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The mismatching strings are stored in DATA1 and DATA2.  */
+
+static void
+emit_strcmp_scalar_result_calculation_nonul (rtx result, rtx data1, rtx data2)
+{
+  /* Words don't match, and no NUL byte in one word.
+     Get bytes in big-endian order and compare as words.  */
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+  /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+  rtx tmp = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
+  do_neg2 (tmp, tmp);
+  do_ior3 (tmp, tmp, const1_rtx);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* strcmp-result calculation.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The strings are stored in DATA1 and DATA2.
+   ORC1 contains orc.b(DATA1).  */
+
+static void
+emit_strcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2,
+				       rtx orc1)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  /* Convert non-equal bytes into non-NUL bytes.  */
+  rtx diff = gen_reg_rtx (Xmode);
+  do_xor3 (diff, data1, data2);
+  rtx shift = gen_reg_rtx (Xmode);
+
+  if (TARGET_ZBB)
+    {
+      /* Convert non-equal or NUL-bytes into non-NUL bytes.  */
+      rtx syndrome = gen_reg_rtx (Xmode);
+      do_orcb2 (diff, diff);
+      do_ior_not3 (syndrome, orc1, diff);
+      /* Count the number of equal bits from the beginning of the word.  */
+      do_ctz2 (shift, syndrome);
+    }
+  else
+    {
+      /* Convert non-equal or NUL-bytes into non-NUL bytes.  */
+      rtx syndrome = gen_reg_rtx (Xmode);
+      do_th_tstnbz2 (diff, diff);
+      do_one_cmpl2 (diff, diff);
+      do_ior3 (syndrome, orc1, diff);
+      /* Count the number of equal bits from the beginning of the word.  */
+      do_th_rev2 (syndrome, syndrome);
+      do_clz2 (shift, syndrome);
+    }
+
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+
+  /* The most-significant-non-zero bit of the syndrome marks either the
+     first bit that is different, or the top bit of the first zero byte.
+     Shifting left now will bring the critical information into the
+     top bits.  */
+  do_ashl3 (data1, data1, gen_lowpart (QImode, shift));
+  do_ashl3 (data2, data2, gen_lowpart (QImode, shift));
+
+  /* But we need to zero-extend (char is unsigned) the value and then
+     perform a signed 32-bit subtraction.  */
+  unsigned int shiftr = (xlen - 1) * BITS_PER_UNIT;
+  do_lshr3 (data1, data1, GEN_INT (shiftr));
+  do_lshr3 (data2, data2, GEN_INT (shiftr));
+  rtx tmp = gen_reg_rtx (Xmode);
+  do_sub3 (tmp, data1, data2);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* Expand str(n)cmp using Zbb/TheadBb instructions.
+
+   The result will be stored in RESULT.
+   The strings are referenced by SRC1 and SRC2.
+   The number of bytes to compare is defined by NBYTES.
+   The alignment is defined by ALIGNMENT.
+   If NCOMPARE is false then libc's strcmp() will be called if comparing
+   NBYTES of both strings did not find differences or NULL-bytes.
+
+   Return true if expansion was successful, or false otherwise.  */
+
+static bool
+riscv_expand_strcmp_scalar (rtx result, rtx src1, rtx src2,
+			    unsigned HOST_WIDE_INT nbytes,
+			    unsigned HOST_WIDE_INT alignment,
+			    bool ncompare)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  gcc_assert (TARGET_ZBB || TARGET_XTHEADBB);
+  gcc_assert (nbytes > 0);
+  gcc_assert ((int)nbytes <= riscv_strcmp_inline_limit);
+  gcc_assert (ncompare || (nbytes & (xlen - 1)) == 0);
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (nbytes > IMM_REACH)
+    nbytes = IMM_REACH;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
+  /* We need xlen-aligned strings.  */
+  if (alignment < xlen)
+    return false;
+
+  /* Overall structure of emitted code:
+       Load-and-compare:
+	 - Load data1 and data2
+	 - Set orc1 := orc.b (data1) (or th.tstnbz)
+	 - Compare strings and either:
+	   - Fall-through on equality
+	   - Jump to end_label on NUL in data1, or nonul_end_label on mismatch
+	   - Calculate result value and jump to final_label
+       // Fall-through
+       Call-to-libc or set result to 0 (depending on ncompare)
+       Jump to final_label
+     nonul_end_label: // words don't match, and no null byte in first word.
+       Calculate result value with the use of data1, data2 and orc1
+       Jump to final_label
+     end_label:
+       Calculate result value with the use of data1, data2 and orc1
+       Jump to final_label
+     final_label:
+       // Nothing.  */
+
+  rtx data1 = gen_reg_rtx (Xmode);
+  rtx data2 = gen_reg_rtx (Xmode);
+  rtx orc1 = gen_reg_rtx (Xmode);
+  rtx nonul_end_label = gen_label_rtx ();
+  rtx end_label = gen_label_rtx ();
+  rtx final_label = gen_label_rtx ();
+
+  /* Generate a sequence of zbb instructions to compare out
+     to the length specified.  */
+  emit_strcmp_scalar_load_and_compare (result, src1, src2, nbytes,
+				       data1, data2, orc1,
+				       end_label, nonul_end_label, final_label);
+
+  /* All compared and everything was equal.  */
+  if (ncompare)
+    {
+      emit_insn (gen_rtx_SET (result, gen_rtx_CONST_INT (SImode, 0)));
+      emit_jump_insn (gen_jump (final_label));
+      emit_barrier (); /* No fall-through.  */
+    }
+  else
+    {
+      emit_strcmp_scalar_call_to_libc (result, src1, src2, nbytes);
+      emit_jump_insn (gen_jump (final_label));
+      emit_barrier (); /* No fall-through.  */
+    }
+
+
+  emit_label (nonul_end_label);
+  emit_strcmp_scalar_result_calculation_nonul (result, data1, data2);
+  emit_jump_insn (gen_jump (final_label));
+  emit_barrier (); /* No fall-through.  */
+
+  emit_label (end_label);
+  emit_strcmp_scalar_result_calculation (result, data1, data2, orc1);
+  emit_jump_insn (gen_jump (final_label));
+  emit_barrier (); /* No fall-through.  */
+
+  emit_label (final_label);
+  return true;
+}
+
+/* Expand a string compare operation.
+
+   The result will be stored in RESULT.
+   The strings are referenced by SRC1 and SRC2.
+   The argument BYTES_RTX either holds the number of characters to
+   compare, or is NULL_RTX. The argument ALIGN_RTX holds the alignment.
+
+   Return true if expansion was successful, or false otherwise.  */
+
+bool
+riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
+		     rtx bytes_rtx, rtx align_rtx)
+{
+  unsigned HOST_WIDE_INT compare_max;
+  unsigned HOST_WIDE_INT nbytes;
+  unsigned HOST_WIDE_INT alignment;
+  bool ncompare = bytes_rtx != NULL_RTX;
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  if (riscv_strcmp_inline_limit == 0)
+    return false;
+
+  /* Round down the comparison limit to a multiple of xlen.  */
+  compare_max = riscv_strcmp_inline_limit & ~(xlen - 1);
+
+  /* Decide how many bytes to compare inline.  */
+  if (bytes_rtx == NULL_RTX)
+    {
+      nbytes = compare_max;
+    }
+  else
+    {
+      /* If we have a length, it must be constant.  */
+      if (!CONST_INT_P (bytes_rtx))
+	return false;
+      nbytes = UINTVAL (bytes_rtx);
+
+      /* We don't emit parts of a strncmp() call.  */
+      if (nbytes > compare_max)
+	return false;
+    }
+
+  /* Guarantees:
+     - nbytes > 0
+     - nbytes <= riscv_strcmp_inline_limit
+     - nbytes is a multiple of xlen if !ncompare  */
+
+  if (!CONST_INT_P (align_rtx))
+    return false;
+  alignment = UINTVAL (align_rtx);
+
+  if (TARGET_ZBB || TARGET_XTHEADBB)
+    {
+      return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
+					 ncompare);
+    }
+
+  return false;
+}
+
 /* If the provided string is aligned, then read XLEN bytes
    in a loop and use orc.b to find NUL-bytes.  */
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e078ebc43cb..26db7081b15 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2857,7 +2857,7 @@ (define_insn "*sge<u>_<X:mode><GPR:mode>"
   [(set_attr "type" "slt")
    (set_attr "mode" "<X:MODE>")])
 
-(define_insn "*slt<u>_<X:mode><GPR:mode>"
+(define_insn "@slt<u>_<X:mode><GPR:mode>3"
   [(set (match_operand:GPR           0 "register_operand" "= r")
 	(any_lt:GPR (match_operand:X 1 "register_operand" "  r")
 		    (match_operand:X 2 "arith_operand"    " rI")))]
@@ -3503,6 +3503,48 @@ (define_expand "msubhisi4"
   "TARGET_XTHEADMAC"
 )
 
+;; String compare with length insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the source1
+;; Argument 2 is the source2
+;; Argument 3 is the length
+;; Argument 4 is the alignment
+
+(define_expand "cmpstrnsi"
+  [(parallel [(set (match_operand:SI 0)
+	      (compare:SI (match_operand:BLK 1)
+			  (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))
+	      (use (match_operand:SI 4))])]
+  "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+{
+  if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
+                           operands[3], operands[4]))
+    DONE;
+  else
+    FAIL;
+})
+
+;; String compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the source1
+;; Argument 2 is the source2
+;; Argument 3 is the alignment
+
+(define_expand "cmpstrsi"
+  [(parallel [(set (match_operand:SI 0)
+	      (compare:SI (match_operand:BLK 1)
+			  (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))])]
+  "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+{
+  if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
+                           NULL_RTX, operands[3]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Search character in string (generalization of strlen).
 ;; Argument 0 is the resulting offset
 ;; Argument 1 is the string
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 2491b335aef..311f52c3d2d 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -278,10 +278,22 @@ minline-atomics
 Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
 Always inline subword atomic operations.
 
+minline-strcmp
+Target Bool Var(riscv_inline_strcmp) Init(0)
+Inline strcmp calls if possible.
+
+minline-strncmp
+Target Bool Var(riscv_inline_strncmp) Init(0)
+Inline strncmp calls if possible.
+
 minline-strlen
 Target Bool Var(riscv_inline_strlen) Init(0)
 Inline strlen calls if possible.
 
+-param=riscv-strcmp-inline-limit=
+Target RejectNegative Joined UInteger Var(riscv_strcmp_inline_limit) Init(64)
+Max number of bytes to compare as part of inlined strcmp/strncmp routines (default: 64).
+
 Enum
 Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
 Valid arguments to -param=riscv-autovec-preference=:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4a9e385d009..03d93e6b185 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1237,7 +1237,9 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset}
 -mcsr-check -mno-csr-check
 -minline-atomics  -mno-inline-atomics
--minline-strlen  -mno-inline-strlen}
+-minline-strlen  -mno-inline-strlen
+-minline-strcmp  -mno-inline-strcmp
+-minline-strncmp  -mno-inline-strncmp}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29368,6 +29370,22 @@ Inlining will only be done if the string is properly aligned
 and instructions for accelerated processing are available.
 The default is to not inline strlen calls.
 
+@opindex minline-strcmp
+@item -minline-strcmp
+@itemx -mno-inline-strcmp
+Do or do not attempt to inline strcmp calls if possible.
+Inlining will only be done if the strings are properly aligned
+and instructions for accelerated processing are available.
+The default is to not inline strcmp calls.
+
+@opindex minline-strncmp
+@item -minline-strncmp
+@itemx -mno-inline-strncmp
+Do or do not attempt to inline strncmp calls if possible.
+Inlining will only be done if the strings are properly aligned
+and instructions for accelerated processing are available.
+The default is to not inline strncmp calls.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c
new file mode 100644
index 00000000000..6b88912d828
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv32gc_xtheadbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv64gc_xtheadbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 8+1 th.tstnbz instructions.  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Emits 6+1 th.tstnbz instructions.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Not expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Emits 6+1 th.tstnbz instructions.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "th.tstnbz\t" 32 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.tstnbz\t" 58 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c
new file mode 100644
index 00000000000..f0b3cd542e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c
new file mode 100644
index 00000000000..68497d53280
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-strcmp -mno-inline-strncmp -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-mno-inline-strcmp -mno-inline-strncmp -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c
new file mode 100644
index 00000000000..6bcbd70b542
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp --param=riscv-strcmp-inline-limit=32 -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp --param=riscv-strcmp-inline-limit=32 -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 4+1 orc.b instructions on rv64 (8+1 on rv32).  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Not expanded because 42 exceeds the inline limit of 32.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Not expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Not expanded because 42 exceeds the inline limit of 32.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "orc.b\t" 10 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "orc.b\t" 18 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
new file mode 100644
index 00000000000..191187643c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
new file mode 100644
index 00000000000..f64aa34a162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 8+1 orc.b instructions.  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Not expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "orc.b\t" 32 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "orc.b\t" 58 { target { rv32 } } } } */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion
  2023-09-06 16:07 ` [PATCH v2 1/2] riscv: Add support for strlen " Christoph Muellner
@ 2023-09-06 16:22   ` Palmer Dabbelt
  2023-09-06 16:47     ` Jeff Law
  2023-09-12  3:28   ` Jeff Law
  2023-09-12  9:38   ` Philipp Tomsich
  2 siblings, 1 reply; 10+ messages in thread
From: Palmer Dabbelt @ 2023-09-06 16:22 UTC (permalink / raw)
  To: christoph.muellner
  Cc: gcc-patches, kito.cheng, Jim Wilson, Andrew Waterman,
	philipp.tomsich, jeffreyalaw, Vineet Gupta, christoph.muellner

On Wed, 06 Sep 2023 09:07:33 PDT (-0700), christoph.muellner@vrull.eu wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> This patch implements the expansion of the strlen builtin for RV32/RV64
> for xlen-aligned strings if Zbb or XTheadBb instructions are available.
> The inserted sequences are:
>
> rv32gc_zbb (RV64 is similar):
>       add     a3,a0,4
>       li      a4,-1
> .L1:  lw      a5,0(a0)
>       add     a0,a0,4
>       orc.b   a5,a5
>       beq     a5,a4,.L1
>       not     a5,a5
>       ctz     a5,a5
>       srl     a5,a5,0x3
>       add     a0,a0,a5
>       sub     a0,a0,a3
>
> rv64gc_xtheadbb (RV32 is similar):
>       add       a4,a0,8
> .L2:  ld        a5,0(a0)
>       add       a0,a0,8
>       th.tstnbz a5,a5
>       beqz      a5,.L2
>       th.rev    a5,a5
>       th.ff1    a5,a5
>       srl       a5,a5,0x3
>       add       a0,a0,a5
>       sub       a0,a0,a4
>
> This allows inlining calls to strlen(), with optimized code for
> xlen-aligned strings, resulting in the following benefits over
> a call to libc:
> * no call/ret instructions
> * no stack frame allocation
> * no register saving/restoring
> * no alignment test
>
> The inlining mechanism is gated by a new switch ('-minline-strlen')
> and by the variable 'optimize_size'.

Maybe this is more of a Jeff question, but this looks to me like 
something that should be target-agnostic -- maybe we need some backend 
work to actually emit the special instruction, but IIRC this is a 
somewhat common flavor of instruction and is in other ISAs as well.  It 
looks like there's already a strlen insn, so I guess the core issue is 
why we need that unspec?

Sorry if I'm just missing something, though...

> Tested using the glibc string tests.
>
> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
>
> gcc/ChangeLog:
>
> 	* config.gcc: Add new object riscv-string.o.
> 	riscv-string.cc.
> 	* config/riscv/riscv-protos.h (riscv_expand_strlen):
> 	New function.
> 	* config/riscv/riscv.md (strlen<mode>): New expand INSN.
> 	* config/riscv/riscv.opt: New flag 'minline-strlen'.
> 	* config/riscv/t-riscv: Add new object riscv-string.o.
> 	* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
> 	(th_rev<mode>2): Likewise.
> 	(th_tstnbz<mode>2): New INSN.
> 	* doc/invoke.texi: Document '-minline-strlen'.
> 	* emit-rtl.cc (emit_likely_jump_insn): New helper function.
> 	(emit_unlikely_jump_insn): Likewise.
> 	* rtl.h (emit_likely_jump_insn): New prototype.
> 	(emit_unlikely_jump_insn): Likewise.
> 	* config/riscv/riscv-string.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
> 	* gcc.target/riscv/xtheadbb-strlen.c: New test.
> 	* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
> 	* gcc.target/riscv/zbb-strlen-disabled.c: New test.
> 	* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
> 	* gcc.target/riscv/zbb-strlen.c: New test.
> ---
>  gcc/config.gcc                                |   3 +-
>  gcc/config/riscv/riscv-protos.h               |   3 +
>  gcc/config/riscv/riscv-string.cc              | 183 ++++++++++++++++++
>  gcc/config/riscv/riscv.md                     |  28 +++
>  gcc/config/riscv/riscv.opt                    |   4 +
>  gcc/config/riscv/t-riscv                      |   6 +
>  gcc/config/riscv/thead.md                     |   9 +-
>  gcc/doc/invoke.texi                           |  11 +-
>  gcc/emit-rtl.cc                               |  24 +++
>  gcc/rtl.h                                     |   2 +
>  .../riscv/xtheadbb-strlen-unaligned.c         |  14 ++
>  .../gcc.target/riscv/xtheadbb-strlen.c        |  19 ++
>  .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 ++
>  .../gcc.target/riscv/zbb-strlen-disabled.c    |  15 ++
>  .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 ++
>  gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 ++
>  16 files changed, 366 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/riscv/riscv-string.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index b2fe7c7ceef..aff6b6a5601 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -530,7 +530,8 @@ pru-*-*)
>  	;;
>  riscv*)
>  	cpu_type=riscv
> -	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
> +	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
> +	extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
>  	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>  	extra_objs="${extra_objs} thead.o"
>  	d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 6dbf6b9f943..b060d047f01 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -517,6 +517,9 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
>  /* Mask that selects the riscv_builtin_class part of a function code.  */
>  const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
>
> +/* Routines implemented in riscv-string.cc.  */
> +extern bool riscv_expand_strlen (rtx, rtx, rtx, rtx);
> +
>  /* Routines implemented in thead.cc.  */
>  extern bool th_mempair_operands_p (rtx[4], bool, machine_mode);
>  extern void th_mempair_order_operands (rtx[4], bool, machine_mode);
> diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
> new file mode 100644
> index 00000000000..086900a6083
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -0,0 +1,183 @@
> +/* Subroutines used to expand string operations for RISC-V.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define IN_TARGET_CODE 1
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "memmodel.h"
> +#include "tm_p.h"
> +#include "ira.h"
> +#include "print-tree.h"
> +#include "varasm.h"
> +#include "explow.h"
> +#include "expr.h"
> +#include "output.h"
> +#include "target.h"
> +#include "predict.h"
> +#include "optabs.h"
> +
> +/* Emit proper instruction depending on mode of dest.  */
> +
> +#define GEN_EMIT_HELPER2(name)				\
> +static rtx_insn *					\
> +do_## name ## 2(rtx dest, rtx src)			\
> +{							\
> +  rtx_insn *insn;					\
> +  if (GET_MODE (dest) == DImode)			\
> +    insn = emit_insn (gen_ ## name ## di2 (dest, src));	\
> +  else							\
> +    insn = emit_insn (gen_ ## name ## si2 (dest, src));	\
> +  return insn;						\
> +}
> +
> +/* Emit proper instruction depending on mode of dest.  */
> +
> +#define GEN_EMIT_HELPER3(name)					\
> +static rtx_insn *						\
> +do_## name ## 3(rtx dest, rtx src1, rtx src2)			\
> +{								\
> +  rtx_insn *insn;						\
> +  if (GET_MODE (dest) == DImode)				\
> +    insn = emit_insn (gen_ ## name ## di3 (dest, src1, src2));	\
> +  else								\
> +    insn = emit_insn (gen_ ## name ## si3 (dest, src1, src2));	\
> +  return insn;							\
> +}
> +
> +GEN_EMIT_HELPER3(add) /* do_add3  */
> +GEN_EMIT_HELPER2(clz) /* do_clz2  */
> +GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
> +GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
> +GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
> +GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
> +GEN_EMIT_HELPER3(sub) /* do_sub3  */
> +GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
> +GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
> +GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
> +
> +#undef GEN_EMIT_HELPER2
> +#undef GEN_EMIT_HELPER3
> +
> +/* Helper function to load a byte or a Pmode register.
> +
> +   MODE is the mode to use for the load (QImode or Pmode).
> +   DEST is the destination register for the data.
> +   ADDR_REG is the register that holds the address.
> +   ADDR is the address expression to load from.
> +
> +   This function returns an rtx containing the register,
> +   where the ADDR is stored.  */
> +
> +static rtx
> +do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
> +{
> +  rtx mem = gen_rtx_MEM (mode, addr_reg);
> +  MEM_COPY_ATTRIBUTES (mem, addr);
> +  set_mem_size (mem, GET_MODE_SIZE (mode));
> +
> +  if (mode == QImode)
> +    do_zero_extendqi2 (dest, mem);
> +  else if (mode == Xmode)
> +    emit_move_insn (dest, mem);
> +  else
> +    gcc_unreachable ();
> +
> +  return addr_reg;
> +}
> +
> +/* If the provided string is aligned, then read XLEN bytes
> +   in a loop and use orc.b to find NUL-bytes.  */
> +
> +static bool
> +riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
> +{
> +  rtx testval, addr, addr_plus_regsz, word, zeros;
> +  rtx loop_label, cond;
> +
> +  gcc_assert (TARGET_ZBB || TARGET_XTHEADBB);
> +
> +  /* The alignment needs to be known and big enough.  */
> +  if (!CONST_INT_P (align) || UINTVAL (align) < GET_MODE_SIZE (Xmode))
> +    return false;
> +
> +  testval = gen_reg_rtx (Xmode);
> +  addr = copy_addr_to_reg (XEXP (src, 0));
> +  addr_plus_regsz = gen_reg_rtx (Pmode);
> +  word = gen_reg_rtx (Xmode);
> +  zeros = gen_reg_rtx (Xmode);
> +
> +  if (TARGET_ZBB)
> +    emit_insn (gen_rtx_SET (testval, constm1_rtx));
> +  else
> +    emit_insn (gen_rtx_SET (testval, const0_rtx));
> +
> +  do_add3 (addr_plus_regsz, addr, GEN_INT (UNITS_PER_WORD));
> +
> +  loop_label = gen_label_rtx ();
> +  emit_label (loop_label);
> +
> +  /* Load a word and use orc.b/th.tstnbz to find a zero-byte.  */
> +  do_load_from_addr (Xmode, word, addr, src);
> +  do_add3 (addr, addr, GEN_INT (UNITS_PER_WORD));
> +  if (TARGET_ZBB)
> +    do_orcb2 (word, word);
> +  else
> +    do_th_tstnbz2 (word, word);
> +  cond = gen_rtx_EQ (VOIDmode, word, testval);
> +  emit_unlikely_jump_insn (gen_cbranch4 (Xmode, cond, word, testval, loop_label));
> +
> +  /* Calculate the return value by counting zero-bits.  */
> +  if (TARGET_ZBB)
> +    do_one_cmpl2 (word, word);
> +  if (TARGET_BIG_ENDIAN)
> +    do_clz2 (zeros, word);
> +  else if (TARGET_ZBB)
> +    do_ctz2 (zeros, word);
> +  else
> +    {
> +      do_th_rev2 (word, word);
> +      do_clz2 (zeros, word);
> +    }
> +
> +  do_lshr3 (zeros, zeros, GEN_INT (exact_log2 (BITS_PER_UNIT)));
> +  do_add3 (addr, addr, zeros);
> +  do_sub3 (result, addr, addr_plus_regsz);
> +
> +  return true;
> +}
> +
> +/* Expand a strlen operation and return true if successful.
> +   Return false if we should let the compiler generate normal
> +   code, probably a strlen call.  */
> +
> +bool
> +riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
> +{
> +  gcc_assert (search_char == const0_rtx);
> +
> +  if (TARGET_ZBB || TARGET_XTHEADBB)
> +    return riscv_expand_strlen_scalar (result, src, align);
> +
> +  return false;
> +}
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 9da2a9f1c42..e078ebc43cb 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -82,6 +82,9 @@ (define_c_enum "unspec" [
>
>    ;; the calling convention of callee
>    UNSPEC_CALLEE_CC
> +
> +  ;; String unspecs
> +  UNSPEC_STRLEN
>  ])
>
>  (define_c_enum "unspecv" [
> @@ -3500,6 +3503,31 @@ (define_expand "msubhisi4"
>    "TARGET_XTHEADMAC"
>  )
>
> +;; Search character in string (generalization of strlen).
> +;; Argument 0 is the resulting offset
> +;; Argument 1 is the string
> +;; Argument 2 is the search character
> +;; Argument 3 is the alignment
> +
> +(define_expand "strlen<mode>"
> +  [(set (match_operand:X 0 "register_operand")
> +	(unspec:X [(match_operand:BLK 1 "general_operand")
> +		     (match_operand:SI 2 "const_int_operand")
> +		     (match_operand:SI 3 "const_int_operand")]
> +		  UNSPEC_STRLEN))]
> +  "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
> +{
> +  rtx search_char = operands[2];
> +
> +  if (search_char != const0_rtx)
> +    FAIL;
> +
> +  if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
> +    DONE;
> +  else
> +    FAIL;
> +})
> +
>  (include "bitmanip.md")
>  (include "crypto.md")
>  (include "sync.md")
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 98f342348b7..2491b335aef 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -278,6 +278,10 @@ minline-atomics
>  Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
>  Always inline subword atomic operations.
>
> +minline-strlen
> +Target Bool Var(riscv_inline_strlen) Init(0)
> +Inline strlen calls if possible.
> +
>  Enum
>  Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
>  Valid arguments to -param=riscv-autovec-preference=:
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index b1f80d1d87c..c012ac0cf33 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -91,6 +91,12 @@ riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc \
>  	$(COMPILE) $<
>  	$(POSTCOMPILE)
>
> +riscv-string.o: $(srcdir)/config/riscv/riscv-string.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
> +  memmodel.h $(EMIT_RTL_H) poly-int.h output.h
> +	$(COMPILE) $<
> +	$(POSTCOMPILE)
> +
>  riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
>    $(TM_P_H) $(TARGET_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
> index 29f98dec3a8..982b048cb65 100644
> --- a/gcc/config/riscv/thead.md
> +++ b/gcc/config/riscv/thead.md
> @@ -110,7 +110,7 @@ (define_insn "*th_clz<mode>2"
>    [(set_attr "type" "bitmanip")
>     (set_attr "mode" "<X:MODE>")])
>
> -(define_insn "*th_rev<mode>2"
> +(define_insn "th_rev<mode>2"
>    [(set (match_operand:GPR 0 "register_operand" "=r")
>  	(bswap:GPR (match_operand:GPR 1 "register_operand" "r")))]
>    "TARGET_XTHEADBB && (TARGET_64BIT || <MODE>mode == SImode)"
> @@ -121,6 +121,13 @@ (define_insn "*th_rev<mode>2"
>    [(set_attr "type" "bitmanip")
>     (set_attr "mode" "<GPR:MODE>")])
>
> +(define_insn "th_tstnbz<mode>2"
> +  [(set (match_operand:X 0 "register_operand" "=r")
> +	(unspec:X [(match_operand:X 1 "register_operand" "r")] UNSPEC_ORC_B))]
> +  "TARGET_XTHEADBB"
> +  "th.tstnbz\t%0,%1"
> +  [(set_attr "type" "bitmanip")])
> +
>  ;; XTheadBs
>
>  (define_insn "*th_tst<mode>3"
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 33befee7d6b..4a9e385d009 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1236,7 +1236,8 @@ See RS/6000 and PowerPC Options.
>  -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
>  -mstack-protector-guard-offset=@var{offset}
>  -mcsr-check -mno-csr-check
> --minline-atomics  -mno-inline-atomics}
> +-minline-atomics  -mno-inline-atomics
> +-minline-strlen  -mno-inline-strlen}
>
>  @emph{RL78 Options}
>  @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
> @@ -29359,6 +29360,14 @@ Do or don't use smaller but slower subword atomic emulation code that uses
>  libatomic function calls.  The default is to use fast inline subword atomics
>  that do not require libatomic.
>
> +@opindex minline-strlen
> +@item -minline-strlen
> +@itemx -mno-inline-strlen
> +Do or do not attempt to inline strlen calls if possible.
> +Inlining will only be done if the string is properly aligned
> +and instructions for accelerated processing are available.
> +The default is to not inline strlen calls.
> +
>  @opindex mshorten-memrefs
>  @item -mshorten-memrefs
>  @itemx -mno-shorten-memrefs
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index f6276a2d0b6..8bd623dcd0e 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -5168,6 +5168,30 @@ emit_jump_insn (rtx x)
>    return last;
>  }
>
> +/* Make an insn of code JUMP_INSN with pattern X,
> +   add a REG_BR_PROB note that indicates very likely probability,
> +   and add it to the end of the doubly-linked list.  */
> +
> +rtx_insn *
> +emit_likely_jump_insn (rtx x)
> +{
> +  rtx_insn *jump = emit_jump_insn (x);
> +  add_reg_br_prob_note (jump, profile_probability::very_likely ());
> +  return jump;
> +}
> +
> +/* Make an insn of code JUMP_INSN with pattern X,
> +   add a REG_BR_PROB note that indicates very unlikely probability,
> +   and add it to the end of the doubly-linked list.  */
> +
> +rtx_insn *
> +emit_unlikely_jump_insn (rtx x)
> +{
> +  rtx_insn *jump = emit_jump_insn (x);
> +  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
> +  return jump;
> +}
> +
>  /* Make an insn of code CALL_INSN with pattern X
>     and add it to the end of the doubly-linked list.  */
>
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index 0e9491b89b4..102ad9b57a6 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3347,6 +3347,8 @@ extern rtx_note *emit_note_after (enum insn_note, rtx_insn *);
>  extern rtx_insn *emit_insn (rtx);
>  extern rtx_insn *emit_debug_insn (rtx);
>  extern rtx_insn *emit_jump_insn (rtx);
> +extern rtx_insn *emit_likely_jump_insn (rtx);
> +extern rtx_insn *emit_unlikely_jump_insn (rtx);
>  extern rtx_insn *emit_call_insn (rtx);
>  extern rtx_code_label *emit_label (rtx);
>  extern rtx_jump_table_data *emit_jump_table_data (rtx);
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
> new file mode 100644
> index 00000000000..57a6b5ea66a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_xtheadbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_xtheadbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "th.tstnbz\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
> new file mode 100644
> index 00000000000..dbc8d1e7da7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_xtheadbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_xtheadbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler "th.tstnbz\t" } } */
> +/* { dg-final { scan-assembler-not "jalr" } } */
> +/* { dg-final { scan-assembler-not "call" } } */
> +/* { dg-final { scan-assembler-not "jr" } } */
> +/* { dg-final { scan-assembler-not "tail" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
> new file mode 100644
> index 00000000000..a481068aa0c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "orc.b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
> new file mode 100644
> index 00000000000..1295aeb0086
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-mno-inline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "orc.b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
> new file mode 100644
> index 00000000000..326fef885d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "orc.b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
> new file mode 100644
> index 00000000000..19ebfaef16f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler "orc.b\t" } } */
> +/* { dg-final { scan-assembler-not "jalr" } } */
> +/* { dg-final { scan-assembler-not "call" } } */
> +/* { dg-final { scan-assembler-not "jr" } } */
> +/* { dg-final { scan-assembler-not "tail" } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion
  2023-09-06 16:22   ` Palmer Dabbelt
@ 2023-09-06 16:47     ` Jeff Law
  2023-09-06 19:29       ` Palmer Dabbelt
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2023-09-06 16:47 UTC (permalink / raw)
  To: Palmer Dabbelt, christoph.muellner
  Cc: gcc-patches, kito.cheng, Jim Wilson, Andrew Waterman,
	philipp.tomsich, Vineet Gupta



On 9/6/23 10:22, Palmer Dabbelt wrote:
> On Wed, 06 Sep 2023 09:07:33 PDT (-0700), christoph.muellner@vrull.eu 
> wrote:
>> From: Christoph Müllner <christoph.muellner@vrull.eu>
>>
>> This patch implements the expansion of the strlen builtin for RV32/RV64
>> for xlen-aligned strings if Zbb or XTheadBb instructions are 
>> available.
>> The inserted sequences are:
>>
>> rv32gc_zbb (RV64 is similar):
>>       add     a3,a0,4
>>       li      a4,-1
>> .L1:  lw      a5,0(a0)
>>       add     a0,a0,4
>>       orc.b   a5,a5
>>       beq     a5,a4,.L1
>>       not     a5,a5
>>       ctz     a5,a5
>>       srl     a5,a5,0x3
>>       add     a0,a0,a5
>>       sub     a0,a0,a3
>>
>> rv64gc_xtheadbb (RV32 is similar):
>>       add       a4,a0,8
>> .L2:  ld        a5,0(a0)
>>       add       a0,a0,8
>>       th.tstnbz a5,a5
>>       beqz      a5,.L2
>>       th.rev    a5,a5
>>       th.ff1    a5,a5
>>       srl       a5,a5,0x3
>>       add       a0,a0,a5
>>       sub       a0,a0,a4
>>
>> This allows inlining calls to strlen(), with optimized code for
>> xlen-aligned strings, resulting in the following benefits over
>> a call to libc:
>> * no call/ret instructions
>> * no stack frame allocation
>> * no register saving/restoring
>> * no alignment test
>>
>> The inlining mechanism is gated by a new switch ('-minline-strlen')
>> and by the variable 'optimize_size'.
> 
> Maybe this is more of a Jeff question, but this looks to me like 
> something that should be target-agnostic -- maybe we need some backend 
> work to actually emit the special instruction, but IIRC this is a 
> somewhat common flavor of instruction and is in other ISAs as well.  It 
> looks like there's already a strlen insn, so I guess the core issue is 
> why we need that unspec?
> 
> Sorry if I'm just missing something, though...

The generic strlen expansion in GCC doesn't really expand a strlen loop.
It really just calls into the target code and forces the target to
handle everything.


We could have generic strlen expansion code that kicks in if the target 
expander fails.  And we could probably create the necessary opcodes to 
express the optimized end-of-string comparison instructions that exist 
on various architectures.  I'm not sure it's worth that much effort 
given targets are already doing their own strlen expansions.

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion
  2023-09-06 16:47     ` Jeff Law
@ 2023-09-06 19:29       ` Palmer Dabbelt
  0 siblings, 0 replies; 10+ messages in thread
From: Palmer Dabbelt @ 2023-09-06 19:29 UTC (permalink / raw)
  To: jeffreyalaw
  Cc: christoph.muellner, gcc-patches, kito.cheng, Jim Wilson,
	Andrew Waterman, philipp.tomsich, Vineet Gupta

On Wed, 06 Sep 2023 09:47:05 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 9/6/23 10:22, Palmer Dabbelt wrote:
>> On Wed, 06 Sep 2023 09:07:33 PDT (-0700), christoph.muellner@vrull.eu
>> wrote:
>>> From: Christoph Müllner <christoph.muellner@vrull.eu>
>>>
>>> This patch implements the expansion of the strlen builtin for RV32/RV64
>>> for xlen-aligned strings if Zbb or XTheadBb instructions are
>>> available.
>>> The inserted sequences are:
>>>
>>> rv32gc_zbb (RV64 is similar):
>>>       add     a3,a0,4
>>>       li      a4,-1
>>> .L1:  lw      a5,0(a0)
>>>       add     a0,a0,4
>>>       orc.b   a5,a5
>>>       beq     a5,a4,.L1
>>>       not     a5,a5
>>>       ctz     a5,a5
>>>       srl     a5,a5,0x3
>>>       add     a0,a0,a5
>>>       sub     a0,a0,a3
>>>
>>> rv64gc_xtheadbb (RV32 is similar):
>>>       add       a4,a0,8
>>> .L2:  ld        a5,0(a0)
>>>       add       a0,a0,8
>>>       th.tstnbz a5,a5
>>>       beqz      a5,.L2
>>>       th.rev    a5,a5
>>>       th.ff1    a5,a5
>>>       srl       a5,a5,0x3
>>>       add       a0,a0,a5
>>>       sub       a0,a0,a4
>>>
>>> This allows inlining calls to strlen(), with optimized code for
>>> xlen-aligned strings, resulting in the following benefits over
>>> a call to libc:
>>> * no call/ret instructions
>>> * no stack frame allocation
>>> * no register saving/restoring
>>> * no alignment test
>>>
>>> The inlining mechanism is gated by a new switch ('-minline-strlen')
>>> and by the variable 'optimize_size'.
>>
>> Maybe this is more of a Jeff question, but this looks to me like
>> something that should be target-agnostic -- maybe we need some backend
>> work to actually emit the special instruction, but IIRC this is a
>> somewhat common flavor of instruction and is in other ISAs as well.  It
>> looks like there's already a strlen insn, so I guess the core issue is
>> why we need that unspec?
>>
>> Sorry if I'm just missing something, though...
>
> The generic strlen expansion in GCC doesn't really expand a strlen loop.
>   It really just calls into the target code and forces the target to
> handle everything.

OK, that explains it.

> We could have generic strlen expansion code that kicks in if the target
> expander fails.  And we could probably create the necessary opcodes to
> express the optimized end-of-string comparison instructions that exist
> on various architectures.  I'm not sure it's worth that much effort
> given targets are already doing their own strlen expansions.


If everyone does it this way then I don't think we need to worry about 
it.

>
> jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion
  2023-09-06 16:07 ` [PATCH v2 1/2] riscv: Add support for strlen " Christoph Muellner
  2023-09-06 16:22   ` Palmer Dabbelt
@ 2023-09-12  3:28   ` Jeff Law
  2023-09-12  9:38   ` Philipp Tomsich
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2023-09-12  3:28 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 9/6/23 10:07, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
> 
> This patch implements the expansion of the strlen builtin for RV32/RV64
> for xlen-aligned strings if Zbb or XTheadBb instructions are available.
> The inserted sequences are:
> 
> rv32gc_zbb (RV64 is similar):
>        add     a3,a0,4
>        li      a4,-1
> .L1:  lw      a5,0(a0)
>        add     a0,a0,4
>        orc.b   a5,a5
>        beq     a5,a4,.L1
>        not     a5,a5
>        ctz     a5,a5
>        srl     a5,a5,0x3
>        add     a0,a0,a5
>        sub     a0,a0,a3
> 
> rv64gc_xtheadbb (RV32 is similar):
>        add       a4,a0,8
> .L2:  ld        a5,0(a0)
>        add       a0,a0,8
>        th.tstnbz a5,a5
>        beqz      a5,.L2
>        th.rev    a5,a5
>        th.ff1    a5,a5
>        srl       a5,a5,0x3
>        add       a0,a0,a5
>        sub       a0,a0,a4
> 
> This allows inlining calls to strlen(), with optimized code for
> xlen-aligned strings, resulting in the following benefits over
> a call to libc:
> * no call/ret instructions
> * no stack frame allocation
> * no register saving/restoring
> * no alignment test
> 
> The inlining mechanism is gated by a new switch ('-minline-strlen')
> and by the variable 'optimize_size'.
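
As a caller-side illustration of that gating, a minimal sketch modelled on the
new zbb-strlen.c test: the alignment has to be visible to the compiler,
for example via __builtin_assume_aligned.  The function name aligned_len and
the choice of 8 (the test in the patch uses 4096) are assumptions made for
this example, and whether the inline sequence actually shows up depends on
building with a compiler that contains this patch, e.g. with
"-O2 -march=rv64gc_zbb -minline-strlen".

```c
#include <stddef.h>

/* Hypothetical example function.  Promising at least 8-byte alignment
   covers xlen alignment on both RV32 and RV64.  Without the promise
   the alignment is unknown and a normal strlen call is expected.  */
size_t
aligned_len (const char *s)
{
  const char *p = __builtin_assume_aligned (s, 8);
  return __builtin_strlen (p);
}
```

The promise is the caller's responsibility: passing a pointer that is not
actually 8-byte aligned is undefined behavior.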
> 
> Tested using the glibc string tests.
> 
> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add new object riscv-string.o.
> 	riscv-string.cc.
> 	* config/riscv/riscv-protos.h (riscv_expand_strlen):
> 	New function.
> 	* config/riscv/riscv.md (strlen<mode>): New expand INSN.
> 	* config/riscv/riscv.opt: New flag 'minline-strlen'.
> 	* config/riscv/t-riscv: Add new object riscv-string.o.
> 	* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
> 	(th_rev<mode>2): Likewise.
> 	(th_tstnbz<mode>2): New INSN.
> 	* doc/invoke.texi: Document '-minline-strlen'.
> 	* emit-rtl.cc (emit_likely_jump_insn): New helper function.
> 	(emit_unlikely_jump_insn): Likewise.
> 	* rtl.h (emit_likely_jump_insn): New prototype.
> 	(emit_unlikely_jump_insn): Likewise.
> 	* config/riscv/riscv-string.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
> 	* gcc.target/riscv/xtheadbb-strlen.c: New test.
> 	* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
> 	* gcc.target/riscv/zbb-strlen-disabled.c: New test.
> 	* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
> 	* gcc.target/riscv/zbb-strlen.c: New test.
Note that I don't think we need the new UNSPEC_STRLEN since it's only 
used in the expander and doesn't survive into RTL.  Your call on whether 
or not to remove it now or as a separate patch (or keep it if I'm wrong 
about it not being needed).

OK for the trunk.  Sorry this got lost in the shuffle last year.

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] riscv: Add support for str(n)cmp inline expansion
  2023-09-06 16:07 ` [PATCH v2 2/2] riscv: Add support for str(n)cmp " Christoph Muellner
@ 2023-09-12  3:34   ` Jeff Law
  2023-09-12  9:38     ` Philipp Tomsich
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2023-09-12  3:34 UTC (permalink / raw)
  To: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Vineet Gupta



On 9/6/23 10:07, Christoph Muellner wrote:
> From: Christoph Müllner <christoph.muellner@vrull.eu>
> 
> This patch implements expansions for the cmpstrsi and cmpstrnsi
> builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
> instructions are available.  The expansion basically emits a comparison
> sequence which compares XLEN bits per step if possible.
> 
> This allows inlining calls to strcmp() and strncmp() if both strings
> are xlen-aligned.  For strncmp() the length parameter needs to be known.
> The benefits over calls to libc are:
> * no call/ret instructions
> * no stack frame allocation
> * no register saving/restoring
> * no alignment tests
> 
> The inlining mechanism is gated by new switches ('-minline-strcmp' and
> '-minline-strncmp') and by the variable 'optimize_size'.
> The number of emitted unrolled loop iterations can be controlled by the
> parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
> 
> The comparison sequence is inspired by the strcmp example
> in the appendix of the Bitmanip specification (incl. the fast
> result calculation in case the first word does not contain
> a NUL byte).  Additional inspiration comes from rs6000-string.c.
> 
> The emitted sequence does not trigger any readahead pagefault issues,
> because only aligned strings are accessed by aligned xlen-loads.
> 
> This patch has been tested using the glibc string tests on QEMU:
> * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
> * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
> * rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
> 
> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
> 	(<optab>_not<mode>3): Likewise.
> 	* config/riscv/riscv-protos.h (riscv_expand_strcmp): New
> 	prototype.
> 	* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> 	macros.
> 	(GEN_EMIT_HELPER2): Likewise.
> 	(emit_strcmp_scalar_compare_byte): New function.
> 	(emit_strcmp_scalar_compare_subword): Likewise.
> 	(emit_strcmp_scalar_compare_word): Likewise.
> 	(emit_strcmp_scalar_load_and_compare): Likewise.
> 	(emit_strcmp_scalar_call_to_libc): Likewise.
> 	(emit_strcmp_scalar_result_calculation_nonul): Likewise.
> 	(emit_strcmp_scalar_result_calculation): Likewise.
> 	(riscv_expand_strcmp_scalar): Likewise.
> 	(riscv_expand_strcmp): Likewise.
> 	* config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
> 	INSN name.
> 	(@slt<u>_<X:mode><GPR:mode>3): Likewise.
> 	(cmpstrnsi): Invoke expansion function for str(n)cmp.
> 	(cmpstrsi): Likewise.
> 	* config/riscv/riscv.opt: Add new parameter
> 	'-mstring-compare-inline-limit'.
> 	* doc/invoke.texi: Document new parameter
> 	'-mstring-compare-inline-limit'.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/xtheadbb-strcmp-unaligned.c: New test.
> 	* gcc.target/riscv/xtheadbb-strcmp.c: New test.
> 	* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
> 	* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
> 	* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
> 	* gcc.target/riscv/zbb-strcmp.c: New test.
OK for the trunk.  Thanks for pushing this along.

jeff
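
For readers of the archive, the comparison scheme described in the quoted
commit message can be sketched in portable C.  This is only an illustration
under stated assumptions, not the code GCC emits: strcmp_aligned_sketch,
load_word and has_nul are hypothetical names, has_nul stands in for an
orc.b/th.tstnbz test, the loop stands in for the peeled sequence, and the
libc fallback after LIMIT bytes mirrors what the inline limit suggests.
Both buffers are assumed to be word-aligned and readable up to the end of
the word that holds the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one register-sized word from P (P is assumed to be aligned).  */
static uintptr_t
load_word (const unsigned char *p)
{
  uintptr_t w;
  memcpy (&w, p, sizeof w);
  return w;
}

/* Stand-in for the orc.b test: return 1 if any byte of X is zero.  */
static int
has_nul (uintptr_t x)
{
  for (size_t i = 0; i < sizeof x; i++)
    if (((x >> (8 * i)) & 0xff) == 0)
      return 1;
  return 0;
}

/* Compare word-aligned strings A and B one word per step for at most
   LIMIT bytes, then hand the remainder to libc.  */
static int
strcmp_aligned_sketch (const char *a, const char *b, size_t limit)
{
  const unsigned char *pa = (const unsigned char *) a;
  const unsigned char *pb = (const unsigned char *) b;
  size_t off = 0;

  assert ((uintptr_t) a % sizeof (uintptr_t) == 0);
  assert ((uintptr_t) b % sizeof (uintptr_t) == 0);

  while (off + sizeof (uintptr_t) <= limit)
    {
      uintptr_t wa = load_word (pa + off);
      uintptr_t wb = load_word (pb + off);

      /* A difference or a NUL lies in this word: resolve it bytewise.
	 The inner loop always returns in that case.  */
      if (wa != wb || has_nul (wa))
	for (size_t i = 0; i < sizeof wa; i++)
	  {
	    unsigned char ca = pa[off + i], cb = pb[off + i];
	    if (ca != cb || ca == 0)
	      return (int) ca - (int) cb;
	  }

      off += sizeof wa;
    }

  /* Inline budget exhausted: continue in libc on the remaining bytes.  */
  return strcmp (a + off, b + off);
}
```

The bytewise resolution keeps the sketch independent of endianness; the
actual expansion computes the result from the loaded words instead.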


* Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion
  2023-09-06 16:07 ` [PATCH v2 1/2] riscv: Add support for strlen " Christoph Muellner
  2023-09-06 16:22   ` Palmer Dabbelt
  2023-09-12  3:28   ` Jeff Law
@ 2023-09-12  9:38   ` Philipp Tomsich
  2 siblings, 0 replies; 10+ messages in thread
From: Philipp Tomsich @ 2023-09-12  9:38 UTC (permalink / raw)
  To: Christoph Muellner
  Cc: gcc-patches, Kito Cheng, Jim Wilson, Palmer Dabbelt,
	Andrew Waterman, Jeff Law, Vineet Gupta

Applied to master. Thanks!
Philipp.


On Wed, 6 Sept 2023 at 18:07, Christoph Muellner
<christoph.muellner@vrull.eu> wrote:
>
> From: Christoph Müllner <christoph.muellner@vrull.eu>
>
> This patch implements the expansion of the strlen builtin for RV32/RV64
> for xlen-aligned strings if Zbb or XTheadBb instructions are available.
> The inserted sequences are:
>
> rv32gc_zbb (RV64 is similar):
>       add     a3,a0,4
>       li      a4,-1
> .L1:  lw      a5,0(a0)
>       add     a0,a0,4
>       orc.b   a5,a5
>       beq     a5,a4,.L1
>       not     a5,a5
>       ctz     a5,a5
>       srl     a5,a5,0x3
>       add     a0,a0,a5
>       sub     a0,a0,a3
>
> rv64gc_xtheadbb (RV32 is similar):
>       add       a4,a0,8
> .L2:  ld        a5,0(a0)
>       add       a0,a0,8
>       th.tstnbz a5,a5
>       beqz      a5,.L2
>       th.rev    a5,a5
>       th.ff1    a5,a5
>       srl       a5,a5,0x3
>       add       a0,a0,a5
>       sub       a0,a0,a4
>
> This allows inlining calls to strlen(), with optimized code for
> xlen-aligned strings, resulting in the following benefits over
> a call to libc:
> * no call/ret instructions
> * no stack frame allocation
> * no register saving/restoring
> * no alignment test
>
> The inlining mechanism is gated by a new switch ('-minline-strlen')
> and by the variable 'optimize_size'.
>
> Tested using the glibc string tests.
>
> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
>
> gcc/ChangeLog:
>
>         * config.gcc: Add new object riscv-string.o.
>         riscv-string.cc.
>         * config/riscv/riscv-protos.h (riscv_expand_strlen):
>         New function.
>         * config/riscv/riscv.md (strlen<mode>): New expand INSN.
>         * config/riscv/riscv.opt: New flag 'minline-strlen'.
>         * config/riscv/t-riscv: Add new object riscv-string.o.
>         * config/riscv/thead.md (th_rev<mode>2): Export INSN name.
>         (th_rev<mode>2): Likewise.
>         (th_tstnbz<mode>2): New INSN.
>         * doc/invoke.texi: Document '-minline-strlen'.
>         * emit-rtl.cc (emit_likely_jump_insn): New helper function.
>         (emit_unlikely_jump_insn): Likewise.
>         * rtl.h (emit_likely_jump_insn): New prototype.
>         (emit_unlikely_jump_insn): Likewise.
>         * config/riscv/riscv-string.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
>         * gcc.target/riscv/xtheadbb-strlen.c: New test.
>         * gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
>         * gcc.target/riscv/zbb-strlen-disabled.c: New test.
>         * gcc.target/riscv/zbb-strlen-unaligned.c: New test.
>         * gcc.target/riscv/zbb-strlen.c: New test.
> ---
>  gcc/config.gcc                                |   3 +-
>  gcc/config/riscv/riscv-protos.h               |   3 +
>  gcc/config/riscv/riscv-string.cc              | 183 ++++++++++++++++++
>  gcc/config/riscv/riscv.md                     |  28 +++
>  gcc/config/riscv/riscv.opt                    |   4 +
>  gcc/config/riscv/t-riscv                      |   6 +
>  gcc/config/riscv/thead.md                     |   9 +-
>  gcc/doc/invoke.texi                           |  11 +-
>  gcc/emit-rtl.cc                               |  24 +++
>  gcc/rtl.h                                     |   2 +
>  .../riscv/xtheadbb-strlen-unaligned.c         |  14 ++
>  .../gcc.target/riscv/xtheadbb-strlen.c        |  19 ++
>  .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 ++
>  .../gcc.target/riscv/zbb-strlen-disabled.c    |  15 ++
>  .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 ++
>  gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 ++
>  16 files changed, 366 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/riscv/riscv-string.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index b2fe7c7ceef..aff6b6a5601 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -530,7 +530,8 @@ pru-*-*)
>         ;;
>  riscv*)
>         cpu_type=riscv
> -       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
> +       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
> +       extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
>         extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>         extra_objs="${extra_objs} thead.o"
>         d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 6dbf6b9f943..b060d047f01 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -517,6 +517,9 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
>  /* Mask that selects the riscv_builtin_class part of a function code.  */
>  const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
>
> +/* Routines implemented in riscv-string.cc.  */
> +extern bool riscv_expand_strlen (rtx, rtx, rtx, rtx);
> +
>  /* Routines implemented in thead.cc.  */
>  extern bool th_mempair_operands_p (rtx[4], bool, machine_mode);
>  extern void th_mempair_order_operands (rtx[4], bool, machine_mode);
> diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
> new file mode 100644
> index 00000000000..086900a6083
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-string.cc
> @@ -0,0 +1,183 @@
> +/* Subroutines used to expand string operations for RISC-V.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#define IN_TARGET_CODE 1
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "memmodel.h"
> +#include "tm_p.h"
> +#include "ira.h"
> +#include "print-tree.h"
> +#include "varasm.h"
> +#include "explow.h"
> +#include "expr.h"
> +#include "output.h"
> +#include "target.h"
> +#include "predict.h"
> +#include "optabs.h"
> +
> +/* Emit proper instruction depending on mode of dest.  */
> +
> +#define GEN_EMIT_HELPER2(name)                         \
> +static rtx_insn *                                      \
> +do_## name ## 2(rtx dest, rtx src)                     \
> +{                                                      \
> +  rtx_insn *insn;                                      \
> +  if (GET_MODE (dest) == DImode)                       \
> +    insn = emit_insn (gen_ ## name ## di2 (dest, src));        \
> +  else                                                 \
> +    insn = emit_insn (gen_ ## name ## si2 (dest, src));        \
> +  return insn;                                         \
> +}
> +
> +/* Emit proper instruction depending on mode of dest.  */
> +
> +#define GEN_EMIT_HELPER3(name)                                 \
> +static rtx_insn *                                              \
> +do_## name ## 3(rtx dest, rtx src1, rtx src2)                  \
> +{                                                              \
> +  rtx_insn *insn;                                              \
> +  if (GET_MODE (dest) == DImode)                               \
> +    insn = emit_insn (gen_ ## name ## di3 (dest, src1, src2)); \
> +  else                                                         \
> +    insn = emit_insn (gen_ ## name ## si3 (dest, src1, src2)); \
> +  return insn;                                                 \
> +}
> +
> +GEN_EMIT_HELPER3(add) /* do_add3  */
> +GEN_EMIT_HELPER2(clz) /* do_clz2  */
> +GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
> +GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
> +GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
> +GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
> +GEN_EMIT_HELPER3(sub) /* do_sub3  */
> +GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
> +GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
> +GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
> +
> +#undef GEN_EMIT_HELPER2
> +#undef GEN_EMIT_HELPER3
> +
> +/* Helper function to load a byte or a Pmode register.
> +
> +   MODE is the mode to use for the load (QImode or Pmode).
> +   DEST is the destination register for the data.
> +   ADDR_REG is the register that holds the address.
> +   ADDR is the address expression to load from.
> +
> +   This function returns an rtx containing the register,
> +   where the ADDR is stored.  */
> +
> +static rtx
> +do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
> +{
> +  rtx mem = gen_rtx_MEM (mode, addr_reg);
> +  MEM_COPY_ATTRIBUTES (mem, addr);
> +  set_mem_size (mem, GET_MODE_SIZE (mode));
> +
> +  if (mode == QImode)
> +    do_zero_extendqi2 (dest, mem);
> +  else if (mode == Xmode)
> +    emit_move_insn (dest, mem);
> +  else
> +    gcc_unreachable ();
> +
> +  return addr_reg;
> +}
> +
> +/* If the provided string is aligned, then read XLEN bits
> +   in a loop and use orc.b to find NUL-bytes.  */
> +
> +static bool
> +riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
> +{
> +  rtx testval, addr, addr_plus_regsz, word, zeros;
> +  rtx loop_label, cond;
> +
> +  gcc_assert (TARGET_ZBB || TARGET_XTHEADBB);
> +
> +  /* The alignment needs to be known and big enough.  */
> +  if (!CONST_INT_P (align) || UINTVAL (align) < GET_MODE_SIZE (Xmode))
> +    return false;
> +
> +  testval = gen_reg_rtx (Xmode);
> +  addr = copy_addr_to_reg (XEXP (src, 0));
> +  addr_plus_regsz = gen_reg_rtx (Pmode);
> +  word = gen_reg_rtx (Xmode);
> +  zeros = gen_reg_rtx (Xmode);
> +
> +  if (TARGET_ZBB)
> +    emit_insn (gen_rtx_SET (testval, constm1_rtx));
> +  else
> +    emit_insn (gen_rtx_SET (testval, const0_rtx));
> +
> +  do_add3 (addr_plus_regsz, addr, GEN_INT (UNITS_PER_WORD));
> +
> +  loop_label = gen_label_rtx ();
> +  emit_label (loop_label);
> +
> +  /* Load a word and use orc.b/th.tstnbz to find a zero-byte.  */
> +  do_load_from_addr (Xmode, word, addr, src);
> +  do_add3 (addr, addr, GEN_INT (UNITS_PER_WORD));
> +  if (TARGET_ZBB)
> +    do_orcb2 (word, word);
> +  else
> +    do_th_tstnbz2 (word, word);
> +  cond = gen_rtx_EQ (VOIDmode, word, testval);
> +  emit_unlikely_jump_insn (gen_cbranch4 (Xmode, cond, word, testval, loop_label));
> +
> +  /* Calculate the return value by counting zero-bits.  */
> +  if (TARGET_ZBB)
> +    do_one_cmpl2 (word, word);
> +  if (TARGET_BIG_ENDIAN)
> +    do_clz2 (zeros, word);
> +  else if (TARGET_ZBB)
> +    do_ctz2 (zeros, word);
> +  else
> +    {
> +      do_th_rev2 (word, word);
> +      do_clz2 (zeros, word);
> +    }
> +
> +  do_lshr3 (zeros, zeros, GEN_INT (exact_log2 (BITS_PER_UNIT)));
> +  do_add3 (addr, addr, zeros);
> +  do_sub3 (result, addr, addr_plus_regsz);
> +
> +  return true;
> +}
> +
> +/* Expand a strlen operation and return true if successful.
> +   Return false if we should let the compiler generate normal
> +   code, probably a strlen call.  */
> +
> +bool
> +riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
> +{
> +  gcc_assert (search_char == const0_rtx);
> +
> +  if (TARGET_ZBB || TARGET_XTHEADBB)
> +    return riscv_expand_strlen_scalar (result, src, align);
> +
> +  return false;
> +}
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 9da2a9f1c42..e078ebc43cb 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -82,6 +82,9 @@ (define_c_enum "unspec" [
>
>    ;; the calling convention of callee
>    UNSPEC_CALLEE_CC
> +
> +  ;; String unspecs
> +  UNSPEC_STRLEN
>  ])
>
>  (define_c_enum "unspecv" [
> @@ -3500,6 +3503,31 @@ (define_expand "msubhisi4"
>    "TARGET_XTHEADMAC"
>  )
>
> +;; Search character in string (generalization of strlen).
> +;; Argument 0 is the resulting offset
> +;; Argument 1 is the string
> +;; Argument 2 is the search character
> +;; Argument 3 is the alignment
> +
> +(define_expand "strlen<mode>"
> +  [(set (match_operand:X 0 "register_operand")
> +       (unspec:X [(match_operand:BLK 1 "general_operand")
> +                    (match_operand:SI 2 "const_int_operand")
> +                    (match_operand:SI 3 "const_int_operand")]
> +                 UNSPEC_STRLEN))]
> +  "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
> +{
> +  rtx search_char = operands[2];
> +
> +  if (search_char != const0_rtx)
> +    FAIL;
> +
> +  if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
> +    DONE;
> +  else
> +    FAIL;
> +})
> +
>  (include "bitmanip.md")
>  (include "crypto.md")
>  (include "sync.md")
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 98f342348b7..2491b335aef 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -278,6 +278,10 @@ minline-atomics
>  Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
>  Always inline subword atomic operations.
>
> +minline-strlen
> +Target Bool Var(riscv_inline_strlen) Init(0)
> +Inline strlen calls if possible.
> +
>  Enum
>  Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
>  Valid arguments to -param=riscv-autovec-preference=:
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index b1f80d1d87c..c012ac0cf33 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -91,6 +91,12 @@ riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc \
>         $(COMPILE) $<
>         $(POSTCOMPILE)
>
> +riscv-string.o: $(srcdir)/config/riscv/riscv-string.cc \
> +  $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
> +  memmodel.h $(EMIT_RTL_H) poly-int.h output.h
> +       $(COMPILE) $<
> +       $(POSTCOMPILE)
> +
>  riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) $(RTL_H) \
>    $(TM_P_H) $(TARGET_H) memmodel.h insn-codes.h $(OPTABS_H) $(RECOG_H) \
> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
> index 29f98dec3a8..982b048cb65 100644
> --- a/gcc/config/riscv/thead.md
> +++ b/gcc/config/riscv/thead.md
> @@ -110,7 +110,7 @@ (define_insn "*th_clz<mode>2"
>    [(set_attr "type" "bitmanip")
>     (set_attr "mode" "<X:MODE>")])
>
> -(define_insn "*th_rev<mode>2"
> +(define_insn "th_rev<mode>2"
>    [(set (match_operand:GPR 0 "register_operand" "=r")
>         (bswap:GPR (match_operand:GPR 1 "register_operand" "r")))]
>    "TARGET_XTHEADBB && (TARGET_64BIT || <MODE>mode == SImode)"
> @@ -121,6 +121,13 @@ (define_insn "*th_rev<mode>2"
>    [(set_attr "type" "bitmanip")
>     (set_attr "mode" "<GPR:MODE>")])
>
> +(define_insn "th_tstnbz<mode>2"
> +  [(set (match_operand:X 0 "register_operand" "=r")
> +       (unspec:X [(match_operand:X 1 "register_operand" "r")] UNSPEC_ORC_B))]
> +  "TARGET_XTHEADBB"
> +  "th.tstnbz\t%0,%1"
> +  [(set_attr "type" "bitmanip")])
> +
>  ;; XTheadBs
>
>  (define_insn "*th_tst<mode>3"
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 33befee7d6b..4a9e385d009 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1236,7 +1236,8 @@ See RS/6000 and PowerPC Options.
>  -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
>  -mstack-protector-guard-offset=@var{offset}
>  -mcsr-check -mno-csr-check
> --minline-atomics  -mno-inline-atomics}
> +-minline-atomics  -mno-inline-atomics
> +-minline-strlen  -mno-inline-strlen}
>
>  @emph{RL78 Options}
>  @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
> @@ -29359,6 +29360,14 @@ Do or don't use smaller but slower subword atomic emulation code that uses
>  libatomic function calls.  The default is to use fast inline subword atomics
>  that do not require libatomic.
>
> +@opindex minline-strlen
> +@item -minline-strlen
> +@itemx -mno-inline-strlen
> +Do or do not attempt to inline strlen calls if possible.
> +Inlining will only be done if the string is properly aligned
> +and instructions for accelerated processing are available.
> +The default is to not inline strlen calls.
> +
>  @opindex mshorten-memrefs
>  @item -mshorten-memrefs
>  @itemx -mno-shorten-memrefs
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index f6276a2d0b6..8bd623dcd0e 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -5168,6 +5168,30 @@ emit_jump_insn (rtx x)
>    return last;
>  }
>
> +/* Make an insn of code JUMP_INSN with pattern X,
> +   add a REG_BR_PROB note that indicates very likely probability,
> +   and add it to the end of the doubly-linked list.  */
> +
> +rtx_insn *
> +emit_likely_jump_insn (rtx x)
> +{
> +  rtx_insn *jump = emit_jump_insn (x);
> +  add_reg_br_prob_note (jump, profile_probability::very_likely ());
> +  return jump;
> +}
> +
> +/* Make an insn of code JUMP_INSN with pattern X,
> +   add a REG_BR_PROB note that indicates very unlikely probability,
> +   and add it to the end of the doubly-linked list.  */
> +
> +rtx_insn *
> +emit_unlikely_jump_insn (rtx x)
> +{
> +  rtx_insn *jump = emit_jump_insn (x);
> +  add_reg_br_prob_note (jump, profile_probability::very_unlikely ());
> +  return jump;
> +}
> +
>  /* Make an insn of code CALL_INSN with pattern X
>     and add it to the end of the doubly-linked list.  */
>
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index 0e9491b89b4..102ad9b57a6 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3347,6 +3347,8 @@ extern rtx_note *emit_note_after (enum insn_note, rtx_insn *);
>  extern rtx_insn *emit_insn (rtx);
>  extern rtx_insn *emit_debug_insn (rtx);
>  extern rtx_insn *emit_jump_insn (rtx);
> +extern rtx_insn *emit_likely_jump_insn (rtx);
> +extern rtx_insn *emit_unlikely_jump_insn (rtx);
>  extern rtx_insn *emit_call_insn (rtx);
>  extern rtx_code_label *emit_label (rtx);
>  extern rtx_jump_table_data *emit_jump_table_data (rtx);
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
> new file mode 100644
> index 00000000000..57a6b5ea66a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_xtheadbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_xtheadbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "th.tstnbz\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
> new file mode 100644
> index 00000000000..dbc8d1e7da7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_xtheadbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_xtheadbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler "th.tstnbz\t" } } */
> +/* { dg-final { scan-assembler-not "jalr" } } */
> +/* { dg-final { scan-assembler-not "call" } } */
> +/* { dg-final { scan-assembler-not "jr" } } */
> +/* { dg-final { scan-assembler-not "tail" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
> new file mode 100644
> index 00000000000..a481068aa0c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "orc.b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
> new file mode 100644
> index 00000000000..1295aeb0086
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-mno-inline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "orc.b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
> new file mode 100644
> index 00000000000..326fef885d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler-not "orc.b\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strlen.c b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
> new file mode 100644
> index 00000000000..19ebfaef16f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-strlen.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-minline-strlen -march=rv32gc_zbb" { target { rv32 } } } */
> +/* { dg-options "-minline-strlen -march=rv64gc_zbb" { target { rv64 } } } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
> +
> +typedef long unsigned int size_t;
> +
> +size_t
> +my_str_len (const char *s)
> +{
> +  s = __builtin_assume_aligned (s, 4096);
> +  return __builtin_strlen (s);
> +}
> +
> +/* { dg-final { scan-assembler "orc.b\t" } } */
> +/* { dg-final { scan-assembler-not "jalr" } } */
> +/* { dg-final { scan-assembler-not "call" } } */
> +/* { dg-final { scan-assembler-not "jr" } } */
> +/* { dg-final { scan-assembler-not "tail" } } */
> --
> 2.41.0
>
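
For readers of the archive, the Zbb sequence from the quoted commit message
(load, orc.b, branch while all ones, not, ctz, shift right by 3) can be
modelled in portable C.  This is a sketch under stated assumptions, not the
emitted code: strlen_aligned_sketch, load_word_le and orc_b are hypothetical
names, orc_b is a software stand-in for the instruction, and the word is
assembled in little-endian byte order so the ctz step applies.  The string
is assumed to be word-aligned and readable up to the end of the word that
holds the NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble one register-sized word from P in little-endian byte order.  */
static uintptr_t
load_word_le (const char *p)
{
  uintptr_t w = 0;
  for (size_t i = 0; i < sizeof w; i++)
    w |= (uintptr_t) (unsigned char) p[i] << (8 * i);
  return w;
}

/* Stand-in for orc.b: each result byte is 0xff if the matching input
   byte is non-zero and 0x00 otherwise.  */
static uintptr_t
orc_b (uintptr_t x)
{
  uintptr_t r = 0;
  for (size_t i = 0; i < sizeof x; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uintptr_t) 0xff << (8 * i);
  return r;
}

/* Word-at-a-time strlen for a word-aligned string S.  */
static size_t
strlen_aligned_sketch (const char *s)
{
  const char *p = s;
  uintptr_t mask;
  size_t zeros = 0;

  assert ((uintptr_t) s % sizeof (uintptr_t) == 0);

  do
    {
      mask = orc_b (load_word_le (p));
      p += sizeof (uintptr_t);
    }
  while (mask == (uintptr_t) -1);	/* All ones: no NUL in this word.  */

  /* Invert so the NUL byte becomes 0xff, then count trailing zero bits.
     MASK is non-zero here because the loop only exits on a NUL.  */
  mask = ~mask;
  while (!(mask & 1))
    {
      mask >>= 1;
      zeros++;
    }

  /* P already points one word past the word containing the NUL.  */
  return (size_t) (p - s) - sizeof (uintptr_t) + (zeros >> 3);
}
```

The final expression corresponds to the add/sub pair at the end of the
assembly listing, where the start address plus one word is subtracted.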


* Re: [PATCH v2 2/2] riscv: Add support for str(n)cmp inline expansion
  2023-09-12  3:34   ` Jeff Law
@ 2023-09-12  9:38     ` Philipp Tomsich
  0 siblings, 0 replies; 10+ messages in thread
From: Philipp Tomsich @ 2023-09-12  9:38 UTC (permalink / raw)
  To: Jeff Law
  Cc: Christoph Muellner, gcc-patches, Kito Cheng, Jim Wilson,
	Palmer Dabbelt, Andrew Waterman, Vineet Gupta

Applied to master. Thanks!
Philipp.

On Tue, 12 Sept 2023 at 05:34, Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 9/6/23 10:07, Christoph Muellner wrote:
> > From: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> > This patch implements expansions for the cmpstrsi and cmpstrnsi
> > builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
> > instructions are available.  The expansion basically emits a comparison
> > sequence which compares XLEN bits per step if possible.
> >
> > This allows inlining calls to strcmp() and strncmp() if both strings
> > are xlen-aligned.  For strncmp() the length parameter needs to be known.
> > The benefits over calls to libc are:
> > * no call/ret instructions
> > * no stack frame allocation
> > * no register saving/restoring
> > * no alignment tests
> >
> > The inlining mechanism is gated by new switches ('-minline-strcmp' and
> > '-minline-strncmp') and by the variable 'optimize_size'.
> > The number of emitted unrolled loop iterations can be controlled by the
> > parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
> >
> > The comparison sequence is inspired by the strcmp example
> > in the appendix of the Bitmanip specification (incl. the fast
> > result calculation in case the first word does not contain
> > a NUL byte).  Additional inspiration comes from rs6000-string.c.
> >
> > The emitted sequence does not trigger any readahead pagefault issues,
> > because only aligned strings are accessed by aligned xlen-loads.
> >
> > This patch has been tested using the glibc string tests on QEMU:
> > * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
> > * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
> > * rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
> >
> > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
> >
> > gcc/ChangeLog:
> >
> >       * config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
> >       (<optab>_not<mode>3): Likewise.
> >       * config/riscv/riscv-protos.h (riscv_expand_strcmp): New
> >       prototype.
> >       * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >       macros.
> >       (GEN_EMIT_HELPER2): Likewise.
> >       (emit_strcmp_scalar_compare_byte): New function.
> >       (emit_strcmp_scalar_compare_subword): Likewise.
> >       (emit_strcmp_scalar_compare_word): Likewise.
> >       (emit_strcmp_scalar_load_and_compare): Likewise.
> >       (emit_strcmp_scalar_call_to_libc): Likewise.
> >       (emit_strcmp_scalar_result_calculation_nonul): Likewise.
> >       (emit_strcmp_scalar_result_calculation): Likewise.
> >       (riscv_expand_strcmp_scalar): Likewise.
> >       (riscv_expand_strcmp): Likewise.
> >       * config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
> >       INSN name.
> >       (@slt<u>_<X:mode><GPR:mode>3): Likewise.
> >       (cmpstrnsi): Invoke expansion function for str(n)cmp.
> >       (cmpstrsi): Likewise.
> >       * config/riscv/riscv.opt: Add new parameter
> >       '-mstring-compare-inline-limit'.
> >       * doc/invoke.texi: Document new parameter
> >       '-mstring-compare-inline-limit'.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/xtheadbb-strcmp-unaligned.c: New test.
> >       * gcc.target/riscv/xtheadbb-strcmp.c: New test.
> >       * gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
> >       * gcc.target/riscv/zbb-strcmp-disabled.c: New test.
> >       * gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
> >       * gcc.target/riscv/zbb-strcmp.c: New test.
> OK for the trunk.  Thanks for pushing this along.
>
> jeff
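
For readers of the archive, a caller that meets the alignment requirement
from the quoted commit message might look like the sketch below.  It
follows the style of the strlen tests in patch 1/2; same_key is a
hypothetical name, and the command line in the comment is an assumption
that combines the switches named in the commit message.  Whether the call
is actually expanded inline depends on the compiler version and options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Possible build line (an assumption, not taken from the thread):
     gcc -O2 -march=rv64gc_zbb -minline-strcmp \
	 --param=riscv-strcmp-inline-limit=64 -S same_key.c  */

/* Return non-zero if the 8-byte-aligned strings A and B are equal.  */
int
same_key (const char *a, const char *b)
{
  /* The caller must really pass 8-byte-aligned pointers.  */
  assert ((uintptr_t) a % 8 == 0 && (uintptr_t) b % 8 == 0);

  /* Make the alignment visible to the compiler so that the expander
     can see that both strings are xlen-aligned.  */
  a = __builtin_assume_aligned (a, 8);
  b = __builtin_assume_aligned (b, 8);
  return strcmp (a, b) == 0;
}
```

On targets without Zbb or XTheadBb, or without the switch, the function
simply compiles to a regular strcmp call.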


end of thread, other threads:[~2023-09-12  9:39 UTC | newest]

Thread overview: 10+ messages
2023-09-06 16:07 [PATCH v2 0/2] riscv: Introduce strlen/strcmp/strncmp inline expansion Christoph Muellner
2023-09-06 16:07 ` [PATCH v2 1/2] riscv: Add support for strlen " Christoph Muellner
2023-09-06 16:22   ` Palmer Dabbelt
2023-09-06 16:47     ` Jeff Law
2023-09-06 19:29       ` Palmer Dabbelt
2023-09-12  3:28   ` Jeff Law
2023-09-12  9:38   ` Philipp Tomsich
2023-09-06 16:07 ` [PATCH v2 2/2] riscv: Add support for str(n)cmp " Christoph Muellner
2023-09-12  3:34   ` Jeff Law
2023-09-12  9:38     ` Philipp Tomsich
