public inbox for gcc-cvs@sourceware.org
* [gcc(refs/vendors/riscv/heads/gcc-13-with-riscv-opts)] riscv: Add support for str(n)cmp inline expansion
@ 2023-09-18 18:25 Jeff Law
From: Jeff Law @ 2023-09-18 18:25 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:ffb5ec391a22688b19593e4c9fbe7d51725d6a9c

commit ffb5ec391a22688b19593e4c9fbe7d51725d6a9c
Author: Christoph Müllner <christoph.muellner@vrull.eu>
Date:   Wed Sep 28 11:19:18 2022 +0200

    riscv: Add support for str(n)cmp inline expansion
    
    This patch implements expansions for the cmpstrsi and cmpstrnsi
    builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
    instructions are available.  The expansion basically emits a comparison
    sequence which compares XLEN bits per step if possible.
    
    This allows calls to strcmp() and strncmp() to be inlined if both strings
    are xlen-aligned.  For strncmp() the length argument must be a
    compile-time constant.
    The benefits over calls to libc are:
    * no call/ret instructions
    * no stack frame allocation
    * no register saving/restoring
    * no alignment tests
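
As a rough illustration of the kind of call this expansion targets, the sketch below is modelled on the test cases added by this patch. The helper names and the 8-byte alignment (xlen on RV64) are assumptions made for the example. Both pointers are declared aligned and the strncmp() length is a constant. The code is plain C and returns the same results whether or not the call is expanded inline.

```c
#include <string.h>

/* Hypothetical helper: promise 8-byte alignment for both strings so that
   the expander may consider the call, then compare as usual.  */
static int
cmp_aligned (const char *s1, const char *s2)
{
  s1 = __builtin_assume_aligned (s1, 8);
  s2 = __builtin_assume_aligned (s2, 8);
  return strcmp (s1, s2);
}

/* Hypothetical helper: same idea for strncmp() with a constant length.  */
static int
ncmp_aligned (const char *s1, const char *s2)
{
  s1 = __builtin_assume_aligned (s1, 8);
  s2 = __builtin_assume_aligned (s2, 8);
  return strncmp (s1, s2, 5);
}
```

Whether the calls are actually expanded depends on the target and options, for example -march=rv64gc_zbb together with -minline-strcmp -minline-strncmp at an optimization level other than -Os.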
    
    The inlining mechanism is gated by two new switches ('-minline-strcmp'
    and '-minline-strncmp') and by the variable 'optimize_size'.
    The number of bytes compared inline, and thus the number of unrolled
    loop iterations emitted, can be limited with the parameter
    '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
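
The effect of the limit can be estimated with a small model. The function below is a hypothetical sketch, not GCC code: it mirrors the unrolling logic in this patch (limit rounded down to a multiple of xlen, one zero-byte test per full or partial word, one more in the shared result calculation) and assumes that no later pass adds or removes instructions.

```c
#include <stdbool.h>

/* Hypothetical model: estimate how many orc.b (or th.tstnbz) instructions
   one inline expansion emits.  XLEN_BYTES is 4 (RV32) or 8 (RV64), LIMIT is
   the value of riscv-strcmp-inline-limit, and N is the constant strncmp()
   length (ignored when IS_STRNCMP is false).  Returns 0 if the call is not
   expanded.  */
static unsigned
count_zero_byte_tests (unsigned xlen_bytes, unsigned limit,
                       bool is_strncmp, unsigned n)
{
  /* The limit is rounded down to a multiple of xlen.  */
  unsigned compare_max = limit & ~(xlen_bytes - 1);
  unsigned nbytes = is_strncmp ? n : compare_max;

  /* Nothing to compare, or a strncmp() longer than the limit.  */
  if (nbytes == 0 || nbytes > compare_max)
    return 0;

  unsigned count = 0;
  while (nbytes > 0)
    {
      unsigned step = nbytes < xlen_bytes ? nbytes : xlen_bytes;
      if (step > 1)
        count++;        /* full word or masked sub-word: one test each */
      if (step < xlen_bytes)
        break;          /* a trailing byte or sub-word ends the sequence */
      nbytes -= step;
    }
  return count + 1;     /* one more in the shared result calculation */
}
```

With the default limit of 64 this gives 8+1 for strcmp() and 6+1 for a 42-byte strncmp() on RV64, which matches the comments in the new zbb-strcmp.c test.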
    
    The comparison sequence is inspired by the strcmp example
    in the appendix of the Bitmanip specification (incl. the fast
    result calculation in case the first word does not contain
    a NUL byte).  Additional inspiration comes from rs6000-string.c.
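
The word-at-a-time idea can be modelled in portable C. This is a sketch under assumptions, not the emitted RTL: orc_b() and rev8() are software stand-ins for the Zbb instructions, load_le64() models a little-endian 8-byte load, and model_strcmp() is a hypothetical name. Unlike the real expansion, the loop here is unbounded rather than stopping at the inline limit and calling libc, and the differing byte is extracted with a right shift instead of the byte-swap and shift pair used in the patch.

```c
#include <stdint.h>

/* orc.b stand-in: every non-zero byte becomes 0xff, zero bytes stay 0x00.  */
static uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* rev8 stand-in: reverse the byte order of a 64-bit word.  */
static uint64_t
rev8 (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    r = (r << 8) | ((x >> (8 * i)) & 0xff);
  return r;
}

/* Little-endian 8-byte load, written out so the model is host-independent.  */
static uint64_t
load_le64 (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v |= (uint64_t) p[i] << (8 * i);
  return v;
}

/* Compare two strings word by word.  Both buffers must be readable in whole
   8-byte words up to and including the word holding the terminating NUL.  */
static int
model_strcmp (const unsigned char *s1, const unsigned char *s2)
{
  for (;; s1 += 8, s2 += 8)
    {
      uint64_t d1 = load_le64 (s1), d2 = load_le64 (s2);
      uint64_t orc1 = orc_b (d1);

      if (orc1 != UINT64_MAX)
        {
          /* NUL byte in d1: mark bytes that are NUL or differ, then locate
             the first marked byte (a count-trailing-zeros operation).  */
          uint64_t syndrome = ~orc1 | orc_b (d1 ^ d2);
          unsigned shift = 0;
          while (!((syndrome >> shift) & 1))
            shift++;
          return (int) ((d1 >> shift) & 0xff) - (int) ((d2 >> shift) & 0xff);
        }
      if (d1 != d2)
        /* Fast path: no NUL in d1, so an unsigned compare of the
           byte-reversed words orders the strings.  */
        return rev8 (d1) < rev8 (d2) ? -1 : 1;
    }
}
```

The fast path works because all bytes before the first difference are equal and non-NUL, so the first differing byte alone decides the ordering once the words are viewed in big-endian order.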
    
    The emitted sequence does not trigger any read-ahead page-fault issues,
    because the strings are xlen-aligned and are only accessed with aligned
    xlen-wide loads.
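
The alignment argument can be checked with simple address arithmetic. The helpers below are hypothetical and use a 4096-byte page only as an example. They show that a naturally aligned xlen-wide load never straddles a page boundary, so a load that contains the terminating NUL lies in the same page as that NUL.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of SIZE bytes starting at ADDR touches two pages.
   PAGE_SIZE is assumed to be a power of two.  */
static bool
crosses_page (uintptr_t addr, unsigned size, uintptr_t page_size)
{
  return (addr / page_size) != ((addr + size - 1) / page_size);
}

/* Check every xlen-aligned address in a window of two pages.  */
static bool
aligned_loads_stay_in_page (unsigned xlen_bytes, uintptr_t page_size)
{
  for (uintptr_t addr = 0; addr < 2 * page_size; addr += xlen_bytes)
    if (crosses_page (addr, xlen_bytes, page_size))
      return false;
  return true;
}
```

An unaligned 8-byte load at offset 4093 of a page, by contrast, would read into the next page, which is why unaligned strings are left to libc.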
    
    This patch has been tested using the glibc string tests on QEMU:
    * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
    * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
    * rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
    
    Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
    
    gcc/ChangeLog:
    
            * config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
            (<optab>_not<mode>3): Likewise.
            * config/riscv/riscv-protos.h (riscv_expand_strcmp): New
            prototype.
            * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
            macros.
            (GEN_EMIT_HELPER2): Likewise.
            (emit_strcmp_scalar_compare_byte): New function.
            (emit_strcmp_scalar_compare_subword): Likewise.
            (emit_strcmp_scalar_compare_word): Likewise.
            (emit_strcmp_scalar_load_and_compare): Likewise.
            (emit_strcmp_scalar_call_to_libc): Likewise.
            (emit_strcmp_scalar_result_calculation_nonul): Likewise.
            (emit_strcmp_scalar_result_calculation): Likewise.
            (riscv_expand_strcmp_scalar): Likewise.
            (riscv_expand_strcmp): Likewise.
            * config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
            INSN name.
            (@slt<u>_<X:mode><GPR:mode>3): Likewise.
            (cmpstrnsi): Invoke expansion function for str(n)cmp.
            (cmpstrsi): Likewise.
            * config/riscv/riscv.opt: Add new options '-minline-strcmp'
            and '-minline-strncmp' and new parameter
            '--param=riscv-strcmp-inline-limit'.
            * doc/invoke.texi: Document new options '-minline-strcmp'
            and '-minline-strncmp'.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/riscv/xtheadbb-strcmp.c: New test.
            * gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
            * gcc.target/riscv/zbb-strcmp-disabled.c: New test.
            * gcc.target/riscv/zbb-strcmp-limit.c: New test.
            * gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
            * gcc.target/riscv/zbb-strcmp.c: New test.
    
    Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
    (cherry picked from commit 949f1ccf1ba9d1f33ca3809424e97429b717950a)

Diff:
---
 gcc/config/riscv/bitmanip.md                       |   2 +-
 gcc/config/riscv/riscv-protos.h                    |   1 +
 gcc/config/riscv/riscv-string.cc                   | 411 +++++++++++++++++++++
 gcc/config/riscv/riscv.md                          |  44 ++-
 gcc/config/riscv/riscv.opt                         |  12 +
 gcc/doc/invoke.texi                                |  20 +-
 gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c   |  57 +++
 .../gcc.target/riscv/zbb-strcmp-disabled-2.c       |  38 ++
 .../gcc.target/riscv/zbb-strcmp-disabled.c         |  38 ++
 gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c  |  57 +++
 .../gcc.target/riscv/zbb-strcmp-unaligned.c        |  38 ++
 gcc/testsuite/gcc.target/riscv/zbb-strcmp.c        |  57 +++
 12 files changed, 772 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 431b3292213..0d126a8ece5 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -206,7 +206,7 @@
 	(popcount:GPR (match_operand:GPR 1 "register_operand")))]
   "TARGET_ZBB")
 
-(define_insn "*<optab>_not<mode>"
+(define_insn "<optab>_not<mode>3"
   [(set (match_operand:X 0 "register_operand" "=r")
         (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
                             (match_operand:X 2 "register_operand" "r")))]
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 52d99f27ec6..d08d5dfeef4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -518,6 +518,7 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
 
 /* Routines implemented in riscv-string.cc.  */
+extern bool riscv_expand_strcmp (rtx, rtx, rtx, rtx, rtx);
 extern bool riscv_expand_strlen (rtx, rtx, rtx, rtx);
 
 /* Routines implemented in thead.cc.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 086900a6083..2bdff0374e8 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -66,14 +66,22 @@ do_## name ## 3(rtx dest, rtx src1, rtx src2)			\
 }
 
 GEN_EMIT_HELPER3(add) /* do_add3  */
+GEN_EMIT_HELPER3(and) /* do_and3  */
+GEN_EMIT_HELPER3(ashl) /* do_ashl3  */
+GEN_EMIT_HELPER2(bswap) /* do_bswap2  */
 GEN_EMIT_HELPER2(clz) /* do_clz2  */
 GEN_EMIT_HELPER2(ctz) /* do_ctz2  */
+GEN_EMIT_HELPER3(ior) /* do_ior3  */
+GEN_EMIT_HELPER3(ior_not) /* do_ior_not3  */
 GEN_EMIT_HELPER3(lshr) /* do_lshr3  */
+GEN_EMIT_HELPER2(neg) /* do_neg2  */
 GEN_EMIT_HELPER2(orcb) /* do_orcb2  */
 GEN_EMIT_HELPER2(one_cmpl) /* do_one_cmpl2  */
+GEN_EMIT_HELPER3(rotr) /* do_rotr3  */
 GEN_EMIT_HELPER3(sub) /* do_sub3  */
 GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
+GEN_EMIT_HELPER3(xor) /* do_xor3  */
 GEN_EMIT_HELPER2(zero_extendqi) /* do_zero_extendqi2  */
 
 #undef GEN_EMIT_HELPER2
@@ -106,6 +114,409 @@ do_load_from_addr (machine_mode mode, rtx dest, rtx addr_reg, rtx addr)
   return addr_reg;
 }
 
+/* Generate a sequence to compare single characters in data1 and data2.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   DATA1 is a register which contains character1.
+   DATA2 is a register which contains character2.
+   FINAL_LABEL is the location after the calculation of the return value.  */
+
+static void
+emit_strcmp_scalar_compare_byte (rtx result, rtx data1, rtx data2,
+				 rtx final_label)
+{
+  rtx tmp = gen_reg_rtx (Xmode);
+  do_sub3 (tmp, data1, data2);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  emit_jump_insn (gen_jump (final_label));
+  emit_barrier (); /* No fall-through.  */
+}
+
+/* Generate a sequence to compare two strings in data1 and data2.
+
+   DATA1 is a register which contains string1.
+   DATA2 is a register which contains string2.
+   ORC1 is a register where orc.b(data1) will be stored.
+   CMP_BYTES is the length of the strings.
+   END_LABEL is the location of the code that calculates the return value.  */
+
+static void
+emit_strcmp_scalar_compare_subword (rtx data1, rtx data2, rtx orc1,
+				    unsigned HOST_WIDE_INT cmp_bytes,
+				    rtx end_label)
+{
+  /* Set a NUL-byte after the relevant data (behind the string).  */
+  long long im = -256ll;
+  rtx imask = gen_rtx_CONST_INT (Xmode, im);
+  rtx m_reg = gen_reg_rtx (Xmode);
+  emit_insn (gen_rtx_SET (m_reg, imask));
+  do_rotr3 (m_reg, m_reg, GEN_INT (64 - cmp_bytes * BITS_PER_UNIT));
+  do_and3 (data1, m_reg, data1);
+  do_and3 (data2, m_reg, data2);
+  if (TARGET_ZBB)
+    do_orcb2 (orc1, data1);
+  else
+    do_th_tstnbz2 (orc1, data1);
+  emit_jump_insn (gen_jump (end_label));
+  emit_barrier (); /* No fall-through.  */
+}
+
+/* Generate a sequence to compare two strings in data1 and data2.
+
+   DATA1 is a register which contains string1.
+   DATA2 is a register which contains string2.
+   ORC1 is a register where orc.b(data1) will be stored.
+   TESTVAL is the value to test ORC1 against.
+   END_LABEL is the location of the code that calculates the return value.
+   NONUL_END_LABEL is the location of the code that calculates the return value
+   in case the first string does not contain a NULL-byte.  */
+
+static void
+emit_strcmp_scalar_compare_word (rtx data1, rtx data2, rtx orc1, rtx testval,
+				 rtx end_label, rtx nonul_end_label)
+{
+  /* Check if data1 contains a NUL character.  */
+  if (TARGET_ZBB)
+    do_orcb2 (orc1, data1);
+  else
+    do_th_tstnbz2 (orc1, data1);
+  rtx cond1 = gen_rtx_NE (VOIDmode, orc1, testval);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond1, orc1, testval,
+					  end_label));
+  /* Break out if u1 != u2 */
+  rtx cond2 = gen_rtx_NE (VOIDmode, data1, data2);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond2, data1,
+					 data2, nonul_end_label));
+  /* Fall-through on equality.  */
+}
+
+/* Generate the sequence of compares for strcmp/strncmp using zbb instructions.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The strings are referenced by SRC1 and SRC2.
+   The number of bytes to compare is defined by NBYTES.
+   DATA1 is a register where string1 will be stored.
+   DATA2 is a register where string2 will be stored.
+   ORC1 is a register where orc.b(data1) will be stored.
+   END_LABEL is the location of the code that calculates the return value.
+   NONUL_END_LABEL is the location of the code that calculates the return value
+   in case the first string does not contain a NULL-byte.
+   FINAL_LABEL is the location of the code that comes after the calculation
+   of the return value.  */
+
+static void
+emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
+				     unsigned HOST_WIDE_INT nbytes,
+				     rtx data1, rtx data2, rtx orc1,
+				     rtx end_label, rtx nonul_end_label,
+				     rtx final_label)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+  unsigned HOST_WIDE_INT offset = 0;
+
+  rtx testval = gen_reg_rtx (Xmode);
+  if (TARGET_ZBB)
+    emit_insn (gen_rtx_SET (testval, constm1_rtx));
+  else
+    emit_insn (gen_rtx_SET (testval, const0_rtx));
+
+  while (nbytes > 0)
+    {
+      unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes;
+      machine_mode load_mode;
+      if (cmp_bytes == 1)
+	load_mode = QImode;
+      else
+	load_mode = Xmode;
+
+      rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
+      do_load_from_addr (load_mode, data1, addr1, src1);
+      rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
+      do_load_from_addr (load_mode, data2, addr2, src2);
+
+      if (cmp_bytes == 1)
+	{
+	  emit_strcmp_scalar_compare_byte (result, data1, data2, final_label);
+	  return;
+	}
+      else if (cmp_bytes < xlen)
+	{
+	  emit_strcmp_scalar_compare_subword (data1, data2, orc1,
+					      cmp_bytes, end_label);
+	  return;
+	}
+      else
+	emit_strcmp_scalar_compare_word (data1, data2, orc1, testval,
+					 end_label, nonul_end_label);
+
+      offset += cmp_bytes;
+      nbytes -= cmp_bytes;
+    }
+}
+
+/* Fixup pointers and generate a call to strcmp.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The strings are referenced by SRC1 and SRC2.
+   The number of already compared bytes is defined by NBYTES.  */
+
+static void
+emit_strcmp_scalar_call_to_libc (rtx result, rtx src1, rtx src2,
+				 unsigned HOST_WIDE_INT nbytes)
+{
+  /* Update pointers past what has been compared already.  */
+  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
+  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
+  rtx src1_new = force_reg (Pmode,
+			    gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (nbytes)));
+  rtx src2_new = force_reg (Pmode,
+			    gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (nbytes)));
+
+  /* Construct call to strcmp to compare the rest of the string.  */
+  tree fun = builtin_decl_explicit (BUILT_IN_STRCMP);
+  emit_library_call_value (XEXP (DECL_RTL (fun), 0),
+			   result, LCT_NORMAL, GET_MODE (result),
+			   src1_new, Pmode, src2_new, Pmode);
+}
+
+/* Fast strcmp-result calculation if no NULL-byte in string1.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The mismatching strings are stored in DATA1 and DATA2.  */
+
+static void
+emit_strcmp_scalar_result_calculation_nonul (rtx result, rtx data1, rtx data2)
+{
+  /* Words don't match, and no NUL byte in one word.
+     Get bytes in big-endian order and compare as words.  */
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+  /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+  rtx tmp = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
+  do_neg2 (tmp, tmp);
+  do_ior3 (tmp, tmp, const1_rtx);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* strcmp-result calculation.
+
+   RESULT is the register where the return value of str(n)cmp will be stored.
+   The strings are stored in DATA1 and DATA2.
+   ORC1 contains orc.b(DATA1).  */
+
+static void
+emit_strcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2,
+				       rtx orc1)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  /* Convert non-equal bytes into non-NUL bytes.  */
+  rtx diff = gen_reg_rtx (Xmode);
+  do_xor3 (diff, data1, data2);
+  rtx shift = gen_reg_rtx (Xmode);
+
+  if (TARGET_ZBB)
+    {
+      /* Convert non-equal or NUL-bytes into non-NUL bytes.  */
+      rtx syndrome = gen_reg_rtx (Xmode);
+      do_orcb2 (diff, diff);
+      do_ior_not3 (syndrome, orc1, diff);
+      /* Count the number of equal bits from the beginning of the word.  */
+      do_ctz2 (shift, syndrome);
+    }
+  else
+    {
+      /* Convert non-equal or NUL-bytes into non-NUL bytes.  */
+      rtx syndrome = gen_reg_rtx (Xmode);
+      do_th_tstnbz2 (diff, diff);
+      do_one_cmpl2 (diff, diff);
+      do_ior3 (syndrome, orc1, diff);
+      /* Count the number of equal bits from the beginning of the word.  */
+      do_th_rev2 (syndrome, syndrome);
+      do_clz2 (shift, syndrome);
+    }
+
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+
+  /* The most-significant-non-zero bit of the syndrome marks either the
+     first bit that is different, or the top bit of the first zero byte.
+     Shifting left now will bring the critical information into the
+     top bits.  */
+  do_ashl3 (data1, data1, gen_lowpart (QImode, shift));
+  do_ashl3 (data2, data2, gen_lowpart (QImode, shift));
+
+  /* But we need to zero-extend (char is unsigned) the value and then
+     perform a signed 32-bit subtraction.  */
+  unsigned int shiftr = (xlen - 1) * BITS_PER_UNIT;
+  do_lshr3 (data1, data1, GEN_INT (shiftr));
+  do_lshr3 (data2, data2, GEN_INT (shiftr));
+  rtx tmp = gen_reg_rtx (Xmode);
+  do_sub3 (tmp, data1, data2);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* Expand str(n)cmp using Zbb/TheadBb instructions.
+
+   The result will be stored in RESULT.
+   The strings are referenced by SRC1 and SRC2.
+   The number of bytes to compare is defined by NBYTES.
+   The alignment is defined by ALIGNMENT.
+   If NCOMPARE is false then libc's strcmp() will be called if comparing
+   NBYTES of both strings did not find differences or NULL-bytes.
+
+   Return true if expansion was successful, or false otherwise.  */
+
+static bool
+riscv_expand_strcmp_scalar (rtx result, rtx src1, rtx src2,
+			    unsigned HOST_WIDE_INT nbytes,
+			    unsigned HOST_WIDE_INT alignment,
+			    bool ncompare)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  gcc_assert (TARGET_ZBB || TARGET_XTHEADBB);
+  gcc_assert (nbytes > 0);
+  gcc_assert ((int)nbytes <= riscv_strcmp_inline_limit);
+  gcc_assert (ncompare || (nbytes & (xlen - 1)) == 0);
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (nbytes > IMM_REACH)
+    nbytes = IMM_REACH;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+    return false;
+
+  /* We need xlen-aligned strings.  */
+  if (alignment < xlen)
+    return false;
+
+  /* Overall structure of emitted code:
+       Load-and-compare:
+	 - Load data1 and data2
+	 - Set orc1 := orc.b (data1) (or th.tstnbz)
+	 - Compare strings and either:
+	   - Fall-through on equality
+	   - Jump to nonul_end_label if data1 !or end_label
+	   - Calculate result value and jump to final_label
+       // Fall-through
+       Call-to-libc or set result to 0 (depending on ncompare)
+       Jump to final_label
+     nonul_end_label: // words don't match, and no null byte in first word.
+       Calculate result value with the use of data1, data2 and orc1
+       Jump to final_label
+     end_label:
+       Calculate result value with the use of data1, data2 and orc1
+       Jump to final_label
+     final_label:
+       // Nothing.  */
+
+  rtx data1 = gen_reg_rtx (Xmode);
+  rtx data2 = gen_reg_rtx (Xmode);
+  rtx orc1 = gen_reg_rtx (Xmode);
+  rtx nonul_end_label = gen_label_rtx ();
+  rtx end_label = gen_label_rtx ();
+  rtx final_label = gen_label_rtx ();
+
+  /* Generate a sequence of zbb instructions to compare out
+     to the length specified.  */
+  emit_strcmp_scalar_load_and_compare (result, src1, src2, nbytes,
+				       data1, data2, orc1,
+				       end_label, nonul_end_label, final_label);
+
+  /* All compared and everything was equal.  */
+  if (ncompare)
+    {
+      emit_insn (gen_rtx_SET (result, gen_rtx_CONST_INT (SImode, 0)));
+      emit_jump_insn (gen_jump (final_label));
+      emit_barrier (); /* No fall-through.  */
+    }
+  else
+    {
+      emit_strcmp_scalar_call_to_libc (result, src1, src2, nbytes);
+      emit_jump_insn (gen_jump (final_label));
+      emit_barrier (); /* No fall-through.  */
+    }
+
+
+  emit_label (nonul_end_label);
+  emit_strcmp_scalar_result_calculation_nonul (result, data1, data2);
+  emit_jump_insn (gen_jump (final_label));
+  emit_barrier (); /* No fall-through.  */
+
+  emit_label (end_label);
+  emit_strcmp_scalar_result_calculation (result, data1, data2, orc1);
+  emit_jump_insn (gen_jump (final_label));
+  emit_barrier (); /* No fall-through.  */
+
+  emit_label (final_label);
+  return true;
+}
+
+/* Expand a string compare operation.
+
+   The result will be stored in RESULT.
+   The strings are referenced by SRC1 and SRC2.
+   The argument BYTES_RTX either holds the number of characters to
+   compare, or is NULL_RTX. The argument ALIGN_RTX holds the alignment.
+
+   Return true if expansion was successful, or false otherwise.  */
+
+bool
+riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
+		     rtx bytes_rtx, rtx align_rtx)
+{
+  unsigned HOST_WIDE_INT compare_max;
+  unsigned HOST_WIDE_INT nbytes;
+  unsigned HOST_WIDE_INT alignment;
+  bool ncompare = bytes_rtx != NULL_RTX;
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  if (riscv_strcmp_inline_limit == 0)
+    return false;
+
+  /* Round down the comparision limit to a multiple of xlen.  */
+  compare_max = riscv_strcmp_inline_limit & ~(xlen - 1);
+
+  /* Decide how many bytes to compare inline.  */
+  if (bytes_rtx == NULL_RTX)
+    {
+      nbytes = compare_max;
+    }
+  else
+    {
+      /* If we have a length, it must be constant.  */
+      if (!CONST_INT_P (bytes_rtx))
+	return false;
+      nbytes = UINTVAL (bytes_rtx);
+
+      /* We don't emit parts of a strncmp() call.  */
+      if (nbytes > compare_max)
+	return false;
+    }
+
+  /* Guarantees:
+     - nbytes > 0
+     - nbytes <= riscv_strcmp_inline_limit
+     - nbytes is a multiple of xlen if !ncompare  */
+
+  if (!CONST_INT_P (align_rtx))
+    return false;
+  alignment = UINTVAL (align_rtx);
+
+  if (TARGET_ZBB || TARGET_XTHEADBB)
+    {
+      return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
+					 ncompare);
+    }
+
+  return false;
+}
+
 /* If the provided string is aligned, then read XLEN bytes
    in a loop and use orc.b to find NUL-bytes.  */
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 649685ee05d..e00b8ee3579 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2864,7 +2864,7 @@
   [(set_attr "type" "slt")
    (set_attr "mode" "<X:MODE>")])
 
-(define_insn "*slt<u>_<X:mode><GPR:mode>"
+(define_insn "@slt<u>_<X:mode><GPR:mode>3"
   [(set (match_operand:GPR           0 "register_operand" "= r")
 	(any_lt:GPR (match_operand:X 1 "register_operand" "  r")
 		    (match_operand:X 2 "arith_operand"    " rI")))]
@@ -3510,6 +3510,48 @@
   "TARGET_XTHEADMAC"
 )
 
+;; String compare with length insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the source1
+;; Argument 2 is the source2
+;; Argument 3 is the length
+;; Argument 4 is the alignment
+
+(define_expand "cmpstrnsi"
+  [(parallel [(set (match_operand:SI 0)
+	      (compare:SI (match_operand:BLK 1)
+			  (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))
+	      (use (match_operand:SI 4))])]
+  "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+{
+  if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
+                           operands[3], operands[4]))
+    DONE;
+  else
+    FAIL;
+})
+
+;; String compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the source1
+;; Argument 2 is the source2
+;; Argument 3 is the alignment
+
+(define_expand "cmpstrsi"
+  [(parallel [(set (match_operand:SI 0)
+	      (compare:SI (match_operand:BLK 1)
+			  (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))])]
+  "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+{
+  if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
+                           NULL_RTX, operands[3]))
+    DONE;
+  else
+    FAIL;
+})
+
 ;; Search character in string (generalization of strlen).
 ;; Argument 0 is the resulting offset
 ;; Argument 1 is the string
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 13f655819b7..21d00606f25 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -281,10 +281,22 @@ minline-atomics
 Target Var(TARGET_INLINE_SUBWORD_ATOMIC) Init(1)
 Always inline subword atomic operations.
 
+minline-strcmp
+Target Bool Var(riscv_inline_strcmp) Init(0)
+Inline strcmp calls if possible.
+
+minline-strncmp
+Target Bool Var(riscv_inline_strncmp) Init(0)
+Inline strncmp calls if possible.
+
 minline-strlen
 Target Bool Var(riscv_inline_strlen) Init(0)
 Inline strlen calls if possible.
 
+-param=riscv-strcmp-inline-limit=
+Target RejectNegative Joined UInteger Var(riscv_strcmp_inline_limit) Init(64)
+Max number of bytes to compare as part of inlined strcmp/strncmp routines (default: 64).
+
 Enum
 Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
 Valid arguments to -param=riscv-autovec-preference=:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3ca409d090c..ea682f036c0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1228,7 +1228,9 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset}
 -mcsr-check -mno-csr-check
 -minline-atomics  -mno-inline-atomics
--minline-strlen  -mno-inline-strlen}
+-minline-strlen  -mno-inline-strlen
+-minline-strcmp  -mno-inline-strcmp
+-minline-strncmp  -mno-inline-strncmp}
 
 @emph{RL78 Options}
 @gccoptlist{-msim  -mmul=none  -mmul=g13  -mmul=g14  -mallregs
@@ -29037,6 +29039,22 @@ Inlining will only be done if the string is properly aligned
 and instructions for accelerated processing are available.
 The default is to not inline strlen calls.
 
+@opindex minline-strcmp
+@item -minline-strcmp
+@itemx -mno-inline-strcmp
+Do or do not attempt to inline strcmp calls if possible.
+Inlining will only be done if the strings are properly aligned
+and instructions for accelerated processing are available.
+The default is to not inline strcmp calls.
+
+@opindex minline-strncmp
+@item -minline-strncmp
+@itemx -mno-inline-strncmp
+Do or do not attempt to inline strncmp calls if possible.
+Inlining will only be done if the strings are properly aligned
+and instructions for accelerated processing are available.
+The default is to not inline strncmp calls.
+
 @opindex mshorten-memrefs
 @item -mshorten-memrefs
 @itemx -mno-shorten-memrefs
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c b/gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c
new file mode 100644
index 00000000000..6b88912d828
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-strcmp.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv32gc_xtheadbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv64gc_xtheadbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 8+1 th.tstnbz instructions.  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Emits 6+1 th.tstnbz instructions.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Note expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Emits 6+1 th.tstnbz instructions.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "th.tstnbz\t" 32 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.tstnbz\t" 58 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c
new file mode 100644
index 00000000000..f0b3cd542e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c
new file mode 100644
index 00000000000..68497d53280
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-disabled.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-strcmp -mno-inline-strncmp -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-mno-inline-strcmp -mno-inline-strncmp -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c
new file mode 100644
index 00000000000..6bcbd70b542
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-limit.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp --param=riscv-strcmp-inline-limit=32 -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp --param=riscv-strcmp-inline-limit=32 -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 8+1 orc.b instructions.  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Note expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "orc.b\t" 10 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "orc.b\t" 18 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
new file mode 100644
index 00000000000..191187643c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp-unaligned.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  return __builtin_strcmp (s1, s2);
+}
+
+int
+my_str_cmp_const (const char *s1)
+{
+  return __builtin_strcmp (s1, "foo");
+}
+
+int
+my_strn_cmp (const char *s1, const char *s2, size_t n)
+{
+  return __builtin_strncmp (s1, s2, n);
+}
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-not "orc.b\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c b/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
new file mode 100644
index 00000000000..f64aa34a162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-strcmp.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv32gc_zbb" { target { rv32 } } } */
+/* { dg-options "-minline-strcmp -minline-strncmp -march=rv64gc_zbb" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
+
+typedef long unsigned int size_t;
+
+/* Emits 8+1 orc.b instructions.  */
+
+int
+my_str_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strcmp (s1, s2);
+}
+
+/* 8+1 because the backend does not know the size of "foo".  */
+
+int
+my_str_cmp_const (const char *s1)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strcmp (s1, "foo");
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* Note expanded because the backend does not know the size of "foo".  */
+
+int
+my_strn_cmp_const (const char *s1, size_t n)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  return __builtin_strncmp (s1, "foo", n);
+}
+
+/* Emits 6+1 orc.b instructions.  */
+
+int
+my_strn_cmp_bounded (const char *s1, const char *s2)
+{
+  s1 = __builtin_assume_aligned (s1, 4096);
+  s2 = __builtin_assume_aligned (s2, 4096);
+  return __builtin_strncmp (s1, s2, 42);
+}
+
+/* { dg-final { scan-assembler-times "orc.b\t" 32 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "orc.b\t" 58 { target { rv32 } } } } */
