* Re: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
2023-11-30 22:22 [PATCH] RISC-V: Vectorized str(n)cmp and strlen Robin Dapp
2023-12-01 0:49 ` Jeff Law
@ 2023-12-01 0:58 ` juzhe.zhong
2023-12-01 1:04 ` juzhe.zhong
From: juzhe.zhong @ 2023-12-01 0:58 UTC
To: Robin Dapp, gcc-patches, palmer, kito.cheng, jeffreyalaw; +Cc: Robin Dapp
Hi, Robin.
Thanks for working on this. I know this is tedious work.
A couple of comments:
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+ alignment, ncompare);
+ if (ok)
+ return true;
+ }
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+ /* strlen */ true);
+ return true;
+ }
To keep the code consistent, I think you should do this the way cpymem does:
(define_expand "cpymem<mode>"
[(parallel [(set (match_operand:BLK 0 "general_operand")
(match_operand:BLK 1 "general_operand"))
(use (match_operand:P 2 ""))
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
})
Alternatively, change the cpymem code first (in a separate patch) the same way you did for strcmp/strlen in this patch.
I don't have a strong opinion here; it's up to you.
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
- if (riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_VECTOR)
+ if (!CONST_INT_P (length))
return false;
- if (CONST_INT_P (length))
- {
- unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
- unsigned HOST_WIDE_INT factor, align;
+ unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+ unsigned HOST_WIDE_INT factor, align;
- align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
- factor = BITS_PER_WORD / align;
+ align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+ factor = BITS_PER_WORD / align;
- if (optimize_function_for_size_p (cfun)
- && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+ if (optimize_function_for_size_p (cfun)
+ && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+ return false;
- if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ {
+ riscv_block_move_straight (dest, src, INTVAL (length));
+ return true;
+ }
+ else if (optimize && align >= BITS_PER_WORD)
+ {
+ unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+ unsigned iter_words = min_iter_words;
+ unsigned HOST_WIDE_INT bytes = hwi_length;
+ unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+ /* Lengthen the loop body if it shortens the tail. */
+ for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
- riscv_block_move_straight (dest, src, INTVAL (length));
- return true;
+ unsigned cur_cost = iter_words + words % iter_words;
+ unsigned new_cost = i + words % i;
+ if (new_cost <= cur_cost)
+ iter_words = i;
}
- else if (optimize && align >= BITS_PER_WORD)
- {
- unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
- unsigned iter_words = min_iter_words;
- unsigned HOST_WIDE_INT bytes = hwi_length;
- unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
- /* Lengthen the loop body if it shortens the tail. */
- for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
- {
- unsigned cur_cost = iter_words + words % iter_words;
- unsigned new_cost = i + words % i;
- if (new_cost <= cur_cost)
- iter_words = i;
- }
- riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
- return true;
- }
+ riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+ return true;
+ }
+
+ return false;
+}
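For reference, the (unchanged) tail-shortening heuristic in this hunk can be modeled on its own. The cost of a candidate body length i is i + words % i, i.e. the word moves in one loop body plus the words left over for the straight-line tail, and ties go to the longer body. In this sketch min_iter_words stands in for RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD, and pick_iter_words is a hypothetical name, not a function in riscv-string.cc.

```c
/* Hypothetical standalone model of the "Lengthen the loop body if it
   shortens the tail" search.  MIN_ITER_WORDS must be nonzero.  */
unsigned
pick_iter_words (unsigned long long words, unsigned min_iter_words)
{
  unsigned iter_words = min_iter_words;

  /* Candidates run from MIN_ITER_WORDS up to 2 * MIN_ITER_WORDS - 2.  */
  for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
    {
      /* Body moves plus leftover tail moves.  */
      unsigned cur_cost = iter_words + words % iter_words;
      unsigned new_cost = i + words % i;
      /* "<=" prefers the longer body on a tie.  */
      if (new_cost <= cur_cost)
        iter_words = i;
    }
  return iter_words;
}
```

For example, with min_iter_words = 4 and 10 words, a 5-word body (cost 5, no tail) beats the 4-word body (cost 4 + 2 = 6).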
I don't understand why you touch the scalar part here. Is it just formatting?
If so, it should go in a separate patch.
Otherwise, OK from my side.
juzhe.zhong@rivai.ai
From: Robin Dapp
Date: 2023-12-01 06:22
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zhong@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
Hi,
this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen falls back to the previously implemented rawmemchr.
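To illustrate the idea, here is a scalar C model of the vector strcmp/strncmp loop: both strings are consumed in chunks, a mask marks positions where the first string has a NUL or the two strings differ (the patch ORs the two compare masks with vmor.m), and the first set position (vfirst.m) selects the bytes that are subtracted. This is only a sketch: VL is a hypothetical fixed chunk size (the real loop uses whatever vl the fault-only-first vle8ff returns), and model_strcmp/model_strncmp are illustrative names, not functions from the patch.

```c
#include <stddef.h>

/* Hypothetical fixed chunk size standing in for the vl of vle8ff.  */
#define VL 16

/* Index of the first i < VLEN with s1[i] == 0 or s1[i] != s2[i], or -1:
   models the NUL compare, the mismatch compare, vmor.m and vfirst.m.  */
static long
first_stop (const unsigned char *s1, const unsigned char *s2, size_t vlen)
{
  for (size_t i = 0; i < vlen; i++)
    if (s1[i] == 0 || s1[i] != s2[i])
      return (long) i;
  return -1;
}

/* strncmp flavor: each chunk is limited to the remaining length.  */
int
model_strncmp (const char *a, const char *b, size_t n)
{
  const unsigned char *s1 = (const unsigned char *) a;
  const unsigned char *s2 = (const unsigned char *) b;

  for (;;)
    {
      size_t vlen = n < VL ? n : VL;
      if (vlen == 0)
        return 0;               /* Length exhausted: equal so far.  */
      long found = first_stop (s1, s2, vlen);
      if (found >= 0)
        return s1[found] - s2[found];
      s1 += vlen;               /* Bump both pointers by vl.  */
      s2 += vlen;
      n -= vlen;
    }
}

/* strcmp flavor: effectively unbounded length.  */
int
model_strcmp (const char *a, const char *b)
{
  return model_strncmp (a, b, (size_t) -1);
}
```

The scalar model checks each chunk byte by byte and stops at the first hit, so it never reads past either terminator; the vector code instead relies on vle8ff only faulting on the first element.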
Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
we only ever advanced the source address by 1 per element, regardless
of the element size.
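A scalar sketch of the fixed loop, assuming a hypothetical fixed load width VBYTES: the per-iteration count is in elements, so the address bump must be scaled by the element size (cnt << shift), and the new strlen flavor returns (addr + end) - start. model_rawmemchr16 and model_strlen are illustrative names only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of bytes covered by one vector load.  */
#define VBYTES 16

/* Fixed rawmemchr loop for 16-bit elements.  CNT is the element count of
   the previous iteration (initially 0), so the byte address advances by
   CNT << SHIFT rather than by CNT.  */
const uint16_t *
model_rawmemchr16 (const uint16_t *src, uint16_t pat)
{
  const unsigned shift = 1;               /* log2 (sizeof (uint16_t)) */
  const size_t vlen = VBYTES >> shift;    /* Elements per iteration.  */
  const unsigned char *addr = (const unsigned char *) src;
  size_t cnt = 0;

  for (;;)
    {
      addr += cnt << shift;               /* Bump in bytes, not elements.  */
      const uint16_t *v = (const uint16_t *) (const void *) addr;
      for (size_t end = 0; end < vlen; end++)
        if (v[end] == pat)
          return v + end;                 /* SRC + END * element size.  */
      cnt = vlen;
    }
}

/* strlen flavor (bytes, shift = 0): return (addr + end) - start.  */
size_t
model_strlen (const char *src)
{
  const unsigned char *start = (const unsigned char *) src;
  const unsigned char *addr = start;
  size_t cnt = 0;

  for (;;)
    {
      addr += cnt;
      for (size_t end = 0; end < VBYTES; end++)
        if (addr[end] == 0)
          return (size_t) ((addr + end) - start);
      cnt = VBYTES;
    }
}
```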
The patch also changes the stringop-strategy handling slightly:
auto is now an aggregate (including vector and scalar,
possibly more in the future) and expansion functions try all
matching strategies in their preferred order.
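The bitmask semantics can be sketched as follows. The enum values mirror those added to riscv-opts.h; model_ctx, model_expand and the flags standing in for TARGET_VECTOR and for whether each expander succeeds are hypothetical. Vector is tried first and scalar second, matching the order in riscv_expand_block_move and riscv_expand_strcmp. Since libcall (1) shares no bit with scalar or vector, selecting it disables both expansions.

```c
#include <stdbool.h>

enum stringop_strategy_enum {
  STRINGOP_STRATEGY_LIBCALL = 1,
  STRINGOP_STRATEGY_SCALAR = 2,
  STRINGOP_STRATEGY_VECTOR = 4,
  STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};

/* Hypothetical inputs: TARGET_VECTOR and whether each expander would
   succeed for this call (scalar_ok also folds in e.g. Zbb support).  */
typedef struct
{
  bool target_vector;
  bool vector_ok;
  bool scalar_ok;
} model_ctx;

enum model_result { USED_LIBCALL, USED_VECTOR, USED_SCALAR };

enum model_result
model_expand (enum stringop_strategy_enum strategy, const model_ctx *ctx)
{
  /* Vector first, if enabled and the expander succeeds.  */
  if (ctx->target_vector && (strategy & STRINGOP_STRATEGY_VECTOR)
      && ctx->vector_ok)
    return USED_VECTOR;
  /* Then scalar, if the strategy allows it.  */
  if ((strategy & STRINGOP_STRATEGY_SCALAR) && ctx->scalar_ok)
    return USED_SCALAR;
  /* Otherwise the expander FAILs and a library call is emitted.  */
  return USED_LIBCALL;
}
```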
As before, str* expansion is guarded by -minline-str* and not active
by default. This might change in the future as I would rather have
those on by default. As of now, though, there is still a latent bug:
With -minline-strlen and -minline-strcmp we have several execution
failures in gcc.c-torture/execute/builtins/. From my initial analysis
it looks like we don't insert a vsetvl at the right spot (which would
be right after a setjmp in those cases). This leaves the initial
vle8ff without a proper vtype or vl, causing a SIGILL.
Still, I figured I'd rather post the patch as-is so the bug can be
reproduced upstream.
Regards
Robin
gcc/ChangeLog:
PR target/112109
* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
Rename.
(enum stringop_strategy_enum): To this.
* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen
param.
(expand_strcmp): Define.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
vector version.
(riscv_expand_strlen): Ditto.
(riscv_expand_block_move_scalar): Handle existing scalar expansion.
(riscv_expand_block_move): Expand to either vector or scalar
version.
(expand_block_move): Add stringop strategy.
(expand_rawmemchr): Handle strlen and fix increment bug.
(expand_strcmp): New expander.
* config/riscv/riscv.md: Add vector.
* config/riscv/riscv.opt: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.
---
gcc/config/riscv/riscv-opts.h | 20 +-
gcc/config/riscv/riscv-protos.h | 4 +-
gcc/config/riscv/riscv-string.cc | 287 +++++++++++++++---
gcc/config/riscv/riscv.md | 18 +-
gcc/config/riscv/riscv.opt | 18 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c | 32 ++
.../riscv/rvv/autovec/builtin/strcmp.c | 13 +
.../riscv/rvv/autovec/builtin/strlen-run.c | 37 +++
.../riscv/rvv/autovec/builtin/strlen.c | 12 +
9 files changed, 363 insertions(+), 78 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index e6e55ad7071..315f6ddb239 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -103,16 +103,16 @@ enum riscv_entity
MAX_RISCV_ENTITIES
};
-/* RISC-V stringop strategy. */
-enum riscv_stringop_strategy_enum {
- /* Use scalar or vector instructions. */
- USE_AUTO,
- /* Always use a library call. */
- USE_LIBCALL,
- /* Only use scalar instructions. */
- USE_SCALAR,
- /* Only use vector instructions. */
- USE_VECTOR
+/* RISC-V builtin strategy. */
+enum stringop_strategy_enum {
+ /* No expansion. */
+ STRINGOP_STRATEGY_LIBCALL = 1,
+ /* Use scalar expansion if possible. */
+ STRINGOP_STRATEGY_SCALAR = 2,
+ /* Use vector expansion if possible. */
+ STRINGOP_STRATEGY_VECTOR = 4,
+ /* Use any. */
+ STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};
#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 695ee24ad6f..51359154846 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -557,7 +557,9 @@ void expand_cond_unop (unsigned, rtx *);
void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
-void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx,
+ unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM. */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 80e3b5981af..ce259831a5c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,7 +511,16 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
return false;
alignment = UINTVAL (align_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+ alignment, ncompare);
+ if (ok)
+ return true;
+ }
+
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
{
return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
ncompare);
@@ -588,9 +597,17 @@ riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
bool
riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
{
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+ /* strlen */ true);
+ return true;
+ }
+
gcc_assert (search_char == const0_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && stringop_strategy & STRINGOP_STRATEGY_SCALAR)
return riscv_expand_strlen_scalar (result, src, align);
return false;
@@ -707,51 +724,68 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
/* Expand a cpymemsi instruction, which copies LENGTH bytes from
memory reference SRC to memory reference DEST. */
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
- if (riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_VECTOR)
+ if (!CONST_INT_P (length))
return false;
- if (CONST_INT_P (length))
- {
- unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
- unsigned HOST_WIDE_INT factor, align;
+ unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+ unsigned HOST_WIDE_INT factor, align;
- align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
- factor = BITS_PER_WORD / align;
+ align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+ factor = BITS_PER_WORD / align;
- if (optimize_function_for_size_p (cfun)
- && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+ if (optimize_function_for_size_p (cfun)
+ && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+ return false;
- if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ {
+ riscv_block_move_straight (dest, src, INTVAL (length));
+ return true;
+ }
+ else if (optimize && align >= BITS_PER_WORD)
+ {
+ unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+ unsigned iter_words = min_iter_words;
+ unsigned HOST_WIDE_INT bytes = hwi_length;
+ unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+ /* Lengthen the loop body if it shortens the tail. */
+ for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
- riscv_block_move_straight (dest, src, INTVAL (length));
- return true;
+ unsigned cur_cost = iter_words + words % iter_words;
+ unsigned new_cost = i + words % i;
+ if (new_cost <= cur_cost)
+ iter_words = i;
}
- else if (optimize && align >= BITS_PER_WORD)
- {
- unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
- unsigned iter_words = min_iter_words;
- unsigned HOST_WIDE_INT bytes = hwi_length;
- unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
- /* Lengthen the loop body if it shortens the tail. */
- for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
- {
- unsigned cur_cost = iter_words + words % iter_words;
- unsigned new_cost = i + words % i;
- if (new_cost <= cur_cost)
- iter_words = i;
- }
- riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
- return true;
- }
+ riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+ return true;
+ }
+
+ return false;
+}
+
+/* This function delegates block-move expansion to either the vector
+ implementation or the scalar one. Return TRUE if successful or FALSE
+ otherwise. */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+ if (TARGET_VECTOR && stringop_strategy & STRINGOP_STRATEGY_VECTOR)
+ {
+ bool ok = riscv_vector::expand_block_move (dest, src, length);
+ if (ok)
+ return true;
}
+
+ if (stringop_strategy & STRINGOP_STRATEGY_SCALAR)
+ return riscv_expand_block_move_scalar (dest, src, length);
+
return false;
}
@@ -777,9 +811,6 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
bnez a2, loop # Any more?
ret # Return
*/
- if (!TARGET_VECTOR || riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_SCALAR)
- return false;
HOST_WIDE_INT potential_ew
= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
/ BITS_PER_UNIT);
@@ -968,7 +999,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
behavior is undefined. */
void
-expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat,
+ bool strlen)
{
/*
rawmemchr:
@@ -1001,6 +1033,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
rtx end = gen_reg_rtx (Pmode);
rtx vec = gen_reg_rtx (vmode);
rtx mask = gen_reg_rtx (mask_mode);
@@ -1011,12 +1045,18 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+ rtx start_addr = copy_addr_to_reg (XEXP (src, 0));
rtx loop = gen_label_rtx ();
emit_label (loop);
rtx vsrc = change_address (src, vmode, src_addr);
+ /* Bump the pointer. */
+ rtx step = gen_reg_rtx (Pmode);
+ emit_insn (gen_rtx_SET (step, gen_rtx_ASHIFT (Pmode, cnt, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, step)));
+
/* Emit a first-fault load. */
rtx vlops[] = {vec, vsrc};
emit_vlmax_insn (code_for_pred_fault_load (vmode),
@@ -1039,19 +1079,166 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
riscv_vector::CPOP_OP, vfops, cnt);
- /* Bump the pointer. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
-
/* Emit the loop condition. */
rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
- /* We overran by CNT, subtract it. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
-
- /* We found something at SRC + END * [1,2,4,8]. */
- emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
- emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ if (strlen)
+ {
+ /* For strlen, return the length. */
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_MINUS (Pmode, dst, start_addr)));
+ }
+ else
+ {
+ /* For rawmemchr, return the position at SRC + END * [1,2,4,8]. */
+ emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ }
}
+/* Implement cmpstr<mode> using vector instructions. */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+ unsigned HOST_WIDE_INT, bool)
+{
+ gcc_assert (TARGET_VECTOR);
+
+ /* We don't support big endian. */
+ if (BYTES_BIG_ENDIAN)
+ return false;
+
+ bool with_length = nbytes != NULL_RTX;
+
+ if (with_length
+ && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+ return false;
+
+ if (with_length && CONST_INT_P (nbytes))
+ nbytes = force_reg (Pmode, nbytes);
+
+ machine_mode mode = E_QImode;
+ unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+ int lmul = TARGET_MAX_LMUL;
+ poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+ machine_mode vmode;
+ if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode),
+ nunits).exists (&vmode))
+ gcc_unreachable ();
+
+ machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+ /* Prepare addresses. */
+ rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+ rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+ rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+ rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+ /* Set initial pointer bump to 0. */
+ rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+ rtx sub = gen_reg_rtx (Pmode);
+ emit_move_insn (sub, CONST0_RTX (Pmode));
+
+ /* Create source vectors. */
+ rtx vec1 = gen_reg_rtx (vmode);
+ rtx vec2 = gen_reg_rtx (vmode);
+
+ rtx done = gen_label_rtx ();
+ rtx loop = gen_label_rtx ();
+ emit_label (loop);
+
+ /* Bump the pointers. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, cnt)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, cnt)));
+
+ rtx vlops1[] = {vec1, vsrc1};
+ rtx vlops2[] = {vec2, vsrc2};
+
+ if (!with_length)
+ {
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1);
+
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2);
+ }
+ else
+ {
+ nbytes = gen_lowpart (Pmode, nbytes);
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1, nbytes);
+
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2, nbytes);
+ }
+
+ /* Read the vl for the next pointer bump. */
+ if (Pmode == SImode)
+ emit_insn (gen_read_vlsi (cnt));
+ else
+ emit_insn (gen_read_vldi_zero_extend (cnt));
+
+ if (with_length)
+ {
+ rtx test_done = gen_rtx_EQ (VOIDmode, cnt, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test_done, cnt, const0_rtx, done));
+ emit_insn (gen_rtx_SET (nbytes, gen_rtx_MINUS (Pmode, nbytes, cnt)));
+ }
+
+ /* Look for a \0 in the first string. */
+ rtx mask0 = gen_reg_rtx (mask_mode);
+ rtx eq0 = gen_rtx_EQ (mask_mode,
+ gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
+ vec1);
+ rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
+ emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+ riscv_vector::COMPARE_OP, vmsops1, cnt);
+
+ /* Look for vec1 != vec2 (includes vec2[i] == 0). */
+ rtx maskne = gen_reg_rtx (mask_mode);
+ rtx ne = gen_rtx_NE (mask_mode, vec1, vec2);
+ rtx vmsops[] = {maskne, ne, vec1, vec2};
+ emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+ riscv_vector::COMPARE_OP, vmsops, cnt);
+
+ /* Combine both masks into one. */
+ rtx mask = gen_reg_rtx (mask_mode);
+ rtx vmorops[] = {mask, mask0, maskne};
+ emit_nonvlmax_insn (code_for_pred (IOR, mask_mode),
+ riscv_vector::BINARY_MASK_OP, vmorops, cnt);
+
+ /* Find the first bit in the mask (the first unequal element). */
+ rtx found_at = gen_reg_rtx (Pmode);
+ rtx vfops[] = {found_at, mask};
+ emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+ riscv_vector::CPOP_OP, vfops, cnt);
+
+ /* Emit the loop condition. */
+ rtx test = gen_rtx_LT (VOIDmode, found_at, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test, found_at, const0_rtx, loop));
+
+ /* Walk up to the difference point. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, found_at)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, found_at)));
+
+ /* Load the respective byte and compute the difference. */
+ rtx c1 = gen_reg_rtx (Pmode);
+ rtx c2 = gen_reg_rtx (Pmode);
+
+ do_load_from_addr (mode, c1, src_addr1, src1);
+ do_load_from_addr (mode, c2, src_addr2, src2);
+
+ do_sub3 (sub, c1, c2);
+
+ if (with_length)
+ emit_label (done);
+
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, sub)));
+ return true;
+}
}
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6bf2dfdf9b4..ce092e92465 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem<mode>"
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
- if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
- DONE;
- else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+ if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
@@ -3705,7 +3703,8 @@ (define_expand "cmpstrnsi"
(match_operand:BLK 2)))
(use (match_operand:SI 3))
(use (match_operand:SI 4))])]
- "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strncmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
operands[3], operands[4]))
@@ -3725,7 +3724,8 @@ (define_expand "cmpstrsi"
(compare:SI (match_operand:BLK 1)
(match_operand:BLK 2)))
(use (match_operand:SI 3))])]
- "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strcmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
NULL_RTX, operands[3]))
@@ -3746,14 +3746,16 @@ (define_expand "strlen<mode>"
(match_operand:SI 2 "const_int_operand")
(match_operand:SI 3 "const_int_operand")]
UNSPEC_STRLEN))]
- "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strlen && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
rtx search_char = operands[2];
- if (search_char != const0_rtx)
+ if (search_char != const0_rtx && !TARGET_VECTOR)
FAIL;
- if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
+ else if (riscv_expand_strlen (operands[0], operands[1], operands[2],
+ operands[3]))
DONE;
else
FAIL;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 0c6517bdc8b..00b52f5dc77 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -536,21 +536,21 @@ Enable the use of vector registers for function arguments and return value.
This is an experimental switch and may be subject to change in the future.
Enum
-Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum)
-Valid arguments to -mmemcpy-strategy=:
+Name(stringop_strategy) Type(enum stringop_strategy_enum)
+Valid arguments to -mbuiltin-strategy=:
EnumValue
-Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO)
+Enum(stringop_strategy) String(auto) Value(STRINGOP_STRATEGY_AUTO)
EnumValue
-Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL)
+Enum(stringop_strategy) String(libcall) Value(STRINGOP_STRATEGY_LIBCALL)
EnumValue
-Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR)
+Enum(stringop_strategy) String(scalar) Value(STRINGOP_STRATEGY_SCALAR)
EnumValue
-Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR)
+Enum(stringop_strategy) String(vector) Value(STRINGOP_STRATEGY_VECTOR)
-mmemcpy-strategy=
-Target RejectNegative Joined Enum(riscv_stringop_strategy) Var(riscv_memcpy_strategy) Init(USE_AUTO)
-Specify memcpy expansion strategy.
+mbuiltin-strategy=
+Target RejectNegative Joined Enum(stringop_strategy) Var(stringop_strategy) Init(STRINGOP_STRATEGY_AUTO)
+Specify builtin expansion strategy.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
new file mode 100644
index 00000000000..6dec7da91c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+#include <string.h>
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+int
+__attribute__ ((noipa, optimize ("0")))
+foo2 (const char *s, const char *t)
+{
+ return strcmp (s, t);
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ for (int j = 0; j < SZ; j++)
+ if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+ __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
new file mode 100644
index 00000000000..f9d33a74fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 2 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
+/* { dg-final { scan-assembler-times "vmor.m" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
new file mode 100644
index 00000000000..d29297a5f86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+int
+__attribute__ ((noipa))
+foo2 (const char *s)
+{
+ int n = 0;
+ while (*s++ != '\0')
+ {
+ asm volatile ("");
+ n++;
+ }
+ return n;
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ {
+ if (foo (s[i]) != foo2 (s[i]))
+ __builtin_abort ();
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
new file mode 100644
index 00000000000..0c6cca63ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 1 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
--
2.43.0
* Re: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
2023-11-30 22:22 [PATCH] RISC-V: Vectorized str(n)cmp and strlen Robin Dapp
2023-12-01 0:49 ` Jeff Law
2023-12-01 0:58 ` juzhe.zhong
@ 2023-12-01 1:04 ` juzhe.zhong
2023-12-01 15:27 ` Robin Dapp
From: juzhe.zhong @ 2023-12-01 1:04 UTC
To: Robin Dapp, gcc-patches, palmer, kito.cheng, jeffreyalaw; +Cc: Robin Dapp
Ah, I see:
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem<mode>"
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
- if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
- DONE;
- else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+ if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
I think this should be a separate NFC (no functional change) patch.
juzhe.zhong@rivai.ai
From: Robin Dapp
Date: 2023-12-01 06:22
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zhong@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Vectorized str(n)cmp and strlen.
Hi,
this adds vectorized implementations of strcmp and strncmp as well as
strlen. strlen falls back to the previously implemented rawmemchr.
Also, it fixes a rawmemchr bug causing a SPEC2017 execution failure:
we only ever advanced the source address by 1 per element, regardless
of the element size.
The patch also changes the stringop-strategy handling slightly:
auto is now an aggregate (including vector and scalar,
possibly more in the future) and expansion functions try all
matching strategies in their preferred order.
As before, str* expansion is guarded by -minline-str* and not active
by default. This might change in the future as I would rather have
those on by default. As of now, though, there is still a latent bug:
With -minline-strlen and -minline-strcmp we have several execution
failures in gcc.c-torture/execute/builtins/. From my initial analysis
it looks like we don't insert a vsetvl at the right spot (which would
be right after a setjmp in those cases). This leaves the initial
vle8ff without a proper vtype or vl, causing a SIGILL.
Still, I figured I'd rather post the patch as-is so the bug can be
reproduced upstream.
Regards
Robin
gcc/ChangeLog:
PR target/112109
* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum):
Rename.
(enum stringop_strategy_enum): To this.
* config/riscv/riscv-protos.h (expand_rawmemchr): Add strlen
param.
(expand_strcmp): Define.
* config/riscv/riscv-string.cc (riscv_expand_strcmp): Add
vector version.
(riscv_expand_strlen): Ditto.
(riscv_expand_block_move_scalar): Handle existing scalar expansion.
(riscv_expand_block_move): Expand to either vector or scalar
version.
(expand_block_move): Add stringop strategy.
(expand_rawmemchr): Handle strlen and fix increment bug.
(expand_strcmp): New expander.
* config/riscv/riscv.md: Add vector.
* config/riscv/riscv.opt: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strcmp.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen-run.c: New test.
* gcc.target/riscv/rvv/autovec/builtin/strlen.c: New test.
---
gcc/config/riscv/riscv-opts.h | 20 +-
gcc/config/riscv/riscv-protos.h | 4 +-
gcc/config/riscv/riscv-string.cc | 287 +++++++++++++++---
gcc/config/riscv/riscv.md | 18 +-
gcc/config/riscv/riscv.opt | 18 +-
.../riscv/rvv/autovec/builtin/strcmp-run.c | 32 ++
.../riscv/rvv/autovec/builtin/strcmp.c | 13 +
.../riscv/rvv/autovec/builtin/strlen-run.c | 37 +++
.../riscv/rvv/autovec/builtin/strlen.c | 12 +
9 files changed, 363 insertions(+), 78 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index e6e55ad7071..315f6ddb239 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -103,16 +103,16 @@ enum riscv_entity
MAX_RISCV_ENTITIES
};
-/* RISC-V stringop strategy. */
-enum riscv_stringop_strategy_enum {
- /* Use scalar or vector instructions. */
- USE_AUTO,
- /* Always use a library call. */
- USE_LIBCALL,
- /* Only use scalar instructions. */
- USE_SCALAR,
- /* Only use vector instructions. */
- USE_VECTOR
+/* RISC-V builtin strategy. */
+enum stringop_strategy_enum {
+ /* No expansion, always use a library call. */
+ STRINGOP_STRATEGY_LIBCALL = 1,
+ /* Use scalar expansion if possible. */
+ STRINGOP_STRATEGY_SCALAR = 2,
+ /* Use vector expansion if possible. */
+ STRINGOP_STRATEGY_VECTOR = 4,
+ /* Use either vector or scalar expansion. */
+ STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};
#define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && TARGET_64BIT))
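The new enum values are distinct bits, so AUTO is simply SCALAR | VECTOR and each expander only tests its own bit, while LIBCALL shares no bit with either and therefore disables both. A minimal C sketch of the resulting dispatch (vector first, scalar fallback, otherwise the library call), modelled on riscv_expand_block_move; the expansion enum and the vector_ok/scalar_ok flags are hypothetical stand-ins for the real expanders' return values:

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit values as stringop_strategy_enum in riscv-opts.h.  */
enum stringop_strategy_enum {
  STRINGOP_STRATEGY_LIBCALL = 1,
  STRINGOP_STRATEGY_SCALAR = 2,
  STRINGOP_STRATEGY_VECTOR = 4,
  STRINGOP_STRATEGY_AUTO = STRINGOP_STRATEGY_SCALAR | STRINGOP_STRATEGY_VECTOR
};

/* LIBCALL shares no bit with AUTO, so it enables no inline expansion.  */
_Static_assert ((STRINGOP_STRATEGY_LIBCALL & STRINGOP_STRATEGY_AUTO) == 0,
                "libcall must not enable any expansion");

/* Hypothetical result type for this sketch.  */
enum expansion { EXPAND_VECTOR, EXPAND_SCALAR, EXPAND_LIBCALL };

/* Try the vector expander when the target has V and the VECTOR bit is
   set; if it declines (VECTOR_OK false), try the scalar expander when
   the SCALAR bit is set; otherwise leave the call to the library.
   VECTOR_OK and SCALAR_OK are hypothetical stand-ins.  */
static enum expansion
choose_expansion (int strategy, bool target_vector,
                  bool vector_ok, bool scalar_ok)
{
  if (target_vector && (strategy & STRINGOP_STRATEGY_VECTOR) && vector_ok)
    return EXPAND_VECTOR;
  if ((strategy & STRINGOP_STRATEGY_SCALAR) && scalar_ok)
    return EXPAND_SCALAR;
  return EXPAND_LIBCALL;
}
```

With the default (AUTO), a vector expander that gives up falls through to the scalar one; with -mbuiltin-strategy=vector it falls through to the library call instead.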
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 695ee24ad6f..51359154846 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -557,7 +557,9 @@ void expand_cond_unop (unsigned, rtx *);
void expand_cond_binop (unsigned, rtx *);
void expand_cond_ternop (unsigned, rtx *);
void expand_popcount (rtx *);
-void expand_rawmemchr (machine_mode, rtx, rtx, rtx);
+void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
+bool expand_strcmp (rtx, rtx, rtx, rtx,
+ unsigned HOST_WIDE_INT, bool);
void emit_vec_extract (rtx, rtx, poly_int64);
/* Rounding mode bitfield for fixed point VXRM. */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 80e3b5981af..ce259831a5c 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -511,7 +511,16 @@ riscv_expand_strcmp (rtx result, rtx src1, rtx src2,
return false;
alignment = UINTVAL (align_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ bool ok = riscv_vector::expand_strcmp (result, src1, src2, bytes_rtx,
+ alignment, ncompare);
+ if (ok)
+ return true;
+ }
+
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && (stringop_strategy & STRINGOP_STRATEGY_SCALAR))
{
return riscv_expand_strcmp_scalar (result, src1, src2, nbytes, alignment,
ncompare);
@@ -588,9 +597,17 @@ riscv_expand_strlen_scalar (rtx result, rtx src, rtx align)
bool
riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
{
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ riscv_vector::expand_rawmemchr (E_QImode, result, src, search_char,
+ /* strlen */ true);
+ return true;
+ }
+
gcc_assert (search_char == const0_rtx);
- if (TARGET_ZBB || TARGET_XTHEADBB)
+ if ((TARGET_ZBB || TARGET_XTHEADBB)
+ && (stringop_strategy & STRINGOP_STRATEGY_SCALAR))
return riscv_expand_strlen_scalar (result, src, align);
return false;
@@ -707,51 +724,68 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
/* Expand a cpymemsi instruction, which copies LENGTH bytes from
memory reference SRC to memory reference DEST. */
-bool
-riscv_expand_block_move (rtx dest, rtx src, rtx length)
+static bool
+riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length)
{
- if (riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_VECTOR)
+ if (!CONST_INT_P (length))
return false;
- if (CONST_INT_P (length))
- {
- unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
- unsigned HOST_WIDE_INT factor, align;
+ unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
+ unsigned HOST_WIDE_INT factor, align;
- align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
- factor = BITS_PER_WORD / align;
+ align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD);
+ factor = BITS_PER_WORD / align;
- if (optimize_function_for_size_p (cfun)
- && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
- return false;
+ if (optimize_function_for_size_p (cfun)
+ && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false))
+ return false;
- if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor))
+ {
+ riscv_block_move_straight (dest, src, INTVAL (length));
+ return true;
+ }
+ else if (optimize && align >= BITS_PER_WORD)
+ {
+ unsigned min_iter_words
+ = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
+ unsigned iter_words = min_iter_words;
+ unsigned HOST_WIDE_INT bytes = hwi_length;
+ unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
+
+ /* Lengthen the loop body if it shortens the tail. */
+ for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
{
- riscv_block_move_straight (dest, src, INTVAL (length));
- return true;
+ unsigned cur_cost = iter_words + words % iter_words;
+ unsigned new_cost = i + words % i;
+ if (new_cost <= cur_cost)
+ iter_words = i;
}
- else if (optimize && align >= BITS_PER_WORD)
- {
- unsigned min_iter_words
- = RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD;
- unsigned iter_words = min_iter_words;
- unsigned HOST_WIDE_INT bytes = hwi_length;
- unsigned HOST_WIDE_INT words = bytes / UNITS_PER_WORD;
-
- /* Lengthen the loop body if it shortens the tail. */
- for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
- {
- unsigned cur_cost = iter_words + words % iter_words;
- unsigned new_cost = i + words % i;
- if (new_cost <= cur_cost)
- iter_words = i;
- }
- riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
- return true;
- }
+ riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD);
+ return true;
+ }
+
+ return false;
+}
+
+/* This function delegates block-move expansion to either the vector
+ implementation or the scalar one. Return TRUE if successful or FALSE
+ otherwise. */
+
+bool
+riscv_expand_block_move (rtx dest, rtx src, rtx length)
+{
+ if (TARGET_VECTOR && (stringop_strategy & STRINGOP_STRATEGY_VECTOR))
+ {
+ bool ok = riscv_vector::expand_block_move (dest, src, length);
+ if (ok)
+ return true;
}
+
+ if (stringop_strategy & STRINGOP_STRATEGY_SCALAR)
+ return riscv_expand_block_move_scalar (dest, src, length);
+
return false;
}
@@ -777,9 +811,6 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
bnez a2, loop # Any more?
ret # Return
*/
- if (!TARGET_VECTOR || riscv_memcpy_strategy == USE_LIBCALL
- || riscv_memcpy_strategy == USE_SCALAR)
- return false;
HOST_WIDE_INT potential_ew
= (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
/ BITS_PER_UNIT);
@@ -968,7 +999,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
behavior is undefined. */
void
-expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
+expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat,
+ bool strlen)
{
/*
rawmemchr:
@@ -1001,6 +1033,8 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
rtx end = gen_reg_rtx (Pmode);
rtx vec = gen_reg_rtx (vmode);
rtx mask = gen_reg_rtx (mask_mode);
@@ -1011,12 +1045,18 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
unsigned int shift = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+ rtx start_addr = copy_addr_to_reg (XEXP (src, 0));
rtx loop = gen_label_rtx ();
emit_label (loop);
rtx vsrc = change_address (src, vmode, src_addr);
+ /* Bump the pointer. */
+ rtx step = gen_reg_rtx (Pmode);
+ emit_insn (gen_rtx_SET (step, gen_rtx_ASHIFT (Pmode, cnt, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, step)));
+
/* Emit a first-fault load. */
rtx vlops[] = {vec, vsrc};
emit_vlmax_insn (code_for_pred_fault_load (vmode),
@@ -1039,19 +1079,166 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx src, rtx pat)
emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
riscv_vector::CPOP_OP, vfops, cnt);
- /* Bump the pointer. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_PLUS (Pmode, src_addr, cnt)));
-
/* Emit the loop condition. */
rtx test = gen_rtx_LT (VOIDmode, end, const0_rtx);
emit_jump_insn (gen_cbranch4 (Pmode, test, end, const0_rtx, loop));
- /* We overran by CNT, subtract it. */
- emit_insn (gen_rtx_SET (src_addr, gen_rtx_MINUS (Pmode, src_addr, cnt)));
-
- /* We found something at SRC + END * [1,2,4,8]. */
- emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
- emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ if (strlen)
+ {
+ /* For strlen, return the length. */
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_MINUS (Pmode, dst, start_addr)));
+ }
+ else
+ {
+ /* For rawmemchr, return the position at SRC + END * [1,2,4,8]. */
+ emit_insn (gen_rtx_SET (end, gen_rtx_ASHIFT (Pmode, end, GEN_INT (shift))));
+ emit_insn (gen_rtx_SET (dst, gen_rtx_PLUS (Pmode, src_addr, end)));
+ }
}
+/* Implement cmpstrsi and cmpstrnsi using vector instructions. */
+
+bool
+expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
+ unsigned HOST_WIDE_INT, bool)
+{
+ gcc_assert (TARGET_VECTOR);
+
+ /* We don't support big endian. */
+ if (BYTES_BIG_ENDIAN)
+ return false;
+
+ bool with_length = nbytes != NULL_RTX;
+
+ if (with_length
+ && (!REG_P (nbytes) && !SUBREG_P (nbytes) && !CONST_INT_P (nbytes)))
+ return false;
+
+ if (with_length && CONST_INT_P (nbytes))
+ nbytes = force_reg (Pmode, nbytes);
+
+ machine_mode mode = E_QImode;
+ unsigned int isize = GET_MODE_SIZE (mode).to_constant ();
+ int lmul = TARGET_MAX_LMUL;
+ poly_int64 nunits = exact_div (BYTES_PER_RISCV_VECTOR * lmul, isize);
+
+ machine_mode vmode;
+ if (!riscv_vector::get_vector_mode (GET_MODE_INNER (mode),
+ nunits).exists (&vmode))
+ gcc_unreachable ();
+
+ machine_mode mask_mode = riscv_vector::get_mask_mode (vmode);
+
+ /* Prepare addresses. */
+ rtx src_addr1 = copy_addr_to_reg (XEXP (src1, 0));
+ rtx vsrc1 = change_address (src1, vmode, src_addr1);
+
+ rtx src_addr2 = copy_addr_to_reg (XEXP (src2, 0));
+ rtx vsrc2 = change_address (src2, vmode, src_addr2);
+
+ /* Set initial pointer bump to 0. */
+ rtx cnt = gen_reg_rtx (Pmode);
+ emit_move_insn (cnt, CONST0_RTX (Pmode));
+
+ rtx sub = gen_reg_rtx (Pmode);
+ emit_move_insn (sub, CONST0_RTX (Pmode));
+
+ /* Create source vectors. */
+ rtx vec1 = gen_reg_rtx (vmode);
+ rtx vec2 = gen_reg_rtx (vmode);
+
+ rtx done = gen_label_rtx ();
+ rtx loop = gen_label_rtx ();
+ emit_label (loop);
+
+ /* Bump the pointers. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, cnt)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, cnt)));
+
+ rtx vlops1[] = {vec1, vsrc1};
+ rtx vlops2[] = {vec2, vsrc2};
+
+ if (!with_length)
+ {
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1);
+
+ emit_vlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2);
+ }
+ else
+ {
+ nbytes = gen_lowpart (Pmode, nbytes);
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops1, nbytes);
+
+ emit_nonvlmax_insn (code_for_pred_fault_load (vmode),
+ riscv_vector::UNARY_OP, vlops2, nbytes);
+ }
+
+ /* Read the vl for the next pointer bump. */
+ if (Pmode == SImode)
+ emit_insn (gen_read_vlsi (cnt));
+ else
+ emit_insn (gen_read_vldi_zero_extend (cnt));
+
+ if (with_length)
+ {
+ rtx test_done = gen_rtx_EQ (VOIDmode, cnt, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test_done, cnt, const0_rtx, done));
+ emit_insn (gen_rtx_SET (nbytes, gen_rtx_MINUS (Pmode, nbytes, cnt)));
+ }
+
+ /* Look for a \0 in the first string. */
+ rtx mask0 = gen_reg_rtx (mask_mode);
+ rtx eq0 = gen_rtx_EQ (mask_mode,
+ gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
+ vec1);
+ rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
+ emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+ riscv_vector::COMPARE_OP, vmsops1, cnt);
+
+ /* Look for vec1 != vec2 (includes vec2[i] == 0). */
+ rtx maskne = gen_reg_rtx (mask_mode);
+ rtx ne = gen_rtx_NE (mask_mode, vec1, vec2);
+ rtx vmsops[] = {maskne, ne, vec1, vec2};
+ emit_nonvlmax_insn (code_for_pred_cmp (vmode),
+ riscv_vector::COMPARE_OP, vmsops, cnt);
+
+ /* Combine both masks into one. */
+ rtx mask = gen_reg_rtx (mask_mode);
+ rtx vmorops[] = {mask, mask0, maskne};
+ emit_nonvlmax_insn (code_for_pred (IOR, mask_mode),
+ riscv_vector::BINARY_MASK_OP, vmorops, cnt);
+
+ /* Find the first set bit in the mask (the first NUL or unequal element). */
+ rtx found_at = gen_reg_rtx (Pmode);
+ rtx vfops[] = {found_at, mask};
+ emit_nonvlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+ riscv_vector::CPOP_OP, vfops, cnt);
+
+ /* Emit the loop condition. */
+ rtx test = gen_rtx_LT (VOIDmode, found_at, const0_rtx);
+ emit_jump_insn (gen_cbranch4 (Pmode, test, found_at, const0_rtx, loop));
+
+ /* Walk up to the difference point. */
+ emit_insn (gen_rtx_SET (src_addr1, gen_rtx_PLUS (Pmode, src_addr1, found_at)));
+ emit_insn (gen_rtx_SET (src_addr2, gen_rtx_PLUS (Pmode, src_addr2, found_at)));
+
+ /* Load the respective byte and compute the difference. */
+ rtx c1 = gen_reg_rtx (Pmode);
+ rtx c2 = gen_reg_rtx (Pmode);
+
+ do_load_from_addr (mode, c1, src_addr1, src1);
+ do_load_from_addr (mode, c2, src_addr2, src2);
+
+ do_sub3 (sub, c1, c2);
+
+ if (with_length)
+ emit_label (done);
+
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, sub)));
+ return true;
+}
}
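The scalar block-move path keeps the existing heuristic that lengthens the loop body when doing so shortens the straight-line tail: the "cost" is the number of words moved per loop iteration plus the words left over for the tail, and ties prefer the longer body. A small C sketch that reproduces just that arithmetic; passing min_iter_words as a parameter (instead of deriving it from RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD) is an assumption made so it can be checked in isolation:

```c
#include <assert.h>

/* Mirrors the "Lengthen the loop body if it shortens the tail" loop in
   riscv_expand_block_move_scalar.  WORDS is the copy length in words;
   MIN_ITER_WORDS stands in for
   RISCV_MAX_MOVE_BYTES_PER_LOOP_ITER / UNITS_PER_WORD.  */
static unsigned
choose_iter_words (unsigned long words, unsigned min_iter_words)
{
  assert (min_iter_words > 0);
  unsigned iter_words = min_iter_words;
  for (unsigned i = min_iter_words; i < min_iter_words * 2 - 1; i++)
    {
      /* Cost = words per loop iteration + words left for the tail.  */
      unsigned long cur_cost = iter_words + words % iter_words;
      unsigned long new_cost = i + words % i;
      if (new_cost <= cur_cost)   /* ties prefer the longer body */
        iter_words = i;
    }
  return iter_words;
}
```

For example, with a minimum of 4 words per iteration, a 30-word copy picks a 5-word body (no tail) and an 18-word copy picks a 6-word body, while 23 words stay at 4.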
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6bf2dfdf9b4..ce092e92465 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2336,9 +2336,7 @@ (define_expand "cpymem<mode>"
(use (match_operand:SI 3 "const_int_operand"))])]
""
{
- if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
- DONE;
- else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+ if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
DONE;
else
FAIL;
@@ -3705,7 +3703,8 @@ (define_expand "cmpstrnsi"
(match_operand:BLK 2)))
(use (match_operand:SI 3))
(use (match_operand:SI 4))])]
- "riscv_inline_strncmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strncmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
operands[3], operands[4]))
@@ -3725,7 +3724,8 @@ (define_expand "cmpstrsi"
(compare:SI (match_operand:BLK 1)
(match_operand:BLK 2)))
(use (match_operand:SI 3))])]
- "riscv_inline_strcmp && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strcmp && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
NULL_RTX, operands[3]))
@@ -3746,14 +3746,16 @@ (define_expand "strlen<mode>"
(match_operand:SI 2 "const_int_operand")
(match_operand:SI 3 "const_int_operand")]
UNSPEC_STRLEN))]
- "riscv_inline_strlen && !optimize_size && (TARGET_ZBB || TARGET_XTHEADBB)"
+ "riscv_inline_strlen && !optimize_size
+ && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
{
rtx search_char = operands[2];
- if (search_char != const0_rtx)
+ if (search_char != const0_rtx && !TARGET_VECTOR)
FAIL;
- if (riscv_expand_strlen (operands[0], operands[1], operands[2], operands[3]))
+ else if (riscv_expand_strlen (operands[0], operands[1], operands[2],
+ operands[3]))
DONE;
else
FAIL;
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 0c6517bdc8b..00b52f5dc77 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -536,21 +536,21 @@ Enable the use of vector registers for function arguments and return value.
This is an experimental switch and may be subject to change in the future.
Enum
-Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum)
-Valid arguments to -mmemcpy-strategy=:
+Name(stringop_strategy) Type(enum stringop_strategy_enum)
+Valid arguments to -mbuiltin-strategy=:
EnumValue
-Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO)
+Enum(stringop_strategy) String(auto) Value(STRINGOP_STRATEGY_AUTO)
EnumValue
-Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL)
+Enum(stringop_strategy) String(libcall) Value(STRINGOP_STRATEGY_LIBCALL)
EnumValue
-Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR)
+Enum(stringop_strategy) String(scalar) Value(STRINGOP_STRATEGY_SCALAR)
EnumValue
-Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR)
+Enum(stringop_strategy) String(vector) Value(STRINGOP_STRATEGY_VECTOR)
-mmemcpy-strategy=
-Target RejectNegative Joined Enum(riscv_stringop_strategy) Var(riscv_memcpy_strategy) Init(USE_AUTO)
-Specify memcpy expansion strategy.
+mbuiltin-strategy=
+Target RejectNegative Joined Enum(stringop_strategy) Var(stringop_strategy) Init(STRINGOP_STRATEGY_AUTO)
+Specify builtin expansion strategy.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
new file mode 100644
index 00000000000..6dec7da91c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+#include <string.h>
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+int
+__attribute__ ((noipa, optimize ("0")))
+foo2 (const char *s, const char *t)
+{
+ return strcmp (s, t);
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ for (int j = 0; j < SZ; j++)
+ if (foo (s[i], s[j]) != foo2 (s[i], s[j]))
+ __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
new file mode 100644
index 00000000000..f9d33a74fc5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strcmp" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s, const char *t)
+{
+ return __builtin_strcmp (s, t);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 2 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
+/* { dg-final { scan-assembler-times "vmor.m" 1 } } */
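The vector strcmp loop ORs a "byte of s1 is NUL" mask with an "s1 and s2 differ" mask (hence one vmor.m) and uses vfirst.m to find the stopping index; with a length, the remaining count is used as the AVL and the loop exits with 0 once vl reaches 0. A scalar C emulation of that control flow, assuming a hypothetical fixed vlmax in place of the vl returned by vle8ff (a real first-fault load may return fewer bytes) and returning the unsigned-char difference as C's strcmp semantics require:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models vmseq (a == 0), vmsne (a != b), vmor.mm and vfirst.m over the
   first VL bytes: index of the first byte where A is NUL or A and B
   differ, or -1.  Bytes are examined in order, so nothing past a
   terminator is read.  */
static long
first_stop (const unsigned char *a, const unsigned char *b, size_t vl)
{
  for (size_t k = 0; k < vl; k++)
    if (a[k] == 0 || a[k] != b[k])
      return (long) k;
  return -1;
}

/* Scalar emulation of the expand_strcmp loop.  VLMAX is a hypothetical
   fixed number of bytes per load.  BOUNDED selects the strncmp variant
   with length N.  */
static int
chunked_strcmp (const char *s1, const char *s2, size_t vlmax,
                int bounded, size_t n)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;
  size_t cnt = 0;                       /* vl of the previous iteration */

  assert (vlmax > 0);
  for (;;)
    {
      p1 += cnt;                        /* bump both pointers by vl */
      p2 += cnt;
      cnt = (bounded && n < vlmax) ? n : vlmax;  /* vl = min (AVL, VLMAX) */
      if (bounded)
        {
          if (cnt == 0)
            return 0;                   /* length exhausted: equal */
          n -= cnt;
        }
      long found_at = first_stop (p1, p2, cnt);
      if (found_at >= 0)
        return p1[found_at] - p2[found_at];
    }
}
```

Only the sign of the result is meaningful for strcmp, so comparisons against a library strcmp are best made on signs.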
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
new file mode 100644
index 00000000000..d29297a5f86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen-run.c
@@ -0,0 +1,37 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+int
+__attribute__ ((noipa))
+foo2 (const char *s)
+{
+ int n = 0;
+ while (*s++ != '\0')
+ {
+ asm volatile ("");
+ n++;
+ }
+ return n;
+}
+
+#define SZ 10
+
+int main ()
+{
+ const char *s[SZ]
+ = {"", "asdf", "0", "\0", "!@#$%***m1123fdnmoi43",
+ "a", "z", "1", "9", "12345678901234567889012345678901234567890"};
+
+ for (int i = 0; i < SZ; i++)
+ {
+ if (foo (s[i]) != foo2 (s[i]))
+ __builtin_abort ();
+ }
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
new file mode 100644
index 00000000000..0c6cca63ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strlen.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { riscv_v } } } */
+/* { dg-additional-options "-O3 -minline-strlen" } */
+
+int
+__attribute__ ((noipa))
+foo (const char *s)
+{
+ return __builtin_strlen (s);
+}
+
+/* { dg-final { scan-assembler-times "vle8ff" 1 } } */
+/* { dg-final { scan-assembler-times "vfirst.m" 1 } } */
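In expand_rawmemchr the loop now keeps the number of loaded elements in CNT and converts it to a byte step (cnt << shift) before bumping the pointer; for strlen (E_QImode, shift 0) the result is the final address minus the saved start address. A scalar C sketch of that bookkeeping, assuming a hypothetical fixed vlmax in place of the vl returned by the first-fault load; chunked_rawmemchr_offset and chunked_strlen are illustrative names, not functions from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar emulation of the expand_rawmemchr loop for ELEM_SIZE-byte
   elements.  Returns the byte offset of the first element whose bytes
   equal those at PAT; such an element must exist.  VLMAX is a
   hypothetical fixed number of elements per load.  */
static size_t
chunked_rawmemchr_offset (const void *src, size_t elem_size,
                          const void *pat, size_t vlmax)
{
  const unsigned char *start = src;     /* start_addr in the patch */
  const unsigned char *addr = start;    /* src_addr in the patch */
  size_t cnt = 0;                       /* elements loaded last time (vl) */

  assert (elem_size > 0 && vlmax > 0);
  for (;;)
    {
      /* step = cnt << shift: convert elements to bytes before bumping.  */
      addr += cnt * elem_size;
      cnt = vlmax;
      for (size_t k = 0; k < cnt; k++)  /* vmseq + vfirst.m */
        if (memcmp (addr + k * elem_size, pat, elem_size) == 0)
          /* The found index is in elements; scale it to bytes too.  */
          return (size_t) (addr - start) + k * elem_size;
    }
}

/* strlen is the byte-sized, NUL-pattern special case.  */
static size_t
chunked_strlen (const char *s, size_t vlmax)
{
  return chunked_rawmemchr_offset (s, 1, "", vlmax);
}
```

Because strlen always uses byte elements, the patch's strlen branch can add the found index to the address without shifting it; the rawmemchr branch still scales it by the element size.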
--
2.43.0