* [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen
@ 2023-02-02 19:54 Wilco Dijkstra
2023-02-03 12:15 ` Adhemerval Zanella Netto
0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2023-02-02 19:54 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: 'GNU C Library'
Hi Adhemerval,
STRRCHR (const char *s, int c)
{
+ return __memrchr (s, c, strlen (s) + 1);
}
LGTM
I did a quick check on AArch64 and the speedup vs the old generic version
on bench-strrchr is 9.9 times! It's ~15% slower overall than the assembler
strrchr.
While the benchmark is not perfect, it shows that if you have decent strlen
and memrchr then strlen+memchr gives a competitive strrchr.
Cheers,
Wilco
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen
2023-02-02 19:54 [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen Wilco Dijkstra
@ 2023-02-03 12:15 ` Adhemerval Zanella Netto
0 siblings, 0 replies; 5+ messages in thread
From: Adhemerval Zanella Netto @ 2023-02-03 12:15 UTC (permalink / raw)
To: Wilco Dijkstra; +Cc: 'GNU C Library'
On 02/02/23 16:54, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
> STRRCHR (const char *s, int c)
> {
> + return __memrchr (s, c, strlen (s) + 1);
> }
>
> LGTM
>
> I did a quick check on AArch64 and the speedup vs the old generic version
> on bench-strrchr is 9.9 times! It's ~15% slower overall than the assembler
> strrchr.
>
> While the benchmark is not perfect, it shows that if you have decent strlen
> and memrchr then strlen+memchr gives a competitive strrchr.
Yeah, I saw a similar improvement on my side as well. Thanks for verify it
as well.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v12 00/31] Improve generic string routines
@ 2023-02-02 18:11 Adhemerval Zanella
2023-02-02 18:11 ` [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen Adhemerval Zanella
0 siblings, 1 reply; 5+ messages in thread
From: Adhemerval Zanella @ 2023-02-02 18:11 UTC (permalink / raw)
To: libc-alpha, Richard Henderson, Jeff Law, Xi Ruoyao, Noah Goldstein
It is done by:
1. Parametrizing the internal routines (for instance the find zero
in a word) so each architecture can reimplement without the need
to reimplement the whole routine.
2. Vectorizing more string implementations (for instance strcpy
and strcmp).
3. Change some implementations to use already possible optimized
ones (strnlen and strchr). It makes new ports to focus on
only provide optimized implementation of a hardful symbols
(for instance memchr) and make its improvement to be used in
a larger set of routines.
I checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multiarch (it covers both LE and BE
for 64 and 32 bits). I also checked the string routines on alpha, hppa,
and sh.
Changes since v11:
* Use index_first_zero_ne strcmp/strncmp, and fixed it on LE.
* Added strrchr optimization based on strlen/memrchr.
* Reorder the patches so composed optimizations (such as strrchr) as
ordered later.
Changes since v10:
* Added strcpy and stpcpy optimization.
* Added RISCV __riscv_zbb support to lower ctz/clz/orc.b.
* Fixed test-strnlen name.
Changes since v9:
* Added strncmp optimization.
* Fixed wcsmbs regressions.
Changes since v8:
* Change memrchr to use vectorized load on final string, instead of
byte per byte reads.
* Remove string-maskoff.h header.
* Add string-repeat_bytes.h and string-shift.h.
* Hook up the generic implementation on string tests.
Changes since v7:
* Split string-fzc.h out of string-fzi.h, with all of the
routines that are combinations of fza and fzi routines.
* Fix missing find_t and shift_find() from alpha, arm, powerpc.
* Use compiler builtins for arm and powerpc.
* Define sh4 has_zero() via has_eq(), rather than reverse.
Changes since v6:
* Add find_t to handle alpha way of comapring bytes (which returns
a bit-mask instead of byte-mask).
* Fixed alpha string-fzi.h and added string-fza.h.
* Renamed check_mask to shift_find.
Changes since v5:
* Replace 'inline' with '__always_inline' macros.
* Replace strchr implementation with a simpler one that call
strchrnul.
* Add strchrnul suggested changes.
* Add memchr suggested changes.
* Added check_mask on string-maskoff.h.
* Rebase and update Copyright years.
Changes since v4:
* Removed __clz and __ctz in favor of count_leading_zero and
count_trailing_zeros from longlong.h.
* Use repeat_bytes more often.
* Added a comment on strcmp final_cmp on why index_first_zero_ne can
not be used.
Changes since v3:
* Rebased against master.
* Dropped strcpy optimization.
* Refactor strcmp implementation.
* Some minor changes in comments.
Changes since v2:
* Move string-fz{a,b,i} to its own patch.
* Add a inline implementation for __builtin_c{l,t}z to avoid using
compiler provided symbols.
* Add a new header, string-maskoff.h, to handle unaligned accesses
on some implementation.
* Fixed strcmp on LE machines.
* Added a unaligned strcpy variant for architecture that define
_STRING_ARCH_unaligned.
* Add SH string-fzb.h (which uses cmp/str instruction to find
a zero in word).
Changes since v1:
* Marked ChangeLog entries with [BZ #5806], as appropriate.
* Reorganized the headers, so that armv6t2 and power6 need override
as little as possible to use their (integer) zero detection insns.
* Hopefully fixed all of the coding style issues.
* Adjusted the memrchr algorithm as discussed.
* Replaced the #ifdef STRRCHR etc that are used by the multiarch
* files.
* Tested on i386, i686, x86_64 (verified this is unused), ppc64,
ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
aarch64, alpha (qemu) and hppa (qemu).
Adhemerval Zanella (25):
Parameterize op_t from memcopy.h
Add string vectorized find and detection functions
string: Improve generic strlen
string: Improve generic strchrnul
string: Improve generic strchr
string: Improve generic strcmp
string: Improve generic strncmp
string: Improve generic stpcpy
string: Improve generic strcpy
string: Improve generic memchr
string: Improve generic strnlen with memchr
string: Improve generic memrchr
string: Improve generic strrchr with memrchr and strlen
sh: Add string-fzb.h
riscv: Add string-fza.h and string-fzi.h
string: Hook up the default implementation on test-strlen
string: Hook up the default implementation on test-strnlen
string: Hook up the default implementation on test-strchr
string: Hook up the default implementation on test-strcmp
string: Hook up the default implementation on test-strncmp
string: Hook up the default implementation on test-stpcpy
string: Hook up the default implementation on test-strcpy
string: Hook up the default implementation on test-memchr
string: Hook up the default implementation on test-memrchr
string: Hook up the default implementation on test-strrchr
Richard Henderson (6):
Parameterize OP_T_THRES from memcopy.h
hppa: Add memcopy.h
hppa: Add string-fza.h, string-fzc.h, and string-fzi.h
alpha: Add string-fza, string-fzb.h, string-fzi.h, and string-shift.h
arm: Add string-fza.h
powerpc: Add string-fza.h
string/memchr.c | 176 +++++-----------
string/memcmp.c | 4 -
string/memrchr.c | 196 ++++--------------
string/stpcpy.c | 92 +++++++-
string/strchr.c | 164 +--------------
string/strchrnul.c | 155 ++------------
string/strcmp.c | 110 ++++++++--
string/strcpy.c | 6 +-
string/strlen.c | 92 ++------
string/strncmp.c | 138 ++++++++----
string/strnlen.c | 137 +-----------
string/strrchr.c | 18 +-
string/test-memchr.c | 31 ++-
string/test-memrchr.c | 7 +
string/test-stpcpy.c | 32 ++-
string/test-strchr.c | 53 +++--
string/test-strcmp.c | 22 ++
string/test-strcpy.c | 34 ++-
string/test-strlen.c | 31 ++-
string/test-strncmp.c | 16 ++
string/test-strnlen.c | 35 +++-
string/test-strrchr.c | 38 ++--
sysdeps/alpha/string-fza.h | 61 ++++++
sysdeps/alpha/string-fzb.h | 52 +++++
sysdeps/alpha/string-fzi.h | 62 ++++++
sysdeps/alpha/string-shift.h | 44 ++++
sysdeps/arm/armv6t2/string-fza.h | 68 ++++++
sysdeps/generic/memcopy.h | 10 +-
sysdeps/generic/string-fza.h | 104 ++++++++++
sysdeps/generic/string-fzb.h | 49 +++++
sysdeps/generic/string-fzc.h | 83 ++++++++
sysdeps/generic/string-fzi.h | 71 +++++++
sysdeps/generic/string-misc.h | 45 ++++
sysdeps/generic/string-opthr.h | 25 +++
sysdeps/generic/string-optype.h | 24 +++
sysdeps/generic/string-shift.h | 52 +++++
sysdeps/hppa/memcopy.h | 42 ++++
sysdeps/hppa/string-fzb.h | 63 ++++++
sysdeps/hppa/string-fzc.h | 124 +++++++++++
sysdeps/hppa/string-fzi.h | 63 ++++++
sysdeps/i386/i686/multiarch/strnlen-c.c | 14 +-
sysdeps/i386/memcopy.h | 3 -
sysdeps/i386/string-opthr.h | 25 +++
sysdeps/m68k/memcopy.h | 3 -
sysdeps/powerpc/powerpc32/power4/memcopy.h | 5 -
.../powerpc32/power4/multiarch/memchr-ppc32.c | 14 +-
.../power4/multiarch/strchrnul-ppc32.c | 4 -
.../power4/multiarch/strnlen-ppc32.c | 14 +-
.../powerpc64/multiarch/memchr-ppc64.c | 9 +-
sysdeps/powerpc/string-fza.h | 72 +++++++
sysdeps/riscv/string-fza.h | 70 +++++++
sysdeps/riscv/string-fzi.h | 77 +++++++
sysdeps/s390/strchr-c.c | 11 +-
sysdeps/s390/strchrnul-c.c | 2 -
sysdeps/s390/strlen-c.c | 10 +-
sysdeps/s390/strnlen-c.c | 14 +-
sysdeps/sh/string-fzb.h | 55 +++++
sysdeps/x86_64/x32/string-optype.h | 24 +++
58 files changed, 2041 insertions(+), 1014 deletions(-)
create mode 100644 sysdeps/alpha/string-fza.h
create mode 100644 sysdeps/alpha/string-fzb.h
create mode 100644 sysdeps/alpha/string-fzi.h
create mode 100644 sysdeps/alpha/string-shift.h
create mode 100644 sysdeps/arm/armv6t2/string-fza.h
create mode 100644 sysdeps/generic/string-fza.h
create mode 100644 sysdeps/generic/string-fzb.h
create mode 100644 sysdeps/generic/string-fzc.h
create mode 100644 sysdeps/generic/string-fzi.h
create mode 100644 sysdeps/generic/string-misc.h
create mode 100644 sysdeps/generic/string-opthr.h
create mode 100644 sysdeps/generic/string-optype.h
create mode 100644 sysdeps/generic/string-shift.h
create mode 100644 sysdeps/hppa/memcopy.h
create mode 100644 sysdeps/hppa/string-fzb.h
create mode 100644 sysdeps/hppa/string-fzc.h
create mode 100644 sysdeps/hppa/string-fzi.h
create mode 100644 sysdeps/i386/string-opthr.h
create mode 100644 sysdeps/powerpc/string-fza.h
create mode 100644 sysdeps/riscv/string-fza.h
create mode 100644 sysdeps/riscv/string-fzi.h
create mode 100644 sysdeps/sh/string-fzb.h
create mode 100644 sysdeps/x86_64/x32/string-optype.h
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen
2023-02-02 18:11 [PATCH v12 00/31] Improve generic string routines Adhemerval Zanella
@ 2023-02-02 18:11 ` Adhemerval Zanella
2023-02-04 3:06 ` Richard Henderson
0 siblings, 1 reply; 5+ messages in thread
From: Adhemerval Zanella @ 2023-02-02 18:11 UTC (permalink / raw)
To: libc-alpha, Richard Henderson, Jeff Law, Xi Ruoyao, Noah Goldstein
Now that both strlen and memrchr have word vectorized implementation,
it should be faster to implement strrchr based on memrchr over the
string length instead of calling strchr on a loop.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
---
string/strrchr.c | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/string/strrchr.c b/string/strrchr.c
index 3c6e715d3b..7b76dea4e0 100644
--- a/string/strrchr.c
+++ b/string/strrchr.c
@@ -27,23 +27,7 @@
char *
STRRCHR (const char *s, int c)
{
- const char *found, *p;
-
- c = (unsigned char) c;
-
- /* Since strchr is fast, we use it rather than the obvious loop. */
-
- if (c == '\0')
- return strchr (s, '\0');
-
- found = NULL;
- while ((p = strchr (s, c)) != NULL)
- {
- found = p;
- s = p + 1;
- }
-
- return (char *) found;
+ return __memrchr (s, c, strlen (s) + 1);
}
#ifdef weak_alias
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen
2023-02-02 18:11 ` [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen Adhemerval Zanella
@ 2023-02-04 3:06 ` Richard Henderson
2023-02-06 14:01 ` Adhemerval Zanella Netto
0 siblings, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2023-02-04 3:06 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Jeff Law, Xi Ruoyao, Noah Goldstein
On 2/2/23 08:11, Adhemerval Zanella wrote:
> Now that both strlen and memrchr have word vectorized implementation,
> it should be faster to implement strrchr based on memrchr over the
> string length instead of calling strchr on a loop.
>
> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
> and powerpc64-linux-gnu by removing the arch-specific assembly
> implementation and disabling multi-arch (it covers both LE and BE
> for 64 and 32 bits).
> ---
> string/strrchr.c | 18 +-----------------
> 1 file changed, 1 insertion(+), 17 deletions(-)
memrchr needs libc_hidden_builtin_proto. On riscv64:
c: 00000097 auipc ra,0x0
c: R_RISCV_CALL __GI_strlen
c: R_RISCV_RELAX *ABS*
10: 000080e7 jalr ra # c <__GI_strrchr+0xc>
...
24: 00000317 auipc t1,0x0
24: R_RISCV_CALL_PLT __memrchr
24: R_RISCV_RELAX *ABS*
28: 00030067 jr t1 # 24 <.LVL2+0x8>
r~
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen
2023-02-04 3:06 ` Richard Henderson
@ 2023-02-06 14:01 ` Adhemerval Zanella Netto
0 siblings, 0 replies; 5+ messages in thread
From: Adhemerval Zanella Netto @ 2023-02-06 14:01 UTC (permalink / raw)
To: Richard Henderson, libc-alpha, Jeff Law, Xi Ruoyao, Noah Goldstein
On 04/02/23 00:06, Richard Henderson wrote:
> On 2/2/23 08:11, Adhemerval Zanella wrote:
>> Now that both strlen and memrchr have word vectorized implementation,
>> it should be faster to implement strrchr based on memrchr over the
>> string length instead of calling strchr on a loop.
>>
>> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
>> and powerpc64-linux-gnu by removing the arch-specific assembly
>> implementation and disabling multi-arch (it covers both LE and BE
>> for 64 and 32 bits).
>> ---
>> string/strrchr.c | 18 +-----------------
>> 1 file changed, 1 insertion(+), 17 deletions(-)
>
> memrchr needs libc_hidden_builtin_proto. On riscv64:
>
> c: 00000097 auipc ra,0x0
> c: R_RISCV_CALL __GI_strlen
> c: R_RISCV_RELAX *ABS*
> 10: 000080e7 jalr ra # c <__GI_strrchr+0xc>
> ...
> 24: 00000317 auipc t1,0x0
> 24: R_RISCV_CALL_PLT __memrchr
> 24: R_RISCV_RELAX *ABS*
> 28: 00030067 jr t1 # 24 <.LVL2+0x8>
>
>
> r~
Similar to strchr [1], static linker ends up creating the expected local
call:
$ riscv64-glibc-linux-gnu-objdump -dr libc.so
[...]
0000000000088f5c <strrchr>:
88f5c: 7179 addi sp,sp,-48
88f5e: e84a sd s2,16(sp)
88f60: 000e7917 auipc s2,0xe7
88f64: 73893903 ld s2,1848(s2) # 170698 <__stack_chk_guard@GLIBC_2.27>
88f68: 00093783 ld a5,0(s2)
[...]
88f9a: 6145 addi sp,sp,48
88f9c: c70fd06f j 8640c <memrchr>
88fa0: 193500ef jal ra,d9932 <__stack_chk_fail>
[...]
But I agree that we should change to have the libc_hidden_builin_proto
in this case. I will send a patch to fix it.
[1] https://sourceware.org/pipermail/libc-alpha/2023-February/145322.html
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-02-06 14:01 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02 19:54 [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen Wilco Dijkstra
2023-02-03 12:15 ` Adhemerval Zanella Netto
-- strict thread matches above, loose matches on Subject: below --
2023-02-02 18:11 [PATCH v12 00/31] Improve generic string routines Adhemerval Zanella
2023-02-02 18:11 ` [PATCH v12 14/31] string: Improve generic strrchr with memrchr and strlen Adhemerval Zanella
2023-02-04 3:06 ` Richard Henderson
2023-02-06 14:01 ` Adhemerval Zanella Netto
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).