* [PATCH 0/4] LoongArch: Add ifunc support for str{cpy, rchr},
@ 2023-09-08 9:33 dengjianbo
2023-09-08 9:33 ` [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx} dengjianbo
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: dengjianbo @ 2023-09-08 9:33 UTC (permalink / raw)
To: libc-alpha
Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo
This patch set adds multiple versions of strcpy, stpcpy, and strrchr
implemented with basic LoongArch instructions, LSX instructions, and
LASX instructions. Even though these implementations show some
degradation in a few cases, overall the performance gains are
significant.
See:
https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/stpcpy_compare.out
Test results are compared against the generic strcpy and stpcpy, not
against the strlen + memcpy combination, in the benchmark.
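For context, the strlen + memcpy composition mentioned above is roughly
the following; this is an illustrative sketch only (the function name is
made up for illustration, not the exact benchmark code):

  #include <string.h>

  /* strlen + memcpy composition: measure the string once, then copy it,
     including the terminating null, with a single memcpy call.  */
  char *
  strlen_memcpy_strcpy (char *dest, const char *src)
  {
    return memcpy (dest, src, strlen (src) + 1);
  }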
Generic strrchr is implemented as strlen + memrchr, so strrchr-lasx is
compared with a generic strrchr built from strlen-lasx and memrchr-lasx,
strrchr-lsx is compared with a generic strrchr built from strlen-lsx and
memrchr-lsx, and strrchr-aligned is compared with a generic strrchr
built from strlen-aligned and memrchr-generic.
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_lasx_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_lsx_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_aligned_compare.out
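The generic composition described above corresponds roughly to the
following sketch (illustrative only, not the exact glibc source; in each
comparison the corresponding vector strlen/memrchr variants stand in for
the plain calls):

  #include <string.h>

  /* Generic strrchr: scan the string once with strlen, then search
     backwards with memrchr.  The + 1 covers the terminating null, so
     a search for '\0' returns the terminator itself.  */
  char *
  generic_strrchr (const char *s, int c)
  {
    return (char *) memrchr (s, c, strlen (s) + 1);
  }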
In the data, positive values in parentheses indicate that our
implementation took less time (a performance improvement); negative
values in parentheses mean that our implementation took more time (a
performance decrease). The following is a summary of the performance
compared with the generic version in the glibc microbenchmark:
Name              Percent of runtime reduced
strcpy-aligned    10%-45%
strcpy-unaligned  10%-49% (compared with the aligned version, the
                  unaligned version performs better when src and dest
                  cannot both be aligned to 8 bytes)
strcpy-lsx        20%-80%
strcpy-lasx       15%-86%
stpcpy-lasx       10%-87%
stpcpy-lsx        10%-80%
stpcpy-aligned    5%-45%
strrchr-lasx      10%-50%
strrchr-lsx       0%-50%
strrchr-aligned   5%-50%
dengjianbo (4):
LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
LoongArch: Add ifunc support for stpcpy{aligned, lsx, lasx}
LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
LoongArch: Change to put magic number to .rodata section
sysdeps/loongarch/lp64/multiarch/Makefile | 10 +
.../lp64/multiarch/ifunc-impl-list.c | 25 +++
.../loongarch/lp64/multiarch/ifunc-stpcpy.h | 40 ++++
.../loongarch/lp64/multiarch/ifunc-strrchr.h | 41 ++++
.../loongarch/lp64/multiarch/memmove-lsx.S | 20 +-
.../loongarch/lp64/multiarch/stpcpy-aligned.S | 191 ++++++++++++++++
.../loongarch/lp64/multiarch/stpcpy-lasx.S | 208 ++++++++++++++++++
sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S | 206 +++++++++++++++++
sysdeps/loongarch/lp64/multiarch/stpcpy.c | 42 ++++
.../loongarch/lp64/multiarch/strcpy-aligned.S | 185 ++++++++++++++++
.../loongarch/lp64/multiarch/strcpy-lasx.S | 208 ++++++++++++++++++
sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S | 197 +++++++++++++++++
.../lp64/multiarch/strcpy-unaligned.S | 131 +++++++++++
sysdeps/loongarch/lp64/multiarch/strcpy.c | 35 +++
.../lp64/multiarch/strrchr-aligned.S | 170 ++++++++++++++
.../loongarch/lp64/multiarch/strrchr-lasx.S | 176 +++++++++++++++
.../loongarch/lp64/multiarch/strrchr-lsx.S | 144 ++++++++++++
sysdeps/loongarch/lp64/multiarch/strrchr.c | 36 +++
18 files changed, 2055 insertions(+), 10 deletions(-)
create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy.c
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy.c
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr.c
--
2.40.0
* [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
2023-09-08 9:33 [PATCH 0/4] LoongArch: Add ifunc support for str{cpy, rchr}, dengjianbo
@ 2023-09-08 9:33 ` dengjianbo
2023-09-08 14:22 ` Xi Ruoyao
2023-09-08 9:33 ` [PATCH 2/4] LoongArch: Add ifunc support for stpcpy{aligned, " dengjianbo
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: dengjianbo @ 2023-09-08 9:33 UTC (permalink / raw)
To: libc-alpha
Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo
According to the glibc strcpy microbenchmark test results (changed to
use generic_strcpy instead of strlen + memcpy), compared with
generic_strcpy, this implementation reduces the runtime as follows:

Name              Percent of runtime reduced
strcpy-aligned    10%-45%
strcpy-unaligned  10%-49% (compared with the aligned version, the
                  unaligned version performs better when src and dest
                  cannot both be aligned to 8 bytes)
strcpy-lsx        20%-80%
strcpy-lasx       15%-86%
---
sysdeps/loongarch/lp64/multiarch/Makefile | 4 +
.../lp64/multiarch/ifunc-impl-list.c | 9 +
.../loongarch/lp64/multiarch/strcpy-aligned.S | 185 ++++++++++++++++
.../loongarch/lp64/multiarch/strcpy-lasx.S | 208 ++++++++++++++++++
sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S | 197 +++++++++++++++++
.../lp64/multiarch/strcpy-unaligned.S | 131 +++++++++++
sysdeps/loongarch/lp64/multiarch/strcpy.c | 35 +++
7 files changed, 769 insertions(+)
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy.c
diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 360a6718c0..f05685ceec 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -16,6 +16,10 @@ sysdep_routines += \
strcmp-lsx \
strncmp-aligned \
strncmp-lsx \
+ strcpy-aligned \
+ strcpy-unaligned \
+ strcpy-lsx \
+ strcpy-lasx \
memcpy-aligned \
memcpy-unaligned \
memmove-unaligned \
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index e397d58c9d..b556bacbd1 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -76,6 +76,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_aligned)
)
+ IFUNC_IMPL (i, name, strcpy,
+#if !defined __loongarch_soft_float
+ IFUNC_IMPL_ADD (array, i, strcpy, SUPPORT_LASX, __strcpy_lasx)
+ IFUNC_IMPL_ADD (array, i, strcpy, SUPPORT_LSX, __strcpy_lsx)
+#endif
+ IFUNC_IMPL_ADD (array, i, strcpy, SUPPORT_UAL, __strcpy_unaligned)
+ IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_aligned)
+ )
+
IFUNC_IMPL (i, name, memcpy,
#if !defined __loongarch_soft_float
IFUNC_IMPL_ADD (array, i, memcpy, SUPPORT_LASX, __memcpy_lasx)
diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S b/sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
new file mode 100644
index 0000000000..d5926e5e11
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
@@ -0,0 +1,185 @@
+/* Optimized strcpy aligned implementation using basic LoongArch instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define STRCPY __strcpy_aligned
+#else
+# define STRCPY strcpy
+#endif
+
+LEAF(STRCPY, 6)
+ andi a3, a0, 0x7
+ move a2, a0
+ beqz a3, L(dest_align)
+ sub.d a5, a1, a3
+ addi.d a5, a5, 8
+
+L(make_dest_align):
+ ld.b t0, a1, 0
+ addi.d a1, a1, 1
+ st.b t0, a2, 0
+ beqz t0, L(al_out)
+
+ addi.d a2, a2, 1
+ bne a1, a5, L(make_dest_align)
+
+L(dest_align):
+ andi a4, a1, 7
+ bstrins.d a1, zero, 2, 0
+
+ lu12i.w t5, 0x1010
+ ld.d t0, a1, 0
+ ori t5, t5, 0x101
+ bstrins.d t5, t5, 63, 32
+
+ slli.d t6, t5, 0x7
+ bnez a4, L(unalign)
+ sub.d t1, t0, t5
+ andn t2, t6, t0
+
+ and t3, t1, t2
+ bnez t3, L(al_end)
+
+L(al_loop):
+ st.d t0, a2, 0
+ ld.d t0, a1, 8
+
+ addi.d a1, a1, 8
+ addi.d a2, a2, 8
+ sub.d t1, t0, t5
+ andn t2, t6, t0
+
+ and t3, t1, t2
+ beqz t3, L(al_loop)
+
+L(al_end):
+ ctz.d t1, t3
+ srli.d t1, t1, 3
+ addi.d t1, t1, 1
+
+ andi a3, t1, 8
+ andi a4, t1, 4
+ andi a5, t1, 2
+ andi a6, t1, 1
+
+L(al_end_8):
+ beqz a3, L(al_end_4)
+ st.d t0, a2, 0
+ jr ra
+L(al_end_4):
+ beqz a4, L(al_end_2)
+ st.w t0, a2, 0
+ addi.d a2, a2, 4
+ srli.d t0, t0, 32
+L(al_end_2):
+ beqz a5, L(al_end_1)
+ st.h t0, a2, 0
+ addi.d a2, a2, 2
+ srli.d t0, t0, 16
+L(al_end_1):
+ beqz a6, L(al_out)
+ st.b t0, a2, 0
+L(al_out):
+ jr ra
+
+L(unalign):
+ slli.d a5, a4, 3
+ li.d t1, -1
+ sub.d a6, zero, a5
+
+ srl.d a7, t0, a5
+ sll.d t7, t1, a6
+
+ or t0, a7, t7
+ sub.d t1, t0, t5
+ andn t2, t6, t0
+ and t3, t1, t2
+
+ bnez t3, L(un_end)
+
+ ld.d t4, a1, 8
+
+ sub.d t1, t4, t5
+ andn t2, t6, t4
+ sll.d t0, t4, a6
+ and t3, t1, t2
+
+ or t0, t0, a7
+ bnez t3, L(un_end_with_remaining)
+
+L(un_loop):
+ srl.d a7, t4, a5
+
+ ld.d t4, a1, 16
+ addi.d a1, a1, 8
+
+ st.d t0, a2, 0
+ addi.d a2, a2, 8
+
+ sub.d t1, t4, t5
+ andn t2, t6, t4
+ sll.d t0, t4, a6
+ and t3, t1, t2
+
+ or t0, t0, a7
+ beqz t3, L(un_loop)
+
+L(un_end_with_remaining):
+ ctz.d t1, t3
+ srli.d t1, t1, 3
+ addi.d t1, t1, 1
+ sub.d t1, t1, a4
+
+ blt t1, zero, L(un_end_less_8)
+ st.d t0, a2, 0
+ addi.d a2, a2, 8
+ beqz t1, L(un_out)
+ srl.d t0, t4, a5
+ b L(un_end_less_8)
+
+L(un_end):
+ ctz.d t1, t3
+ srli.d t1, t1, 3
+ addi.d t1, t1, 1
+
+L(un_end_less_8):
+ andi a4, t1, 4
+ andi a5, t1, 2
+ andi a6, t1, 1
+L(un_end_4):
+ beqz a4, L(un_end_2)
+ st.w t0, a2, 0
+ addi.d a2, a2, 4
+ srli.d t0, t0, 32
+L(un_end_2):
+ beqz a5, L(un_end_1)
+ st.h t0, a2, 0
+ addi.d a2, a2, 2
+ srli.d t0, t0, 16
+L(un_end_1):
+ beqz a6, L(un_out)
+ st.b t0, a2, 0
+L(un_out):
+ jr ra
+END(STRCPY)
+
+libc_hidden_builtin_def (STRCPY)
diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S b/sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
new file mode 100644
index 0000000000..d928db5b91
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
@@ -0,0 +1,208 @@
+/* Optimized strcpy implementation using LoongArch LASX instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#define STRCPY __strcpy_lasx
+
+LEAF(STRCPY, 6)
+ ori t8, zero, 0xfe0
+ andi t0, a1, 0xfff
+ li.d t7, -1
+ move a2, a0
+
+ bltu t8, t0, L(page_cross_start)
+L(start_entry):
+ xvld xr0, a1, 0
+ li.d t0, 32
+ andi t1, a2, 0x1f
+
+ xvsetanyeqz.b fcc0, xr0
+ sub.d t0, t0, t1
+ bcnez fcc0, L(end)
+ add.d a1, a1, t0
+
+ xvst xr0, a2, 0
+ andi a3, a1, 0x1f
+ add.d a2, a2, t0
+ bnez a3, L(unaligned)
+
+
+ xvld xr0, a1, 0
+ xvsetanyeqz.b fcc0, xr0
+ bcnez fcc0, L(al_end)
+L(al_loop):
+ xvst xr0, a2, 0
+
+ xvld xr0, a1, 32
+ addi.d a2, a2, 32
+ addi.d a1, a1, 32
+ xvsetanyeqz.b fcc0, xr0
+
+ bceqz fcc0, L(al_loop)
+L(al_end):
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+ movfr2gr.s t0, fa0
+ cto.w t0, t0
+ add.d a1, a1, t0
+ xvld xr0, a1, -31
+
+
+ add.d a2, a2, t0
+ xvst xr0, a2, -31
+ jr ra
+ nop
+
+L(page_cross_start):
+ move a4, a1
+ bstrins.d a4, zero, 4, 0
+ xvld xr0, a4, 0
+ xvmsknz.b xr0, xr0
+
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+ movfr2gr.s t0, fa0
+ sra.w t0, t0, a1
+
+ beq t0, t7, L(start_entry)
+ b L(tail)
+L(unaligned):
+ andi t0, a1, 0xfff
+ bltu t8, t0, L(un_page_cross)
+
+
+L(un_start_entry):
+ xvld xr0, a1, 0
+ xvsetanyeqz.b fcc0, xr0
+ bcnez fcc0, L(un_end)
+ addi.d a1, a1, 32
+
+L(un_loop):
+ xvst xr0, a2, 0
+ andi t0, a1, 0xfff
+ addi.d a2, a2, 32
+ bltu t8, t0, L(page_cross_loop)
+
+L(un_loop_entry):
+ xvld xr0, a1, 0
+ addi.d a1, a1, 32
+ xvsetanyeqz.b fcc0, xr0
+ bceqz fcc0, L(un_loop)
+
+ addi.d a1, a1, -32
+L(un_end):
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+
+ movfr2gr.s t0, fa0
+L(un_tail):
+ cto.w t0, t0
+ add.d a1, a1, t0
+ xvld xr0, a1, -31
+
+ add.d a2, a2, t0
+ xvst xr0, a2, -31
+ jr ra
+L(un_page_cross):
+ sub.d a4, a1, a3
+
+ xvld xr0, a4, 0
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+ movfr2gr.s t0, fa0
+ sra.w t0, t0, a1
+ beq t0, t7, L(un_start_entry)
+ b L(un_tail)
+
+
+L(page_cross_loop):
+ sub.d a4, a1, a3
+ xvld xr0, a4, 0
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+
+ vilvl.h vr0, vr1, vr0
+ movfr2gr.s t0, fa0
+ sra.w t0, t0, a1
+ beq t0, t7, L(un_loop_entry)
+
+ b L(un_tail)
+L(end):
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+ movfr2gr.s t0, fa0
+L(tail):
+ cto.w t0, t0
+ add.d a4, a2, t0
+ add.d a5, a1, t0
+
+L(less_32):
+ srli.d t1, t0, 4
+ beqz t1, L(less_16)
+ vld vr0, a1, 0
+ vld vr1, a5, -15
+
+ vst vr0, a2, 0
+ vst vr1, a4, -15
+ jr ra
+L(less_16):
+ srli.d t1, t0, 3
+
+ beqz t1, L(less_8)
+ ld.d t2, a1, 0
+ ld.d t3, a5, -7
+ st.d t2, a2, 0
+
+ st.d t3, a4, -7
+ jr ra
+L(less_8):
+ li.d t1, 3
+ bltu t0, t1, L(less_4)
+
+ ld.w t2, a1, 0
+ ld.w t3, a5, -3
+ st.w t2, a2, 0
+ st.w t3, a4, -3
+
+ jr ra
+L(less_4):
+ srli.d t1, t0, 2
+ bgeu t1, t0, L(zero_byte)
+ ld.h t2, a1, 0
+
+ st.h t2, a2, 0
+L(zero_byte):
+ st.b zero, a4, 0
+ jr ra
+END(STRCPY)
+
+libc_hidden_builtin_def (STRCPY)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S b/sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
new file mode 100644
index 0000000000..7a17af12a3
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
@@ -0,0 +1,197 @@
+/* Optimized strcpy implementation using LoongArch LSX instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define STRCPY __strcpy_lsx
+
+LEAF(STRCPY, 6)
+ pcalau12i t0, %pc_hi20(L(INDEX))
+ andi a4, a1, 0xf
+ vld vr1, t0, %pc_lo12(L(INDEX))
+ move a2, a0
+
+ beqz a4, L(load_start)
+ xor t0, a1, a4
+ vld vr0, t0, 0
+ vreplgr2vr.b vr2, a4
+
+ vadd.b vr2, vr2, vr1
+ vshuf.b vr0, vr2, vr0, vr2
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(end)
+
+L(load_start):
+ vld vr0, a1, 0
+ li.d t1, 16
+ andi a3, a2, 0xf
+ vsetanyeqz.b fcc0, vr0
+
+
+ sub.d t0, t1, a3
+ bcnez fcc0, L(end)
+ add.d a1, a1, t0
+ vst vr0, a2, 0
+
+ andi a3, a1, 0xf
+ add.d a2, a2, t0
+ bnez a3, L(unaligned)
+ vld vr0, a1, 0
+
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(al_end)
+L(al_loop):
+ vst vr0, a2, 0
+ vld vr0, a1, 16
+
+ addi.d a2, a2, 16
+ addi.d a1, a1, 16
+ vsetanyeqz.b fcc0, vr0
+ bceqz fcc0, L(al_loop)
+
+
+L(al_end):
+ vmsknz.b vr1, vr0
+ movfr2gr.s t0, fa1
+ cto.w t0, t0
+ add.d a1, a1, t0
+
+ vld vr0, a1, -15
+ add.d a2, a2, t0
+ vst vr0, a2, -15
+ jr ra
+
+L(end):
+ vmsknz.b vr1, vr0
+ movfr2gr.s t0, fa1
+ cto.w t0, t0
+ addi.d t0, t0, 1
+
+L(end_16):
+ andi t1, t0, 16
+ beqz t1, L(end_8)
+ vst vr0, a2, 0
+ jr ra
+
+
+L(end_8):
+ andi t2, t0, 8
+ andi t3, t0, 4
+ andi t4, t0, 2
+ andi t5, t0, 1
+
+ beqz t2, L(end_4)
+ vstelm.d vr0, a2, 0, 0
+ addi.d a2, a2, 8
+ vbsrl.v vr0, vr0, 8
+
+L(end_4):
+ beqz t3, L(end_2)
+ vstelm.w vr0, a2, 0, 0
+ addi.d a2, a2, 4
+ vbsrl.v vr0, vr0, 4
+
+L(end_2):
+ beqz t4, L(end_1)
+ vstelm.h vr0, a2, 0, 0
+ addi.d a2, a2, 2
+ vbsrl.v vr0, vr0, 2
+
+
+L(end_1):
+ beqz t5, L(out)
+ vstelm.b vr0, a2, 0, 0
+L(out):
+ jr ra
+ nop
+
+L(unaligned):
+ bstrins.d a1, zero, 3, 0
+ vld vr2, a1, 0
+ vreplgr2vr.b vr3, a3
+ vslt.b vr4, vr1, vr3
+
+ vor.v vr0, vr2, vr4
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(un_first_end)
+ vld vr0, a1, 16
+
+ vadd.b vr3, vr3, vr1
+ vshuf.b vr4, vr0, vr2, vr3
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(un_end)
+
+
+ vor.v vr2, vr0, vr0
+ addi.d a1, a1, 16
+L(un_loop):
+ vld vr0, a1, 16
+ vst vr4, a2, 0
+
+ addi.d a2, a2, 16
+ vshuf.b vr4, vr0, vr2, vr3
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(un_end)
+
+ vld vr2, a1, 32
+ vst vr4, a2, 0
+ addi.d a1, a1, 32
+ addi.d a2, a2, 16
+
+ vshuf.b vr4, vr2, vr0, vr3
+ vsetanyeqz.b fcc0, vr2
+ bceqz fcc0, L(un_loop)
+ vor.v vr0, vr2, vr2
+
+
+ addi.d a1, a1, -16
+L(un_end):
+ vsetanyeqz.b fcc0, vr4
+ bcnez fcc0, 1f
+ vst vr4, a2, 0
+
+1:
+ vmsknz.b vr1, vr0
+ movfr2gr.s t0, fa1
+ cto.w t0, t0
+ add.d a1, a1, t0
+
+ vld vr0, a1, 1
+ add.d a2, a2, t0
+ sub.d a2, a2, a3
+ vst vr0, a2, 1
+
+ jr ra
+L(un_first_end):
+ addi.d a2, a2, -16
+ addi.d a1, a1, -16
+ b 1b
+END(STRCPY)
+
+ .section .rodata.cst16,"M",@progbits,16
+ .align 4
+L(INDEX):
+ .dword 0x0706050403020100
+ .dword 0x0f0e0d0c0b0a0908
+
+libc_hidden_builtin_def (STRCPY)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S b/sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
new file mode 100644
index 0000000000..12e79f2ac0
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
@@ -0,0 +1,131 @@
+/* Optimized strcpy unaligned implementation using basic LoongArch instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+
+# define STRCPY __strcpy_unaligned
+
+LEAF(STRCPY, 4)
+ move t8, a0
+ lu12i.w t5, 0x01010
+ lu12i.w t6, 0x7f7f7
+ ori t5, t5, 0x101
+
+ ori t6, t6, 0xf7f
+ bstrins.d t5, t5, 63, 32
+ bstrins.d t6, t6, 63, 32
+ andi a3, a1, 0x7
+
+ beqz a3, L(strcpy_loop_aligned_1)
+ b L(strcpy_mutual_align)
+L(strcpy_loop_aligned):
+ st.d t0, a0, 0
+ addi.d a0, a0, 8
+
+L(strcpy_loop_aligned_1):
+ ld.d t0, a1, 0
+ addi.d a1, a1, 8
+L(strcpy_start_realigned):
+ sub.d a4, t0, t5
+ or a5, t0, t6
+
+ andn t2, a4, a5
+ beqz t2, L(strcpy_loop_aligned)
+L(strcpy_end):
+ ctz.d t7, t2
+ srli.d t7, t7, 3
+ addi.d t7, t7, 1
+
+L(strcpy_end_8):
+ andi a4, t7, 0x8
+ beqz a4, L(strcpy_end_4)
+ st.d t0, a0, 0
+ move a0, t8
+ jr ra
+
+L(strcpy_end_4):
+ andi a4, t7, 0x4
+ beqz a4, L(strcpy_end_2)
+ st.w t0, a0, 0
+ srli.d t0, t0, 32
+ addi.d a0, a0, 4
+
+L(strcpy_end_2):
+ andi a4, t7, 0x2
+ beqz a4, L(strcpy_end_1)
+ st.h t0, a0, 0
+ srli.d t0, t0, 16
+ addi.d a0, a0, 2
+
+L(strcpy_end_1):
+ andi a4, t7, 0x1
+ beqz a4, L(strcpy_end_ret)
+ st.b t0, a0, 0
+
+L(strcpy_end_ret):
+ move a0, t8
+ jr ra
+
+
+L(strcpy_mutual_align):
+ li.w a5, 0xff8
+ andi a4, a1, 0xff8
+ beq a4, a5, L(strcpy_page_cross)
+
+L(strcpy_page_cross_ok):
+ ld.d t0, a1, 0
+ sub.d a4, t0, t5
+ or a5, t0, t6
+ andn t2, a4, a5
+ bnez t2, L(strcpy_end)
+
+L(strcpy_mutual_align_finish):
+ li.w a4, 8
+ st.d t0, a0, 0
+ sub.d a4, a4, a3
+ add.d a1, a1, a4
+ add.d a0, a0, a4
+
+ b L(strcpy_loop_aligned_1)
+
+L(strcpy_page_cross):
+ li.w a4, 0x7
+ andn a6, a1, a4
+ ld.d t0, a6, 0
+ li.w a7, -1
+
+ slli.d a5, a3, 3
+ srl.d a7, a7, a5
+ srl.d t0, t0, a5
+ nor a7, a7, zero
+
+ or t0, t0, a7
+ sub.d a4, t0, t5
+ or a5, t0, t6
+ andn t2, a4, a5
+ beqz t2, L(strcpy_page_cross_ok)
+
+ b L(strcpy_end)
+END(STRCPY)
+
+libc_hidden_builtin_def (STRCPY)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/strcpy.c b/sysdeps/loongarch/lp64/multiarch/strcpy.c
new file mode 100644
index 0000000000..46afd068f9
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strcpy.c
@@ -0,0 +1,35 @@
+/* Multiple versions of strcpy.
+ All versions must be listed in ifunc-impl-list.c.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+#if IS_IN (libc)
+# define strcpy __redirect_strcpy
+# include <string.h>
+# undef strcpy
+
+# define SYMBOL_NAME strcpy
+# include "ifunc-lasx.h"
+
+libc_ifunc_redirected (__redirect_strcpy, strcpy, IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (strcpy, __GI_strcpy, __redirect_strcpy)
+ __attribute__ ((visibility ("hidden"))) __attribute_copy__ (strcpy);
+# endif
+#endif
--
2.40.0
* [PATCH 2/4] LoongArch: Add ifunc support for stpcpy{aligned, lsx, lasx}
2023-09-08 9:33 [PATCH 0/4] LoongArch: Add ifunc support for str{cpy, rchr}, dengjianbo
2023-09-08 9:33 ` [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx} dengjianbo
@ 2023-09-08 9:33 ` dengjianbo
2023-09-08 9:33 ` [PATCH 3/4] LoongArch: Add ifunc support for strrchr{aligned, " dengjianbo
2023-09-08 9:33 ` [PATCH 4/4] LoongArch: Change to put magic number to .rodata section dengjianbo
3 siblings, 0 replies; 8+ messages in thread
From: dengjianbo @ 2023-09-08 9:33 UTC (permalink / raw)
To: libc-alpha
Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo
According to the glibc stpcpy microbenchmark test results (changed to
use generic_stpcpy instead of strlen + memcpy), this implementation
reduces the runtime as follows:

Name             Percent of runtime reduced
stpcpy-lasx      10%-87%
stpcpy-lsx       10%-80%
stpcpy-aligned   5%-45%
---
sysdeps/loongarch/lp64/multiarch/Makefile | 3 +
.../lp64/multiarch/ifunc-impl-list.c | 8 +
.../loongarch/lp64/multiarch/ifunc-stpcpy.h | 40 ++++
.../loongarch/lp64/multiarch/stpcpy-aligned.S | 191 ++++++++++++++++
.../loongarch/lp64/multiarch/stpcpy-lasx.S | 208 ++++++++++++++++++
sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S | 206 +++++++++++++++++
sysdeps/loongarch/lp64/multiarch/stpcpy.c | 42 ++++
7 files changed, 698 insertions(+)
create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy.c
diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index f05685ceec..f95eb5c4fe 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -20,6 +20,9 @@ sysdep_routines += \
strcpy-unaligned \
strcpy-lsx \
strcpy-lasx \
+ stpcpy-aligned \
+ stpcpy-lsx \
+ stpcpy-lasx \
memcpy-aligned \
memcpy-unaligned \
memmove-unaligned \
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index b556bacbd1..539aa681f9 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -85,6 +85,14 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
IFUNC_IMPL_ADD (array, i, strcpy, 1, __strcpy_aligned)
)
+ IFUNC_IMPL (i, name, stpcpy,
+#if !defined __loongarch_soft_float
+ IFUNC_IMPL_ADD (array, i, stpcpy, SUPPORT_LASX, __stpcpy_lasx)
+ IFUNC_IMPL_ADD (array, i, stpcpy, SUPPORT_LSX, __stpcpy_lsx)
+#endif
+ IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_aligned)
+ )
+
IFUNC_IMPL (i, name, memcpy,
#if !defined __loongarch_soft_float
IFUNC_IMPL_ADD (array, i, memcpy, SUPPORT_LASX, __memcpy_lasx)
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h b/sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
new file mode 100644
index 0000000000..3827ec5a7e
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-stpcpy.h
@@ -0,0 +1,40 @@
+/* Common definition for stpcpy ifunc selections.
+ All versions must be listed in ifunc-impl-list.c.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <ldsodefs.h>
+#include <ifunc-init.h>
+
+#if !defined __loongarch_soft_float
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
+#endif
+extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+#if !defined __loongarch_soft_float
+ if (SUPPORT_LASX)
+ return OPTIMIZE (lasx);
+ else if (SUPPORT_LSX)
+ return OPTIMIZE (lsx);
+ else
+#endif
+ return OPTIMIZE (aligned);
+}
diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S b/sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
new file mode 100644
index 0000000000..1520597b91
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
@@ -0,0 +1,191 @@
+/* Optimized stpcpy aligned implementation using basic LoongArch instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define STPCPY_NAME __stpcpy_aligned
+#else
+# define STPCPY_NAME __stpcpy
+#endif
+
+LEAF(STPCPY_NAME, 6)
+ andi a3, a0, 0x7
+ beqz a3, L(dest_align)
+ sub.d a5, a1, a3
+ addi.d a5, a5, 8
+
+L(make_dest_align):
+ ld.b t0, a1, 0
+ addi.d a1, a1, 1
+ st.b t0, a0, 0
+ addi.d a0, a0, 1
+
+ beqz t0, L(al_out)
+ bne a1, a5, L(make_dest_align)
+
+L(dest_align):
+ andi a4, a1, 7
+ bstrins.d a1, zero, 2, 0
+
+ lu12i.w t5, 0x1010
+ ld.d t0, a1, 0
+ ori t5, t5, 0x101
+ bstrins.d t5, t5, 63, 32
+
+ slli.d t6, t5, 0x7
+ bnez a4, L(unalign)
+ sub.d t1, t0, t5
+ andn t2, t6, t0
+
+ and t3, t1, t2
+ bnez t3, L(al_end)
+
+L(al_loop):
+ st.d t0, a0, 0
+ ld.d t0, a1, 8
+
+ addi.d a1, a1, 8
+ addi.d a0, a0, 8
+ sub.d t1, t0, t5
+ andn t2, t6, t0
+
+ and t3, t1, t2
+ beqz t3, L(al_loop)
+
+L(al_end):
+ ctz.d t1, t3
+ srli.d t1, t1, 3
+ addi.d t1, t1, 1
+
+ andi a3, t1, 8
+ andi a4, t1, 4
+ andi a5, t1, 2
+ andi a6, t1, 1
+
+L(al_end_8):
+ beqz a3, L(al_end_4)
+ st.d t0, a0, 0
+ addi.d a0, a0, 7
+ jr ra
+L(al_end_4):
+ beqz a4, L(al_end_2)
+ st.w t0, a0, 0
+ addi.d a0, a0, 4
+ srli.d t0, t0, 32
+L(al_end_2):
+ beqz a5, L(al_end_1)
+ st.h t0, a0, 0
+ addi.d a0, a0, 2
+ srli.d t0, t0, 16
+L(al_end_1):
+ beqz a6, L(al_out)
+ st.b t0, a0, 0
+ addi.d a0, a0, 1
+L(al_out):
+ addi.d a0, a0, -1
+ jr ra
+
+L(unalign):
+ slli.d a5, a4, 3
+ li.d t1, -1
+ sub.d a6, zero, a5
+
+ srl.d a7, t0, a5
+ sll.d t7, t1, a6
+
+ or t0, a7, t7
+ sub.d t1, t0, t5
+ andn t2, t6, t0
+ and t3, t1, t2
+
+ bnez t3, L(un_end)
+
+ ld.d t4, a1, 8
+ addi.d a1, a1, 8
+
+ sub.d t1, t4, t5
+ andn t2, t6, t4
+ sll.d t0, t4, a6
+ and t3, t1, t2
+
+ or t0, t0, a7
+ bnez t3, L(un_end_with_remaining)
+
+L(un_loop):
+ srl.d a7, t4, a5
+
+ ld.d t4, a1, 8
+ addi.d a1, a1, 8
+
+ st.d t0, a0, 0
+ addi.d a0, a0, 8
+
+ sub.d t1, t4, t5
+ andn t2, t6, t4
+ sll.d t0, t4, a6
+ and t3, t1, t2
+
+ or t0, t0, a7
+ beqz t3, L(un_loop)
+
+L(un_end_with_remaining):
+ ctz.d t1, t3
+ srli.d t1, t1, 3
+ addi.d t1, t1, 1
+ sub.d t1, t1, a4
+
+ blt t1, zero, L(un_end_less_8)
+ st.d t0, a0, 0
+ addi.d a0, a0, 8
+ beqz t1, L(un_out)
+ srl.d t0, t4, a5
+ b L(un_end_less_8)
+
+L(un_end):
+ ctz.d t1, t3
+ srli.d t1, t1, 3
+ addi.d t1, t1, 1
+
+L(un_end_less_8):
+ andi a4, t1, 4
+ andi a5, t1, 2
+ andi a6, t1, 1
+L(un_end_4):
+ beqz a4, L(un_end_2)
+ st.w t0, a0, 0
+ addi.d a0, a0, 4
+ srli.d t0, t0, 32
+L(un_end_2):
+ beqz a5, L(un_end_1)
+ st.h t0, a0, 0
+ addi.d a0, a0, 2
+ srli.d t0, t0, 16
+L(un_end_1):
+ beqz a6, L(un_out)
+ st.b t0, a0, 0
+ addi.d a0, a0, 1
+L(un_out):
+ addi.d a0, a0, -1
+ jr ra
+
+END(STPCPY_NAME)
+
+libc_hidden_builtin_def (STPCPY_NAME)
diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S b/sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
new file mode 100644
index 0000000000..c21b132239
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
@@ -0,0 +1,208 @@
+/* Optimized stpcpy implementation using LoongArch LASX instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define STPCPY __stpcpy_lasx
+
+LEAF(STPCPY, 6)
+ ori t8, zero, 0xfe0
+ andi t0, a1, 0xfff
+ li.d t7, -1
+ move a2, a0
+
+ bltu t8, t0, L(page_cross_start)
+L(start_entry):
+ xvld xr0, a1, 0
+ li.d t0, 32
+ andi t1, a2, 0x1f
+
+ xvsetanyeqz.b fcc0, xr0
+ sub.d t0, t0, t1
+ bcnez fcc0, L(end)
+ add.d a1, a1, t0
+
+ xvst xr0, a2, 0
+ andi a3, a1, 0x1f
+ add.d a2, a2, t0
+ bnez a3, L(unaligned)
+
+
+ xvld xr0, a1, 0
+ xvsetanyeqz.b fcc0, xr0
+ bcnez fcc0, L(al_end)
+L(al_loop):
+ xvst xr0, a2, 0
+
+ xvld xr0, a1, 32
+ addi.d a2, a2, 32
+ addi.d a1, a1, 32
+ xvsetanyeqz.b fcc0, xr0
+
+ bceqz fcc0, L(al_loop)
+L(al_end):
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+ movfr2gr.s t0, fa0
+ cto.w t0, t0
+ add.d a1, a1, t0
+ xvld xr0, a1, -31
+
+
+ add.d a0, a2, t0
+ xvst xr0, a0, -31
+ jr ra
+ nop
+
+L(page_cross_start):
+ move a4, a1
+ bstrins.d a4, zero, 4, 0
+ xvld xr0, a4, 0
+ xvmsknz.b xr0, xr0
+
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+ movfr2gr.s t0, fa0
+ sra.w t0, t0, a1
+
+ beq t0, t7, L(start_entry)
+ b L(tail)
+L(unaligned):
+ andi t0, a1, 0xfff
+ bltu t8, t0, L(un_page_cross)
+
+
+L(un_start_entry):
+ xvld xr0, a1, 0
+ xvsetanyeqz.b fcc0, xr0
+ bcnez fcc0, L(un_end)
+ addi.d a1, a1, 32
+
+L(un_loop):
+ xvst xr0, a2, 0
+ andi t0, a1, 0xfff
+ addi.d a2, a2, 32
+ bltu t8, t0, L(page_cross_loop)
+
+L(un_loop_entry):
+ xvld xr0, a1, 0
+ addi.d a1, a1, 32
+ xvsetanyeqz.b fcc0, xr0
+ bceqz fcc0, L(un_loop)
+
+ addi.d a1, a1, -32
+L(un_end):
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+
+ movfr2gr.s t0, fa0
+L(un_tail):
+ cto.w t0, t0
+ add.d a1, a1, t0
+ xvld xr0, a1, -31
+
+ add.d a0, a2, t0
+ xvst xr0, a0, -31
+ jr ra
+L(un_page_cross):
+ sub.d a4, a1, a3
+
+ xvld xr0, a4, 0
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+ movfr2gr.s t0, fa0
+ sra.w t0, t0, a1
+ beq t0, t7, L(un_start_entry)
+ b L(un_tail)
+
+
+L(page_cross_loop):
+ sub.d a4, a1, a3
+ xvld xr0, a4, 0
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+
+ vilvl.h vr0, vr1, vr0
+ movfr2gr.s t0, fa0
+ sra.w t0, t0, a1
+ beq t0, t7, L(un_loop_entry)
+
+ b L(un_tail)
+L(end):
+ xvmsknz.b xr0, xr0
+ xvpickve.w xr1, xr0, 4
+ vilvl.h vr0, vr1, vr0
+
+ movfr2gr.s t0, fa0
+L(tail):
+ cto.w t0, t0
+ add.d a0, a2, t0
+ add.d a5, a1, t0
+
+L(less_32):
+ srli.d t1, t0, 4
+ beqz t1, L(less_16)
+ vld vr0, a1, 0
+ vld vr1, a5, -15
+
+ vst vr0, a2, 0
+ vst vr1, a0, -15
+ jr ra
+L(less_16):
+ srli.d t1, t0, 3
+
+ beqz t1, L(less_8)
+ ld.d t2, a1, 0
+ ld.d t3, a5, -7
+ st.d t2, a2, 0
+
+ st.d t3, a0, -7
+ jr ra
+L(less_8):
+ li.d t1, 3
+ bltu t0, t1, L(less_4)
+
+ ld.w t2, a1, 0
+ ld.w t3, a5, -3
+ st.w t2, a2, 0
+ st.w t3, a0, -3
+
+ jr ra
+L(less_4):
+ srli.d t1, t0, 2
+ bgeu t1, t0, L(zero_byte)
+ ld.h t2, a1, 0
+
+ st.h t2, a2, 0
+L(zero_byte):
+ st.b zero, a0, 0
+ jr ra
+END(STPCPY)
+
+libc_hidden_builtin_def (STPCPY)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S b/sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
new file mode 100644
index 0000000000..34ceadee66
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
@@ -0,0 +1,206 @@
+/* Optimized stpcpy implementation using LoongArch LSX instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define STPCPY __stpcpy_lsx
+
+LEAF(STPCPY, 6)
+ pcalau12i t0, %pc_hi20(L(INDEX))
+ andi a4, a1, 0xf
+ vld vr1, t0, %pc_lo12(L(INDEX))
+ beqz a4, L(load_start)
+
+ xor t0, a1, a4
+ vld vr0, t0, 0
+ vreplgr2vr.b vr2, a4
+ vadd.b vr2, vr2, vr1
+
+ vshuf.b vr0, vr2, vr0, vr2
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(end)
+L(load_start):
+ vld vr0, a1, 0
+
+ li.d t1, 16
+ andi a3, a0, 0xf
+ vsetanyeqz.b fcc0, vr0
+ sub.d t0, t1, a3
+
+
+ bcnez fcc0, L(end)
+ add.d a1, a1, t0
+ vst vr0, a0, 0
+ add.d a0, a0, t0
+
+ bne a3, a4, L(unaligned)
+ vld vr0, a1, 0
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(al_end)
+
+L(al_loop):
+ vst vr0, a0, 0
+ vld vr0, a1, 16
+ addi.d a0, a0, 16
+ addi.d a1, a1, 16
+
+ vsetanyeqz.b fcc0, vr0
+ bceqz fcc0, L(al_loop)
+L(al_end):
+ vmsknz.b vr1, vr0
+ movfr2gr.s t0, fa1
+
+
+ cto.w t0, t0
+ add.d a1, a1, t0
+ vld vr0, a1, -15
+ add.d a0, a0, t0
+
+ vst vr0, a0, -15
+ jr ra
+ nop
+ nop
+
+L(end):
+ vseqi.b vr1, vr0, 0
+ vfrstpi.b vr1, vr1, 0
+ vpickve2gr.bu t0, vr1, 0
+ addi.d t0, t0, 1
+
+L(end_16):
+ andi t1, t0, 16
+ beqz t1, L(end_8)
+ vst vr0, a0, 0
+ addi.d a0, a0, 15
+
+
+ jr ra
+L(end_8):
+ andi t2, t0, 8
+ andi t3, t0, 4
+ andi t4, t0, 2
+
+ andi t5, t0, 1
+ beqz t2, L(end_4)
+ vstelm.d vr0, a0, 0, 0
+ addi.d a0, a0, 8
+
+ vbsrl.v vr0, vr0, 8
+L(end_4):
+ beqz t3, L(end_2)
+ vstelm.w vr0, a0, 0, 0
+ addi.d a0, a0, 4
+
+ vbsrl.v vr0, vr0, 4
+L(end_2):
+ beqz t4, L(end_1)
+ vstelm.h vr0, a0, 0, 0
+ addi.d a0, a0, 2
+
+
+ vbsrl.v vr0, vr0, 2
+L(end_1):
+ beqz t5, L(out)
+ vstelm.b vr0, a0, 0, 0
+ addi.d a0, a0, 1
+
+L(out):
+ addi.d a0, a0, -1
+ jr ra
+ nop
+ nop
+
+L(unaligned):
+ andi a3, a1, 0xf
+ bstrins.d a1, zero, 3, 0
+ vld vr2, a1, 0
+ vreplgr2vr.b vr3, a3
+
+ vslt.b vr4, vr1, vr3
+ vor.v vr0, vr2, vr4
+ vsetanyeqz.b fcc0, vr0
+ bcnez fcc0, L(un_first_end)
+
+
+ vld vr0, a1, 16
+ vadd.b vr3, vr3, vr1
+ vshuf.b vr4, vr0, vr2, vr3
+ vsetanyeqz.b fcc0, vr0
+
+ bcnez fcc0, L(un_end)
+ vor.v vr2, vr0, vr0
+ addi.d a1, a1, 16
+L(un_loop):
+ vld vr0, a1, 16
+
+ vst vr4, a0, 0
+ addi.d a0, a0, 16
+ vshuf.b vr4, vr0, vr2, vr3
+ vsetanyeqz.b fcc0, vr0
+
+ bcnez fcc0, L(un_end)
+ vld vr2, a1, 32
+ vst vr4, a0, 0
+ addi.d a1, a1, 32
+
+
+ addi.d a0, a0, 16
+ vshuf.b vr4, vr2, vr0, vr3
+ vsetanyeqz.b fcc0, vr2
+ bceqz fcc0, L(un_loop)
+
+ vor.v vr0, vr2, vr2
+ addi.d a1, a1, -16
+L(un_end):
+ vsetanyeqz.b fcc0, vr4
+ bcnez fcc0, 1f
+
+ vst vr4, a0, 0
+1:
+ vmsknz.b vr1, vr0
+ movfr2gr.s t0, fa1
+ cto.w t0, t0
+
+ add.d a1, a1, t0
+ vld vr0, a1, 1
+ add.d a0, a0, t0
+ sub.d a0, a0, a3
+
+
+ vst vr0, a0, 1
+ addi.d a0, a0, 16
+ jr ra
+L(un_first_end):
+ addi.d a0, a0, -16
+
+ addi.d a1, a1, -16
+ b 1b
+END(STPCPY)
+
+ .section .rodata.cst16,"M",@progbits,16
+ .align 4
+L(INDEX):
+ .dword 0x0706050403020100
+ .dword 0x0f0e0d0c0b0a0908
+
+libc_hidden_builtin_def (STPCPY)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/stpcpy.c b/sysdeps/loongarch/lp64/multiarch/stpcpy.c
new file mode 100644
index 0000000000..62115e4055
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/stpcpy.c
@@ -0,0 +1,42 @@
+/* Multiple versions of stpcpy.
+ All versions must be listed in ifunc-impl-list.c.
+ Copyright (C) 2017-2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+#if IS_IN (libc)
+# define stpcpy __redirect_stpcpy
+# define __stpcpy __redirect___stpcpy
+# define NO_MEMPCPY_STPCPY_REDIRECT
+# define __NO_STRING_INLINES
+# include <string.h>
+# undef stpcpy
+# undef __stpcpy
+
+# define SYMBOL_NAME stpcpy
+# include "ifunc-stpcpy.h"
+
+libc_ifunc_redirected (__redirect_stpcpy, __stpcpy, IFUNC_SELECTOR ());
+
+weak_alias (__stpcpy, stpcpy)
+# ifdef SHARED
+__hidden_ver1 (__stpcpy, __GI___stpcpy, __redirect___stpcpy)
+ __attribute__ ((visibility ("hidden"))) __attribute_copy__ (stpcpy);
+__hidden_ver1 (stpcpy, __GI_stpcpy, __redirect_stpcpy)
+ __attribute__ ((visibility ("hidden"))) __attribute_copy__ (stpcpy);
+# endif
+#endif
--
2.40.0
* [PATCH 3/4] LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
2023-09-08 9:33 [PATCH 0/4] LoongArch: Add ifunc support for str{cpy, rchr}, dengjianbo
2023-09-08 9:33 ` [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx} dengjianbo
2023-09-08 9:33 ` [PATCH 2/4] LoongArch: Add ifunc support for stpcpy{aligned, " dengjianbo
@ 2023-09-08 9:33 ` dengjianbo
2023-09-08 9:33 ` [PATCH 4/4] LoongArch: Change to put magic number to .rodata section dengjianbo
3 siblings, 0 replies; 8+ messages in thread
From: dengjianbo @ 2023-09-08 9:33 UTC (permalink / raw)
To: libc-alpha
Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo
According to the glibc strrchr microbenchmark test results, this
implementation reduces the runtime as follows:

Name             Percent of runtime reduced
strrchr-lasx     10%-50%
strrchr-lsx      0%-50%
strrchr-aligned  5%-50%

Generic strrchr is implemented as strlen + memrchr, so the lasx version
is compared with a generic strrchr built from strlen-lasx and
memrchr-lasx, the lsx version with a generic strrchr built from
strlen-lsx and memrchr-lsx, and the aligned version with a generic
strrchr built from strlen-aligned and memrchr-generic.
---
sysdeps/loongarch/lp64/multiarch/Makefile | 3 +
.../lp64/multiarch/ifunc-impl-list.c | 8 +
.../loongarch/lp64/multiarch/ifunc-strrchr.h | 41 ++++
.../lp64/multiarch/strrchr-aligned.S | 170 +++++++++++++++++
.../loongarch/lp64/multiarch/strrchr-lasx.S | 176 ++++++++++++++++++
.../loongarch/lp64/multiarch/strrchr-lsx.S | 144 ++++++++++++++
sysdeps/loongarch/lp64/multiarch/strrchr.c | 36 ++++
7 files changed, 578 insertions(+)
create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr.c
diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index f95eb5c4fe..23041fd727 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -23,6 +23,9 @@ sysdep_routines += \
stpcpy-aligned \
stpcpy-lsx \
stpcpy-lasx \
+ strrchr-aligned \
+ strrchr-lsx \
+ strrchr-lasx \
memcpy-aligned \
memcpy-unaligned \
memmove-unaligned \
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index 539aa681f9..ceab78dbfe 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -93,6 +93,14 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
IFUNC_IMPL_ADD (array, i, stpcpy, 1, __stpcpy_aligned)
)
+ IFUNC_IMPL (i, name, strrchr,
+#if !defined __loongarch_soft_float
+ IFUNC_IMPL_ADD (array, i, strrchr, SUPPORT_LASX, __strrchr_lasx)
+ IFUNC_IMPL_ADD (array, i, strrchr, SUPPORT_LSX, __strrchr_lsx)
+#endif
+ IFUNC_IMPL_ADD (array, i, strrchr, 1, __strrchr_aligned)
+ )
+
IFUNC_IMPL (i, name, memcpy,
#if !defined __loongarch_soft_float
IFUNC_IMPL_ADD (array, i, memcpy, SUPPORT_LASX, __memcpy_lasx)
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h
new file mode 100644
index 0000000000..bbb34089ef
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h
@@ -0,0 +1,41 @@
+/* Common definition for strrchr ifunc selections.
+ All versions must be listed in ifunc-impl-list.c.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <ldsodefs.h>
+#include <ifunc-init.h>
+
+#if !defined __loongarch_soft_float
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
+#endif
+
+extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+#if !defined __loongarch_soft_float
+ if (SUPPORT_LASX)
+ return OPTIMIZE (lasx);
+ else if (SUPPORT_LSX)
+ return OPTIMIZE (lsx);
+ else
+#endif
+ return OPTIMIZE (aligned);
+}
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S
new file mode 100644
index 0000000000..a73deb7840
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S
@@ -0,0 +1,170 @@
+/* Optimized strrchr implementation using basic LoongArch instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define STRRCHR __strrchr_aligned
+#else
+# define STRRCHR strrchr
+#endif
+
+LEAF(STRRCHR, 6)
+ slli.d t0, a0, 3
+ bstrins.d a0, zero, 2, 0
+ lu12i.w a2, 0x01010
+ ld.d t2, a0, 0
+
+ andi a1, a1, 0xff
+ ori a2, a2, 0x101
+ li.d t3, -1
+ bstrins.d a2, a2, 63, 32
+
+ sll.d t5, t3, t0
+ slli.d a3, a2, 7
+ orn t4, t2, t5
+ mul.d a1, a1, a2
+
+ sub.d t0, t4, a2
+ andn t1, a3, t4
+ and t1, t0, t1
+ beqz t1, L(find_tail)
+
+
+ ctz.d t0, t1
+ orn t0, zero, t0
+ xor t2, t4, a1
+ srl.d t0, t3, t0
+
+ orn t2, t2, t0
+ orn t2, t2, t5
+ revb.d t2, t2
+ sub.d t1, t2, a2
+
+ andn t0, a3, t2
+ and t1, t0, t1
+ ctz.d t0, t1
+ srli.d t0, t0, 3
+
+ addi.d a0, a0, 7
+ sub.d a0, a0, t0
+ maskeqz a0, a0, t1
+ jr ra
+
+
+L(find_tail):
+ addi.d a4, a0, 8
+ addi.d a0, a0, 8
+L(loop_ascii):
+ ld.d t2, a0, 0
+ sub.d t1, t2, a2
+
+ and t0, t1, a3
+ bnez t0, L(more_check)
+ ld.d t2, a0, 8
+ sub.d t1, t2, a2
+
+ and t0, t1, a3
+ addi.d a0, a0, 16
+ beqz t0, L(loop_ascii)
+ addi.d a0, a0, -8
+
+L(more_check):
+ andn t0, a3, t2
+ and t1, t1, t0
+ bnez t1, L(tail)
+ addi.d a0, a0, 8
+
+
+L(loop_nonascii):
+ ld.d t2, a0, 0
+ sub.d t1, t2, a2
+ andn t0, a3, t2
+ and t1, t0, t1
+
+ bnez t1, L(tail)
+ ld.d t2, a0, 8
+ addi.d a0, a0, 16
+ sub.d t1, t2, a2
+
+ andn t0, a3, t2
+ and t1, t0, t1
+ beqz t1, L(loop_nonascii)
+ addi.d a0, a0, -8
+
+L(tail):
+ ctz.d t0, t1
+ orn t0, zero, t0
+ xor t2, t2, a1
+ srl.d t0, t3, t0
+
+
+ orn t2, t2, t0
+ revb.d t2, t2
+ sub.d t1, t2, a2
+ andn t0, a3, t2
+
+ and t1, t0, t1
+ bnez t1, L(count_pos)
+L(find_loop):
+ beq a0, a4, L(find_end)
+ ld.d t2, a0, -8
+
+ addi.d a0, a0, -8
+ xor t2, t2, a1
+ sub.d t1, t2, a2
+ andn t0, a3, t2
+
+ and t1, t0, t1
+ beqz t1, L(find_loop)
+ revb.d t2, t2
+ sub.d t1, t2, a2
+
+
+ andn t0, a3, t2
+ and t1, t0, t1
+L(count_pos):
+ ctz.d t0, t1
+ addi.d a0, a0, 7
+
+ srli.d t0, t0, 3
+ sub.d a0, a0, t0
+ jr ra
+ nop
+
+L(find_end):
+ xor t2, t4, a1
+ orn t2, t2, t5
+ revb.d t2, t2
+ sub.d t1, t2, a2
+
+
+ andn t0, a3, t2
+ and t1, t0, t1
+ ctz.d t0, t1
+ srli.d t0, t0, 3
+
+ addi.d a0, a4, -1
+ sub.d a0, a0, t0
+ maskeqz a0, a0, t1
+ jr ra
+END(STRRCHR)
+
+libc_hidden_builtin_def(STRRCHR)
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
new file mode 100644
index 0000000000..5a6e22979a
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
@@ -0,0 +1,176 @@
+/* Optimized strrchr implementation using LoongArch LASX instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#define STRRCHR __strrchr_lasx
+
+LEAF(STRRCHR, 6)
+ move a2, a0
+ bstrins.d a0, zero, 5, 0
+ xvld xr0, a0, 0
+ xvld xr1, a0, 32
+
+ li.d t2, -1
+ xvreplgr2vr.b xr4, a1
+ xvmsknz.b xr2, xr0
+ xvmsknz.b xr3, xr1
+
+ xvpickve.w xr5, xr2, 4
+ xvpickve.w xr6, xr3, 4
+ vilvl.h vr2, vr5, vr2
+ vilvl.h vr3, vr6, vr3
+
+ vilvl.w vr2, vr3, vr2
+ movfr2gr.d t0, fa2
+ sra.d t0, t0, a2
+ beq t0, t2, L(find_tail)
+
+
+ xvseq.b xr2, xr0, xr4
+ xvseq.b xr3, xr1, xr4
+ xvmsknz.b xr2, xr2
+ xvmsknz.b xr3, xr3
+
+ xvpickve.w xr4, xr2, 4
+ xvpickve.w xr5, xr3, 4
+ vilvl.h vr2, vr4, vr2
+ vilvl.h vr3, vr5, vr3
+
+ vilvl.w vr1, vr3, vr2
+ slli.d t3, t2, 1
+ movfr2gr.d t1, fa1
+ cto.d t0, t0
+
+ srl.d t1, t1, a2
+ sll.d t3, t3, t0
+ addi.d a0, a2, 63
+ andn t1, t1, t3
+
+
+ clz.d t0, t1
+ sub.d a0, a0, t0
+ maskeqz a0, a0, t1
+ jr ra
+
+ .align 5
+L(find_tail):
+ addi.d a3, a0, 64
+L(loop):
+ xvld xr2, a0, 64
+ xvld xr3, a0, 96
+ addi.d a0, a0, 64
+
+ xvmin.bu xr5, xr2, xr3
+ xvsetanyeqz.b fcc0, xr5
+ bceqz fcc0, L(loop)
+ xvmsknz.b xr5, xr2
+
+
+ xvmsknz.b xr6, xr3
+ xvpickve.w xr7, xr5, 4
+ xvpickve.w xr8, xr6, 4
+ vilvl.h vr5, vr7, vr5
+
+ vilvl.h vr6, vr8, vr6
+ xvseq.b xr2, xr2, xr4
+ xvseq.b xr3, xr3, xr4
+ xvmsknz.b xr2, xr2
+
+ xvmsknz.b xr3, xr3
+ xvpickve.w xr7, xr2, 4
+ xvpickve.w xr8, xr3, 4
+ vilvl.h vr2, vr7, vr2
+
+ vilvl.h vr3, vr8, vr3
+ vilvl.w vr5, vr6, vr5
+ vilvl.w vr2, vr3, vr2
+ movfr2gr.d t0, fa5
+
+
+ movfr2gr.d t1, fa2
+ slli.d t3, t2, 1
+ cto.d t0, t0
+ sll.d t3, t3, t0
+
+ andn t1, t1, t3
+ beqz t1, L(find_loop)
+ clz.d t0, t1
+ addi.d a0, a0, 63
+
+ sub.d a0, a0, t0
+ jr ra
+L(find_loop):
+ beq a0, a3, L(find_end)
+ xvld xr2, a0, -64
+
+ xvld xr3, a0, -32
+ addi.d a0, a0, -64
+ xvseq.b xr2, xr2, xr4
+ xvseq.b xr3, xr3, xr4
+
+
+ xvmax.bu xr5, xr2, xr3
+ xvseteqz.v fcc0, xr5
+ bcnez fcc0, L(find_loop)
+ xvmsknz.b xr0, xr2
+
+ xvmsknz.b xr1, xr3
+ xvpickve.w xr2, xr0, 4
+ xvpickve.w xr3, xr1, 4
+ vilvl.h vr0, vr2, vr0
+
+ vilvl.h vr1, vr3, vr1
+ vilvl.w vr0, vr1, vr0
+ movfr2gr.d t0, fa0
+ addi.d a0, a0, 63
+
+ clz.d t0, t0
+ sub.d a0, a0, t0
+ jr ra
+ nop
+
+
+L(find_end):
+ xvseq.b xr2, xr0, xr4
+ xvseq.b xr3, xr1, xr4
+ xvmsknz.b xr2, xr2
+ xvmsknz.b xr3, xr3
+
+ xvpickve.w xr4, xr2, 4
+ xvpickve.w xr5, xr3, 4
+ vilvl.h vr2, vr4, vr2
+ vilvl.h vr3, vr5, vr3
+
+ vilvl.w vr1, vr3, vr2
+ movfr2gr.d t1, fa1
+ addi.d a0, a2, 63
+ srl.d t1, t1, a2
+
+ clz.d t0, t1
+ sub.d a0, a0, t0
+ maskeqz a0, a0, t1
+ jr ra
+END(STRRCHR)
+
+libc_hidden_builtin_def(STRRCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
new file mode 100644
index 0000000000..8f2fd22e50
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
@@ -0,0 +1,144 @@
+/* Optimized strrchr implementation using LoongArch LSX instructions.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library. If not, see
+ <https://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#define STRRCHR __strrchr_lsx
+
+LEAF(STRRCHR, 6)
+ move a2, a0
+ bstrins.d a0, zero, 4, 0
+ vld vr0, a0, 0
+ vld vr1, a0, 16
+
+ li.d t2, -1
+ vreplgr2vr.b vr4, a1
+ vmsknz.b vr2, vr0
+ vmsknz.b vr3, vr1
+
+ vilvl.h vr2, vr3, vr2
+ movfr2gr.s t0, fa2
+ sra.w t0, t0, a2
+ beq t0, t2, L(find_tail)
+
+ vseq.b vr2, vr0, vr4
+ vseq.b vr3, vr1, vr4
+ vmsknz.b vr2, vr2
+ vmsknz.b vr3, vr3
+
+
+ vilvl.h vr1, vr3, vr2
+ slli.d t3, t2, 1
+ movfr2gr.s t1, fa1
+ cto.w t0, t0
+
+ srl.w t1, t1, a2
+ sll.d t3, t3, t0
+ addi.d a0, a2, 31
+ andn t1, t1, t3
+
+ clz.w t0, t1
+ sub.d a0, a0, t0
+ maskeqz a0, a0, t1
+ jr ra
+
+ .align 5
+L(find_tail):
+ addi.d a3, a0, 32
+L(loop):
+ vld vr2, a0, 32
+ vld vr3, a0, 48
+ addi.d a0, a0, 32
+
+ vmin.bu vr5, vr2, vr3
+ vsetanyeqz.b fcc0, vr5
+ bceqz fcc0, L(loop)
+ vmsknz.b vr5, vr2
+
+ vmsknz.b vr6, vr3
+ vilvl.h vr5, vr6, vr5
+ vseq.b vr2, vr2, vr4
+ vseq.b vr3, vr3, vr4
+
+ vmsknz.b vr2, vr2
+ vmsknz.b vr3, vr3
+ vilvl.h vr2, vr3, vr2
+ movfr2gr.s t0, fa5
+
+
+ movfr2gr.s t1, fa2
+ slli.d t3, t2, 1
+ cto.w t0, t0
+ sll.d t3, t3, t0
+
+ andn t1, t1, t3
+ beqz t1, L(find_loop)
+ clz.w t0, t1
+ addi.d a0, a0, 31
+
+ sub.d a0, a0, t0
+ jr ra
+L(find_loop):
+ beq a0, a3, L(find_end)
+ vld vr2, a0, -32
+
+ vld vr3, a0, -16
+ addi.d a0, a0, -32
+ vseq.b vr2, vr2, vr4
+ vseq.b vr3, vr3, vr4
+
+
+ vmax.bu vr5, vr2, vr3
+ vseteqz.v fcc0, vr5
+ bcnez fcc0, L(find_loop)
+ vmsknz.b vr0, vr2
+
+ vmsknz.b vr1, vr3
+ vilvl.h vr0, vr1, vr0
+ movfr2gr.s t0, fa0
+ addi.d a0, a0, 31
+
+ clz.w t0, t0
+ sub.d a0, a0, t0
+ jr ra
+ nop
+
+L(find_end):
+ vseq.b vr2, vr0, vr4
+ vseq.b vr3, vr1, vr4
+ vmsknz.b vr2, vr2
+ vmsknz.b vr3, vr3
+
+
+ vilvl.h vr1, vr3, vr2
+ movfr2gr.s t1, fa1
+ addi.d a0, a2, 31
+ srl.w t1, t1, a2
+
+ clz.w t0, t1
+ sub.d a0, a0, t0
+ maskeqz a0, a0, t1
+ jr ra
+END(STRRCHR)
+
+libc_hidden_builtin_def(STRRCHR)
+#endif
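The LSX and LASX versions above share the same block-wise structure: scan
forward one vector at a time until the block that contains the terminating
NUL, try to return a match from that block, and otherwise walk backwards
over the earlier blocks.  A rough scalar C sketch of that structure follows
(a hedged illustration only: alignment handling and the bit-mask arithmetic
are omitted, and BLOCK and the helper function are invented for the example;
the vector code works on 32-byte (LSX) or 64-byte (LASX) blocks).

#include <stddef.h>
#include <string.h>

#define BLOCK 16

/* Highest index i < limit with blk[i] == c, or NULL if none.  */
static const char *
last_match_in_block (const char *blk, size_t limit, int c)
{
  const char *res = NULL;
  for (size_t i = 0; i < limit; i++)
    if (blk[i] == (char) c)
      res = blk + i;
  return res;
}

char *
strrchr_blockwise (const char *s, int c)
{
  /* Forward scan, one block at a time, until the block that contains the
     terminating NUL (the assembly detects a zero byte in a whole vector
     with vmin.bu and vsetanyeqz.b).  */
  size_t n = 0;
  while (memchr (s + n, '\0', BLOCK) == NULL)
    n += BLOCK;

  /* In that block only bytes up to and including the NUL count; a match
     here is the final answer.  */
  size_t nul = n;
  while (s[nul] != '\0')
    nul++;
  const char *res = last_match_in_block (s + n, nul - n + 1, c);
  if (res != NULL)
    return (char *) res;

  /* Otherwise walk backwards; the first earlier block (seen from the end)
     that contains c holds the overall last occurrence.  */
  while (n != 0)
    {
      n -= BLOCK;
      res = last_match_in_block (s + n, BLOCK, c);
      if (res != NULL)
        return (char *) res;
    }
  return NULL;
}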
diff --git a/sysdeps/loongarch/lp64/multiarch/strrchr.c b/sysdeps/loongarch/lp64/multiarch/strrchr.c
new file mode 100644
index 0000000000..d9c9f660a0
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/strrchr.c
@@ -0,0 +1,36 @@
+/* Multiple versions of strrchr.
+ All versions must be listed in ifunc-impl-list.c.
+ Copyright (C) 2023 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <https://www.gnu.org/licenses/>. */
+
+/* Define multiple versions only for the definition in libc. */
+#if IS_IN (libc)
+# define strrchr __redirect_strrchr
+# include <string.h>
+# undef strrchr
+
+# define SYMBOL_NAME strrchr
+# include "ifunc-strrchr.h"
+
+libc_ifunc_redirected (__redirect_strrchr, strrchr, IFUNC_SELECTOR ());
+weak_alias (strrchr, rindex)
+# ifdef SHARED
+__hidden_ver1 (strrchr, __GI_strrchr, __redirect_strrchr)
+ __attribute__ ((visibility ("hidden"))) __attribute_copy__ (strrchr);
+# endif
+
+#endif
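The selector used here comes from ifunc-strrchr.h, which is part of this
patch but not quoted above.  A hedged sketch of what such a selector
typically looks like on LoongArch follows; the SUPPORT_LASX and SUPPORT_LSX
hwcap-test macros and the exact structure are assumed from the existing
multiarch headers and may differ from the real file.

extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;

static inline void *
IFUNC_SELECTOR (void)
{
  /* Prefer the widest vector extension the running CPU reports.  */
  if (SUPPORT_LASX)
    return OPTIMIZE (lasx);
  else if (SUPPORT_LSX)
    return OPTIMIZE (lsx);
  else
    return OPTIMIZE (aligned);
}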
--
2.40.0
* [PATCH 4/4] LoongArch: Change to put magic number to .rodata section
2023-09-08 9:33 [PATCH 0/4] LoongArch: Add ifunc support for str{cpy, rchr}, dengjianbo
` (2 preceding siblings ...)
2023-09-08 9:33 ` [PATCH 3/4] LoongArch: Add ifunc support for strrchr{aligned, " dengjianbo
@ 2023-09-08 9:33 ` dengjianbo
3 siblings, 0 replies; 8+ messages in thread
From: dengjianbo @ 2023-09-08 9:33 UTC (permalink / raw)
To: libc-alpha
Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo
Move the magic number used by memmove-lsx into the .rodata section, and use
pcalau12i and %pc_lo12 with vld to load it, so the constant no longer lives
in the .text section.
---
.../loongarch/lp64/multiarch/memmove-lsx.S | 20 +++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
index 8a9367708d..5eb819ef74 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
@@ -209,13 +209,10 @@ L(al_less_16):
nop
-L(magic_num):
- .dword 0x0706050403020100
- .dword 0x0f0e0d0c0b0a0908
L(unaligned):
- pcaddi t2, -4
+ pcalau12i t2, %pc_hi20(L(INDEX))
bstrins.d a1, zero, 3, 0
- vld vr8, t2, 0
+ vld vr8, t2, %pc_lo12(L(INDEX))
vld vr0, a1, 0
vld vr1, a1, 16
@@ -413,13 +410,10 @@ L(back_al_less_16):
vst vr1, a0, 0
jr ra
-L(magic_num_2):
- .dword 0x0706050403020100
- .dword 0x0f0e0d0c0b0a0908
L(back_unaligned):
- pcaddi t2, -4
+ pcalau12i t2, %pc_hi20(L(INDEX))
bstrins.d a4, zero, 3, 0
- vld vr8, t2, 0
+ vld vr8, t2, %pc_lo12(L(INDEX))
vld vr0, a4, 0
vld vr1, a4, -16
@@ -529,6 +523,12 @@ L(back_un_less_16):
jr ra
END(MEMMOVE_NAME)
+ .section .rodata.cst16,"M",@progbits,16
+ .align 4
+L(INDEX):
+ .dword 0x0706050403020100
+ .dword 0x0f0e0d0c0b0a0908
+
libc_hidden_builtin_def (MEMCPY_NAME)
libc_hidden_builtin_def (MEMMOVE_NAME)
#endif
--
2.40.0
* Re: [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
2023-09-08 9:33 ` [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx} dengjianbo
@ 2023-09-08 14:22 ` Xi Ruoyao
2023-09-11 9:53 ` dengjianbo
0 siblings, 1 reply; 8+ messages in thread
From: Xi Ruoyao @ 2023-09-08 14:22 UTC (permalink / raw)
To: dengjianbo, libc-alpha; +Cc: adhemerval.zanella, caiyinyu, xuchenghua, huangpei
On Fri, 2023-09-08 at 17:33 +0800, dengjianbo wrote:
> According to the glibc strcpy microbenchmark test results (changed to use
> generic_strcpy instead of strlen + memcpy), compared with generic_strcpy,
> this implementation can reduce the runtime as follows:
>
> Name              Percent of runtime reduced
> strcpy-aligned    10%-45%
> strcpy-unaligned  10%-49%; compared with the aligned version, the
>                   unaligned version performs better when src and dest
>                   cannot both be aligned to 8 bytes
> strcpy-lsx        20%-80%
> strcpy-lasx       15%-86%
Generic strcpy calls stpcpy, so if we've optimized stpcpy maybe it's not
necessary to duplicate everything in strcpy. Is there a benchmark
result comparing the timing with and without this patch, but both with
the second patch (optimized stpcpy)?
--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
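For reference, the point about the generic implementation can be sketched
in a few lines of C (illustrative only; the real generic strcpy in glibc
may differ in detail):

#include <string.h>

/* A strcpy built on top of stpcpy: the copy work is identical, only the
   return value differs, which is why an optimized stpcpy already speeds
   up such a strcpy.  */
char *
strcpy_via_stpcpy (char *dest, const char *src)
{
  stpcpy (dest, src);   /* Copy; the returned end pointer is unused.  */
  return dest;          /* strcpy returns the start of dest.  */
}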
* Re: [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
2023-09-08 14:22 ` Xi Ruoyao
@ 2023-09-11 9:53 ` dengjianbo
2023-09-13 7:47 ` dengjianbo
0 siblings, 1 reply; 8+ messages in thread
From: dengjianbo @ 2023-09-11 9:53 UTC (permalink / raw)
To: Xi Ruoyao, libc-alpha; +Cc: adhemerval.zanella, caiyinyu, xuchenghua, huangpei
We tested strcpy-lasx against strcpy (which calls stpcpy-lasx); the
difference between the two timings is 0.28, with strcpy-lasx taking less
time.  When the length of the data is less than 32, it can reduce the
runtime by more than 30%.
See:
https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_lasx_compare_generic_strcpy.out
There is some code duplicated between strcpy and stpcpy, since the main
part is almost the same.  Maybe we can use one source file with the
macro USE_AS_STPCPY to distinguish strcpy and stpcpy, like x86_64 does?
That would avoid the performance degradation.
On 2023-09-08 22:22, Xi Ruoyao wrote:
> On Fri, 2023-09-08 at 17:33 +0800, dengjianbo wrote:
>> According to the glibc strcpy microbenchmark test results (changed to use
>> generic_strcpy instead of strlen + memcpy), compared with generic_strcpy,
>> this implementation can reduce the runtime as follows:
>>
>> Name              Percent of runtime reduced
>> strcpy-aligned    10%-45%
>> strcpy-unaligned  10%-49%; compared with the aligned version, the
>>                   unaligned version performs better when src and dest
>>                   cannot both be aligned to 8 bytes
>> strcpy-lsx        20%-80%
>> strcpy-lasx       15%-86%
> Generic strcpy calls stpcpy, so if we've optimized stpcpy maybe it's not
> necessary to duplicate everything in strcpy. Is there a benchmark
> result comparing the timing with and without this patch, but both with
> the second patch (optimized stpcpy)?
>
* Re: [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
2023-09-11 9:53 ` dengjianbo
@ 2023-09-13 7:47 ` dengjianbo
0 siblings, 0 replies; 8+ messages in thread
From: dengjianbo @ 2023-09-13 7:47 UTC (permalink / raw)
To: Xi Ruoyao, libc-alpha; +Cc: adhemerval.zanella, caiyinyu, xuchenghua, huangpei
We have changed strcpy to contain both the strcpy and stpcpy
implementations, using USE_AS_STPCPY to distinguish the two functions;
the stpcpy source defines the related macros and includes the strcpy
source.
See patch v2:
https://sourceware.org/pipermail/libc-alpha/2023-September/151531.html
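The idea can be illustrated with a small, hypothetical C sketch (names
are made up here; the real series applies the same scheme to the .S
sources, where the stpcpy file defines USE_AS_STPCPY plus its symbol
name and then includes the strcpy source):

/* Shared copy body; the return value is chosen at build time.  */
#ifdef USE_AS_STPCPY
# define COPY_RETURN(end, start) (end)    /* stpcpy returns the end of dest.  */
#else
# define COPY_RETURN(end, start) (start)  /* strcpy returns the start of dest.  */
#endif

char *
copy_impl (char *dest, const char *src)
{
  char *d = dest;
  while ((*d = *src++) != '\0')
    d++;
  return COPY_RETURN (d, dest);
}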
On 2023-09-11 17:53, dengjianbo wrote:
> We tested strcpy-lasx against strcpy (which calls stpcpy-lasx); the
> difference between the two timings is 0.28, with strcpy-lasx taking less
> time.  When the length of the data is less than 32, it can reduce the
> runtime by more than 30%.
>
> See:
> https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_lasx_compare_generic_strcpy.out
>
> There is some code duplicated between strcpy and stpcpy, since the main
> part is almost the same.  Maybe we can use one source file with the
> macro USE_AS_STPCPY to distinguish strcpy and stpcpy, like x86_64 does?
> That would avoid the performance degradation.
>
> On 2023-09-08 22:22, Xi Ruoyao wrote:
>> On Fri, 2023-09-08 at 17:33 +0800, dengjianbo wrote:
>>> According to the glibc strcpy microbenchmark test results (changed to use
>>> generic_strcpy instead of strlen + memcpy), compared with generic_strcpy,
>>> this implementation can reduce the runtime as follows:
>>>
>>> Name              Percent of runtime reduced
>>> strcpy-aligned    10%-45%
>>> strcpy-unaligned  10%-49%; compared with the aligned version, the
>>>                   unaligned version performs better when src and dest
>>>                   cannot both be aligned to 8 bytes
>>> strcpy-lsx        20%-80%
>>> strcpy-lasx       15%-86%
>> Generic strcpy calls stpcpy, so if we've optimized stpcpy maybe it's not
>> necessary to duplicate everything in strcpy. Is there a benchmark
>> result comparing the timing with and without this patch, but both with
>> the second patch (optimized stpcpy)?
>>