public inbox for libc-alpha@sourceware.org
* [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr,
@ 2023-08-28  7:26 dengjianbo
  2023-08-28  7:26 ` [PATCH 1/6] LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx} dengjianbo
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

This patch adds multiple versions of rawmemchr, memchr, memrchr, memset,
and memcmp implemented with LoongArch basic instructions, LSX instructions,
and LASX instructions. Compared with the current generic version, even
though these implementations see performance degradation in a few cases,
overall the performance gains are significant.

See:
https://github.com/jiadengx/glibc_test/blob/main/bench/rawmemchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memrchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memset_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memcmp_compare.out

In the data, positive values in parentheses indicate that our
implementation took less time (a performance improvement); negative values
in parentheses mean that our implementation took more time (a performance
decrease). Below is a summary of the performance compared with the generic
version in the glibc microbenchmark (a rough illustration of how these
percentages are computed follows the table):

Name                   Percent of time reduced
rawmemchr-lasx         40%-80%
rawmemchr-lsx          40%-66%
rawmemchr-aligned      20%-40%

memchr-lasx            37%-83%
memchr-lsx             30%-66%
memchr-aligned         0%-15%

memrchr-lasx           20%-83%
memrchr-lsx            20%-64%

memset-lasx            15%-75%
memset-lsx             15%-50%
memset-unaligned       performance is close when the length is larger
                       than 128. For 8-128, 30%-70%
memset-aligned         performance is close when the length is larger
                       than 128. For 8-128, 20%-50%

memcmp-lasx            16%-74%
memcmp-lsx             20%-50%
memcmp-aligned         5%-20%
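
The percentages above are the relative reduction in runtime reported by
the benchmark. As a rough, hypothetical illustration (not part of the
patch, and the helper name is made up), they can be thought of as computed
like this:

  /* Hypothetical helper: relative runtime reduction in percent.
     E.g. generic = 100 ns, optimized = 25 ns  =>  75% of the time saved.  */
  static double
  percent_time_reduced (double generic_ns, double optimized_ns)
  {
    return (generic_ns - optimized_ns) / generic_ns * 100.0;
  }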

dengjianbo (6):
  LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
  LoongArch: Add ifunc support for memchr{aligned, lsx, lasx}
  LoongArch: Add ifunc support for memrchr{lsx, lasx}
  LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
  LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
  LoongArch: Change loongarch to LoongArch in comments

 sysdeps/loongarch/lp64/multiarch/Makefile     |  16 +
 .../lp64/multiarch/dl-symbol-redir-ifunc.h    |  24 ++
 .../lp64/multiarch/ifunc-impl-list.c          |  40 +++
 .../loongarch/lp64/multiarch/ifunc-memchr.h   |  40 +++
 .../loongarch/lp64/multiarch/ifunc-memcmp.h   |  40 +++
 .../loongarch/lp64/multiarch/ifunc-memrchr.h  |  40 +++
 .../lp64/multiarch/ifunc-rawmemchr.h          |  40 +++
 .../loongarch/lp64/multiarch/memchr-aligned.S |  95 ++++++
 .../loongarch/lp64/multiarch/memchr-lasx.S    | 117 +++++++
 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S | 102 ++++++
 sysdeps/loongarch/lp64/multiarch/memchr.c     |  37 +++
 .../loongarch/lp64/multiarch/memcmp-aligned.S | 292 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memcmp-lasx.S    | 207 +++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S | 269 ++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp.c     |  43 +++
 .../loongarch/lp64/multiarch/memcpy-aligned.S |   2 +-
 .../loongarch/lp64/multiarch/memcpy-lasx.S    |   2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S |   2 +-
 .../lp64/multiarch/memcpy-unaligned.S         |   2 +-
 .../lp64/multiarch/memmove-aligned.S          |   2 +-
 .../loongarch/lp64/multiarch/memmove-lasx.S   |   2 +-
 .../loongarch/lp64/multiarch/memmove-lsx.S    |   2 +-
 .../lp64/multiarch/memmove-unaligned.S        |   2 +-
 .../lp64/multiarch/memrchr-generic.c          |  23 ++
 .../loongarch/lp64/multiarch/memrchr-lasx.S   | 123 ++++++++
 .../loongarch/lp64/multiarch/memrchr-lsx.S    | 105 +++++++
 sysdeps/loongarch/lp64/multiarch/memrchr.c    |  33 ++
 .../loongarch/lp64/multiarch/memset-aligned.S | 174 +++++++++++
 .../loongarch/lp64/multiarch/memset-lasx.S    | 142 +++++++++
 sysdeps/loongarch/lp64/multiarch/memset-lsx.S | 135 ++++++++
 .../lp64/multiarch/memset-unaligned.S         | 162 ++++++++++
 sysdeps/loongarch/lp64/multiarch/memset.c     |  37 +++
 .../lp64/multiarch/rawmemchr-aligned.S        | 124 ++++++++
 .../loongarch/lp64/multiarch/rawmemchr-lasx.S |  82 +++++
 .../loongarch/lp64/multiarch/rawmemchr-lsx.S  |  71 +++++
 sysdeps/loongarch/lp64/multiarch/rawmemchr.c  |  37 +++
 .../loongarch/lp64/multiarch/strchr-aligned.S |   2 +-
 .../loongarch/lp64/multiarch/strchr-lasx.S    |   2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-lsx.S |   2 +-
 .../lp64/multiarch/strchrnul-aligned.S        |   2 +-
 .../loongarch/lp64/multiarch/strchrnul-lasx.S |   2 +-
 .../loongarch/lp64/multiarch/strchrnul-lsx.S  |   2 +-
 .../loongarch/lp64/multiarch/strcmp-aligned.S |   2 +-
 sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S |   2 +-
 .../loongarch/lp64/multiarch/strlen-aligned.S |   2 +-
 .../loongarch/lp64/multiarch/strlen-lasx.S    |   2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-lsx.S |   2 +-
 .../lp64/multiarch/strncmp-aligned.S          |   2 +-
 .../loongarch/lp64/multiarch/strncmp-lsx.S    |   2 +-
 .../lp64/multiarch/strnlen-aligned.S          |   2 +-
 .../loongarch/lp64/multiarch/strnlen-lasx.S   |   2 +-
 .../loongarch/lp64/multiarch/strnlen-lsx.S    |   2 +-
 52 files changed, 2674 insertions(+), 24 deletions(-)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-generic.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr.c

-- 
2.40.0


* [PATCH 1/6] LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
  2023-08-28  7:26 [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr, dengjianbo
@ 2023-08-28  7:26 ` dengjianbo
  2023-08-28  7:26 ` [PATCH 2/6] LoongArch: Add ifunc support for memchr{aligned, " dengjianbo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

According to the glibc rawmemchr microbenchmark, a few cases tested with
char '\0' see performance degradation because the LASX and LSX versions do
not handle '\0' separately. Overall, the rawmemchr-lasx implementation
could reduce the runtime by about 40%-80%, the rawmemchr-lsx implementation
by about 40%-66%, and the rawmemchr-aligned implementation by about
20%-40%.
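
For context (an observation, not part of the change itself): searching for
'\0' is a common rawmemchr call, since it locates the terminating NUL of a
string, so the degradation noted above affects a frequent case. A minimal
usage sketch:

  #define _GNU_SOURCE
  #include <string.h>

  /* rawmemchr (s, '\0') returns a pointer to the terminating NUL,
     so this is equivalent to strlen (s).  */
  static size_t
  str_length (const char *s)
  {
    return (const char *) rawmemchr (s, '\0') - s;
  }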
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   8 ++
 .../lp64/multiarch/ifunc-rawmemchr.h          |  40 ++++++
 .../lp64/multiarch/rawmemchr-aligned.S        | 124 ++++++++++++++++++
 .../loongarch/lp64/multiarch/rawmemchr-lasx.S |  82 ++++++++++++
 .../loongarch/lp64/multiarch/rawmemchr-lsx.S  |  71 ++++++++++
 sysdeps/loongarch/lp64/multiarch/rawmemchr.c  |  37 ++++++
 7 files changed, 365 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 5d7ae7ae73..64416b025a 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -21,5 +21,8 @@ sysdep_routines += \
   memmove-unaligned \
   memmove-lsx \
   memmove-lasx \
+  rawmemchr-aligned \
+  rawmemchr-lsx \
+  rawmemchr-lasx \
 # sysdep_routines
 endif
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index c8ba87bd81..3db9af1460 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -94,5 +94,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_aligned)
               )
 
+  IFUNC_IMPL (i, name, rawmemchr,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, rawmemchr, SUPPORT_LASX, __rawmemchr_lasx)
+	      IFUNC_IMPL_ADD (array, i, rawmemchr, SUPPORT_LSX, __rawmemchr_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, rawmemchr, 1, __rawmemchr_aligned)
+	      )
+
   return i;
 }
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h
new file mode 100644
index 0000000000..a7bb4cf9ea
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h
@@ -0,0 +1,40 @@
+/* Common definition for rawmemchr ifunc selections.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+#include <ifunc-init.h>
+
+#if !defined __loongarch_soft_float
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
+#endif
+extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+#if !defined __loongarch_soft_float
+  if (SUPPORT_LASX)
+    return OPTIMIZE (lasx);
+  else if (SUPPORT_LSX)
+    return OPTIMIZE (lsx);
+  else
+#endif
+    return OPTIMIZE (aligned);
+}
diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S
new file mode 100644
index 0000000000..9c7155ae82
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S
@@ -0,0 +1,124 @@
+/* Optimized rawmemchr implementation using basic LoongArch instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define RAWMEMCHR_NAME __rawmemchr_aligned
+#else
+# define RAWMEMCHR_NAME __rawmemchr
+#endif
+
+LEAF(RAWMEMCHR_NAME, 6)
+    andi        t1, a0, 0x7
+    bstrins.d   a0, zero, 2, 0
+    lu12i.w     a2, 0x01010
+    bstrins.d   a1, a1, 15, 8
+
+    ld.d        t0, a0, 0
+    slli.d      t1, t1, 3
+    ori         a2, a2, 0x101
+    bstrins.d   a1, a1, 31, 16
+
+    li.w        t8, -1
+    bstrins.d   a1, a1, 63, 32
+    bstrins.d   a2, a2, 63, 32
+    sll.d       t2, t8, t1
+
+    sll.d       t3, a1, t1
+    orn         t0, t0, t2
+    slli.d      a3, a2, 7
+    beqz        a1, L(find_zero)
+
+    xor         t0, t0, t3
+    sub.d       t1, t0, a2
+    andn        t2, a3, t0
+    and         t3, t1, t2
+
+    bnez        t3, L(count_pos)
+    addi.d      a0, a0, 8
+
+L(loop):
+    ld.d        t0, a0, 0
+    xor         t0, t0, a1
+
+    sub.d       t1, t0, a2
+    andn        t2, a3, t0
+    and         t3, t1, t2
+    bnez        t3, L(count_pos)
+
+    ld.d        t0, a0, 8
+    addi.d      a0, a0, 16
+    xor         t0, t0, a1
+    sub.d       t1, t0, a2
+
+    andn        t2, a3, t0
+    and         t3, t1, t2
+    beqz        t3, L(loop)
+    addi.d      a0, a0, -8
+L(count_pos):
+    ctz.d       t0, t3
+    srli.d      t0, t0, 3
+    add.d       a0, a0, t0
+    jr          ra
+
+L(loop_7bit):
+    ld.d        t0, a0, 0
+L(find_zero):
+    sub.d       t1, t0, a2
+    and         t2, t1, a3
+    bnez        t2, L(more_check)
+
+    ld.d        t0, a0, 8
+    addi.d      a0, a0, 16
+    sub.d       t1, t0, a2
+    and         t2, t1, a3
+
+    beqz        t2, L(loop_7bit)
+    addi.d      a0, a0, -8
+
+L(more_check):
+    andn        t2, a3, t0
+    and         t3, t1, t2
+    bnez        t3, L(count_pos)
+    addi.d      a0, a0, 8
+
+L(loop_8bit):
+    ld.d        t0, a0, 0
+
+    sub.d       t1, t0, a2
+    andn        t2, a3, t0
+    and         t3, t1, t2
+    bnez        t3, L(count_pos)
+
+    ld.d        t0, a0, 8
+    addi.d      a0, a0, 16
+    sub.d       t1, t0, a2
+
+    andn        t2, a3, t0
+    and         t3, t1, t2
+    beqz        t3, L(loop_8bit)
+
+    addi.d      a0, a0, -8
+    b           L(count_pos)
+
+END(RAWMEMCHR_NAME)
+
+libc_hidden_builtin_def (__rawmemchr)
diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S
new file mode 100644
index 0000000000..be2eb59dbe
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S
@@ -0,0 +1,82 @@
+/* Optimized rawmemchr implementation using LoongArch LASX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+#include <sys/regdef.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define RAWMEMCHR __rawmemchr_lasx
+
+LEAF(RAWMEMCHR, 6)
+    move            a2, a0
+    bstrins.d       a0, zero, 5, 0
+    xvld            xr0, a0, 0
+    xvld            xr1, a0, 32
+
+    xvreplgr2vr.b   xr2, a1
+    xvseq.b         xr0, xr0, xr2
+    xvseq.b         xr1, xr1, xr2
+    xvmsknz.b       xr0, xr0
+
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+    vilvl.h         vr0, vr3, vr0
+
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+    sra.d           t0, t0, a2
+
+
+    beqz            t0, L(loop)
+    ctz.d           t0, t0
+    add.d           a0, a2, t0
+    jr              ra
+
+L(loop):
+    xvld            xr0, a0, 64
+    xvld            xr1, a0, 96
+    addi.d          a0, a0, 64
+    xvseq.b         xr0, xr0, xr2
+
+    xvseq.b         xr1, xr1, xr2
+    xvmax.bu        xr3, xr0, xr1
+    xvseteqz.v      fcc0, xr3
+    bcnez           fcc0, L(loop)
+
+    xvmsknz.b       xr0, xr0
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+
+
+    vilvl.h         vr0, vr3, vr0
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+
+    ctz.d           t0, t0
+    add.d           a0, a0, t0
+    jr              ra
+END(RAWMEMCHR)
+
+libc_hidden_builtin_def (RAWMEMCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S
new file mode 100644
index 0000000000..2f6fe024dc
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S
@@ -0,0 +1,71 @@
+/* Optimized rawmemchr implementation using LoongArch LSX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define RAWMEMCHR __rawmemchr_lsx
+
+LEAF(RAWMEMCHR, 6)
+    move            a2, a0
+    bstrins.d       a0, zero, 4, 0
+    vld             vr0, a0, 0
+    vld             vr1, a0, 16
+
+    vreplgr2vr.b    vr2, a1
+    vseq.b          vr0, vr0, vr2
+    vseq.b          vr1, vr1, vr2
+    vmsknz.b        vr0, vr0
+
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+    sra.w           t0, t0, a2
+
+    beqz            t0, L(loop)
+    ctz.w           t0, t0
+    add.d           a0, a2, t0
+    jr              ra
+
+
+L(loop):
+    vld             vr0, a0, 32
+    vld             vr1, a0, 48
+    addi.d          a0, a0, 32
+    vseq.b          vr0, vr0, vr2
+
+    vseq.b          vr1, vr1, vr2
+    vmax.bu         vr3, vr0, vr1
+    vseteqz.v       fcc0, vr3
+    bcnez           fcc0, L(loop)
+
+    vmsknz.b        vr0, vr0
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+
+    ctz.w           t0, t0
+    add.d           a0, a0, t0
+    jr              ra
+END(RAWMEMCHR)
+
+libc_hidden_builtin_def (RAWMEMCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/rawmemchr.c b/sysdeps/loongarch/lp64/multiarch/rawmemchr.c
new file mode 100644
index 0000000000..89c7ffff8f
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/rawmemchr.c
@@ -0,0 +1,37 @@
+/* Multiple versions of rawmemchr.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+# define rawmemchr __redirect_rawmemchr
+# define __rawmemchr __redirect___rawmemchr
+# include <string.h>
+# undef rawmemchr
+# undef __rawmemchr
+
+# define SYMBOL_NAME rawmemchr
+# include "ifunc-rawmemchr.h"
+
+libc_ifunc_redirected (__redirect_rawmemchr, __rawmemchr,
+                       IFUNC_SELECTOR ());
+weak_alias (__rawmemchr, rawmemchr)
+# ifdef SHARED
+__hidden_ver1 (__rawmemchr, __GI___rawmemchr, __redirect___rawmemchr)
+  __attribute__((visibility ("hidden")));
+# endif
+#endif
-- 
2.40.0


* [PATCH 2/6] LoongArch: Add ifunc support for memchr{aligned, lsx, lasx}
  2023-08-28  7:26 [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr, dengjianbo
  2023-08-28  7:26 ` [PATCH 1/6] LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx} dengjianbo
@ 2023-08-28  7:26 ` dengjianbo
  2023-08-28  7:26 ` [PATCH 3/6] LoongArch: Add ifunc support for memrchr{lsx, lasx} dengjianbo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

According to the glibc memchr microbenchmark, this implementation could
reduce the runtime as follows:

Name               Percent of runtime reduced
memchr-lasx        37%-83%
memchr-lsx         30%-66%
memchr-aligned     0%-15%
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   7 ++
 .../loongarch/lp64/multiarch/ifunc-memchr.h   |  40 ++++++
 .../loongarch/lp64/multiarch/memchr-aligned.S |  95 ++++++++++++++
 .../loongarch/lp64/multiarch/memchr-lasx.S    | 117 ++++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S | 102 +++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memchr.c     |  37 ++++++
 7 files changed, 401 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 64416b025a..2f4802cfa4 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -24,5 +24,8 @@ sysdep_routines += \
   rawmemchr-aligned \
   rawmemchr-lsx \
   rawmemchr-lasx \
+  memchr-aligned \
+  memchr-lsx \
+  memchr-lasx \
 # sysdep_routines
 endif
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index 3db9af1460..a567b9cf4d 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -102,5 +102,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, rawmemchr, 1, __rawmemchr_aligned)
 	      )
 
+  IFUNC_IMPL (i, name, memchr,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, memchr, SUPPORT_LASX, __memchr_lasx)
+	      IFUNC_IMPL_ADD (array, i, memchr, SUPPORT_LSX, __memchr_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_aligned)
+	      )
   return i;
 }
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h
new file mode 100644
index 0000000000..9060ccd54d
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h
@@ -0,0 +1,40 @@
+/* Common definition for memchr ifunc selections.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+#include <ifunc-init.h>
+
+#if !defined __loongarch_soft_float
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
+#endif
+extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+#if !defined __loongarch_soft_float
+  if (SUPPORT_LASX)
+    return OPTIMIZE (lasx);
+  else if (SUPPORT_LSX)
+    return OPTIMIZE (lsx);
+  else
+#endif
+    return OPTIMIZE (aligned);
+}
diff --git a/sysdeps/loongarch/lp64/multiarch/memchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/memchr-aligned.S
new file mode 100644
index 0000000000..81d0d00461
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memchr-aligned.S
@@ -0,0 +1,95 @@
+/* Optimized memchr implementation using basic LoongArch instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define MEMCHR_NAME __memchr_aligned
+#else
+# define MEMCHR_NAME memchr
+#endif
+
+LEAF(MEMCHR_NAME, 6)
+    beqz        a2, L(out)
+    andi        t1, a0, 0x7
+    add.d       a5, a0, a2
+    bstrins.d   a0, zero, 2, 0
+
+    ld.d        t0, a0, 0
+    bstrins.d   a1, a1, 15, 8
+    lu12i.w     a3, 0x01010
+    slli.d      t2, t1, 03
+
+    bstrins.d   a1, a1, 31, 16
+    ori         a3, a3, 0x101
+    li.d        t7, -1
+    li.d        t8, 8
+
+    bstrins.d   a1, a1, 63, 32
+    bstrins.d   a3, a3, 63, 32
+    sll.d       t2, t7, t2
+    xor         t0, t0, a1
+
+
+    addi.d      a6, a5, -1
+    slli.d      a4, a3, 7
+    sub.d       t1, t8, t1
+    orn         t0, t0, t2
+
+    sub.d       t2, t0, a3
+    andn        t3, a4, t0
+    bstrins.d   a6, zero, 2, 0
+    and         t0, t2, t3
+
+    bgeu        t1, a2, L(end)
+L(loop):
+    bnez        t0, L(found)
+    ld.d        t1, a0, 8
+    xor         t0, t1, a1
+
+    addi.d      a0, a0, 8
+    sub.d       t2, t0, a3
+    andn        t3, a4, t0
+    and         t0, t2, t3
+
+
+    bne         a0, a6, L(loop)
+L(end):
+    sub.d       t1, a5, a6
+    ctz.d       t0, t0
+    srli.d      t0, t0, 3
+
+    sltu        t1, t0, t1
+    add.d       a0, a0, t0
+    maskeqz     a0, a0, t1
+    jr          ra
+
+L(found):
+    ctz.d       t0, t0
+    srli.d      t0, t0, 3
+    add.d       a0, a0, t0
+    jr          ra
+
+L(out):
+    move        a0, zero
+    jr          ra
+END(MEMCHR_NAME)
+
+libc_hidden_builtin_def (MEMCHR_NAME)
diff --git a/sysdeps/loongarch/lp64/multiarch/memchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/memchr-lasx.S
new file mode 100644
index 0000000000..a26cdf48b5
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memchr-lasx.S
@@ -0,0 +1,117 @@
+/* Optimized memchr implementation using LoongArch LASX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define MEMCHR __memchr_lasx
+
+LEAF(MEMCHR, 6)
+    beqz            a2, L(ret0)
+    add.d           a3, a0, a2
+    andi            t0, a0, 0x3f
+    bstrins.d       a0, zero, 5, 0
+
+    xvld            xr0, a0, 0
+    xvld            xr1, a0, 32
+    li.d            t1, -1
+    li.d            t2, 64
+
+    xvreplgr2vr.b   xr2, a1
+    sll.d           t3, t1, t0
+    sub.d           t2, t2, t0
+    xvseq.b         xr0, xr0, xr2
+
+    xvseq.b         xr1, xr1, xr2
+    xvmsknz.b       xr0, xr0
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+
+
+    xvpickve.w      xr4, xr1, 4
+    vilvl.h         vr0, vr3, vr0
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+
+    movfr2gr.d      t0, fa0
+    and             t0, t0, t3
+    bgeu            t2, a2, L(end)
+    bnez            t0, L(found)
+
+    addi.d          a4, a3, -1
+    bstrins.d       a4, zero, 5, 0
+L(loop):
+    xvld            xr0, a0, 64
+    xvld            xr1, a0, 96
+
+    addi.d          a0, a0, 64
+    xvseq.b         xr0, xr0, xr2
+    xvseq.b         xr1, xr1, xr2
+    beq             a0, a4, L(out)
+
+
+    xvmax.bu        xr3, xr0, xr1
+    xvseteqz.v      fcc0, xr3
+    bcnez           fcc0, L(loop)
+    xvmsknz.b       xr0, xr0
+
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+    vilvl.h         vr0, vr3, vr0
+
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+L(found):
+    ctz.d           t1, t0
+
+    add.d           a0, a0, t1
+    jr              ra
+L(ret0):
+    move            a0, zero
+    jr              ra
+
+
+L(out):
+    xvmsknz.b       xr0, xr0
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+
+    vilvl.h         vr0, vr3, vr0
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+
+L(end):
+    sub.d           t2, zero, a3
+    srl.d           t1, t1, t2
+    and             t0, t0, t1
+    ctz.d           t1, t0
+
+    add.d           a0, a0, t1
+    maskeqz         a0, a0, t0
+    jr              ra
+END(MEMCHR)
+
+libc_hidden_builtin_def (MEMCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/memchr-lsx.S
new file mode 100644
index 0000000000..a73ecd2599
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memchr-lsx.S
@@ -0,0 +1,102 @@
+/* Optimized memchr implementation using LoongArch LSX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define MEMCHR __memchr_lsx
+
+LEAF(MEMCHR, 6)
+    beqz            a2, L(ret0)
+    add.d           a3, a0, a2
+    andi            t0, a0, 0x1f
+    bstrins.d       a0, zero, 4, 0
+
+    vld             vr0, a0, 0
+    vld             vr1, a0, 16
+    li.d            t1, -1
+    li.d            t2, 32
+
+    vreplgr2vr.b    vr2, a1
+    sll.d           t3, t1, t0
+    sub.d           t2, t2, t0
+    vseq.b          vr0, vr0, vr2
+
+    vseq.b          vr1, vr1, vr2
+    vmsknz.b        vr0, vr0
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+
+
+    movfr2gr.s      t0, fa0
+    and             t0, t0, t3
+    bgeu            t2, a2, L(end)
+    bnez            t0, L(found)
+
+    addi.d          a4, a3, -1
+    bstrins.d       a4, zero, 4, 0
+L(loop):
+    vld             vr0, a0, 32
+    vld             vr1, a0, 48
+
+    addi.d          a0, a0, 32
+    vseq.b          vr0, vr0, vr2
+    vseq.b          vr1, vr1, vr2
+    beq             a0, a4, L(out)
+
+    vmax.bu         vr3, vr0, vr1
+    vseteqz.v       fcc0, vr3
+    bcnez           fcc0, L(loop)
+    vmsknz.b        vr0, vr0
+
+
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+L(found):
+    ctz.w           t0, t0
+
+    add.d           a0, a0, t0
+    jr              ra
+L(ret0):
+    move            a0, zero
+    jr              ra
+
+L(out):
+    vmsknz.b        vr0, vr0
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+
+L(end):
+    sub.d           t2, zero, a3
+    srl.w           t1, t1, t2
+    and             t0, t0, t1
+    ctz.w           t1, t0
+
+
+    add.d           a0, a0, t1
+    maskeqz         a0, a0, t0
+    jr              ra
+END(MEMCHR)
+
+libc_hidden_builtin_def (MEMCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memchr.c b/sysdeps/loongarch/lp64/multiarch/memchr.c
new file mode 100644
index 0000000000..059479c0ce
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memchr.c
@@ -0,0 +1,37 @@
+/* Multiple versions of memchr.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define memchr __redirect_memchr
+# include <string.h>
+# undef memchr
+
+# define SYMBOL_NAME memchr
+# include "ifunc-memchr.h"
+
+libc_ifunc_redirected (__redirect_memchr, memchr,
+		       IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (memchr, __GI_memchr, __redirect_memchr)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memchr);
+# endif
+
+#endif
-- 
2.40.0


* [PATCH 3/6] LoongArch: Add ifunc support for memrchr{lsx, lasx}
  2023-08-28  7:26 [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr, dengjianbo
  2023-08-28  7:26 ` [PATCH 1/6] LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx} dengjianbo
  2023-08-28  7:26 ` [PATCH 2/6] LoongArch: Add ifunc support for memchr{aligned, " dengjianbo
@ 2023-08-28  7:26 ` dengjianbo
  2023-08-28  7:26 ` [PATCH 4/6] LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx} dengjianbo
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

According to the glibc memrchr microbenchmark, this implementation could
reduce the runtime as follows:

Name            Percent of runtime reduced
memrchr-lasx    20%-83%
memrchr-lsx     20%-64%
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   8 ++
 .../loongarch/lp64/multiarch/ifunc-memrchr.h  |  40 ++++++
 .../lp64/multiarch/memrchr-generic.c          |  23 ++++
 .../loongarch/lp64/multiarch/memrchr-lasx.S   | 123 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memrchr-lsx.S    | 105 +++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memrchr.c    |  33 +++++
 7 files changed, 335 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-generic.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 2f4802cfa4..7b87bc9055 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -27,5 +27,8 @@ sysdep_routines += \
   memchr-aligned \
   memchr-lsx \
   memchr-lasx \
+  memrchr-generic \
+  memrchr-lsx \
+  memrchr-lasx \
 # sysdep_routines
 endif
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index a567b9cf4d..8bd5489ee2 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -109,5 +109,13 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 #endif
 	      IFUNC_IMPL_ADD (array, i, memchr, 1, __memchr_aligned)
 	      )
+
+  IFUNC_IMPL (i, name, memrchr,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, memrchr, SUPPORT_LASX, __memrchr_lasx)
+	      IFUNC_IMPL_ADD (array, i, memrchr, SUPPORT_LSX, __memrchr_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, memrchr, 1, __memrchr_generic)
+	      )
   return i;
 }
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h b/sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h
new file mode 100644
index 0000000000..8215f9ad94
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h
@@ -0,0 +1,40 @@
+/* Common definition for memrchr implementation.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+#include <ifunc-init.h>
+
+#if !defined __loongarch_soft_float
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
+#endif
+extern __typeof (REDIRECT_NAME) OPTIMIZE (generic) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+#if !defined __loongarch_soft_float
+  if (SUPPORT_LASX)
+    return OPTIMIZE (lasx);
+  else if (SUPPORT_LSX)
+    return OPTIMIZE (lsx);
+  else
+#endif
+    return OPTIMIZE (generic);
+}
diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr-generic.c b/sysdeps/loongarch/lp64/multiarch/memrchr-generic.c
new file mode 100644
index 0000000000..ced61ebce5
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memrchr-generic.c
@@ -0,0 +1,23 @@
+/* Generic implementation of memrchr.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+# define MEMRCHR __memrchr_generic
+#endif
+
+#include <string/memrchr.c>
diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S
new file mode 100644
index 0000000000..5f3e0d06d7
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S
@@ -0,0 +1,123 @@
+/* Optimized memrchr implementation using LoongArch LASX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#ifndef MEMRCHR
+# define MEMRCHR __memrchr_lasx
+#endif
+
+LEAF(MEMRCHR, 6)
+    beqz            a2, L(ret0)
+    addi.d          a2, a2, -1
+    add.d           a3, a0, a2
+    andi            t1, a3, 0x3f
+
+    bstrins.d       a3, zero, 5, 0
+    addi.d          t1, t1, 1
+    xvld            xr0, a3, 0
+    xvld            xr1, a3, 32
+
+    sub.d           t2, zero, t1
+    li.d            t3, -1
+    xvreplgr2vr.b   xr2, a1
+    andi            t4, a0, 0x3f
+
+    srl.d           t2, t3, t2
+    xvseq.b         xr0, xr0, xr2
+    xvseq.b         xr1, xr1, xr2
+    xvmsknz.b       xr0, xr0
+
+
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+    vilvl.h         vr0, vr3, vr0
+
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+    and             t0, t0, t2
+
+    bltu            a2, t1, L(end)
+    bnez            t0, L(found)
+    bstrins.d       a0, zero, 5, 0
+L(loop):
+    xvld            xr0, a3, -64
+
+    xvld            xr1, a3, -32
+    addi.d          a3, a3, -64
+    xvseq.b         xr0, xr0, xr2
+    xvseq.b         xr1, xr1, xr2
+
+
+    beq             a0, a3, L(out)
+    xvmax.bu        xr3, xr0, xr1
+    xvseteqz.v      fcc0, xr3
+    bcnez           fcc0, L(loop)
+
+    xvmsknz.b       xr0, xr0
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+
+    vilvl.h         vr0, vr3, vr0
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+
+L(found):
+    addi.d          a0, a3, 63
+    clz.d           t1, t0
+    sub.d           a0, a0, t1
+    jr              ra
+
+
+L(out):
+    xvmsknz.b       xr0, xr0
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr3, xr0, 4
+    xvpickve.w      xr4, xr1, 4
+
+    vilvl.h         vr0, vr3, vr0
+    vilvl.h         vr1, vr4, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+
+L(end):
+    sll.d           t2, t3, t4
+    and             t0, t0, t2
+    addi.d          a0, a3, 63
+    clz.d           t1, t0
+
+    sub.d           a0, a0, t1
+    maskeqz         a0, a0, t0
+    jr              ra
+L(ret0):
+    move            a0, zero
+
+
+    jr              ra
+END(MEMRCHR)
+
+libc_hidden_builtin_def (MEMRCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S
new file mode 100644
index 0000000000..39a7c8b076
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S
@@ -0,0 +1,105 @@
+/* Optimized memrchr implementation using LoongArch LSX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define MEMRCHR __memrchr_lsx
+
+LEAF(MEMRCHR, 6)
+    beqz            a2, L(ret0)
+    addi.d          a2, a2, -1
+    add.d           a3, a0, a2
+    andi            t1, a3, 0x1f
+
+    bstrins.d       a3, zero, 4, 0
+    addi.d          t1, t1, 1
+    vld             vr0, a3, 0
+    vld             vr1, a3, 16
+
+    sub.d           t2, zero, t1
+    li.d            t3, -1
+    vreplgr2vr.b    vr2, a1
+    andi            t4, a0, 0x1f
+
+    srl.d           t2, t3, t2
+    vseq.b          vr0, vr0, vr2
+    vseq.b          vr1, vr1, vr2
+    vmsknz.b        vr0, vr0
+
+
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+    and             t0, t0, t2
+
+    bltu            a2, t1, L(end)
+    bnez            t0, L(found)
+    bstrins.d       a0, zero, 4, 0
+L(loop):
+    vld             vr0, a3, -32
+
+    vld             vr1, a3, -16
+    addi.d          a3, a3, -32
+    vseq.b          vr0, vr0, vr2
+    vseq.b          vr1, vr1, vr2
+
+    beq             a0, a3, L(out)
+    vmax.bu         vr3, vr0, vr1
+    vseteqz.v       fcc0, vr3
+    bcnez           fcc0, L(loop)
+
+
+    vmsknz.b        vr0, vr0
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+
+L(found):
+    addi.d          a0, a3, 31
+    clz.w           t1, t0
+    sub.d           a0, a0, t1
+    jr              ra
+
+L(out):
+    vmsknz.b        vr0, vr0
+    vmsknz.b        vr1, vr1
+    vilvl.h         vr0, vr1, vr0
+    movfr2gr.s      t0, fa0
+
+L(end):
+    sll.d           t2, t3, t4
+    and             t0, t0, t2
+    addi.d          a0, a3, 31
+    clz.w           t1, t0
+
+
+    sub.d           a0, a0, t1
+    maskeqz         a0, a0, t0
+    jr              ra
+L(ret0):
+    move            a0, zero
+
+    jr              ra
+END(MEMRCHR)
+
+libc_hidden_builtin_def (MEMRCHR)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memrchr.c b/sysdeps/loongarch/lp64/multiarch/memrchr.c
new file mode 100644
index 0000000000..8baba9ab7e
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memrchr.c
@@ -0,0 +1,33 @@
+/* Multiple versions of memrchr.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define memrchr __redirect_memrchr
+# include <string.h>
+# undef memrchr
+
+# define SYMBOL_NAME memrchr
+# include "ifunc-memrchr.h"
+
+libc_ifunc_redirected (__redirect_memrchr, __memrchr, IFUNC_SELECTOR ());
+libc_hidden_def (__memrchr)
+weak_alias (__memrchr, memrchr)
+
+#endif
-- 
2.40.0


* [PATCH 4/6] LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
  2023-08-28  7:26 [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr, dengjianbo
                   ` (2 preceding siblings ...)
  2023-08-28  7:26 ` [PATCH 3/6] LoongArch: Add ifunc support for memrchr{lsx, lasx} dengjianbo
@ 2023-08-28  7:26 ` dengjianbo
  2023-08-28  7:26 ` [PATCH 5/6] LoongArch: Add ifunc support for memcmp{aligned, " dengjianbo
  2023-08-28  7:26 ` [PATCH 6/6] LoongArch: Change loongarch to LoongArch in comments dengjianbo
  5 siblings, 0 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

According to glibc memset microbenchmark results, for the LSX and LASX
versions a few cases with length less than 8 see performance degradation;
overall, the LASX version could reduce the runtime by about 15%-75% and
the LSX version by about 15%-50%.

The unaligned version uses unaligned memory accesses to set data whose
length is less than 64 and to make the address 8-byte aligned; for this
part the performance is better than the aligned version. Compared with
the generic version, the performance is close when the length is larger
than 128. When the length is 8-128, the unaligned version could reduce
the runtime by about 30%-70% and the aligned version by about 20%-50%.
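
A rough C sketch of the unaligned strategy described above (illustration
only, under the assumption that unaligned 8-byte accesses are cheap; the
real implementation is the assembly added below):

  #include <stdint.h>
  #include <string.h>

  static void
  memset_unaligned_sketch (void *dst, int c, size_t n)
  {
    uint64_t pattern = 0x0101010101010101ULL * (unsigned char) c;
    unsigned char *d = dst;

    if (n < 8)
      {
        while (n--)
          *d++ = (unsigned char) c;
        return;
      }

    /* One unaligned 8-byte store at the head and one at the tail; they may
       overlap, which already covers any length from 8 to 16 bytes.  */
    memcpy (d, &pattern, 8);
    memcpy (d + n - 8, &pattern, 8);

    /* Round up to an 8-byte boundary so the bulk loop runs aligned.  */
    unsigned char *p = (unsigned char *) (((uintptr_t) d + 8) & ~(uintptr_t) 7);
    unsigned char *end = d + n - 8;
    for (; p < end; p += 8)
      memcpy (p, &pattern, 8);
  }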
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   4 +
 .../lp64/multiarch/dl-symbol-redir-ifunc.h    |  24 +++
 .../lp64/multiarch/ifunc-impl-list.c          |  10 +
 .../loongarch/lp64/multiarch/memset-aligned.S | 174 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memset-lasx.S    | 142 ++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memset-lsx.S | 135 ++++++++++++++
 .../lp64/multiarch/memset-unaligned.S         | 162 ++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memset.c     |  37 ++++
 8 files changed, 688 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 7b87bc9055..216886c551 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -30,5 +30,9 @@ sysdep_routines += \
   memrchr-generic \
   memrchr-lsx \
   memrchr-lasx \
+  memset-aligned \
+  memset-unaligned \
+  memset-lsx \
+  memset-lasx \
 # sysdep_routines
 endif
diff --git a/sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h b/sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h
new file mode 100644
index 0000000000..e2723873bc
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h
@@ -0,0 +1,24 @@
+/* Symbol redirection for loader/static initialization code.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef _DL_IFUNC_GENERIC_H
+#define _DL_IFUNC_GENERIC_H
+
+asm ("memset = __memset_aligned");
+
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index 8bd5489ee2..37f60dde91 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -117,5 +117,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 #endif
 	      IFUNC_IMPL_ADD (array, i, memrchr, 1, __memrchr_generic)
 	      )
+
+  IFUNC_IMPL (i, name, memset,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, memset, SUPPORT_LASX, __memset_lasx)
+	      IFUNC_IMPL_ADD (array, i, memset, SUPPORT_LSX, __memset_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, memset, SUPPORT_UAL, __memset_unaligned)
+	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_aligned)
+	      )
+
   return i;
 }
diff --git a/sysdeps/loongarch/lp64/multiarch/memset-aligned.S b/sysdeps/loongarch/lp64/multiarch/memset-aligned.S
new file mode 100644
index 0000000000..1fce95b714
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memset-aligned.S
@@ -0,0 +1,174 @@
+/* Optimized memset aligned implementation using basic LoongArch instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define MEMSET_NAME __memset_aligned
+#else
+# define MEMSET_NAME memset
+#endif
+
+LEAF(MEMSET_NAME, 6)
+    move        t0, a0
+    andi        a3, a0, 0x7
+    li.w        t6, 16
+    beqz        a3, L(align)
+    bltu        a2, t6, L(short_data)
+
+L(make_align):
+    li.w        t8, 8
+    sub.d       t2, t8, a3
+    pcaddi      t1, 11
+    slli.d      t3, t2, 2
+    sub.d       t1, t1, t3
+    jr          t1
+
+L(al7):
+    st.b        a1, t0, 6
+L(al6):
+    st.b        a1, t0, 5
+L(al5):
+    st.b        a1, t0, 4
+L(al4):
+    st.b        a1, t0, 3
+L(al3):
+    st.b        a1, t0, 2
+L(al2):
+    st.b        a1, t0, 1
+L(al1):
+    st.b        a1, t0, 0
+L(al0):
+    add.d       t0, t0, t2
+    sub.d       a2, a2, t2
+
+L(align):
+    bstrins.d   a1, a1, 15, 8
+    bstrins.d   a1, a1, 31, 16
+    bstrins.d   a1, a1, 63, 32
+    bltu        a2, t6, L(less_16bytes)
+
+    andi        a4, a2, 0x3f
+    beq         a4, a2, L(less_64bytes)
+
+    sub.d       t1, a2, a4
+    move        a2, a4
+    add.d       a5, t0, t1
+
+L(loop_64bytes):
+    addi.d      t0, t0, 64
+    st.d        a1, t0, -64
+    st.d        a1, t0, -56
+    st.d        a1, t0, -48
+    st.d        a1, t0, -40
+
+    st.d        a1, t0, -32
+    st.d        a1, t0, -24
+    st.d        a1, t0, -16
+    st.d        a1, t0, -8
+    bne         t0, a5, L(loop_64bytes)
+
+L(less_64bytes):
+    srai.d      a4, a2, 5
+    beqz        a4, L(less_32bytes)
+    addi.d      a2, a2, -32
+    st.d        a1, t0, 0
+
+    st.d        a1, t0, 8
+    st.d        a1, t0, 16
+    st.d        a1, t0, 24
+    addi.d      t0, t0, 32
+
+L(less_32bytes):
+    bltu        a2, t6, L(less_16bytes)
+    addi.d      a2, a2, -16
+    st.d        a1, t0, 0
+    st.d        a1, t0, 8
+    addi.d      t0, t0, 16
+
+L(less_16bytes):
+    srai.d      a4, a2, 3
+    beqz        a4, L(less_8bytes)
+    addi.d      a2, a2, -8
+    st.d        a1, t0, 0
+    addi.d      t0, t0, 8
+
+L(less_8bytes):
+    beqz        a2, L(less_1byte)
+    srai.d      a4, a2, 2
+    beqz        a4, L(less_4bytes)
+    addi.d      a2, a2, -4
+    st.w        a1, t0, 0
+    addi.d      t0, t0, 4
+
+L(less_4bytes):
+    srai.d      a3, a2, 1
+    beqz        a3, L(less_2bytes)
+    addi.d      a2, a2, -2
+    st.h        a1, t0, 0
+    addi.d      t0, t0, 2
+
+L(less_2bytes):
+    beqz        a2, L(less_1byte)
+    st.b        a1, t0, 0
+L(less_1byte):
+    jr          ra
+
+L(short_data):
+    pcaddi      t1, 19
+    slli.d      t3, a2, 2
+    sub.d       t1, t1, t3
+    jr          t1
+L(short_15):
+    st.b        a1, a0, 14
+L(short_14):
+    st.b        a1, a0, 13
+L(short_13):
+    st.b        a1, a0, 12
+L(short_12):
+    st.b        a1, a0, 11
+L(short_11):
+    st.b        a1, a0, 10
+L(short_10):
+    st.b        a1, a0, 9
+L(short_9):
+    st.b        a1, a0, 8
+L(short_8):
+    st.b        a1, a0, 7
+L(short_7):
+    st.b        a1, a0, 6
+L(short_6):
+    st.b        a1, a0, 5
+L(short_5):
+    st.b        a1, a0, 4
+L(short_4):
+    st.b        a1, a0, 3
+L(short_3):
+    st.b        a1, a0, 2
+L(short_2):
+    st.b        a1, a0, 1
+L(short_1):
+    st.b        a1, a0, 0
+L(short_0):
+    jr          ra
+END(MEMSET_NAME)
+
+libc_hidden_builtin_def (MEMSET_NAME)
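
For reference, the control flow of __memset_aligned above corresponds
roughly to the following C sketch (illustrative only; the real code
replaces the head and tail byte loops with computed jumps via pcaddi/jr
and uses word/half/byte stores for the tail):

#include <stddef.h>
#include <stdint.h>

/* Rough C equivalent of the aligned memset above; a sketch of the
   structure, not a drop-in replacement.  */
void *
memset_aligned_sketch (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  uint64_t v = 0x0101010101010101ULL * (unsigned char) c;

  /* Head: for a non-tiny, misaligned fill, store bytes until the
     pointer is 8-byte aligned.  */
  if (((uintptr_t) p & 0x7) != 0 && n >= 16)
    while ((uintptr_t) p & 0x7)
      {
	*p++ = (unsigned char) c;
	n--;
      }

  if (((uintptr_t) p & 0x7) == 0)
    {
      /* Body: 64 bytes per iteration, then 8-byte tail stores.  */
      for (; n >= 64; n -= 64, p += 64)
	for (int i = 0; i < 64; i += 8)
	  *(uint64_t *) (p + i) = v;
      for (; n >= 8; n -= 8, p += 8)
	*(uint64_t *) p = v;
    }

  /* Remaining bytes (short or misaligned-and-tiny cases).  */
  while (n--)
    *p++ = (unsigned char) c;
  return dst;
}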
diff --git a/sysdeps/loongarch/lp64/multiarch/memset-lasx.S b/sysdeps/loongarch/lp64/multiarch/memset-lasx.S
new file mode 100644
index 0000000000..041abbac87
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memset-lasx.S
@@ -0,0 +1,142 @@
+/* Optimized memset implementation using LoongArch LASX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define MEMSET __memset_lasx
+
+LEAF(MEMSET, 6)
+    li.d            t1, 32
+    move            a3, a0
+    xvreplgr2vr.b   xr0, a1
+    add.d           a4, a0, a2
+
+    bgeu            t1, a2, L(less_32bytes)
+    li.d            t3, 128
+    li.d            t2, 64
+    blt             t3, a2, L(long_bytes)
+
+L(less_128bytes):
+    bgeu            t2, a2, L(less_64bytes)
+    xvst            xr0, a3, 0
+    xvst            xr0, a3, 32
+    xvst            xr0, a4, -32
+
+    xvst            xr0, a4, -64
+    jr              ra
+L(less_64bytes):
+    xvst            xr0, a3, 0
+    xvst            xr0, a4, -32
+
+
+    jr              ra
+L(less_32bytes):
+    srli.d          t0, a2, 4
+    beqz            t0, L(less_16bytes)
+    vst             vr0, a3, 0
+
+    vst             vr0, a4, -16
+    jr              ra
+L(less_16bytes):
+    srli.d          t0, a2, 3
+    beqz            t0, L(less_8bytes)
+
+    vstelm.d        vr0, a3, 0, 0
+    vstelm.d        vr0, a4, -8, 0
+    jr              ra
+L(less_8bytes):
+    srli.d          t0, a2, 2
+
+    beqz            t0, L(less_4bytes)
+    vstelm.w        vr0, a3, 0, 0
+    vstelm.w        vr0, a4, -4, 0
+    jr              ra
+
+
+L(less_4bytes):
+    srli.d          t0, a2, 1
+    beqz            t0, L(less_2bytes)
+    vstelm.h        vr0, a3, 0, 0
+    vstelm.h        vr0, a4, -2, 0
+
+    jr              ra
+L(less_2bytes):
+    beqz            a2, L(less_1bytes)
+    st.b            a1, a3, 0
+L(less_1bytes):
+    jr              ra
+
+L(long_bytes):
+    xvst            xr0, a3, 0
+    bstrins.d       a3, zero, 4, 0
+    addi.d          a3, a3, 32
+    sub.d           a2, a4, a3
+
+    andi            t0, a2, 0xff
+    beq             t0, a2, L(long_end)
+    move            a2, t0
+    sub.d           t0, a4, t0
+
+
+L(loop_256):
+    xvst            xr0, a3, 0
+    xvst            xr0, a3, 32
+    xvst            xr0, a3, 64
+    xvst            xr0, a3, 96
+
+    xvst            xr0, a3, 128
+    xvst            xr0, a3, 160
+    xvst            xr0, a3, 192
+    xvst            xr0, a3, 224
+
+    addi.d          a3, a3, 256
+    bne             a3, t0, L(loop_256)
+L(long_end):
+    bltu            a2, t3, L(end_less_128)
+    addi.d          a2, a2, -128
+
+    xvst            xr0, a3, 0
+    xvst            xr0, a3, 32
+    xvst            xr0, a3, 64
+    xvst            xr0, a3, 96
+
+
+    addi.d          a3, a3, 128
+L(end_less_128):
+    bltu            a2, t2, L(end_less_64)
+    addi.d          a2, a2, -64
+    xvst            xr0, a3, 0
+
+    xvst            xr0, a3, 32
+    addi.d          a3, a3, 64
+L(end_less_64):
+    bltu            a2, t1, L(end_less_32)
+    xvst            xr0, a3, 0
+
+L(end_less_32):
+    xvst            xr0, a4, -32
+    jr              ra
+END(MEMSET)
+
+libc_hidden_builtin_def (MEMSET)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memset-lsx.S b/sysdeps/loongarch/lp64/multiarch/memset-lsx.S
new file mode 100644
index 0000000000..3d3982aa5a
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memset-lsx.S
@@ -0,0 +1,135 @@
+/* Optimized memset implementation using LoongArch LSX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define MEMSET __memset_lsx
+
+LEAF(MEMSET, 6)
+    li.d            t1, 16
+    move            a3, a0
+    vreplgr2vr.b    vr0, a1
+    add.d           a4, a0, a2
+
+    bgeu            t1, a2, L(less_16bytes)
+    li.d            t3, 64
+    li.d            t2, 32
+    bgeu            a2, t3, L(long_bytes)
+
+L(less_64bytes):
+    bgeu            t2, a2, L(less_32bytes)
+    vst             vr0, a3, 0
+    vst             vr0, a3, 16
+    vst             vr0, a4, -32
+
+    vst             vr0, a4, -16
+    jr              ra
+L(less_32bytes):
+    vst             vr0, a3, 0
+    vst             vr0, a4, -16
+
+
+    jr              ra
+L(less_16bytes):
+    srli.d          t0, a2, 3
+    beqz            t0, L(less_8bytes)
+    vstelm.d        vr0, a3, 0, 0
+
+    vstelm.d        vr0, a4, -8, 0
+    jr              ra
+L(less_8bytes):
+    srli.d          t0, a2, 2
+    beqz            t0, L(less_4bytes)
+
+    vstelm.w        vr0, a3, 0, 0
+    vstelm.w        vr0, a4, -4, 0
+    jr              ra
+L(less_4bytes):
+    srli.d          t0, a2, 1
+
+    beqz            t0, L(less_2bytes)
+    vstelm.h        vr0, a3, 0, 0
+    vstelm.h        vr0, a4, -2, 0
+    jr              ra
+
+
+L(less_2bytes):
+    beqz            a2, L(less_1bytes)
+    vstelm.b        vr0, a3, 0, 0
+L(less_1bytes):
+    jr              ra
+L(long_bytes):
+    vst             vr0, a3, 0
+
+    bstrins.d       a3, zero, 3, 0
+    addi.d          a3, a3, 16
+    sub.d           a2, a4, a3
+    andi            t0, a2, 0x7f
+
+    beq             t0, a2, L(long_end)
+    move            a2, t0
+    sub.d           t0, a4, t0
+
+L(loop_128):
+    vst             vr0, a3, 0
+
+    vst             vr0, a3, 16
+    vst             vr0, a3, 32
+    vst             vr0, a3, 48
+    vst             vr0, a3, 64
+
+
+    vst             vr0, a3, 80
+    vst             vr0, a3, 96
+    vst             vr0, a3, 112
+    addi.d          a3, a3, 128
+
+    bne             a3, t0, L(loop_128)
+L(long_end):
+    bltu            a2, t3, L(end_less_64)
+    addi.d          a2, a2, -64
+    vst             vr0, a3, 0
+
+    vst             vr0, a3, 16
+    vst             vr0, a3, 32
+    vst             vr0, a3, 48
+    addi.d          a3, a3, 64
+
+L(end_less_64):
+    bltu            a2, t2, L(end_less_32)
+    addi.d          a2, a2, -32
+    vst             vr0, a3, 0
+    vst             vr0, a3, 16
+
+    addi.d          a3, a3, 32
+L(end_less_32):
+    bltu            a2, t1, L(end_less_16)
+    vst             vr0, a3, 0
+
+L(end_less_16):
+    vst             vr0, a4, -16
+    jr              ra
+END(MEMSET)
+
+libc_hidden_builtin_def (MEMSET)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memset-unaligned.S b/sysdeps/loongarch/lp64/multiarch/memset-unaligned.S
new file mode 100644
index 0000000000..f7d32039df
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memset-unaligned.S
@@ -0,0 +1,162 @@
+/* Optimized memset unaligned implementation using basic LoongArch instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+
+# define MEMSET_NAME __memset_unaligned
+
+#define ST_128(n)              \
+    st.d        a1, a0, n;     \
+    st.d        a1, a0, n+8  ; \
+    st.d        a1, a0, n+16 ; \
+    st.d        a1, a0, n+24 ; \
+    st.d        a1, a0, n+32 ; \
+    st.d        a1, a0, n+40 ; \
+    st.d        a1, a0, n+48 ; \
+    st.d        a1, a0, n+56 ; \
+    st.d        a1, a0, n+64 ; \
+    st.d        a1, a0, n+72 ; \
+    st.d        a1, a0, n+80 ; \
+    st.d        a1, a0, n+88 ; \
+    st.d        a1, a0, n+96 ; \
+    st.d        a1, a0, n+104; \
+    st.d        a1, a0, n+112; \
+    st.d        a1, a0, n+120;
+
+LEAF(MEMSET_NAME, 6)
+    bstrins.d   a1, a1, 15, 8
+    add.d       t7, a0, a2
+    bstrins.d   a1, a1, 31, 16
+    move        t0, a0
+
+    bstrins.d   a1, a1, 63, 32
+    srai.d      t8, a2, 4
+    beqz        t8, L(less_16bytes)
+    srai.d      t8, a2, 6
+
+    bnez        t8, L(more_64bytes)
+    srai.d      t8, a2, 5
+    beqz        t8, L(less_32bytes)
+
+    st.d        a1, a0, 0
+    st.d        a1, a0, 8
+    st.d        a1, a0, 16
+    st.d        a1, a0, 24
+
+    st.d        a1, t7, -32
+    st.d        a1, t7, -24
+    st.d        a1, t7, -16
+    st.d        a1, t7, -8
+
+    jr          ra
+
+L(less_32bytes):
+    st.d        a1, a0, 0
+    st.d        a1, a0, 8
+    st.d        a1, t7, -16
+    st.d        a1, t7, -8
+
+    jr          ra
+
+L(less_16bytes):
+    srai.d      t8, a2, 3
+    beqz        t8, L(less_8bytes)
+    st.d        a1, a0, 0
+    st.d        a1, t7, -8
+
+    jr          ra
+
+L(less_8bytes):
+    srai.d      t8, a2, 2
+    beqz        t8, L(less_4bytes)
+    st.w        a1, a0, 0
+    st.w        a1, t7, -4
+
+    jr          ra
+
+L(less_4bytes):
+    srai.d      t8, a2, 1
+    beqz        t8, L(less_2bytes)
+    st.h        a1, a0, 0
+    st.h        a1, t7, -2
+
+    jr          ra
+
+L(less_2bytes):
+    beqz        a2, L(less_1bytes)
+    st.b        a1, a0, 0
+
+    jr          ra
+
+L(less_1bytes):
+    jr          ra
+
+L(more_64bytes):
+    srli.d      a0, a0, 3
+    slli.d      a0, a0, 3
+    addi.d      a0, a0, 0x8
+    st.d        a1, t0, 0
+
+    sub.d       t2, t0, a0
+    add.d       a2, t2, a2
+    addi.d      a2, a2, -0x80
+    blt         a2, zero, L(end_unalign_proc)
+
+L(loop_less):
+    ST_128(0)
+    addi.d      a0, a0,  0x80
+    addi.d      a2, a2, -0x80
+    bge         a2, zero, L(loop_less)
+
+L(end_unalign_proc):
+    addi.d      a2, a2, 0x80
+    pcaddi      t1, 20
+    andi        t5, a2, 0x78
+    srli.d      t5, t5, 1
+
+    sub.d       t1, t1, t5
+    jr          t1
+
+    st.d        a1, a0, 112
+    st.d        a1, a0, 104
+    st.d        a1, a0, 96
+    st.d        a1, a0, 88
+    st.d        a1, a0, 80
+    st.d        a1, a0, 72
+    st.d        a1, a0, 64
+    st.d        a1, a0, 56
+    st.d        a1, a0, 48
+    st.d        a1, a0, 40
+    st.d        a1, a0, 32
+    st.d        a1, a0, 24
+    st.d        a1, a0, 16
+    st.d        a1, a0, 8
+    st.d        a1, a0, 0
+    st.d        a1, t7, -8
+
+    move        a0, t0
+    jr          ra
+END(MEMSET_NAME)
+
+libc_hidden_builtin_def (MEMSET_NAME)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memset.c b/sysdeps/loongarch/lp64/multiarch/memset.c
new file mode 100644
index 0000000000..3ff60d8ac7
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memset.c
@@ -0,0 +1,37 @@
+/* Multiple versions of memset.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define memset __redirect_memset
+# include <string.h>
+# undef memset
+
+# define SYMBOL_NAME memset
+# include "ifunc-lasx.h"
+
+libc_ifunc_redirected (__redirect_memset, memset,
+		       IFUNC_SELECTOR ());
+
+# ifdef SHARED
+__hidden_ver1 (memset, __GI_memset, __redirect_memset)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memset);
+# endif
+
+#endif
-- 
2.40.0



* [PATCH 5/6] LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
  2023-08-28  7:26 [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr, dengjianbo
                   ` (3 preceding siblings ...)
  2023-08-28  7:26 ` [PATCH 4/6] LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx} dengjianbo
@ 2023-08-28  7:26 ` dengjianbo
  2023-08-28  7:26 ` [PATCH 6/6] LoongArch: Change loongarch to LoongArch in comments dengjianbo
  5 siblings, 0 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

According to the glibc memcmp microbenchmark test results (compared
against the generic memcmp), this implementation shows a performance
improvement except when the length is less than 3.  Details are as
follows:

Name             Percent of time reduced
memcmp-lasx      16%-74%
memcmp-lsx       20%-50%
memcmp-aligned   5%-20%
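
For a quick stand-alone sanity check outside the glibc benchtests, a
harness along the following lines can be used (hypothetical example; it
exercises whichever memcmp ifunc variant the installed glibc selects,
so absolute numbers will differ from the linked results):

#include <stdio.h>
#include <string.h>
#include <time.h>

/* Time ITERS calls of memcmp on LEN equal bytes; return total ns.  */
static double
time_memcmp (const char *a, const char *b, size_t len, long iters)
{
  struct timespec t0, t1;
  volatile int sink = 0;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (long i = 0; i < iters; i++)
    sink += memcmp (a, b, len);
  clock_gettime (CLOCK_MONOTONIC, &t1);
  (void) sink;
  return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int
main (void)
{
  static char a[4096], b[4096];
  size_t lens[] = { 8, 64, 512, 4096 };

  memset (a, 0x5a, sizeof a);
  memset (b, 0x5a, sizeof b);
  for (size_t i = 0; i < sizeof lens / sizeof lens[0]; i++)
    printf ("len %4zu: %.0f ns for 100000 calls\n", lens[i],
	    time_memcmp (a, b, lens[i], 100000));
  return 0;
}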
---
 sysdeps/loongarch/lp64/multiarch/Makefile     |   3 +
 .../lp64/multiarch/ifunc-impl-list.c          |   7 +
 .../loongarch/lp64/multiarch/ifunc-memcmp.h   |  40 +++
 .../loongarch/lp64/multiarch/memcmp-aligned.S | 292 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memcmp-lasx.S    | 207 +++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S | 269 ++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp.c     |  43 +++
 7 files changed, 861 insertions(+)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp.c

diff --git a/sysdeps/loongarch/lp64/multiarch/Makefile b/sysdeps/loongarch/lp64/multiarch/Makefile
index 216886c551..360a6718c0 100644
--- a/sysdeps/loongarch/lp64/multiarch/Makefile
+++ b/sysdeps/loongarch/lp64/multiarch/Makefile
@@ -34,5 +34,8 @@ sysdep_routines += \
   memset-unaligned \
   memset-lsx \
   memset-lasx \
+  memcmp-aligned \
+  memcmp-lsx \
+  memcmp-lasx \
 # sysdep_routines
 endif
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
index 37f60dde91..e397d58c9d 100644
--- a/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-impl-list.c
@@ -127,5 +127,12 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_aligned)
 	      )
 
+  IFUNC_IMPL (i, name, memcmp,
+#if !defined __loongarch_soft_float
+	      IFUNC_IMPL_ADD (array, i, memcmp, SUPPORT_LASX, __memcmp_lasx)
+	      IFUNC_IMPL_ADD (array, i, memcmp, SUPPORT_LSX, __memcmp_lsx)
+#endif
+	      IFUNC_IMPL_ADD (array, i, memcmp, 1, __memcmp_aligned)
+	      )
   return i;
 }
diff --git a/sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h b/sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h
new file mode 100644
index 0000000000..04adc2e561
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h
@@ -0,0 +1,40 @@
+/* Common definition for memcmp ifunc selections.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <ldsodefs.h>
+#include <ifunc-init.h>
+
+#if !defined __loongarch_soft_float
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lasx) attribute_hidden;
+extern __typeof (REDIRECT_NAME) OPTIMIZE (lsx) attribute_hidden;
+#endif
+extern __typeof (REDIRECT_NAME) OPTIMIZE (aligned) attribute_hidden;
+
+static inline void *
+IFUNC_SELECTOR (void)
+{
+#if !defined __loongarch_soft_float
+  if (SUPPORT_LASX)
+    return OPTIMIZE (lasx);
+  else if (SUPPORT_LSX)
+    return OPTIMIZE (lsx);
+  else
+#endif
+    return OPTIMIZE (aligned);
+}
diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S b/sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S
new file mode 100644
index 0000000000..14a7caa9a8
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S
@@ -0,0 +1,292 @@
+/* Optimized memcmp implementation using basic LoongArch instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc)
+# define MEMCMP_NAME __memcmp_aligned
+#else
+# define MEMCMP_NAME memcmp
+#endif
+
+LEAF(MEMCMP_NAME, 6)
+    beqz        a2, L(ret)
+    andi        a4, a1, 0x7
+    andi        a3, a0, 0x7
+    sltu        a5, a4, a3
+
+    xor         t0, a0, a1
+    li.w        t8, 8
+    maskeqz     t0, t0, a5
+    li.w        t7, -1
+
+    xor         a0, a0, t0
+    xor         a1, a1, t0
+    andi        a3, a0, 0x7
+    andi        a4, a1, 0x7
+
+    xor         a0, a0, a3
+    xor         a1, a1, a4
+    ld.d        t2, a0, 0
+    ld.d        t1, a1, 0
+
+    slli.d      t3, a3, 3
+    slli.d      t4, a4, 3
+    sub.d       a6, t3, t4
+    srl.d       t1, t1, t4
+
+    srl.d       t0, t2, t3
+    srl.d       t5, t7, t4
+    sub.d       t6, t0, t1
+    and         t6, t6, t5
+
+    sub.d       t5, t8, a4
+    bnez        t6, L(first_out)
+    bgeu        t5, a2, L(ret)
+    sub.d       a2, a2, t5
+
+    bnez        a6, L(unaligned)
+    blt         a2, t8, L(al_less_8bytes)
+    andi        t1, a2, 31
+    beq         t1, a2, L(al_less_32bytes)
+
+    sub.d       t2, a2, t1
+    add.d       a4, a0, t2
+    move        a2, t1
+
+L(al_loop):
+    ld.d        t0, a0, 8
+
+    ld.d        t1, a1, 8
+    ld.d        t2, a0, 16
+    ld.d        t3, a1, 16
+    ld.d        t4, a0, 24
+
+    ld.d        t5, a1, 24
+    ld.d        t6, a0, 32
+    ld.d        t7, a1, 32
+    addi.d      a0, a0, 32
+
+    addi.d      a1, a1, 32
+    bne         t0, t1, L(out1)
+    bne         t2, t3, L(out2)
+    bne         t4, t5, L(out3)
+
+    bne         t6, t7, L(out4)
+    bne         a0, a4, L(al_loop)
+
+L(al_less_32bytes):
+    srai.d      a4, a2, 4
+    beqz        a4, L(al_less_16bytes)
+
+    ld.d        t0, a0, 8
+    ld.d        t1, a1, 8
+    ld.d        t2, a0, 16
+    ld.d        t3, a1, 16
+
+    addi.d      a0, a0, 16
+    addi.d      a1, a1, 16
+    addi.d      a2, a2, -16
+    bne         t0, t1, L(out1)
+
+    bne         t2, t3, L(out2)
+
+L(al_less_16bytes):
+    srai.d      a4, a2, 3
+    beqz        a4, L(al_less_8bytes)
+    ld.d        t0, a0, 8
+
+    ld.d        t1, a1, 8
+    addi.d      a0, a0, 8
+    addi.d      a1, a1, 8
+    addi.d      a2, a2, -8
+
+    bne         t0, t1, L(out1)
+
+L(al_less_8bytes):
+    beqz        a2, L(ret)
+    ld.d        t0, a0, 8
+    ld.d        t1, a1, 8
+
+    li.d        t7, -1
+    slli.d      t2, a2, 3
+    sll.d       t2, t7, t2
+    sub.d       t3, t0, t1
+
+    andn        t6, t3, t2
+    bnez        t6, L(count_diff)
+
+L(ret):
+    move        a0, zero
+    jr          ra
+
+L(out4):
+    move        t0, t6
+    move        t1, t7
+    sub.d       t6, t6, t7
+    b           L(count_diff)
+
+L(out3):
+    move        t0, t4
+    move        t1, t5
+    sub.d       t6, t4, t5
+    b           L(count_diff)
+
+L(out2):
+    move        t0, t2
+    move        t1, t3
+L(out1):
+    sub.d       t6, t0, t1
+    b           L(count_diff)
+
+L(first_out):
+    slli.d      t4, a2, 3
+    slt         t3, a2, t5
+    sll.d       t4, t7, t4
+    maskeqz     t4, t4, t3
+
+    andn        t6, t6, t4
+
+L(count_diff):
+    ctz.d       t2, t6
+    bstrins.d   t2, zero, 2, 0
+    srl.d       t0, t0, t2
+
+    srl.d       t1, t1, t2
+    andi        t0, t0, 0xff
+    andi        t1, t1, 0xff
+    sub.d       t2, t0, t1
+
+    sub.d       t3, t1, t0
+    masknez     t2, t2, a5
+    maskeqz     t3, t3, a5
+    or          a0, t2, t3
+
+    jr          ra
+
+L(unaligned):
+    sub.d       a7, zero, a6
+    srl.d       t0, t2, a6
+    blt         a2, t8, L(un_less_8bytes)
+
+    andi        t1, a2, 31
+    beq         t1, a2, L(un_less_32bytes)
+    sub.d       t2, a2, t1
+    add.d       a4, a0, t2
+
+    move        a2, t1
+
+L(un_loop):
+    ld.d        t2, a0, 8
+    ld.d        t1, a1, 8
+    ld.d        t4, a0, 16
+
+    ld.d        t3, a1, 16
+    ld.d        t6, a0, 24
+    ld.d        t5, a1, 24
+    ld.d        t8, a0, 32
+
+    ld.d        t7, a1, 32
+    addi.d      a0, a0, 32
+    addi.d      a1, a1, 32
+    sll.d       a3, t2, a7
+
+    or          t0, a3, t0
+    bne         t0, t1, L(out1)
+    srl.d       t0, t2, a6
+    sll.d       a3, t4, a7
+
+    or          t2, a3, t0
+    bne         t2, t3, L(out2)
+    srl.d       t0, t4, a6
+    sll.d       a3, t6, a7
+
+    or          t4, a3, t0
+    bne         t4, t5, L(out3)
+    srl.d       t0, t6, a6
+    sll.d       a3, t8, a7
+
+    or          t6, t0, a3
+    bne         t6, t7, L(out4)
+    srl.d       t0, t8, a6
+    bne         a0, a4, L(un_loop)
+
+L(un_less_32bytes):
+    srai.d      a4, a2, 4
+    beqz        a4, L(un_less_16bytes)
+    ld.d        t2, a0, 8
+    ld.d        t1, a1, 8
+
+    ld.d        t4, a0, 16
+    ld.d        t3, a1, 16
+    addi.d      a0, a0, 16
+    addi.d      a1, a1, 16
+
+    addi.d      a2, a2, -16
+    sll.d       a3, t2, a7
+    or          t0, a3, t0
+    bne         t0, t1, L(out1)
+
+    srl.d       t0, t2, a6
+    sll.d       a3, t4, a7
+    or          t2, a3, t0
+    bne         t2, t3, L(out2)
+
+    srl.d       t0, t4, a6
+
+L(un_less_16bytes):
+    srai.d      a4, a2, 3
+    beqz        a4, L(un_less_8bytes)
+    ld.d        t2, a0, 8
+
+    ld.d        t1, a1, 8
+    addi.d      a0, a0, 8
+    addi.d      a1, a1, 8
+    addi.d      a2, a2, -8
+
+    sll.d       a3, t2, a7
+    or          t0, a3, t0
+    bne         t0, t1, L(out1)
+    srl.d       t0, t2, a6
+
+L(un_less_8bytes):
+    beqz        a2, L(ret)
+    andi        a7, a7, 63
+    slli.d      a4, a2, 3
+    bgeu        a7, a4, L(last_cmp)
+
+    ld.d        t2, a0, 8
+    sll.d       a3, t2, a7
+    or          t0, a3, t0
+
+L(last_cmp):
+    ld.d        t1, a1, 8
+
+    li.d        t7, -1
+    sll.d       t2, t7, a4
+    sub.d       t3, t0, t1
+    andn        t6, t3, t2
+
+    bnez        t6, L(count_diff)
+    move        a0, zero
+    jr          ra
+END(MEMCMP_NAME)
+
+libc_hidden_builtin_def (MEMCMP_NAME)
diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S b/sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S
new file mode 100644
index 0000000000..3151a17927
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S
@@ -0,0 +1,207 @@
+/* Optimized memcmp implementation using LoongArch LASX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+# define MEMCMP __memcmp_lasx
+
+LEAF(MEMCMP, 6)
+    li.d            t2, 32
+    add.d           a3, a0, a2
+    add.d           a4, a1, a2
+    bgeu            t2, a2, L(less32)
+
+    li.d            t1, 160
+    bgeu            a2, t1, L(make_aligned)
+L(loop32):
+    xvld            xr0, a0, 0
+    xvld            xr1, a1, 0
+
+    addi.d          a0, a0, 32
+    addi.d          a1, a1, 32
+    addi.d          a2, a2, -32
+    xvseq.b         xr2, xr0, xr1
+
+    xvsetanyeqz.b   fcc0, xr2
+    bcnez           fcc0, L(end)
+L(last_bytes):
+    bltu            t2, a2, L(loop32)
+    xvld            xr0, a3, -32
+
+
+    xvld            xr1, a4, -32
+    xvseq.b         xr2, xr0, xr1
+L(end):
+    xvmsknz.b       xr2, xr2
+    xvpermi.q       xr4, xr0, 1
+
+    xvpickve.w      xr3, xr2, 4
+    xvpermi.q       xr5, xr1, 1
+    vilvl.h         vr2, vr3, vr2
+    movfr2gr.s      t0, fa2
+
+    cto.w           t0, t0
+    vreplgr2vr.b    vr2, t0
+    vshuf.b         vr0, vr4, vr0, vr2
+    vshuf.b         vr1, vr5, vr1, vr2
+
+    vpickve2gr.bu   t0, vr0, 0
+    vpickve2gr.bu   t1, vr1, 0
+    sub.d           a0, t0, t1
+    jr              ra
+
+
+L(less32):
+    srli.d          t0, a2, 4
+    beqz            t0, L(less16)
+    vld             vr0, a0, 0
+    vld             vr1, a1, 0
+
+    vld             vr2, a3, -16
+    vld             vr3, a4, -16
+L(short_ret):
+    vseq.b          vr4, vr0, vr1
+    vseq.b          vr5, vr2, vr3
+
+    vmsknz.b        vr4, vr4
+    vmsknz.b        vr5, vr5
+    vilvl.h         vr4, vr5, vr4
+    movfr2gr.s      t0, fa4
+
+    cto.w           t0, t0
+    vreplgr2vr.b    vr4, t0
+    vshuf.b         vr0, vr2, vr0, vr4
+    vshuf.b         vr1, vr3, vr1, vr4
+
+
+    vpickve2gr.bu   t0, vr0, 0
+    vpickve2gr.bu   t1, vr1, 0
+    sub.d           a0, t0, t1
+    jr              ra
+
+L(less16):
+    srli.d          t0, a2, 3
+    beqz            t0, L(less8)
+    vldrepl.d       vr0, a0, 0
+    vldrepl.d       vr1, a1, 0
+
+    vldrepl.d       vr2, a3, -8
+    vldrepl.d       vr3, a4, -8
+    b               L(short_ret)
+    nop
+
+L(less8):
+    srli.d          t0, a2, 2
+    beqz            t0, L(less4)
+    vldrepl.w       vr0, a0, 0
+    vldrepl.w       vr1, a1, 0
+
+
+    vldrepl.w       vr2, a3, -4
+    vldrepl.w       vr3, a4, -4
+    b               L(short_ret)
+    nop
+
+L(less4):
+    srli.d          t0, a2, 1
+    beqz            t0, L(less2)
+    vldrepl.h       vr0, a0, 0
+    vldrepl.h       vr1, a1, 0
+
+    vldrepl.h       vr2, a3, -2
+    vldrepl.h       vr3, a4, -2
+    b               L(short_ret)
+    nop
+
+L(less2):
+    beqz            a2, L(ret0)
+    ld.bu           t0, a0, 0
+    ld.bu           t1, a1, 0
+    sub.d           a0, t0, t1
+
+    jr              ra
+L(ret0):
+    move            a0, zero
+    jr              ra
+
+L(make_aligned):
+    xvld            xr0, a0, 0
+
+    xvld            xr1, a1, 0
+    xvseq.b         xr2, xr0, xr1
+    xvsetanyeqz.b   fcc0, xr2
+    bcnez           fcc0, L(end)
+
+    andi            t0, a0, 0x1f
+    sub.d           t0, t2, t0
+    sub.d           t1, a2, t0
+    add.d           a0, a0, t0
+
+    add.d           a1, a1, t0
+    andi            a2, t1, 0x3f
+    sub.d           t0, t1, a2
+    add.d           a5, a0, t0
+
+
+L(loop_align):
+    xvld            xr0, a0, 0
+    xvld            xr1, a1, 0
+    xvld            xr2, a0, 32
+    xvld            xr3, a1, 32
+
+    xvseq.b         xr0, xr0, xr1
+    xvseq.b         xr1, xr2, xr3
+    xvmin.bu        xr2, xr1, xr0
+    xvsetanyeqz.b   fcc0, xr2
+
+    bcnez           fcc0, L(pair_end)
+    addi.d          a0, a0, 64
+    addi.d          a1, a1, 64
+    bne             a0, a5, L(loop_align)
+
+    bnez            a2, L(last_bytes)
+    move            a0, zero
+    jr              ra
+    nop
+
+
+L(pair_end):
+    xvmsknz.b       xr0, xr0
+    xvmsknz.b       xr1, xr1
+    xvpickve.w      xr2, xr0, 4
+    xvpickve.w      xr3, xr1, 4
+
+    vilvl.h         vr0, vr2, vr0
+    vilvl.h         vr1, vr3, vr1
+    vilvl.w         vr0, vr1, vr0
+    movfr2gr.d      t0, fa0
+
+    cto.d           t0, t0
+    ldx.bu          t1, a0, t0
+    ldx.bu          t2, a1, t0
+    sub.d           a0, t1, t2
+
+    jr              ra
+END(MEMCMP)
+
+libc_hidden_builtin_def (MEMCMP)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S
new file mode 100644
index 0000000000..38a50a4c16
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S
@@ -0,0 +1,269 @@
+/* Optimized memcmp implementation using LoongArch LSX instructions.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/regdef.h>
+#include <sys/asm.h>
+
+#if IS_IN (libc) && !defined __loongarch_soft_float
+
+#define MEMCMP __memcmp_lsx
+
+LEAF(MEMCMP, 6)
+    beqz            a2, L(out)
+    pcalau12i       t0, %pc_hi20(L(INDEX))
+    andi            a3, a0, 0xf
+    vld             vr5, t0, %pc_lo12(L(INDEX))
+
+    andi            a4, a1, 0xf
+    bne             a3, a4, L(unaligned)
+    bstrins.d       a0, zero, 3, 0
+    xor             a1, a1, a4
+
+    vld             vr0, a0, 0
+    vld             vr1, a1, 0
+    li.d            t0, 16
+    vreplgr2vr.b    vr3, a3
+
+    sub.d           t1, t0, a3
+    vadd.b          vr3, vr3, vr5
+    vshuf.b         vr0, vr3, vr0, vr3
+    vshuf.b         vr1, vr3, vr1, vr3
+
+
+    vseq.b          vr4, vr0, vr1
+    bgeu            t1, a2, L(al_end)
+    vsetanyeqz.b    fcc0, vr4
+    bcnez           fcc0, L(al_found)
+
+    sub.d           t1, a2, t1
+    andi            a2, t1, 31
+    beq             a2, t1, L(al_less_32bytes)
+    sub.d           t2, t1, a2
+
+    add.d           a4, a0, t2
+L(al_loop):
+    vld             vr0, a0, 16
+    vld             vr1, a1, 16
+    vld             vr2, a0, 32
+
+    vld             vr3, a1, 32
+    addi.d          a0, a0, 32
+    addi.d          a1, a1, 32
+    vseq.b          vr4, vr0, vr1
+
+
+    vseq.b          vr6, vr2, vr3
+    vand.v          vr6, vr4, vr6
+    vsetanyeqz.b    fcc0, vr6
+    bcnez           fcc0, L(al_pair_end)
+
+    bne             a0, a4, L(al_loop)
+L(al_less_32bytes):
+    bgeu            t0, a2, L(al_less_16bytes)
+    vld             vr0, a0, 16
+    vld             vr1, a1, 16
+
+    vld             vr2, a0, 32
+    vld             vr3, a1, 32
+    addi.d          a2, a2, -16
+    vreplgr2vr.b    vr6, a2
+
+    vslt.b          vr5, vr5, vr6
+    vseq.b          vr4, vr0, vr1
+    vseq.b          vr6, vr2, vr3
+    vorn.v          vr6, vr6, vr5
+
+
+L(al_pair_end):
+    vsetanyeqz.b    fcc0, vr4
+    bcnez           fcc0, L(al_found)
+    vnori.b         vr4, vr6, 0
+    vfrstpi.b       vr4, vr4, 0
+
+    vshuf.b         vr0, vr2, vr2, vr4
+    vshuf.b         vr1, vr3, vr3, vr4
+    vpickve2gr.bu   t0, vr0, 0
+    vpickve2gr.bu   t1, vr1, 0
+
+    sub.d           a0, t0, t1
+    jr              ra
+    nop
+    nop
+
+L(al_less_16bytes):
+    beqz            a2, L(out)
+    vld             vr0, a0, 16
+    vld             vr1, a1, 16
+    vseq.b          vr4, vr0, vr1
+
+
+L(al_end):
+    vreplgr2vr.b    vr6, a2
+    vslt.b          vr5, vr5, vr6
+    vorn.v          vr4, vr4, vr5
+    nop
+
+L(al_found):
+    vnori.b         vr4, vr4, 0
+    vfrstpi.b       vr4, vr4, 0
+    vshuf.b         vr0, vr0, vr0, vr4
+    vshuf.b         vr1, vr1, vr1, vr4
+
+    vpickve2gr.bu   t0, vr0, 0
+    vpickve2gr.bu   t1, vr1, 0
+    sub.d           a0, t0, t1
+    jr              ra
+
+L(out):
+    move            a0, zero
+    jr              ra
+    nop
+    nop
+
+
+L(unaligned):
+    xor             t2, a0, a1
+    sltu            a5, a3, a4
+    masknez         t2, t2, a5
+    xor             a0, a0, t2
+
+    xor             a1, a1, t2
+    andi            a3, a0, 0xf
+    andi            a4, a1, 0xf
+    bstrins.d       a0, zero, 3, 0
+
+    xor             a1, a1, a4
+    vld             vr4, a0, 0
+    vld             vr1, a1, 0
+    li.d            t0, 16
+
+    vreplgr2vr.b    vr2, a4
+    sub.d           a6, a4, a3
+    sub.d           t1, t0, a4
+    sub.d           t2, t0, a6
+
+
+    vadd.b          vr2, vr2, vr5
+    vreplgr2vr.b    vr6, t2
+    vadd.b          vr6, vr6, vr5
+    vshuf.b         vr0, vr4, vr4, vr6
+
+    vshuf.b         vr1, vr2, vr1, vr2
+    vshuf.b         vr0, vr2, vr0, vr2
+    vseq.b          vr7, vr0, vr1
+    bgeu            t1, a2, L(un_end)
+
+    vsetanyeqz.b    fcc0, vr7
+    bcnez           fcc0, L(un_found)
+    sub.d           a2, a2, t1
+    andi            t1, a2, 31
+
+    beq             a2, t1, L(un_less_32bytes)
+    sub.d           t2, a2, t1
+    move            a2, t1
+    add.d           a4, a1, t2
+
+
+L(un_loop):
+    vld             vr2, a0, 16
+    vld             vr1, a1, 16
+    vld             vr3, a1, 32
+    addi.d          a1, a1, 32
+
+    addi.d          a0, a0, 32
+    vshuf.b         vr0, vr2, vr4, vr6
+    vld             vr4, a0, 0
+    vseq.b          vr7, vr0, vr1
+
+    vshuf.b         vr2, vr4, vr2, vr6
+    vseq.b          vr8, vr2, vr3
+    vand.v          vr8, vr7, vr8
+    vsetanyeqz.b    fcc0, vr8
+
+    bcnez           fcc0, L(un_pair_end)
+    bne             a1, a4, L(un_loop)
+
+L(un_less_32bytes):
+    bltu            a2, t0, L(un_less_16bytes)
+    vld             vr2, a0, 16
+    vld             vr1, a1, 16
+    addi.d          a0, a0, 16
+
+    addi.d          a1, a1, 16
+    addi.d          a2, a2, -16
+    vshuf.b         vr0, vr2, vr4, vr6
+    vor.v           vr4, vr2, vr2
+
+    vseq.b          vr7, vr0, vr1
+    vsetanyeqz.b    fcc0, vr7
+    bcnez           fcc0, L(un_found)
+L(un_less_16bytes):
+    beqz            a2, L(out)
+    vld             vr1, a1, 16
+    bgeu            a6, a2, 1f
+
+    vld             vr2, a0, 16
+1:
+    vshuf.b         vr0, vr2, vr4, vr6
+    vseq.b          vr7, vr0, vr1
+L(un_end):
+    vreplgr2vr.b    vr3, a2
+
+
+    vslt.b          vr3, vr5, vr3
+    vorn.v          vr7, vr7, vr3
+
+L(un_found):
+    vnori.b         vr7, vr7, 0
+    vfrstpi.b       vr7, vr7, 0
+
+    vshuf.b         vr0, vr0, vr0, vr7
+    vshuf.b         vr1, vr1, vr1, vr7
+L(calc_result):
+    vpickve2gr.bu   t0, vr0, 0
+    vpickve2gr.bu   t1, vr1, 0
+
+    sub.d           t2, t0, t1
+    sub.d           t3, t1, t0
+    masknez         t0, t3, a5
+    maskeqz         t1, t2, a5
+
+    or              a0, t0, t1
+    jr              ra
+L(un_pair_end):
+    vsetanyeqz.b    fcc0, vr7
+    bcnez           fcc0, L(un_found)
+
+
+    vnori.b         vr7, vr8, 0
+    vfrstpi.b       vr7, vr7, 0
+    vshuf.b         vr0, vr2, vr2, vr7
+    vshuf.b         vr1, vr3, vr3, vr7
+
+    b               L(calc_result)
+END(MEMCMP)
+
+    .section         .rodata.cst16,"M",@progbits,16
+    .align           4
+L(INDEX):
+    .dword           0x0706050403020100
+    .dword           0x0f0e0d0c0b0a0908
+
+libc_hidden_builtin_def (MEMCMP)
+#endif
diff --git a/sysdeps/loongarch/lp64/multiarch/memcmp.c b/sysdeps/loongarch/lp64/multiarch/memcmp.c
new file mode 100644
index 0000000000..32eccac2a3
--- /dev/null
+++ b/sysdeps/loongarch/lp64/multiarch/memcmp.c
@@ -0,0 +1,43 @@
+/* Multiple versions of memcmp.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for the definition in libc.  */
+#if IS_IN (libc)
+# define memcmp __redirect_memcmp
+# include <string.h>
+# undef memcmp
+
+# define SYMBOL_NAME memcmp
+# include "ifunc-memcmp.h"
+
+libc_ifunc_redirected (__redirect_memcmp, memcmp,
+		       IFUNC_SELECTOR ());
+# undef bcmp
+weak_alias (memcmp, bcmp)
+
+# undef __memcmpeq
+strong_alias (memcmp, __memcmpeq)
+libc_hidden_def (__memcmpeq)
+
+# ifdef SHARED
+__hidden_ver1 (memcmp, __GI_memcmp, __redirect_memcmp)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcmp);
+# endif
+
+#endif
-- 
2.40.0



* [PATCH 6/6] LoongArch: Change loongarch to LoongArch in comments
  2023-08-28  7:26 [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr, dengjianbo
                   ` (4 preceding siblings ...)
  2023-08-28  7:26 ` [PATCH 5/6] LoongArch: Add ifunc support for memcmp{aligned, " dengjianbo
@ 2023-08-28  7:26 ` dengjianbo
  5 siblings, 0 replies; 7+ messages in thread
From: dengjianbo @ 2023-08-28  7:26 UTC (permalink / raw)
  To: libc-alpha
  Cc: adhemerval.zanella, xry111, caiyinyu, xuchenghua, huangpei, dengjianbo

---
 sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S    | 2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S       | 2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S        | 2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S  | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-aligned.S   | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-lasx.S      | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-lsx.S       | 2 +-
 sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-aligned.S    | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-lasx.S       | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-lsx.S        | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S    | 2 +-
 sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S     | 2 +-
 sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S    | 2 +-
 sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S        | 2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-aligned.S    | 2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-lasx.S       | 2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-lsx.S        | 2 +-
 sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S   | 2 +-
 sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S       | 2 +-
 sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S   | 2 +-
 sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S      | 2 +-
 sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S       | 2 +-
 24 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S b/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
index 299dd49ce1..7eb34395cb 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized memcpy_aligned implementation using basic Loongarch instructions.
+/* Optimized memcpy_aligned implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S b/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
index 4aae5bf831..ae148df5d7 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized memcpy implementation using Loongarch LASX instructions.
+/* Optimized memcpy implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S b/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
index 6ebbe7a2c7..feb2bb0e0a 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized memcpy implementation using Loongarch LSX instructions.
+/* Optimized memcpy implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S b/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
index 8e60a22dfb..31019b138f 100644
--- a/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memcpy-unaligned.S
@@ -1,4 +1,4 @@
-/* Optimized unaligned memcpy implementation using basic Loongarch instructions.
+/* Optimized unaligned memcpy implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S b/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
index 5354f38379..a02114c057 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized memmove_aligned implementation using basic Loongarch instructions.
+/* Optimized memmove_aligned implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S b/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
index ff68e7a22b..95d8ee7b93 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized memmove implementation using Loongarch LASX instructions.
+/* Optimized memmove implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
index 9e1502a79b..8a9367708d 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized memmove implementation using Loongarch LSX instructions.
+/* Optimized memmove implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S b/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
index 90a64b6bb9..3284ce25fe 100644
--- a/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/memmove-unaligned.S
@@ -1,4 +1,4 @@
-/* Optimized memmove_unaligned implementation using basic Loongarch instructions.
+/* Optimized memmove_unaligned implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S b/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
index 5fb01806e4..620200545b 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchr-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strchr implementation using basic Loongarch instructions.
+/* Optimized strchr implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S b/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
index 254402daa5..4d3cc58845 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchr-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strchr implementation using loongarch LASX SIMD instructions.
+/* Optimized strchr implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S b/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
index dae98b0a55..8b78c35c20 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchr-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using loongarch LSX SIMD instructions.
+/* Optimized strlen implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S b/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
index 1c01a0232d..20856a06a0 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchrnul-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strchrnul implementation using basic Loongarch instructions.
+/* Optimized strchrnul implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S b/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
index d45495e48f..4753d4ced5 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchrnul-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strchrnul implementation using loongarch LASX SIMD instructions.
+/* Optimized strchrnul implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S b/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
index 07d793ae5f..671e740c03 100644
--- a/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strchrnul-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strchrnul implementation using loongarch LSX SIMD instructions.
+/* Optimized strchrnul implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S b/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S
index f5f4f3364e..ba1f9667e0 100644
--- a/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strcmp-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strcmp implementation using basic Loongarch instructions.
+/* Optimized strcmp implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S
index 2e177a3872..091c8c9ebd 100644
--- a/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strcmp implementation using Loongarch LSX instructions.
+/* Optimized strcmp implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S b/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
index e9e1d2fc04..ed0548e46b 100644
--- a/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strlen-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using basic Loongarch instructions.
+/* Optimized strlen implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S b/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
index 258c47cea0..91342f3415 100644
--- a/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strlen-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using loongarch LASX SIMD instructions.
+/* Optimized strlen implementation using LoongArch LASX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S b/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
index b194355e7b..b09c12e00b 100644
--- a/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strlen-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strlen implementation using Loongarch LSX SIMD instructions.
+/* Optimized strlen implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S b/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S
index e2687fa770..f63de872a7 100644
--- a/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strncmp-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strncmp implementation using basic Loongarch instructions.
+/* Optimized strncmp implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
index 0b4eee2a98..83cb801d5d 100644
--- a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strncmp implementation using Loongarch LSX instructions.
+/* Optimized strncmp implementation using LoongArch LSX instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S b/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S
index b900430a5d..a8296a1b21 100644
--- a/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S
+++ b/sysdeps/loongarch/lp64/multiarch/strnlen-aligned.S
@@ -1,4 +1,4 @@
-/* Optimized strnlen implementation using basic Loongarch instructions.
+/* Optimized strnlen implementation using basic LoongArch instructions.
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S b/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S
index 2c03d3d9b4..aa6c812d30 100644
--- a/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strnlen-lasx.S
@@ -1,4 +1,4 @@
-/* Optimized strnlen implementation using loongarch LASX instructions
+/* Optimized strnlen implementation using LoongArch LASX instructions
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
diff --git a/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S b/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S
index b769a89584..d0febe3eb0 100644
--- a/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strnlen-lsx.S
@@ -1,4 +1,4 @@
-/* Optimized strnlen implementation using loongarch LSX instructions
+/* Optimized strnlen implementation using LoongArch LSX instructions
    Copyright (C) 2023 Free Software Foundation, Inc.
 
    This file is part of the GNU C Library.
-- 
2.40.0

