public inbox for libc-alpha@sourceware.org
* [PATCH 0/3] Clean strncmp implementations
@ 2023-02-28 17:23 Adhemerval Zanella
  2023-02-28 17:23 ` [PATCH 1/3] powerpc: Remove strncmp variants Adhemerval Zanella
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Adhemerval Zanella @ 2023-02-28 17:23 UTC
  To: libc-alpha, Richard Henderson

While working to fix the crypto badsalttest failure (675bdaeeca7) I
noticed that some arch-specific implementations use strategies similar
to those of the generic implementation.  While alpha implements some
unaligned input optimization, powerpc just optimizes for aligned inputs.

Now that the generic implementation allows both alpha and power7 to
use arch-specific string compare functions, this arch-specific code
can simply be removed.

Adhemerval Zanella (3):
  powerpc: Remove strncmp variants
  powerpc: Remove powerpc64 strncmp variants
  alpha: Remove strncmp optimization

 sysdeps/alpha/strncmp.S                       | 276 ------------------
 .../powerpc32/power4/multiarch/Makefile       |   2 +-
 .../power4/multiarch/ifunc-impl-list.c        |   7 -
 .../power4/multiarch/strncmp-power7.S         |  38 ---
 .../power4/multiarch/strncmp-ppc32.S          |  40 ---
 .../powerpc32/power4/multiarch/strncmp.c      |  39 ---
 sysdeps/powerpc/powerpc32/power4/strncmp.S    | 196 -------------
 sysdeps/powerpc/powerpc32/power7/strncmp.S    | 199 -------------
 sysdeps/powerpc/powerpc32/strncmp.S           | 181 ------------
 sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
 .../powerpc64/multiarch/ifunc-impl-list.c     |   2 -
 .../powerpc64/multiarch/strncmp-power7.S      |  23 --
 .../powerpc64/multiarch/strncmp-ppc64.S       |  26 --
 .../powerpc64/multiarch/strncmp-ppc64.c       |   7 +
 sysdeps/powerpc/powerpc64/multiarch/strncmp.c |   5 +-
 sysdeps/powerpc/powerpc64/power7/strncmp.S    | 228 ---------------
 sysdeps/powerpc/powerpc64/strncmp.S           | 210 -------------
 17 files changed, 10 insertions(+), 1471 deletions(-)
 delete mode 100644 sysdeps/alpha/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-power7.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power7/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc32/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power7/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc64/strncmp.S

-- 
2.34.1



* [PATCH 1/3] powerpc: Remove strncmp variants
  2023-02-28 17:23 [PATCH 0/3] Clean strncmp implementations Adhemerval Zanella
@ 2023-02-28 17:23 ` Adhemerval Zanella
  2023-03-01  4:01   ` Rajalakshmi Srinivasaraghavan
  2023-02-28 17:23 ` [PATCH 2/3] powerpc: Remove powerpc64 " Adhemerval Zanella
  2023-02-28 17:24 ` [PATCH 3/3] alpha: Remove strncmp optimization Adhemerval Zanella
  2 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella @ 2023-02-28 17:23 UTC
  To: libc-alpha, Richard Henderson

The default, power4, and power7 implementations just add word-aligned
access when inputs have the same alignment.  The unaligned case
is still done by byte operations.

This is already covered by the generic implementation, which also adds
the unaligned input optimization.
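
For readers who do not want to trace the assembly, the strategy of the
removed files can be sketched in C roughly as follows.  This is only an
illustration under stated assumptions, not the glibc generic code: the
names sketch_strncmp and has_zero_byte are hypothetical, and the
zero-byte test is the usual bit trick written with the equivalent
constants (x + 0xfefefeff is x - 0x01010101, and ~(x | 0x7f7f7f7f) is
~x & 0x80808080).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any byte of W is zero.  Hypothetical helper name.  */
static int
has_zero_byte (uint32_t w)
{
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical name; not a glibc symbol.  */
int
sketch_strncmp (const char *s1, const char *s2, size_t n)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  /* Word loop only when both pointers are 4-byte aligned, mirroring the
     "or" of the two addresses followed by a low-bits test in the
     removed code.  Like the assembly, this may load bytes that follow
     the terminator inside the same aligned word.  */
  if ((((uintptr_t) p1 | (uintptr_t) p2) & 3) == 0)
    while (n >= 4)
      {
        uint32_t w1, w2;
        memcpy (&w1, p1, 4);
        memcpy (&w2, p2, 4);
        if (w1 != w2 || has_zero_byte (w1))
          break;	/* Let the byte loop find the exact position.  */
        p1 += 4;
        p2 += 4;
        n -= 4;
      }

  /* Byte loop: unaligned inputs, the residual, and the final word.  */
  for (; n > 0; p1++, p2++, n--)
    {
      if (*p1 != *p2)
        return *p1 - *p2;
      if (*p1 == '\0')
        return 0;
    }
  return 0;
}
```

The sketch leaves the final word to the byte loop instead of masking
and byte-reversing it as the assembly does, which keeps it independent
of endianness at the cost of a few extra byte compares.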

Checked on powerpc-linux-gnu built without multi-arch for powerpc,
power4, and power7.
---
 .../powerpc32/power4/multiarch/Makefile       |   2 +-
 .../power4/multiarch/ifunc-impl-list.c        |   7 -
 .../power4/multiarch/strncmp-power7.S         |  38 ----
 .../power4/multiarch/strncmp-ppc32.S          |  40 ----
 .../powerpc32/power4/multiarch/strncmp.c      |  39 ----
 sysdeps/powerpc/powerpc32/power4/strncmp.S    | 196 -----------------
 sysdeps/powerpc/powerpc32/power7/strncmp.S    | 199 ------------------
 sysdeps/powerpc/powerpc32/strncmp.S           | 181 ----------------
 8 files changed, 1 insertion(+), 701 deletions(-)
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-power7.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c
 delete mode 100644 sysdeps/powerpc/powerpc32/power4/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc32/power7/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc32/strncmp.S

diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile b/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
index b2f9deefb8..0a4e828435 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
@@ -5,7 +5,7 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
 		   mempcpy-power7 mempcpy-ppc32 memchr-power7 \
 		   memchr-ppc32 memrchr-power7 memrchr-ppc32 rawmemchr-power7 \
 		   rawmemchr-ppc32 strlen-power7 strlen-ppc32 strnlen-power7 \
-		   strnlen-ppc32 strncmp-power7 strncmp-ppc32 \
+		   strnlen-ppc32 \
 		   strcasecmp-power7 strcasecmp_l-power7 strncase-power7 \
 		   strncase_l-power7 strchrnul-power7 strchrnul-ppc32 \
 		   strchr-power7 strchr-ppc32 \
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
index 3b95ad2c12..b4f80539e7 100644
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c
@@ -81,13 +81,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __strnlen_power7)
 	      IFUNC_IMPL_ADD (array, i, strnlen, 1,
 			      __strnlen_ppc))
-
-  /* Support sysdeps/powerpc/powerpc32/multiarch/strncmp.c.  */
-  IFUNC_IMPL (i, name, strncmp,
-	      IFUNC_IMPL_ADD (array, i, strncmp, hwcap & PPC_FEATURE_HAS_VSX,
-			      __strncmp_power7)
-	      IFUNC_IMPL_ADD (array, i, strncmp, 1,
-			      __strncmp_ppc))
 #endif
 
   /* Support sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c.  */
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-power7.S b/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-power7.S
deleted file mode 100644
index 068b1bb8ad..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-power7.S
+++ /dev/null
@@ -1,38 +0,0 @@
-/* Optimized strcmp implementation for POWER7/PowerPC32.
-   Copyright (C) 2013-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-#undef EALIGN
-#define EALIGN(name, alignt, words)				\
- .globl C_SYMBOL_NAME(__strncmp_power7);			\
- .type C_SYMBOL_NAME(__strncmp_power7),@function;		\
- .align ALIGNARG(alignt);					\
- EALIGN_W_##words;						\
- C_LABEL(__strncmp_power7)					\
- cfi_startproc;
-
-#undef END
-#define END(name)						\
- cfi_endproc;							\
- ASM_SIZE_DIRECTIVE(__strncmp_power7)
-
-#undef libc_hidden_builtin_def
-#define libc_hidden_builtin_def(name)
-
-#include <sysdeps/powerpc/powerpc32/power7/strncmp.S>
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S b/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S
deleted file mode 100644
index b04afd4478..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp-ppc32.S
+++ /dev/null
@@ -1,40 +0,0 @@
-/* Copyright (C) 2013-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-#if defined SHARED && IS_IN (libc)
-# undef EALIGN
-# define EALIGN(name, alignt, words)				\
-  .globl C_SYMBOL_NAME(__strncmp_ppc);			\
-  .type C_SYMBOL_NAME(__strncmp_ppc),@function;		\
-  .align ALIGNARG(alignt);					\
-  EALIGN_W_##words;						\
-  C_LABEL(__strncmp_ppc)					\
-  cfi_startproc;
-
-# undef END
-# define END(name)						\
-  cfi_endproc;							\
-  ASM_SIZE_DIRECTIVE(__strncmp_ppc)
-
-# undef libc_hidden_builtin_def
-# define libc_hidden_builtin_def(name)				\
-    .globl __GI_strncmp; __GI_strncmp = __strncmp_ppc
-#endif
-
-#include <sysdeps/powerpc/powerpc32/power4/strncmp.S>
diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c b/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c
deleted file mode 100644
index 79046015e5..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/* Multiple versions of strncmp.
-   Copyright (C) 2013-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Define multiple versions only for definition in libc.  */
-#if defined SHARED && IS_IN (libc)
-# define strncmp __redirect_strncmp
-/* Omit the strncmp inline definitions because it would redefine strncmp.  */
-# define __NO_STRING_INLINES
-# include <string.h>
-# include <shlib-compat.h>
-# include "init-arch.h"
-
-extern __typeof (strncmp) __strncmp_ppc attribute_hidden;
-extern __typeof (strncmp) __strncmp_power4 attribute_hidden;
-extern __typeof (strncmp) __strncmp_power7 attribute_hidden;
-# undef strncmp
-
-/* Avoid DWARF definition DIE on ifunc symbol so that GDB can handle
-   ifunc symbol properly.  */
-libc_ifunc_redirected (__redirect_strncmp, strncmp,
-		       (hwcap & PPC_FEATURE_HAS_VSX)
-		       ? __strncmp_power7
-		       : __strncmp_ppc);
-#endif
diff --git a/sysdeps/powerpc/powerpc32/power4/strncmp.S b/sysdeps/powerpc/powerpc32/power4/strncmp.S
deleted file mode 100644
index 1f65c947cd..0000000000
--- a/sysdeps/powerpc/powerpc32/power4/strncmp.S
+++ /dev/null
@@ -1,196 +0,0 @@
-/* Optimized strcmp implementation for PowerPC32.
-   Copyright (C) 2003-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-/* See strlen.s for comments on how the end-of-string testing works.  */
-
-/* int [r3] strncmp (const char *s1 [r3], const char *s2 [r4], size_t size [r5])  */
-
-EALIGN (strncmp, 4, 0)
-
-#define rTMP2	r0
-#define rRTN	r3
-#define rSTR1	r3	/* first string arg */
-#define rSTR2	r4	/* second string arg */
-#define rN	r5	/* max string length */
-#define rWORD1	r6	/* current word in s1 */
-#define rWORD2	r7	/* current word in s2 */
-#define rWORD3  r10
-#define rWORD4  r11
-#define rFEFE	r8	/* constant 0xfefefeff (-0x01010101) */
-#define r7F7F	r9	/* constant 0x7f7f7f7f */
-#define rNEG	r10	/* ~(word in s1 | 0x7f7f7f7f) */
-#define rBITDIF	r11	/* bits that differ in s1 & s2 words */
-#define rTMP	r12
-
-	dcbt	0,rSTR1
-	or	rTMP, rSTR2, rSTR1
-	lis	r7F7F, 0x7f7f
-	dcbt	0,rSTR2
-	clrlwi.	rTMP, rTMP, 30
-	cmplwi	cr1, rN, 0
-	lis	rFEFE, -0x101
-	bne	L(unaligned)
-/* We are word aligned so set up for two loops.  first a word
-   loop, then fall into the byte loop if any residual.  */
-	srwi.	rTMP, rN, 2
-	clrlwi	rN, rN, 30
-	addi	rFEFE, rFEFE, -0x101
-	addi	r7F7F, r7F7F, 0x7f7f
-	cmplwi	cr1, rN, 0
-	beq	L(unaligned)
-
-	mtctr	rTMP	/* Power4 wants mtctr 1st in dispatch group.  */
-	lwz	rWORD1, 0(rSTR1)
-	lwz	rWORD2, 0(rSTR2)
-	b	L(g1)
-
-L(g0):
-	lwzu	rWORD1, 4(rSTR1)
-	bne-	cr1, L(different)
-	lwzu	rWORD2, 4(rSTR2)
-L(g1):	add	rTMP, rFEFE, rWORD1
-	nor	rNEG, r7F7F, rWORD1
-	bdz	L(tail)
-	and.	rTMP, rTMP, rNEG
-	cmpw	cr1, rWORD1, rWORD2
-	beq+	L(g0)
-
-/* OK. We've hit the end of the string. We need to be careful that
-   we don't compare two strings as different because of gunk beyond
-   the end of the strings...  */
-
-#ifdef __LITTLE_ENDIAN__
-L(endstring):
-	slwi	rTMP, rTMP, 1
-	addi    rTMP2, rTMP, -1
-	andc    rTMP2, rTMP2, rTMP
-	and	rWORD2, rWORD2, rTMP2		/* Mask off gunk.  */
-	and	rWORD1, rWORD1, rTMP2
-	rlwinm	rTMP2, rWORD2, 8, 0xffffffff	/* Byte reverse word.  */
-	rlwinm	rTMP, rWORD1, 8, 0xffffffff
-	rldimi	rTMP2, rWORD2, 24, 32
-	rldimi	rTMP, rWORD1, 24, 32
-	rlwimi	rTMP2, rWORD2, 24, 16, 23
-	rlwimi	rTMP, rWORD1, 24, 16, 23
-	xor.	rBITDIF, rTMP, rTMP2
-	sub	rRTN, rTMP, rTMP2
-	bgelr+
-	ori	rRTN, rTMP2, 1
-	blr
-
-L(different):
-	lwz	rWORD1, -4(rSTR1)
-	rlwinm	rTMP2, rWORD2, 8, 0xffffffff	/* Byte reverse word.  */
-	rlwinm	rTMP, rWORD1, 8, 0xffffffff
-	rldimi	rTMP2, rWORD2, 24, 32
-	rldimi	rTMP, rWORD1, 24, 32
-	rlwimi	rTMP2, rWORD2, 24, 16, 23
-	rlwimi	rTMP, rWORD1, 24, 16, 23
-	xor.	rBITDIF, rTMP, rTMP2
-	sub	rRTN, rTMP, rTMP2
-	bgelr+
-	ori	rRTN, rTMP2, 1
-	blr
-
-#else
-L(endstring):
-	and	rTMP, r7F7F, rWORD1
-	beq	cr1, L(equal)
-	add	rTMP, rTMP, r7F7F
-	xor.	rBITDIF, rWORD1, rWORD2
-	andc	rNEG, rNEG, rTMP
-	blt-	L(highbit)
-	cntlzw	rBITDIF, rBITDIF
-	cntlzw	rNEG, rNEG
-	addi	rNEG, rNEG, 7
-	cmpw	cr1, rNEG, rBITDIF
-	sub	rRTN, rWORD1, rWORD2
-	bgelr+	cr1
-L(equal):
-	li	rRTN, 0
-	blr
-
-L(different):
-	lwz	rWORD1, -4(rSTR1)
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	bgelr+
-L(highbit):
-	ori	rRTN, rWORD2, 1
-	blr
-#endif
-
-/* Oh well.  In this case, we just do a byte-by-byte comparison.  */
-	.align 4
-L(tail):
-	and.	rTMP, rTMP, rNEG
-	cmpw	cr1, rWORD1, rWORD2
-	bne-	L(endstring)
-	addi	rSTR1, rSTR1, 4
-	bne-	cr1, L(different)
-	addi	rSTR2, rSTR2, 4
-	cmplwi	cr1, rN, 0
-L(unaligned):
-	mtctr   rN	/* Power4 wants mtctr 1st in dispatch group */
-	ble	cr1, L(ux)
-L(uz):
-	lbz	rWORD1, 0(rSTR1)
-	lbz	rWORD2, 0(rSTR2)
-	.align 4
-L(u1):
-	cmpwi	cr1, rWORD1, 0
-	bdz	L(u4)
-	cmpw	rWORD1, rWORD2
-	beq-	cr1, L(u4)
-	bne-	L(u4)
-	lbzu    rWORD3, 1(rSTR1)
-	lbzu	rWORD4, 1(rSTR2)
-	cmpwi	cr1, rWORD3, 0
-	bdz	L(u3)
-	cmpw	rWORD3, rWORD4
-	beq-    cr1, L(u3)
-	bne-    L(u3)
-	lbzu	rWORD1, 1(rSTR1)
-	lbzu	rWORD2, 1(rSTR2)
-	cmpwi	cr1, rWORD1, 0
-	bdz	L(u4)
-	cmpw	rWORD1, rWORD2
-	beq-	cr1, L(u4)
-	bne-	L(u4)
-	lbzu	rWORD3, 1(rSTR1)
-	lbzu	rWORD4, 1(rSTR2)
-	cmpwi	cr1, rWORD3, 0
-	bdz	L(u3)
-	cmpw	rWORD3, rWORD4
-	beq-    cr1, L(u3)
-	bne-	L(u3)
-	lbzu	rWORD1, 1(rSTR1)
-	lbzu	rWORD2, 1(rSTR2)
-	b       L(u1)
-
-L(u3):  sub     rRTN, rWORD3, rWORD4
-	blr
-L(u4):	sub	rRTN, rWORD1, rWORD2
-	blr
-L(ux):
-	li	rRTN, 0
-	blr
-END (strncmp)
-libc_hidden_builtin_def (strncmp)
diff --git a/sysdeps/powerpc/powerpc32/power7/strncmp.S b/sysdeps/powerpc/powerpc32/power7/strncmp.S
deleted file mode 100644
index bbaab6ca0e..0000000000
--- a/sysdeps/powerpc/powerpc32/power7/strncmp.S
+++ /dev/null
@@ -1,199 +0,0 @@
-/* Optimized strcmp implementation for POWER7/PowerPC32.
-   Copyright (C) 2010-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-/* See strlen.s for comments on how the end-of-string testing works.  */
-
-/* int [r3] strncmp (const char *s1 [r3],
-		     const char *s2 [r4],
-		     size_t size [r5])  */
-
-EALIGN (strncmp,5,0)
-
-#define rTMP2	r0
-#define rRTN	r3
-#define rSTR1	r3	/* first string arg */
-#define rSTR2	r4	/* second string arg */
-#define rN	r5	/* max string length */
-#define rWORD1	r6	/* current word in s1 */
-#define rWORD2	r7	/* current word in s2 */
-#define rWORD3	r10
-#define rWORD4	r11
-#define rFEFE	r8	/* constant 0xfefefeff (-0x01010101) */
-#define r7F7F	r9	/* constant 0x7f7f7f7f */
-#define rNEG	r10	/* ~(word in s1 | 0x7f7f7f7f) */
-#define rBITDIF	r11	/* bits that differ in s1 & s2 words */
-#define rTMP	r12
-
-	dcbt	0,rSTR1
-	nop
-	or	rTMP,rSTR2,rSTR1
-	lis	r7F7F,0x7f7f
-	dcbt	0,rSTR2
-	nop
-	clrlwi.	rTMP,rTMP,30
-	cmplwi	cr1,rN,0
-	lis	rFEFE,-0x101
-	bne	L(unaligned)
-/* We are word aligned so set up for two loops.  first a word
-   loop, then fall into the byte loop if any residual.  */
-	srwi.	rTMP,rN,2
-	clrlwi	rN,rN,30
-	addi	rFEFE,rFEFE,-0x101
-	addi	r7F7F,r7F7F,0x7f7f
-	cmplwi	cr1,rN,0
-	beq	L(unaligned)
-
-	mtctr	rTMP
-	lwz	rWORD1,0(rSTR1)
-	lwz	rWORD2,0(rSTR2)
-	b	L(g1)
-
-L(g0):
-	lwzu	rWORD1,4(rSTR1)
-	bne	cr1,L(different)
-	lwzu	rWORD2,4(rSTR2)
-L(g1):	add	rTMP,rFEFE,rWORD1
-	nor	rNEG,r7F7F,rWORD1
-	bdz	L(tail)
-	and.	rTMP,rTMP,rNEG
-	cmpw	cr1,rWORD1,rWORD2
-	beq	L(g0)
-
-/* OK. We've hit the end of the string. We need to be careful that
-   we don't compare two strings as different because of gunk beyond
-   the end of the strings...  */
-#ifdef __LITTLE_ENDIAN__
-L(endstring):
-	slwi	rTMP, rTMP, 1
-	addi    rTMP2, rTMP, -1
-	andc    rTMP2, rTMP2, rTMP
-	and	rWORD2, rWORD2, rTMP2		/* Mask off gunk.  */
-	and	rWORD1, rWORD1, rTMP2
-	rlwinm	rTMP2, rWORD2, 8, 0xffffffff	/* Byte reverse word.  */
-	rlwinm	rTMP, rWORD1, 8, 0xffffffff
-	rldimi	rTMP2, rWORD2, 24, 32
-	rldimi	rTMP, rWORD1, 24, 32
-	rlwimi	rTMP2, rWORD2, 24, 16, 23
-	rlwimi	rTMP, rWORD1, 24, 16, 23
-	xor.	rBITDIF, rTMP, rTMP2
-	sub	rRTN, rTMP, rTMP2
-	bgelr
-	ori	rRTN, rTMP2, 1
-	blr
-
-L(different):
-	lwz	rWORD1, -4(rSTR1)
-	rlwinm	rTMP2, rWORD2, 8, 0xffffffff	/* Byte reverse word.  */
-	rlwinm	rTMP, rWORD1, 8, 0xffffffff
-	rldimi	rTMP2, rWORD2, 24, 32
-	rldimi	rTMP, rWORD1, 24, 32
-	rlwimi	rTMP2, rWORD2, 24, 16, 23
-	rlwimi	rTMP, rWORD1, 24, 16, 23
-	xor.	rBITDIF, rTMP, rTMP2
-	sub	rRTN, rTMP, rTMP2
-	bgelr
-	ori	rRTN, rTMP2, 1
-	blr
-
-#else
-L(endstring):
-	and	rTMP,r7F7F,rWORD1
-	beq	cr1,L(equal)
-	add	rTMP,rTMP,r7F7F
-	xor.	rBITDIF,rWORD1,rWORD2
-	andc	rNEG,rNEG,rTMP
-	blt	L(highbit)
-	cntlzw	rBITDIF,rBITDIF
-	cntlzw	rNEG,rNEG
-	addi	rNEG,rNEG,7
-	cmpw	cr1,rNEG,rBITDIF
-	sub	rRTN,rWORD1,rWORD2
-	bgelr	cr1
-L(equal):
-	li	rRTN,0
-	blr
-
-L(different):
-	lwz	rWORD1,-4(rSTR1)
-	xor.	rBITDIF,rWORD1,rWORD2
-	sub	rRTN,rWORD1,rWORD2
-	bgelr
-L(highbit):
-	ori	rRTN, rWORD2, 1
-	blr
-#endif
-
-/* Oh well. In this case, we just do a byte-by-byte comparison.  */
-	.align	4
-L(tail):
-	and.	rTMP,rTMP,rNEG
-	cmpw	cr1,rWORD1,rWORD2
-	bne	L(endstring)
-	addi	rSTR1,rSTR1,4
-	bne	cr1,L(different)
-	addi	rSTR2,rSTR2,4
-	cmplwi	cr1,rN,0
-L(unaligned):
-	mtctr	rN
-	ble	cr1,L(ux)
-L(uz):
-	lbz	rWORD1,0(rSTR1)
-	lbz	rWORD2,0(rSTR2)
-	.align	4
-L(u1):
-	cmpwi	cr1,rWORD1,0
-	bdz	L(u4)
-	cmpw	rWORD1,rWORD2
-	beq	cr1,L(u4)
-	bne	L(u4)
-	lbzu	rWORD3,1(rSTR1)
-	lbzu	rWORD4,1(rSTR2)
-	cmpwi	cr1,rWORD3,0
-	bdz	L(u3)
-	cmpw	rWORD3,rWORD4
-	beq	cr1,L(u3)
-	bne	L(u3)
-	lbzu	rWORD1,1(rSTR1)
-	lbzu	rWORD2,1(rSTR2)
-	cmpwi	cr1,rWORD1,0
-	bdz	L(u4)
-	cmpw	rWORD1,rWORD2
-	beq	cr1,L(u4)
-	bne	L(u4)
-	lbzu	rWORD3,1(rSTR1)
-	lbzu	rWORD4,1(rSTR2)
-	cmpwi	cr1,rWORD3,0
-	bdz	L(u3)
-	cmpw	rWORD3,rWORD4
-	beq	cr1,L(u3)
-	bne	L(u3)
-	lbzu	rWORD1,1(rSTR1)
-	lbzu	rWORD2,1(rSTR2)
-	b	L(u1)
-
-L(u3):  sub	rRTN,rWORD3,rWORD4
-	blr
-L(u4):	sub	rRTN,rWORD1,rWORD2
-	blr
-L(ux):
-	li	rRTN,0
-	blr
-END (strncmp)
-libc_hidden_builtin_def (strncmp)
diff --git a/sysdeps/powerpc/powerpc32/strncmp.S b/sysdeps/powerpc/powerpc32/strncmp.S
deleted file mode 100644
index 28cae4ddce..0000000000
--- a/sysdeps/powerpc/powerpc32/strncmp.S
+++ /dev/null
@@ -1,181 +0,0 @@
-/* Optimized strcmp implementation for PowerPC32.
-   Copyright (C) 2003-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-/* See strlen.s for comments on how the end-of-string testing works.  */
-
-/* int [r3] strncmp (const char *s1 [r3], const char *s2 [r4], size_t size [r5])  */
-
-EALIGN (strncmp, 4, 0)
-
-#define rTMP2	r0
-#define rRTN	r3
-#define rSTR1	r3	/* first string arg */
-#define rSTR2	r4	/* second string arg */
-#define rN	r5	/* max string length */
-#define rWORD1	r6	/* current word in s1 */
-#define rWORD2	r7	/* current word in s2 */
-#define rFEFE	r8	/* constant 0xfefefeff (-0x01010101) */
-#define r7F7F	r9	/* constant 0x7f7f7f7f */
-#define rNEG	r10	/* ~(word in s1 | 0x7f7f7f7f) */
-#define rBITDIF	r11	/* bits that differ in s1 & s2 words */
-#define rTMP	r12
-
-	dcbt	0,rSTR1
-	or	rTMP, rSTR2, rSTR1
-	lis	r7F7F, 0x7f7f
-	dcbt	0,rSTR2
-	clrlwi.	rTMP, rTMP, 30
-	cmplwi	cr1, rN, 0
-	lis	rFEFE, -0x101
-	bne	L(unaligned)
-/* We are word aligned so set up for two loops.  first a word
-   loop, then fall into the byte loop if any residual.  */
-	srwi.	rTMP, rN, 2
-	clrlwi	rN, rN, 30
-	addi	rFEFE, rFEFE, -0x101
-	addi	r7F7F, r7F7F, 0x7f7f
-	cmplwi	cr1, rN, 0
-	beq	L(unaligned)
-
-	mtctr	rTMP	/* Power4 wants mtctr 1st in dispatch group.  */
-	lwz	rWORD1, 0(rSTR1)
-	lwz	rWORD2, 0(rSTR2)
-	b	L(g1)
-
-L(g0):
-	lwzu	rWORD1, 4(rSTR1)
-	bne-	cr1, L(different)
-	lwzu	rWORD2, 4(rSTR2)
-L(g1):	add	rTMP, rFEFE, rWORD1
-	nor	rNEG, r7F7F, rWORD1
-	bdz	L(tail)
-	and.	rTMP, rTMP, rNEG
-	cmpw	cr1, rWORD1, rWORD2
-	beq+	L(g0)
-
-/* OK. We've hit the end of the string. We need to be careful that
-   we don't compare two strings as different because of gunk beyond
-   the end of the strings...  */
-
-#ifdef __LITTLE_ENDIAN__
-L(endstring):
-	slwi	rTMP, rTMP, 1
-	addi    rTMP2, rTMP, -1
-	andc    rTMP2, rTMP2, rTMP
-	and	rWORD2, rWORD2, rTMP2		/* Mask off gunk.  */
-	and	rWORD1, rWORD1, rTMP2
-	rlwinm	rTMP2, rWORD2, 8, 0xffffffff	/* Byte reverse word.  */
-	rlwinm	rTMP, rWORD1, 8, 0xffffffff
-	rlwimi	rTMP2, rWORD2, 24, 0, 7
-	rlwimi	rTMP, rWORD1, 24, 0, 7
-	rlwimi	rTMP2, rWORD2, 24, 16, 23
-	rlwimi	rTMP, rWORD1, 24, 16, 23
-	xor.	rBITDIF, rTMP, rTMP2
-	sub	rRTN, rTMP, rTMP2
-	bgelr+
-	ori	rRTN, rTMP2, 1
-	blr
-
-L(different):
-	lwz	rWORD1, -4(rSTR1)
-	rlwinm	rTMP2, rWORD2, 8, 0xffffffff	/* Byte reverse word.  */
-	rlwinm	rTMP, rWORD1, 8, 0xffffffff
-	rlwimi	rTMP2, rWORD2, 24, 0, 7
-	rlwimi	rTMP, rWORD1, 24, 0, 7
-	rlwimi	rTMP2, rWORD2, 24, 16, 23
-	rlwimi	rTMP, rWORD1, 24, 16, 23
-	xor.	rBITDIF, rTMP, rTMP2
-	sub	rRTN, rTMP, rTMP2
-	bgelr+
-	ori	rRTN, rTMP2, 1
-	blr
-
-#else
-L(endstring):
-	and	rTMP, r7F7F, rWORD1
-	beq	cr1, L(equal)
-	add	rTMP, rTMP, r7F7F
-	xor.	rBITDIF, rWORD1, rWORD2
-	andc	rNEG, rNEG, rTMP
-	blt-	L(highbit)
-	cntlzw	rBITDIF, rBITDIF
-	cntlzw	rNEG, rNEG
-	addi	rNEG, rNEG, 7
-	cmpw	cr1, rNEG, rBITDIF
-	sub	rRTN, rWORD1, rWORD2
-	bgelr+	cr1
-L(equal):
-	li	rRTN, 0
-	blr
-
-L(different):
-	lwz	rWORD1, -4(rSTR1)
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	bgelr+
-L(highbit):
-	ori	rRTN, rWORD2, 1
-	blr
-#endif
-
-/* Oh well.  In this case, we just do a byte-by-byte comparison.  */
-	.align 4
-L(tail):
-	and.	rTMP, rTMP, rNEG
-	cmpw	cr1, rWORD1, rWORD2
-	bne-	L(endstring)
-	addi	rSTR1, rSTR1, 4
-	bne-	cr1, L(different)
-	addi	rSTR2, rSTR2, 4
-	cmplwi	cr1, rN, 0
-L(unaligned):
-	mtctr   rN	/* Power4 wants mtctr 1st in dispatch group */
-	bgt	cr1, L(uz)
-L(ux):
-	li	rRTN, 0
-	blr
-	.align 4
-L(uz):
-	lbz	rWORD1, 0(rSTR1)
-	lbz	rWORD2, 0(rSTR2)
-	nop
-	b	L(u1)
-L(u0):
-	lbzu	rWORD2, 1(rSTR2)
-L(u1):
-	bdz	L(u3)
-	cmpwi	cr1, rWORD1, 0
-	cmpw	rWORD1, rWORD2
-	beq-	cr1, L(u3)
-	lbzu	rWORD1, 1(rSTR1)
-	bne-	L(u2)
-	lbzu	rWORD2, 1(rSTR2)
-	bdz	L(u3)
-	cmpwi	cr1, rWORD1, 0
-	cmpw	rWORD1, rWORD2
-	bne-	L(u3)
-	lbzu	rWORD1, 1(rSTR1)
-	bne+	cr1, L(u0)
-
-L(u2):	lbzu	rWORD1, -1(rSTR1)
-L(u3):	sub	rRTN, rWORD1, rWORD2
-	blr
-END (strncmp)
-libc_hidden_builtin_def (strncmp)
-- 
2.34.1



* [PATCH 2/3] powerpc: Remove powerpc64 strncmp variants
  2023-02-28 17:23 [PATCH 0/3] Clean strncmp implementations Adhemerval Zanella
  2023-02-28 17:23 ` [PATCH 1/3] powerpc: Remove strncmp variants Adhemerval Zanella
@ 2023-02-28 17:23 ` Adhemerval Zanella
  2023-03-01  4:01   ` Rajalakshmi Srinivasaraghavan
  2023-02-28 17:24 ` [PATCH 3/3] alpha: Remove strncmp optimization Adhemerval Zanella
  2 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella @ 2023-02-28 17:23 UTC
  To: libc-alpha, Richard Henderson

The default and power7 implementations just add word-aligned
access when inputs have the same alignment.  The unaligned case
is still done by byte operations.

This is already covered by the generic implementation, which also adds
the unaligned input optimization.

Checked on powerpc64-linux-gnu built without multi-arch for powerpc64,
power7, power8, and power9 (build for le).
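
As an illustration of the kind of check that exercises both the aligned
and the unaligned paths, a small standalone harness could compare
strncmp against a byte-by-byte reference over every pair of start
offsets.  This is a minimal sketch and not part of the glibc test
suite; ref_strncmp, sign, and check_alignments are hypothetical names,
and the buffer sizes and ranges are arbitrary choices.

```c
#include <stddef.h>
#include <string.h>

/* Byte-by-byte reference with strncmp semantics (sign only).  */
static int
ref_strncmp (const char *s1, const char *s2, size_t n)
{
  for (; n > 0; s1++, s2++, n--)
    {
      unsigned char c1 = (unsigned char) *s1;
      unsigned char c2 = (unsigned char) *s2;
      if (c1 != c2)
        return c1 < c2 ? -1 : 1;
      if (c1 == '\0')
        break;
    }
  return 0;
}

static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Return the number of disagreements between strncmp and the reference
   across start offsets in [0, 8), string lengths in [0, 24), every
   mismatch position, and a range of N values.  */
int
check_alignments (void)
{
  static char a[64], b[64];
  int failures = 0;

  for (size_t o1 = 0; o1 < 8; o1++)
    for (size_t o2 = 0; o2 < 8; o2++)
      for (size_t len = 0; len < 24; len++)
        for (size_t diff = 0; diff <= len; diff++)
          {
            memset (a, 0, sizeof a);
            memset (b, 0, sizeof b);
            memset (a + o1, 'x', len);
            memset (b + o2, 'x', len);
            if (diff < len)
              b[o2 + diff] = 'y';	/* Mismatch at index DIFF.  */
            for (size_t n = 0; n <= len + 2; n++)
              if (sign (strncmp (a + o1, b + o2, n))
                  != sign (ref_strncmp (a + o1, b + o2, n)))
                failures++;
          }
  return failures;
}
```

Only the sign of the result is compared, since strncmp is only
specified to return a value less than, equal to, or greater than zero.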
---
 sysdeps/powerpc/powerpc64/multiarch/Makefile  |   2 +-
 .../powerpc64/multiarch/ifunc-impl-list.c     |   2 -
 .../powerpc64/multiarch/strncmp-power7.S      |  23 --
 .../powerpc64/multiarch/strncmp-ppc64.S       |  26 --
 .../powerpc64/multiarch/strncmp-ppc64.c       |   7 +
 sysdeps/powerpc/powerpc64/multiarch/strncmp.c |   5 +-
 sysdeps/powerpc/powerpc64/power7/strncmp.S    | 228 ------------------
 sysdeps/powerpc/powerpc64/strncmp.S           | 210 ----------------
 8 files changed, 9 insertions(+), 494 deletions(-)
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S
 delete mode 100644 sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S
 create mode 100644 sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.c
 delete mode 100644 sysdeps/powerpc/powerpc64/power7/strncmp.S
 delete mode 100644 sysdeps/powerpc/powerpc64/strncmp.S

diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index ed25e234ba..57de4a29c4 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -12,7 +12,7 @@ sysdep_routines += memcpy-power8-cached memcpy-power7 memcpy-a2 memcpy-power6 \
 		   strnlen-power8 strnlen-power7 strnlen-ppc64 \
 		   strcasecmp-power7 strcasecmp_l-power7 \
 		   strncase-power7 strncase_l-power7 \
-		   strncmp-power8 strncmp-power7 strncmp-ppc64 \
+		   strncmp-power8 strncmp-ppc64 \
 		   strchr-power8 strchr-power7 strchr-ppc64 \
 		   strchrnul-power8 strchrnul-power7 strchrnul-ppc64 \
 		   strcpy-power8 strcpy-power7 strcpy-ppc64 stpcpy-power8 \
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 6ac67cd28b..ebe9434052 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -169,8 +169,6 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 #endif
 	      IFUNC_IMPL_ADD (array, i, strncmp, hwcap2 & PPC_FEATURE2_ARCH_2_07,
 			      __strncmp_power8)
-	      IFUNC_IMPL_ADD (array, i, strncmp, hwcap & PPC_FEATURE_ARCH_2_06,
-			      __strncmp_power7)
 	      IFUNC_IMPL_ADD (array, i, strncmp, 1,
 			      __strncmp_ppc))
 
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S b/sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S
deleted file mode 100644
index 919a31342b..0000000000
--- a/sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S
+++ /dev/null
@@ -1,23 +0,0 @@
-/* Copyright (C) 2013-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#define STRNCMP __strncmp_power7
-
-#undef libc_hidden_builtin_def
-#define libc_hidden_builtin_def(name)
-
-#include <sysdeps/powerpc/powerpc64/power7/strncmp.S>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S b/sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S
deleted file mode 100644
index 8401a401ed..0000000000
--- a/sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S
+++ /dev/null
@@ -1,26 +0,0 @@
-/* Copyright (C) 2013-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#if defined SHARED && IS_IN (libc)
-# define STRNCMP __strncmp_ppc
-
-# undef libc_hidden_builtin_def
-# define libc_hidden_builtin_def(name)				\
-    .globl __GI_strncmp; __GI_strncmp = __strncmp_ppc
-#endif
-
-#include <sysdeps/powerpc/powerpc64/strncmp.S>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.c
new file mode 100644
index 0000000000..09cc009a91
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.c
@@ -0,0 +1,7 @@
+#if defined SHARED && IS_IN (libc)
+# define STRNCMP __strncmp_ppc
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(name) \
+    __hidden_ver1 (__strncmp_ppc, __GI_strncmp, __strncmp_ppc);
+#endif
+#include <string/strncmp.c>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strncmp.c b/sysdeps/powerpc/powerpc64/multiarch/strncmp.c
index d2bb11b00d..e8bab8e23d 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/strncmp.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/strncmp.c
@@ -26,7 +26,6 @@
 # include "init-arch.h"
 
 extern __typeof (strncmp) __strncmp_ppc attribute_hidden;
-extern __typeof (strncmp) __strncmp_power7 attribute_hidden;
 extern __typeof (strncmp) __strncmp_power8 attribute_hidden;
 # ifdef __LITTLE_ENDIAN__
 extern __typeof (strncmp) __strncmp_power9 attribute_hidden;
@@ -43,7 +42,5 @@ libc_ifunc_redirected (__redirect_strncmp, strncmp,
 # endif
 		       (hwcap2 & PPC_FEATURE2_ARCH_2_07)
 		       ? __strncmp_power8
-		       : (hwcap & PPC_FEATURE_ARCH_2_06)
-			 ? __strncmp_power7
-			 : __strncmp_ppc);
+		       : __strncmp_ppc);
 #endif
diff --git a/sysdeps/powerpc/powerpc64/power7/strncmp.S b/sysdeps/powerpc/powerpc64/power7/strncmp.S
deleted file mode 100644
index 43aaf8f5b5..0000000000
--- a/sysdeps/powerpc/powerpc64/power7/strncmp.S
+++ /dev/null
@@ -1,228 +0,0 @@
-/* Optimized strcmp implementation for POWER7/PowerPC64.
-   Copyright (C) 2010-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-#ifndef STRNCMP
-# define STRNCMP strncmp
-#endif
-
-/* See strlen.s for comments on how the end-of-string testing works.  */
-
-/* int [r3] strncmp (const char *s1 [r3],
-		     const char *s2 [r4],
-		     size_t size [r5])  */
-
-	.machine power7
-ENTRY_TOCLESS (STRNCMP, 5)
-	CALL_MCOUNT 3
-
-#define rTMP2	r0
-#define rRTN	r3
-#define rSTR1	r3	/* first string arg */
-#define rSTR2	r4	/* second string arg */
-#define rN	r5	/* max string length */
-#define rWORD1	r6	/* current word in s1 */
-#define rWORD2	r7	/* current word in s2 */
-#define rWORD3  r10
-#define rWORD4  r11
-#define rFEFE	r8	/* constant 0xfefefefefefefeff (-0x0101010101010101) */
-#define r7F7F	r9	/* constant 0x7f7f7f7f7f7f7f7f */
-#define rNEG	r10	/* ~(word in s1 | 0x7f7f7f7f7f7f7f7f) */
-#define rBITDIF	r11	/* bits that differ in s1 & s2 words */
-#define rTMP	r12
-
-	dcbt	0,rSTR1
-	nop
-	or	rTMP,rSTR2,rSTR1
-	lis	r7F7F,0x7f7f
-	dcbt	0,rSTR2
-	nop
-	clrldi.	rTMP,rTMP,61
-	cmpldi	cr1,rN,0
-	lis	rFEFE,-0x101
-	bne	L(unaligned)
-/* We are doubleword aligned so set up for two loops.  first a double word
-   loop, then fall into the byte loop if any residual.  */
-	srdi.	rTMP,rN,3
-	clrldi	rN,rN,61
-	addi	rFEFE,rFEFE,-0x101
-	addi	r7F7F,r7F7F,0x7f7f
-	cmpldi	cr1,rN,0
-	beq	L(unaligned)
-
-	mtctr	rTMP
-	ld	rWORD1,0(rSTR1)
-	ld	rWORD2,0(rSTR2)
-	sldi	rTMP,rFEFE,32
-	insrdi	r7F7F,r7F7F,32,0
-	add	rFEFE,rFEFE,rTMP
-	b	L(g1)
-
-L(g0):
-	ldu	rWORD1,8(rSTR1)
-	bne	cr1,L(different)
-	ldu	rWORD2,8(rSTR2)
-L(g1):	add	rTMP,rFEFE,rWORD1
-	nor	rNEG,r7F7F,rWORD1
-	bdz	L(tail)
-	and.	rTMP,rTMP,rNEG
-	cmpd	cr1,rWORD1,rWORD2
-	beq	L(g0)
-
-/* OK. We've hit the end of the string. We need to be careful that
-   we don't compare two strings as different because of gunk beyond
-   the end of the strings...  */
-
-#ifdef __LITTLE_ENDIAN__
-L(endstring):
-	addi    rTMP2, rTMP, -1
-	beq	cr1, L(equal)
-	andc    rTMP2, rTMP2, rTMP
-	rldimi	rTMP2, rTMP2, 1, 0
-	and	rWORD2, rWORD2, rTMP2	/* Mask off gunk.  */
-	and	rWORD1, rWORD1, rTMP2
-	cmpd	cr1, rWORD1, rWORD2
-	beq	cr1, L(equal)
-	cmpb	rBITDIF, rWORD1, rWORD2	/* 0xff on equal bytes.  */
-	addi	rNEG, rBITDIF, 1
-	orc	rNEG, rNEG, rBITDIF	/* 0's below LS differing byte.  */
-	sldi	rNEG, rNEG, 8		/* 1's above LS differing byte.  */
-	andc	rWORD1, rWORD1, rNEG	/* mask off MS bytes.  */
-	andc	rWORD2, rWORD2, rNEG
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	blt	L(highbit)
-	sradi	rRTN, rRTN, 63		/* must return an int.  */
-	ori	rRTN, rRTN, 1
-	blr
-L(equal):
-	li	rRTN, 0
-	blr
-
-L(different):
-	ld	rWORD1, -8(rSTR1)
-	cmpb	rBITDIF, rWORD1, rWORD2	/* 0xff on equal bytes.  */
-	addi	rNEG, rBITDIF, 1
-	orc	rNEG, rNEG, rBITDIF	/* 0's below LS differing byte.  */
-	sldi	rNEG, rNEG, 8		/* 1's above LS differing byte.  */
-	andc	rWORD1, rWORD1, rNEG	/* mask off MS bytes.  */
-	andc	rWORD2, rWORD2, rNEG
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	blt	L(highbit)
-	sradi	rRTN, rRTN, 63
-	ori	rRTN, rRTN, 1
-	blr
-L(highbit):
-	sradi	rRTN, rWORD2, 63
-	ori	rRTN, rRTN, 1
-	blr
-
-#else
-L(endstring):
-	and	rTMP,r7F7F,rWORD1
-	beq	cr1,L(equal)
-	add	rTMP,rTMP,r7F7F
-	xor.	rBITDIF,rWORD1,rWORD2
-	andc	rNEG,rNEG,rTMP
-	blt	L(highbit)
-	cntlzd	rBITDIF,rBITDIF
-	cntlzd	rNEG,rNEG
-	addi	rNEG,rNEG,7
-	cmpd	cr1,rNEG,rBITDIF
-	sub	rRTN,rWORD1,rWORD2
-	blt	cr1,L(equal)
-	sradi	rRTN,rRTN,63		/* must return an int.  */
-	ori	rRTN,rRTN,1
-	blr
-L(equal):
-	li	rRTN,0
-	blr
-
-L(different):
-	ld	rWORD1,-8(rSTR1)
-	xor.	rBITDIF,rWORD1,rWORD2
-	sub	rRTN,rWORD1,rWORD2
-	blt	L(highbit)
-	sradi	rRTN,rRTN,63
-	ori	rRTN,rRTN,1
-	blr
-L(highbit):
-	sradi	rRTN,rWORD2,63
-	ori	rRTN,rRTN,1
-	blr
-#endif
-
-/* Oh well.  In this case, we just do a byte-by-byte comparison.  */
-	.align	4
-L(tail):
-	and.	rTMP,rTMP,rNEG
-	cmpd	cr1,rWORD1,rWORD2
-	bne	L(endstring)
-	addi	rSTR1,rSTR1,8
-	bne	cr1,L(different)
-	addi	rSTR2,rSTR2,8
-	cmpldi	cr1,rN,0
-L(unaligned):
-	mtctr	rN
-	ble	cr1,L(ux)
-L(uz):
-	lbz	rWORD1,0(rSTR1)
-	lbz	rWORD2,0(rSTR2)
-	.align	4
-L(u1):
-	cmpdi	cr1,rWORD1,0
-	bdz	L(u4)
-	cmpd	rWORD1,rWORD2
-	beq	cr1,L(u4)
-	bne	L(u4)
-	lbzu	rWORD3,1(rSTR1)
-	lbzu	rWORD4,1(rSTR2)
-	cmpdi	cr1,rWORD3,0
-	bdz	L(u3)
-	cmpd	rWORD3,rWORD4
-	beq	cr1,L(u3)
-	bne	L(u3)
-	lbzu	rWORD1,1(rSTR1)
-	lbzu	rWORD2,1(rSTR2)
-	cmpdi	cr1,rWORD1,0
-	bdz	L(u4)
-	cmpd	rWORD1,rWORD2
-	beq	cr1,L(u4)
-	bne	L(u4)
-	lbzu	rWORD3,1(rSTR1)
-	lbzu	rWORD4,1(rSTR2)
-	cmpdi	cr1,rWORD3,0
-	bdz	L(u3)
-	cmpd	rWORD3,rWORD4
-	beq	cr1,L(u3)
-	bne	L(u3)
-	lbzu	rWORD1,1(rSTR1)
-	lbzu	rWORD2,1(rSTR2)
-	b	L(u1)
-
-L(u3):  sub	rRTN,rWORD3,rWORD4
-	blr
-L(u4):	sub	rRTN,rWORD1,rWORD2
-	blr
-L(ux):
-	li	rRTN,0
-	blr
-END (STRNCMP)
-libc_hidden_builtin_def (strncmp)
diff --git a/sysdeps/powerpc/powerpc64/strncmp.S b/sysdeps/powerpc/powerpc64/strncmp.S
deleted file mode 100644
index 453903b920..0000000000
--- a/sysdeps/powerpc/powerpc64/strncmp.S
+++ /dev/null
@@ -1,210 +0,0 @@
-/* Optimized strcmp implementation for PowerPC64.
-   Copyright (C) 2003-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-#include <sysdep.h>
-
-/* See strlen.s for comments on how the end-of-string testing works.  */
-
-/* int [r3] strncmp (const char *s1 [r3], const char *s2 [r4], size_t size [r5])  */
-
-#ifndef STRNCMP
-# define STRNCMP strncmp
-#endif
-
-ENTRY_TOCLESS (STRNCMP, 4)
-	CALL_MCOUNT 3
-
-#define rTMP2	r0
-#define rRTN	r3
-#define rSTR1	r3	/* first string arg */
-#define rSTR2	r4	/* second string arg */
-#define rN	r5	/* max string length */
-#define rWORD1	r6	/* current word in s1 */
-#define rWORD2	r7	/* current word in s2 */
-#define rFEFE	r8	/* constant 0xfefefefefefefeff (-0x0101010101010101) */
-#define r7F7F	r9	/* constant 0x7f7f7f7f7f7f7f7f */
-#define rNEG	r10	/* ~(word in s1 | 0x7f7f7f7f7f7f7f7f) */
-#define rBITDIF	r11	/* bits that differ in s1 & s2 words */
-#define rTMP	r12
-
-	dcbt	0,rSTR1
-	or	rTMP, rSTR2, rSTR1
-	lis	r7F7F, 0x7f7f
-	dcbt	0,rSTR2
-	clrldi.	rTMP, rTMP, 61
-	cmpldi	cr1, rN, 0
-	lis	rFEFE, -0x101
-	bne	L(unaligned)
-/* We are doubleword aligned so set up for two loops.  first a double word
-   loop, then fall into the byte loop if any residual.  */
-	srdi.	rTMP, rN, 3
-	clrldi	rN, rN, 61
-	addi	rFEFE, rFEFE, -0x101
-	addi	r7F7F, r7F7F, 0x7f7f
-	cmpldi	cr1, rN, 0
-	beq	L(unaligned)
-
-	mtctr	rTMP	/* Power4 wants mtctr 1st in dispatch group.  */
-	ld	rWORD1, 0(rSTR1)
-	ld	rWORD2, 0(rSTR2)
-	sldi	rTMP, rFEFE, 32
-	insrdi	r7F7F, r7F7F, 32, 0
-	add	rFEFE, rFEFE, rTMP
-	b	L(g1)
-
-L(g0):
-	ldu	rWORD1, 8(rSTR1)
-	bne-	cr1, L(different)
-	ldu	rWORD2, 8(rSTR2)
-L(g1):	add	rTMP, rFEFE, rWORD1
-	nor	rNEG, r7F7F, rWORD1
-	bdz	L(tail)
-	and.	rTMP, rTMP, rNEG
-	cmpd	cr1, rWORD1, rWORD2
-	beq+	L(g0)
-
-/* OK. We've hit the end of the string. We need to be careful that
-   we don't compare two strings as different because of gunk beyond
-   the end of the strings...  */
-
-#ifdef __LITTLE_ENDIAN__
-L(endstring):
-	addi    rTMP2, rTMP, -1
-	beq	cr1, L(equal)
-	andc    rTMP2, rTMP2, rTMP
-	rldimi	rTMP2, rTMP2, 1, 0
-	and	rWORD2, rWORD2, rTMP2	/* Mask off gunk.  */
-	and	rWORD1, rWORD1, rTMP2
-	cmpd	cr1, rWORD1, rWORD2
-	beq	cr1, L(equal)
-	xor	rBITDIF, rWORD1, rWORD2	/* rBITDIF has bits that differ.  */
-	neg	rNEG, rBITDIF
-	and	rNEG, rNEG, rBITDIF	/* rNEG has LS bit that differs.  */
-	cntlzd	rNEG, rNEG		/* bitcount of the bit.  */
-	andi.	rNEG, rNEG, 56		/* bitcount to LS byte that differs. */
-	sld	rWORD1, rWORD1, rNEG	/* shift left to clear MS bytes.  */
-	sld	rWORD2, rWORD2, rNEG
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	blt-	L(highbit)
-	sradi	rRTN, rRTN, 63		/* must return an int.  */
-	ori	rRTN, rRTN, 1
-	blr
-L(equal):
-	li	rRTN, 0
-	blr
-
-L(different):
-	ld	rWORD1, -8(rSTR1)
-	xor	rBITDIF, rWORD1, rWORD2	/* rBITDIF has bits that differ.  */
-	neg	rNEG, rBITDIF
-	and	rNEG, rNEG, rBITDIF	/* rNEG has LS bit that differs.  */
-	cntlzd	rNEG, rNEG		/* bitcount of the bit.  */
-	andi.	rNEG, rNEG, 56		/* bitcount to LS byte that differs. */
-	sld	rWORD1, rWORD1, rNEG	/* shift left to clear MS bytes.  */
-	sld	rWORD2, rWORD2, rNEG
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	blt-	L(highbit)
-	sradi	rRTN, rRTN, 63
-	ori	rRTN, rRTN, 1
-	blr
-L(highbit):
-	sradi	rRTN, rWORD2, 63
-	ori	rRTN, rRTN, 1
-	blr
-
-#else
-L(endstring):
-	and	rTMP, r7F7F, rWORD1
-	beq	cr1, L(equal)
-	add	rTMP, rTMP, r7F7F
-	xor.	rBITDIF, rWORD1, rWORD2
-	andc	rNEG, rNEG, rTMP
-	blt-	L(highbit)
-	cntlzd	rBITDIF, rBITDIF
-	cntlzd	rNEG, rNEG
-	addi	rNEG, rNEG, 7
-	cmpd	cr1, rNEG, rBITDIF
-	sub	rRTN, rWORD1, rWORD2
-	blt-	cr1, L(equal)
-	sradi	rRTN, rRTN, 63		/* must return an int.  */
-	ori	rRTN, rRTN, 1
-	blr
-L(equal):
-	li	rRTN, 0
-	blr
-
-L(different):
-	ld	rWORD1, -8(rSTR1)
-	xor.	rBITDIF, rWORD1, rWORD2
-	sub	rRTN, rWORD1, rWORD2
-	blt-	L(highbit)
-	sradi	rRTN, rRTN, 63
-	ori	rRTN, rRTN, 1
-	blr
-L(highbit):
-	sradi	rRTN, rWORD2, 63
-	ori	rRTN, rRTN, 1
-	blr
-#endif
-
-/* Oh well.  In this case, we just do a byte-by-byte comparison.  */
-	.align 4
-L(tail):
-	and.	rTMP, rTMP, rNEG
-	cmpd	cr1, rWORD1, rWORD2
-	bne-	L(endstring)
-	addi	rSTR1, rSTR1, 8
-	bne-	cr1, L(different)
-	addi	rSTR2, rSTR2, 8
-	cmpldi	cr1, rN, 0
-L(unaligned):
-	mtctr   rN	/* Power4 wants mtctr 1st in dispatch group */
-	bgt	cr1, L(uz)
-L(ux):
-	li	rRTN, 0
-	blr
-	.align 4
-L(uz):
-	lbz	rWORD1, 0(rSTR1)
-	lbz	rWORD2, 0(rSTR2)
-	nop
-	b	L(u1)
-L(u0):
-	lbzu	rWORD2, 1(rSTR2)
-L(u1):
-	bdz	L(u3)
-	cmpdi	cr1, rWORD1, 0
-	cmpd	rWORD1, rWORD2
-	beq-	cr1, L(u3)
-	lbzu	rWORD1, 1(rSTR1)
-	bne-	L(u2)
-	lbzu	rWORD2, 1(rSTR2)
-	bdz	L(u3)
-	cmpdi	cr1, rWORD1, 0
-	cmpd	rWORD1, rWORD2
-	bne-	L(u3)
-	lbzu	rWORD1, 1(rSTR1)
-	bne+	cr1, L(u0)
-
-L(u2):	lbzu	rWORD1, -1(rSTR1)
-L(u3):	sub	rRTN, rWORD1, rWORD2
-	blr
-END (STRNCMP)
-libc_hidden_builtin_def (strncmp)
-- 
2.34.1



* [PATCH 3/3] alpha: Remove strncmp optimization
  2023-02-28 17:23 [PATCH 0/3] Clean strncmp implementations Adhemerval Zanella
  2023-02-28 17:23 ` [PATCH 1/3] powerpc: Remove strncmp variants Adhemerval Zanella
  2023-02-28 17:23 ` [PATCH 2/3] powerpc: Remove powerpc64 " Adhemerval Zanella
@ 2023-02-28 17:24 ` Adhemerval Zanella
  2023-02-28 19:11   ` Richard Henderson
  2 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella @ 2023-02-28 17:24 UTC (permalink / raw)
  To: libc-alpha, Richard Henderson

The generic implementation already covers word access along with
cmpbge for both aligned and unaligned inputs, so use it instead.

Checked qemu static for alpha-linux-gnu.
---
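A rough C sketch of the word-level step described above, for readers
unfamiliar with cmpbge.  It is not the glibc code: the helper names
zero_byte_mask and compare_words are made up for illustration, the
byte mask is computed with a plain loop instead of a single
instruction, and words are assumed to hold string byte 0 in their
low-order bits.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate Alpha's "cmpbge zero, x": bit i of the result is set iff
   byte i of X is zero.  (Hypothetical helper, illustration only.)  */
static unsigned int
zero_byte_mask (uint64_t x)
{
  unsigned int mask = 0;
  for (int i = 0; i < 8; i++)
    if (((x >> (8 * i)) & 0xff) == 0)
      mask |= 1u << i;
  return mask;
}

/* Compare the first N bytes (1 <= N <= 8) of two words, stopping at
   the first NUL in W1 as strncmp does.  Assumes byte 0 of each string
   chunk sits in the low-order bits.  Returns <0, 0 or >0.  */
static int
compare_words (uint64_t w1, uint64_t w2, unsigned int n)
{
  assert (n >= 1 && n <= 8);

  unsigned int limit = (1u << n) - 1;		/* Bytes inside the count.  */
  unsigned int nul = zero_byte_mask (w1);	/* NUL bytes in w1.  */
  unsigned int differ = ~zero_byte_mask (w1 ^ w2) & 0xff; /* Differing bytes.  */
  unsigned int stop = (nul | differ) & limit;

  if (stop == 0)
    return 0;	/* All N bytes are equal and none is a terminator.  */

  /* Index of the first byte that is a NUL or a difference.  */
  unsigned int i = 0;
  while (((stop >> i) & 1) == 0)
    i++;

  int c1 = (w1 >> (8 * i)) & 0xff;
  int c2 = (w2 >> (8 * i)) & 0xff;
  return c1 - c2;	/* Zero when the stop byte is a NUL in both.  */
}
```

Masking with the count and with the NUL position before looking at the
difference is what keeps bytes past the terminator, or past N, from
influencing the result.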
 sysdeps/alpha/strncmp.S | 276 ----------------------------------------
 1 file changed, 276 deletions(-)
 delete mode 100644 sysdeps/alpha/strncmp.S

diff --git a/sysdeps/alpha/strncmp.S b/sysdeps/alpha/strncmp.S
deleted file mode 100644
index a54cc45a0a..0000000000
--- a/sysdeps/alpha/strncmp.S
+++ /dev/null
@@ -1,276 +0,0 @@
-/* Copyright (C) 1996-2023 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Bytewise compare two null-terminated strings of length no longer than N.  */
-
-#include <sysdep.h>
-
-	.set noat
-	.set noreorder
-
-/* EV6 only predicts one branch per octaword.  We'll use these to push
-   subsequent branches back to the next bundle.  This will generally add
-   a fetch+decode cycle to older machines, so skip in that case.  */
-#ifdef __alpha_fix__
-# define ev6_unop	unop
-#else
-# define ev6_unop
-#endif
-
-	.text
-
-ENTRY(strncmp)
-#ifdef PROF
-	ldgp	gp, 0(pv)
-	lda	AT, _mcount
-	jsr	AT, (AT), _mcount
-	.prologue 1
-#else
-	.prologue 0
-#endif
-
-	xor	a0, a1, t2	# are s1 and s2 co-aligned?
-	beq	a2, $zerolength
-	ldq_u	t0, 0(a0)	# load asap to give cache time to catch up
-	ldq_u	t1, 0(a1)
-	lda	t3, -1
-	and	t2, 7, t2
-	srl	t3, 1, t6
-	and	a0, 7, t4	# find s1 misalignment
-	and	a1, 7, t5	# find s2 misalignment
-	cmovlt	a2, t6, a2	# bound neg count to LONG_MAX
-	addq	a1, a2, a3	# s2+count
-	addq	a2, t4, a2	# bias count by s1 misalignment
-	and	a2, 7, t10	# ofs of last byte in s1 last word
-	srl	a2, 3, a2	# remaining full words in s1 count
-	bne	t2, $unaligned
-
-	/* On entry to this basic block:
-	   t0 == the first word of s1.
-	   t1 == the first word of s2.
-	   t3 == -1.  */
-$aligned:
-	mskqh	t3, a1, t8	# mask off leading garbage
-	ornot	t1, t8, t1
-	ornot	t0, t8, t0
-	cmpbge	zero, t1, t7	# bits set iff null found
-	beq	a2, $eoc	# check end of count
-	bne	t7, $eos
-	beq	t10, $ant_loop
-
-	/* Aligned compare main loop.
-	   On entry to this basic block:
-	   t0 == an s1 word.
-	   t1 == an s2 word not containing a null.  */
-
-	.align 4
-$a_loop:
-	xor	t0, t1, t2	# e0	:
-	bne	t2, $wordcmp	# .. e1 (zdb)
-	ldq_u	t1, 8(a1)	# e0    :
-	ldq_u	t0, 8(a0)	# .. e1 :
-
-	subq	a2, 1, a2	# e0    :
-	addq	a1, 8, a1	# .. e1 :
-	addq	a0, 8, a0	# e0    :
-	beq	a2, $eoc	# .. e1 :
-
-	cmpbge	zero, t1, t7	# e0    :
-	beq	t7, $a_loop	# .. e1 :
-
-	br	$eos
-
-	/* Alternate aligned compare loop, for when there's no trailing
-	   bytes on the count.  We have to avoid reading too much data.  */
-	.align 4
-$ant_loop:
-	xor	t0, t1, t2	# e0	:
-	ev6_unop
-	ev6_unop
-	bne	t2, $wordcmp	# .. e1 (zdb)
-
-	subq	a2, 1, a2	# e0    :
-	beq	a2, $zerolength	# .. e1 :
-	ldq_u	t1, 8(a1)	# e0    :
-	ldq_u	t0, 8(a0)	# .. e1 :
-
-	addq	a1, 8, a1	# e0    :
-	addq	a0, 8, a0	# .. e1 :
-	cmpbge	zero, t1, t7	# e0    :
-	beq	t7, $ant_loop	# .. e1 :
-
-	br	$eos
-
-	/* The two strings are not co-aligned.  Align s1 and cope.  */
-	/* On entry to this basic block:
-	   t0 == the first word of s1.
-	   t1 == the first word of s2.
-	   t3 == -1.
-	   t4 == misalignment of s1.
-	   t5 == misalignment of s2.
-	  t10 == misalignment of s1 end.  */
-	.align	4
-$unaligned:
-	/* If s1 misalignment is larger than s2 misalignment, we need
-	   extra startup checks to avoid SEGV.  */
-	subq	a1, t4, a1	# adjust s2 for s1 misalignment
-	cmpult	t4, t5, t9
-	subq	a3, 1, a3	# last byte of s2
-	bic	a1, 7, t8
-	mskqh	t3, t5, t7	# mask garbage in s2
-	subq	a3, t8, a3
-	ornot	t1, t7, t7
-	srl	a3, 3, a3	# remaining full words in s2 count
-	beq	t9, $u_head
-
-	/* Failing that, we need to look for both eos and eoc within the
-	   first word of s2.  If we find either, we can continue by
-	   pretending that the next word of s2 is all zeros.  */
-	lda	t2, 0		# next = zero
-	cmpeq	a3, 0, t8	# eoc in the first word of s2?
-	cmpbge	zero, t7, t7	# eos in the first word of s2?
-	or	t7, t8, t8
-	bne	t8, $u_head_nl
-
-	/* We know just enough now to be able to assemble the first
-	   full word of s2.  We can still find a zero at the end of it.
-
-	   On entry to this basic block:
-	   t0 == first word of s1
-	   t1 == first partial word of s2.
-	   t3 == -1.
-	   t10 == ofs of last byte in s1 last word.
-	   t11 == ofs of last byte in s2 last word.  */
-$u_head:
-	ldq_u	t2, 8(a1)	# load second partial s2 word
-	subq	a3, 1, a3
-$u_head_nl:
-	extql	t1, a1, t1	# create first s2 word
-	mskqh	t3, a0, t8
-	extqh	t2, a1, t4
-	ornot	t0, t8, t0	# kill s1 garbage
-	or	t1, t4, t1	# s2 word now complete
-	cmpbge	zero, t0, t7	# find eos in first s1 word
-	ornot	t1, t8, t1	# kill s2 garbage
-	beq	a2, $eoc
-	subq	a2, 1, a2
-	bne	t7, $eos
-	mskql	t3, a1, t8	# mask out s2[1] bits we have seen
-	xor	t0, t1, t4	# compare aligned words
-	or	t2, t8, t8
-	bne	t4, $wordcmp
-	cmpbge	zero, t8, t7	# eos in high bits of s2[1]?
-	cmpeq	a3, 0, t8	# eoc in s2[1]?
-	or	t7, t8, t7
-	bne	t7, $u_final
-
-	/* Unaligned copy main loop.  In order to avoid reading too much,
-	   the loop is structured to detect zeros in aligned words from s2.
-	   This has, unfortunately, effectively pulled half of a loop
-	   iteration out into the head and half into the tail, but it does
-	   prevent nastiness from accumulating in the very thing we want
-	   to run as fast as possible.
-
-	   On entry to this basic block:
-	   t2 == the unshifted low-bits from the next s2 word.
-	   t10 == ofs of last byte in s1 last word.
-	   t11 == ofs of last byte in s2 last word.  */
-	.align 4
-$u_loop:
-	extql	t2, a1, t3	# e0    :
-	ldq_u	t2, 16(a1)	# .. e1 : load next s2 high bits
-	ldq_u	t0, 8(a0)	# e0    : load next s1 word
-	addq	a1, 8, a1	# .. e1 :
-
-	addq	a0, 8, a0	# e0    :
-	subq	a3, 1, a3	# .. e1 :
-	extqh	t2, a1, t1	# e0    :
-	cmpbge	zero, t0, t7	# .. e1 : eos in current s1 word
-
-	or	t1, t3, t1	# e0    :
-	beq	a2, $eoc	# .. e1 : eoc in current s1 word
-	subq	a2, 1, a2	# e0    :
-	cmpbge	zero, t2, t4	# .. e1 : eos in s2[1]
-
-	xor	t0, t1, t3	# e0    : compare the words
-	ev6_unop
-	ev6_unop
-	bne	t7, $eos	# .. e1 :
-
-	cmpeq	a3, 0, t5	# e0    : eoc in s2[1]
-	ev6_unop
-	ev6_unop
-	bne	t3, $wordcmp	# .. e1 :
-
-	or	t4, t5, t4	# e0    : eos or eoc in s2[1].
-	beq	t4, $u_loop	# .. e1 (zdb)
-
-	/* We've found a zero in the low bits of the last s2 word.  Get
-	   the next s1 word and align them.  */
-	.align 3
-$u_final:
-	ldq_u	t0, 8(a0)
-	extql	t2, a1, t1
-	cmpbge	zero, t1, t7
-	bne	a2, $eos
-
-	/* We've hit end of count.  Zero everything after the count
-	   and compare whats left.  */
-	.align 3
-$eoc:
-	mskql	t0, t10, t0
-	mskql	t1, t10, t1
-	cmpbge	zero, t1, t7
-
-	/* We've found a zero somewhere in a word we just read.
-	   On entry to this basic block:
-	   t0 == s1 word
-	   t1 == s2 word
-	   t7 == cmpbge mask containing the zero.  */
-	.align 3
-$eos:
-	negq	t7, t6		# create bytemask of valid data
-	and	t6, t7, t8
-	subq	t8, 1, t6
-	or	t6, t8, t7
-	zapnot	t0, t7, t0	# kill the garbage
-	zapnot	t1, t7, t1
-	xor	t0, t1, v0	# ... and compare
-	beq	v0, $done
-
-	/* Here we have two differing co-aligned words in t0 & t1.
-	   Bytewise compare them and return (t0 > t1 ? 1 : -1).  */
-	.align 3
-$wordcmp:
-	cmpbge	t0, t1, t2	# comparison yields bit mask of ge
-	cmpbge	t1, t0, t3
-	xor	t2, t3, t0	# bits set iff t0/t1 bytes differ
-	negq	t0, t1		# clear all but least bit
-	and	t0, t1, t0
-	lda	v0, -1
-	and	t0, t2, t1	# was bit set in t0 > t1?
-	cmovne	t1, 1, v0
-$done:
-	ret
-
-	.align 3
-$zerolength:
-	clr	v0
-	ret
-
-	END(strncmp)
-libc_hidden_builtin_def (strncmp)
-- 
2.34.1



* Re: [PATCH 3/3] alpha: Remove strncmp optimization
  2023-02-28 17:24 ` [PATCH 3/3] alpha: Remove strncmp optimization Adhemerval Zanella
@ 2023-02-28 19:11   ` Richard Henderson
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2023-02-28 19:11 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha

On 2/28/23 07:24, Adhemerval Zanella wrote:
> The generic implementation already covers word access along with
> cmpbge for both aligned and unaligned inputs, so use it instead.
> 
> Checked qemu static for alpha-linux-gnu.
> ---
>   sysdeps/alpha/strncmp.S | 276 ----------------------------------------
>   1 file changed, 276 deletions(-)
>   delete mode 100644 sysdeps/alpha/strncmp.S

LGTM.

r~


* Re: [PATCH 1/3] powerpc: Remove strncmp variants
  2023-02-28 17:23 ` [PATCH 1/3] powerpc: Remove strncmp variants Adhemerval Zanella
@ 2023-03-01  4:01   ` Rajalakshmi Srinivasaraghavan
  0 siblings, 0 replies; 7+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2023-03-01  4:01 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Richard Henderson


On 2/28/23 11:23 AM, Adhemerval Zanella via Libc-alpha wrote:
> The default, power4, and power7 implementations just add word-aligned
> access when inputs have the same alignment.  The unaligned case
> is still done by byte operations.
>
> This is already covered by the generic implementation, which also adds
> the unaligned input optimization.
>
> Checked on powerpc-linux-gnu built without multi-arch for powerpc,
> power4, and power7.

LGTM.

Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
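
For reference, the strategy quoted above (doubleword loop when both
inputs are aligned, byte loop otherwise) can be sketched in C roughly
as follows.  This is only an illustration, not the removed assembly
nor the generic glibc code: sketch_strncmp and has_zero_byte are
hypothetical names, and the sketch assumes that reading the rest of an
aligned doubleword is safe, as the assembly does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of W is zero.  Uses the same two constants
   the removed assembly keeps in rFEFE and r7F7F.  */
static uint64_t
has_zero_byte (uint64_t w)
{
  return (w + 0xfefefefefefefeffULL) & ~(w | 0x7f7f7f7f7f7f7f7fULL);
}

/* Hypothetical sketch; returns <0, 0 or >0 like strncmp.  */
static int
sketch_strncmp (const char *s1, const char *s2, size_t n)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  /* Doubleword loop only when both pointers are 8-byte aligned.  */
  if ((((uintptr_t) p1 | (uintptr_t) p2) & 7) == 0)
    while (n >= 8)
      {
	uint64_t w1, w2;
	memcpy (&w1, p1, 8);
	memcpy (&w2, p2, 8);
	if (w1 != w2 || has_zero_byte (w1))
	  break;	/* Let the byte loop find the exact byte.  */
	p1 += 8;
	p2 += 8;
	n -= 8;
      }

  /* Byte loop: unaligned inputs, the residual count, or the
     doubleword that stopped the loop above.  */
  for (; n > 0; p1++, p2++, n--)
    {
      if (*p1 != *p2)
	return *p1 < *p2 ? -1 : 1;
      if (*p1 == '\0')
	return 0;
    }
  return 0;
}
```

Falling back to the byte loop for the final doubleword keeps the
sketch independent of endianness, at the cost of rescanning up to
eight bytes.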




* Re: [PATCH 2/3] powerpc: Remove powerpc64 strncmp variants
  2023-02-28 17:23 ` [PATCH 2/3] powerpc: Remove powerpc64 " Adhemerval Zanella
@ 2023-03-01  4:01   ` Rajalakshmi Srinivasaraghavan
  0 siblings, 0 replies; 7+ messages in thread
From: Rajalakshmi Srinivasaraghavan @ 2023-03-01  4:01 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-alpha, Richard Henderson


On 2/28/23 11:23 AM, Adhemerval Zanella via Libc-alpha wrote:
> The default and power7 implementations just add word-aligned
> access when inputs have the same alignment.  The unaligned case
> is still done by byte operations.
>
> This is already covered by the generic implementation, which also adds
> the unaligned input optimization.
>
> Checked on powerpc64-linux-gnu built without multi-arch for powerpc64,
> power7, power8, and power9 (build for le).

LGTM.

Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>



