public inbox for libc-alpha@sourceware.org
* [PATCH 00/17] Improve generic string routines
@ 2022-09-02 20:39 Adhemerval Zanella
From: Adhemerval Zanella @ 2022-09-02 20:39 UTC
  To: libc-alpha; +Cc: Joseph Myers, caiyinyu

This is an update of my previous patchset [1] to provide generic string
implementations for newer ports, so that they only need to focus on a few
specific routines to get a better overall improvement.

It is done by:

  1. parameterizing the internal routines (for instance, finding a zero
     byte in a word) so each architecture can reimplement them without
     having to reimplement the whole routine.

  2. vectorizing more string implementations (for instance strcpy
     and strcmp).

  3. changing some implementations (for instance strnlen) to use other
     routines that may already be optimized.  This lets new ports focus
     on providing optimized implementations of only a handful of symbols
     (for instance memchr) and have those improvements used by a larger
     set of routines.

I think we can handle the rest of BZ #5806 later, and if the
performance of the generic implementation is close, it is better
to just remove the old assembly implementations.

I also tested on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementations and disabling multiarch (this covers both LE and BE,
for 64 and 32 bits). I also tested the string routines on alpha, hppa,
and sh.

Changes since v3:
  * Rebased against master.
  * Dropped strcpy optimization.
  * Refactored strcmp implementation.
  * Some minor changes in comments.

Changes since v2:
  * Moved string-fz{a,b,i} to their own patch.
  * Added an inline implementation of __builtin_c{l,t}z to avoid using
    compiler-provided symbols.
  * Added a new header, string-maskoff.h, to handle unaligned accesses
    in some implementations.
  * Fixed strcmp on LE machines.
  * Added an unaligned strcpy variant for architectures that define
    _STRING_ARCH_unaligned.
  * Added SH string-fzb.h (which uses the cmp/str instruction to find
    a zero byte in a word).

Changes since v1:
  * Marked ChangeLog entries with [BZ #5806], as appropriate.
  * Reorganized the headers, so that armv6t2 and power6 need to override
    as little as possible to use their (integer) zero detection insns.
  * Hopefully fixed all of the coding style issues.
  * Adjusted the memrchr algorithm as discussed.
  * Replaced the #ifdef STRRCHR etc. that are used by the multiarch
    files.
  * Tested on i386, i686, x86_64 (verified this is unused), ppc64,
    ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
    aarch64, alpha (qemu) and hppa (qemu).

[1] https://sourceware.org/legacy-ml/libc-alpha/2018-01/msg00318.html

Adhemerval Zanella (10):
  Add string-maskoff.h generic header
  Add string vectorized find and detection functions
  string: Improve generic strlen
  string: Improve generic strnlen
  string: Improve generic strchr
  string: Improve generic strchrnul
  string: Improve generic strcmp
  string: Improve generic memchr
  string: Improve generic memrchr
  sh: Add string-fzb.h

Richard Henderson (7):
  Parameterize op_t from memcopy.h
  Parameterize OP_T_THRES from memcopy.h
  hppa: Add memcopy.h
  hppa: Add string-fzb.h and string-fzi.h
  alpha: Add string-fzb.h and string-fzi.h
  arm: Add string-fza.h
  powerpc: Add string-fza.h

 config.h.in                                   |   8 +
 configure                                     |  54 +++++
 configure.ac                                  |  34 +++
 string/memchr.c                               | 168 ++++----------
 string/memcmp.c                               |   4 -
 string/memrchr.c                              | 189 +++-------------
 string/strchr.c                               | 172 +++------------
 string/strchrnul.c                            | 156 ++-----------
 string/strcmp.c                               | 117 ++++++++--
 string/strlen.c                               |  90 ++------
 string/strnlen.c                              | 137 +-----------
 sysdeps/alpha/string-fzb.h                    |  51 +++++
 sysdeps/alpha/string-fzi.h                    | 113 ++++++++++
 sysdeps/arm/armv6t2/string-fza.h              |  70 ++++++
 sysdeps/generic/memcopy.h                     |  10 +-
 sysdeps/generic/string-extbyte.h              |  37 ++++
 sysdeps/generic/string-fza.h                  | 106 +++++++++
 sysdeps/generic/string-fzb.h                  |  49 +++++
 sysdeps/generic/string-fzi.h                  | 208 ++++++++++++++++++
 sysdeps/generic/string-maskoff.h              |  73 ++++++
 sysdeps/generic/string-opthr.h                |  25 +++
 sysdeps/generic/string-optype.h               |  31 +++
 sysdeps/hppa/memcopy.h                        |  42 ++++
 sysdeps/hppa/string-fzb.h                     |  69 ++++++
 sysdeps/hppa/string-fzi.h                     | 135 ++++++++++++
 sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
 sysdeps/i386/memcopy.h                        |   3 -
 sysdeps/i386/string-opthr.h                   |  25 +++
 sysdeps/m68k/memcopy.h                        |   3 -
 sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
 .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
 .../power4/multiarch/strchrnul-ppc32.c        |   4 -
 .../power4/multiarch/strnlen-ppc32.c          |  14 +-
 .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
 sysdeps/powerpc/string-fza.h                  |  70 ++++++
 sysdeps/s390/strchr-c.c                       |  11 +-
 sysdeps/s390/strchrnul-c.c                    |   2 -
 sysdeps/s390/strlen-c.c                       |  10 +-
 sysdeps/s390/strnlen-c.c                      |  14 +-
 sysdeps/sh/string-fzb.h                       |  53 +++++
 40 files changed, 1548 insertions(+), 851 deletions(-)
 create mode 100644 sysdeps/alpha/string-fzb.h
 create mode 100644 sysdeps/alpha/string-fzi.h
 create mode 100644 sysdeps/arm/armv6t2/string-fza.h
 create mode 100644 sysdeps/generic/string-extbyte.h
 create mode 100644 sysdeps/generic/string-fza.h
 create mode 100644 sysdeps/generic/string-fzb.h
 create mode 100644 sysdeps/generic/string-fzi.h
 create mode 100644 sysdeps/generic/string-maskoff.h
 create mode 100644 sysdeps/generic/string-opthr.h
 create mode 100644 sysdeps/generic/string-optype.h
 create mode 100644 sysdeps/hppa/memcopy.h
 create mode 100644 sysdeps/hppa/string-fzb.h
 create mode 100644 sysdeps/hppa/string-fzi.h
 create mode 100644 sysdeps/i386/string-opthr.h
 create mode 100644 sysdeps/powerpc/string-fza.h
 create mode 100644 sysdeps/sh/string-fzb.h

-- 
2.34.1


* [PATCH 04/17] Add string vectorized find and detection functions
@ 2022-09-03 13:13 Wilco Dijkstra
From: Wilco Dijkstra @ 2022-09-03 13:13 UTC
  To: 'GNU C Library'; +Cc: Adhemerval Zanella

Hi Adhemerval,

+static inline unsigned int
+__clz (op_t x)
+{
+#if !HAVE_BUILTIN_CLZ
+  unsigned r;
+  op_t i;
+
+  x |= x >> 1;
+  x |= x >> 2;
+  x |= x >> 4;
+  x |= x >> 8;
+  x |= x >> 16;
+# if __WORDSIZE == 64
+  x |= x >> 32;
+  i = x * 0x03F79D71B4CB0A89ull >> 58;
+# else
+  i = x * 0x07C4ACDDU >> 27;
+# endif
+  r = index_access (i);
+  return r ^ (sizeof (op_t) * CHAR_BIT - 1);
+#else
+  if (sizeof (op_t) == sizeof (long int))
+    return __builtin_clzl (x);
+  else
+    return __builtin_clzll (x);
+#endif
+}

This is a really bad idea. Firstly, it is incorrect: the width of op_t need not match __WORDSIZE due to
the odd way op_t is defined (it can be 64 bits on 32-bit targets). That in itself is
problematic, since it isn't clear that using 64-bit operations extensively is efficient
on 32-bit targets (using 64-bit multiplies in GMP is different from using 64-bit
loads/stores in memcpy/memset, which is different from 64-bit logical operations and
shifts, so all of these should be decoupled rather than forced together).
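For reference, the quoted fallback smears the highest set bit downwards and then uses a de Bruijn-style multiply plus a short index into a table to recover floor(log2(x)). The multiplier, shift and table only fit together for one operand width, which is why choosing them by __WORDSIZE while x has op_t's width goes wrong. A minimal sketch for a fixed 32-bit operand follows; the names are hypothetical, and since index_access is not shown in the quote, the table here is filled at startup from the multiplier rather than hard-coded:

```c
#include <stdint.h>

/* Hypothetical names.  The 32-bit multiplier and the 27-bit shift belong
   together; a 64-bit operand needs a 64-bit multiplier and a 58-bit
   shift, as in the quoted patch.  */
#define DEBRUIJN32 UINT32_C (0x07C4ACDD)

static unsigned char debruijn32_table[32];

/* After smearing, the only possible inputs are 2^(k+1) - 1 for
   k = 0..31; the multiply maps each one to a distinct top-5-bit index,
   whose table entry is k = floor (log2 (x)).  */
static void
debruijn32_init (void)
{
  for (unsigned k = 0; k < 32; k++)
    {
      uint32_t v = k == 31 ? UINT32_MAX : (UINT32_C (1) << (k + 1)) - 1;
      debruijn32_table[(uint32_t) (v * DEBRUIJN32) >> 27] = (unsigned char) k;
    }
}

/* Leading-zero count of a nonzero 32-bit X without __builtin_clz
   (as with the builtin, the result for X == 0 is meaningless).  */
static unsigned
clz32_fallback (uint32_t x)
{
  x |= x >> 1;   /* smear the highest set bit into all lower bits */
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  unsigned r = debruijn32_table[(uint32_t) (x * DEBRUIJN32) >> 27];
  return r ^ 31;  /* r is in 0..31, so r ^ 31 == 31 - r */
}
```

Even a width-correct version of this costs several shifts, a multiply and a load, which is the inefficiency the third point below is about.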

Secondly, there are already several ways to use count leading zeroes in GLIBC.
One is to use the builtin unconditionally (done in lots of places, e.g. by math code);
another is count_leading_zeros defined in longlong.h. This would add a third way.
It's not clear how much gain inlining gives over using the libgcc implementation,
but if it is significant then we could provide a generic inline clzl/clzll that can be
used throughout GLIBC (replacing existing builtin_clz and count_leading_zeros).

Finally, emulating a full clz is inefficient. If you have already called find_zero_low
then there are at most 4 bits set on a 32-bit LE target, so you can trivially get the
index of the first zero byte via:

x = x & -x;
x = (x >> 15) + (x >> 22) + 3 * (x >> 31);

This is many times faster. There may be similar sequences for big-endian, but
you could just do a multiply with a magic word that gives the correct result
without needing a lookup table.
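A sketch of the suggested sequence, assuming a little-endian 32-bit word and a hypothetical find_zero_low32 built from the classic (x - 0x01010101) & ~x & 0x80808080 test. After x & -x, only bit 7, 15, 23 or 31 can remain. Evaluated literally in 32-bit unsigned arithmetic, the sum above yields 0, 1, 258 and 66051 for those four bits, so this sketch keeps only the low byte, which holds the index. The second function is one possible "magic word" multiply; the constant 0x00010203 is chosen here for illustration and is not taken from the thread:

```c
#include <stdint.h>

/* Hypothetical 32-bit find_zero_low: 0x80 marks zero bytes, and the
   least-significant marker is always exact.  */
static uint32_t
find_zero_low32 (uint32_t x)
{
  return (x - UINT32_C (0x01010101)) & ~x & UINT32_C (0x80808080);
}

/* Byte index (0..3, little-endian memory order) of the first zero byte,
   given a nonzero find_zero_low32 result M.  */
static unsigned
first_zero_index_le (uint32_t m)
{
  m &= -m;  /* keep only the lowest marker: bit 7, 15, 23 or 31 */
  /* The sum is 0, 1, 0x102 or 0x10203; the low byte is the index.  */
  return ((m >> 15) + (m >> 22) + 3 * (m >> 31)) & 0xff;
}

/* Same result with one multiply: M >> 7 is 1 << (8 * i), so the product
   moves byte (3 - i) of 0x00010203, which holds i, into the top byte.  */
static unsigned
first_zero_index_le_mul (uint32_t m)
{
  m &= -m;
  return (uint32_t) ((m >> 7) * UINT32_C (0x00010203)) >> 24;
}
```

Both versions depend on the little-endian property that the lowest marker is exact; big-endian targets would need a different sequence, as noted above.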

Cheers,
Wilco


Thread overview: 33+ messages
2022-09-02 20:39 [PATCH 00/17] Improve generic string routines Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 01/17] Parameterize op_t from memcopy.h Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 02/17] Parameterize OP_T_THRES " Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 03/17] Add string-maskoff.h generic header Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 04/17] Add string vectorized find and detection functions Adhemerval Zanella
2022-09-03  3:20   ` Noah Goldstein
2022-09-19 14:00     ` Adhemerval Zanella Netto
2022-09-02 20:39 ` [PATCH 05/17] string: Improve generic strlen Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 06/17] string: Improve generic strnlen Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 07/17] string: Improve generic strchr Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 08/17] string: Improve generic strchrnul Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 09/17] string: Improve generic strcmp Adhemerval Zanella
2022-09-03  3:31   ` Noah Goldstein
2022-09-19 14:04     ` Adhemerval Zanella Netto
2022-09-03  8:54   ` Richard Henderson
2022-09-02 20:39 ` [PATCH 10/17] string: Improve generic memchr Adhemerval Zanella
2022-09-03  3:47   ` Noah Goldstein
2022-09-19 19:17     ` Adhemerval Zanella Netto
2022-09-19 21:59       ` Noah Goldstein
2022-09-22 17:51         ` Adhemerval Zanella Netto
2022-09-02 20:39 ` [PATCH 11/17] string: Improve generic memrchr Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 12/17] hppa: Add memcopy.h Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 13/17] hppa: Add string-fzb.h and string-fzi.h Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 14/17] alpha: " Adhemerval Zanella
2022-09-02 20:39 ` [PATCH 15/17] arm: Add string-fza.h Adhemerval Zanella
2022-09-05 15:40   ` Richard Earnshaw
2022-09-05 15:50     ` Richard Earnshaw
2022-09-02 20:39 ` [PATCH 16/17] powerpc: " Adhemerval Zanella
2022-09-06 14:48   ` Paul E Murphy
2022-09-19 19:55     ` Adhemerval Zanella Netto
2022-09-02 20:39 ` [PATCH 17/17] sh: Add string-fzb.h Adhemerval Zanella
2022-09-03 13:13 [PATCH 04/17] Add string vectorized find and detection functions Wilco Dijkstra
2022-09-19 13:59 ` Adhemerval Zanella Netto
