public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org,
	Richard Henderson <richard.henderson@linaro.org>,
	Noah Goldstein <goldstein.w.n@gmail.com>
Subject: [PATCH v9 00/22] Improve generic string routines
Date: Tue, 17 Jan 2023 16:59:52 -0300
Message-ID: <20230117200014.1299923-1-adhemerval.zanella@linaro.org>

This series improves the generic string routines by:

  1. Parametrizing the internal routines (for instance, finding a zero
     in a word) so that each architecture can reimplement them without
     having to reimplement the whole routine.

  2. Vectorizing more string implementations (for instance strcpy
     and strcmp).

  3. Changing some implementations to use already-optimized ones where
     possible (strnlen and strchr).  This lets new ports focus on
     providing optimized implementations of only a handful of symbols
     (for instance memchr), whose improvements are then used by a
     larger set of routines.
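
As an illustration of item 1, the sketch below shows one way a word-level
"find zero" helper could be split out so that only the helper needs an
architecture override.  It is not the code from this series: word_t,
repeat_byte, word_has_zero, find_zero_byte and the
HAVE_ARCH_WORD_HAS_ZERO guard are hypothetical names chosen for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the word type used by the generic routines.  */
typedef unsigned long int word_t;

/* Replicate byte C into every byte of a word.  */
static inline word_t
repeat_byte (unsigned char c)
{
  return ((word_t) -1 / 0xff) * c;
}

/* Generic fallback.  An architecture with a dedicated instruction could
   define HAVE_ARCH_WORD_HAS_ZERO (a made-up macro) and supply its own
   word_has_zero without touching the callers.  */
#ifndef HAVE_ARCH_WORD_HAS_ZERO
static inline int
word_has_zero (word_t x)
{
  /* Nonzero if and only if at least one byte of X is zero.  */
  return ((x - repeat_byte (0x01)) & ~x & repeat_byte (0x80)) != 0;
}
#endif

/* Return the offset of the first zero byte in BUF[0..LEN), or LEN if
   there is none.  Whole words are skipped while they contain no zero;
   the remaining bytes are checked one at a time.  */
size_t
find_zero_byte (const void *buf, size_t len)
{
  const unsigned char *p = buf;
  size_t i = 0;

  assert (buf != NULL || len == 0);

  while (len - i >= sizeof (word_t))
    {
      word_t w;
      memcpy (&w, p + i, sizeof w);   /* Safe for any alignment.  */
      if (word_has_zero (w))
        break;
      i += sizeof w;
    }

  while (i < len && p[i] != 0)
    i++;
  return i;
}
```

The caller stays the same whichever word_has_zero is compiled in, which
is the point of parametrizing the helper rather than the whole routine.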

I checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multiarch (it covers both LE and BE
for 64 and 32 bits). I also checked the string routines on alpha, hppa,
and sh.

Changes since v8:
  * Change memrchr to use a vectorized load on the final string, instead
    of byte-by-byte reads.
  * Remove string-maskoff.h header.
  * Add string-repeat_bytes.h and string-shift.h.
  * Hook up the generic implementation on string tests.

Changes since v7:
  * Split string-fzc.h out of string-fzi.h, with all of the
    routines that are combinations of fza and fzi routines.
  * Fix missing find_t and shift_find() from alpha, arm, powerpc.
  * Use compiler builtins for arm and powerpc.
  * Define sh4 has_zero() via has_eq(), rather than the reverse.

Changes since v6:
  * Add find_t to handle the alpha way of comparing bytes (which returns
    a bit-mask instead of a byte-mask).
  * Fixed alpha string-fzi.h and added string-fza.h.
  * Renamed check_mask to shift_find.

Changes since v5:
  * Replace 'inline' with '__always_inline' macros.
  * Replace strchr implementation with a simpler one that calls
    strchrnul.
  * Add strchrnul suggested changes.
  * Add memchr suggested changes.
  * Added check_mask on string-maskoff.h.
  * Rebase and update Copyright years.
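
For readers unfamiliar with the strchr change listed above, a minimal
sketch of expressing strchr in terms of strchrnul follows.  The names
sketch_strchrnul and sketch_strchr are hypothetical, and the byte loop
only stands in for whatever optimized strchrnul a port provides.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an optimized strchrnul: return a pointer to the first
   occurrence of C in S, or to the terminating null byte if C does not
   occur.  */
static char *
sketch_strchrnul (const char *s, int c)
{
  unsigned char ch = (unsigned char) c;

  assert (s != NULL);
  while (*s != '\0' && (unsigned char) *s != ch)
    s++;
  return (char *) s;
}

/* strchr only differs from strchrnul when C is absent: it returns NULL
   instead of the terminator.  Searching for '\0' still returns the
   terminator, because the byte found there equals C.  */
static char *
sketch_strchr (const char *s, int c)
{
  char *r = sketch_strchrnul (s, c);
  return (unsigned char) *r == (unsigned char) c ? r : NULL;
}
```

With this shape, any speedup in strchrnul carries over to strchr
without a second optimized implementation.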

Changes since v4:
  * Removed __clz and __ctz in favor of count_leading_zero and
    count_trailing_zeros from longlong.h.
  * Use repeat_bytes more often.
  * Added a comment on strcmp final_cmp on why index_first_zero_ne
    cannot be used.
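
To make the role of a trailing-zero count and byte replication concrete,
here is a small sketch that locates the first zero byte in a word.  It
is an assumption-laden example, not the series' code: it uses the
GCC/Clang builtin __builtin_ctzl in place of count_trailing_zeros from
longlong.h, counts bytes from the least significant end (memory order on
little-endian targets only), and the names word_t, repeat_byte and
first_zero_index_lsb are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the word type used by the generic routines.  */
typedef unsigned long int word_t;

/* Replicate byte C into every byte of a word.  */
static inline word_t
repeat_byte (unsigned char c)
{
  return ((word_t) -1 / 0xff) * c;
}

/* Return the index, counted from the least significant byte, of the
   first zero byte in X.  X must contain at least one zero byte.  */
static inline unsigned int
first_zero_index_lsb (word_t x)
{
  /* The lowest set bit of MASK marks the first zero byte.  Bits above
     it may be set spuriously by borrow propagation, which is harmless
     here because only the lowest one is used.  */
  word_t mask = (x - repeat_byte (0x01)) & ~x & repeat_byte (0x80);

  assert (mask != 0);
  return (unsigned int) __builtin_ctzl (mask) / CHAR_BIT;
}
```

A big-endian variant would need a mask without the spurious high bits
and a leading-zero count instead.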

Changes since v3:
  * Rebased against master.
  * Dropped strcpy optimization.
  * Refactor strcmp implementation.
  * Some minor changes in comments.

Changes since v2:
  * Move string-fz{a,b,i} to its own patch.
  * Add an inline implementation for __builtin_c{l,t}z to avoid using
    compiler provided symbols.
  * Add a new header, string-maskoff.h, to handle unaligned accesses
    on some implementation.
  * Fixed strcmp on LE machines.
  * Added an unaligned strcpy variant for architectures that define
    _STRING_ARCH_unaligned.
  * Add SH string-fzb.h (which uses cmp/str instruction to find
    a zero in word).

Changes since v1:
  * Marked ChangeLog entries with [BZ #5806], as appropriate.
  * Reorganized the headers, so that armv6t2 and power6 need override
    as little as possible to use their (integer) zero detection insns.
  * Hopefully fixed all of the coding style issues.
  * Adjusted the memrchr algorithm as discussed.
  * Replaced the #ifdef STRRCHR etc that are used by the multiarch
    files.
  * Tested on i386, i686, x86_64 (verified this is unused), ppc64,
    ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
    aarch64, alpha (qemu) and hppa (qemu).

Adhemerval Zanella (16):
  Parameterize op_t from memcopy.h
  Add string vectorized find and detection functions
  string: Improve generic strlen
  string: Improve generic strnlen
  string: Improve generic strchr
  string: Improve generic strchrnul
  string: Improve generic strcmp
  string: Improve generic memchr
  string: Improve generic memrchr
  sh: Add string-fzb.h
  string: Hook up the default implementation on test-strlen
  string: Hook up the default implementation on test-strnlen
  string: Hook up the default implementation on test-strchr
  string: Hook up the default implementation on test-strcmp
  string: Hook up the default implementation on test-memchr
  string: Hook up the default implementation on test-memrchr

Richard Henderson (6):
  Parameterize OP_T_THRES from memcopy.h
  hppa: Add memcopy.h
  hppa: Add string-fzb.h and string-fzi.h
  alpha: Add string-fzb.h and string-fzi.h
  arm: Add string-fza.h
  powerpc: Add string-fza.h

 string/memchr.c                               | 178 +++++-----------
 string/memcmp.c                               |   4 -
 string/memrchr.c                              | 195 ++++--------------
 string/strchr.c                               | 159 +-------------
 string/strchrnul.c                            | 155 ++------------
 string/strcmp.c                               | 119 +++++++++--
 string/strlen.c                               |  92 ++-------
 string/strnlen.c                              | 137 +-----------
 string/test-memchr.c                          |  31 ++-
 string/test-memrchr.c                         |   7 +
 string/test-strchr.c                          |  47 +++--
 string/test-strcmp.c                          |  22 ++
 string/test-strlen.c                          |  31 ++-
 string/test-strnlen.c                         |  35 +++-
 sysdeps/alpha/string-fza.h                    |  61 ++++++
 sysdeps/alpha/string-fzb.h                    |  52 +++++
 sysdeps/alpha/string-fzi.h                    |  62 ++++++
 sysdeps/alpha/string-shift.h                  |  42 ++++
 sysdeps/arm/armv6t2/string-fza.h              |  68 ++++++
 sysdeps/generic/memcopy.h                     |  10 +-
 sysdeps/generic/string-extbyte.h              |  37 ++++
 sysdeps/generic/string-fza.h                  | 105 ++++++++++
 sysdeps/generic/string-fzb.h                  |  49 +++++
 sysdeps/generic/string-fzc.h                  |  91 ++++++++
 sysdeps/generic/string-fzi.h                  |  71 +++++++
 sysdeps/generic/string-opthr.h                |  25 +++
 sysdeps/generic/string-optype.h               |  24 +++
 sysdeps/generic/string-repeat_bytes.h         |  32 +++
 sysdeps/generic/string-shift.h                |  49 +++++
 sysdeps/hppa/memcopy.h                        |  42 ++++
 sysdeps/hppa/string-fzb.h                     |  70 +++++++
 sysdeps/hppa/string-fzc.h                     | 124 +++++++++++
 sysdeps/hppa/string-fzi.h                     |  63 ++++++
 sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
 sysdeps/i386/memcopy.h                        |   3 -
 sysdeps/i386/string-opthr.h                   |  25 +++
 sysdeps/m68k/memcopy.h                        |   3 -
 sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
 .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
 .../power4/multiarch/strchrnul-ppc32.c        |   4 -
 .../power4/multiarch/strnlen-ppc32.c          |  14 +-
 .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
 sysdeps/powerpc/string-fza.h                  |  72 +++++++
 sysdeps/s390/strchr-c.c                       |  11 +-
 sysdeps/s390/strchrnul-c.c                    |   2 -
 sysdeps/s390/strlen-c.c                       |  10 +-
 sysdeps/s390/strnlen-c.c                      |  14 +-
 sysdeps/sh/string-fzb.h                       |  55 +++++
 sysdeps/x86_64/x32/string-optype.h            |  24 +++
 49 files changed, 1657 insertions(+), 911 deletions(-)
 create mode 100644 sysdeps/alpha/string-fza.h
 create mode 100644 sysdeps/alpha/string-fzb.h
 create mode 100644 sysdeps/alpha/string-fzi.h
 create mode 100644 sysdeps/alpha/string-shift.h
 create mode 100644 sysdeps/arm/armv6t2/string-fza.h
 create mode 100644 sysdeps/generic/string-extbyte.h
 create mode 100644 sysdeps/generic/string-fza.h
 create mode 100644 sysdeps/generic/string-fzb.h
 create mode 100644 sysdeps/generic/string-fzc.h
 create mode 100644 sysdeps/generic/string-fzi.h
 create mode 100644 sysdeps/generic/string-opthr.h
 create mode 100644 sysdeps/generic/string-optype.h
 create mode 100644 sysdeps/generic/string-repeat_bytes.h
 create mode 100644 sysdeps/generic/string-shift.h
 create mode 100644 sysdeps/hppa/memcopy.h
 create mode 100644 sysdeps/hppa/string-fzb.h
 create mode 100644 sysdeps/hppa/string-fzc.h
 create mode 100644 sysdeps/hppa/string-fzi.h
 create mode 100644 sysdeps/i386/string-opthr.h
 create mode 100644 sysdeps/powerpc/string-fza.h
 create mode 100644 sysdeps/sh/string-fzb.h
 create mode 100644 sysdeps/x86_64/x32/string-optype.h

-- 
2.34.1



Thread overview: 32+ messages
2023-01-17 19:59 Adhemerval Zanella [this message]
2023-01-17 19:59 ` [PATCH v9 01/22] Parameterize op_t from memcopy.h Adhemerval Zanella
2023-01-17 19:59 ` [PATCH v9 02/22] Parameterize OP_T_THRES " Adhemerval Zanella
2023-01-17 19:59 ` [PATCH v9 03/22] Add string vectorized find and detection functions Adhemerval Zanella
2023-01-18  1:35   ` Richard Henderson
2023-01-18  1:54   ` Richard Henderson
2023-01-18 12:42     ` Adhemerval Zanella Netto
2023-01-18 19:24       ` Richard Henderson
2023-01-18 19:38         ` Adhemerval Zanella Netto
2023-01-17 19:59 ` [PATCH v9 04/22] string: Improve generic strlen Adhemerval Zanella
2023-01-17 19:59 ` [PATCH v9 05/22] string: Improve generic strnlen Adhemerval Zanella
2023-01-17 19:59 ` [PATCH v9 06/22] string: Improve generic strchr Adhemerval Zanella
2023-01-18  1:42   ` Richard Henderson
2023-01-17 19:59 ` [PATCH v9 07/22] string: Improve generic strchrnul Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 08/22] string: Improve generic strcmp Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 09/22] string: Improve generic memchr Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 10/22] string: Improve generic memrchr Adhemerval Zanella
2023-01-18  2:10   ` Richard Henderson
2023-01-17 20:00 ` [PATCH v9 11/22] hppa: Add memcopy.h Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 12/22] hppa: Add string-fzb.h and string-fzi.h Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 13/22] alpha: " Adhemerval Zanella
2023-01-18  1:58   ` Richard Henderson
2023-01-17 20:00 ` [PATCH v9 14/22] arm: Add string-fza.h Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 15/22] powerpc: " Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 16/22] sh: Add string-fzb.h Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 17/22] string: Hook up the default implementation on test-strlen Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 18/22] string: Hook up the default implementation on test-strnlen Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 19/22] string: Hook up the default implementation on test-strchr Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 20/22] string: Hook up the default implementation on test-strcmp Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 21/22] string: Hook up the default implementation on test-memchr Adhemerval Zanella
2023-01-17 20:00 ` [PATCH v9 22/22] string: Hook up the default implementation on test-memrchr Adhemerval Zanella
2023-01-18 18:51   ` Carlos O'Donell
