From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com [IPv6:2a00:1450:4864:20::533]) by sourceware.org (Postfix) with ESMTPS id 047143857B80 for ; Sat, 3 Sep 2022 03:20:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 047143857B80 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x533.google.com with SMTP id b44so4912582edf.9 for ; Fri, 02 Sep 2022 20:20:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=LSoOxTNWPCAsqAj3ZSOih8o+erwNEF0eqQ49n6zCDGU=; b=Ar0uG+cSuejys6qlQxEFheYv+U9oxQt8mxzjNF158/28N5HAL0dD2TWMAu4Bt67y+z WL+HBO3b/T/yDjFu2YdcS8C+dgqjtYowQAQRSufoBNp/+stw8mKIEjS44bPsOiMsO0Wv vdWlVKtn8bMrwDV2i4EjUEbEV23lOfalyryIwBEEyP1eJ5qYqMslOv16Ehr4iUklZ4pv o4l1S3OZxehYtYNCD+JivKH8o5iDR6Wto1vUbD6JKAPjynj8FtjbfapwwIryoMPECre+ 5uA5tZc+1aeheNiU97ERPuMC29P9j/w8X7/qzAWTILk4av7J6wIm9Os2smnfc8nxdfes j5kA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=LSoOxTNWPCAsqAj3ZSOih8o+erwNEF0eqQ49n6zCDGU=; b=GWKPs+BOGQJWipMHXLP/CN8WDFAPkLXDqi/1LJgYeIapaYQJXXYW/Tb38k6i3nUBGL WgFXjVLAVq3ZmiECfIWeKlr4w58qjnW79H69Mpqn2O/nlldxPqoHVmvDMNOJVpG5DjkT oSetyqohpcNEjacfI/uJ0bj+bQvn22vQnBF07O4NWtyac2FhBIOCza5CKUh+V6Nn/DUl Mjz2tHu2YLEgGprTJZ2EsKdeyPNMtq8YyHzPVHvJykzNBOk96rfCjezXkVpwplwi3+FA 7A424VoGvPo+1hUPHG/rXz8ZQk/Oc+oTCizIoyHU4P0ygZxi4qIInAg2jDtDqDE2qg8P rZBg== X-Gm-Message-State: ACgBeo2yf2Lvry74TC1q4hg50cPSpes6BEtBH9pUTS08VJYqzkqNLSUb I8F41E8Kf+ejWtzVxsT5WBFwT6LvrVVXPxlObC4= X-Google-Smtp-Source: AA6agR4nIRxYZ7AUKZd1IdXhp4fbn8xfFiBGkWsb+7N+evu+Hxwvph5tr/BMPP4IbHadzz1hGR7DhDxDVIjEJEahpgU= X-Received: by 2002:a05:6402:d05:b0:425:b7ab:776e with SMTP id eb5-20020a0564020d0500b00425b7ab776emr36778779edb.142.1662175213329; Fri, 02 Sep 2022 20:20:13 -0700 (PDT) MIME-Version: 1.0 References: <20220902203940.2385967-1-adhemerval.zanella@linaro.org> <20220902203940.2385967-5-adhemerval.zanella@linaro.org> In-Reply-To: <20220902203940.2385967-5-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Fri, 2 Sep 2022 20:20:01 -0700 Message-ID: Subject: Re: [PATCH 04/17] Add string vectorized find and detection functions To: Adhemerval Zanella Cc: GNU C Library , Richard Henderson , Joseph Myers , caiyinyu Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,URI_DOTEDU autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Sep 2, 2022 at 1:42 PM Adhemerval Zanella via Libc-alpha wrote: > > This patch adds generic string find and detection meant to be used in > generic vectorized string implementation. The idea is to decompose the > basic string operation so each architecture can reimplement if it > provides any specialized hardware instruction. > > The 'string-fza.h' provides zero byte detection functions (find_zero_low, > find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, > find_zero_ne_low, and find_zero_ne_all). They are used on both functions > provided by 'string-fzb.h' and 'string-fzi'. > > The 'string-fzb.h' provides boolean zero byte detection with the > functions: > > - has_zero: determine if any byte within a word is zero. > - has_eq: determine byte equality between two words. > - has_zero_eq: determine if any byte within a word is zero along with > byte equality between two words. > > The 'string-fzi.h' provides zero byte detection along with its positions: > > - index_first_zero: return index of first zero byte within a word. > - index_first_eq: return index of first byte different between two words. > - index_first_zero_eq: return index of first zero byte within a word or > first byte different between two words. > - index_first_zero_ne: return index of first zero byte within a word or > first byte equal between two words. > - index_last_zero: return index of last zero byte within a word. > - index_last_eq: return index of last byte different between two words. > > Also, to avoid libcalls in the '__builtin_c{t,l}z{l}' calls (which may > add performance degradation), inline implementation based on De Bruijn > sequences are added (enabled by a configure check). > > Co-authored-by: Richard Henderson > --- > config.h.in | 8 ++ > configure | 54 ++++++++ > configure.ac | 34 +++++ > sysdeps/generic/string-extbyte.h | 37 ++++++ > sysdeps/generic/string-fza.h | 106 ++++++++++++++++ > sysdeps/generic/string-fzb.h | 49 ++++++++ > sysdeps/generic/string-fzi.h | 208 +++++++++++++++++++++++++++++++ > 7 files changed, 496 insertions(+) > create mode 100644 sysdeps/generic/string-extbyte.h > create mode 100644 sysdeps/generic/string-fza.h > create mode 100644 sysdeps/generic/string-fzb.h > create mode 100644 sysdeps/generic/string-fzi.h > > diff --git a/config.h.in b/config.h.in > index 43d32518ab..8f0540d85a 100644 > --- a/config.h.in > +++ b/config.h.in > @@ -202,6 +202,14 @@ > /* An integer used to scale the timeout of test programs. */ > #define TIMEOUTFACTOR 1 > > +/* If compiler supports __builtin_ctz{l} without any external dependencies > + (libgcc for instance). */ > +#define HAVE_BUILTIN_CTZ 0 > + > +/* If compiler supports __builtin_clz{l} without any external dependencies > + (libgcc for instance). */ > +#define HAVE_BUILTIN_CLZ 0 > + > /* > */ > > diff --git a/configure b/configure > index ff2c406b3b..99cdec1bef 100755 > --- a/configure > +++ b/configure > @@ -6726,6 +6726,60 @@ if test $libc_cv_builtin_trap = yes; then > > fi > > +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_ctz{l} with no external dependencies" >&5 > +$as_echo_n "checking for __builtin_ctz{l} with no external dependencies... " >&6; } > +if ${libc_cv_builtin_ctz+:} false; then : > + $as_echo_n "(cached) " >&6 > +else > + libc_cv_builtin_ctz=yes > +echo 'int foo (unsigned long x) { return __builtin_ctz (x); }' > conftest.c > +if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS -S conftest.c -o conftest.s 1>&5' > + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 > + (eval $ac_try) 2>&5 > + ac_status=$? > + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 > + test $ac_status = 0; }; }; then > + if grep '__ctz[s,d]i2' conftest.s > /dev/null; then > + libc_cv_builtin_ctz=no > + fi > +fi > +rm -f conftest.c conftest.s > + > +fi > +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_builtin_ctz" >&5 > +$as_echo "$libc_cv_builtin_ctz" >&6; } > +if test x$libc_cv_builtin_ctz = xyes; then > + $as_echo "#define HAVE_BUILTIN_CTZ 1" >>confdefs.h > + > +fi > + > +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __builtin_clz{l} with no external dependencies" >&5 > +$as_echo_n "checking for __builtin_clz{l} with no external dependencies... " >&6; } > +if ${libc_cv_builtin_clz+:} false; then : > + $as_echo_n "(cached) " >&6 > +else > + libc_cv_builtin_clz=yes > +echo 'int foo (unsigned long x) { return __builtin_clz (x); }' > conftest.c > +if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS -S conftest.c -o conftest.s 1>&5' > + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5 > + (eval $ac_try) 2>&5 > + ac_status=$? > + $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 > + test $ac_status = 0; }; }; then > + if grep '__clz[s,d]i2' conftest.s > /dev/null; then > + libc_cv_builtin_clz=no > + fi > +fi > +rm -f conftest.c conftest.s > + > +fi > +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libc_cv_builtin_clz" >&5 > +$as_echo "$libc_cv_builtin_clz" >&6; } > +if test x$libc_cv_builtin_clz = xyes; then > + $as_echo "#define HAVE_BUILTIN_CLZ 1" >>confdefs.h > + > +fi > + > ac_ext=cpp > ac_cpp='$CXXCPP $CPPFLAGS' > ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' > diff --git a/configure.ac b/configure.ac > index eb5bc6a131..66e02a0566 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -1618,6 +1618,40 @@ if test $libc_cv_builtin_trap = yes; then > AC_DEFINE([HAVE_BUILTIN_TRAP]) > fi > > +AC_CACHE_CHECK(for __builtin_ctz{l} with no external dependencies, > + libc_cv_builtin_ctz, [dnl > +libc_cv_builtin_ctz=yes > +echo 'int foo (unsigned long x) { return __builtin_ctz (x); }' > conftest.c > +if AC_TRY_COMMAND(${CC-cc} $CFLAGS $CPPFLAGS -S conftest.c -o conftest.s 1>&AS_MESSAGE_LOG_FD); then > +changequote(,)dnl > + if grep '__ctz[s,d]i2' conftest.s > /dev/null; then > + libc_cv_builtin_ctz=no > + fi > +changequote([,])dnl > +fi > +rm -f conftest.c conftest.s > +]) > +if test x$libc_cv_builtin_ctz = xyes; then > + AC_DEFINE(HAVE_BUILTIN_CTZ) > +fi > + > +AC_CACHE_CHECK(for __builtin_clz{l} with no external dependencies, > + libc_cv_builtin_clz, [dnl > +libc_cv_builtin_clz=yes > +echo 'int foo (unsigned long x) { return __builtin_clz (x); }' > conftest.c > +if AC_TRY_COMMAND(${CC-cc} $CFLAGS $CPPFLAGS -S conftest.c -o conftest.s 1>&AS_MESSAGE_LOG_FD); then > +changequote(,)dnl > + if grep '__clz[s,d]i2' conftest.s > /dev/null; then > + libc_cv_builtin_clz=no > + fi > +changequote([,])dnl > +fi > +rm -f conftest.c conftest.s > +]) > +if test x$libc_cv_builtin_clz = xyes; then > + AC_DEFINE(HAVE_BUILTIN_CLZ) > +fi > + > dnl C++ feature tests. > AC_LANG_PUSH([C++]) > > diff --git a/sysdeps/generic/string-extbyte.h b/sysdeps/generic/string-extbyte.h > new file mode 100644 > index 0000000000..c8fecd259f > --- /dev/null > +++ b/sysdeps/generic/string-extbyte.h > @@ -0,0 +1,37 @@ > +/* Extract by from memory word. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_EXTBYTE_H > +#define _STRING_EXTBYTE_H 1 > + > +#include > +#include > +#include > + > +/* Extract the byte at index IDX from word X, with index 0 being the > + least significant byte. */ > +static inline unsigned char > +extractbyte (op_t x, unsigned int idx) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + return x >> (idx * CHAR_BIT); > + else > + return x >> (sizeof (x) - 1 - idx) * CHAR_BIT; > +} > + > +#endif /* _STRING_EXTBYTE_H */ > diff --git a/sysdeps/generic/string-fza.h b/sysdeps/generic/string-fza.h > new file mode 100644 > index 0000000000..8470437824 > --- /dev/null > +++ b/sysdeps/generic/string-fza.h > @@ -0,0 +1,106 @@ > +/* Basic zero byte detection. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_FZA_H > +#define _STRING_FZA_H 1 > + > +#include > +#include > +#include > + > +/* This function returns non-zero if any byte in X is zero. > + More specifically, at least one bit set within the least significant > + byte that was zero; other bytes within the word are indeterminate. */ > +static inline op_t > +find_zero_low (op_t x) > +{ > + /* This expression comes from > + https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord > + Subtracting 1 sets 0x80 in a byte that was 0; anding ~x clears > + 0x80 in a byte that was >= 128; anding 0x80 isolates that test bit. */ > + op_t lsb = repeat_bytes (0x01); > + op_t msb = repeat_bytes (0x80); > + return (x - lsb) & ~x & msb; > +} > + > +/* This function returns at least one bit set within every byte of X that > + is zero. The result is exact in that, unlike find_zero_low, all bytes > + are determinate. This is usually used for finding the index of the > + most significant byte that was zero. */ > +static inline op_t > +find_zero_all (op_t x) > +{ > + /* For each byte, find not-zero by > + (0) And 0x7f so that we cannot carry between bytes, > + (1) Add 0x7f so that non-zero carries into 0x80, > + (2) Or in the original byte (which might have had 0x80 set). > + Then invert and mask such that 0x80 is set iff that byte was zero. */ > + op_t m = ((op_t)-1 / 0xff) * 0x7f; > + return ~(((x & m) + m) | x | m); > +} > + > +/* With similar caveats, identify bytes that are equal between X1 and X2. */ > +static inline op_t > +find_eq_low (op_t x1, op_t x2) > +{ > + return find_zero_low (x1 ^ x2); > +} > + > +static inline op_t > +find_eq_all (op_t x1, op_t x2) > +{ > + return find_zero_all (x1 ^ x2); > +} > + > +/* With similar caveats, identify zero bytes in X1 and bytes that are > + equal between in X1 and X2. */ > +static inline op_t > +find_zero_eq_low (op_t x1, op_t x2) > +{ > + return find_zero_low (x1) | find_zero_low (x1 ^ x2); > +} > + > +static inline op_t > +find_zero_eq_all (op_t x1, op_t x2) > +{ > + return find_zero_all (x1) | find_zero_all (x1 ^ x2); > +} > + > +/* With similar caveats, identify zero bytes in X1 and bytes that are > + not equal between in X1 and X2. */ > +static inline op_t > +find_zero_ne_low (op_t x1, op_t x2) > +{ > + op_t m = ((op_t)-1 / 0xff) * 0x7f; Can this use repeat_bytes? > + op_t eq = x1 ^ x2; > + op_t nz1 = (x1 + m) | x1; /* msb set if byte not zero. */ > + op_t ne2 = (eq + m) | eq; /* msb set if byte not equal. */ > + return (ne2 | ~nz1) & ~m; /* msb set if x1 zero or x2 not equal. */ > +} > + > +static inline op_t > +find_zero_ne_all (op_t x1, op_t x2) > +{ > + op_t m = ((op_t)-1 / 0xff) * 0x7f; Likewise. Elsewhere too. > + op_t eq = x1 ^ x2; > + op_t nz1 = ((x1 & m) + m) | x1; > + op_t ne2 = ((eq & m) + m) | eq; > + return (ne2 | ~nz1) & ~m; > +} > + > +#endif /* _STRING_FZA_H */ > diff --git a/sysdeps/generic/string-fzb.h b/sysdeps/generic/string-fzb.h > new file mode 100644 > index 0000000000..f1c0ae0922 > --- /dev/null > +++ b/sysdeps/generic/string-fzb.h > @@ -0,0 +1,49 @@ > +/* Zero byte detection, boolean. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_FZB_H > +#define _STRING_FZB_H 1 > + > +#include > +#include > + > +/* Determine if any byte within X is zero. This is a pure boolean test. */ > + > +static inline _Bool > +has_zero (op_t x) > +{ > + return find_zero_low (x) != 0; > +} > + > +/* Likewise, but for byte equality between X1 and X2. */ > + > +static inline _Bool > +has_eq (op_t x1, op_t x2) > +{ > + return find_eq_low (x1, x2) != 0; > +} > + > +/* Likewise, but for zeros in X1 and equal bytes between X1 and X2. */ > + > +static inline _Bool > +has_zero_eq (op_t x1, op_t x2) > +{ > + return find_zero_eq_low (x1, x2); > +} > + > +#endif /* _STRING_FZB_H */ > diff --git a/sysdeps/generic/string-fzi.h b/sysdeps/generic/string-fzi.h > new file mode 100644 > index 0000000000..8c6b6dc3d8 > --- /dev/null > +++ b/sysdeps/generic/string-fzi.h > @@ -0,0 +1,208 @@ > +/* Zero byte detection; indexes. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_FZI_H > +#define _STRING_FZI_H 1 > + > +#include > +#include > +#include > + > +/* An improved bitscan routine, multiplying the De Bruijn sequence with a > + 0-1 mask separated by the least significant one bit of a scanned integer > + or bitboard [1]. > + > + [1] https://chessprogramming.wikispaces.com/Kim+Walisch */ > +static inline unsigned int > +index_access (const op_t i) > +{ > + static const char index[] = > + { > +# if __WORDSIZE == 64 > + 0, 47, 1, 56, 48, 27, 2, 60, > + 57, 49, 41, 37, 28, 16, 3, 61, > + 54, 58, 35, 52, 50, 42, 21, 44, > + 38, 32, 29, 23, 17, 11, 4, 62, > + 46, 55, 26, 59, 40, 36, 15, 53, > + 34, 51, 20, 43, 31, 22, 10, 45, > + 25, 39, 14, 33, 19, 30, 9, 24, > + 13, 18, 8, 12, 7, 6, 5, 63 > +# else > + 0, 9, 1, 10, 13, 21, 2, 29, > + 11, 14, 16, 18, 22, 25, 3, 30, > + 8, 12, 20, 28, 15, 17, 24, 7, > + 19, 27, 23, 6, 26, 5, 4, 31 > +# endif > + }; > + return index[i]; > +} > + > +/* For architectures which only provides __builtin_clz{l} (HAVE_BUILTIN_CLZ) > + and/or __builtin_ctz{l} (HAVE_BUILTIN_CTZ) which uses external libcalls > + (for intance __c{l,t}z{s,d}i2 from libgcc) the following wrapper provides > + inline implementation for both count leading zeros and count trailing > + zeros using branchless computation. As for builtin, if x is 0 the > + result is undefined.*/ > +static inline unsigned int > +__ctz (op_t x) > +{ > +#if !HAVE_BUILTIN_CTZ > + op_t i; > +# if __WORDSIZE == 64 > + i = (x ^ (x - 1)) * 0x03F79D71B4CB0A89ull >> 58; > +# else > + i = (x ^ (x - 1)) * 0x07C4ACDDU >> 27; > +# endif > + return index_access (i); > +#else > + if (sizeof (op_t) == sizeof (long int)) > + return __builtin_ctzl (x); > + else > + return __builtin_ctzll (x); > +#endif > +}; > + > +static inline unsigned int > +__clz (op_t x) > +{ > +#if !HAVE_BUILTIN_CLZ > + unsigned r; > + op_t i; > + > + x |= x >> 1; > + x |= x >> 2; > + x |= x >> 4; > + x |= x >> 8; > + x |= x >> 16; > +# if __WORDSIZE == 64 > + x |= x >> 32; > + i = x * 0x03F79D71B4CB0A89ull >> 58; > +# else > + i = x * 0x07C4ACDDU >> 27; > +# endif > + r = index_access (i); > + return r ^ (sizeof (op_t) * CHAR_BIT - 1); > +#else > + if (sizeof (op_t) == sizeof (long int)) > + return __builtin_clzl (x); > + else > + return __builtin_clzll (x); > +#endif > +} > + > +/* A subroutine for the index_zero functions. Given a test word C, return > + the (memory order) index of the first byte (in memory order) that is > + non-zero. */ > +static inline unsigned int > +index_first_ (op_t c) > +{ > + _Static_assert (sizeof (op_t) == sizeof (long int) > + || sizeof (op_t) == sizeof (long long int), > + "Unhandled word size"); > + > + unsigned r; > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + r = __ctz (c); > + else > + r = __clz (c); > + return r / CHAR_BIT; > +} > + > +/* Similarly, but return the (memory order) index of the last byte that is > + non-zero. */ > +static inline unsigned int > +index_last_ (op_t c) > +{ > + _Static_assert (sizeof (op_t) == sizeof (long int) > + || sizeof (op_t) == sizeof (long long int), > + "Unhandled word size"); > + > + unsigned r; > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + r = __clz (c); > + else > + r = __ctz (c); > + return sizeof (op_t) - 1 - (r / CHAR_BIT); > +} > + > +/* Given a word X that is known to contain a zero byte, return the > + index of the first such within the word in memory order. */ > + > +static inline unsigned int > +index_first_zero (op_t x) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x = find_zero_low (x); > + else > + x = find_zero_all (x); > + return index_first_ (x); > +} > + > +/* Similarly, but perform the search for byte equality between X1 and X2. */ > +static inline unsigned int > +index_first_eq (op_t x1, op_t x2) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x1 = find_eq_low (x1, x2); > + else > + x1 = find_eq_all (x1, x2); > + return index_first_ (x1); > +} > + > +/* Similarly, but perform the search for zero within X1 or equality between > + X1 and X2. */ > +static inline unsigned int > +index_first_zero_eq (op_t x1, op_t x2) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x1 = find_zero_eq_low (x1, x2); > + else > + x1 = find_zero_eq_all (x1, x2); > + return index_first_ (x1); > +} > + > +/* Similarly, but perform the search for zero within X1 or > + inequality between X1 and X2. */ > +static inline unsigned int > +index_first_zero_ne (op_t x1, op_t x2) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x1 = find_zero_ne_low (x1, x2); > + else > + x1 = find_zero_ne_all (x1, x2); > + return index_first_ (x1); > +} > + > +/* Similarly, but search for the last zero within X. */ > +static inline unsigned int > +index_last_zero (op_t x) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x = find_zero_all (x); > + else > + x = find_zero_low (x); > + return index_last_ (x); > +} > + > +static inline unsigned int > +index_last_eq (op_t x1, op_t x2) > +{ > + return index_last_zero (x1 ^ x2); > +} > + > +#endif /* STRING_FZI_H */ > -- > 2.34.1 >