From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x530.google.com (mail-ed1-x530.google.com [IPv6:2a00:1450:4864:20::530]) by sourceware.org (Postfix) with ESMTPS id C445C3858D28 for ; Thu, 5 Jan 2023 22:53:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C445C3858D28 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x530.google.com with SMTP id i15so161190edf.2 for ; Thu, 05 Jan 2023 14:53:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=gGW2wySkxZ0KLjNLNQUfldTCgWipTfTGvohOTqUps2A=; b=lI776C6JzfT60eykLwGR+hg6wscG7x6HeU8ZT7vEnyAiGf/kDCPkPThiCHjgLpNQrR Yy+vQmKfYXJJJjInpeIT5mLL3dMsqACWO6q4JXq8AbcvbXcPd+133dDG+IHw4yD2ymb5 J69ewfriLinAjxiHT0VO8mTBp6LKedj80TSUNq6hGunO8fg50ylP2Q7wjTUbnnj9+HAn LRmBs4qw9t5AOzKCp7oVIrorsof40RrZm4c++cOOqcuy+gQ55qBEa96gvBQ0dIK0dccV XGHxz2NNG0z9KAkX0Ojcb+9H/R6iE8ObvzVqcHJ/DgXdNoJkbUH2LCKkgC/yhA8kIfqz v53Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=gGW2wySkxZ0KLjNLNQUfldTCgWipTfTGvohOTqUps2A=; b=yPWXCOV9UXNLAAFTnKq/GydIPRbP7etTS6ihDhbvPpX3ziEf53Et1pd1KlPQZE4NaN mD10LpEU/V69zGHjzBxdIfkIm2E4vPfgkBx3lLKD08K05oR0cbCdrIeR2rEAG+a5/I2x e8iZcehbbAXnkk19YN1EoOeoUBhxtzx64W4Xz9K7hDKUAfJbLrUVlSMdCA9Q62ENdXqu 9/17NK91Pr0COxhQu5jsk1sOQr4CPj1x0VrVfnf4qRLo+x2BS/qDYr+KI4I3yHvxrVFF 5G/KZ0nbvTc/ZG5fYhUVevPJrz8WLYAAsCS2SNTvRLgpkqH6oehrdzl2PbUAGyQxaxXS G9Fw== X-Gm-Message-State: AFqh2korTMVUJG/F45sbO4tZbafmx1rftnpgdKEQ61qFyopt1mqfWj2d dW3kDE1Kklzny0vmEf8o/bEE8PNGYEDiPagyJU4= X-Google-Smtp-Source: AMrXdXuqrFi4AZWIaqYasbYzdAwMJnP73U2QqVp7GIJB5MkP7clS9ACEVvxr+ZHw1R1jbVyIfPbMg4wDFHVT+RIa/bQ= X-Received: by 2002:a05:6402:708:b0:488:dd00:372a with SMTP id w8-20020a056402070800b00488dd00372amr2705447edx.258.1672959229359; Thu, 05 Jan 2023 14:53:49 -0800 (PST) MIME-Version: 1.0 References: <20220919195920.956393-1-adhemerval.zanella@linaro.org> <20220919195920.956393-5-adhemerval.zanella@linaro.org> In-Reply-To: <20220919195920.956393-5-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Thu, 5 Jan 2023 14:53:36 -0800 Message-ID: Subject: Re: [PATCH v5 04/17] Add string vectorized find and detection functions To: Adhemerval Zanella Cc: libc-alpha@sourceware.org, Richard Henderson Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,URI_DOTEDU autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Sep 19, 2022 at 1:02 PM Adhemerval Zanella via Libc-alpha wrote: > > This patch adds generic string find and detection meant to be used in > generic vectorized string implementation. The idea is to decompose the > basic string operation so each architecture can reimplement if it > provides any specialized hardware instruction. > > The 'string-fza.h' provides zero byte detection functions (find_zero_low, > find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, > find_zero_ne_low, and find_zero_ne_all). They are used on both functions > provided by 'string-fzb.h' and 'string-fzi'. > > The 'string-fzb.h' provides boolean zero byte detection with the > functions: > > - has_zero: determine if any byte within a word is zero. > - has_eq: determine byte equality between two words. > - has_zero_eq: determine if any byte within a word is zero along with > byte equality between two words. > > The 'string-fzi.h' provides zero byte detection along with its positions: > > - index_first_zero: return index of first zero byte within a word. > - index_first_eq: return index of first byte different between two words. > - index_first_zero_eq: return index of first zero byte within a word or > first byte different between two words. > - index_first_zero_ne: return index of first zero byte within a word or > first byte equal between two words. > - index_last_zero: return index of last zero byte within a word. > - index_last_eq: return index of last byte different between two words. > > Co-authored-by: Richard Henderson > --- > sysdeps/generic/string-extbyte.h | 37 ++++++++++ > sysdeps/generic/string-fza.h | 106 +++++++++++++++++++++++++++ > sysdeps/generic/string-fzb.h | 49 +++++++++++++ > sysdeps/generic/string-fzi.h | 120 +++++++++++++++++++++++++++++++ > 4 files changed, 312 insertions(+) > create mode 100644 sysdeps/generic/string-extbyte.h > create mode 100644 sysdeps/generic/string-fza.h > create mode 100644 sysdeps/generic/string-fzb.h > create mode 100644 sysdeps/generic/string-fzi.h > > diff --git a/sysdeps/generic/string-extbyte.h b/sysdeps/generic/string-extbyte.h > new file mode 100644 > index 0000000000..c8fecd259f > --- /dev/null > +++ b/sysdeps/generic/string-extbyte.h > @@ -0,0 +1,37 @@ > +/* Extract by from memory word. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_EXTBYTE_H > +#define _STRING_EXTBYTE_H 1 > + > +#include > +#include > +#include > + > +/* Extract the byte at index IDX from word X, with index 0 being the > + least significant byte. */ > +static inline unsigned char > +extractbyte (op_t x, unsigned int idx) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + return x >> (idx * CHAR_BIT); > + else > + return x >> (sizeof (x) - 1 - idx) * CHAR_BIT; > +} > + > +#endif /* _STRING_EXTBYTE_H */ > diff --git a/sysdeps/generic/string-fza.h b/sysdeps/generic/string-fza.h > new file mode 100644 > index 0000000000..54be34e5f0 > --- /dev/null > +++ b/sysdeps/generic/string-fza.h > @@ -0,0 +1,106 @@ > +/* Basic zero byte detection. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_FZA_H > +#define _STRING_FZA_H 1 > + > +#include > +#include > +#include > + > +/* This function returns non-zero if any byte in X is zero. > + More specifically, at least one bit set within the least significant > + byte that was zero; other bytes within the word are indeterminate. */ > +static inline op_t > +find_zero_low (op_t x) > +{ > + /* This expression comes from > + https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord > + Subtracting 1 sets 0x80 in a byte that was 0; anding ~x clears > + 0x80 in a byte that was >= 128; anding 0x80 isolates that test bit. */ > + op_t lsb = repeat_bytes (0x01); > + op_t msb = repeat_bytes (0x80); > + return (x - lsb) & ~x & msb; > +} > + > +/* This function returns at least one bit set within every byte of X that > + is zero. The result is exact in that, unlike find_zero_low, all bytes > + are determinate. This is usually used for finding the index of the > + most significant byte that was zero. */ > +static inline op_t > +find_zero_all (op_t x) > +{ > + /* For each byte, find not-zero by > + (0) And 0x7f so that we cannot carry between bytes, > + (1) Add 0x7f so that non-zero carries into 0x80, > + (2) Or in the original byte (which might have had 0x80 set). > + Then invert and mask such that 0x80 is set iff that byte was zero. */ > + op_t m = ((op_t)-1 / 0xff) * 0x7f; Use repeat_byte here? > + return ~(((x & m) + m) | x | m); > +} > + > +/* With similar caveats, identify bytes that are equal between X1 and X2. */ > +static inline op_t > +find_eq_low (op_t x1, op_t x2) > +{ > + return find_zero_low (x1 ^ x2); > +} > + > +static inline op_t > +find_eq_all (op_t x1, op_t x2) > +{ > + return find_zero_all (x1 ^ x2); > +} > + > +/* With similar caveats, identify zero bytes in X1 and bytes that are > + equal between in X1 and X2. */ > +static inline op_t > +find_zero_eq_low (op_t x1, op_t x2) > +{ > + return find_zero_low (x1) | find_zero_low (x1 ^ x2); > +} > + > +static inline op_t > +find_zero_eq_all (op_t x1, op_t x2) > +{ > + return find_zero_all (x1) | find_zero_all (x1 ^ x2); > +} > + > +/* With similar caveats, identify zero bytes in X1 and bytes that are > + not equal between in X1 and X2. */ > +static inline op_t > +find_zero_ne_low (op_t x1, op_t x2) > +{ > + op_t m = repeat_bytes (0x7f); > + op_t eq = x1 ^ x2; > + op_t nz1 = (x1 + m) | x1; /* msb set if byte not zero. */ > + op_t ne2 = (eq + m) | eq; /* msb set if byte not equal. */ > + return (ne2 | ~nz1) & ~m; /* msb set if x1 zero or x2 not equal. */ > +} > + > +static inline op_t > +find_zero_ne_all (op_t x1, op_t x2) > +{ > + op_t m = repeat_bytes (0x7f); > + op_t eq = x1 ^ x2; > + op_t nz1 = ((x1 & m) + m) | x1; > + op_t ne2 = ((eq & m) + m) | eq; > + return (ne2 | ~nz1) & ~m; > +} > + > +#endif /* _STRING_FZA_H */ > diff --git a/sysdeps/generic/string-fzb.h b/sysdeps/generic/string-fzb.h > new file mode 100644 > index 0000000000..f1c0ae0922 > --- /dev/null > +++ b/sysdeps/generic/string-fzb.h > @@ -0,0 +1,49 @@ > +/* Zero byte detection, boolean. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_FZB_H > +#define _STRING_FZB_H 1 > + > +#include > +#include > + > +/* Determine if any byte within X is zero. This is a pure boolean test. */ > + > +static inline _Bool > +has_zero (op_t x) > +{ > + return find_zero_low (x) != 0; > +} > + > +/* Likewise, but for byte equality between X1 and X2. */ > + > +static inline _Bool > +has_eq (op_t x1, op_t x2) > +{ > + return find_eq_low (x1, x2) != 0; > +} > + > +/* Likewise, but for zeros in X1 and equal bytes between X1 and X2. */ > + > +static inline _Bool > +has_zero_eq (op_t x1, op_t x2) > +{ > + return find_zero_eq_low (x1, x2); > +} > + > +#endif /* _STRING_FZB_H */ > diff --git a/sysdeps/generic/string-fzi.h b/sysdeps/generic/string-fzi.h > new file mode 100644 > index 0000000000..888e1b8baa > --- /dev/null > +++ b/sysdeps/generic/string-fzi.h > @@ -0,0 +1,120 @@ > +/* Zero byte detection; indexes. Generic C version. > + Copyright (C) 2022 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRING_FZI_H > +#define _STRING_FZI_H 1 > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* A subroutine for the index_zero functions. Given a test word C, return > + the (memory order) index of the first byte (in memory order) that is > + non-zero. */ > +static inline unsigned int > +index_first_ (op_t c) > +{ > + int r; > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + count_trailing_zeros (r, c); > + else > + count_leading_zeros (r, c); > + return r / CHAR_BIT; > +} > + > +/* Similarly, but return the (memory order) index of the last byte that is > + non-zero. */ > +static inline unsigned int > +index_last_ (op_t c) > +{ > + int r; > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + count_leading_zeros (r, c); > + else > + count_trailing_zeros (r, c); > + return sizeof (op_t) - 1 - (r / CHAR_BIT); > +} > + > +/* Given a word X that is known to contain a zero byte, return the index of > + the first such within the word in memory order. */ > +static inline unsigned int > +index_first_zero (op_t x) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x = find_zero_low (x); > + else > + x = find_zero_all (x); > + return index_first_ (x); > +} > + > +/* Similarly, but perform the search for byte equality between X1 and X2. */ > +static inline unsigned int > +index_first_eq (op_t x1, op_t x2) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x1 = find_eq_low (x1, x2); > + else > + x1 = find_eq_all (x1, x2); > + return index_first_ (x1); > +} > + > +/* Similarly, but perform the search for zero within X1 or equality between > + X1 and X2. */ > +static inline unsigned int > +index_first_zero_eq (op_t x1, op_t x2) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x1 = find_zero_eq_low (x1, x2); > + else > + x1 = find_zero_eq_all (x1, x2); > + return index_first_ (x1); > +} > + > +/* Similarly, but perform the search for zero within X1 or inequality between > + X1 and X2. */ > +static inline unsigned int > +index_first_zero_ne (op_t x1, op_t x2) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x1 = find_zero_ne_low (x1, x2); > + else > + x1 = find_zero_ne_all (x1, x2); > + return index_first_ (x1); > +} > + > +/* Similarly, but search for the last zero within X. */ > +static inline unsigned int > +index_last_zero (op_t x) > +{ > + if (__BYTE_ORDER == __LITTLE_ENDIAN) > + x = find_zero_all (x); > + else > + x = find_zero_low (x); > + return index_last_ (x); > +} > + > +static inline unsigned int > +index_last_eq (op_t x1, op_t x2) > +{ > + return index_last_zero (x1 ^ x2); > +} > + > +#endif /* STRING_FZI_H */ > -- > 2.34.1 >