From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb34.google.com (mail-yb1-xb34.google.com [IPv6:2607:f8b0:4864:20::b34]) by sourceware.org (Postfix) with ESMTPS id 74A083858D37 for ; Wed, 1 Feb 2023 19:42:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 74A083858D37 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yb1-xb34.google.com with SMTP id 129so23770690ybb.0 for ; Wed, 01 Feb 2023 11:42:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=wWMkCVV99kodvBVMVK8S5ptr364kWKfZuyeXK1oRAVQ=; b=K1Pl/JEUrlbVYatPigm0Shff7uBAwmHPrv6jBDSVhNFe+6t89xVo4QP9su3azNYUSk mMBiHOfzM4DbkRVfh09R2Ndc8VIVIferzTef2rIVZBBOuHQ1xhhlatFN1YMzOsw+14zd JQ4JSayogfJ+Uyf2SEkqEYfTeavkrVp95LJImV0uhphrDl5R0F3qCAm6orfCFTqab3m8 PcrodG4d3sBR0iBVFCB9qUZjDwtw8uP4MVDEuVk+NGn5RNg+MB11GvCgN85Y79PRFpjG UG6XE81N2zS1HaqhhVS4CGEluoAmp70h7lZVWHb8JEjpHULOoNNE0Tu+32N1By3RpyC/ 8YpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=wWMkCVV99kodvBVMVK8S5ptr364kWKfZuyeXK1oRAVQ=; b=3IDRzNB429lL8AW5ZB5QD3Z9FlwxJAhIjNFlH3Vb7zOOv+WEdp0QNguu35zieDjDVt Y8KP5GXARp5P0Ghx1mnOy6l7VGk+oRBKyKnJ+E9ky2a4umdZ9XY+gl0EWnFqOpOnTc4F XVQvPobUvmaUCDlBYLkTcp4XT3T2eiRmZt8FXo6KS2W03R4B/2LeS1Mcuk6bNalz+PLQ /VoYSZQZ29r4cZ2+BX90YfwKFKBr26poYvQKfi2E2eaUpO7tYmlpsFZY//LOuDK65/Vf 4mk7qPthxVSgWTxq5u712ZNFaAHtq5V8W8VgYwmdxgo2WKkEixVAnjGH6lL4oV1grUCb y6Ug== X-Gm-Message-State: AO0yUKVxku9My7aYwpsAEU3jPKowdLlTu3/rzj6b1TQa71yGFde2zHfZ GRRCAwDG+k53hxO9fk2AfyE7u883m2lAX/472rI= X-Google-Smtp-Source: AK7set/JTf2gPmd7MVEPDe5i8BmdAf6mmrOzHYtqiLV6pPnantkaQTTko0qimyIz6IaVHKnNKkCu2hsuHH32ZZpowvA= X-Received: by 2002:a25:aace:0:b0:802:e757:97cc with SMTP id t72-20020a25aace000000b00802e75797ccmr507969ybi.472.1675280573730; Wed, 01 Feb 2023 11:42:53 -0800 (PST) MIME-Version: 1.0 References: <20230201170406.303978-1-adhemerval.zanella@linaro.org> <20230201170406.303978-10-adhemerval.zanella@linaro.org> In-Reply-To: <20230201170406.303978-10-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Wed, 1 Feb 2023 13:42:41 -0600 Message-ID: Subject: Re: [PATCH v11 09/29] string: Improve generic strncmp To: Adhemerval Zanella Cc: libc-alpha@sourceware.org, Richard Henderson , Jeff Law , Xi Ruoyao Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Feb 1, 2023 at 11:04 AM Adhemerval Zanella wrote: > > It follows the strategy: > > - Align the first input to word boundary using byte operations. > > - If second input is also word aligned, read a word per time, check > for null (using has_zero), and check final words using byte > operation. > > - If second input is not word aligned, loop by aligning the source, > and merge the result of two reads. Similar to aligned case, check > for null with has_zero, and check final words using byte operation. > > Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu, > and powerpc-linux-gnu by removing the arch-specific assembly > implementation and disabling multi-arch (it covers both LE and BE > for 64 and 32 bits). > --- > string/strcmp-impl.h | 41 ++++++++++++++ > string/strcmp.c | 23 ++------ > string/strncmp.c | 130 +++++++++++++++++++++++++++++++------------ > 3 files changed, 138 insertions(+), 56 deletions(-) > create mode 100644 string/strcmp-impl.h > > diff --git a/string/strcmp-impl.h b/string/strcmp-impl.h > new file mode 100644 > index 0000000000..618240368a > --- /dev/null > +++ b/string/strcmp-impl.h > @@ -0,0 +1,41 @@ > +/* Common definition for string compare implementations. > + Copyright (C) 2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _STRCMP_IMPL_H > +#define _STRCMP_IMPL_H > + > +#include > +#include > + > +static inline int > +final_cmp (const op_t w1, const op_t w2, size_t n) > +{ > + /* It can not use index_first_zero_ne because it must not compare past the > + final '\0' is present (and final_cmp is called before has_zero check). > + */ > + for (size_t i = 0; i < n; i++) > + { > + unsigned char c1 = extractbyte (w1, i); > + unsigned char c2 = extractbyte (w2, i); > + if (c1 == '\0' || c1 != c2) > + return c1 - c2; > + } > + return 0; > +} > + > +#endif > diff --git a/string/strcmp.c b/string/strcmp.c > index 42e24242b6..8e7b3310db 100644 > --- a/string/strcmp.c > +++ b/string/strcmp.c > @@ -16,6 +16,7 @@ > . */ > > #include > +#include > #include > #include > #include > @@ -25,22 +26,6 @@ > # define strcmp STRCMP > #endif > > -static inline int > -final_cmp (const op_t w1, const op_t w2) > -{ > - /* It can not use index_first_zero_ne because it must not compare past the > - final '\0' is present (and final_cmp is called before has_zero check). > - */ > - for (size_t i = 0; i < sizeof (op_t); i++) > - { > - unsigned char c1 = extractbyte (w1, i); > - unsigned char c2 = extractbyte (w2, i); > - if (c1 == '\0' || c1 != c2) > - return c1 - c2; > - } > - return 0; > -} > - > /* Aligned loop: if a difference is found, exit to compare the bytes. Else > if a zero is found we have equal strings. */ > static inline int > @@ -56,7 +41,7 @@ strcmp_aligned_loop (const op_t *x1, const op_t *x2, op_t w1) > w2 = *x2++; > } > > - return final_cmp (w1, w2); > + return final_cmp (w1, w2, sizeof (op_t)); > } > > /* Unaligned loop: align the first partial of P2, with 0xff for the rest of > @@ -83,7 +68,7 @@ strcmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs) > w2b = *x2++; > w2 = MERGE (w2a, sh_1, w2b, sh_2); > if (w1 != w2) > - return final_cmp (w1, w2); > + return final_cmp (w1, w2, sizeof (op_t)); > if (has_zero (w2b)) > break; > w1 = *x1++; > @@ -100,7 +85,7 @@ strcmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs) > w2 = MERGE (w2b, sh_1, 0, sh_2); > } > > - return final_cmp (w1, w2); > + return final_cmp (w1, w2, sizeof (op_t)); > } > > /* Compare S1 and S2, returning less than, equal to or > diff --git a/string/strncmp.c b/string/strncmp.c > index fd7cee09b6..3e6040df09 100644 > --- a/string/strncmp.c > +++ b/string/strncmp.c > @@ -15,7 +15,13 @@ > License along with the GNU C Library; if not, see > . */ > > +#include > +#include > +#include > +#include > +#include > #include > +#include > #include > > #undef strncmp > @@ -24,51 +30,101 @@ > #define STRNCMP strncmp > #endif > > -/* Compare no more than N characters of S1 and S2, > - returning less than, equal to or greater than zero > - if S1 is lexicographically less than, equal to or > - greater than S2. */ > -int > -STRNCMP (const char *s1, const char *s2, size_t n) > +/* Aligned loop: if a difference is found, exit to compare the bytes. Else > + if a zero is found we have equal strings. */ > +static inline int > +strncmp_aligned_loop (const op_t *x1, const op_t *x2, op_t w1, size_t n) > { > - unsigned char c1 = '\0'; > - unsigned char c2 = '\0'; > + op_t w2 = *x2++; > > - if (n >= 4) > + while (w1 == w2) > { > - size_t n4 = n >> 2; > - do > - { > - c1 = (unsigned char) *s1++; > - c2 = (unsigned char) *s2++; > - if (c1 == '\0' || c1 != c2) > - return c1 - c2; > - c1 = (unsigned char) *s1++; > - c2 = (unsigned char) *s2++; > - if (c1 == '\0' || c1 != c2) > - return c1 - c2; > - c1 = (unsigned char) *s1++; > - c2 = (unsigned char) *s2++; > - if (c1 == '\0' || c1 != c2) > - return c1 - c2; > - c1 = (unsigned char) *s1++; > - c2 = (unsigned char) *s2++; > - if (c1 == '\0' || c1 != c2) > - return c1 - c2; > - } while (--n4 > 0); > - n &= 3; > + if (n <= sizeof (op_t)) > + break; > + n -= sizeof (op_t); > + > + if (has_zero (w1)) > + return 0; > + w1 = *x1++; > + w2 = *x2++; > } > > - while (n > 0) > + return final_cmp (w1, w2, n); > +} > + > +/* Unaligned loop: align the first partial of P2, with 0xff for the rest of > + the bytes so that we can also apply the has_zero test to see if we have > + already reached EOS. If we have, then we can simply fall through to the > + final comparison. */ > +static inline int > +strncmp_unaligned_loop (const op_t *x1, const op_t *x2, op_t w1, uintptr_t ofs, > + size_t n) > +{ > + op_t w2a = *x2++; > + uintptr_t sh_1 = ofs * CHAR_BIT; > + uintptr_t sh_2 = sizeof(op_t) * CHAR_BIT - sh_1; > + > + op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2); > + if (!has_zero (w2) && n > (sizeof (op_t) - ofs)) > { > - c1 = (unsigned char) *s1++; > - c2 = (unsigned char) *s2++; > - if (c1 == '\0' || c1 != c2) > - return c1 - c2; > - n--; > + op_t w2b; > + > + /* Unaligned loop. The invariant is that W2B, which is "ahead" of W1, > + does not contain end-of-string. Therefore it is safe (and necessary) > + to read another word from each while we do not have a difference. */ > + while (1) > + { > + w2b = *x2++; > + w2 = MERGE (w2a, sh_1, w2b, sh_2); > + if (n <= sizeof (op_t) || w1 != w2) > + return final_cmp (w1, w2, n); > + n -= sizeof(op_t); > + if (has_zero (w2b) || n <= (sizeof (op_t) - ofs)) > + break; > + w1 = *x1++; > + w2a = w2b; > + } > + > + /* Zero found in the second partial of P2. If we had EOS in the aligned > + word, we have equality. */ > + if (has_zero (w1)) > + return 0; > + > + /* Load the final word of P1 and align the final partial of P2. */ > + w1 = *x1++; > + w2 = MERGE (w2b, sh_1, 0, sh_2); > } > > - return c1 - c2; > + return final_cmp (w1, w2, n); > } > > +/* Compare no more than N characters of S1 and S2, > + returning less than, equal to or greater than zero > + if S1 is lexicographically less than, equal to or > + greater than S2. */ > +int > +STRNCMP (const char *p1, const char *p2, size_t n) > +{ > + /* Handle the unaligned bytes of p1 first. */ > + uintptr_t a = MIN (-(uintptr_t)p1 % sizeof(op_t), n); > + int diff = 0; > + for (int i = 0; i < a; ++i) > + { > + unsigned char c1 = *p1++; > + unsigned char c2 = *p2++; > + diff = c1 - c2; > + if (c1 == '\0' || diff != 0) > + return diff; > + } > + if (a == n) > + return diff; NIT: return 0 IMO > + > + /* P1 is now aligned to op_t. P2 may or may not be. */ > + const op_t *x1 = (const op_t *) p1; > + op_t w1 = *x1++; > + uintptr_t ofs = (uintptr_t) p2 % sizeof(op_t); > + return ofs == 0 > + ? strncmp_aligned_loop (x1, (const op_t *) p2, w1, n - a) > + : strncmp_unaligned_loop (x1, (const op_t *) (p2 - ofs), w1, ofs, n - a); > +} > libc_hidden_builtin_def (STRNCMP) > -- > 2.34.1 >