From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532]) by sourceware.org (Postfix) with ESMTPS id 6EC0D3857425 for ; Sat, 3 Sep 2022 03:48:02 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6EC0D3857425 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ed1-x532.google.com with SMTP id b16so4987695edd.4 for ; Fri, 02 Sep 2022 20:48:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=tpBJ1WLjnB01Ee57FUSQKZFpcv8FAjLc0+xAqPst7aU=; b=buOcgQeKSLkZM75wdyPlRj+nmgl23JKHnFNemCZQou+2q9zy+28xIrFhuOvru+QVi0 twLjJRMRnRfGR6/YC4zKtw9lw21Zv0l3dhs3ao14mGxd6y9qNEWxCjsCtRMdPflQLcWf yY8ll8NaRKZZsswYovmZXlEpQXn9zmWb4PKoFGnLb6QL0gwn6qDysfKbFCFcT3+nWrUp F0BTZ0Vd2hGdWdK9Oc/N8A1Kn49ZeRqW5KNeka/RKnlNzsfcLdGMB9NfU0Y9uw0N6mrV G0Ol6R/BOZmoaTGq0dFLSc6xn+2MSqECFDD3Q6qF3ktHyWcmMSNwld0uTkvqHXmrCWVn kyiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=tpBJ1WLjnB01Ee57FUSQKZFpcv8FAjLc0+xAqPst7aU=; b=bsP80Kv8XOmYBt6RQhFSduyaK33nhrJeVXDuqRafcWNSHfnHZInEIXyfZAs6QIIdn9 jgqfYNEqAh2yk2WPx/JrJoK1uNVNtZFahIeEpEi8CfFQzppDBJSaD4JX9PiK5JWKYEd5 jq2eLQEPKTyJlfsc5YyEnJstWSJPyIAcn79z/gVnWabXDAkN7/J+eYgA1r5XYye4rrwv jX4AlfSABAVWFSIwjHvAKchHkAeHO4cJYLN+pV5qLMyty20uzARozvwBm6tQ6VSdLxxI 9wHflFOSy3xwse/l77XG8AODJb+ERZsp1yfHkFaq8fHO1hUKvgCq9Yr2mpMfFFJWBgQp tXgw== X-Gm-Message-State: ACgBeo0g+iQr6lpJ04T7BeGp0eUbG/6IkQiIW66yYCHBMiUJTrkyrW1F vW6sZM2i85QCBYwsMC4GGUG8R2BdMzfRMWSKNic= X-Google-Smtp-Source: AA6agR43CCObCqQKqDeR4ATTEZ8GLTdtl2PmG3l2acr3kENqSKYVTctRjO1vprWe8cd/8yUeht8Y13rUZJtkNE9F7eE= X-Received: by 2002:a05:6402:378f:b0:43a:d3f5:79f2 with SMTP id et15-20020a056402378f00b0043ad3f579f2mr35620551edb.338.1662176881157; Fri, 02 Sep 2022 20:48:01 -0700 (PDT) MIME-Version: 1.0 References: <20220902203940.2385967-1-adhemerval.zanella@linaro.org> <20220902203940.2385967-11-adhemerval.zanella@linaro.org> In-Reply-To: <20220902203940.2385967-11-adhemerval.zanella@linaro.org> From: Noah Goldstein Date: Fri, 2 Sep 2022 20:47:50 -0700 Message-ID: Subject: Re: [PATCH 10/17] string: Improve generic memchr To: Adhemerval Zanella Cc: GNU C Library , Richard Henderson , Joseph Myers , caiyinyu Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Sep 2, 2022 at 1:45 PM Adhemerval Zanella via Libc-alpha wrote: > > New algorithm have the following key differences: > > - Reads first word unaligned and use string-maskoff function to > remove unwanted data. This strategy follow arch-specific > optimization used on aarch64 and powerpc. > > - Use string-fz{b,i} and string-opthr functions. > > Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, > and powerpc64-linux-gnu by removing the arch-specific assembly > implementation and disabling multi-arch (it covers both LE and BE > for 64 and 32 bits). > > Co-authored-by: Richard Henderson > --- > string/memchr.c | 168 +++++------------- > .../powerpc32/power4/multiarch/memchr-ppc32.c | 14 +- > .../powerpc64/multiarch/memchr-ppc64.c | 9 +- > 3 files changed, 48 insertions(+), 143 deletions(-) > > diff --git a/string/memchr.c b/string/memchr.c > index 422bcd0cd6..8fe0ac48ab 100644 > --- a/string/memchr.c > +++ b/string/memchr.c > @@ -1,10 +1,6 @@ > -/* Copyright (C) 1991-2022 Free Software Foundation, Inc. > +/* Scan memory for a character. Generic version > + Copyright (C) 1991-2022 Free Software Foundation, Inc. > This file is part of the GNU C Library. > - Based on strlen implementation by Torbjorn Granlund (tege@sics.se), > - with help from Dan Sahlin (dan@sics.se) and > - commentary by Jim Blandy (jimb@ai.mit.edu); > - adaptation to memchr suggested by Dick Karpinski (dick@cca.ucsf.edu), > - and implemented by Roland McGrath (roland@ai.mit.edu). > > The GNU C Library is free software; you can redistribute it and/or > modify it under the terms of the GNU Lesser General Public > @@ -20,143 +16,65 @@ > License along with the GNU C Library; if not, see > . */ > > -#ifndef _LIBC > -# include > -#endif > - > +#include > +#include > +#include > +#include > +#include > +#include > #include > > -#include > +#undef memchr > > -#include > - > -#undef __memchr > -#ifdef _LIBC > -# undef memchr > +#ifdef MEMCHR > +# define __memchr MEMCHR > #endif > > -#ifndef weak_alias > -# define __memchr memchr > -#endif > - > -#ifndef MEMCHR > -# define MEMCHR __memchr > -#endif > +static inline const char * > +sadd (uintptr_t x, uintptr_t y) > +{ > + uintptr_t ret = INT_ADD_OVERFLOW (x, y) ? (uintptr_t)-1 : x + y; > + return (const char *)ret; > +} > > /* Search no more than N bytes of S for C. */ > void * > -MEMCHR (void const *s, int c_in, size_t n) > +__memchr (void const *s, int c_in, size_t n) > { > - /* On 32-bit hardware, choosing longword to be a 32-bit unsigned > - long instead of a 64-bit uintmax_t tends to give better > - performance. On 64-bit hardware, unsigned long is generally 64 > - bits already. Change this typedef to experiment with > - performance. */ > - typedef unsigned long int longword; > + if (__glibc_unlikely (n == 0)) > + return NULL; > > - const unsigned char *char_ptr; > - const longword *longword_ptr; > - longword repeated_one; > - longword repeated_c; > - unsigned char c; > + uintptr_t s_int = (uintptr_t) s; > > - c = (unsigned char) c_in; > + /* Set up a word, each of whose bytes is C. */ > + op_t repeated_c = repeat_bytes (c_in); > + op_t before_mask = create_mask (s_int); > > - /* Handle the first few bytes by reading one byte at a time. > - Do this until CHAR_PTR is aligned on a longword boundary. */ > - for (char_ptr = (const unsigned char *) s; > - n > 0 && (size_t) char_ptr % sizeof (longword) != 0; > - --n, ++char_ptr) > - if (*char_ptr == c) > - return (void *) char_ptr; > + /* Compute the address of the last byte taking in consideration possible > + overflow. */ > + const char *lbyte = sadd (s_int, n - 1); > > - longword_ptr = (const longword *) char_ptr; > + /* Compute the address of the word containing the last byte. */ > + const op_t *lword = word_containing (lbyte); > > - /* All these elucidatory comments refer to 4-byte longwords, > - but the theory applies equally well to any size longwords. */ > + /* Read the first word, but munge it so that bytes before the array > + will not match goal. */ > + const op_t * word_ptr = word_containing (s); > + op_t word = (*word_ptr | before_mask) ^ (repeated_c & before_mask); Why do you xor with repeated_c & before_mask here? Doesn't the has_eq(word, repeated_c) do that? > > - /* Compute auxiliary longword values: > - repeated_one is a value which has a 1 in every byte. > - repeated_c has c in every byte. */ > - repeated_one = 0x01010101; > - repeated_c = c | (c << 8); > - repeated_c |= repeated_c << 16; > - if (0xffffffffU < (longword) -1) > + while (has_eq (word, repeated_c) == 0) > { > - repeated_one |= repeated_one << 31 << 1; > - repeated_c |= repeated_c << 31 << 1; > - if (8 < sizeof (longword)) > - { > - size_t i; > - > - for (i = 64; i < sizeof (longword) * 8; i *= 2) > - { > - repeated_one |= repeated_one << i; > - repeated_c |= repeated_c << i; > - } > - } > + if (word_ptr == lword) > + return NULL; > + word = *++word_ptr; > } > > - /* Instead of the traditional loop which tests each byte, we will test a > - longword at a time. The tricky part is testing if *any of the four* > - bytes in the longword in question are equal to c. We first use an xor > - with repeated_c. This reduces the task to testing whether *any of the > - four* bytes in longword1 is zero. > - > - We compute tmp = > - ((longword1 - repeated_one) & ~longword1) & (repeated_one << 7). > - That is, we perform the following operations: > - 1. Subtract repeated_one. > - 2. & ~longword1. > - 3. & a mask consisting of 0x80 in every byte. > - Consider what happens in each byte: > - - If a byte of longword1 is zero, step 1 and 2 transform it into 0xff, > - and step 3 transforms it into 0x80. A carry can also be propagated > - to more significant bytes. > - - If a byte of longword1 is nonzero, let its lowest 1 bit be at > - position k (0 <= k <= 7); so the lowest k bits are 0. After step 1, > - the byte ends in a single bit of value 0 and k bits of value 1. > - After step 2, the result is just k bits of value 1: 2^k - 1. After > - step 3, the result is 0. And no carry is produced. > - So, if longword1 has only non-zero bytes, tmp is zero. > - Whereas if longword1 has a zero byte, call j the position of the least > - significant zero byte. Then the result has a zero at positions 0, ..., > - j-1 and a 0x80 at position j. We cannot predict the result at the more > - significant bytes (positions j+1..3), but it does not matter since we > - already have a non-zero bit at position 8*j+7. > - > - So, the test whether any byte in longword1 is zero is equivalent to > - testing whether tmp is nonzero. */ > - > - while (n >= sizeof (longword)) > - { > - longword longword1 = *longword_ptr ^ repeated_c; > - > - if ((((longword1 - repeated_one) & ~longword1) > - & (repeated_one << 7)) != 0) > - break; > - longword_ptr++; > - n -= sizeof (longword); > - } > - > - char_ptr = (const unsigned char *) longword_ptr; > - > - /* At this point, we know that either n < sizeof (longword), or one of the > - sizeof (longword) bytes starting at char_ptr is == c. On little-endian > - machines, we could determine the first such byte without any further > - memory accesses, just by looking at the tmp result from the last loop > - iteration. But this does not work on big-endian machines. Choose code > - that works in both cases. */ > - > - for (; n > 0; --n, ++char_ptr) > - { > - if (*char_ptr == c) > - return (void *) char_ptr; > - } > - > - return NULL; > + /* We found a match, but it might be in a byte past the end > + of the array. */ > + char *ret = (char *) word_ptr + index_first_eq (word, repeated_c); > + return (ret <= lbyte) ? ret : NULL; > } > -#ifdef weak_alias > +#ifndef MEMCHR > weak_alias (__memchr, memchr) > -#endif > libc_hidden_builtin_def (memchr) > +#endif > diff --git a/sysdeps/powerpc/powerpc32/power4/multiarch/memchr-ppc32.c b/sysdeps/powerpc/powerpc32/power4/multiarch/memchr-ppc32.c > index fc69df54b3..02877d3c98 100644 > --- a/sysdeps/powerpc/powerpc32/power4/multiarch/memchr-ppc32.c > +++ b/sysdeps/powerpc/powerpc32/power4/multiarch/memchr-ppc32.c > @@ -18,17 +18,11 @@ > > #include > > -#define MEMCHR __memchr_ppc > +extern __typeof (memchr) __memchr_ppc attribute_hidden; > > -#undef weak_alias > -#define weak_alias(a, b) > +#define MEMCHR __memchr_ppc > +#include > > #ifdef SHARED > -# undef libc_hidden_builtin_def > -# define libc_hidden_builtin_def(name) \ > - __hidden_ver1(__memchr_ppc, __GI_memchr, __memchr_ppc); > +__hidden_ver1(__memchr_ppc, __GI_memchr, __memchr_ppc); > #endif > - > -extern __typeof (memchr) __memchr_ppc attribute_hidden; > - > -#include > diff --git a/sysdeps/powerpc/powerpc64/multiarch/memchr-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/memchr-ppc64.c > index 3c966f4403..15beca787b 100644 > --- a/sysdeps/powerpc/powerpc64/multiarch/memchr-ppc64.c > +++ b/sysdeps/powerpc/powerpc64/multiarch/memchr-ppc64.c > @@ -18,14 +18,7 @@ > > #include > > -#define MEMCHR __memchr_ppc > - > -#undef weak_alias > -#define weak_alias(a, b) > - > -# undef libc_hidden_builtin_def > -# define libc_hidden_builtin_def(name) > - > extern __typeof (memchr) __memchr_ppc attribute_hidden; > > +#define MEMCHR __memchr_ppc > #include > -- > 2.34.1 >