From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x632.google.com (mail-ej1-x632.google.com [IPv6:2a00:1450:4864:20::632]) by sourceware.org (Postfix) with ESMTPS id 822483858D37 for ; Mon, 9 Jan 2023 21:26:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 822483858D37 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-ej1-x632.google.com with SMTP id ss4so16284800ejb.11 for ; Mon, 09 Jan 2023 13:26:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=OUqUl9ZaTjo6ovFBb4pkzUzD+3VKWlYfwbsCYPCT7Uk=; b=ii9KqzOYLYslRTW7fUM9Boh72Iu/s+L+gJQZK2BAAF0xCftc0PavFqa3Sfqsq8bK35 3b2tLKJXSI0Ca4BptC7nb3Hpx7t+AbWMejJ5dvbiVknFI7YbHMjY44Soaj2GrZ9mhzkG CU3TSliVvIu3DR2lqHdFAQHTfXScZ0S+UXrqCpeRg5qQRT0kmcfoPTVAeNAzYHHjkYiT i2f2qDJ3jKhgs+phCQ/0iObQ5gp4hMSmIAVDRMS97R5X4neWj5UfL6Zn/QRQziChi85k zicSGq+PpU++i7JDKj83BWcrXKhlBQVsBZ02jhnQ8oZw8agxoTowXmAkNr9pdQVFR9Ez SAaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=OUqUl9ZaTjo6ovFBb4pkzUzD+3VKWlYfwbsCYPCT7Uk=; b=oFrkHMDE058MBNQ3BEC6XQlc8+ZAsgUNJW6LIGHnj11RaI1ts1iiNTSeRTAW47whoh t06Y72nWVZgRuH4LpM8vKZqLlGUNDywH/JSv5SzH3OhveKggHHTdChiohmwU+Fr0SfFo ctdVmLZzZUAi16ICJZIP2Sej7GpOLWcJ9KNSi/xpO0Kk77xAJYwZHUuog/PvHQx3nkeZ OThYi2ttRpjnhR3/jth0g+IX6gyYezPR0rO92XOKZUNYGkwsc8AChZcz8t60EdJH5PoP 7TMjCcEJhOG4in5tY9MuM5Cjg5XuGFngoFmW8RwJFcdKRJzxLNXQHxw2x6IUpDB0398l HyAw== X-Gm-Message-State: AFqh2krlDQVTJbDT7BKDofofCoXQzxLQEIRXJh4+QcfCzYTrC1qEkUSM 9D1F1/TDcG4Uq4M9WyjaDdKbqRNenzFgc5RUKyY= X-Google-Smtp-Source: AMrXdXsQHzTez93DxjpP4XNQsw56w4g2OJoT0fmkfrLdOcCuHYzbrDFrnJFS9TOdjKaNdbNabkfaa5Wis9BF3C0lJjE= X-Received: by 2002:a17:906:a881:b0:7c1:6425:aae5 with SMTP id ha1-20020a170906a88100b007c16425aae5mr4911352ejb.169.1673299588222; Mon, 09 Jan 2023 13:26:28 -0800 (PST) MIME-Version: 1.0 References: <20220919195920.956393-1-adhemerval.zanella@linaro.org> <20220919195920.956393-11-adhemerval.zanella@linaro.org> <6e926487-5fff-5c67-6c86-6cc38a126bf8@linaro.org> In-Reply-To: <6e926487-5fff-5c67-6c86-6cc38a126bf8@linaro.org> From: Noah Goldstein Date: Mon, 9 Jan 2023 13:26:16 -0800 Message-ID: Subject: Re: [PATCH v5 10/17] string: Improve generic memchr To: Adhemerval Zanella Netto Cc: libc-alpha@sourceware.org, Richard Henderson Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Jan 9, 2023 at 12:51 PM Adhemerval Zanella Netto wrote: > > > > On 05/01/23 20:49, Noah Goldstein wrote: > > On Mon, Sep 19, 2022 at 1:05 PM Adhemerval Zanella via Libc-alpha > > wrote: > >> > >> New algorithm have the following key differences: > >> > >> - Reads first word unaligned and use string-maskoff function to > >> remove unwanted data. This strategy follow arch-specific > >> optimization used on aarch64 and powerpc. > >> > >> - Use string-fz{b,i} and string-opthr functions. > >> > >> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, > >> and powerpc64-linux-gnu by removing the arch-specific assembly > >> implementation and disabling multi-arch (it covers both LE and BE > >> for 64 and 32 bits). > >> > >> Co-authored-by: Richard Henderson > >> --- > >> string/memchr.c | 168 +++++------------- > >> .../powerpc32/power4/multiarch/memchr-ppc32.c | 14 +- > >> .../powerpc64/multiarch/memchr-ppc64.c | 9 +- > >> 3 files changed, 48 insertions(+), 143 deletions(-) > >> > >> diff --git a/string/memchr.c b/string/memchr.c > >> index 422bcd0cd6..08d518b02d 100644 > >> --- a/string/memchr.c > >> +++ b/string/memchr.c > >> @@ -1,10 +1,6 @@ > >> -/* Copyright (C) 1991-2022 Free Software Foundation, Inc. > >> +/* Scan memory for a character. Generic version > >> + Copyright (C) 1991-2022 Free Software Foundation, Inc. > >> This file is part of the GNU C Library. > >> - Based on strlen implementation by Torbjorn Granlund (tege@sics.se), > >> - with help from Dan Sahlin (dan@sics.se) and > >> - commentary by Jim Blandy (jimb@ai.mit.edu); > >> - adaptation to memchr suggested by Dick Karpinski (dick@cca.ucsf.edu), > >> - and implemented by Roland McGrath (roland@ai.mit.edu). > >> > >> The GNU C Library is free software; you can redistribute it and/or > >> modify it under the terms of the GNU Lesser General Public > >> @@ -20,143 +16,65 @@ > >> License along with the GNU C Library; if not, see > >> . */ > >> > >> -#ifndef _LIBC > >> -# include > >> -#endif > >> - > >> +#include > >> +#include > >> +#include > >> +#include > >> +#include > >> +#include > >> #include > >> > >> -#include > >> +#undef memchr > >> > >> -#include > >> - > >> -#undef __memchr > >> -#ifdef _LIBC > >> -# undef memchr > >> +#ifdef MEMCHR > >> +# define __memchr MEMCHR > >> #endif > >> > >> -#ifndef weak_alias > >> -# define __memchr memchr > >> -#endif > >> - > >> -#ifndef MEMCHR > >> -# define MEMCHR __memchr > >> -#endif > >> +static inline const char * > >> +sadd (uintptr_t x, uintptr_t y) > >> +{ > >> + uintptr_t ret = INT_ADD_OVERFLOW (x, y) ? (uintptr_t)-1 : x + y; > >> + return (const char *)ret; > >> +} > >> > >> /* Search no more than N bytes of S for C. */ > >> void * > >> -MEMCHR (void const *s, int c_in, size_t n) > >> +__memchr (void const *s, int c_in, size_t n) > >> { > >> - /* On 32-bit hardware, choosing longword to be a 32-bit unsigned > >> - long instead of a 64-bit uintmax_t tends to give better > >> - performance. On 64-bit hardware, unsigned long is generally 64 > >> - bits already. Change this typedef to experiment with > >> - performance. */ > >> - typedef unsigned long int longword; > >> + if (__glibc_unlikely (n == 0)) > >> + return NULL; > >> > >> - const unsigned char *char_ptr; > >> - const longword *longword_ptr; > >> - longword repeated_one; > >> - longword repeated_c; > >> - unsigned char c; > >> + uintptr_t s_int = (uintptr_t) s; > >> > >> - c = (unsigned char) c_in; > >> + /* Set up a word, each of whose bytes is C. */ > >> + op_t repeated_c = repeat_bytes (c_in); > >> + op_t before_mask = create_mask (s_int); > >> > >> - /* Handle the first few bytes by reading one byte at a time. > >> - Do this until CHAR_PTR is aligned on a longword boundary. */ > >> - for (char_ptr = (const unsigned char *) s; > >> - n > 0 && (size_t) char_ptr % sizeof (longword) != 0; > >> - --n, ++char_ptr) > >> - if (*char_ptr == c) > >> - return (void *) char_ptr; > >> + /* Compute the address of the last byte taking in consideration possible > >> + overflow. */ > >> + const char *lbyte = sadd (s_int, n - 1); > >> > >> - longword_ptr = (const longword *) char_ptr; > >> + /* Compute the address of the word containing the last byte. */ > >> + const op_t *lword = word_containing (lbyte); > >> > >> - /* All these elucidatory comments refer to 4-byte longwords, > >> - but the theory applies equally well to any size longwords. */ > >> + /* Read the first word, but munge it so that bytes before the array > >> + will not match goal. */ > >> + const op_t *word_ptr = word_containing (s); > >> + op_t word = (*word_ptr | before_mask) ^ (repeated_c & before_mask); > >> > >> - /* Compute auxiliary longword values: > >> - repeated_one is a value which has a 1 in every byte. > >> - repeated_c has c in every byte. */ > >> - repeated_one = 0x01010101; > >> - repeated_c = c | (c << 8); > >> - repeated_c |= repeated_c << 16; > >> - if (0xffffffffU < (longword) -1) > >> + while (has_eq (word, repeated_c) == 0) > >> { > >> - repeated_one |= repeated_one << 31 << 1; > >> - repeated_c |= repeated_c << 31 << 1; > >> - if (8 < sizeof (longword)) > >> - { > >> - size_t i; > >> - > >> - for (i = 64; i < sizeof (longword) * 8; i *= 2) > >> - { > >> - repeated_one |= repeated_one << i; > >> - repeated_c |= repeated_c << i; > >> - } > >> - } > >> + if (word_ptr == lword) > >> + return NULL; > > Inuitively making lword, lword - 1 so that normal returns don't need the extra > > null check would be faster. > > Hum, I did not follow; could you explain it with more details what you mean here? I was thinking something like: ``` op_t word = *word_ptr; op_t mask = find_eq_low (word, repeated_c) >> (CHAR_BIT * (s_int % sizeof (uintptr_t))); if (mask) { char *ret = (char *) s + index_first_ (mask); return (ret <= lbyte) ? ret : NULL; } if (word_ptr == lword) return NULL; word = *++word_ptr; while (word_ptr != lword) { if (has_eq (word, repeated_c)) return (char *) word_ptr + index_first_eq (word, repeated_c); word = *++word_ptr; } if (has_eq (word, repeated_c)) { /* We found a match, but it might be in a byte past the end of the array. */ char *ret = (char *) word_ptr + index_first_eq (word, repeated_c); if (ret <= lbyte) return ret; } return NULL; ``` The idea is until the last byte you don't need the extra bounds check (tested on test-memchr.c on little-endian).