From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=JtgS=5H=linaro.org=adhemerval.zanella@sourceware.org>
Received: from mail-oi1-x230.google.com (mail-oi1-x230.google.com [IPv6:2607:f8b0:4864:20::230])
	by sourceware.org (Postfix) with ESMTPS id 3B3AD385840A
	for <libc-alpha@sourceware.org>; Tue, 10 Jan 2023 14:34:03 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3B3AD385840A
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
Received: by mail-oi1-x230.google.com with SMTP id r130so10186403oih.2
        for <libc-alpha@sourceware.org>; Tue, 10 Jan 2023 06:34:03 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=content-transfer-encoding:in-reply-to:organization:from:references
         :cc:to:content-language:subject:user-agent:mime-version:date
         :message-id:from:to:cc:subject:date:message-id:reply-to;
        bh=X3wh/Cd0ns/Y8A04ZOrwQ3oEcsZiWx0OZ9wOTfufVfc=;
        b=bIJcVu05ofOtIWwRkyctQU20kUkumTdIQ+/QrJP8IV35/YKB57G/i8g4DPn9hfbEGQ
         9S/sfwSPY7cGpAKKdHtRAuu2UOAvGa6UivkpnaksQbAhGlm7n+69J6TJWDMuFAQOS3Pc
         NGyDw5n8JXo5wsDSa6DxMenkZNEYKDJZ0iyFdiEy49Z5nw4phqWyn4MpIWyC3R20p5yi
         m6Dvo/A3Cdb3WDfI6yCoNRwzLcy7+vlxd1VSG6Kke+ZcEvJ998sM8w22NsE9S859MFom
         rPwgpL3g9XDDkvdE9U0q1r/09avIN7WKycMOCKeF4Zc7FhSnTqpX6zUJKtd5q8ItuVet
         GMlQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:in-reply-to:organization:from:references
         :cc:to:content-language:subject:user-agent:mime-version:date
         :message-id:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=X3wh/Cd0ns/Y8A04ZOrwQ3oEcsZiWx0OZ9wOTfufVfc=;
        b=LgJvDqg7NwcZNA4Q7awIy9otQJ8pPzBr0Me//b9Ixb96GXudVY64gjCjbYlsL9XJ8H
         KLAXlIFc3GaYBYhD/O852fKkh5nd1MNenXH1ND/hqoUovlYOYDxf+O1BMpucN1FqcaXo
         IqpVtGjQN9fYxfCQjQ3oEZLFt3WZPiJOULsAIzz3pkZjxSnbAeq7Ym+jzyO5Laq5W9I4
         RF4aumSnVl0PepFtbMYOmfj0+xuAwTv4mEZwjAmL0WPw9viT1H1ItF+NKAVsi3vtp4mR
         4sP0HJFTkMgL0pes+CNmgdKfjdAVfrqVq0pv1v/ji0LmqivrLZOq6is8FnngSpkHBLui
         hi4A==
X-Gm-Message-State: AFqh2krGWfH/Jb44ndpvj0PkGtXqcsGk3Rh8RHpC6rth1QQX3dmL8rdj
	3lR/jmuf/bouMp6pWflpJFVHYA==
X-Google-Smtp-Source: AMrXdXt1EwUWWVBlYA+R+dj6FHy+hnnwJwHk/guQA7NEPzQbcTlQO3ju5WhwjRRgZ/3qBGhjzjyOqg==
X-Received: by 2002:a05:6808:1511:b0:363:ea5c:2c2f with SMTP id u17-20020a056808151100b00363ea5c2c2fmr12175810oiw.18.1673361242450;
        Tue, 10 Jan 2023 06:34:02 -0800 (PST)
Received: from ?IPV6:2804:1b3:a7c0:a93a:8d00:c4d9:6d86:9f2b? ([2804:1b3:a7c0:a93a:8d00:c4d9:6d86:9f2b])
        by smtp.gmail.com with ESMTPSA id w11-20020a0568080d4b00b0035c422bb303sm5350630oik.19.2023.01.10.06.34.00
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Tue, 10 Jan 2023 06:34:01 -0800 (PST)
Message-ID: <d2586a1b-05e4-64ba-7698-0bb314e6ee99@linaro.org>
Date: Tue, 10 Jan 2023 11:33:59 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
 Gecko/20100101 Thunderbird/102.6.1
Subject: Re: [PATCH v5 10/17] string: Improve generic memchr
Content-Language: en-US
To: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: libc-alpha@sourceware.org, Richard Henderson <rth@twiddle.net>
References: <20220919195920.956393-1-adhemerval.zanella@linaro.org>
 <20220919195920.956393-11-adhemerval.zanella@linaro.org>
 <CAFUsyf+dbO-VEy_MugLe_EF4fmrsEKKFT797dL4xGmbwL2SR_Q@mail.gmail.com>
 <6e926487-5fff-5c67-6c86-6cc38a126bf8@linaro.org>
 <CAFUsyfLONZvFkm3-ChT=WAxt88jY3fkiC5pkkf+EFc=HfV2EOg@mail.gmail.com>
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Organization: Linaro
In-Reply-To: <CAFUsyfLONZvFkm3-ChT=WAxt88jY3fkiC5pkkf+EFc=HfV2EOg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libc-alpha.sourceware.org>


On 09/01/23 18:26, Noah Goldstein wrote:
> On Mon, Jan 9, 2023 at 12:51 PM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>>
>> On 05/01/23 20:49, Noah Goldstein wrote:
>>> On Mon, Sep 19, 2022 at 1:05 PM Adhemerval Zanella via Libc-alpha
>>> <libc-alpha@sourceware.org> wrote:
>>>>
>>>> The new algorithm has the following key differences:
>>>>
>>>>   - Reads the first word unaligned and uses the string-maskoff function
>>>>     to remove unwanted data.  This strategy follows the arch-specific
>>>>     optimization used on aarch64 and powerpc.
>>>>
>>>>   - Uses the string-fz{b,i} and string-opthr functions.
>>>>
>>>> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
>>>> and powerpc64-linux-gnu by removing the arch-specific assembly
>>>> implementation and disabling multi-arch (it covers both LE and BE
>>>> for 64 and 32 bits).
>>>>
>>>> Co-authored-by: Richard Henderson  <rth@twiddle.net>
>>>> ---
>>>>  string/memchr.c                               | 168 +++++-------------
>>>>  .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
>>>>  .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
>>>>  3 files changed, 48 insertions(+), 143 deletions(-)
>>>>
>>>> diff --git a/string/memchr.c b/string/memchr.c
>>>> index 422bcd0cd6..08d518b02d 100644
>>>> --- a/string/memchr.c
>>>> +++ b/string/memchr.c
>>>> @@ -1,10 +1,6 @@
>>>> -/* Copyright (C) 1991-2022 Free Software Foundation, Inc.
>>>> +/* Scan memory for a character.  Generic version
>>>> +   Copyright (C) 1991-2022 Free Software Foundation, Inc.
>>>>     This file is part of the GNU C Library.
>>>> -   Based on strlen implementation by Torbjorn Granlund (tege@sics.se),
>>>> -   with help from Dan Sahlin (dan@sics.se) and
>>>> -   commentary by Jim Blandy (jimb@ai.mit.edu);
>>>> -   adaptation to memchr suggested by Dick Karpinski (dick@cca.ucsf.edu),
>>>> -   and implemented by Roland McGrath (roland@ai.mit.edu).
>>>>
>>>>     The GNU C Library is free software; you can redistribute it and/or
>>>>     modify it under the terms of the GNU Lesser General Public
>>>> @@ -20,143 +16,65 @@
>>>>     License along with the GNU C Library; if not, see
>>>>     <https://www.gnu.org/licenses/>.  */
>>>>
>>>> -#ifndef _LIBC
>>>> -# include <config.h>
>>>> -#endif
>>>> -
>>>> +#include <intprops.h>
>>>> +#include <string-fza.h>
>>>> +#include <string-fzb.h>
>>>> +#include <string-fzi.h>
>>>> +#include <string-maskoff.h>
>>>> +#include <string-opthr.h>
>>>>  #include <string.h>
>>>>
>>>> -#include <stddef.h>
>>>> +#undef memchr
>>>>
>>>> -#include <limits.h>
>>>> -
>>>> -#undef __memchr
>>>> -#ifdef _LIBC
>>>> -# undef memchr
>>>> +#ifdef MEMCHR
>>>> +# define __memchr MEMCHR
>>>>  #endif
>>>>
>>>> -#ifndef weak_alias
>>>> -# define __memchr memchr
>>>> -#endif
>>>> -
>>>> -#ifndef MEMCHR
>>>> -# define MEMCHR __memchr
>>>> -#endif
>>>> +static inline const char *
>>>> +sadd (uintptr_t x, uintptr_t y)
>>>> +{
>>>> +  uintptr_t ret = INT_ADD_OVERFLOW (x, y) ? (uintptr_t)-1 : x + y;
>>>> +  return (const char *)ret;
>>>> +}
>>>>
>>>>  /* Search no more than N bytes of S for C.  */
>>>>  void *
>>>> -MEMCHR (void const *s, int c_in, size_t n)
>>>> +__memchr (void const *s, int c_in, size_t n)
>>>>  {
>>>> -  /* On 32-bit hardware, choosing longword to be a 32-bit unsigned
>>>> -     long instead of a 64-bit uintmax_t tends to give better
>>>> -     performance.  On 64-bit hardware, unsigned long is generally 64
>>>> -     bits already.  Change this typedef to experiment with
>>>> -     performance.  */
>>>> -  typedef unsigned long int longword;
>>>> +  if (__glibc_unlikely (n == 0))
>>>> +    return NULL;
>>>>
>>>> -  const unsigned char *char_ptr;
>>>> -  const longword *longword_ptr;
>>>> -  longword repeated_one;
>>>> -  longword repeated_c;
>>>> -  unsigned char c;
>>>> +  uintptr_t s_int = (uintptr_t) s;
>>>>
>>>> -  c = (unsigned char) c_in;
>>>> +  /* Set up a word, each of whose bytes is C.  */
>>>> +  op_t repeated_c = repeat_bytes (c_in);
>>>> +  op_t before_mask = create_mask (s_int);
>>>>
>>>> -  /* Handle the first few bytes by reading one byte at a time.
>>>> -     Do this until CHAR_PTR is aligned on a longword boundary.  */
>>>> -  for (char_ptr = (const unsigned char *) s;
>>>> -       n > 0 && (size_t) char_ptr % sizeof (longword) != 0;
>>>> -       --n, ++char_ptr)
>>>> -    if (*char_ptr == c)
>>>> -      return (void *) char_ptr;
>>>> +  /* Compute the address of the last byte, taking into consideration
>>>> +     possible overflow.  */
>>>> +  const char *lbyte = sadd (s_int, n - 1);
>>>>
>>>> -  longword_ptr = (const longword *) char_ptr;
>>>> +  /* Compute the address of the word containing the last byte. */
>>>> +  const op_t *lword = word_containing (lbyte);
>>>>
>>>> -  /* All these elucidatory comments refer to 4-byte longwords,
>>>> -     but the theory applies equally well to any size longwords.  */
>>>> +  /* Read the first word, but munge it so that bytes before the array
>>>> +     will not match goal.  */
>>>> +  const op_t *word_ptr = word_containing (s);
>>>> +  op_t word = (*word_ptr | before_mask) ^ (repeated_c & before_mask);
>>>>
>>>> -  /* Compute auxiliary longword values:
>>>> -     repeated_one is a value which has a 1 in every byte.
>>>> -     repeated_c has c in every byte.  */
>>>> -  repeated_one = 0x01010101;
>>>> -  repeated_c = c | (c << 8);
>>>> -  repeated_c |= repeated_c << 16;
>>>> -  if (0xffffffffU < (longword) -1)
>>>> +  while (has_eq (word, repeated_c) == 0)
>>>>      {
>>>> -      repeated_one |= repeated_one << 31 << 1;
>>>> -      repeated_c |= repeated_c << 31 << 1;
>>>> -      if (8 < sizeof (longword))
>>>> -       {
>>>> -         size_t i;
>>>> -
>>>> -         for (i = 64; i < sizeof (longword) * 8; i *= 2)
>>>> -           {
>>>> -             repeated_one |= repeated_one << i;
>>>> -             repeated_c |= repeated_c << i;
>>>> -           }
>>>> -       }
>>>> +      if (word_ptr == lword)
>>>> +       return NULL;
>>> Intuitively, making lword be lword - 1, so that normal returns don't need
>>> the extra NULL check, would be faster.
>>
>> Hum, I did not follow; could you explain in more detail what you mean here?
> 
> I was thinking something like:
> 
> ```
>   op_t word = *word_ptr;
>   op_t mask = find_eq_low (word, repeated_c)
>       >> (CHAR_BIT * (s_int % sizeof (uintptr_t)));
>   if (mask)
>     {
>       char *ret = (char *) s + index_first_ (mask);
>       return (ret <= lbyte) ? ret : NULL;
>     }
>   if (word_ptr == lword)
>     return NULL;
> 
>   word = *++word_ptr;
>   while (word_ptr != lword)
>     {
>       if (has_eq (word, repeated_c))
>         return (char *) word_ptr + index_first_eq (word, repeated_c);
>       word = *++word_ptr;
>     }
> 
>   if (has_eq (word, repeated_c))
>     {
>       /* We found a match, but it might be in a byte past the end
>          of the array.  */
>       char *ret = (char *) word_ptr + index_first_eq (word, repeated_c);
>       if (ret <= lbyte)
>         return ret;
>     }
>   return NULL;
> ```
> 
> The idea is that, until you reach the word containing the last byte, you
> don't need the extra bounds check (tested with test-memchr.c on
> little-endian).

Alright, this works. I will update the patch.
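
For anyone following the thread, the loop structure discussed above can be
sketched in standalone C as below.  This is only an illustration under
stated assumptions, not the code from the patch: word_t, load_word,
scan_bytes and memchr_sketch are hypothetical names, has_eq is
reimplemented locally with the classic zero-byte test instead of the
string-fza.h helpers, and the first word is handled byte by byte to stay
endian-neutral.  Like the generic implementation, it assumes that reading
the whole aligned word containing a valid byte is safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t word_t;	/* Hypothetical stand-in for op_t.  */

/* Load the aligned word at address A (memcpy avoids aliasing issues).  */
static word_t
load_word (uintptr_t a)
{
  word_t w;
  memcpy (&w, (const void *) a, sizeof w);
  return w;
}

/* Nonzero if any byte of W equals the byte replicated in REP.  */
static int
has_eq (word_t w, word_t rep)
{
  const word_t ones = (word_t) -1 / 0xff;	/* 0x0101...01 */
  const word_t highs = ones << 7;		/* 0x8080...80 */
  word_t x = w ^ rep;
  return ((x - ones) & ~x & highs) != 0;
}

/* Return the first byte equal to C in [FROM, TO] (inclusive), or NULL.  */
static void *
scan_bytes (uintptr_t from, uintptr_t to, unsigned char c)
{
  for (uintptr_t a = from;; a++)
    {
      if (*(const unsigned char *) a == c)
	return (void *) a;
      if (a == to)
	return NULL;
    }
}

static void *
memchr_sketch (const void *s, int c_in, size_t n)
{
  if (n == 0)
    return NULL;

  const unsigned char c = (unsigned char) c_in;
  const word_t repeated_c = ((word_t) -1 / 0xff) * c;
  const uintptr_t first = (uintptr_t) s;
  /* Saturating add for the last byte address, in the spirit of sadd.  */
  const uintptr_t last
    = (n - 1 > UINTPTR_MAX - first) ? UINTPTR_MAX : first + (n - 1);
  uintptr_t word = first - first % sizeof (word_t);
  const uintptr_t lword = last - last % sizeof (word_t);

  /* First word: skip the bytes before S, and stop at the last byte if
     the buffer also ends inside this word.  */
  uintptr_t stop = word == lword ? last : word + sizeof (word_t) - 1;
  void *hit = scan_bytes (first, stop, c);
  if (hit != NULL || word == lword)
    return hit;

  /* Interior words: every byte is inside the buffer, so a match can be
     returned without any bounds check.  */
  for (word += sizeof (word_t); word != lword; word += sizeof (word_t))
    if (has_eq (load_word (word), repeated_c))
      return scan_bytes (word, word + sizeof (word_t) - 1, c);

  /* Last word: only here can a match lie past the end of the array.  */
  if (has_eq (load_word (word), repeated_c))
    return scan_bytes (word, last, c);
  return NULL;
}
```

A call such as memchr_sketch (buf, 'x', len) is meant to return the same
pointer as memchr (buf, 'x', len); the only bounds comparison on the hot
path is the word_ptr != lword test of the interior loop.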