From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by sourceware.org (Postfix) with ESMTPS id 3D8E33858CDA for ; Mon, 9 Jan 2023 23:33:31 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3D8E33858CDA Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=twiddle.net Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pl1-x630.google.com with SMTP id d3so11301693plr.10 for ; Mon, 09 Jan 2023 15:33:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :sender:from:to:cc:subject:date:message-id:reply-to; bh=ryhI+xD7J/5uwOCkp/jGDtj5a8YNIltTNyA4mnkUHrM=; b=flf48IHcc+9iaZiy08um/QCkdZamOdKH19klnmvjf8mH4A/xu2bhHshvLaQBbDuSQl UmxlFn32UkwWi0PuYlyx6E7SjrRlVYFr/IIefxMrSYE51DLw6vwcgzkp1Z9q/GB5mgLs 3QqZBMvqe/l1hR+TMtV9lOGRAUSQeBI+UkGD7FxznXmTdIz9kL+qpQ0WzhZazAn6XN1Q tS+eRvURn8pe3sNPwSEaJ+wiQn8Bp2AO+rk4f4ulU/wWXKsyRIx2wkMLwOnkOOBZhXON cMFEth9gHIt7sru7OR74SU9MgHzWFILUxhgHn1mgNT/uTTwVcxlxHKK4uhUckUcFCXri vrpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :sender:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ryhI+xD7J/5uwOCkp/jGDtj5a8YNIltTNyA4mnkUHrM=; b=OocjRe7wtVQ5S34GTs3Pn5uvebUiDATcdDJ3wTKl2jA5p7VZAXr2mjW0xTM3PbWdYH bhpEViUGjGa8V1VF7fWfu+GrmFt2xkck8/Ri6DkqlBy5DdZd6y2/7zmtHB/b61KYCEBM zY5zWw9YFfWXptqRnxljsOPutEy62J9VlV5gWPdIN01nbpqok6eeCFumT3CHrT6kE9P6 vb+vfrVx5v8iVO3H482Ggtn/AsWHpphyiZ3wWOTMDsgs46OiL6v/IVAUDBv8hhS2a20a Yy7hVQ4cmLmuERxN3oWo/whPgzJ645cxLjkjEEO9A0TPOb4Ck75iY90Kgm7s7nFlGSRV ir0Q== X-Gm-Message-State: AFqh2kr8WDmU+62dkz6PGwTHgBEGJxj+eFzvv3z/sfOep7WWzAjI4EI1 7lvfx1IDMZQzAHo+Wf6dxdVs/NrAQPY= X-Google-Smtp-Source: AMrXdXsTva291sGvjqGp4cfGJ1LpDcCdm1iNRajyspEomQWGvTt+4PDk8YREH77ipBQWtVA/PI57sA== X-Received: by 2002:a05:6a20:2a01:b0:a2:17a6:44cd with SMTP id e1-20020a056a202a0100b000a217a644cdmr75863520pzh.58.1673307209907; Mon, 09 Jan 2023 15:33:29 -0800 (PST) Received: from ?IPV6:2602:47:d48c:8101:158e:facf:7a46:ba9f? ([2602:47:d48c:8101:158e:facf:7a46:ba9f]) by smtp.googlemail.com with ESMTPSA id e2-20020a635002000000b0046feb2754e5sm5535312pgb.28.2023.01.09.15.33.29 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 09 Jan 2023 15:33:29 -0800 (PST) Sender: Richard Henderson Message-ID: Date: Mon, 9 Jan 2023 15:33:27 -0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: [PATCH v5 08/17] string: Improve generic strchrnul To: Adhemerval Zanella Netto , Noah Goldstein Cc: libc-alpha@sourceware.org References: <20220919195920.956393-1-adhemerval.zanella@linaro.org> <20220919195920.956393-9-adhemerval.zanella@linaro.org> Content-Language: en-US From: Richard Henderson In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,GIT_PATCH_0,HEADER_FROM_DIFFERENT_DOMAINS,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 1/9/23 12:35, Adhemerval Zanella Netto wrote: > > > On 05/01/23 20:17, Noah Goldstein wrote: >> On Mon, Sep 19, 2022 at 1:04 PM Adhemerval Zanella via Libc-alpha >> wrote: >>> >>> New algorithm have the following key differences: >>> >>> - Reads first word unaligned and use string-maskoff function to >>> remove unwanted data. This strategy follow arch-specific >>> optimization used on aarch64 and powerpc. >>> >>> - Use string-fz{b,i} functions. >>> >>> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu, >>> and powerpc-linux-gnu by removing the arch-specific assembly >>> implementation and disabling multi-arch (it covers both LE and BE >>> for 64 and 32 bits). >>> >>> Co-authored-by: Richard Henderson >>> --- >>> string/strchrnul.c | 156 +++--------------- >>> .../power4/multiarch/strchrnul-ppc32.c | 4 - >>> sysdeps/s390/strchrnul-c.c | 2 - >>> 3 files changed, 24 insertions(+), 138 deletions(-) >>> >>> diff --git a/string/strchrnul.c b/string/strchrnul.c >>> index 0cc1fc6bb0..67defa3dab 100644 >>> --- a/string/strchrnul.c >>> +++ b/string/strchrnul.c >>> @@ -1,10 +1,5 @@ >>> /* Copyright (C) 1991-2022 Free Software Foundation, Inc. >>> This file is part of the GNU C Library. >>> - Based on strlen implementation by Torbjorn Granlund (tege@sics.se), >>> - with help from Dan Sahlin (dan@sics.se) and >>> - bug fix and commentary by Jim Blandy (jimb@ai.mit.edu); >>> - adaptation to strchr suggested by Dick Karpinski (dick@cca.ucsf.edu), >>> - and implemented by Roland McGrath (roland@ai.mit.edu). >>> >>> The GNU C Library is free software; you can redistribute it and/or >>> modify it under the terms of the GNU Lesser General Public >>> @@ -21,146 +16,43 @@ >>> . */ >>> >>> #include >>> -#include >>> #include >>> +#include >>> +#include >>> +#include >>> +#include >>> +#include >>> >>> #undef __strchrnul >>> #undef strchrnul >>> >>> -#ifndef STRCHRNUL >>> -# define STRCHRNUL __strchrnul >>> +#ifdef STRCHRNUL >>> +# define __strchrnul STRCHRNUL >>> #endif >>> >>> /* Find the first occurrence of C in S or the final NUL byte. */ >>> char * >>> -STRCHRNUL (const char *s, int c_in) >>> +__strchrnul (const char *str, int c_in) >>> { >>> - const unsigned char *char_ptr; >>> - const unsigned long int *longword_ptr; >>> - unsigned long int longword, magic_bits, charmask; >>> - unsigned char c; >>> - >>> - c = (unsigned char) c_in; >>> - >>> - /* Handle the first few characters by reading one character at a time. >>> - Do this until CHAR_PTR is aligned on a longword boundary. */ >>> - for (char_ptr = (const unsigned char *) s; >>> - ((unsigned long int) char_ptr & (sizeof (longword) - 1)) != 0; >>> - ++char_ptr) >>> - if (*char_ptr == c || *char_ptr == '\0') >>> - return (void *) char_ptr; >>> - >>> - /* All these elucidatory comments refer to 4-byte longwords, >>> - but the theory applies equally well to 8-byte longwords. */ >>> - >>> - longword_ptr = (unsigned long int *) char_ptr; >>> - >>> - /* Bits 31, 24, 16, and 8 of this number are zero. Call these bits >>> - the "holes." Note that there is a hole just to the left of >>> - each byte, with an extra at the end: >>> - >>> - bits: 01111110 11111110 11111110 11111111 >>> - bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD >>> - >>> - The 1-bits make sure that carries propagate to the next 0-bit. >>> - The 0-bits provide holes for carries to fall into. */ >>> - magic_bits = -1; >>> - magic_bits = magic_bits / 0xff * 0xfe << 1 >> 1 | 1; >>> - >>> - /* Set up a longword, each of whose bytes is C. */ >>> - charmask = c | (c << 8); >>> - charmask |= charmask << 16; >>> - if (sizeof (longword) > 4) >>> - /* Do the shift in two steps to avoid a warning if long has 32 bits. */ >>> - charmask |= (charmask << 16) << 16; >>> - if (sizeof (longword) > 8) >>> - abort (); >>> - >>> - /* Instead of the traditional loop which tests each character, >>> - we will test a longword at a time. The tricky part is testing >>> - if *any of the four* bytes in the longword in question are zero. */ >>> - for (;;) >>> - { >>> - /* We tentatively exit the loop if adding MAGIC_BITS to >>> - LONGWORD fails to change any of the hole bits of LONGWORD. >>> - >>> - 1) Is this safe? Will it catch all the zero bytes? >>> - Suppose there is a byte with all zeros. Any carry bits >>> - propagating from its left will fall into the hole at its >>> - least significant bit and stop. Since there will be no >>> - carry from its most significant bit, the LSB of the >>> - byte to the left will be unchanged, and the zero will be >>> - detected. >>> + /* Set up a word, each of whose bytes is C. */ >>> + op_t repeated_c = repeat_bytes (c_in); >>> >>> - 2) Is this worthwhile? Will it ignore everything except >>> - zero bytes? Suppose every byte of LONGWORD has a bit set >>> - somewhere. There will be a carry into bit 8. If bit 8 >>> - is set, this will carry into bit 16. If bit 8 is clear, >>> - one of bits 9-15 must be set, so there will be a carry >>> - into bit 16. Similarly, there will be a carry into bit >>> - 24. If one of bits 24-30 is set, there will be a carry >>> - into bit 31, so all of the hole bits will be changed. >>> + /* Align the input address to op_t. */ >>> + uintptr_t s_int = (uintptr_t) str; >>> + const op_t *word_ptr = word_containing (str); >>> >>> - The one misfire occurs when bits 24-30 are clear and bit >>> - 31 is set; in this case, the hole at bit 31 is not >>> - changed. If we had access to the processor carry flag, >>> - we could close this loophole by putting the fourth hole >>> - at bit 32! >>> + /* Read the first aligned word, but force bytes before the string to >>> + match neither zero nor goal (we make sure the high bit of each byte >>> + is 1, and the low 7 bits are all the opposite of the goal byte). */ >>> + op_t bmask = create_mask (s_int); >>> + op_t word = (*word_ptr | bmask) ^ (repeated_c & highbit_mask (bmask)); >> >> Think much clearer (and probably better codegen) is: >> find_zero_eq_low/all(word, repeated) >> (s_int * CHAR_BIT) > > It does not seem to work, at least not replacing the two lines with: > > op_t word = find_zero_eq_all/low (*word_ptr, repeated_c) >> (s_int * CHAR_BIT); Oh, two fine points: (1) big-endian would want shifting left, (2) alpha would want shifting by bits not bytes, because the cmpbge insn produces an 8-bit mask. so you'd need to hide this shift in the headers like create_mask(). r~