From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20220919195920.956393-1-adhemerval.zanella@linaro.org> <51f962c19445d29a3187eeedc220558926b56a60.camel@xry111.site>
From: Noah Goldstein
Date: Thu, 5 Jan 2023 15:52:03 -0800
Subject: Re: [PATCH v5 00/17] Improve generic string routines
To: Adhemerval Zanella Netto
Cc: Xi Ruoyao, libc-alpha@sourceware.org
Content-Type: text/plain; charset="UTF-8"

On Thu, Jan 5, 2023 at 1:56 PM Adhemerval Zanella Netto via Libc-alpha
wrote:
>
> Unfortunately no one worked on reviewing it. It would be good to have
> it for 2.37, although I think it is too late. However, since most
> architectures do use arch-specific routines, I think the possible disruption
> of using this patchset should be minimal.

I can start reviewing this. Not sure I can do all the arch headers but can
get up to 11/17.

>
> On 05/12/22 14:07, Xi Ruoyao wrote:
> > Hi,
> >
> > Any status update on this series?
> >
> > On Mon, 2022-09-19 at 16:59 -0300, Adhemerval Zanella via Libc-alpha
> > wrote:
> >> It is done by:
> >>
> >> 1. parametrizing the internal routines (for instance the find zero
> >>    in a word) so each architecture can reimplement without the need
> >>    to reimplement the whole routine.
> >>
> >> 2. vectorizing more string implementations (for instance strcpy
> >>    and strcmp).
> >>
> >> 3. Change some implementations to use already possible optimized
> >>    ones (for instance strnlen).
> >>    It makes new ports to focus on
> >>    only provide optimized implementation of a hardful symbols
> >>    (for instance memchr) and make its improvement to be used in
> >>    a larger set of routines.
> >>
> >> For the rest of #5806 I think we can handle them later and if
> >> performance of generic implementation is closer I think it is better
> >> to just remove old assembly implementations.
> >>
> >> I also checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
> >> and powerpc64-linux-gnu by removing the arch-specific assembly
> >> implementation and disabling multiarch (it covers both LE and BE
> >> for 64 and 32 bits). I also checked the string routines on alpha, hppa,
> >> and sh.
> >>
> >> Changes since v4:
> >> * Removed __clz and __ctz in favor of count_leading_zero and
> >>   count_trailing_zeros from longlong.h.
> >> * Use repeat_bytes more often.
> >> * Added a comment on strcmp final_cmp on why index_first_zero_ne can
> >>   not be used.
> >>
> >> Changes since v3:
> >> * Rebased against master.
> >> * Dropped strcpy optimization.
> >> * Refactor strcmp implementation.
> >> * Some minor changes in comments.
> >>
> >> Changes since v2:
> >> * Move string-fz{a,b,i} to its own patch.
> >> * Add a inline implementation for __builtin_c{l,t}z to avoid using
> >>   compiler provided symbols.
> >> * Add a new header, string-maskoff.h, to handle unaligned accesses
> >>   on some implementation.
> >> * Fixed strcmp on LE machines.
> >> * Added a unaligned strcpy variant for architecture that define
> >>   _STRING_ARCH_unaligned.
> >> * Add SH string-fzb.h (which uses cmp/str instruction to find
> >>   a zero in word).
> >>
> >> Changes since v1:
> >> * Marked ChangeLog entries with [BZ #5806], as appropriate.
> >> * Reorganized the headers, so that armv6t2 and power6 need override
> >>   as little as possible to use their (integer) zero detection insns.
> >> * Hopefully fixed all of the coding style issues.
> >> * Adjusted the memrchr algorithm as discussed.
> >> * Replaced the #ifdef STRRCHR etc that are used by the multiarch
> >>   files.
> >> * Tested on i386, i686, x86_64 (verified this is unused), ppc64,
> >>   ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
> >>   aarch64, alpha (qemu) and hppa (qemu).
> >>
> >> Adhemerval Zanella (10):
> >>   Add string-maskoff.h generic header
> >>   Add string vectorized find and detection functions
> >>   string: Improve generic strlen
> >>   string: Improve generic strnlen
> >>   string: Improve generic strchr
> >>   string: Improve generic strchrnul
> >>   string: Improve generic strcmp
> >>   string: Improve generic memchr
> >>   string: Improve generic memrchr
> >>   sh: Add string-fzb.h
> >>
> >> Richard Henderson (7):
> >>   Parameterize op_t from memcopy.h
> >>   Parameterize OP_T_THRES from memcopy.h
> >>   hppa: Add memcopy.h
> >>   hppa: Add string-fzb.h and string-fzi.h
> >>   alpha: Add string-fzb.h and string-fzi.h
> >>   arm: Add string-fza.h
> >>   powerpc: Add string-fza.h
> >>
> >>  string/memchr.c                               | 168 ++++------------
> >>  string/memcmp.c                               |   4 -
> >>  string/memrchr.c                              | 189 +++---------------
> >>  string/strchr.c                               | 172 +++-------------
> >>  string/strchrnul.c                            | 156 +++------------
> >>  string/strcmp.c                               | 119 +++++++++--
> >>  string/strlen.c                               |  90 ++-------
> >>  string/strnlen.c                              | 137 +------------
> >>  sysdeps/alpha/string-fzb.h                    |  51 +++++
> >>  sysdeps/alpha/string-fzi.h                    | 113 +++++++++++
> >>  sysdeps/arm/armv6t2/string-fza.h              |  70 +++++++
> >>  sysdeps/generic/memcopy.h                     |  10 +-
> >>  sysdeps/generic/string-extbyte.h              |  37 ++++
> >>  sysdeps/generic/string-fza.h                  | 106 ++++++++++
> >>  sysdeps/generic/string-fzb.h                  |  49 +++++
> >>  sysdeps/generic/string-fzi.h                  | 120 +++++++++++
> >>  sysdeps/generic/string-maskoff.h              |  73 +++++++
> >>  sysdeps/generic/string-opthr.h                |  25 +++
> >>  sysdeps/generic/string-optype.h               |  31 +++
> >>  sysdeps/hppa/memcopy.h                        |  42 ++++
> >>  sysdeps/hppa/string-fzb.h                     |  69 +++++++
> >>  sysdeps/hppa/string-fzi.h                     | 135 +++++++++++++
> >>  sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
> >>  sysdeps/i386/memcopy.h                        |   3 -
> >>  sysdeps/i386/string-opthr.h                   |  25 +++
> >>  sysdeps/m68k/memcopy.h                        |   3 -
> >>  sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
> >>  .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
> >>  .../power4/multiarch/strchrnul-ppc32.c        |   4 -
> >>  .../power4/multiarch/strnlen-ppc32.c          |  14 +-
> >>  .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
> >>  sysdeps/powerpc/string-fza.h                  |  70 +++++++
> >>  sysdeps/s390/strchr-c.c                       |  11 +-
> >>  sysdeps/s390/strchrnul-c.c                    |   2 -
> >>  sysdeps/s390/strlen-c.c                       |  10 +-
> >>  sysdeps/s390/strnlen-c.c                      |  14 +-
> >>  sysdeps/sh/string-fzb.h                       |  53 +++++
> >>  37 files changed, 1366 insertions(+), 851 deletions(-)
> >>  create mode 100644 sysdeps/alpha/string-fzb.h
> >>  create mode 100644 sysdeps/alpha/string-fzi.h
> >>  create mode 100644 sysdeps/arm/armv6t2/string-fza.h
> >>  create mode 100644 sysdeps/generic/string-extbyte.h
> >>  create mode 100644 sysdeps/generic/string-fza.h
> >>  create mode 100644 sysdeps/generic/string-fzb.h
> >>  create mode 100644 sysdeps/generic/string-fzi.h
> >>  create mode 100644 sysdeps/generic/string-maskoff.h
> >>  create mode 100644 sysdeps/generic/string-opthr.h
> >>  create mode 100644 sysdeps/generic/string-optype.h
> >>  create mode 100644 sysdeps/hppa/memcopy.h
> >>  create mode 100644 sysdeps/hppa/string-fzb.h
> >>  create mode 100644 sysdeps/hppa/string-fzi.h
> >>  create mode 100644 sysdeps/i386/string-opthr.h
> >>  create mode 100644 sysdeps/powerpc/string-fza.h
> >>  create mode 100644 sysdeps/sh/string-fzb.h
> >>
> >
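
For context on item 1 of the quoted cover letter (the "find zero in a
word" primitive): below is a minimal sketch in C of the classic portable
bit trick, given only as an illustration. It is not code from the patch
series. The names word_t, repeat_byte, has_zero_byte and find_zero_byte
are hypothetical, and the sketch assumes that all n bytes of the buffer
are readable, so that whole-word loads stay inside it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical word type for this sketch; the series parameterizes its
   own type (op_t), which is not reproduced here.  */
typedef unsigned long word_t;

/* Replicate one byte value into every byte of a word,
   e.g. 0x01 -> 0x0101...01.  */
static word_t
repeat_byte (unsigned char c)
{
  return ((word_t) -1 / 0xff) * c;
}

/* Nonzero if and only if at least one byte of X is zero.  Subtracting
   0x01 from each byte borrows into the high bit only for bytes that
   were zero (or had the high bit set, which "& ~x" filters out).  */
static int
has_zero_byte (word_t x)
{
  const word_t lsb = repeat_byte (0x01);
  const word_t msb = repeat_byte (0x80);
  return ((x - lsb) & ~x & msb) != 0;
}

/* Return the index of the first zero byte in BUF[0..N-1], or N if
   there is none.  Assumes all N bytes are readable.  */
size_t
find_zero_byte (const unsigned char *buf, size_t n)
{
  size_t i = 0;

  /* Scan one word at a time while a full word remains.  memcpy keeps
     the load well defined for any alignment.  */
  while (n - i >= sizeof (word_t))
    {
      word_t w;
      memcpy (&w, buf + i, sizeof w);
      if (has_zero_byte (w))
        break;
      i += sizeof w;
    }

  /* Finish byte by byte: either inside the word that contains the
     zero, or over the short tail.  */
  while (i < n && buf[i] != 0)
    i++;
  return i;
}
```

An architecture with a dedicated zero-detection instruction could, in
principle, replace only has_zero_byte and leave the loop untouched, which
is the kind of split item 1 describes.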