Date: Thu, 5 Jan 2023 18:56:17 -0300
From: Adhemerval Zanella Netto
Organization: Linaro
To: Xi Ruoyao, libc-alpha@sourceware.org
Subject: Re: [PATCH v5 00/17] Improve generic string routines

Unfortunately, no one has reviewed it yet. It would be good to have it in
2.37, although I think it is too late for that now. However, since most
architectures use arch-specific routines, I think the possible disruption
from this patchset should be minimal.

On 05/12/22 14:07, Xi Ruoyao wrote:
> Hi,
>
> Any status update on this series?
>
> On Mon, 2022-09-19 at 16:59 -0300, Adhemerval Zanella via Libc-alpha
> wrote:
>> It is done by:
>>
>>   1. parameterizing the internal routines (for instance, finding a zero
>>      in a word) so each architecture can reimplement them without the
>>      need to reimplement the whole routine.
>>
>>   2. vectorizing more string implementations (for instance strcpy
>>      and strcmp).
>>
>>   3. changing some implementations (for instance strnlen) to use other
>>      routines that may already be optimized.  This lets new ports focus
>>      on providing optimized implementations of only a handful of
>>      symbols (for instance memchr) and have those improvements used by
>>      a larger set of routines.
>>
>> The rest of #5806 can be handled later; if the performance of the
>> generic implementation is close enough, I think it is better to just
>> remove the old assembly implementations.
>>
>> I also checked x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
>> and powerpc64-linux-gnu by removing the arch-specific assembly
>> implementations and disabling multiarch (this covers both LE and BE
>> for 64 and 32 bits).  I also checked the string routines on alpha,
>> hppa, and sh.
>>
>> Changes since v4:
>>   * Removed __clz and __ctz in favor of count_leading_zeros and
>>     count_trailing_zeros from longlong.h.
>>   * Use repeat_bytes more often.
>>   * Added a comment on strcmp final_cmp on why index_first_zero_ne
>>     cannot be used.
>>
>> Changes since v3:
>>   * Rebased against master.
>>   * Dropped strcpy optimization.
>>   * Refactor strcmp implementation.
>>   * Some minor changes in comments.
>>
>> Changes since v2:
>>   * Move string-fz{a,b,i} to their own patch.
>>   * Add an inline implementation for __builtin_c{l,t}z to avoid using
>>     compiler-provided symbols.
>>   * Add a new header, string-maskoff.h, to handle unaligned accesses
>>     on some implementations.
>>   * Fixed strcmp on LE machines.
>>   * Added an unaligned strcpy variant for architectures that define
>>     _STRING_ARCH_unaligned.
>>   * Add SH string-fzb.h (which uses the cmp/str instruction to find
>>     a zero in a word).
>>
>> Changes since v1:
>>   * Marked ChangeLog entries with [BZ #5806], as appropriate.
>>   * Reorganized the headers, so that armv6t2 and power6 need to
>>     override as little as possible to use their (integer) zero
>>     detection insns.
>>   * Hopefully fixed all of the coding style issues.
>>   * Adjusted the memrchr algorithm as discussed.
>>   * Replaced the #ifdef STRRCHR etc. that are used by the multiarch
>>     files.
>>   * Tested on i386, i686, x86_64 (verified this is unused), ppc64,
>>     ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7,
>>     aarch64, alpha (qemu) and hppa (qemu).
>>
>> Adhemerval Zanella (10):
>>   Add string-maskoff.h generic header
>>   Add string vectorized find and detection functions
>>   string: Improve generic strlen
>>   string: Improve generic strnlen
>>   string: Improve generic strchr
>>   string: Improve generic strchrnul
>>   string: Improve generic strcmp
>>   string: Improve generic memchr
>>   string: Improve generic memrchr
>>   sh: Add string-fzb.h
>>
>> Richard Henderson (7):
>>   Parameterize op_t from memcopy.h
>>   Parameterize OP_T_THRES from memcopy.h
>>   hppa: Add memcopy.h
>>   hppa: Add string-fzb.h and string-fzi.h
>>   alpha: Add string-fzb.h and string-fzi.h
>>   arm: Add string-fza.h
>>   powerpc: Add string-fza.h
>>
>>  string/memchr.c                               | 168 ++++------------
>>  string/memcmp.c                               |   4 -
>>  string/memrchr.c                              | 189 +++---------------
>>  string/strchr.c                               | 172 +++-------------
>>  string/strchrnul.c                            | 156 +++------------
>>  string/strcmp.c                               | 119 +++++++++--
>>  string/strlen.c                               |  90 ++-------
>>  string/strnlen.c                              | 137 +------------
>>  sysdeps/alpha/string-fzb.h                    |  51 +++++
>>  sysdeps/alpha/string-fzi.h                    | 113 +++++++++++
>>  sysdeps/arm/armv6t2/string-fza.h              |  70 +++++++
>>  sysdeps/generic/memcopy.h                     |  10 +-
>>  sysdeps/generic/string-extbyte.h              |  37 ++++
>>  sysdeps/generic/string-fza.h                  | 106 ++++++++++
>>  sysdeps/generic/string-fzb.h                  |  49 +++++
>>  sysdeps/generic/string-fzi.h                  | 120 +++++++++++
>>  sysdeps/generic/string-maskoff.h              |  73 +++++++
>>  sysdeps/generic/string-opthr.h                |  25 +++
>>  sysdeps/generic/string-optype.h               |  31 +++
>>  sysdeps/hppa/memcopy.h                        |  42 ++++
>>  sysdeps/hppa/string-fzb.h                     |  69 +++++++
>>  sysdeps/hppa/string-fzi.h                     | 135 +++++++++++++
>>  sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
>>  sysdeps/i386/memcopy.h                        |   3 -
>>  sysdeps/i386/string-opthr.h                   |  25 +++
>>  sysdeps/m68k/memcopy.h                        |   3 -
>>  sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
>>  .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
>>  .../power4/multiarch/strchrnul-ppc32.c        |   4 -
>>  .../power4/multiarch/strnlen-ppc32.c          |  14 +-
>>  .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
>>  sysdeps/powerpc/string-fza.h                  |  70 +++++++
>>  sysdeps/s390/strchr-c.c                       |  11 +-
>>  sysdeps/s390/strchrnul-c.c                    |   2 -
>>  sysdeps/s390/strlen-c.c                       |  10 +-
>>  sysdeps/s390/strnlen-c.c                      |  14 +-
>>  sysdeps/sh/string-fzb.h                       |  53 +++++
>>  37 files changed, 1366 insertions(+), 851 deletions(-)
>>  create mode 100644 sysdeps/alpha/string-fzb.h
>>  create mode 100644 sysdeps/alpha/string-fzi.h
>>  create mode 100644 sysdeps/arm/armv6t2/string-fza.h
>>  create mode 100644 sysdeps/generic/string-extbyte.h
>>  create mode 100644 sysdeps/generic/string-fza.h
>>  create mode 100644 sysdeps/generic/string-fzb.h
>>  create mode 100644 sysdeps/generic/string-fzi.h
>>  create mode 100644 sysdeps/generic/string-maskoff.h
>>  create mode 100644 sysdeps/generic/string-opthr.h
>>  create mode 100644 sysdeps/generic/string-optype.h
>>  create mode 100644 sysdeps/hppa/memcopy.h
>>  create mode 100644 sysdeps/hppa/string-fzb.h
>>  create mode 100644 sysdeps/hppa/string-fzi.h
>>  create mode 100644 sysdeps/i386/string-opthr.h
>>  create mode 100644 sysdeps/powerpc/string-fza.h
>>  create mode 100644 sysdeps/sh/string-fzb.h
>>
>