From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH v5 00/17] Improve generic string routines
Date: Mon, 19 Sep 2022 16:59:03 -0300
Message-Id: <20220919195920.956393-1-adhemerval.zanella@linaro.org>

It is done by:

1. Parametrizing the internal routines (for instance, finding a zero byte in a word) so each architecture can reimplement them without having to reimplement the whole routine.
2. Vectorizing more string implementations (for instance, strcpy and strcmp).
3. Changing some implementations to reuse routines that may already be optimized (for instance, strnlen).

This lets new ports focus on providing optimized implementations of only a handful of symbols (for instance, memchr) and have those improvements used by a larger set of routines.

I think we can handle the rest of #5806 later, and if the performance of the generic implementation is close, I think it is better to just remove the old assembly implementations.
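As a rough illustration of item 1, here is the kind of per-word primitive that an architecture could override, written in portable C. This is a minimal sketch, not the code from this series: it assumes 64-bit words and a GCC/Clang compiler (for __builtin_ctzll), and the helper names load_word_le, zero_byte_mask and first_zero_index are hypothetical. It flags zero bytes with the classic "(x - 0x01..) & ~x & 0x80.." test and locates the first one with a count of trailing zeros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: 64-bit words, GCC/Clang for __builtin_ctzll.  */
typedef uint64_t word_t;

#define ONES  0x0101010101010101ULL /* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL /* 0x80 in every byte */

/* Place memory byte i in bits 8*i..8*i+7 (little-endian lane order)
   regardless of host endianness.  A real implementation would use a
   single aligned word load instead of this byte loop.  */
static word_t
load_word_le (const unsigned char *p)
{
  word_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (word_t) p[i] << (8 * i);
  return w;
}

/* Nonzero iff X contains a zero byte.  Subtracting 0x01 from a zero
   byte borrows and sets its top bit; "& ~x" discards bytes whose top
   bit was already set.  The borrow can also flag bytes above the first
   zero, so only the lowest flagged byte is exact.  */
static word_t
zero_byte_mask (word_t x)
{
  return (x - ONES) & ~x & HIGHS;
}

/* Index of the lowest-addressed zero byte; X must contain one.  With a
   native big-endian load, a count of leading zeros would be used.  */
static unsigned int
first_zero_index (word_t x)
{
  word_t mask = zero_byte_mask (x);
  assert (mask != 0);
  return (unsigned int) __builtin_ctzll (mask) / 8;
}
```

With these two primitives, a generic strlen or strchr reduces to a loop over words, and an architecture with a dedicated zero-detection instruction (such as the SH cmp/str mentioned in the v2 notes) could, in principle, replace only zero_byte_mask.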
I also checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and powerpc64-linux-gnu by removing the arch-specific assembly implementations and disabling multiarch (this covers both LE and BE, for both 32 and 64 bits). I also checked the string routines on alpha, hppa, and sh.

Changes since v4:
* Removed __clz and __ctz in favor of count_leading_zeros and count_trailing_zeros from longlong.h.
* Used repeat_bytes more often.
* Added a comment on strcmp final_cmp explaining why index_first_zero_ne cannot be used.

Changes since v3:
* Rebased against master.
* Dropped the strcpy optimization.
* Refactored the strcmp implementation.
* Some minor changes in comments.

Changes since v2:
* Moved string-fz{a,b,i} to their own patch.
* Added an inline implementation of __builtin_c{l,t}z to avoid using compiler-provided symbols.
* Added a new header, string-maskoff.h, to handle unaligned accesses in some implementations.
* Fixed strcmp on LE machines.
* Added an unaligned strcpy variant for architectures that define _STRING_ARCH_unaligned.
* Added SH string-fzb.h (which uses the cmp/str instruction to find a zero in a word).

Changes since v1:
* Marked ChangeLog entries with [BZ #5806], as appropriate.
* Reorganized the headers, so that armv6t2 and power6 need to override as little as possible to use their (integer) zero detection insns.
* Hopefully fixed all of the coding style issues.
* Adjusted the memrchr algorithm as discussed.
* Replaced the #ifdef STRRCHR etc. that are used by the multiarch files.
* Tested on i386, i686, x86_64 (verified this is unused), ppc64, ppc64le --with-cpu=power8 (to use power6 in multiarch), armv7, aarch64, alpha (qemu) and hppa (qemu).
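The v4 notes mention repeat_bytes and count_trailing_zeros. The sketch below shows how such building blocks might combine into a memchr-style search, and how strnlen can then be written on top of memchr (item 3 of this cover letter). Everything here is an illustrative assumption, not the series' code: broadcast_byte stands in for repeat_bytes, __builtin_ctzll (GCC/Clang) for count_trailing_zeros, and word_memchr/word_strnlen are hypothetical names. Real implementations typically read aligned words, which cannot cross a page boundary; this sketch instead loads possibly unaligned words and therefore requires the whole [s, s + n) range to be readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: 64-bit words, GCC/Clang for __builtin_ctzll.  */
typedef uint64_t word_t;

#define ONES  0x0101010101010101ULL /* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL /* 0x80 in every byte */

/* Copy byte C into every byte of a word (the role of repeat_bytes).  */
static word_t
broadcast_byte (unsigned char c)
{
  return (word_t) c * ONES;
}

/* Little-endian lane order independent of the host (memory byte i is
   lane i); a real implementation would do one aligned load.  */
static word_t
load_word_le (const unsigned char *p)
{
  word_t w = 0;
  for (int i = 0; i < 8; i++)
    w |= (word_t) p[i] << (8 * i);
  return w;
}

/* Search whole words first, then the tail byte by byte, so no byte
   past S + N is ever read.  Assumes all N bytes are readable.  */
static const void *
word_memchr (const void *s, int c, size_t n)
{
  assert (s != NULL || n == 0);
  const unsigned char *p = s;
  const unsigned char ch = (unsigned char) c;
  const word_t rep = broadcast_byte (ch);

  for (; n >= 8; p += 8, n -= 8)
    {
      /* Bytes equal to CH become zero after the XOR.  */
      word_t x = load_word_le (p) ^ rep;
      word_t mask = (x - ONES) & ~x & HIGHS;
      if (mask != 0)
        return p + __builtin_ctzll (mask) / 8; /* lowest flagged lane */
    }
  for (; n > 0; p++, n--)
    if (*p == ch)
      return p;
  return NULL;
}

/* strnlen expressed through memchr.  Unlike the real strnlen, this
   sketch needs MAXLEN readable bytes because word_memchr may load
   whole words past the terminator.  */
static size_t
word_strnlen (const char *s, size_t maxlen)
{
  const char *end = word_memchr (s, '\0', maxlen);
  return end != NULL ? (size_t) (end - s) : maxlen;
}
```

For example, with buf = "hello, world", word_memchr (buf, 'w', sizeof buf) finds the match inside the first word at buf + 7, and word_strnlen (buf, sizeof buf) returns 12 after the byte-wise tail loop reaches the terminator. The point of the design is that any speedup to memchr carries over to strnlen for free.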
Adhemerval Zanella (10):
  Add string-maskoff.h generic header
  Add string vectorized find and detection functions
  string: Improve generic strlen
  string: Improve generic strnlen
  string: Improve generic strchr
  string: Improve generic strchrnul
  string: Improve generic strcmp
  string: Improve generic memchr
  string: Improve generic memrchr
  sh: Add string-fzb.h

Richard Henderson (7):
  Parameterize op_t from memcopy.h
  Parameterize OP_T_THRES from memcopy.h
  hppa: Add memcopy.h
  hppa: Add string-fzb.h and string-fzi.h
  alpha: Add string-fzb.h and string-fzi.h
  arm: Add string-fza.h
  powerpc: Add string-fza.h

 string/memchr.c                               | 168 ++++------------
 string/memcmp.c                               |   4 -
 string/memrchr.c                              | 189 +++---------------
 string/strchr.c                               | 172 +++-------------
 string/strchrnul.c                            | 156 +++------------
 string/strcmp.c                               | 119 +++++++++--
 string/strlen.c                               |  90 ++-------
 string/strnlen.c                              | 137 +------------
 sysdeps/alpha/string-fzb.h                    |  51 +++++
 sysdeps/alpha/string-fzi.h                    | 113 +++++++++++
 sysdeps/arm/armv6t2/string-fza.h              |  70 +++++++
 sysdeps/generic/memcopy.h                     |  10 +-
 sysdeps/generic/string-extbyte.h              |  37 ++++
 sysdeps/generic/string-fza.h                  | 106 ++++++++++
 sysdeps/generic/string-fzb.h                  |  49 +++++
 sysdeps/generic/string-fzi.h                  | 120 +++++++++++
 sysdeps/generic/string-maskoff.h              |  73 +++++++
 sysdeps/generic/string-opthr.h                |  25 +++
 sysdeps/generic/string-optype.h               |  31 +++
 sysdeps/hppa/memcopy.h                        |  42 ++++
 sysdeps/hppa/string-fzb.h                     |  69 +++++++
 sysdeps/hppa/string-fzi.h                     | 135 +++++++++++++
 sysdeps/i386/i686/multiarch/strnlen-c.c       |  14 +-
 sysdeps/i386/memcopy.h                        |   3 -
 sysdeps/i386/string-opthr.h                   |  25 +++
 sysdeps/m68k/memcopy.h                        |   3 -
 sysdeps/powerpc/powerpc32/power4/memcopy.h    |   5 -
 .../powerpc32/power4/multiarch/memchr-ppc32.c |  14 +-
 .../power4/multiarch/strchrnul-ppc32.c        |   4 -
 .../power4/multiarch/strnlen-ppc32.c          |  14 +-
 .../powerpc64/multiarch/memchr-ppc64.c        |   9 +-
 sysdeps/powerpc/string-fza.h                  |  70 +++++++
 sysdeps/s390/strchr-c.c                       |  11 +-
 sysdeps/s390/strchrnul-c.c                    |   2 -
 sysdeps/s390/strlen-c.c                       |  10 +-
 sysdeps/s390/strnlen-c.c                      |  14 +-
 sysdeps/sh/string-fzb.h                       |  53 +++++
 37 files changed, 1366 insertions(+), 851 deletions(-)
 create mode 100644 sysdeps/alpha/string-fzb.h
 create mode 100644 sysdeps/alpha/string-fzi.h
 create mode 100644 sysdeps/arm/armv6t2/string-fza.h
 create mode 100644 sysdeps/generic/string-extbyte.h
 create mode 100644 sysdeps/generic/string-fza.h
 create mode 100644 sysdeps/generic/string-fzb.h
 create mode 100644 sysdeps/generic/string-fzi.h
 create mode 100644 sysdeps/generic/string-maskoff.h
 create mode 100644 sysdeps/generic/string-opthr.h
 create mode 100644 sysdeps/generic/string-optype.h
 create mode 100644 sysdeps/hppa/memcopy.h
 create mode 100644 sysdeps/hppa/string-fzb.h
 create mode 100644 sysdeps/hppa/string-fzi.h
 create mode 100644 sysdeps/i386/string-opthr.h
 create mode 100644 sysdeps/powerpc/string-fza.h
 create mode 100644 sysdeps/sh/string-fzb.h

-- 
2.34.1