From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vs1-xe34.google.com (mail-vs1-xe34.google.com [IPv6:2607:f8b0:4864:20::e34]) by sourceware.org (Postfix) with ESMTPS id 1A3703888821; Thu, 12 May 2022 19:43:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1A3703888821 Received: by mail-vs1-xe34.google.com with SMTP id v139so6341952vsv.0; Thu, 12 May 2022 12:43:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=H02Ua1QjTfFQGm7dacVRtyGsy0okDT0cpgolRq0l9Lc=; b=Wx16SfV8ivV2NbjdPUuyVq842daxxqC/ZFtzAXMYRBLyIRanBXtONEdIC9UzbUnYBg sV6kvipyQ9PgNe730TjP/E3aSF6ZA6xNbxxlERLUyNue9GgYyqpim1Goam4OBGwtbsr1 pAAk5330KOHTw4cnPVd6ZjzzxKhwsAq/BpbabRorUHHeDiBVpYnXGp7U2tKVYDRxCMQ1 kj6ujSwy3iesl/LBx52RuuwfAm++B1ToPJiraE01Q5Tyh+u+Qir3RfX0xbP57f/uIsKu BRxGvFRWgugOOBCZ9Rg2vHFKpleNym2cFaBndD0SN/5Z9nxuiF2aH0flIfYKYmLrtbve bGBA== X-Gm-Message-State: AOAM533g8guw0+GMwKs9V24qz0XSMj+tJpSjpSPc2lvPNZdwFgiqk2Ry WXlm3XcyLUUgRaez/LYPpXoznLmhkehkNGI/yTXh2fO9xwo= X-Google-Smtp-Source: ABdhPJwceAUX+YeeqzaqNhzMg1Bww0kzc/RBBMoAGcg0iytpblllHKqJNlmOnwWkZdsnNWpGxD6SLlZcG661RQNj2tE= X-Received: by 2002:a67:d396:0:b0:32d:76fc:ae33 with SMTP id b22-20020a67d396000000b0032d76fcae33mr1071737vsj.28.1652384601460; Thu, 12 May 2022 12:43:21 -0700 (PDT) MIME-Version: 1.0 References: <20220323215734.3927131-1-goldstein.w.n@gmail.com> <20220323215734.3927131-11-goldstein.w.n@gmail.com> In-Reply-To: From: Sunil Pandey Date: Thu, 12 May 2022 12:42:45 -0700 Message-ID: Subject: Re: [PATCH v1 11/23] x86: Remove strspn-sse2.S and use the generic implementation To: "H.J. Lu" , Libc-stable Mailing List Cc: Noah Goldstein , GNU C Library Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, GIT_PATCH_0, HK_RANDOM_ENVFROM, HK_RANDOM_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-stable@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-stable mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2022 19:43:24 -0000 On Thu, Mar 24, 2022 at 12:00 PM H.J. Lu via Libc-alpha wrote: > > On Wed, Mar 23, 2022 at 3:01 PM Noah Goldstein wrote: > > > > The generic implementation is faster. > > > > geometric_mean(N=20) of all benchmarks New / Original: .710 > > > > All string/memory tests pass. > > --- > > Geomtric Mean N=20 runs; All functions page aligned > > len, align1, align2, pos, New Time / Old Time > > 1, 0, 0, 512, 0.824 > > 1, 1, 0, 512, 1.018 > > 1, 0, 1, 512, 0.986 > > 1, 1, 1, 512, 1.092 > > 2, 0, 0, 512, 0.86 > > 2, 2, 0, 512, 0.868 > > 2, 0, 2, 512, 0.858 > > 2, 2, 2, 512, 0.857 > > 3, 0, 0, 512, 0.836 > > 3, 3, 0, 512, 0.849 > > 3, 0, 3, 512, 0.84 > > 3, 3, 3, 512, 0.85 > > 4, 0, 0, 512, 0.843 > > 4, 4, 0, 512, 0.837 > > 4, 0, 4, 512, 0.835 > > 4, 4, 4, 512, 0.846 > > 5, 0, 0, 512, 0.852 > > 5, 5, 0, 512, 0.848 > > 5, 0, 5, 512, 0.85 > > 5, 5, 5, 512, 0.85 > > 6, 0, 0, 512, 0.853 > > 6, 6, 0, 512, 0.855 > > 6, 0, 6, 512, 0.853 > > 6, 6, 6, 512, 0.853 > > 7, 0, 0, 512, 0.857 > > 7, 7, 0, 512, 0.861 > > 7, 0, 7, 512, 0.94 > > 7, 7, 7, 512, 0.856 > > 8, 0, 0, 512, 0.927 > > 8, 0, 8, 512, 0.965 > > 9, 0, 0, 512, 0.967 > > 9, 1, 0, 512, 0.976 > > 9, 0, 9, 512, 0.887 > > 9, 1, 9, 512, 0.881 > > 10, 0, 0, 512, 0.853 > > 10, 2, 0, 512, 0.846 > > 10, 0, 10, 512, 0.855 > > 10, 2, 10, 512, 0.849 > > 11, 0, 0, 512, 0.854 > > 11, 3, 0, 512, 0.855 > > 11, 0, 11, 512, 0.85 > > 11, 3, 11, 512, 0.854 > > 12, 0, 0, 512, 0.864 > > 12, 4, 0, 512, 0.864 > > 12, 0, 12, 512, 0.867 > > 12, 4, 12, 512, 0.87 > > 13, 0, 0, 512, 0.853 > > 13, 5, 0, 512, 0.841 > > 13, 0, 13, 512, 0.837 > > 13, 5, 13, 512, 0.85 > > 14, 0, 0, 512, 0.838 > > 14, 6, 0, 512, 0.842 > > 14, 0, 14, 512, 0.818 > > 14, 6, 14, 512, 0.845 > > 15, 0, 0, 512, 0.799 > > 15, 7, 0, 512, 0.847 > > 15, 0, 15, 512, 0.787 > > 15, 7, 15, 512, 0.84 > > 16, 0, 0, 512, 0.824 > > 16, 0, 16, 512, 0.827 > > 17, 0, 0, 512, 0.817 > > 17, 1, 0, 512, 0.823 > > 17, 0, 17, 512, 0.82 > > 17, 1, 17, 512, 0.814 > > 18, 0, 0, 512, 0.81 > > 18, 2, 0, 512, 0.833 > > 18, 0, 18, 512, 0.811 > > 18, 2, 18, 512, 0.842 > > 19, 0, 0, 512, 0.823 > > 19, 3, 0, 512, 0.818 > > 19, 0, 19, 512, 0.821 > > 19, 3, 19, 512, 0.824 > > 20, 0, 0, 512, 0.814 > > 20, 4, 0, 512, 0.818 > > 20, 0, 20, 512, 0.806 > > 20, 4, 20, 512, 0.802 > > 21, 0, 0, 512, 0.835 > > 21, 5, 0, 512, 0.839 > > 21, 0, 21, 512, 0.842 > > 21, 5, 21, 512, 0.82 > > 22, 0, 0, 512, 0.824 > > 22, 6, 0, 512, 0.831 > > 22, 0, 22, 512, 0.819 > > 22, 6, 22, 512, 0.824 > > 23, 0, 0, 512, 0.816 > > 23, 7, 0, 512, 0.856 > > 23, 0, 23, 512, 0.808 > > 23, 7, 23, 512, 0.848 > > 24, 0, 0, 512, 0.88 > > 24, 0, 24, 512, 0.846 > > 25, 0, 0, 512, 0.929 > > 25, 1, 0, 512, 0.917 > > 25, 0, 25, 512, 0.884 > > 25, 1, 25, 512, 0.859 > > 26, 0, 0, 512, 0.919 > > 26, 2, 0, 512, 0.867 > > 26, 0, 26, 512, 0.914 > > 26, 2, 26, 512, 0.845 > > 27, 0, 0, 512, 0.919 > > 27, 3, 0, 512, 0.864 > > 27, 0, 27, 512, 0.917 > > 27, 3, 27, 512, 0.847 > > 28, 0, 0, 512, 0.905 > > 28, 4, 0, 512, 0.896 > > 28, 0, 28, 512, 0.898 > > 28, 4, 28, 512, 0.871 > > 29, 0, 0, 512, 0.911 > > 29, 5, 0, 512, 0.91 > > 29, 0, 29, 512, 0.905 > > 29, 5, 29, 512, 0.884 > > 30, 0, 0, 512, 0.907 > > 30, 6, 0, 512, 0.802 > > 30, 0, 30, 512, 0.906 > > 30, 6, 30, 512, 0.818 > > 31, 0, 0, 512, 0.907 > > 31, 7, 0, 512, 0.821 > > 31, 0, 31, 512, 0.89 > > 31, 7, 31, 512, 0.787 > > 4, 0, 0, 32, 0.623 > > 4, 1, 0, 32, 0.606 > > 4, 0, 1, 32, 0.6 > > 4, 1, 1, 32, 0.603 > > 4, 0, 0, 64, 0.731 > > 4, 2, 0, 64, 0.733 > > 4, 0, 2, 64, 0.734 > > 4, 2, 2, 64, 0.755 > > 4, 0, 0, 128, 0.822 > > 4, 3, 0, 128, 0.873 > > 4, 0, 3, 128, 0.89 > > 4, 3, 3, 128, 0.907 > > 4, 0, 0, 256, 0.827 > > 4, 4, 0, 256, 0.811 > > 4, 0, 4, 256, 0.794 > > 4, 4, 4, 256, 0.814 > > 4, 5, 0, 512, 0.841 > > 4, 0, 5, 512, 0.831 > > 4, 5, 5, 512, 0.845 > > 4, 0, 0, 1024, 0.861 > > 4, 6, 0, 1024, 0.857 > > 4, 0, 6, 1024, 0.9 > > 4, 6, 6, 1024, 0.861 > > 4, 0, 0, 2048, 0.879 > > 4, 7, 0, 2048, 0.875 > > 4, 0, 7, 2048, 0.883 > > 4, 7, 7, 2048, 0.88 > > 10, 1, 0, 64, 0.747 > > 10, 1, 1, 64, 0.743 > > 10, 2, 0, 64, 0.732 > > 10, 2, 2, 64, 0.729 > > 10, 3, 0, 64, 0.747 > > 10, 3, 3, 64, 0.733 > > 10, 4, 0, 64, 0.74 > > 10, 4, 4, 64, 0.751 > > 10, 5, 0, 64, 0.735 > > 10, 5, 5, 64, 0.746 > > 10, 6, 0, 64, 0.735 > > 10, 6, 6, 64, 0.733 > > 10, 7, 0, 64, 0.734 > > 10, 7, 7, 64, 0.74 > > 6, 0, 0, 0, 0.377 > > 6, 0, 0, 1, 0.369 > > 6, 0, 1, 1, 0.383 > > 6, 0, 0, 2, 0.391 > > 6, 0, 2, 2, 0.394 > > 6, 0, 0, 3, 0.416 > > 6, 0, 3, 3, 0.411 > > 6, 0, 0, 4, 0.475 > > 6, 0, 4, 4, 0.483 > > 6, 0, 0, 5, 0.473 > > 6, 0, 5, 5, 0.476 > > 6, 0, 0, 6, 0.459 > > 6, 0, 6, 6, 0.445 > > 6, 0, 0, 7, 0.433 > > 6, 0, 7, 7, 0.432 > > 6, 0, 0, 8, 0.492 > > 6, 0, 8, 8, 0.494 > > 6, 0, 0, 9, 0.476 > > 6, 0, 9, 9, 0.483 > > 6, 0, 0, 10, 0.46 > > 6, 0, 10, 10, 0.476 > > 6, 0, 0, 11, 0.463 > > 6, 0, 11, 11, 0.463 > > 6, 0, 0, 12, 0.511 > > 6, 0, 12, 12, 0.515 > > 6, 0, 0, 13, 0.506 > > 6, 0, 13, 13, 0.536 > > 6, 0, 0, 14, 0.496 > > 6, 0, 14, 14, 0.484 > > 6, 0, 0, 15, 0.473 > > 6, 0, 15, 15, 0.475 > > 6, 0, 0, 16, 0.534 > > 6, 0, 16, 16, 0.534 > > 6, 0, 0, 17, 0.525 > > 6, 0, 17, 17, 0.523 > > 6, 0, 0, 18, 0.522 > > 6, 0, 18, 18, 0.524 > > 6, 0, 0, 19, 0.512 > > 6, 0, 19, 19, 0.514 > > 6, 0, 0, 20, 0.535 > > 6, 0, 20, 20, 0.54 > > 6, 0, 0, 21, 0.543 > > 6, 0, 21, 21, 0.536 > > 6, 0, 0, 22, 0.542 > > 6, 0, 22, 22, 0.542 > > 6, 0, 0, 23, 0.529 > > 6, 0, 23, 23, 0.53 > > 6, 0, 0, 24, 0.596 > > 6, 0, 24, 24, 0.589 > > 6, 0, 0, 25, 0.583 > > 6, 0, 25, 25, 0.58 > > 6, 0, 0, 26, 0.574 > > 6, 0, 26, 26, 0.58 > > 6, 0, 0, 27, 0.575 > > 6, 0, 27, 27, 0.558 > > 6, 0, 0, 28, 0.606 > > 6, 0, 28, 28, 0.606 > > 6, 0, 0, 29, 0.589 > > 6, 0, 29, 29, 0.595 > > 6, 0, 0, 30, 0.592 > > 6, 0, 30, 30, 0.585 > > 6, 0, 0, 31, 0.585 > > 6, 0, 31, 31, 0.579 > > 6, 0, 0, 32, 0.625 > > 6, 0, 32, 32, 0.615 > > 6, 0, 0, 33, 0.615 > > 6, 0, 33, 33, 0.61 > > 6, 0, 0, 34, 0.604 > > 6, 0, 34, 34, 0.6 > > 6, 0, 0, 35, 0.602 > > 6, 0, 35, 35, 0.608 > > 6, 0, 0, 36, 0.644 > > 6, 0, 36, 36, 0.644 > > 6, 0, 0, 37, 0.658 > > 6, 0, 37, 37, 0.651 > > 6, 0, 0, 38, 0.644 > > 6, 0, 38, 38, 0.649 > > 6, 0, 0, 39, 0.626 > > 6, 0, 39, 39, 0.632 > > 6, 0, 0, 40, 0.662 > > 6, 0, 40, 40, 0.661 > > 6, 0, 0, 41, 0.656 > > 6, 0, 41, 41, 0.655 > > 6, 0, 0, 42, 0.643 > > 6, 0, 42, 42, 0.637 > > 6, 0, 0, 43, 0.622 > > 6, 0, 43, 43, 0.628 > > 6, 0, 0, 44, 0.673 > > 6, 0, 44, 44, 0.687 > > 6, 0, 0, 45, 0.661 > > 6, 0, 45, 45, 0.659 > > 6, 0, 0, 46, 0.657 > > 6, 0, 46, 46, 0.653 > > 6, 0, 0, 47, 0.658 > > 6, 0, 47, 47, 0.65 > > 6, 0, 0, 48, 0.678 > > 6, 0, 48, 48, 0.683 > > 6, 0, 0, 49, 0.676 > > 6, 0, 49, 49, 0.661 > > 6, 0, 0, 50, 0.672 > > 6, 0, 50, 50, 0.662 > > 6, 0, 0, 51, 0.656 > > 6, 0, 51, 51, 0.659 > > 6, 0, 0, 52, 0.682 > > 6, 0, 52, 52, 0.686 > > 6, 0, 0, 53, 0.67 > > 6, 0, 53, 53, 0.674 > > 6, 0, 0, 54, 0.663 > > 6, 0, 54, 54, 0.675 > > 6, 0, 0, 55, 0.662 > > 6, 0, 55, 55, 0.665 > > 6, 0, 0, 56, 0.681 > > 6, 0, 56, 56, 0.697 > > 6, 0, 0, 57, 0.686 > > 6, 0, 57, 57, 0.687 > > 6, 0, 0, 58, 0.701 > > 6, 0, 58, 58, 0.693 > > 6, 0, 0, 59, 0.709 > > 6, 0, 59, 59, 0.698 > > 6, 0, 0, 60, 0.708 > > 6, 0, 60, 60, 0.708 > > 6, 0, 0, 61, 0.709 > > 6, 0, 61, 61, 0.716 > > 6, 0, 0, 62, 0.709 > > 6, 0, 62, 62, 0.707 > > 6, 0, 0, 63, 0.703 > > 6, 0, 63, 63, 0.716 > > > > .../{strspn-sse2.S => strspn-sse2.c} | 8 +- > > sysdeps/x86_64/strspn.S | 112 ------------------ > > 2 files changed, 4 insertions(+), 116 deletions(-) > > rename sysdeps/x86_64/multiarch/{strspn-sse2.S => strspn-sse2.c} (86%) > > delete mode 100644 sysdeps/x86_64/strspn.S > > > > diff --git a/sysdeps/x86_64/multiarch/strspn-sse2.S b/sysdeps/x86_64/multiarch/strspn-sse2.c > > similarity index 86% > > rename from sysdeps/x86_64/multiarch/strspn-sse2.S > > rename to sysdeps/x86_64/multiarch/strspn-sse2.c > > index e0a095f25a..61cc6cb0a5 100644 > > --- a/sysdeps/x86_64/multiarch/strspn-sse2.S > > +++ b/sysdeps/x86_64/multiarch/strspn-sse2.c > > @@ -1,4 +1,4 @@ > > -/* strspn optimized with SSE2. > > +/* strspn. > > Copyright (C) 2017-2022 Free Software Foundation, Inc. > > This file is part of the GNU C Library. > > > > @@ -19,10 +19,10 @@ > > #if IS_IN (libc) > > > > # include > > -# define strspn __strspn_sse2 > > +# define STRSPN __strspn_sse2 > > > > # undef libc_hidden_builtin_def > > -# define libc_hidden_builtin_def(strspn) > > +# define libc_hidden_builtin_def(STRSPN) > > #endif > > > > -#include > > +#include > > diff --git a/sysdeps/x86_64/strspn.S b/sysdeps/x86_64/strspn.S > > deleted file mode 100644 > > index 61b76ee0a1..0000000000 > > --- a/sysdeps/x86_64/strspn.S > > +++ /dev/null > > @@ -1,112 +0,0 @@ > > -/* strspn (str, ss) -- Return the length of the initial segment of STR > > - which contains only characters from SS. > > - For AMD x86-64. > > - Copyright (C) 1994-2022 Free Software Foundation, Inc. > > - This file is part of the GNU C Library. > > - > > - The GNU C Library is free software; you can redistribute it and/or > > - modify it under the terms of the GNU Lesser General Public > > - License as published by the Free Software Foundation; either > > - version 2.1 of the License, or (at your option) any later version. > > - > > - The GNU C Library is distributed in the hope that it will be useful, > > - but WITHOUT ANY WARRANTY; without even the implied warranty of > > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > - Lesser General Public License for more details. > > - > > - You should have received a copy of the GNU Lesser General Public > > - License along with the GNU C Library; if not, see > > - . */ > > - > > -#include > > - > > - .text > > -ENTRY (strspn) > > - > > - movq %rdi, %rdx /* Save SRC. */ > > - > > - /* First we create a table with flags for all possible characters. > > - For the ASCII (7bit/8bit) or ISO-8859-X character sets which are > > - supported by the C string functions we have 256 characters. > > - Before inserting marks for the stop characters we clear the whole > > - table. */ > > - movq %rdi, %r8 /* Save value. */ > > - subq $256, %rsp /* Make space for 256 bytes. */ > > - cfi_adjust_cfa_offset(256) > > - movl $32, %ecx /* 32*8 bytes = 256 bytes. */ > > - movq %rsp, %rdi > > - xorl %eax, %eax /* We store 0s. */ > > - cld > > - rep > > - stosq > > - > > - movq %rsi, %rax /* Setup stopset. */ > > - > > -/* For understanding the following code remember that %rcx == 0 now. > > - Although all the following instruction only modify %cl we always > > - have a correct zero-extended 64-bit value in %rcx. */ > > - > > - .p2align 4 > > -L(2): movb (%rax), %cl /* get byte from stopset */ > > - testb %cl, %cl /* is NUL char? */ > > - jz L(1) /* yes => start compare loop */ > > - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ > > - > > - movb 1(%rax), %cl /* get byte from stopset */ > > - testb $0xff, %cl /* is NUL char? */ > > - jz L(1) /* yes => start compare loop */ > > - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ > > - > > - movb 2(%rax), %cl /* get byte from stopset */ > > - testb $0xff, %cl /* is NUL char? */ > > - jz L(1) /* yes => start compare loop */ > > - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ > > - > > - movb 3(%rax), %cl /* get byte from stopset */ > > - addq $4, %rax /* increment stopset pointer */ > > - movb %cl, (%rsp,%rcx) /* set corresponding byte in stopset table */ > > - testb $0xff, %cl /* is NUL char? */ > > - jnz L(2) /* no => process next dword from stopset */ > > - > > -L(1): leaq -4(%rdx), %rax /* prepare loop */ > > - > > - /* We use a neat trick for the following loop. Normally we would > > - have to test for two termination conditions > > - 1. a character in the stopset was found > > - and > > - 2. the end of the string was found > > - But as a sign that the character is in the stopset we store its > > - value in the table. But the value of NUL is NUL so the loop > > - terminates for NUL in every case. */ > > - > > - .p2align 4 > > -L(3): addq $4, %rax /* adjust pointer for full loop round */ > > - > > - movb (%rax), %cl /* get byte from string */ > > - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ > > - jz L(4) /* no => return */ > > - > > - movb 1(%rax), %cl /* get byte from string */ > > - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ > > - jz L(5) /* no => return */ > > - > > - movb 2(%rax), %cl /* get byte from string */ > > - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ > > - jz L(6) /* no => return */ > > - > > - movb 3(%rax), %cl /* get byte from string */ > > - testb %cl, (%rsp,%rcx) /* is it contained in skipset? */ > > - jnz L(3) /* yes => start loop again */ > > - > > - incq %rax /* adjust pointer */ > > -L(6): incq %rax > > -L(5): incq %rax > > - > > -L(4): addq $256, %rsp /* remove stopset */ > > - cfi_adjust_cfa_offset(-256) > > - subq %rdx, %rax /* we have to return the number of valid > > - characters, so compute distance to first > > - non-valid character */ > > - ret > > -END (strspn) > > -libc_hidden_builtin_def (strspn) > > -- > > 2.25.1 > > > > LGTM. > > Reviewed-by: H.J. Lu > > Thanks. > > -- > H.J. I would like to backport this patch to release branches. Any comments or objections? --Sunil