public inbox for libc-alpha@sourceware.org
From: "H.J. Lu" <hjl.tools@gmail.com>
To: noah <goldstein.w.n@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	"Carlos O'Donell" <carlos@systemhalted.org>
Subject: Re: [PATCH v4 2/2] x86: Add additional benchmarks and tests for strchr
Date: Mon, 8 Feb 2021 11:34:58 -0800
Message-ID: <CAMe9rOq+v9Uk8b3LG=E401qm6gg4R04NRaZ1pGVZQRfTWuweJQ@mail.gmail.com>
In-Reply-To: <CAMe9rOoUBW+3EY-ziZYp7Mj_MhKAGovceR=-PqiMFZDfyJnBTQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 9109 bytes --]

On Mon, Feb 8, 2021 at 6:08 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Tue, Feb 2, 2021 at 9:39 PM <goldstein.w.n@gmail.com> wrote:
> >
> > From: noah <goldstein.w.n@gmail.com>
> >
> > This patch adds additional benchmarks and tests for string size of
> > 4096 and several benchmarks for string size 256 with different
> > alignments.
> >
> > Signed-off-by: noah <goldstein.w.n@gmail.com>
> > ---
> > Added 2 additional benchmark and test sizes:
> >
> > 4096: Just feels like a natural "large" size to test
> >
> > 256 with multiple alignments: This essentially is to test how
> > expensive the initial work prior to the 4x loop is depending on
> > different alignments.
> >
> > Results from bench-strchr: all times are in seconds and are the median
> > of 100 runs.  Old is the current strchr-avx2.S implementation. New is
> > this patch.
> >
> > Summary: New is definitely faster for medium -> large sizes. Once the
> > 4x loop is hit there is a 10%+ speedup and New always wins out. For
> > smaller sizes there is more variance as to which is faster and the
> > differences are small. Generally it seems the New version wins
> > out. This is likely because 0 - 31 sized strings are the fast path for
> > New (no jmp). Also neat is the significant performance improvement
> > for alignments 96 and 112. This is because the 5x vectors before the
> > 4x loop really favor those alignments.
> >
> > Benchmarking CPU:
> > Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
> >
> > size, algn, Old T , New T  -------- Win  Dif
> > 0   , 0   , 2.54  , 2.52   -------- New  -0.02
> > 1   , 0   , 2.57  , 2.52   -------- New  -0.05
> > 2   , 0   , 2.56  , 2.52   -------- New  -0.04
> > 3   , 0   , 2.58  , 2.54   -------- New  -0.04
> > 4   , 0   , 2.61  , 2.55   -------- New  -0.06
> > 5   , 0   , 2.65  , 2.62   -------- New  -0.03
> > 6   , 0   , 2.73  , 2.74   -------- Old  -0.01
> > 7   , 0   , 2.75  , 2.74   -------- New  -0.01
> > 8   , 0   , 2.62  , 2.6    -------- New  -0.02
> > 9   , 0   , 2.73  , 2.75   -------- Old  -0.02
> > 10  , 0   , 2.74  , 2.74   -------- Eq    N/A
> > 11  , 0   , 2.76  , 2.72   -------- New  -0.04
> > 12  , 0   , 2.74  , 2.72   -------- New  -0.02
> > 13  , 0   , 2.75  , 2.72   -------- New  -0.03
> > 14  , 0   , 2.74  , 2.73   -------- New  -0.01
> > 15  , 0   , 2.74  , 2.73   -------- New  -0.01
> > 16  , 0   , 2.74  , 2.73   -------- New  -0.01
> > 17  , 0   , 2.74  , 2.74   -------- Eq    N/A
> > 18  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > 19  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > 20  , 0   , 2.73  , 2.73   -------- Eq    N/A
> > 21  , 0   , 2.73  , 2.72   -------- New  -0.01
> > 22  , 0   , 2.71  , 2.74   -------- Old  -0.03
> > 23  , 0   , 2.71  , 2.69   -------- New  -0.02
> > 24  , 0   , 2.68  , 2.67   -------- New  -0.01
> > 25  , 0   , 2.66  , 2.62   -------- New  -0.04
> > 26  , 0   , 2.64  , 2.62   -------- New  -0.02
> > 27  , 0   , 2.71  , 2.64   -------- New  -0.07
> > 28  , 0   , 2.67  , 2.69   -------- Old  -0.02
> > 29  , 0   , 2.72  , 2.72   -------- Eq    N/A
> > 30  , 0   , 2.68  , 2.69   -------- Old  -0.01
> > 31  , 0   , 2.68  , 2.68   -------- Eq    N/A
> > 32  , 0   , 3.51  , 3.52   -------- Old  -0.01
> > 32  , 1   , 3.52  , 3.51   -------- New  -0.01
> > 64  , 0   , 3.97  , 3.93   -------- New  -0.04
> > 64  , 2   , 3.95  , 3.9    -------- New  -0.05
> > 64  , 1   , 4.0   , 3.93   -------- New  -0.07
> > 64  , 3   , 3.97  , 3.88   -------- New  -0.09
> > 64  , 4   , 3.95  , 3.89   -------- New  -0.06
> > 64  , 5   , 3.94  , 3.9    -------- New  -0.04
> > 64  , 6   , 3.97  , 3.9    -------- New  -0.07
> > 64  , 7   , 3.97  , 3.91   -------- New  -0.06
> > 96  , 0   , 4.74  , 4.52   -------- New  -0.22
> > 128 , 0   , 5.29  , 5.19   -------- New  -0.1
> > 128 , 2   , 5.29  , 5.15   -------- New  -0.14
> > 128 , 3   , 5.31  , 5.22   -------- New  -0.09
> > 256 , 0   , 11.19 , 9.81   -------- New  -1.38
> > 256 , 3   , 11.19 , 9.84   -------- New  -1.35
> > 256 , 4   , 11.2  , 9.88   -------- New  -1.32
> > 256 , 16  , 11.21 , 9.79   -------- New  -1.42
> > 256 , 32  , 11.39 , 10.34  -------- New  -1.05
> > 256 , 48  , 11.88 , 10.56  -------- New  -1.32
> > 256 , 64  , 11.82 , 10.83  -------- New  -0.99
> > 256 , 80  , 11.85 , 10.86  -------- New  -0.99
> > 256 , 96  , 9.56  , 8.76   -------- New  -0.8
> > 256 , 112 , 9.55  , 8.9    -------- New  -0.65
> > 512 , 0   , 15.76 , 13.72  -------- New  -2.04
> > 512 , 4   , 15.72 , 13.74  -------- New  -1.98
> > 512 , 5   , 15.73 , 13.74  -------- New  -1.99
> > 1024, 0   , 24.85 , 21.33  -------- New  -3.52
> > 1024, 5   , 24.86 , 21.27  -------- New  -3.59
> > 1024, 6   , 24.87 , 21.32  -------- New  -3.55
> > 2048, 0   , 45.75 , 36.7   -------- New  -9.05
> > 2048, 6   , 43.91 , 35.42  -------- New  -8.49
> > 2048, 7   , 44.43 , 36.37  -------- New  -8.06
> > 4096, 0   , 96.94 , 81.34  -------- New  -15.6
> > 4096, 7   , 97.01 , 81.32  -------- New  -15.69
> >
> >  benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
> >  string/test-strchr.c      | 26 +++++++++++++++++++++++++-
> >  2 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> > index bf493fe458..4ce2369d9b 100644
> > --- a/benchtests/bench-strchr.c
> > +++ b/benchtests/bench-strchr.c
> > @@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> >    size_t i;
> >    CHAR *result;
> >    CHAR *buf = (CHAR *) buf1;
> > -  align &= 15;
> > +  align &= 127;
> >    if ((align + len) * sizeof (CHAR) >= page_size)
> >      return;
> >
> > @@ -151,12 +151,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> >        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > @@ -169,12 +181,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, 0, MIDDLE_CHAR);
> >        do_test (i, 64, 256, 0, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > diff --git a/string/test-strchr.c b/string/test-strchr.c
> > index 5b6022746c..2cf4ea2add 100644
> > --- a/string/test-strchr.c
> > +++ b/string/test-strchr.c
> > @@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> >    size_t i;
> >    CHAR *result;
> >    CHAR *buf = (CHAR *) buf1;
> > -  align &= 15;
> > +  align &= 127;
> >    if ((align + len) * sizeof (CHAR) >= page_size)
> >      return;
> >
> > @@ -259,12 +259,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> >        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > @@ -277,12 +289,24 @@ test_main (void)
> >        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> >      }
> >
> > +  for (i = 1; i < 8; ++i)
> > +    {
> > +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > +    }
> > +
> >    for (i = 1; i < 8; ++i)
> >      {
> >        do_test (i, 64, 256, 0, MIDDLE_CHAR);
> >        do_test (i, 64, 256, 0, BIG_CHAR);
> >      }
> >
> > +  for (i = 0; i < 8; ++i)
> > +    {
> > +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > +    }
> > +
> >    for (i = 0; i < 32; ++i)
> >      {
> >        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > --
> > 2.29.2
> >
>
> LGTM.
>
> Thanks.
>
> --
> H.J.

This is the updated patch, with the extra whitespace fixed, that I am
checking in.


-- 
H.J.
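
A note on the mask change in do_test: the new loops pass alignments of
16 * i for i in [0, 8), i.e. 0, 16, ..., 112.  Under the old
`align &= 15` every one of those values would collapse to 0, so the
wider mask is what lets them reach the function under test.  The
following is only a minimal sketch of that arithmetic; masked_align and
distinct_alignments are hypothetical helper names for illustration and
are not part of the patch or of glibc.

```c
#include <stddef.h>

/* Hypothetical helper (not in the patch): mirrors the effect of
   `align &= mask` at the top of do_test.  */
static size_t
masked_align (size_t align, size_t mask)
{
  return align & mask;
}

/* Count how many of the alignments 16 * i, i in [0, 8), remain
   distinct after masking.  */
static unsigned int
distinct_alignments (size_t mask)
{
  unsigned int count = 0;
  for (size_t i = 0; i < 8; ++i)
    {
      size_t a = masked_align (16 * i, mask);
      int seen = 0;
      /* Compare against every earlier alignment in the sequence.  */
      for (size_t j = 0; j < i; ++j)
        if (masked_align (16 * j, mask) == a)
          seen = 1;
      if (!seen)
        ++count;
    }
  return count;
}
```

With mask 15 only a single distinct alignment (0) survives; with mask
127 all eight do.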

[-- Attachment #2: 0002-strchr-Add-additional-benchmarks-and-tests.patch --]
[-- Type: text/x-patch, Size: 3895 bytes --]

From a00e2fe3dfd3a4e218ba6c1c3445ee68322ddda9 Mon Sep 17 00:00:00 2001
From: noah <goldstein.w.n@gmail.com>
Date: Wed, 3 Feb 2021 00:39:00 -0500
Subject: [PATCH 2/2] strchr: Add additional benchmarks and tests

This patch adds additional benchmarks and tests for string size of
4096 and several benchmarks for string size 256 with different
alignments.
---
 benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
 string/test-strchr.c      | 26 +++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
index bf493fe458..4ce2369d9b 100644
--- a/benchtests/bench-strchr.c
+++ b/benchtests/bench-strchr.c
@@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -151,12 +151,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -169,12 +181,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);
diff --git a/string/test-strchr.c b/string/test-strchr.c
index 5b6022746c..6c8ca54a7d 100644
--- a/string/test-strchr.c
+++ b/string/test-strchr.c
@@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -259,12 +259,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -277,12 +289,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);
-- 
2.29.2
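
For reference when reading the quoted benchmark table: its "Dif" column
is an absolute time difference, while the "10%+" figure in the summary
is a relative saving.  A minimal sketch of that conversion follows;
speedup_percent is a hypothetical helper name, not something from
bench-strchr.

```c
/* Hypothetical helper (not part of bench-strchr): relative time saved
   by New versus Old, as a percentage of the Old time.  Assumes
   old_t > 0.  */
static double
speedup_percent (double old_t, double new_t)
{
  return (old_t - new_t) / old_t * 100.0;
}
```

For example, the size 4096, alignment 0 row (96.94 vs 81.34) works out
to roughly 16%, and the size 256, alignment 0 row (11.19 vs 9.81) to
roughly 12%.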


