public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: goldstein.w.n@gmail.com
To: libc-alpha@sourceware.org
Cc: carlos@systemhalted.org, goldstein.w.n@gmail.com, hjl.tools@gmail.com
Subject: [PATCH v4 2/2] x86: Add additional benchmarks and tests for strchr
Date: Wed,  3 Feb 2021 00:39:00 -0500	[thread overview]
Message-ID: <20210203053900.4125403-2-goldstein.w.n@gmail.com> (raw)
In-Reply-To: <20210203053900.4125403-1-goldstein.w.n@gmail.com>

From: noah <goldstein.w.n@gmail.com>

This patch adds additional benchmarks and tests for string size of
4096 and several benchmarks for string size 256 with different
alignments.

Signed-off-by: noah <goldstein.w.n@gmail.com>
---
Added 2 additional benchmark and test sizes:

4096: Just feels like a natural "large" size to test
    
256 with multiple alignments: This essentially is to test how
expensive the initial work prior to the 4x loop is depending on
different alignments.

results from bench-strchr: All times are in seconds and the medium of
100 runs.  Old is current strchr-avx2.S implementation. New is this
patch.

Summary: New is definetly faster for medium -> large sizes. Once the
4x loop is hit there is a 10%+ speedup and New always wins out. For
smaller sizes there is more variance as to which is faster and the
differences are small. Generally it seems the New version wins
out. This is likely because 0 - 31 sized strings are the fast path for
new (no jmp). Also something that is neat is the significant
performance improved for alignment 96 and 112. This is because the 5x
vectors before 4x loop really favor that alignment.

Benchmarking CPU:
Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

size, algn, Old T , New T  -------- Win  Dif
0   , 0   , 2.54  , 2.52   -------- New  -0.02
1   , 0   , 2.57  , 2.52   -------- New  -0.05
2   , 0   , 2.56  , 2.52   -------- New  -0.04
3   , 0   , 2.58  , 2.54   -------- New  -0.04
4   , 0   , 2.61  , 2.55   -------- New  -0.06
5   , 0   , 2.65  , 2.62   -------- New  -0.03
6   , 0   , 2.73  , 2.74   -------- Old  -0.01
7   , 0   , 2.75  , 2.74   -------- New  -0.01
8   , 0   , 2.62  , 2.6    -------- New  -0.02
9   , 0   , 2.73  , 2.75   -------- Old  -0.02
10  , 0   , 2.74  , 2.74   -------- Eq    N/A
11  , 0   , 2.76  , 2.72   -------- New  -0.04
12  , 0   , 2.74  , 2.72   -------- New  -0.02
13  , 0   , 2.75  , 2.72   -------- New  -0.03
14  , 0   , 2.74  , 2.73   -------- New  -0.01
15  , 0   , 2.74  , 2.73   -------- New  -0.01
16  , 0   , 2.74  , 2.73   -------- New  -0.01
17  , 0   , 2.74  , 2.74   -------- Eq    N/A
18  , 0   , 2.73  , 2.73   -------- Eq    N/A
19  , 0   , 2.73  , 2.73   -------- Eq    N/A
20  , 0   , 2.73  , 2.73   -------- Eq    N/A
21  , 0   , 2.73  , 2.72   -------- New  -0.01
22  , 0   , 2.71  , 2.74   -------- Old  -0.03
23  , 0   , 2.71  , 2.69   -------- New  -0.02
24  , 0   , 2.68  , 2.67   -------- New  -0.01
25  , 0   , 2.66  , 2.62   -------- New  -0.04
26  , 0   , 2.64  , 2.62   -------- New  -0.02
27  , 0   , 2.71  , 2.64   -------- New  -0.07
28  , 0   , 2.67  , 2.69   -------- Old  -0.02
29  , 0   , 2.72  , 2.72   -------- Eq    N/A
30  , 0   , 2.68  , 2.69   -------- Old  -0.01
31  , 0   , 2.68  , 2.68   -------- Eq    N/A
32  , 0   , 3.51  , 3.52   -------- Old  -0.01
32  , 1   , 3.52  , 3.51   -------- New  -0.01
64  , 0   , 3.97  , 3.93   -------- New  -0.04
64  , 2   , 3.95  , 3.9    -------- New  -0.05
64  , 1   , 4.0   , 3.93   -------- New  -0.07
64  , 3   , 3.97  , 3.88   -------- New  -0.09
64  , 4   , 3.95  , 3.89   -------- New  -0.06
64  , 5   , 3.94  , 3.9    -------- New  -0.04
64  , 6   , 3.97  , 3.9    -------- New  -0.07
64  , 7   , 3.97  , 3.91   -------- New  -0.06
96  , 0   , 4.74  , 4.52   -------- New  -0.22
128 , 0   , 5.29  , 5.19   -------- New  -0.1
128 , 2   , 5.29  , 5.15   -------- New  -0.14
128 , 3   , 5.31  , 5.22   -------- New  -0.09
256 , 0   , 11.19 , 9.81   -------- New  -1.38
256 , 3   , 11.19 , 9.84   -------- New  -1.35
256 , 4   , 11.2  , 9.88   -------- New  -1.32
256 , 16  , 11.21 , 9.79   -------- New  -1.42
256 , 32  , 11.39 , 10.34  -------- New  -1.05
256 , 48  , 11.88 , 10.56  -------- New  -1.32
256 , 64  , 11.82 , 10.83  -------- New  -0.99
256 , 80  , 11.85 , 10.86  -------- New  -0.99
256 , 96  , 9.56  , 8.76   -------- New  -0.8 
256 , 112 , 9.55  , 8.9    -------- New  -0.65
512 , 0   , 15.76 , 13.72  -------- New  -2.04
512 , 4   , 15.72 , 13.74  -------- New  -1.98
512 , 5   , 15.73 , 13.74  -------- New  -1.99
1024, 0   , 24.85 , 21.33  -------- New  -3.52
1024, 5   , 24.86 , 21.27  -------- New  -3.59
1024, 6   , 24.87 , 21.32  -------- New  -3.55
2048, 0   , 45.75 , 36.7   -------- New  -9.05
2048, 6   , 43.91 , 35.42  -------- New  -8.49
2048, 7   , 44.43 , 36.37  -------- New  -8.06
4096, 0   , 96.94 , 81.34  -------- New  -15.6
4096, 7   , 97.01 , 81.32  -------- New  -15.69

 benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
 string/test-strchr.c      | 26 +++++++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
index bf493fe458..4ce2369d9b 100644
--- a/benchtests/bench-strchr.c
+++ b/benchtests/bench-strchr.c
@@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -151,12 +151,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -169,12 +181,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);
diff --git a/string/test-strchr.c b/string/test-strchr.c
index 5b6022746c..2cf4ea2add 100644
--- a/string/test-strchr.c
+++ b/string/test-strchr.c
@@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
   size_t i;
   CHAR *result;
   CHAR *buf = (CHAR *) buf1;
-  align &= 15;
+  align &= 127;
   if ((align + len) * sizeof (CHAR) >= page_size)
     return;
 
@@ -259,12 +259,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
+    }
+  
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
       do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
+    }
+  
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
@@ -277,12 +289,24 @@ test_main (void)
       do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
     }
 
+  for (i = 1; i < 8; ++i)
+    {
+      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
+      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
+    }
+  
   for (i = 1; i < 8; ++i)
     {
       do_test (i, 64, 256, 0, MIDDLE_CHAR);
       do_test (i, 64, 256, 0, BIG_CHAR);
     }
 
+  for (i = 0; i < 8; ++i)
+    {
+      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
+      do_test (16 * i, 256, 512, 0, BIG_CHAR);
+    }
+  
   for (i = 0; i < 32; ++i)
     {
       do_test (0, i, i + 1, 0, MIDDLE_CHAR);
-- 
2.29.2


  reply	other threads:[~2021-02-03  5:39 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-03  5:38 [PATCH v4 1/2] x86: Refactor and improve performance of strchr-avx2.S goldstein.w.n
2021-02-03  5:39 ` goldstein.w.n [this message]
2021-02-08 14:08   ` [PATCH v4 2/2] x86: Add additional benchmarks and tests for strchr H.J. Lu
2021-02-08 19:34     ` H.J. Lu
2021-02-08 19:49       ` Noah Goldstein
2021-02-08 14:08 ` [PATCH v4 1/2] x86: Refactor and improve performance of strchr-avx2.S H.J. Lu
2021-02-08 19:33   ` H.J. Lu
2021-02-08 19:48     ` Noah Goldstein
2021-02-08 20:57       ` Noah Goldstein
2022-04-27 23:43         ` Sunil Pandey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210203053900.4125403-2-goldstein.w.n@gmail.com \
    --to=goldstein.w.n@gmail.com \
    --cc=carlos@systemhalted.org \
    --cc=hjl.tools@gmail.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).