From: Noah Goldstein <goldstein.w.n@gmail.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
"Carlos O'Donell" <carlos@systemhalted.org>
Subject: Re: [PATCH v4 2/2] x86: Add additional benchmarks and tests for strchr
Date: Mon, 8 Feb 2021 14:49:13 -0500
Message-ID: <CAFUsyfK4vmoCYe5YgH6EfmFPtFnHEWhgTV3csNoHr9d7w+hVfQ@mail.gmail.com>
In-Reply-To: <CAMe9rOq+v9Uk8b3LG=E401qm6gg4R04NRaZ1pGVZQRfTWuweJQ@mail.gmail.com>
On Mon, Feb 8, 2021 at 2:35 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 6:08 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Tue, Feb 2, 2021 at 9:39 PM <goldstein.w.n@gmail.com> wrote:
> > >
> > > From: noah <goldstein.w.n@gmail.com>
> > >
> > > This patch adds benchmarks and tests for a string size of 4096,
> > > plus benchmarks and tests for a string size of 256 at several
> > > different alignments.
> > >
> > > Signed-off-by: noah <goldstein.w.n@gmail.com>
> > > ---
> > > Added two additional benchmark and test configurations:
> > >
> > > 4096: a natural "large" size to test.
> > >
> > > 256 with multiple alignments: this essentially tests how the cost
> > > of the initial work before the 4x loop depends on the starting
> > > alignment.
> > >
> > > Results from bench-strchr: all times are in seconds and are the
> > > median of 100 runs. Old is the current strchr-avx2.S implementation;
> > > New is the implementation from this patch series.
> > >
> > > Summary: New is definitely faster for medium to large sizes. Once
> > > the 4x loop is hit there is a 10%+ speedup and New always wins. For
> > > smaller sizes there is more variance as to which is faster, and the
> > > differences are small, but New generally wins. This is likely
> > > because strings of length 0-31 take the fast path in New (no jmp).
> > > Also notable are the significantly lower times at alignments 96 and
> > > 112. This is because the 5x vectors before the 4x loop really favor
> > > those alignments.
> > >
> > > Benchmarking CPU:
> > > Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
> > >
> > > size, algn, Old T , New T -------- Win Dif
> > > 0 , 0 , 2.54 , 2.52 -------- New -0.02
> > > 1 , 0 , 2.57 , 2.52 -------- New -0.05
> > > 2 , 0 , 2.56 , 2.52 -------- New -0.04
> > > 3 , 0 , 2.58 , 2.54 -------- New -0.04
> > > 4 , 0 , 2.61 , 2.55 -------- New -0.06
> > > 5 , 0 , 2.65 , 2.62 -------- New -0.03
> > > 6 , 0 , 2.73 , 2.74 -------- Old -0.01
> > > 7 , 0 , 2.75 , 2.74 -------- New -0.01
> > > 8 , 0 , 2.62 , 2.6 -------- New -0.02
> > > 9 , 0 , 2.73 , 2.75 -------- Old -0.02
> > > 10 , 0 , 2.74 , 2.74 -------- Eq N/A
> > > 11 , 0 , 2.76 , 2.72 -------- New -0.04
> > > 12 , 0 , 2.74 , 2.72 -------- New -0.02
> > > 13 , 0 , 2.75 , 2.72 -------- New -0.03
> > > 14 , 0 , 2.74 , 2.73 -------- New -0.01
> > > 15 , 0 , 2.74 , 2.73 -------- New -0.01
> > > 16 , 0 , 2.74 , 2.73 -------- New -0.01
> > > 17 , 0 , 2.74 , 2.74 -------- Eq N/A
> > > 18 , 0 , 2.73 , 2.73 -------- Eq N/A
> > > 19 , 0 , 2.73 , 2.73 -------- Eq N/A
> > > 20 , 0 , 2.73 , 2.73 -------- Eq N/A
> > > 21 , 0 , 2.73 , 2.72 -------- New -0.01
> > > 22 , 0 , 2.71 , 2.74 -------- Old -0.03
> > > 23 , 0 , 2.71 , 2.69 -------- New -0.02
> > > 24 , 0 , 2.68 , 2.67 -------- New -0.01
> > > 25 , 0 , 2.66 , 2.62 -------- New -0.04
> > > 26 , 0 , 2.64 , 2.62 -------- New -0.02
> > > 27 , 0 , 2.71 , 2.64 -------- New -0.07
> > > 28 , 0 , 2.67 , 2.69 -------- Old -0.02
> > > 29 , 0 , 2.72 , 2.72 -------- Eq N/A
> > > 30 , 0 , 2.68 , 2.69 -------- Old -0.01
> > > 31 , 0 , 2.68 , 2.68 -------- Eq N/A
> > > 32 , 0 , 3.51 , 3.52 -------- Old -0.01
> > > 32 , 1 , 3.52 , 3.51 -------- New -0.01
> > > 64 , 0 , 3.97 , 3.93 -------- New -0.04
> > > 64 , 2 , 3.95 , 3.9 -------- New -0.05
> > > 64 , 1 , 4.0 , 3.93 -------- New -0.07
> > > 64 , 3 , 3.97 , 3.88 -------- New -0.09
> > > 64 , 4 , 3.95 , 3.89 -------- New -0.06
> > > 64 , 5 , 3.94 , 3.9 -------- New -0.04
> > > 64 , 6 , 3.97 , 3.9 -------- New -0.07
> > > 64 , 7 , 3.97 , 3.91 -------- New -0.06
> > > 96 , 0 , 4.74 , 4.52 -------- New -0.22
> > > 128 , 0 , 5.29 , 5.19 -------- New -0.1
> > > 128 , 2 , 5.29 , 5.15 -------- New -0.14
> > > 128 , 3 , 5.31 , 5.22 -------- New -0.09
> > > 256 , 0 , 11.19 , 9.81 -------- New -1.38
> > > 256 , 3 , 11.19 , 9.84 -------- New -1.35
> > > 256 , 4 , 11.2 , 9.88 -------- New -1.32
> > > 256 , 16 , 11.21 , 9.79 -------- New -1.42
> > > 256 , 32 , 11.39 , 10.34 -------- New -1.05
> > > 256 , 48 , 11.88 , 10.56 -------- New -1.32
> > > 256 , 64 , 11.82 , 10.83 -------- New -0.99
> > > 256 , 80 , 11.85 , 10.86 -------- New -0.99
> > > 256 , 96 , 9.56 , 8.76 -------- New -0.8
> > > 256 , 112 , 9.55 , 8.9 -------- New -0.65
> > > 512 , 0 , 15.76 , 13.72 -------- New -2.04
> > > 512 , 4 , 15.72 , 13.74 -------- New -1.98
> > > 512 , 5 , 15.73 , 13.74 -------- New -1.99
> > > 1024, 0 , 24.85 , 21.33 -------- New -3.52
> > > 1024, 5 , 24.86 , 21.27 -------- New -3.59
> > > 1024, 6 , 24.87 , 21.32 -------- New -3.55
> > > 2048, 0 , 45.75 , 36.7 -------- New -9.05
> > > 2048, 6 , 43.91 , 35.42 -------- New -8.49
> > > 2048, 7 , 44.43 , 36.37 -------- New -8.06
> > > 4096, 0 , 96.94 , 81.34 -------- New -15.6
> > > 4096, 7 , 97.01 , 81.32 -------- New -15.69
> > >
> > > benchtests/bench-strchr.c | 26 +++++++++++++++++++++++++-
> > > string/test-strchr.c | 26 +++++++++++++++++++++++++-
> > > 2 files changed, 50 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> > > index bf493fe458..4ce2369d9b 100644
> > > --- a/benchtests/bench-strchr.c
> > > +++ b/benchtests/bench-strchr.c
> > > @@ -100,7 +100,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> > > size_t i;
> > > CHAR *result;
> > > CHAR *buf = (CHAR *) buf1;
> > > - align &= 15;
> > > + align &= 127;
> > > if ((align + len) * sizeof (CHAR) >= page_size)
> > > return;
> > >
> > > @@ -151,12 +151,24 @@ test_main (void)
> > > do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> > > }
> > >
> > > + for (i = 1; i < 8; ++i)
> > > + {
> > > + do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > + do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > + }
> > > +
> > > for (i = 1; i < 8; ++i)
> > > {
> > > do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> > > do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> > > }
> > >
> > > + for (i = 0; i < 8; ++i)
> > > + {
> > > + do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > > + do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > > + }
> > > +
> > > for (i = 0; i < 32; ++i)
> > > {
> > > do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > > @@ -169,12 +181,24 @@ test_main (void)
> > > do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> > > }
> > >
> > > + for (i = 1; i < 8; ++i)
> > > + {
> > > + do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > + do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > + }
> > > +
> > > for (i = 1; i < 8; ++i)
> > > {
> > > do_test (i, 64, 256, 0, MIDDLE_CHAR);
> > > do_test (i, 64, 256, 0, BIG_CHAR);
> > > }
> > >
> > > + for (i = 0; i < 8; ++i)
> > > + {
> > > + do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > > + do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > > + }
> > > +
> > > for (i = 0; i < 32; ++i)
> > > {
> > > do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > > diff --git a/string/test-strchr.c b/string/test-strchr.c
> > > index 5b6022746c..2cf4ea2add 100644
> > > --- a/string/test-strchr.c
> > > +++ b/string/test-strchr.c
> > > @@ -130,7 +130,7 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
> > > size_t i;
> > > CHAR *result;
> > > CHAR *buf = (CHAR *) buf1;
> > > - align &= 15;
> > > + align &= 127;
> > > if ((align + len) * sizeof (CHAR) >= page_size)
> > > return;
> > >
> > > @@ -259,12 +259,24 @@ test_main (void)
> > > do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
> > > }
> > >
> > > + for (i = 1; i < 8; ++i)
> > > + {
> > > + do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > + do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> > > + }
> > > +
> > > for (i = 1; i < 8; ++i)
> > > {
> > > do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
> > > do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
> > > }
> > >
> > > + for (i = 0; i < 8; ++i)
> > > + {
> > > + do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> > > + do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> > > + }
> > > +
> > > for (i = 0; i < 32; ++i)
> > > {
> > > do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> > > @@ -277,12 +289,24 @@ test_main (void)
> > > do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
> > > }
> > >
> > > + for (i = 1; i < 8; ++i)
> > > + {
> > > + do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > + do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> > > + }
> > > +
> > > for (i = 1; i < 8; ++i)
> > > {
> > > do_test (i, 64, 256, 0, MIDDLE_CHAR);
> > > do_test (i, 64, 256, 0, BIG_CHAR);
> > > }
> > >
> > > + for (i = 0; i < 8; ++i)
> > > + {
> > > + do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> > > + do_test (16 * i, 256, 512, 0, BIG_CHAR);
> > > + }
> > > +
> > > for (i = 0; i < 32; ++i)
> > > {
> > > do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> > > --
> > > 2.29.2
> > >
> >
> > LGTM.
> >
> > Thanks.
> >
> > --
> > H.J.
>
> This is the updated patch, with extra whitespace fixed, that I am checking in.
>
>
> --
> H.J.
Awesome! Thanks!
N.G.
Thread overview: 10+ messages
2021-02-03 5:38 [PATCH v4 1/2] x86: Refactor and improve performance of strchr-avx2.S goldstein.w.n
2021-02-03 5:39 ` [PATCH v4 2/2] x86: Add additional benchmarks and tests for strchr goldstein.w.n
2021-02-08 14:08 ` H.J. Lu
2021-02-08 19:34 ` H.J. Lu
2021-02-08 19:49 ` Noah Goldstein [this message]
2021-02-08 14:08 ` [PATCH v4 1/2] x86: Refactor and improve performance of strchr-avx2.S H.J. Lu
2021-02-08 19:33 ` H.J. Lu
2021-02-08 19:48 ` Noah Goldstein
2021-02-08 20:57 ` Noah Goldstein
2022-04-27 23:43 ` Sunil Pandey