From: "H.J. Lu" <hjl.tools@gmail.com>
To: noah <goldstein.w.n@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
"Carlos O'Donell" <carlos@systemhalted.org>
Subject: Re: [PATCH v2 2/2] x86: Add additional benchmarks for strchr
Date: Mon, 1 Feb 2021 09:10:36 -0800
Message-ID: <CAMe9rOop=no5vopFXMjGkXozwqWosaoYeq-3-aZ39hHXwY-fVA@mail.gmail.com>
In-Reply-To: <20210201003014.785099-2-goldstein.w.n@gmail.com>
On Sun, Jan 31, 2021 at 4:30 PM noah <goldstein.w.n@gmail.com> wrote:
>
> This patch adds additional benchmarks for string size of 4096 and
> several benchmarks for string size 256 with different alignments.
>
> Signed-off-by: noah <goldstein.w.n@gmail.com>
> ---
> Added 2 additional benchmark sizes:
>
> 4096: Just feels like a natural "large" size to test
>
> 256 with multiple alignments: This essentially is to test how
> expensive the initial work prior to the 4x loop is depending on
> different alignments.
>
> Results from bench-strchr: All times are in seconds and are the median of
> 100 runs. Old is current strchr-avx2.S implementation. New is this
> patch.
>
> Summary: New is definitely faster for medium -> large sizes. Once the
> 4x loop is hit there is a 10%+ speedup and New always wins out. For
> smaller sizes there is more variance as to which is faster and the
> differences are small. Generally it seems the New version wins
> out. This is likely because 0 - 31 sized strings are the fast path for
> new (no jmp).
>
> Benchmarking CPU:
> Icelake: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
>
> size, algn, Old T , New T -------- Win Dif
> 0 , 0 , 2.54 , 2.52 -------- New -0.02
> 1 , 0 , 2.57 , 2.52 -------- New -0.05
> 2 , 0 , 2.56 , 2.52 -------- New -0.04
> 3 , 0 , 2.58 , 2.54 -------- New -0.04
> 4 , 0 , 2.61 , 2.55 -------- New -0.06
> 5 , 0 , 2.65 , 2.62 -------- New -0.03
> 6 , 0 , 2.73 , 2.74 -------- Old -0.01
> 7 , 0 , 2.75 , 2.74 -------- New -0.01
> 8 , 0 , 2.62 , 2.6 -------- New -0.02
> 9 , 0 , 2.73 , 2.75 -------- Old -0.02
> 10 , 0 , 2.74 , 2.74 -------- Eq N/A
> 11 , 0 , 2.76 , 2.72 -------- New -0.04
> 12 , 0 , 2.74 , 2.72 -------- New -0.02
> 13 , 0 , 2.75 , 2.72 -------- New -0.03
> 14 , 0 , 2.74 , 2.73 -------- New -0.01
> 15 , 0 , 2.74 , 2.73 -------- New -0.01
> 16 , 0 , 2.74 , 2.73 -------- New -0.01
> 17 , 0 , 2.74 , 2.74 -------- Eq N/A
> 18 , 0 , 2.73 , 2.73 -------- Eq N/A
> 19 , 0 , 2.73 , 2.73 -------- Eq N/A
> 20 , 0 , 2.73 , 2.73 -------- Eq N/A
> 21 , 0 , 2.73 , 2.72 -------- New -0.01
> 22 , 0 , 2.71 , 2.74 -------- Old -0.03
> 23 , 0 , 2.71 , 2.69 -------- New -0.02
> 24 , 0 , 2.68 , 2.67 -------- New -0.01
> 25 , 0 , 2.66 , 2.62 -------- New -0.04
> 26 , 0 , 2.64 , 2.62 -------- New -0.02
> 27 , 0 , 2.71 , 2.64 -------- New -0.07
> 28 , 0 , 2.67 , 2.69 -------- Old -0.02
> 29 , 0 , 2.72 , 2.72 -------- Eq N/A
> 30 , 0 , 2.68 , 2.69 -------- Old -0.01
> 31 , 0 , 2.68 , 2.68 -------- Eq N/A
> 32 , 0 , 3.51 , 3.52 -------- Old -0.01
> 32 , 1 , 3.52 , 3.51 -------- New -0.01
> 64 , 0 , 3.97 , 3.93 -------- New -0.04
> 64 , 2 , 3.95 , 3.9 -------- New -0.05
> 64 , 1 , 4.0 , 3.93 -------- New -0.07
> 64 , 3 , 3.97 , 3.88 -------- New -0.09
> 64 , 4 , 3.95 , 3.89 -------- New -0.06
> 64 , 5 , 3.94 , 3.9 -------- New -0.04
> 64 , 6 , 3.97 , 3.9 -------- New -0.07
> 64 , 7 , 3.97 , 3.91 -------- New -0.06
> 96 , 0 , 4.74 , 4.52 -------- New -0.22
> 128 , 0 , 5.29 , 5.19 -------- New -0.1
> 128 , 2 , 5.29 , 5.15 -------- New -0.14
> 128 , 3 , 5.31 , 5.22 -------- New -0.09
> 256 , 0 , 11.19 , 9.81 -------- New -1.38
> 256 , 3 , 11.19 , 9.84 -------- New -1.35
> 256 , 4 , 11.2 , 9.88 -------- New -1.32
> 256 , 16 , 11.21 , 9.79 -------- New -1.42
> 256 , 32 , 11.39 , 10.34 -------- New -1.05
> 256 , 48 , 11.88 , 10.56 -------- New -1.32
> 256 , 64 , 11.82 , 10.83 -------- New -0.99
> 256 , 80 , 11.85 , 10.86 -------- New -0.99
> 256 , 96 , 9.56 , 8.76 -------- New -0.8
> 256 , 112 , 9.55 , 8.9 -------- New -0.65
> 512 , 0 , 15.76 , 13.72 -------- New -2.04
> 512 , 4 , 15.72 , 13.74 -------- New -1.98
> 512 , 5 , 15.73 , 13.74 -------- New -1.99
> 1024, 0 , 24.85 , 21.33 -------- New -3.52
> 1024, 5 , 24.86 , 21.27 -------- New -3.59
> 1024, 6 , 24.87 , 21.32 -------- New -3.55
> 2048, 0 , 45.75 , 36.7 -------- New -9.05
> 2048, 6 , 43.91 , 35.42 -------- New -8.49
> 2048, 7 , 44.43 , 36.37 -------- New -8.06
> 4096, 0 , 96.94 , 81.34 -------- New -15.6
> 4096, 7 , 97.01 , 81.32 -------- New -15.69
>
>
>
> benchtests/bench-strchr.c | 32 ++++++++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/benchtests/bench-strchr.c b/benchtests/bench-strchr.c
> index bf493fe458..5fd98a5d43 100644
> --- a/benchtests/bench-strchr.c
> +++ b/benchtests/bench-strchr.c
> @@ -100,9 +100,13 @@ do_test (size_t align, size_t pos, size_t len, int seek_char, int max_char)
>    size_t i;
>    CHAR *result;
>    CHAR *buf = (CHAR *) buf1;
> -  align &= 15;
> +
> +  align &= 127;
>    if ((align + len) * sizeof (CHAR) >= page_size)
> -    return;
> +    {
> +      return;
> +    }
> +
>
>    for (i = 0; i < len; ++i)
>      {
> @@ -151,12 +155,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, SMALL_CHAR, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, SMALL_CHAR, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, SMALL_CHAR, MIDDLE_CHAR);
>        do_test (i, 64, 256, SMALL_CHAR, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, SMALL_CHAR, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, SMALL_CHAR, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, SMALL_CHAR, MIDDLE_CHAR);
> @@ -169,12 +185,24 @@ test_main (void)
>        do_test (i, 16 << i, 2048, 0, MIDDLE_CHAR);
>      }
>
> +  for (i = 1; i < 8; ++i)
> +    {
> +      do_test (0, 16 << i, 4096, 0, MIDDLE_CHAR);
> +      do_test (i, 16 << i, 4096, 0, MIDDLE_CHAR);
> +    }
> +
>    for (i = 1; i < 8; ++i)
>      {
>        do_test (i, 64, 256, 0, MIDDLE_CHAR);
>        do_test (i, 64, 256, 0, BIG_CHAR);
>      }
>
> +  for (i = 0; i < 8; ++i)
> +    {
> +      do_test (16 * i, 256, 512, 0, MIDDLE_CHAR);
> +      do_test (16 * i, 256, 512, 0, BIG_CHAR);
> +    }
> +
>    for (i = 0; i < 32; ++i)
>      {
>        do_test (0, i, i + 1, 0, MIDDLE_CHAR);
> --
> 2.29.2
Please make similar changes in string/test-strchr.c.
Thanks.
--
H.J.