From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: "H . J . Lu" <hjl.tools@gmail.com>,
Noah Goldstein <goldstein.w.n@gmail.com>,
Sajan Karumanchi <sajan.karumanchi@gmail.com>,
bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 0/3] x86: Improve ERMS usage on Zen3+
Date: Thu, 8 Feb 2024 10:08:37 -0300
Message-ID: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
For the sizes where REP MOVSB and REP STOSB are used on Zen3+ cores, the
resulting performance is lower than with vectorized instructions (with
some input alignments showing a very large performance gap, as indicated
by BZ#30995).
glibc enables ERMS on AMD cores for sizes between 2113
(rep_movsb_threshold) and the L2 cache size (rep_movsb_stop_threshold,
or 524288 on a Zen3 core). Using the benchmarks provided in BZ#30995,
memcpy on a Ryzen 9 5900X shows:
Size (bytes)  Destination Alignment  Throughput (GB/s)
        2113                      0       84.2448
        2113                     15        4.4310
      524287                      0       57.1122
      524287                     15        4.34671
Using vectorized instructions instead, by setting the tunable
GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000, it shows:
Size (bytes)  Destination Alignment  Throughput (GB/s)
        2113                      0      124.1830
        2113                     15      121.8720
      524287                      0       58.3212
      524287                     15       58.5352
Increasing the number of concurrent jobs does show improvements in ERMS
over vectorized instructions as well. The performance difference with
ERMS improves if input alignments are equal, although it does not reach
parity with the vectorized path.
memset shows a similar performance improvement with vectorized
instructions instead of REP STOSB. On the same machine, the default
strategy shows:
Size (bytes)  Destination Alignment  Throughput (GB/s)
        2113                      0       68.0113
        2113                     15       56.1880
      524287                      0      119.3670
      524287                     15      116.2590
While with GLIBC_TUNABLES=glibc.cpu.x86_rep_stosb_threshold=1000000:
Size (bytes)  Destination Alignment  Throughput (GB/s)
        2113                      0      133.2310
        2113                     15      132.5800
      524287                      0      112.0650
      524287                     15      118.0960
I also saw a slight performance increase on 502.gcc_r (1 copy), where
the result went from 9.82 to 9.85. This benchmark exercises both memcpy
and memset heavily.
Changes from v2:
- Removed rep_movsb_stop_threshold tunable.
- Simplified the memset change.
Changes from v1:
- Reworded comment and commit message.
Adhemerval Zanella (3):
  x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
  x86: Do not prefer ERMS for memset on Zen3+
  x86: Expand the comment on when REP STOSB is used on memset

 sysdeps/x86/dl-cacheinfo.h                    | 43 ++++++++++---------
 .../multiarch/memset-vec-unaligned-erms.S     |  4 +-
 2 files changed, 26 insertions(+), 21 deletions(-)
--
2.34.1