[Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4

public inbox for glibc-bugs@sourceware.org

From: "adhemerval.zanella at linaro dot org" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
Date: Mon, 30 Oct 2023 16:27:35 +0000
Message-ID: <bug-30994-131-Kuy5RSPpTg@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-30994-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #12 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Bruce Merry from comment #11)
> > On Zen3 I am not seeing such slowdown using vectorized instructions.
> 
> Agreed, I'm also not seeing this huge-page slowdown on our Zen 3 servers
> (this is with Ubuntu 22.04's glibc 2.32; I haven't got a hand-built glibc
> handy on that server):
> 
> $ ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
> Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
> Using function memcpy
> 90.065 GB/s
> 89.9096 GB/s
> 89.9131 GB/s
> 89.8207 GB/s
> 89.952 GB/s
> 
> $ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
> Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
> Using function memcpy
> 116.997 GB/s
> 116.874 GB/s
> 116.937 GB/s
> 117.029 GB/s
> 117.007 GB/s
> 
> On the other hand, there seem to be other cases where REP MOVSB is faster on
> Zen 3:
> 
> $ ./memcpy_loop -D 512 -f memcpy_rep_movsb -r 5 -t mmap 0
> Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
> Using function memcpy_rep_movsb
> 22.045 GB/s
> 22.3135 GB/s
> 22.1144 GB/s
> 22.8571 GB/s
> 22.2688 GB/s
> 
> $ ./memcpy_loop -D 512 -f memcpy -r 5 -t mmap 0
> Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
> Using function memcpy
> 7.66155 GB/s
> 7.71314 GB/s
> 7.72952 GB/s
> 7.72505 GB/s
> 7.74309 GB/s
> 
> But overall it does seem like the vectorised copy performs better than REP
> MOVSB on Zen 3.

The main issue seems to be defining when ERMS is better than the vectorized
path based on the arguments. Current glibc only takes the input size into
consideration, whereas from the discussion it seems we also need to consider
the alignment of the arguments (both source and destination).
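
As a rough illustration of what a selection based on both size and alignment
could look like, here is a minimal sketch. It is not glibc code: the enum, the
function name choose_copy_strategy, the 64-byte alignment check, and all
thresholds are hypothetical placeholders, and the ordering of the choices is
an assumption rather than a measured policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strategy names, for illustration only (not glibc's).  */
enum copy_strategy { COPY_VECTOR, COPY_ERMS, COPY_NON_TEMPORAL };

/* Hypothetical thresholds; real values would need per-CPU measurement.  */
#define ERMS_MIN_SIZE         ((size_t) 2048)
#define NON_TEMPORAL_MIN_SIZE ((size_t) 1 << 24)
/* Assumed alignment requirement: both pointers on a 64-byte boundary.  */
#define ALIGN_MASK            ((uintptr_t) 63)

/* Pick a strategy from the size AND the alignment of both pointers,
   instead of from the size alone.  The pointers are never dereferenced.  */
enum copy_strategy
choose_copy_strategy (const void *dst, const void *src, size_t n)
{
  /* Small copies always take the vectorized path.  */
  if (n < ERMS_MIN_SIZE)
    return COPY_VECTOR;

  /* If either pointer is misaligned, stay on the vectorized path.  */
  if ((((uintptr_t) dst | (uintptr_t) src) & ALIGN_MASK) != 0)
    return COPY_VECTOR;

  /* Very large, aligned copies use non-temporal stores in this sketch.  */
  if (n >= NON_TEMPORAL_MIN_SIZE)
    return COPY_NON_TEMPORAL;

  return COPY_ERMS;
}
```

The point is only that the decision takes three inputs (dst, src, n) rather
than n alone; the actual conditions would have to come from benchmarks on each
microarchitecture.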

Also, it seems that on Zen3 ERMS is slightly better than non-temporal
instructions, which is another tuning heuristic where, again, only the size is
used to decide when to use them (currently x86_non_temporal_threshold).

In any case, I think that, at least for the sizes where ERMS is currently
used, it would be better to use the vectorized path. Most likely some further
tuning to switch to ERMS at large sizes would be profitable for Zen cores.
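
For reference, the ERMS path being compared here boils down to a single
REP MOVSB instruction. The following is a minimal sketch of such a copy
routine, assuming GCC-style inline assembly on x86-64; the name copy_rep_movsb
is hypothetical (it is not the benchmark's memcpy_rep_movsb nor a glibc
symbol), and other targets simply fall back to memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst with REP MOVSB where available.
   Hypothetical helper for illustration; regions must not overlap.  */
void *
copy_rep_movsb (void *dst, const void *src, size_t n)
{
#if defined (__x86_64__) && defined (__GNUC__)
  void *ret = dst;
  /* RDI = destination, RSI = source, RCX = byte count.  The ABI
     guarantees the direction flag is clear, so the copy runs forward.  */
  __asm__ volatile ("rep movsb"
                    : "+D" (dst), "+S" (src), "+c" (n)
                    :
                    : "memory");
  return ret;
#else
  /* Fallback for targets without REP MOVSB.  */
  return memcpy (dst, src, n);
#endif
}
```

Timing a routine like this against plain memcpy with different sizes and
relative alignments is essentially what the numbers quoted above measure.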

Does AMD provide any tuning manual describing such characteristics for
instructions and memory operations?
