public inbox for glibc-bugs@sourceware.org
* [Bug string/30995] New: Zen 4: sub-optimal memcpy on very large copies
@ 2023-10-24  7:38 bmerry at sarao dot ac.za
From: bmerry at sarao dot ac.za @ 2023-10-24  7:38 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30995

            Bug ID: 30995
           Summary: Zen 4: sub-optimal memcpy on very large copies
           Product: glibc
           Version: 2.38
            Status: UNCONFIRMED
          Severity: minor
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: bmerry at sarao dot ac.za
  Target Milestone: ---

At sizes significantly larger than 32 MB, glibc's memcpy seems to perform
worse on Zen 4 than either REP MOVSB or a more naive AVX-512 streaming copy.

Steps to reproduce:
1. Compile the microbench at
https://github.com/ska-sa/katgpucbf/blob/6176ed2e1f5eccf7f2acc97e4779141ac794cc01/scratch/memcpy_loop.cpp
using the adjacent Makefile (or g++ -std=c++17 -Wall -O3 -pthread -o
memcpy_loop memcpy_loop.cpp)
2. Run it as ./memcpy_loop -f memcpy -r 5
3. Run it again as ./memcpy_loop -f memcpy_rep_movsb -r 5
4. Run it again as ./memcpy_loop -f memcpy_stream_avx512 -r 5

On the system I'm testing, the first (glibc memcpy) reports 19.2 GB/s, while the
second (which directly invokes REP MOVSB) reports 27-27.5 GB/s and the third (a
straightforward non-temporal AVX-512 implementation) reports 27.8 GB/s. This
is for a 128 MiB copy (other sizes can be passed to the benchmark with -b).

Interestingly, I don't see this regression on a similarly-configured Zen 3
system, where memcpy and memcpy_rep_movsb seem to have roughly the same
performance on large copies. This is in spite of the comment at
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/dl-cacheinfo.h;h=87486054f931e52f53123c672217f1903297ec76;hb=HEAD#l1031
claiming that Zen 3's REP MOVSB performs poorly on large copies.
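
For reference, "directly invoking REP MOVSB" can be sketched as below. This is
an illustration rather than the benchmark's own source: copy_rep_movsb is a
hypothetical name, GCC/Clang inline assembly on x86-64 is assumed, and other
targets fall back to std::memcpy.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy n bytes with a single REP MOVSB instruction.
static void *copy_rep_movsb(void *dest, const void *src, std::size_t n) {
#if defined(__x86_64__) && defined(__GNUC__)
    void *ret = dest;
    // REP MOVSB copies RCX bytes from [RSI] to [RDI] and updates all three
    // registers, so they are declared as read-write operands. The ABI
    // guarantees the direction flag is clear, so the copy runs forwards.
    asm volatile("rep movsb"
                 : "+D"(dest), "+S"(src), "+c"(n)
                 :
                 : "memory");
    return ret;
#else
    return std::memcpy(dest, src, n);  // portable fallback
#endif
}
```

How the instruction is executed is left to the CPU's microcode, which is one
reason its throughput can differ between generations such as Zen 3 and Zen 4.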

System information: EPYC 9374F processor, Ubuntu 22.04, glibc compiled from git
(glibc-2.38.9000-185-g2aa0974d25)




Thread overview: 14+ messages
2023-10-24  7:38 [Bug string/30995] New: Zen 4: sub-optimal memcpy on very large copies bmerry at sarao dot ac.za
2023-10-24 17:56 ` [Bug string/30995] " sam at gentoo dot org
2023-10-25 10:16 ` bmerry at sarao dot ac.za
2023-10-25 12:50 ` fweimer at redhat dot com
2023-10-25 13:21 ` bmerry at sarao dot ac.za
2023-10-30 12:30 ` adhemerval.zanella at linaro dot org
2023-10-30 12:34 ` adhemerval.zanella at linaro dot org
2023-10-30 14:00 ` bmerry at sarao dot ac.za
2023-10-30 14:24 ` bmerry at sarao dot ac.za
2023-10-30 16:17 ` adhemerval.zanella at linaro dot org
2023-11-07 13:01 ` jamborm at gcc dot gnu.org
2023-11-29 17:27 ` gabravier at gmail dot com
2023-11-29 17:30 ` sam at gentoo dot org
2023-11-29 21:08 ` pageexec at gmail dot com
