public inbox for glibc-bugs@sourceware.org
* [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4
@ 2023-10-24  6:18 bmerry at sarao dot ac.za
  2023-10-24  6:19 ` [Bug string/30994] " bmerry at sarao dot ac.za
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-24  6:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

            Bug ID: 30994
           Summary: REP MOVSB performance suffers from page aliasing on
                    Zen 4
           Product: glibc
           Version: 2.38
            Status: UNCONFIRMED
          Severity: minor
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: bmerry at sarao dot ac.za
  Target Milestone: ---

When (dst-src)&0xFFF is small (but non-zero), the REP MOVSB path in memcpy
performs extremely poorly (as much as 25x slower than the alternative path).
I'm observing this on Zen 4 (Epyc 9374F). I'm running Ubuntu 22.04 with a glibc
hand-built from glibc-2.38.9000-185-g2aa0974d25.

To reproduce:
1. Download the microbench at
https://github.com/ska-sa/katgpucbf/blob/6176ed2e1f5eccf7f2acc97e4779141ac794cc01/scratch/memcpy_loop.cpp
2. Compile it with the adjacent Makefile (tl;dr: g++ -std=c++17 -O3 -pthread -o
memcpy_loop memcpy_loop.cpp)
3. Run ./memcpy_loop -t mmap -f memcpy -b 8192 -p 100000 -D 1 -r 5
4. Run GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=10000 ./memcpy_loop -t
mmap -f memcpy -b 8192 -p 100000 -D 1 -r 5

Step 3 reports a rate of 4.2 GB/s, while step 4 (which disables the rep_movsb
path) reports a rate of 111 GB/s. The test uses 8192-byte memory copies, where
the source is page-aligned and the destination starts 1 byte into a page.
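
For readers who want to check their own buffers, the quantity described above can be computed as in the following minimal sketch. It is only an illustration: the helper names and the `limit` parameter are hypothetical, and the report does not say how small the difference must be before the slowdown appears.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Difference between the two addresses modulo the 4 KiB page size,
   i.e. the (dst-src)&0xFFF value from the report. */
static inline uintptr_t page_offset_delta(const void *dst, const void *src)
{
    return ((uintptr_t)dst - (uintptr_t)src) & 0xFFF;
}

/* HYPOTHETICAL predicate: true when the delta is non-zero but below `limit`.
   The cutoff is an assumption; the report only says "small (but non-zero)". */
static inline bool is_near_page_alias(const void *dst, const void *src,
                                      uintptr_t limit)
{
    uintptr_t delta = page_offset_delta(dst, src);
    return delta != 0 && delta < limit;
}
```

With a page-aligned source and a destination starting 1 byte into a page, as in the reproducer, `page_offset_delta` returns 1.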

I'll also attach the bench-memcpy-large.out, which shows similar results.

I've previously filed this as an Ubuntu bug
(https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/2030515) but it doesn't
seem to have received much attention.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
@ 2023-10-24  6:19 ` bmerry at sarao dot ac.za
  2023-10-24  6:20 ` bmerry at sarao dot ac.za
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-24  6:19 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #1 from Bruce Merry <bmerry at sarao dot ac.za> ---
Created attachment 15193
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15193&action=edit
Glibc's memcpy benchmark results


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
  2023-10-24  6:19 ` [Bug string/30994] " bmerry at sarao dot ac.za
@ 2023-10-24  6:20 ` bmerry at sarao dot ac.za
  2023-10-24  6:21 ` bmerry at sarao dot ac.za
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-24  6:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #2 from Bruce Merry <bmerry at sarao dot ac.za> ---
Created attachment 15194
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15194&action=edit
Output of ld-linux.so.2 --list-tunables


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
  2023-10-24  6:19 ` [Bug string/30994] " bmerry at sarao dot ac.za
  2023-10-24  6:20 ` bmerry at sarao dot ac.za
@ 2023-10-24  6:21 ` bmerry at sarao dot ac.za
  2023-10-24  6:21 ` bmerry at sarao dot ac.za
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-24  6:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #3 from Bruce Merry <bmerry at sarao dot ac.za> ---
Created attachment 15195
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15195&action=edit
Output of ld-linux.so.2 --list-diagnostics


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (2 preceding siblings ...)
  2023-10-24  6:21 ` bmerry at sarao dot ac.za
@ 2023-10-24  6:21 ` bmerry at sarao dot ac.za
  2023-10-24  6:32 ` bmerry at sarao dot ac.za
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-24  6:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Bruce Merry <bmerry at sarao dot ac.za> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
               Host|                            |x86_64-linux-gnu


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (3 preceding siblings ...)
  2023-10-24  6:21 ` bmerry at sarao dot ac.za
@ 2023-10-24  6:32 ` bmerry at sarao dot ac.za
  2023-10-24 17:57 ` sam at gentoo dot org
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-24  6:32 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #4 from Bruce Merry <bmerry at sarao dot ac.za> ---
This issue also affects Zen 3. Zen 2 doesn't advertise ERMS so memcpy isn't
affected.


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (4 preceding siblings ...)
  2023-10-24  6:32 ` bmerry at sarao dot ac.za
@ 2023-10-24 17:57 ` sam at gentoo dot org
  2023-10-25 12:40 ` fweimer at redhat dot com
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: sam at gentoo dot org @ 2023-10-24 17:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sam at gentoo dot org


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (5 preceding siblings ...)
  2023-10-24 17:57 ` sam at gentoo dot org
@ 2023-10-25 12:40 ` fweimer at redhat dot com
  2023-10-25 13:37 ` bmerry at sarao dot ac.za
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: fweimer at redhat dot com @ 2023-10-25 12:40 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://launchpad.net/bugs/
                   |                            |2030515
                 CC|                            |fweimer at redhat dot com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (6 preceding siblings ...)
  2023-10-25 12:40 ` fweimer at redhat dot com
@ 2023-10-25 13:37 ` bmerry at sarao dot ac.za
  2023-10-27 12:39 ` adhemerval.zanella at linaro dot org
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-25 13:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #5 from Bruce Merry <bmerry at sarao dot ac.za> ---
FWIW, backwards REP MOVSB (std; rep movsb; cld) is still horribly slow on Zen 4
(4 GB/s even when the data is nicely aligned and cached).
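
A backwards copy of the kind mentioned here might look like the sketch below. It assumes GCC or Clang inline assembly on x86-64; the function name is hypothetical and this is not the code used in the microbenchmark. Other targets fall back to a plain byte loop.

```c
#include <stddef.h>

/* Copy n bytes from src to dst, highest address first.
   On x86-64 this sets the direction flag, runs REP MOVSB, then clears
   the flag again, as the ABI expects DF to be clear on return. */
static void copy_backward(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return;
#if defined(__x86_64__) && defined(__GNUC__)
    /* With DF set, MOVSB decrements RSI/RDI, so start at the last byte. */
    unsigned char *d = (unsigned char *)dst + n - 1;
    const unsigned char *s = (const unsigned char *)src + n - 1;
    __asm__ volatile("std\n\t"
                     "rep movsb\n\t"
                     "cld"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory", "cc");
#else
    /* Portable fallback: same copy order, no string instruction. */
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    while (n > 0) {
        n--;
        d[n] = s[n];
    }
#endif
}
```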


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (7 preceding siblings ...)
  2023-10-25 13:37 ` bmerry at sarao dot ac.za
@ 2023-10-27 12:39 ` adhemerval.zanella at linaro dot org
  2023-10-27 13:04 ` bmerry at sarao dot ac.za
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-10-27 12:39 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot o
                   |                            |rg

--- Comment #6 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
I have access to a Zen3 core (5900X) and I can confirm that using REP MOVSB
seems to be always worse than vector instructions.  ERMS is used for sizes
between 2112 (rep_movsb_threshold) and 524288 (rep_movsb_stop_threshold or the
L2 size for Zen3) and the '-S 0 -D 1' performance really seems to be a
microcode issue, since I don't see a similar performance difference with other
alignments.

On Zen3 with REP MOVSB I see:

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0
84.2448 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 `seq -s' ' 0 2 23`
506.099 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 `seq -s' ' 0 23`
990.845 GB/s


$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0
57.1122 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 `seq -s' ' 0 2 23`
325.409 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 `seq -s' ' 0 23`
510.87 GB/s


$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15
4.43104 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 `seq -s' ' 0 2 23`
22.4551 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 `seq -s' ' 0 23`
40.4088 GB/s


$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15
4.34671 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 `seq -s' ' 0 2 23`
22.0829 GB/s

$ ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 `seq -s' ' 0 23`


While with vectorized instructions I see:


$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0
124.183 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 `seq -s' ' 0 2 23`
773.696 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 `seq -s' ' 0 23`
1413.02 GB/s


$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0
58.3212 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 `seq -s' ' 0 2 23`
322.583 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 `seq -s' ' 0 23`
506.116 GB/s


$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15
121.872 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 `seq -s' ' 0 2 23`
717.717 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 `seq -s' ' 0 23`
1318.17 GB/s


$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15
58.5352 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 `seq -s' ' 0 2 23`
325.996 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./testrun.sh ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 `seq -s' ' 0 23`
498.552 GB/s

So it seems there is no gain in using REP MOVSB on Zen3/Zen4, especially at the
sizes where it was supposed to be better. glibc 2.34 added a fix from AMD
(6e02b3e9327b7dbb063958d2b124b64fcb4bbe3f), where the assumption is that ERMS
performs poorly on data above the L2 cache size, so REP MOVSB is limited to the
L2 cache size (from 2113 to 524287), but I think AMD engineers did not really
evaluate whether ERMS is indeed better than vectorized instructions.
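
The size window discussed in this comment can be sketched roughly as follows. This is a simplified illustration, not glibc's actual dispatch code: the function name is hypothetical, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption. The values 2112 and 524288 are the Zen3 figures quoted earlier in this comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL helper: true when a copy of n bytes would fall in the
   REP MOVSB window [threshold, stop_threshold).  Only the size is
   consulted, which matches the thread's description of the current
   heuristic; the exact boundary handling is an assumption. */
static bool size_selects_rep_movsb(size_t n, size_t threshold,
                                   size_t stop_threshold)
{
    return n >= threshold && n < stop_threshold;
}
```

Under these assumptions an 8192-byte copy falls inside the window with the default thresholds, and falls outside it once x86_rep_movsb_threshold is raised to 10000 or 1000000, which is how the tunable is used in this thread to avoid REP MOVSB.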

And I think BZ#30995 is the same issue, since __memcpy_avx512_unaligned_erms
uses the same tunable to decide whether to use ERMS. I have created a patch
that just disables ERMS usage on AMD cores [1]; can you check if it improves
performance on Zen4 as well?

Also, I have noticed that memset is also showing subpar performance with ERMS,
and I have also disabled it on my branch.

[1]
https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/bz30944-memcpy-zen


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (8 preceding siblings ...)
  2023-10-27 12:39 ` adhemerval.zanella at linaro dot org
@ 2023-10-27 13:04 ` bmerry at sarao dot ac.za
  2023-10-27 13:16 ` bmerry at sarao dot ac.za
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-27 13:04 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #7 from Bruce Merry <bmerry at sarao dot ac.za> ---
Here's what I get on the Zen 4 system with the same parameters. I haven't had a
chance to look at what it all means:

+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 -r5
80.6649 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 2 4 6 8 10 12 14 16
18 20 22 -r5
954.928 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
1883.1 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 -r5
48.7753 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 2 4 6 8 10 12 14
16 18 20 22 -r5
570.385 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
676.928 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 -r5
3.54696 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 2 4 6 8 10 12 14 16
18 20 22 -r5
42.5706 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
85.0753 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 -r5
3.50689 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 2 4 6 8 10 12 14
16 18 20 22 -r5
41.5237 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
81.8951 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 -r5
102.05 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 2 4 6 8 10 12 14 16
18 20 22 -r5
1206.81 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
2415.47 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 -r5
49.4859 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 2 4 6 8 10 12 14
16 18 20 22 -r5
583.279 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
1066.54 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 -r5
97.1753 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 2 4 6 8 10 12 14 16
18 20 22 -r5
991.128 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
2257.42 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 -r5
49.3362 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 2 4 6 8 10 12 14
16 18 20 22 -r5
571.026 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
1075.03 GB/s


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (9 preceding siblings ...)
  2023-10-27 13:04 ` bmerry at sarao dot ac.za
@ 2023-10-27 13:16 ` bmerry at sarao dot ac.za
  2023-10-30  8:21 ` bmerry at sarao dot ac.za
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-27 13:16 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #8 from Bruce Merry <bmerry at sarao dot ac.za> ---
Ah looks like the GLIBC_TUNABLES environment variable didn't appear in the
output. Let me try again:

+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 -r5
80.6649 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 2 4 6 8 10 12 14 16
18 20 22 -r5
954.928 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
1883.1 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 -r5
48.7753 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 2 4 6 8 10 12 14
16 18 20 22 -r5
570.385 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
676.928 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 -r5
3.54696 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 2 4 6 8 10 12 14 16
18 20 22 -r5
42.5706 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
85.0753 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 -r5
3.50689 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 2 4 6 8 10 12 14
16 18 20 22 -r5
41.5237 GB/s
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
81.8951 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 -r5
102.05 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 2 4 6 8 10 12 14 16
18 20 22 -r5
1206.81 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
2415.47 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 -r5
49.4859 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 2 4 6 8 10 12 14
16 18 20 22 -r5
583.279 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 0 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
1066.54 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 -r5
97.1753 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 2 4 6 8 10 12 14 16
18 20 22 -r5
991.128 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 2113 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
2257.42 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 -r5
49.3362 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 2 4 6 8 10 12 14
16 18 20 22 -r5
571.026 GB/s
+ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000
+ ./memcpy_loop -t mmap -f memcpy -b 524287 -p 100000 -D 15 0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 -r5
1075.03 GB/s


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (10 preceding siblings ...)
  2023-10-27 13:16 ` bmerry at sarao dot ac.za
@ 2023-10-30  8:21 ` bmerry at sarao dot ac.za
  2023-10-30 13:30 ` adhemerval.zanella at linaro dot org
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-30  8:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #9 from Bruce Merry <bmerry at sarao dot ac.za> ---
So in those cases, REP MOVSB seems to be a slow-down, but there do also seem to
be cases where REP MOVSB is much faster (this is on Zen 4) e.g.

$ ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
Using function memcpy
94.5295 GB/s
94.3382 GB/s
94.474 GB/s
94.2385 GB/s
94.5105 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./memcpy_loop -D 512
-b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
Using function memcpy
56.5062 GB/s
55.3669 GB/s
56.4723 GB/s
55.857 GB/s
56.5396 GB/s

When not using huge pages, the vectorised memcpy hits 115.5 GB/s. I'm seeing a
lot of cases on Zen 4 where huge pages actually make things worse; maybe it's
related to hardware prefetch reading past the end of the buffer?


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (11 preceding siblings ...)
  2023-10-30  8:21 ` bmerry at sarao dot ac.za
@ 2023-10-30 13:30 ` adhemerval.zanella at linaro dot org
  2023-10-30 14:21 ` bmerry at sarao dot ac.za
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-10-30 13:30 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #10 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
On Zen3 I am not seeing such slowdown using vectorized instructions.  With a
glibc patched to disable REP MOVSB I see:

$ ./testrun.sh ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000
146.593 GB/s

# Force REP MOVSB
$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_stop_threshold=4097 ./testrun.sh
./memcpy_loop  -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000
116.298 GB/s

And I don't see a difference between mmap and mmap_huge.


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (12 preceding siblings ...)
  2023-10-30 13:30 ` adhemerval.zanella at linaro dot org
@ 2023-10-30 14:21 ` bmerry at sarao dot ac.za
  2023-10-30 16:27 ` adhemerval.zanella at linaro dot org
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: bmerry at sarao dot ac.za @ 2023-10-30 14:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #11 from Bruce Merry <bmerry at sarao dot ac.za> ---
> On Zen3 I am not seeing such slowdown using vectorized instructions.

Agreed, I'm also not seeing this huge-page slowdown on our Zen 3 servers (this
is with Ubuntu 22.04's glibc 2.35; I haven't got a hand-built glibc handy on
that server):

$ ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
Using function memcpy
90.065 GB/s
89.9096 GB/s
89.9131 GB/s
89.8207 GB/s
89.952 GB/s

$ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./memcpy_loop -D 512
-b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
Using function memcpy
116.997 GB/s
116.874 GB/s
116.937 GB/s
117.029 GB/s
117.007 GB/s

On the other hand, there seem to be other cases where REP MOVSB is faster on
Zen 3:

$ ./memcpy_loop -D 512 -f memcpy_rep_movsb -r 5 -t mmap 0
Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
Using function memcpy_rep_movsb
22.045 GB/s
22.3135 GB/s
22.1144 GB/s
22.8571 GB/s
22.2688 GB/s

$ ./memcpy_loop -D 512 -f memcpy -r 5 -t mmap 0
Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
Using function memcpy
7.66155 GB/s
7.71314 GB/s
7.72952 GB/s
7.72505 GB/s
7.74309 GB/s

But overall it does seem like the vectorised copy performs better than REP
MOVSB on Zen 3.


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (13 preceding siblings ...)
  2023-10-30 14:21 ` bmerry at sarao dot ac.za
@ 2023-10-30 16:27 ` adhemerval.zanella at linaro dot org
  2023-11-07 15:44 ` jamborm at gcc dot gnu.org
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-10-30 16:27 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #12 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Bruce Merry from comment #11)
> > On Zen3 I am not seeing such slowdown using vectorized instructions.
> 
> Agreed, I'm also not seeing this huge-page slowdown on our Zen 3 servers
> (this is with Ubuntu 22.04's glibc 2.35; I haven't got a hand-built glibc
> handy on that server):
> 
> $ ./memcpy_loop -D 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
> Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
> Using function memcpy
> 90.065 GB/s
> 89.9096 GB/s
> 89.9131 GB/s
> 89.8207 GB/s
> 89.952 GB/s
> 
> $ GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=1000000 ./memcpy_loop -D
> 512 -b 4096 -t mmap_huge -f memcpy -p 10000000 -r 5 0
> Using 1 threads, each with 4096 bytes of mmap_huge memory (10000000 passes)
> Using function memcpy
> 116.997 GB/s
> 116.874 GB/s
> 116.937 GB/s
> 117.029 GB/s
> 117.007 GB/s
> 
> On the other hand, there seem to be other cases where REP MOVSB is faster on
> Zen 3:
> 
> $ ./memcpy_loop -D 512 -f memcpy_rep_movsb -r 5 -t mmap 0
> Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
> Using function memcpy_rep_movsb
> 22.045 GB/s
> 22.3135 GB/s
> 22.1144 GB/s
> 22.8571 GB/s
> 22.2688 GB/s
> 
> $ ./memcpy_loop -D 512 -f memcpy -r 5 -t mmap 0
> Using 1 threads, each with 134217728 bytes of mmap memory (10 passes)
> Using function memcpy
> 7.66155 GB/s
> 7.71314 GB/s
> 7.72952 GB/s
> 7.72505 GB/s
> 7.74309 GB/s
> 
> But overall it does seem like the vectorised copy performs better than REP
> MOVSB on Zen 3.

The main issue seems to be defining when ERMS is better than the vectorized
path based on the arguments. Current glibc only takes the input size into
consideration, whereas from the discussion it seems we also need to take the
argument alignment into consideration (of both arguments).

Also, it seems that on Zen3 ERMS is slightly better than non-temporal
instructions, which is another tuning heuristic where, again, only the size is
used to decide when to use them (currently x86_non_temporal_threshold).

In any case, I think at least for sizes where ERMS is currently being used it
would be better to use the vectorized path. Most likely some more tunings to
switch to ERMS on large sizes would be profitable for Zen cores.

Does AMD provide any tuning manual describing such characteristics for
instruction and memory operations?


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (14 preceding siblings ...)
  2023-10-30 16:27 ` adhemerval.zanella at linaro dot org
@ 2023-11-07 15:44 ` jamborm at gcc dot gnu.org
  2023-11-29  3:08 ` lilydjwg at gmail dot com
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-11-07 15:44 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (15 preceding siblings ...)
  2023-11-07 15:44 ` jamborm at gcc dot gnu.org
@ 2023-11-29  3:08 ` lilydjwg at gmail dot com
  2023-11-29 13:01 ` holger@applied-asynchrony.com
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: lilydjwg at gmail dot com @ 2023-11-29  3:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

lilydjwg at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lilydjwg at gmail dot com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (16 preceding siblings ...)
  2023-11-29  3:08 ` lilydjwg at gmail dot com
@ 2023-11-29 13:01 ` holger@applied-asynchrony.com
  2023-11-29 15:57 ` jrmuizel at gmail dot com
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: holger@applied-asynchrony.com @ 2023-11-29 13:01 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Holger Hoffstätte <holger@applied-asynchrony.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |holger@applied-asynchrony.com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (17 preceding siblings ...)
  2023-11-29 13:01 ` holger@applied-asynchrony.com
@ 2023-11-29 15:57 ` jrmuizel at gmail dot com
  2023-11-29 17:25 ` gabravier at gmail dot com
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jrmuizel at gmail dot com @ 2023-11-29 15:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Jeff Muizelaar <jrmuizel at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jrmuizel at gmail dot com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (18 preceding siblings ...)
  2023-11-29 15:57 ` jrmuizel at gmail dot com
@ 2023-11-29 17:25 ` gabravier at gmail dot com
  2023-11-29 17:30 ` sam at gentoo dot org
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: gabravier at gmail dot com @ 2023-11-29 17:25 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Gabriel Ravier <gabravier at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gabravier at gmail dot com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (19 preceding siblings ...)
  2023-11-29 17:25 ` gabravier at gmail dot com
@ 2023-11-29 17:30 ` sam at gentoo dot org
  2023-11-29 19:58 ` matti.niemenmaa+sourcesbugs at iki dot fi
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: sam at gentoo dot org @ 2023-11-29 17:30 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceware.org/bugzilla/show_bug.cgi?id=30995


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (20 preceding siblings ...)
  2023-11-29 17:30 ` sam at gentoo dot org
@ 2023-11-29 19:58 ` matti.niemenmaa+sourcesbugs at iki dot fi
  2023-11-29 21:08 ` pageexec at gmail dot com
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: matti.niemenmaa+sourcesbugs at iki dot fi @ 2023-11-29 19:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Matti Niemenmaa <matti.niemenmaa+sourcesbugs at iki dot fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matti.niemenmaa+sourcesbugs@iki.fi


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (21 preceding siblings ...)
  2023-11-29 19:58 ` matti.niemenmaa+sourcesbugs at iki dot fi
@ 2023-11-29 21:08 ` pageexec at gmail dot com
  2023-11-30  3:13 ` dushistov at mail dot ru
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pageexec at gmail dot com @ 2023-11-29 21:08 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

PaX Team <pageexec at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pageexec at gmail dot com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (22 preceding siblings ...)
  2023-11-29 21:08 ` pageexec at gmail dot com
@ 2023-11-30  3:13 ` dushistov at mail dot ru
  2023-12-08  8:32 ` mati865 at gmail dot com
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: dushistov at mail dot ru @ 2023-11-30  3:13 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Evgeniy Dushistov <dushistov at mail dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dushistov at mail dot ru


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (23 preceding siblings ...)
  2023-11-30  3:13 ` dushistov at mail dot ru
@ 2023-12-08  8:32 ` mati865 at gmail dot com
  2024-02-13 16:54 ` cvs-commit at gcc dot gnu.org
  2024-04-04 10:36 ` cvs-commit at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: mati865 at gmail dot com @ 2023-12-08  8:32 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

Mateusz Mikuła <mati865 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mati865 at gmail dot com


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (24 preceding siblings ...)
  2023-12-08  8:32 ` mati865 at gmail dot com
@ 2024-02-13 16:54 ` cvs-commit at gcc dot gnu.org
  2024-04-04 10:36 ` cvs-commit at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-02-13 16:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #13 from Sourceware Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0c0d39fe4aeb0f69b26e76337c5dfd5530d5d44e

commit 0c0d39fe4aeb0f69b26e76337c5dfd5530d5d44e
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:38 2024 -0300

    x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)

    The REP MOVSB usage on memcpy/memmove does not show much performance
    improvement on Zen3/Zen4 cores compared to the vectorized loops.  Also,
    as from BZ 30994, if the source is aligned and the destination is not
    the performance can be 20x slower.

    The performance difference is noticeable with small buffer sizes, closer
    to the lower bounds limits when memcpy/memmove starts to use ERMS.  The
    performance of REP MOVSB is similar to vectorized instruction on the
    size limit (the L2 cache).  Also, there is no drawback to multiple cores
    sharing the cache.

    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>


* [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
  2023-10-24  6:18 [Bug string/30994] New: REP MOVSB performance suffers from page aliasing on Zen 4 bmerry at sarao dot ac.za
                   ` (25 preceding siblings ...)
  2024-02-13 16:54 ` cvs-commit at gcc dot gnu.org
@ 2024-04-04 10:36 ` cvs-commit at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-04-04 10:36 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #14 from Sourceware Commits <cvs-commit at gcc dot gnu.org> ---
The release/2.39/master branch has been updated by Arjun Shankar
<arjun@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=aa4249266e9906c4bc833e4847f4d8feef59504f

commit aa4249266e9906c4bc833e4847f4d8feef59504f
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Thu Feb 8 10:08:38 2024 -0300

    x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)

    The REP MOVSB usage on memcpy/memmove does not show much performance
    improvement on Zen3/Zen4 cores compared to the vectorized loops.  Also,
    as from BZ 30994, if the source is aligned and the destination is not
    the performance can be 20x slower.

    The performance difference is noticeable with small buffer sizes, closer
    to the lower bounds limits when memcpy/memmove starts to use ERMS.  The
    performance of REP MOVSB is similar to vectorized instruction on the
    size limit (the L2 cache).  Also, there is no drawback to multiple cores
    sharing the cache.

    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

    (cherry picked from commit 0c0d39fe4aeb0f69b26e76337c5dfd5530d5d44e)

