public inbox for glibc-bugs@sourceware.org
* [Bug libc/27437] New: [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
@ 2021-02-19  2:56 xuchunmei at linux dot alibaba.com
From: xuchunmei at linux dot alibaba.com @ 2021-02-19  2:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

            Bug ID: 27437
           Summary: [aarch64]memcpy_simd has performance regression with
                    larger size on Neoverse N1
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: xuchunmei at linux dot alibaba.com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

My test platform is a Neoverse N1 machine with 8 vCPUs and 32 GB of memory.
One test environment runs glibc 2.28 and the other runs glibc 2.32.
I used the perf-bench-mem memcpy test case; the test command is:
perf bench mem memcpy -l 100000 -s 1MB -f default
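
The sizes reported below can be covered by repeating this command with a different -s value each time. A minimal sketch in Python, assuming the other options stay fixed as in the command above; SIZES and build_command are illustrative names, not part of perf, and the script only prints the commands rather than running them:

```python
import shlex

# Sizes taken from the results reported in this bug.
SIZES = ["1KB", "2KB", "4KB", "8KB", "16KB", "32KB", "64KB",
         "128KB", "256KB", "512KB", "1MB", "2MB"]

def build_command(size, loops=100000, function="default"):
    """Return the perf argv for one memcpy size (hypothetical helper)."""
    return ["perf", "bench", "mem", "memcpy",
            "-l", str(loops), "-s", size, "-f", function]

# Print one shell-quoted command line per size.
for size in SIZES:
    print(shlex.join(build_command(size)))
```

Running each printed line on both glibc versions should give one throughput figure per size and version.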

The following table compares glibc 2.28 and glibc 2.32. The first column is
the memcpy length under test, the next two columns are the copy throughput
reported by perf-bench-mem, and the last column compares glibc 2.32 against
glibc 2.28.

length  glibc2.28       glibc2.32       2.32 vs 2.28
1KB     40.974632       41.072926       1
2KB     42.864724       42.769414       -1%
4KB     43.652475       43.713758       1
8KB     44.136496       44.119306       1
16KB    44.216839       44.275858       1
32KB    43.860959       44.387913       1%
64KB    42.098147       44.104689       4%
128KB   41.403627       39.714452       -4%
256KB   43.682267       40.190337       -8%
512KB   44.157858       37.020873       -16%
1MB     44.398972       16.413157       -63%
2MB     44.401274       13.739617       -69%

When the test size is 128KB or larger, glibc 2.32 copies more slowly.
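
The size of the slowdown can be recomputed from the two throughput columns. A small sketch in Python, assuming the change is defined as (new - old) / old; the relative_change helper is an illustrative name, and only the four largest sizes from the table are included:

```python
# (length, glibc 2.28 throughput, glibc 2.32 throughput), copied from the table.
RESULTS = [
    ("256KB", 43.682267, 40.190337),
    ("512KB", 44.157858, 37.020873),
    ("1MB",   44.398972, 16.413157),
    ("2MB",   44.401274, 13.739617),
]

def relative_change(old, new):
    """Percentage change of new relative to old (assumed definition)."""
    return (new - old) / old * 100.0

for length, old, new in RESULTS:
    print(f"{length:>6}  {relative_change(old, new):+.1f}%")
```

A negative value means glibc 2.32 is slower than glibc 2.28 at that size.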

I used perf record to find the hot function:
glibc2.32:
+   99.93%  mem-memcpy  libc-2.32.so         [.] __GI___memcpy_simd
     0.01%  perf        ld-2.32.so           [.] do_lookup_x
     0.01%  mem-memcpy  [kernel.kallsyms]    [k] zap_pte_range
     0.00%  perf        ld-2.32.so           [.] strcmp
     0.00%  perf        ld-2.32.so           [.] _dl_relocate_object

glibc2.28:
+   99.48%  mem-memcpy  libc-2.28.so         [.] __memcpy_generic
     0.18%  perf        ld-2.28.so           [.] do_lookup_x
     0.09%  perf        ld-2.28.so           [.] _dl_relocate_object
     0.04%  perf        ld-2.28.so           [.] _dl_lookup_symbol_x

Here is the instruction-level detail for glibc 2.32:
       │ d8:   ldr   q3, [x1]
  0.02 │       and   x14, x1, #0xf
       │       and   x1, x1, #0xfffffffffffffff0
       │       sub   x3, x0, x14
       │       add   x2, x2, x14
       │       ldp   q0, q1, [x1, #16]
  0.00 │       str   q3, [x0]
       │       ldp   q2, q3, [x1, #48]
  0.02 │       subs  x2, x2, #0x90
       │     ↓ b.ls  120
  0.16 │100:   stp   q0, q1, [x3, #16]
  5.40 │       ldp   q0, q1, [x1, #80]
 10.92 │       stp   q2, q3, [x3, #48]
  4.93 │       ldp   q2, q3, [x1, #112]
 77.29 │       add   x1, x1, #0x40
  0.44 │       add   x3, x3, #0x40
  0.01 │       subs  x2, x2, #0x40
  0.81 │     ↑ b.hi  100
       │120:   ldp   q4, q5, [x4, #-64]
       │       stp   q0, q1, [x3, #16]
       │       ldp   q0, q1, [x4, #-32]
       │       stp   q2, q3, [x3, #48]
       │       stp   q4, q5, [x5, #-64]
       │       stp   q0, q1, [x5, #-32]
       │     ← ret
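
To make the structure of the listing above easier to follow, here is a rough Python model of it. It is only a sketch: the function name memcpy_simd_long_model is made up, memory is modelled as one bytearray with integer addresses, and it assumes that x4 and x5 hold src + count and dst + count, which are set before the excerpt shown. It also assumes count > 128 and non-overlapping buffers. Each 32-byte ldp/stp pair is modelled as one 32-byte load or store.

```python
def memcpy_simd_long_model(mem, dst, src, count):
    """Model of the listed copy path; mem is a bytearray, dst/src are offsets."""
    assert count > 128

    def load(addr, n):
        return bytes(mem[addr:addr + n])

    def store(addr, data):
        mem[addr:addr + len(data)] = data

    srcend = src + count          # assumed value of x4
    dstend = dst + count          # assumed value of x5

    q3 = load(src, 16)            # ldr q3, [x1]
    x14 = src & 15                # misalignment of src
    x1 = src & ~15                # src rounded down to 16 bytes
    x3 = dst - x14                # dst shifted by the same amount
    x2 = count + x14
    q01 = load(x1 + 16, 32)       # ldp q0, q1, [x1, #16]
    store(dst, q3)                # str q3, [x0]
    q23 = load(x1 + 48, 32)       # ldp q2, q3, [x1, #48]
    x2 -= 0x90                    # subs x2, x2, #0x90 ; b.ls 120

    while x2 > 0:                 # label 100: 64 bytes per iteration
        store(x3 + 16, q01)
        q01 = load(x1 + 80, 32)
        store(x3 + 48, q23)
        q23 = load(x1 + 112, 32)
        x1 += 0x40
        x3 += 0x40
        x2 -= 0x40                # subs x2, x2, #0x40 ; b.hi 100

    # label 120: flush pending data, then copy the last 64 bytes from the end
    q45 = load(srcend - 64, 32)
    store(x3 + 16, q01)
    q01 = load(srcend - 32, 32)
    store(x3 + 48, q23)
    store(dstend - 64, q45)
    store(dstend - 32, q01)
```

In this model the loop body at label 100 is two 32-byte stores and two 32-byte loads per 64 bytes copied, and the perf samples in the listing fall almost entirely inside that loop.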

-- 
You are receiving this mail because:
You are on the CC list for the bug.

