public inbox for glibc-bugs@sourceware.org
* [Bug libc/27437] New: [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19  2:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

            Bug ID: 27437
           Summary: [aarch64]memcpy_simd has performance regression with
                    larger size on Neoverse N1
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: xuchunmei at linux dot alibaba.com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

My test platform is a Neoverse N1 with 8 vCPUs and 32 GB of memory.
One test environment uses glibc 2.28 and the other uses glibc 2.32.
I used the perf bench mem memcpy benchmark; the test command is:
perf bench mem memcpy -l 100000 -s 1MB -f default

The following compares the results for glibc 2.28 and glibc 2.32. The first
column is the memcpy length under test, and the data columns are the copy
throughput reported by perf bench mem.

length  glibc2.28       glibc2.32       change
1KB     40.974632       41.072926       1
2KB     42.864724       42.769414       -1%
4KB     43.652475       43.713758       1
8KB     44.136496       44.119306       1
16KB    44.216839       44.275858       1
32KB    43.860959       44.387913       1%
64KB    42.098147       44.104689       4%
128KB   41.403627       39.714452       -4%
256KB   43.682267       40.190337       -8%
512KB   44.157858       37.020873       -16%
1MB     44.398972       16.413157       -63%
2MB     44.401274       13.739617       -69%

When the test size is 128KB or larger, glibc 2.32 copies more slowly.
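As an illustrative check (not part of the original report), the change column can be recomputed from the two throughput columns. This sketch assumes the change is defined as (glibc2.32 - glibc2.28) / glibc2.28; the names `ROWS`, `relative_change` and `changes` are hypothetical.

```python
# Throughput figures copied from the table above: (length, glibc2.28, glibc2.32).
ROWS = [
    ("1KB", 40.974632, 41.072926),
    ("2KB", 42.864724, 42.769414),
    ("4KB", 43.652475, 43.713758),
    ("8KB", 44.136496, 44.119306),
    ("16KB", 44.216839, 44.275858),
    ("32KB", 43.860959, 44.387913),
    ("64KB", 42.098147, 44.104689),
    ("128KB", 41.403627, 39.714452),
    ("256KB", 43.682267, 40.190337),
    ("512KB", 44.157858, 37.020873),
    ("1MB", 44.398972, 16.413157),
    ("2MB", 44.401274, 13.739617),
]


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def changes(rows):
    """Map each length label to its relative throughput change in percent."""
    return {size: relative_change(old, new) for size, old, new in rows}


if __name__ == "__main__":
    for size, pct in changes(ROWS).items():
        print(f"{size:>6} {pct:+7.2f}%")
```

Under this definition the rows marked "1" in the table all come out within about 0.3% of zero, which suggests they mean "effectively unchanged".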

I used perf record to find the hot function:
glibc2.32:
+   99.93%  mem-memcpy  libc-2.32.so         [.] __GI___memcpy_simd
     0.01%  perf        ld-2.32.so           [.] do_lookup_x
     0.01%  mem-memcpy  [kernel.kallsyms]    [k] zap_pte_range
     0.00%  perf        ld-2.32.so           [.] strcmp
     0.00%  perf        ld-2.32.so           [.] _dl_relocate_object

glibc2.28:
+   99.48%  mem-memcpy  libc-2.28.so         [.] __memcpy_generic
     0.18%  perf        ld-2.28.so           [.] do_lookup_x
     0.09%  perf        ld-2.28.so           [.] _dl_relocate_object
     0.04%  perf        ld-2.28.so           [.] _dl_lookup_symbol_x

The annotated hot loop in glibc 2.32:
       │ d8:   ldr   q3, [x1]
  0.02 │       and   x14, x1, #0xf
       │       and   x1, x1, #0xfffffffffffffff0
       │       sub   x3, x0, x14
       │       add   x2, x2, x14
       │       ldp   q0, q1, [x1, #16]
  0.00 │       str   q3, [x0]
       │       ldp   q2, q3, [x1, #48]
  0.02 │       subs  x2, x2, #0x90
       │     ↓ b.ls  120
  0.16 │100:   stp   q0, q1, [x3, #16]
  5.40 │       ldp   q0, q1, [x1, #80]
 10.92 │       stp   q2, q3, [x3, #48]
  4.93 │       ldp   q2, q3, [x1, #112]
 77.29 │       add   x1, x1, #0x40
  0.44 │       add   x3, x3, #0x40
  0.01 │       subs  x2, x2, #0x40
  0.81 │     ↑ b.hi  100
       │120:   ldp   q4, q5, [x4, #-64]
       │       stp   q0, q1, [x3, #16]
       │       ldp   q0, q1, [x4, #-32]
       │       stp   q2, q3, [x3, #48]
       │       stp   q4, q5, [x5, #-64]
       │       stp   q0, q1, [x5, #-32]
       │     ← ret

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19  2:57 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

xuchunmei <xuchunmei at linux dot alibaba.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com,
                   |                            |xuchunmei at linux dot alibaba.com


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: carlos at redhat dot com @ 2021-02-19  3:16 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #1 from Carlos O'Donell <carlos at redhat dot com> ---
What results do you get from bench-memcpy-random (i.e. make bench) on your
system?


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19  3:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #2 from xuchunmei <xuchunmei at linux dot alibaba.com> ---
(In reply to Carlos O'Donell from comment #1)
> What results do you get from bench-memcpy-random i.e. make bench; on your
> system?

# ./bench-memcpy-random 
{
 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "random",
   "ifuncs": ["__memcpy_thunderx", "__memcpy_thunderx2", "__memcpy_falkor",
"__memcpy_simd", "__memcpy_generic"],
   "results": [
    {
     "max-size": 4096,
     "timings": [61793.7, 59328.8, 56071.7, 50435.7, 53163.3]
    },
    {
     "max-size": 8192,
     "timings": [62629.7, 58642.9, 55397.9, 49791.9, 52634.6]
    },
    {
     "max-size": 16384,
     "timings": [63192.2, 58967, 55733.3, 49763.7, 53064.4]
    },
    {
     "max-size": 32768,
     "timings": [63471.5, 59236.6, 56408.2, 51509.8, 54014.6]
    },
    {
     "max-size": 65536,
     "timings": [65745.4, 60589.8, 57791.2, 52921.2, 57637.8]
    },
    {
     "max-size": 131072,
     "timings": [68051.3, 62946.6, 60451.2, 56379.2, 60693]
    },
    {
     "max-size": 262144,
     "timings": [74675.4, 69991.7, 67861.6, 63699, 67316.2]
    },
    {
     "max-size": 524288,
     "timings": [94101.2, 91320.1, 89655.6, 84932.3, 87520.9]
    }]
  }
 }
}


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: wdijkstr at arm dot com @ 2021-02-19 11:27 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #3 from Wilco <wdijkstr at arm dot com> ---
So bench-memcpy-random shows __memcpy_simd is fastest by a good margin for
small cases. Do you see similar differences between __memcpy_generic and
__memcpy_simd in bench-memcpy-walk or bench-memcpy-large?
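A minimal sketch of how the "fastest by a good margin" reading can be checked against the bench-memcpy-random output in comment #2. It embeds only two of the eight result rows, and the names `SAMPLE` and `fastest_per_row` are hypothetical, not part of the glibc benchtests.

```python
import json

# Excerpt of the bench-memcpy-random output from comment #2 (two rows only).
SAMPLE = """
{
 "functions": {
  "memcpy": {
   "ifuncs": ["__memcpy_thunderx", "__memcpy_thunderx2", "__memcpy_falkor",
              "__memcpy_simd", "__memcpy_generic"],
   "results": [
    {"max-size": 4096,
     "timings": [61793.7, 59328.8, 56071.7, 50435.7, 53163.3]},
    {"max-size": 524288,
     "timings": [94101.2, 91320.1, 89655.6, 84932.3, 87520.9]}
   ]
  }
 }
}
"""


def fastest_per_row(doc, key="max-size"):
    """Map each row to (fastest ifunc, margin over the runner-up in percent)."""
    bench = doc["functions"]["memcpy"]
    names = bench["ifuncs"]
    out = {}
    for row in bench["results"]:
        # Lower timing is better; sort (timing, name) pairs ascending.
        ranked = sorted(zip(row["timings"], names))
        (best_t, best_name), (second_t, _) = ranked[0], ranked[1]
        out[row[key]] = (best_name, (second_t - best_t) / second_t * 100.0)
    return out


if __name__ == "__main__":
    for size, (name, margin) in fastest_per_row(json.loads(SAMPLE)).items():
        print(f"max-size {size}: {name} wins by {margin:.1f}%")
```

For these two rows __memcpy_simd is the fastest variant, with __memcpy_generic as the runner-up.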


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19 11:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #4 from xuchunmei <xuchunmei at linux dot alibaba.com> ---
(In reply to Wilco from comment #3)
> So bench-memcpy-random shows __memcpy_simd is fastest by a good margin for
> small cases. Do you see similar differences between __memcpy_generic and
> __memcpy_simd in bench-memcpy-walk or bench-memcpy-large?

# ./bench-memcpy-walk 
{
 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "walk",
   "ifuncs": ["__memcpy_thunderx", "__memcpy_thunderx2", "__memcpy_falkor",
"__memcpy_simd", "__memcpy_generic"],
   "results": [
    {
     "length": 128,
     "timings": [33.76, 34.9569, 9.54915, 9.57585, 9.47256]
    },
    {
     "length": 129,
     "timings": [35.2716, 31.849, 30.5517, 29.9549, 31.5533]
    },
    {
     "length": 256,
     "timings": [62.829, 60.8114, 57.7517, 56.2223, 57.1628]
    },
    {
     "length": 257,
     "timings": [49.6077, 47.399, 47.2415, 46.5737, 47.5711]
    },
    {
     "length": 512,
     "timings": [113.586, 113.427, 115.86, 116.922, 116.778]
    },
    {
     "length": 513,
     "timings": [106.793, 101.588, 94.7251, 88.1578, 88.2201]
    },
    {
     "length": 1024,
     "timings": [121.122, 122.055, 128.414, 122.723, 123.736]
    },
    {
     "length": 1025,
     "timings": [210.901, 195.7, 195.425, 181.864, 172.566]
    },
    {
     "length": 2048,
     "timings": [218.448, 224.068, 223.655, 230.854, 216.989]
    },
    {
     "length": 2049,
     "timings": [321.531, 329.909, 289.185, 285.851, 281.323]
    },
    {
     "length": 4096,
     "timings": [374.179, 401.979, 384.495, 392.112, 381.073]
    },
    {
     "length": 4097,
     "timings": [450.306, 510.441, 414.745, 401.545, 401.715]
    },
    {
     "length": 8192,
     "timings": [667.217, 673.253, 677.579, 694.132, 674.405]
    },
    {
     "length": 8193,
     "timings": [679.844, 768.811, 610.683, 591.152, 591.464]
    },
    {
     "length": 16384,
     "timings": [1236.02, 1206.93, 1261.11, 1287.68, 1255.33]
    },
    {
     "length": 16385,
     "timings": [1102.34, 1254.92, 1071.11, 1054.39, 1053.36]
    },
    {
     "length": 32768,
     "timings": [2275.35, 2294.88, 2424.76, 2472.89, 2404.96]
    },
    {
     "length": 32769,
     "timings": [2328.63, 2305.86, 2109.21, 2002.92, 2024.76]
    },
    {
     "length": 65536,
     "timings": [4437.26, 4435.93, 4803.28, 4880.86, 4743.9]
    },
    {
     "length": 65537,
     "timings": [4355.46, 4326.86, 4179.13, 4072.96, 4076.16]
    },
    {
     "length": 131072,
     "timings": [8670.91, 8735.29, 9394.05, 9515.15, 9383.43]
    },
    {
     "length": 131073,
     "timings": [8454.82, 9398.74, 8723.57, 8726.39, 8669.98]
    },
    {
     "length": 262144,
     "timings": [17410.9, 17450.9, 18792.5, 18928.1, 18682.2]
    },
    {
     "length": 262145,
     "timings": [16825.3, 16689.3, 17449.6, 17574.3, 17360.3]
    },
    {
     "length": 524288,
     "timings": [34310.5, 34433.7, 37218.7, 37446.8, 37060.6]
    },
    {
     "length": 524289,
     "timings": [33399.9, 33441.8, 34781.2, 34548.7, 34354.3]
    },
    {
     "length": 1048576,
     "timings": [68204.8, 68134.8, 74178.7, 74529.9, 73581.2]
    },
    {
     "length": 1048577,
     "timings": [64777.2, 65532.3, 69548.6, 69252.3, 68627.3]
    },
    {
     "length": 2097152,
     "timings": [134797, 135150, 146707, 147932, 146395]
    },
    {
     "length": 2097153,
     "timings": [131402, 130510, 141845, 142190, 141580]
    },
    {
     "length": 4194304,
     "timings": [268444, 269134, 292860, 295185, 293134]
    },
    {
     "length": 4194305,
     "timings": [265649, 265709, 287879, 289754, 288134]
    },
    {
     "length": 8388608,
     "timings": [534478, 538868, 585879, 589639, 584779]
    },
    {
     "length": 8388609,
     "timings": [533418, 535318, 581869, 587639, 580609]
    },
    {
     "length": 16777216,
     "timings": [1.07644e+06, 1.07876e+06, 1.17434e+06, 1.18022e+06,
1.17142e+06]
    },
    {
     "length": 16777217,
     "timings": [1.07702e+06, 1.07438e+06, 1.1722e+06, 1.18212e+06,
1.17316e+06]
    },
    {
     "length": 33554432,
     "timings": [2.14187e+06, 2.15503e+06, 2.35112e+06, 2.38628e+06,
2.34704e+06]
    },
    {
     "length": 33554433,
     "timings": [2.16367e+06, 2.15555e+06, 2.35476e+06, 2.36968e+06,
2.35604e+06]
    }]
  }
 }
}


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19 11:55 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #5 from xuchunmei <xuchunmei at linux dot alibaba.com> ---
# ./bench-memcpy-large 
{
 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "large",
   "ifuncs": ["__memcpy_thunderx", "__memcpy_thunderx2", "__memcpy_falkor",
"__memcpy_simd", "__memcpy_generic"],
   "results": [
    {
     "length": 65543,
     "align1": 0,
     "align2": 0,
     "timings": [12775.2, 1480, 1425.06, 1420, 1425]
    },
    {
     "length": 65551,
     "align1": 0,
     "align2": 3,
     "timings": [2550.06, 1752.5, 1527.56, 1487.5, 2330.06]
    },
    {
     "length": 65567,
     "align1": 3,
     "align2": 0,
     "timings": [2250, 1747.56, 1452.5, 1432.56, 2150]
    },
    {
     "length": 65599,
     "align1": 3,
     "align2": 5,
     "timings": [2367.56, 1740, 1532.56, 1482.5, 2347.56]
    },
    {
     "length": 131079,
     "align1": 0,
     "align2": 0,
     "timings": [4535.06, 3045.06, 2805.06, 2977.56, 2892.5]
    },
    {
     "length": 131087,
     "align1": 0,
     "align2": 3,
     "timings": [6550.06, 3522.5, 3127.56, 3110.06, 5510.06]
    },
    {
     "length": 131103,
     "align1": 3,
     "align2": 0,
     "timings": [6780.12, 3455, 2832.5, 3102.56, 5442.56]
    },
    {
     "length": 131135,
     "align1": 3,
     "align2": 5,
     "timings": [6570.06, 3485.06, 3132.56, 3857.56, 5500.12]
    },
    {
     "length": 262151,
     "align1": 0,
     "align2": 0,
     "timings": [9050.12, 6047.62, 5560.06, 5900.12, 6667.62]
    },
    {
     "length": 262159,
     "align1": 0,
     "align2": 3,
     "timings": [13042.7, 6915.12, 6185.12, 6100.06, 10887.7]
    },
    {
     "length": 262175,
     "align1": 3,
     "align2": 0,
     "timings": [13512.7, 6900.12, 7135.06, 6115.06, 10787.7]
    },
    {
     "length": 262207,
     "align1": 3,
     "align2": 5,
     "timings": [13045.2, 6882.56, 6152.62, 6100.06, 10880.2]
    },
    {
     "length": 524295,
     "align1": 0,
     "align2": 0,
     "timings": [19175.2, 12147.7, 11115.1, 11762.7, 12040.2]
    },
    {
     "length": 524303,
     "align1": 0,
     "align2": 3,
     "timings": [26097.9, 13782.7, 12357.7, 12100.2, 21637.9]
    },
    {
     "length": 524319,
     "align1": 3,
     "align2": 0,
     "timings": [27090.4, 13725.2, 11530.1, 12185.2, 22050.3]
    },
    {
     "length": 524351,
     "align1": 3,
     "align2": 5,
     "timings": [26545.4, 13832.8, 12300.2, 12102.7, 21635.3]
    },
    {
     "length": 1048583,
     "align1": 0,
     "align2": 0,
     "timings": [56493.4, 51520.8, 55768.4, 57578.4, 55663.4]
    },
    {
     "length": 1048591,
     "align1": 0,
     "align2": 3,
     "timings": [63180.9, 55333.4, 52825.8, 52888.4, 67321]
    },
    {
     "length": 1048607,
     "align1": 3,
     "align2": 0,
     "timings": [64918.6, 54993.3, 56908.4, 58403.4, 68601.1]
    },
    {
     "length": 1048639,
     "align1": 3,
     "align2": 5,
     "timings": [62773.5, 54600.8, 55968.4, 54903.4, 64518.5]
    },
    {
     "length": 2097159,
     "align1": 0,
     "align2": 0,
     "timings": [137770, 140780, 153962, 153740, 153040]
    },
    {
     "length": 2097167,
     "align1": 0,
     "align2": 3,
     "timings": [145682, 146480, 150777, 150967, 160282]
    },
    {
     "length": 2097183,
     "align1": 3,
     "align2": 0,
     "timings": [149220, 144587, 156512, 155420, 164575]
    },
    {
     "length": 2097215,
     "align1": 3,
     "align2": 5,
     "timings": [142237, 139987, 147570, 148227, 158155]
    },
    {
     "length": 4194311,
     "align1": 0,
     "align2": 0,
     "timings": [305932, 297210, 320072, 322650, 316517]
    },
    {
     "length": 4194319,
     "align1": 0,
     "align2": 3,
     "timings": [297642, 299982, 313495, 316667, 330998]
    },
    {
     "length": 4194335,
     "align1": 3,
     "align2": 0,
     "timings": [304000, 299292, 317437, 320967, 333400]
    },
    {
     "length": 4194367,
     "align1": 3,
     "align2": 5,
     "timings": [299717, 297707, 317915, 318242, 331165]
    },
    {
     "length": 8388615,
     "align1": 0,
     "align2": 0,
     "timings": [630660, 604037, 649978, 655200, 646123]
    },
    {
     "length": 8388623,
     "align1": 0,
     "align2": 3,
     "timings": [622982, 614902, 642418, 646813, 670370]
    },
    {
     "length": 8388639,
     "align1": 3,
     "align2": 0,
     "timings": [626285, 615827, 646858, 650653, 669793]
    },
    {
     "length": 8388671,
     "align1": 3,
     "align2": 5,
     "timings": [629460, 622300, 647858, 656640, 696578]
    },
    {
     "length": 16777223,
     "align1": 0,
     "align2": 0,
     "timings": [1.29937e+06, 1.25757e+06, 1.29938e+06, 1.29798e+06,
1.28434e+06]
    },
    {
     "length": 16777231,
     "align1": 0,
     "align2": 3,
     "timings": [1.24592e+06, 1.23578e+06, 1.3021e+06, 1.29313e+06,
1.33601e+06]
    },
    {
     "length": 16777247,
     "align1": 3,
     "align2": 0,
     "timings": [1.27559e+06, 1.24098e+06, 1.29053e+06, 1.29575e+06,
1.33914e+06]
    },
    {
     "length": 16777279,
     "align1": 3,
     "align2": 5,
     "timings": [1.26273e+06, 1.23506e+06, 1.28375e+06, 1.29168e+06,
1.3573e+06]
    },
    {
     "length": 33554439,
     "align1": 0,
     "align2": 0,
     "timings": [2.63771e+06, 2.50596e+06, 2.6239e+06, 2.6192e+06, 2.62283e+06]
    },
    {
     "length": 33554447,
     "align1": 0,
     "align2": 3,
     "timings": [2.59987e+06, 2.50243e+06, 2.60401e+06, 2.62009e+06,
2.68767e+06]
    },
    {
     "length": 33554463,
     "align1": 3,
     "align2": 0,
     "timings": [2.61623e+06, 2.49325e+06, 2.7087e+06, 2.78022e+06,
2.84058e+06]
    },
    {
     "length": 33554495,
     "align1": 3,
     "align2": 5,
     "timings": [2.70683e+06, 2.50238e+06, 2.59609e+06, 2.60011e+06,
2.67857e+06]
    }]
  }
 }
}


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19 11:59 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #6 from xuchunmei <xuchunmei at linux dot alibaba.com> ---
The bench-memcpy-walk results show that when the length is larger than 1024,
__memcpy_simd seems a little slower than __memcpy_generic.
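To put a number on "a little slower", the sketch below computes the percentage gap between __memcpy_simd and __memcpy_generic for a handful of rows copied from the bench-memcpy-walk output in comment #4. The row selection and the names `WALK` and `simd_vs_generic` are illustrative assumptions.

```python
# (length, __memcpy_simd timing, __memcpy_generic timing), copied from the
# bench-memcpy-walk output in comment #4. Only a subset of rows is included.
WALK = [
    (1024, 122.723, 123.736),
    (1025, 181.864, 172.566),
    (2048, 230.854, 216.989),
    (32769, 2002.92, 2024.76),
    (1048576, 74529.9, 73581.2),
    (33554432, 2.38628e+06, 2.34704e+06),
]


def simd_vs_generic(rows):
    """Return {length: percent by which simd is slower (+) or faster (-)}."""
    return {n: (simd - gen) / gen * 100.0 for n, simd, gen in rows}


if __name__ == "__main__":
    for length, pct in simd_vs_generic(WALK).items():
        print(f"{length:>9} {pct:+6.2f}%")
```

On this subset the gap stays below 7% and changes sign at some lengths (for example 32769), which is far from the 63% to 69% drop reported for perf bench mem.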


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: wdijkstr at arm dot com @ 2021-02-19 13:12 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #7 from Wilco <wdijkstr at arm dot com> ---
(In reply to xuchunmei from comment #6)
> The bench-memcpy-walk results show that when the length is larger than 1024,
> __memcpy_simd seems a little slower than __memcpy_generic.

Yes but the difference is small, and in bench-memcpy-large __memcpy_simd wins
by a huge margin on the unaligned cases.
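The unaligned advantage can be illustrated with a few rows from the bench-memcpy-large output in comment #5. This is a sketch under assumptions: a row counts as "aligned" only when both align1 and align2 are 0, only six sub-1MB rows are included, and the names `LARGE` and `generic_over_simd` are hypothetical.

```python
from statistics import mean

# (length, align1, align2, __memcpy_simd, __memcpy_generic), copied from the
# bench-memcpy-large output in comment #5. Only a subset of rows is included.
LARGE = [
    (65543, 0, 0, 1420.0, 1425.0),
    (65551, 0, 3, 1487.5, 2330.06),
    (262151, 0, 0, 5900.12, 6667.62),
    (262159, 0, 3, 6100.06, 10887.7),
    (524295, 0, 0, 11762.7, 12040.2),
    (524319, 3, 0, 12185.2, 22050.3),
]


def generic_over_simd(rows):
    """Mean generic/simd timing ratio, split into aligned and unaligned rows."""
    groups = {"aligned": [], "unaligned": []}
    for _length, a1, a2, simd, generic in rows:
        key = "aligned" if a1 == 0 and a2 == 0 else "unaligned"
        groups[key].append(generic / simd)
    return {k: mean(v) for k, v in groups.items()}


if __name__ == "__main__":
    for kind, ratio in generic_over_simd(LARGE).items():
        print(f"{kind:>9}: generic takes {ratio:.2f}x the time of simd")
```

For these rows __memcpy_generic takes roughly 1.7x as long as __memcpy_simd on unaligned copies, while the aligned rows are close to parity. The gap narrows for the 1MB and larger rows in the same output.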

Since none of these reproduce what you are seeing, would it be possible to
create a small testcase that demonstrates the issue you are seeing in
perf-bench-mem?
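One possible shape for such a testcase is sketched below. It is an assumption-laden illustration, not the reporter's or the maintainers' harness: it calls the C library's memcpy through ctypes on a POSIX system, the helper name `memcpy_throughput` is hypothetical, and Python call overhead dominates small sizes, so only the larger sizes are indicative. A C harness would be needed for precise numbers.

```python
import ctypes
import ctypes.util
import time

# Load the C library. find_library may return None, in which case CDLL(None)
# exposes the symbols already loaded into the Python process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
libc.memcpy.restype = ctypes.c_void_p


def memcpy_throughput(size: int, loops: int = 1000) -> float:
    """Copy `size` bytes `loops` times and return the throughput in GB/s."""
    src = ctypes.create_string_buffer(size)
    dst = ctypes.create_string_buffer(size)
    ctypes.memset(src, 0xA5, size)       # give the source a known pattern
    libc.memcpy(dst, src, size)          # warm-up; also faults in the pages
    start = time.perf_counter()
    for _ in range(loops):
        libc.memcpy(dst, src, size)
    elapsed = time.perf_counter() - start
    assert dst.raw == src.raw            # sanity check: the copy happened
    return size * loops / elapsed / 1e9


if __name__ == "__main__":
    for kb in (1, 16, 128, 512, 1024, 2048):
        print(f"{kb:>5} KB  {memcpy_throughput(kb * 1024):8.2f} GB/s")
```

Running the same script in both environments would show whether the drop at 1MB and above follows the glibc version or some other difference between the two systems.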


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19 14:28 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

--- Comment #8 from xuchunmei <xuchunmei at linux dot alibaba.com> ---
(In reply to Wilco from comment #7)
> (In reply to xuchunmei from comment #6)
> > The bench-memcpy-walk results show that when the length is larger than 1024,
> > __memcpy_simd seems a little slower than __memcpy_generic.
> 
> Yes but the difference is small, and in bench-memcpy-large __memcpy_simd
> wins by a huge margin on the unaligned cases.
> 
> Since none of these reproduce what you are seeing, would it be possible to
> create a small testcase that demonstrates the issue you are seeing in
> perf-bench-mem?

Sorry for the confusion. The performance regression is not caused by
__memcpy_simd; it comes from differences between my test environments, which
differ in more than just the glibc version. I will check again.


* [Bug libc/27437] [aarch64]memcpy_simd has performance regression with larger size on Neoverse N1
From: xuchunmei at linux dot alibaba.com @ 2021-02-19 14:33 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27437

xuchunmei <xuchunmei at linux dot alibaba.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #9 from xuchunmei <xuchunmei at linux dot alibaba.com> ---
The bench-memcpy-random and bench-memcpy-large results show that
__memcpy_simd has no regression, and my test environments differ in more than
just glibc, so I am resolving this as NOTABUG.
I will check in detail.

