public inbox for glibc-bugs@sourceware.org
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
[not found] <bug-24872-131@http.sourceware.org/bugzilla/>
@ 2020-05-30 15:30 ` jan at jki dot io
2021-03-24 23:46 ` hjl.tools at gmail dot com
` (14 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: jan at jki dot io @ 2020-05-30 15:30 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
Jan <jan at jki dot io> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jan at jki dot io
--- Comment #1 from Jan <jan at jki dot io> ---
This also applies to AMD CPUs.
I get worse performance with AVX on a 3970X:
model name : AMD Ryzen Threadripper 3970X 32-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf
pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext
perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp
vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb
sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: hjl.tools at gmail dot com @ 2021-03-24 23:46 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hjl.tools at gmail dot com,
| |skpgkp2 at gmail dot com
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: hjl.tools at gmail dot com @ 2021-03-25 0:05 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
On Intel i7-8559U, for glibc master branch, benchtests/bench-memcpy-large
shows:
Function: memcpy
Variant: large
__memcpy_avx_unaligned_erms __memcpy_ssse3
==============================================================================
length=65543, align1=0, align2=0: 14301.30 17782.90
length=65551, align1=0, align2=3: 15940.10 17913.20
length=65567, align1=3, align2=0: 13492.10 17706.70
length=65599, align1=3, align2=5: 16658.40 17927.30
length=131079, align1=0, align2=0: 30461.70 44844.90
length=131087, align1=0, align2=3: 10647.30 12163.80
length=131103, align1=3, align2=0: 9425.44 12177.90
length=131135, align1=3, align2=5: 11126.40 12181.70
length=262151, align1=0, align2=0: 23470.20 39790.60
length=262159, align1=0, align2=3: 33363.40 33786.50
length=262175, align1=3, align2=0: 23122.00 29927.70
length=262207, align1=3, align2=5: 25862.10 28582.20
length=524295, align1=0, align2=0: 45083.10 55485.80
length=524303, align1=0, align2=3: 47938.20 54088.40
length=524319, align1=3, align2=0: 42350.10 51983.70
length=524351, align1=3, align2=5: 45029.20 52464.20
length=1048583, align1=0, align2=0: 88527.90 101156.00
length=1048591, align1=0, align2=3: 93855.80 100754.00
length=1048607, align1=3, align2=0: 94034.90 100673.00
length=1048639, align1=3, align2=5: 90740.50 103256.00
length=2097159, align1=0, align2=0: 185803.00 193467.00
length=2097167, align1=0, align2=3: 187839.00 211012.00
length=2097183, align1=3, align2=0: 186758.00 195055.00
length=2097215, align1=3, align2=5: 190751.00 195920.00
length=4194311, align1=0, align2=0: 374530.00 391675.00
length=4194319, align1=0, align2=3: 378556.00 395988.00
length=4194335, align1=3, align2=0: 376987.00 396840.00
length=4194367, align1=3, align2=5: 380713.00 399326.00
length=8388615, align1=0, align2=0: 1248790.00 1296470.00
length=8388623, align1=0, align2=3: 924123.00 1011000.00
length=8388639, align1=3, align2=0: 910170.00 926244.00
length=8388671, align1=3, align2=5: 915979.00 1011690.00
length=16777223, align1=0, align2=0: 2119530.00 2228360.00
length=16777231, align1=0, align2=3: 2123510.00 2321720.00
length=16777247, align1=3, align2=0: 2092680.00 2231230.00
length=16777279, align1=3, align2=5: 2121050.00 2280890.00
length=33554439, align1=0, align2=0: 4881620.00 4770780.00
length=33554447, align1=0, align2=3: 4634040.00 4795500.00
length=33554463, align1=3, align2=0: 4599820.00 4676770.00
length=33554495, align1=3, align2=5: 4638870.00 4841840.00
avx_unaligned_erms is faster than ssse3 in every row except length=33554439,
align1=0, align2=0.
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: skpgkp2 at gmail dot com @ 2021-03-25 2:42 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #3 from Sunil Pandey <skpgkp2 at gmail dot com> ---
On Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, for glibc master branch,
benchtests/bench-memcpy-large shows that avx_unaligned_erms is faster than
ssse3:
Function: memcpy
Variant: large
__memcpy_avx_unaligned_erms __memcpy_ssse3
========================================================================================================================
length=65543, align1=0, align2=0: 4633.50 6862.62 (-48.11%)
length=65551, align1=0, align2=3: 5152.25 6498.88 (-26.14%)
length=65567, align1=3, align2=0: 5052.12 6385.75 (-26.40%)
length=65599, align1=3, align2=5: 5811.12 6420.50 (-10.49%)
length=131079, align1=0, align2=0: 9181.50 12541.40 (-36.59%)
length=131087, align1=0, align2=3: 10162.00 12765.60 (-25.62%)
length=131103, align1=3, align2=0: 9961.50 12600.60 (-26.49%)
length=131135, align1=3, align2=5: 10134.60 12671.10 (-25.03%)
length=262151, align1=0, align2=0: 17199.90 24132.40 (-40.31%)
length=262159, align1=0, align2=3: 19601.20 24818.40 (-26.62%)
length=262175, align1=3, align2=0: 18511.60 23472.00 (-26.80%)
length=262207, align1=3, align2=5: 18139.80 22806.60 (-25.73%)
length=524295, align1=0, align2=0: 43515.40 67501.40 (-55.12%)
length=524303, align1=0, align2=3: 44062.60 70280.60 (-59.50%)
length=524319, align1=3, align2=0: 41980.60 67370.60 (-60.48%)
length=524351, align1=3, align2=5: 39343.60 65058.90 (-65.36%)
length=1048583, align1=0, align2=0: 637645.00 704786.00 (-10.53%)
length=1048591, align1=0, align2=3: 546501.00 551314.00 ( -0.88%)
length=1048607, align1=3, align2=0: 493258.00 542408.00 ( -9.96%)
length=1048639, align1=3, align2=5: 457022.00 513160.00 (-12.28%)
length=2097159, align1=0, align2=0: 928221.00 1055570.00 (-13.72%)
length=2097167, align1=0, align2=3: 934195.00 975572.00 ( -4.43%)
length=2097183, align1=3, align2=0: 929252.00 1052450.00 (-13.26%)
length=2097215, align1=3, align2=5: 934500.00 1047300.00 (-12.07%)
length=4194311, align1=0, align2=0: 1901330.00 2124790.00 (-11.75%)
length=4194319, align1=0, align2=3: 1931670.00 1954720.00 ( -1.19%)
length=4194335, align1=3, align2=0: 1906640.00 2113830.00 (-10.87%)
length=4194367, align1=3, align2=5: 1927260.00 2108930.00 ( -9.43%)
length=8388615, align1=0, align2=0: 3802180.00 4254990.00 (-11.91%)
length=8388623, align1=0, align2=3: 3858480.00 3962610.00 ( -2.70%)
length=8388639, align1=3, align2=0: 3797900.00 4233080.00 (-11.46%)
length=8388671, align1=3, align2=5: 3848300.00 4252190.00 (-10.50%)
length=16777223, align1=0, align2=0: 7604180.00 8557160.00 (-12.53%)
length=16777231, align1=0, align2=3: 7705930.00 7923390.00 ( -2.82%)
length=16777247, align1=3, align2=0: 7612860.00 8487690.00 (-11.49%)
length=16777279, align1=3, align2=5: 7708250.00 8512540.00 (-10.43%)
length=33554439, align1=0, align2=0: 15591300.00 17522200.00 (-12.38%)
length=33554447, align1=0, align2=3: 15808700.00 16259700.00 ( -2.85%)
length=33554463, align1=3, align2=0: 15535400.00 17188100.00 (-10.64%)
length=33554495, align1=3, align2=5: 15714100.00 17517900.00 (-11.48%)
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: skpgkp2 at gmail dot com @ 2021-03-25 3:00 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #4 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Function: memcpy
Variant: large
__memcpy_avx_unaligned_erms __memcpy_ssse3
========================================================================================================================
length=65543, align1=0, align2=0: 20262.10 25710.00 (-26.89%)
length=65551, align1=0, align2=3: 23048.80 25749.00 (-11.72%)
length=65567, align1=3, align2=0: 19265.80 25841.80 (-34.13%)
length=65599, align1=3, align2=5: 24043.60 26029.60 ( -8.26%)
length=131079, align1=0, align2=0: 7462.25 9998.31 (-33.99%)
length=131087, align1=0, align2=3: 8639.44 10153.40 (-17.52%)
length=131103, align1=3, align2=0: 7628.75 9808.25 (-28.57%)
length=131135, align1=3, align2=5: 9045.75 9898.38 ( -9.43%)
length=262151, align1=0, align2=0: 19262.30 25770.40 (-33.79%)
length=262159, align1=0, align2=3: 21997.60 25890.30 (-17.70%)
length=262175, align1=3, align2=0: 20201.20 25578.40 (-26.62%)
length=262207, align1=3, align2=5: 22811.20 25794.90 (-13.08%)
length=524295, align1=0, align2=0: 43116.70 59701.20 (-38.46%)
length=524303, align1=0, align2=3: 46403.70 60424.80 (-30.22%)
length=524319, align1=3, align2=0: 45007.30 60604.30 (-34.65%)
length=524351, align1=3, align2=5: 48101.60 62129.50 (-29.16%)
length=1048583, align1=0, align2=0: 96434.60 106336.00 (-10.27%)
length=1048591, align1=0, align2=3: 94443.60 107462.00 (-13.78%)
length=1048607, align1=3, align2=0: 94346.10 112411.00 (-19.15%)
length=1048639, align1=3, align2=5: 99799.90 107768.00 ( -7.98%)
length=2097159, align1=0, align2=0: 192824.00 214921.00 (-11.46%)
length=2097167, align1=0, align2=3: 187735.00 209353.00 (-11.52%)
length=2097183, align1=3, align2=0: 187375.00 209017.00 (-11.55%)
length=2097215, align1=3, align2=5: 187469.00 225963.00 (-20.53%)
length=4194311, align1=0, align2=0: 411755.00 429404.00 ( -4.29%)
length=4194319, align1=0, align2=3: 398156.00 449104.00 (-12.80%)
length=4194335, align1=3, align2=0: 387210.00 454115.00 (-17.28%)
length=4194367, align1=3, align2=5: 408760.00 434598.00 ( -6.32%)
length=8388615, align1=0, align2=0: 969548.00 1024490.00 ( -5.67%)
length=8388623, align1=0, align2=3: 955586.00 1011370.00 ( -5.84%)
length=8388639, align1=3, align2=0: 944273.00 987034.00 ( -4.53%)
length=8388671, align1=3, align2=5: 938864.00 1008330.00 ( -7.40%)
length=16777223, align1=0, align2=0: 3201330.00 3225610.00 ( -0.76%)
length=16777231, align1=0, align2=3: 3027600.00 3269680.00 ( -8.00%)
length=16777247, align1=3, align2=0: 3018960.00 3208180.00 ( -6.27%)
length=16777279, align1=3, align2=5: 3056540.00 3295340.00 ( -7.81%)
length=33554439, align1=0, align2=0: 6797390.00 6965460.00 ( -2.47%)
length=33554447, align1=0, align2=3: 6660090.00 6948620.00 ( -4.33%)
length=33554463, align1=3, align2=0: 6695920.00 6848340.00 ( -2.28%)
length=33554495, align1=3, align2=5: 6632220.00 6942640.00 ( -4.68%)
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: skpgkp2 at gmail dot com @ 2021-03-25 5:53 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #5 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
1 million iterations:
Function: memcpy
Variant: large
__memcpy_avx_unaligned_erms __memcpy_ssse3
===============================================================================
length=1048576, align1=0, align2=0: 95223.20 100984.00 ( -6.05%)
length=1048576, align1=0, align2=3: 95244.40 104451.00 ( -9.67%)
length=1048576, align1=3, align2=0: 95258.70 101457.00 ( -6.51%)
length=1048576, align1=3, align2=5: 95225.30 104204.00 ( -9.43%)
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: hjl.tools at gmail dot com @ 2021-03-25 13:04 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Last reconfirmed| |2021-03-25
Status|UNCONFIRMED |WAITING
--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
On Intel i7-8559U:
[hjl@gnu-cfl-2 tmp]$ cat x.c
#include <string.h>
char s_buffer[1024*1024];
char s_buffer2[1024*1024];
int main(int argc, const char* argv[])
{
  unsigned long i = 0;
  for (i = 0; i < 1000000; ++i)
    memcpy(s_buffer, s_buffer2, 1024*1024);
  return 0;
}
[hjl@gnu-cfl-2 tmp]$ gcc -O2 x.c
[hjl@gnu-cfl-2 tmp]$ time ./a.out
real 0m20.678s
user 0m20.652s
sys 0m0.005s
[hjl@gnu-cfl-2 tmp]$ time GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load,-Fast_Unaligned_Copy ./a.out
real 0m20.741s
user 0m20.718s
sys 0m0.006s
[hjl@gnu-cfl-2 tmp]$
__memmove_avx_unaligned_erms is slightly faster (20.678s vs. 20.741s real time).
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: hjl.tools at gmail dot com @ 2021-03-25 13:05 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lili.cui at intel dot com
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: skpgkp2 at gmail dot com @ 2021-03-25 13:16 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #7 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Function: memcpy
Variant: large
__memcpy_avx_unaligned_erms __memcpy_ssse3
==============================================================================
length=1048576, align1=0, align2=0: 159367.00 170874.00 ( -7.22%)
length=1048576, align1=0, align2=3: 163147.00 171467.00 ( -5.10%)
length=1048576, align1=3, align2=0: 162648.00 171993.00 ( -5.75%)
length=1048576, align1=3, align2=5: 163561.00 171564.00 ( -4.89%)
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: skpgkp2 at gmail dot com @ 2021-03-25 16:28 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #8 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
$ rpm -q glibc
glibc-2.28-42.el8.x86_64
$ time ./a.out
real 0m36.327s
user 0m36.290s
sys 0m0.002s
$ time GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load,-Fast_Unaligned_Copy ./a.out
real 0m36.257s
user 0m36.223s
sys 0m0.002s
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: skpgkp2 at gmail dot com @ 2021-03-25 16:59 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #9 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
$ time ./a.out
real 0m29.613s
user 0m29.585s
sys 0m0.003s
$ time GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load,-Fast_Unaligned_Copy ./a.out
real 0m29.651s
user 0m29.622s
sys 0m0.004s
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: gouhaojake at 163 dot com @ 2021-05-29 5:54 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
苟浩 <gouhaojake at 163 dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |gouhaojake at 163 dot com
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: gouhaojake at 163 dot com @ 2021-05-30 1:24 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #10 from 苟浩 <gouhaojake at 163 dot com> ---
I also encountered a similar problem when I used the STREAM tool to test
memory bandwidth.
On the same machine, CentOS 7 and CentOS 8 are installed. CentOS 7 uses
glibc-2.17 and CentOS 8 uses glibc-2.28. The STREAM results show that memory
performance on CentOS 8 is much worse than on CentOS 7.
From the flame graph, the only difference between the two is that glibc-2.28
uses __memmove_avx_unaligned_erms(), while glibc-2.17 uses __memcpy_ssse3().
This machine is x86_64. However, on the aarch64 architecture, glibc-2.17 and
glibc-2.28 perform almost the same.
The stream compile command is:
# gcc -O3 -mcmodel=large -fopenmp -DSTREAM_ARRAY_SIZE=2147483648 -DNTIMES=30 stream.c -o stream
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: hjl.tools at gmail dot com @ 2021-05-30 1:40 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to 苟浩 from comment #10)
> I also encountered a similar problem when I used the STREAM tool to test
> memory bandwidth.
>
> On the same machine, CentOS 7 and CentOS 8 are installed. CentOS 7 uses
> glibc-2.17 and CentOS 8 uses glibc-2.28. The STREAM results show that memory
> performance on CentOS 8 is much worse than on CentOS 7.
What is your CPU?
>
> From the flame graph, the only difference between the two is that glibc-2.28
> uses __memmove_avx_unaligned_erms(), while glibc-2.17 uses __memcpy_ssse3().
>
> This machine is x86_64. However, on the aarch64 architecture, glibc-2.17 and
> glibc-2.28 perform almost the same.
>
> The stream compile command is:
>
> # gcc -O3 -mcmodel=large -fopenmp -DSTREAM_ARRAY_SIZE=2147483648 -DNTIMES=30 stream.c -o stream
What is stream.c?
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: hjl.tools at gmail dot com @ 2021-05-30 1:41 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dianhong.xu at intel dot com
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
From: gouhaojake at 163 dot com @ 2021-05-30 5:07 UTC (permalink / raw)
To: glibc-bugs
https://sourceware.org/bugzilla/show_bug.cgi?id=24872
--- Comment #12 from 苟浩 <gouhaojake at 163 dot com> ---
(In reply to H.J. Lu from comment #11)
The x86_64 machine is a Hygon 7285; the aarch64 machine is a Huawei Kunpeng 920.
stream.c is available at http://www.cs.virginia.edu/stream/FTP/Code/