public inbox for glibc-bugs@sourceware.org
* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
@ 2020-05-30 15:30 ` jan at jki dot io
  2021-03-24 23:46 ` hjl.tools at gmail dot com
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jan at jki dot io @ 2020-05-30 15:30 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

Jan <jan at jki dot io> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jan at jki dot io

--- Comment #1 from Jan <jan at jki dot io> ---
This also applies to AMD CPUs.
I get worse performance with AVX on a 3970X:

model name      : AMD Ryzen Threadripper 3970X 32-Core Processor
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf
pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext
perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp
vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb
sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock
nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
  2020-05-30 15:30 ` [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3() jan at jki dot io
@ 2021-03-24 23:46 ` hjl.tools at gmail dot com
  2021-03-25  0:05 ` hjl.tools at gmail dot com
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-24 23:46 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com,
                   |                            |skpgkp2 at gmail dot com


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
  2020-05-30 15:30 ` [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3() jan at jki dot io
  2021-03-24 23:46 ` hjl.tools at gmail dot com
@ 2021-03-25  0:05 ` hjl.tools at gmail dot com
  2021-03-25  2:42 ` skpgkp2 at gmail dot com
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-25  0:05 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
On Intel i7-8559U, for glibc master branch, benchtests/bench-memcpy-large
shows:

Function: memcpy
Variant: large
                                    __memcpy_avx_unaligned_erms __memcpy_ssse3
==============================================================================
    length=65543, align1=0, align2=0:     14301.30          17782.90    
    length=65551, align1=0, align2=3:     15940.10          17913.20    
    length=65567, align1=3, align2=0:     13492.10          17706.70    
    length=65599, align1=3, align2=5:     16658.40          17927.30    
   length=131079, align1=0, align2=0:     30461.70          44844.90    
   length=131087, align1=0, align2=3:     10647.30          12163.80    
   length=131103, align1=3, align2=0:      9425.44          12177.90    
   length=131135, align1=3, align2=5:     11126.40          12181.70    
   length=262151, align1=0, align2=0:     23470.20          39790.60    
   length=262159, align1=0, align2=3:     33363.40          33786.50    
   length=262175, align1=3, align2=0:     23122.00          29927.70    
   length=262207, align1=3, align2=5:     25862.10          28582.20    
   length=524295, align1=0, align2=0:     45083.10          55485.80    
   length=524303, align1=0, align2=3:     47938.20          54088.40    
   length=524319, align1=3, align2=0:     42350.10          51983.70    
   length=524351, align1=3, align2=5:     45029.20          52464.20    
  length=1048583, align1=0, align2=0:     88527.90         101156.00    
  length=1048591, align1=0, align2=3:     93855.80         100754.00    
  length=1048607, align1=3, align2=0:     94034.90         100673.00    
  length=1048639, align1=3, align2=5:     90740.50         103256.00    
  length=2097159, align1=0, align2=0:    185803.00         193467.00    
  length=2097167, align1=0, align2=3:    187839.00         211012.00    
  length=2097183, align1=3, align2=0:    186758.00         195055.00    
  length=2097215, align1=3, align2=5:    190751.00         195920.00    
  length=4194311, align1=0, align2=0:    374530.00         391675.00    
  length=4194319, align1=0, align2=3:    378556.00         395988.00    
  length=4194335, align1=3, align2=0:    376987.00         396840.00    
  length=4194367, align1=3, align2=5:    380713.00         399326.00    
  length=8388615, align1=0, align2=0:   1248790.00        1296470.00    
  length=8388623, align1=0, align2=3:    924123.00        1011000.00    
  length=8388639, align1=3, align2=0:    910170.00         926244.00    
  length=8388671, align1=3, align2=5:    915979.00        1011690.00    
 length=16777223, align1=0, align2=0:   2119530.00        2228360.00    
 length=16777231, align1=0, align2=3:   2123510.00        2321720.00    
 length=16777247, align1=3, align2=0:   2092680.00        2231230.00    
 length=16777279, align1=3, align2=5:   2121050.00        2280890.00    
 length=33554439, align1=0, align2=0:   4881620.00        4770780.00    
 length=33554447, align1=0, align2=3:   4634040.00        4795500.00    
 length=33554463, align1=3, align2=0:   4599820.00        4676770.00    
 length=33554495, align1=3, align2=5:   4638870.00        4841840.00

avx_unaligned_erms is faster than ssse3


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2021-03-25  0:05 ` hjl.tools at gmail dot com
@ 2021-03-25  2:42 ` skpgkp2 at gmail dot com
  2021-03-25  3:00 ` skpgkp2 at gmail dot com
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: skpgkp2 at gmail dot com @ 2021-03-25  2:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #3 from Sunil Pandey <skpgkp2 at gmail dot com> ---
On an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, for glibc master branch,
benchtests/bench-memcpy-large shows that avx_unaligned_erms is faster than
ssse3:

Function: memcpy
Variant: large
                                    __memcpy_avx_unaligned_erms __memcpy_ssse3
========================================================================================================================
    length=65543, align1=0, align2=0:      4633.50           6862.62 (-48.11%)  
    length=65551, align1=0, align2=3:      5152.25           6498.88 (-26.14%)  
    length=65567, align1=3, align2=0:      5052.12           6385.75 (-26.40%)  
    length=65599, align1=3, align2=5:      5811.12           6420.50 (-10.49%)  
   length=131079, align1=0, align2=0:      9181.50          12541.40 (-36.59%)  
   length=131087, align1=0, align2=3:     10162.00          12765.60 (-25.62%)  
   length=131103, align1=3, align2=0:      9961.50          12600.60 (-26.49%)  
   length=131135, align1=3, align2=5:     10134.60          12671.10 (-25.03%)  
   length=262151, align1=0, align2=0:     17199.90          24132.40 (-40.31%)  
   length=262159, align1=0, align2=3:     19601.20          24818.40 (-26.62%)  
   length=262175, align1=3, align2=0:     18511.60          23472.00 (-26.80%)  
   length=262207, align1=3, align2=5:     18139.80          22806.60 (-25.73%)  
   length=524295, align1=0, align2=0:     43515.40          67501.40 (-55.12%)  
   length=524303, align1=0, align2=3:     44062.60          70280.60 (-59.50%)  
   length=524319, align1=3, align2=0:     41980.60          67370.60 (-60.48%)  
   length=524351, align1=3, align2=5:     39343.60          65058.90 (-65.36%)  
  length=1048583, align1=0, align2=0:    637645.00         704786.00 (-10.53%)  
  length=1048591, align1=0, align2=3:    546501.00         551314.00 ( -0.88%)  
  length=1048607, align1=3, align2=0:    493258.00         542408.00 ( -9.96%)  
  length=1048639, align1=3, align2=5:    457022.00         513160.00 (-12.28%)  
  length=2097159, align1=0, align2=0:    928221.00        1055570.00 (-13.72%)  
  length=2097167, align1=0, align2=3:    934195.00         975572.00 ( -4.43%)  
  length=2097183, align1=3, align2=0:    929252.00        1052450.00 (-13.26%)  
  length=2097215, align1=3, align2=5:    934500.00        1047300.00 (-12.07%)  
  length=4194311, align1=0, align2=0:   1901330.00        2124790.00 (-11.75%)  
  length=4194319, align1=0, align2=3:   1931670.00        1954720.00 ( -1.19%)  
  length=4194335, align1=3, align2=0:   1906640.00        2113830.00 (-10.87%)  
  length=4194367, align1=3, align2=5:   1927260.00        2108930.00 ( -9.43%)  
  length=8388615, align1=0, align2=0:   3802180.00        4254990.00 (-11.91%)  
  length=8388623, align1=0, align2=3:   3858480.00        3962610.00 ( -2.70%)  
  length=8388639, align1=3, align2=0:   3797900.00        4233080.00 (-11.46%)  
  length=8388671, align1=3, align2=5:   3848300.00        4252190.00 (-10.50%)  
 length=16777223, align1=0, align2=0:   7604180.00        8557160.00 (-12.53%)  
 length=16777231, align1=0, align2=3:   7705930.00        7923390.00 ( -2.82%)  
 length=16777247, align1=3, align2=0:   7612860.00        8487690.00 (-11.49%)  
 length=16777279, align1=3, align2=5:   7708250.00        8512540.00 (-10.43%)  
 length=33554439, align1=0, align2=0:  15591300.00       17522200.00 (-12.38%)  
 length=33554447, align1=0, align2=3:  15808700.00       16259700.00 ( -2.85%)  
 length=33554463, align1=3, align2=0:  15535400.00       17188100.00 (-10.64%)  
 length=33554495, align1=3, align2=5:  15714100.00       17517900.00 (-11.48%)
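
The parenthesized column in the table above is not explained in the thread. Judging from the numbers, it appears to be the difference between the two timings expressed as a percentage of the __memcpy_avx_unaligned_erms timing, so a negative value means avx_unaligned_erms took less time. A minimal sketch of that arithmetic, assuming this reading is right (the function name relative_diff_percent is made up for illustration):

```c
#include <assert.h>

/* Hypothetical helper: difference between a baseline timing and another
   timing, as a percentage of the baseline.  A negative result means the
   baseline took less time than the other implementation. */
double relative_diff_percent(double baseline, double other)
{
    assert(baseline != 0.0);
    return (baseline - other) / baseline * 100.0;
}
```

For the first row, relative_diff_percent(4633.50, 6862.62) gives about -48.11, which matches the value printed in the table.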


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2021-03-25  2:42 ` skpgkp2 at gmail dot com
@ 2021-03-25  3:00 ` skpgkp2 at gmail dot com
  2021-03-25  5:53 ` skpgkp2 at gmail dot com
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: skpgkp2 at gmail dot com @ 2021-03-25  3:00 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #4 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz

Function: memcpy
Variant: large
                                    __memcpy_avx_unaligned_erms __memcpy_ssse3
========================================================================================================================
    length=65543, align1=0, align2=0:     20262.10          25710.00 (-26.89%)  
    length=65551, align1=0, align2=3:     23048.80          25749.00 (-11.72%)  
    length=65567, align1=3, align2=0:     19265.80          25841.80 (-34.13%)  
    length=65599, align1=3, align2=5:     24043.60          26029.60 ( -8.26%)  
   length=131079, align1=0, align2=0:      7462.25           9998.31 (-33.99%)  
   length=131087, align1=0, align2=3:      8639.44          10153.40 (-17.52%)  
   length=131103, align1=3, align2=0:      7628.75           9808.25 (-28.57%)  
   length=131135, align1=3, align2=5:      9045.75           9898.38 ( -9.43%)  
   length=262151, align1=0, align2=0:     19262.30          25770.40 (-33.79%)  
   length=262159, align1=0, align2=3:     21997.60          25890.30 (-17.70%)  
   length=262175, align1=3, align2=0:     20201.20          25578.40 (-26.62%)  
   length=262207, align1=3, align2=5:     22811.20          25794.90 (-13.08%)  
   length=524295, align1=0, align2=0:     43116.70          59701.20 (-38.46%)  
   length=524303, align1=0, align2=3:     46403.70          60424.80 (-30.22%)  
   length=524319, align1=3, align2=0:     45007.30          60604.30 (-34.65%)  
   length=524351, align1=3, align2=5:     48101.60          62129.50 (-29.16%)  
  length=1048583, align1=0, align2=0:     96434.60         106336.00 (-10.27%)  
  length=1048591, align1=0, align2=3:     94443.60         107462.00 (-13.78%)  
  length=1048607, align1=3, align2=0:     94346.10         112411.00 (-19.15%)  
  length=1048639, align1=3, align2=5:     99799.90         107768.00 ( -7.98%)  
  length=2097159, align1=0, align2=0:    192824.00         214921.00 (-11.46%)  
  length=2097167, align1=0, align2=3:    187735.00         209353.00 (-11.52%)  
  length=2097183, align1=3, align2=0:    187375.00         209017.00 (-11.55%)  
  length=2097215, align1=3, align2=5:    187469.00         225963.00 (-20.53%)  
  length=4194311, align1=0, align2=0:    411755.00         429404.00 ( -4.29%)  
  length=4194319, align1=0, align2=3:    398156.00         449104.00 (-12.80%)  
  length=4194335, align1=3, align2=0:    387210.00         454115.00 (-17.28%)  
  length=4194367, align1=3, align2=5:    408760.00         434598.00 ( -6.32%)  
  length=8388615, align1=0, align2=0:    969548.00        1024490.00 ( -5.67%)  
  length=8388623, align1=0, align2=3:    955586.00        1011370.00 ( -5.84%)  
  length=8388639, align1=3, align2=0:    944273.00         987034.00 ( -4.53%)  
  length=8388671, align1=3, align2=5:    938864.00        1008330.00 ( -7.40%)  
 length=16777223, align1=0, align2=0:   3201330.00        3225610.00 ( -0.76%)  
 length=16777231, align1=0, align2=3:   3027600.00        3269680.00 ( -8.00%)  
 length=16777247, align1=3, align2=0:   3018960.00        3208180.00 ( -6.27%)  
 length=16777279, align1=3, align2=5:   3056540.00        3295340.00 ( -7.81%)  
 length=33554439, align1=0, align2=0:   6797390.00        6965460.00 ( -2.47%)  
 length=33554447, align1=0, align2=3:   6660090.00        6948620.00 ( -4.33%)  
 length=33554463, align1=3, align2=0:   6695920.00        6848340.00 ( -2.28%)  
 length=33554495, align1=3, align2=5:   6632220.00        6942640.00 ( -4.68%)


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2021-03-25  3:00 ` skpgkp2 at gmail dot com
@ 2021-03-25  5:53 ` skpgkp2 at gmail dot com
  2021-03-25 13:04 ` hjl.tools at gmail dot com
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: skpgkp2 at gmail dot com @ 2021-03-25  5:53 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #5 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz

1 million iterations:

Function: memcpy
Variant: large
                                    __memcpy_avx_unaligned_erms __memcpy_ssse3
===============================================================================
  length=1048576, align1=0, align2=0:     95223.20         100984.00 ( -6.05%)  
  length=1048576, align1=0, align2=3:     95244.40         104451.00 ( -9.67%)  
  length=1048576, align1=3, align2=0:     95258.70         101457.00 ( -6.51%)  
  length=1048576, align1=3, align2=5:     95225.30         104204.00 ( -9.43%)


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-03-25  5:53 ` skpgkp2 at gmail dot com
@ 2021-03-25 13:04 ` hjl.tools at gmail dot com
  2021-03-25 13:05 ` hjl.tools at gmail dot com
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-25 13:04 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-03-25
             Status|UNCONFIRMED                 |WAITING

--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
On Intel i7-8559U:

[hjl@gnu-cfl-2 tmp]$ cat x.c
#include <string.h>

char s_buffer[1024*1024];
char s_buffer2[1024*1024];
int main(int argc, const char* argv[])
{
  unsigned long i = 0;

  for (i = 0; i < 1000000; ++i)
    memcpy(s_buffer, s_buffer2, 1024*1024);
  return 0;
}
[hjl@gnu-cfl-2 tmp]$ gcc -O2 x.c
[hjl@gnu-cfl-2 tmp]$ time ./a.out 

real    0m20.678s
user    0m20.652s
sys     0m0.005s
[hjl@gnu-cfl-2 tmp]$ time GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load,-Fast_Unaligned_Copy ./a.out

real    0m20.741s
user    0m20.718s
sys     0m0.006s
[hjl@gnu-cfl-2 tmp]$

__memmove_avx_unaligned_erms is slightly faster.
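
The two wall-clock runs above differ by well under 1%, so a per-call figure can be easier to compare than the output of time. A minimal sketch of such a measurement, assuming a POSIX system with clock_gettime() and a GCC or Clang compiler (the helper name average_memcpy_ns is hypothetical and not part of glibc or its benchtests):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average wall-clock nanoseconds per memcpy() call
   over `iters` copies of `len` bytes from `src` to `dst`. */
double average_memcpy_ns(char *dst, const char *src, size_t len,
                         unsigned long iters)
{
    struct timespec start, end;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iters; ++i) {
        memcpy(dst, src, len);
        /* GCC/Clang compiler barrier so repeated copies are not merged
           or dropped by the optimizer. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iters;
}
```

Called with the two 1 MiB buffers from x.c, once with the default environment and once with the GLIBC_TUNABLES setting shown above, this would give one number per memcpy variant. Results will still vary with CPU frequency scaling and cache state.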


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2021-03-25 13:04 ` hjl.tools at gmail dot com
@ 2021-03-25 13:05 ` hjl.tools at gmail dot com
  2021-03-25 13:16 ` skpgkp2 at gmail dot com
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-03-25 13:05 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lili.cui at intel dot com


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2021-03-25 13:05 ` hjl.tools at gmail dot com
@ 2021-03-25 13:16 ` skpgkp2 at gmail dot com
  2021-03-25 16:28 ` skpgkp2 at gmail dot com
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: skpgkp2 at gmail dot com @ 2021-03-25 13:16 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #7 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Function: memcpy
Variant: large
                                    __memcpy_avx_unaligned_erms __memcpy_ssse3
==============================================================================
  length=1048576, align1=0, align2=0:    159367.00         170874.00 ( -7.22%)  
  length=1048576, align1=0, align2=3:    163147.00         171467.00 ( -5.10%)  
  length=1048576, align1=3, align2=0:    162648.00         171993.00 ( -5.75%)  
  length=1048576, align1=3, align2=5:    163561.00         171564.00 ( -4.89%)


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2021-03-25 13:16 ` skpgkp2 at gmail dot com
@ 2021-03-25 16:28 ` skpgkp2 at gmail dot com
  2021-03-25 16:59 ` skpgkp2 at gmail dot com
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: skpgkp2 at gmail dot com @ 2021-03-25 16:28 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #8 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

$ rpm -q glibc
glibc-2.28-42.el8.x86_64

$ time ./a.out

real    0m36.327s
user    0m36.290s
sys     0m0.002s


$ time GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load,-Fast_Unaligned_Copy ./a.out

real    0m36.257s
user    0m36.223s
sys     0m0.002s


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2021-03-25 16:28 ` skpgkp2 at gmail dot com
@ 2021-03-25 16:59 ` skpgkp2 at gmail dot com
  2021-05-29  5:54 ` gouhaojake at 163 dot com
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: skpgkp2 at gmail dot com @ 2021-03-25 16:59 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #9 from Sunil Pandey <skpgkp2 at gmail dot com> ---
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz

$ time ./a.out

real    0m29.613s
user    0m29.585s
sys     0m0.003s

$ time GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX_Fast_Unaligned_Load,-Fast_Unaligned_Copy ./a.out

real    0m29.651s
user    0m29.622s
sys     0m0.004s


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2021-03-25 16:59 ` skpgkp2 at gmail dot com
@ 2021-05-29  5:54 ` gouhaojake at 163 dot com
  2021-05-30  1:24 ` gouhaojake at 163 dot com
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: gouhaojake at 163 dot com @ 2021-05-29  5:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

苟浩 <gouhaojake at 163 dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gouhaojake at 163 dot com


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2021-05-29  5:54 ` gouhaojake at 163 dot com
@ 2021-05-30  1:24 ` gouhaojake at 163 dot com
  2021-05-30  1:40 ` hjl.tools at gmail dot com
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: gouhaojake at 163 dot com @ 2021-05-30  1:24 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #10 from 苟浩 <gouhaojake at 163 dot com> ---
I also encountered a similar problem when I used the stream tool to test the
memory bandwidth.

Both CentOS 7 and CentOS 8 are installed on the same machine. CentOS 7 uses
glibc-2.17 and CentOS 8 uses glibc-2.28. The stream test data shows that the
memory performance of CentOS 8 is much worse than that of CentOS 7.

From the flame graph, the only difference between the two runs is that
glibc-2.28 uses __memmove_avx_unaligned_erms(), while glibc-2.17 uses
__memcpy_ssse3().

This machine is x86_64. However, under aarch64 architecture, glibc-2.17 and
glibc-2.28 perform almost the same.

The stream compile command uses:

# gcc -O3 -mcmodel=large -fopenmp -DSTREAM_ARRAY_SIZE=2147483648 -DNTIMES=30 stream.c -o stream
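
For readers unfamiliar with STREAM: its Copy kernel is essentially an element-wise array assignment. The sketch below is illustrative only; the function name copy_kernel and its signature are made up here and are not taken from stream.c. At -O3, GCC may recognize a loop of this shape and replace it with a call to memcpy or memmove, which is one plausible way a libc copy routine ends up in a flame graph of a program that never calls memcpy itself. Whether that is what happened in this report is an assumption, not something confirmed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative copy kernel in the style of STREAM's Copy: c[j] = a[j].
   `restrict` states that the two arrays do not overlap. */
void copy_kernel(double *restrict c, const double *restrict a, size_t n)
{
    assert(c != NULL && a != NULL);
    for (size_t j = 0; j < n; ++j)
        c[j] = a[j];
}
```

Comparing the disassembly of such a loop built with and without -fno-tree-loop-distribute-patterns is one way to check whether the compiler emitted a library call.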


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2021-05-30  1:24 ` gouhaojake at 163 dot com
@ 2021-05-30  1:40 ` hjl.tools at gmail dot com
  2021-05-30  1:41 ` hjl.tools at gmail dot com
  2021-05-30  5:07 ` gouhaojake at 163 dot com
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-05-30  1:40 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to 苟浩 from comment #10)
> I also encountered a similar problem when I used the stream tool to test the
> memory bandwidth.
> 
> On the same machine, centos7 and centos8 are installed. Centos7 uses

What is your CPU?

> glibc-2.17 and centos8 uses glibc-2.28. The stream test data shows that the
> memory performance of centos8 is much worse than that of centos7.
> 
> From the flame diagram, the only difference between the two calls is that
> glibc-2.28 uses __memmove_avx_unaligned_erms(), glibc-2.17 uses
> __memcpy_ssse3().
> 
> This machine is x86_64. However, under aarch64 architecture, glibc-2.17 and
> glibc-2.28 perform almost the same.
> 
> The stream compile command uses:
> 
> # gcc -O3 -mcmodel=large -fopenmp -DSTREAM_ARRAY_SIZE=2147483648 -DNTIMES=30 stream.c -o stream

What is stream.c?


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2021-05-30  1:40 ` hjl.tools at gmail dot com
@ 2021-05-30  1:41 ` hjl.tools at gmail dot com
  2021-05-30  5:07 ` gouhaojake at 163 dot com
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-05-30  1:41 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dianhong.xu at intel dot com


* [Bug libc/24872] __memmove_avx_unaligned_erms() performs significantly much slower than __memcpy_ssse3()
       [not found] <bug-24872-131@http.sourceware.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2021-05-30  1:41 ` hjl.tools at gmail dot com
@ 2021-05-30  5:07 ` gouhaojake at 163 dot com
  15 siblings, 0 replies; 16+ messages in thread
From: gouhaojake at 163 dot com @ 2021-05-30  5:07 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=24872

--- Comment #12 from 苟浩 <gouhaojake at 163 dot com> ---
(In reply to H.J. Lu from comment #11)

The x86_64 machine is a Hygon 7285; the aarch64 machine is a Huawei Kunpeng 920.

stream.c is from http://www.cs.virginia.edu/stream/FTP/Code/
