public inbox for libc-alpha@sourceware.org
* what is the application scenes of adding optimized Q-register for memcpy
From: wangshuo (AF) @ 2020-07-25  1:49 UTC
  To: Wilco.Dijkstra; +Cc: libc-alpha, Hushiyuan

Commit 4a733bf375238a6a595033b5785cea7f27d61307 adds an optimized
Q-register memcpy.
However, I cannot get ideal results in my environment. This is my test:

test suite: libMicro-0.4.0

./memcpy -E -C 200 -L -S -W -N "memcpy_10"    -s 10   -I 10
./memcpy -E -C 200 -L -S -W -N "memcpy_1k"    -s 1k   -I 50
./memcpy -E -C 200 -L -S -W -N "memcpy_10k"   -s 10k  -I 800
./memcpy -E -C 200 -L -S -W -N "memcpy_1m"    -s 1m   -I 500000
./memcpy -E -C 200 -L -S -W -N "memcpy_10m"   -s 10m  -I 5000000


hardware platform:
Kunpeng-920 @ 2600.0000MHz
L1d cache: 6 MiB
L1i cache: 6 MiB
L2 cache:  48 MiB
L3 cache:  192 MiB

              before this commit (usecs)    after this commit (usecs)
memcpy_10     0.0065                        0.0065
memcpy_1k     0.0299                        0.0294
memcpy_10k    0.2642                        0.2642
memcpy_1m     27.9040                       27.6480
memcpy_10m    265.9840                      274.6880
strlen_10     0.0039                        0.0039
strlen_1k     0.0571                        0.0450

I was wondering if you could give me some advice about my test results.


* Re: what is the application scenes of adding optimized Q-register for memcpy
From: Szabolcs Nagy @ 2020-07-25  8:05 UTC
  To: wangshuo (AF); +Cc: Wilco.Dijkstra, Hushiyuan, libc-alpha

The 07/25/2020 09:49, wangshuo (AF) wrote:
> this commit 4a733bf375238a6a595033b5785cea7f27d61307 adds optimized
> Q-register memcpy.

please add "aarch64" to the email subject if it's
for aarch64 only.

that commit should not alter the kunpeng920 memcpy.
did you change the ifunc logic to use the new one?

the previous commit increases the entry alignment
and this one may move the memcpy to a slightly
different location in libc.so, but that's about it.

> However, I cannot get ideal results in my environment. This is my test:
> 
> test suite: libMicro-0.4.0
> 
> ./memcpy -E -C 200 -L -S -W -N "memcpy_10"    -s 10   -I 10
> ./memcpy -E -C 200 -L -S -W -N "memcpy_1k"    -s 1k   -I 50
> ./memcpy -E -C 200 -L -S -W -N "memcpy_10k"   -s 10k  -I 800
> ./memcpy -E -C 200 -L -S -W -N "memcpy_1m"    -s 1m   -I 500000
> ./memcpy -E -C 200 -L -S -W -N "memcpy_10m"   -s 10m  -I 5000000
> 
> 
> hardware platform:
> Kunpeng-920 @ 2600.0000MHz
> L1d cache: 6 MiB
> L1i cache: 6 MiB
> L2 cache:  48 MiB
> L3 cache:  192 MiB
> 
>               before this commit (usecs)    after this commit (usecs)
> memcpy_10     0.0065                        0.0065
> memcpy_1k     0.0299                        0.0294
> memcpy_10k    0.2642                        0.2642
> memcpy_1m     27.9040                       27.6480
> memcpy_10m    265.9840                      274.6880
> strlen_10     0.0039                        0.0039
> strlen_1k     0.0571                        0.0450
> 
> I was wondering if you could give me some advice about my test results.

the 3% regression on large copies may be explained by
uarch implementation internals. you can verify
that by keeping the code the same and just adding some
nop padding around memcpy.
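
for reference, the relative change for each row of the quoted table
can be computed directly. this is only a minimal sketch of the
arithmetic: percent_change is a hypothetical helper name (not part
of libMicro or glibc), and the numbers are copied from the table
in this thread.

```python
# Relative change between the "before" and "after" timings quoted in this thread.
# percent_change is a hypothetical helper name, not part of libMicro or glibc.

def percent_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0

# (before, after) in usecs, copied from the table in this thread.
results = {
    "memcpy_10": (0.0065, 0.0065),
    "memcpy_1k": (0.0299, 0.0294),
    "memcpy_10k": (0.2642, 0.2642),
    "memcpy_1m": (27.9040, 27.6480),
    "memcpy_10m": (265.9840, 274.6880),
    "strlen_10": (0.0039, 0.0039),
    "strlen_1k": (0.0571, 0.0450),
}

# A positive value means the "after" run took longer.
for name, (before, after) in results.items():
    print(f"{name:<12}{percent_change(before, after):+7.2f}%")
```

a positive value means the run after the commit was slower. only
memcpy_10m gets slower (by roughly 3%). the table alone does not
show whether differences of this size exceed run-to-run noise.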


* Re: what is the application scenes of adding optimized Q-register for memcpy
From: Wilco Dijkstra @ 2020-07-25 10:02 UTC
  To: wangshuo (AF); +Cc: libc-alpha, Hushiyuan, Szabolcs Nagy

Hi Wangshuo,

As Szabolcs said, you didn't run the new GLIBC memcpy. The easiest way to
compare memcpy implementations in GLIBC is to use "make bench". After that
you can run the memcpy benchmarks directly, e.g.:

taskset -c 5 $BUILD_DIR/benchtests/bench-memcpy-random >out.txt

This produces results for the different memcpy implementations so you can see
which works best. I think the new memcpy_simd will work better on Kunpeng
than memcpy_falkor.

Cheers,
Wilco
