* PING: [PATCH 0/7] [BZ #19776] Improve x86-64 memcpy-sse2-unaligned.S
From: H.J. Lu @ 2016-03-18 11:46 UTC
To: GNU C Library; +Cc: Ondrej Bilka
Ping
On Mon, Mar 7, 2016 at 9:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> This patch set improves the x86-64 memcpy-sse2-unaligned.S by:
>
> 1. Removing dead code.
> 2. Setting RAX to the return value on entry instead of using it as a
>    scratch register.
> 3. Removing the unnecessary L(overlapping) code.
> 4. Adding entry points for __mempcpy_chk_sse2_unaligned,
>    __mempcpy_sse2_unaligned and __memcpy_chk_sse2_unaligned.
> 5. Enabling __mempcpy_chk_sse2_unaligned, __mempcpy_sse2_unaligned and
>    __memcpy_chk_sse2_unaligned.
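
For readers who do not follow the assembly, items 2 and 4 can be pictured
with a rough C sketch. This is only an illustration, not the code in the
patches: every sketch_* name is hypothetical, the byte loop stands in for
the SSE2 copy body, and abort() stands in for glibc's failure path. What it
relies on is that memcpy returns the destination pointer, mempcpy returns
the destination plus the length, and the _chk variants take the destination
buffer size as an extra argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the shared copy body: a plain byte loop.  */
static void sketch_copy_body(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* memcpy returns the destination pointer.  Fixing the return value before
   the copy mirrors "setting RAX to the return value" on entry.  */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    void *ret = dst;
    sketch_copy_body(dst, src, n);
    return ret;
}

/* mempcpy differs only in its return value: the byte just past the last
   byte written.  The copy body is shared with sketch_memcpy.  */
void *sketch_mempcpy(void *dst, const void *src, size_t n)
{
    void *ret = (unsigned char *) dst + n;
    sketch_copy_body(dst, src, n);
    return ret;
}

/* The _chk entry points check the length against the destination size
   first.  abort() is an assumption standing in for the real failure path.  */
void *sketch_memcpy_chk(void *dst, const void *src, size_t n, size_t dstlen)
{
    if (dstlen < n)
        abort();
    return sketch_memcpy(dst, src, n);
}

void *sketch_mempcpy_chk(void *dst, const void *src, size_t n, size_t dstlen)
{
    if (dstlen < n)
        abort();
    return sketch_mempcpy(dst, src, n);
}

/* Small self-check of the return-value conventions.  */
void sketch_demo(void)
{
    char src[8] = "abcdefg";
    char dst[8] = { 0 };

    assert(sketch_memcpy(dst, src, sizeof src) == (void *) dst);
    assert(memcmp(dst, src, sizeof src) == 0);
    assert(sketch_mempcpy_chk(dst, src, 4, sizeof dst) == (void *) (dst + 4));
}
```

Because the four functions differ only in a length check and in the value
returned, one copy body can serve several entry points, which is the shape
item 4 describes.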
>
> bench-mempcpy shows
>
> Ivy Bridge:
>
>                                simple_mempcpy  __mempcpy_avx_unaligned  __mempcpy_ssse3_back  __mempcpy_ssse3  __mempcpy_sse2_unaligned  __mempcpy_sse2
> Length  432, alignment 27/ 0:  1628.16         98.3906                  73.5625               94.7344          67.1719                   139.531
> Length  432, alignment  0/27:  1627.84         148.891                  80.625                98.8281          104.766                   142.625
> Length  432, alignment 27/27:  1631.03         90.5469                  70.0312               69.5938          76.5469                   123.969
> Length  448, alignment  0/ 0:  1685.95         79.1875                  65.1719               72.9062          70.1406                   116.688
> Length  448, alignment 28/ 0:  1685.84         89.4531                  73.0156               99.5938          86.3594                   138.203
> Length  448, alignment  0/28:  1684.52         148.016                  82.2812               94.8438          103.344                   147.578
> Length  448, alignment 28/28:  1684.42         86.4688                  65.4062               70.0469          72.4688                   123.422
> Length  464, alignment  0/ 0:  1740.77         70.1406                  66.2812               69.2656          71.25                     118.234
> Length  464, alignment 29/ 0:  1742.31         100.141                  75.875                98.9219          83.375                    145.594
> Length  464, alignment  0/29:  1742.31         148.016                  80.2969               107.766          102.031                   154.531
> Length  464, alignment 29/29:  1740.98         91.5469                  64.8438               72.4531          71.4688                   127.062
> Length  480, alignment  0/ 0:  1967.2          76.875                   66.625                71.1406          71.25                     123.641
> Length  480, alignment 30/ 0:  1799.02         94.3125                  72.9062               103.797          80.2969                   144.484
> Length  480, alignment  0/30:  1797.47         148.453                  82.6094               102.906          102.25                    158.062
> Length  480, alignment 30/30:  1799.02         90.8906                  68.0469               69.5938          71.3594                   124.844
> Length  496, alignment  0/ 0:  1853.83         71.25                    68.3906               71.9219          69.1406                   123.422
> Length  496, alignment 31/ 0:  1855.38         94.8438                  74.3438               104.672          73.2344                   148.125
> Length  496, alignment  0/31:  1853.59         148.906                  80.2969               109.297          114.703                   163.016
> Length  496, alignment 31/31:  1855.27         93.0781                  71.4688               72.3438          84.2656                   127.953
> Length 4096, alignment  0/ 0:  14559.7         509.891                  506.469               474.156          508.344                   591.062
>
> Nehalem:
>
>                                simple_mempcpy  __mempcpy_ssse3_back  __mempcpy_ssse3  __mempcpy_sse2_unaligned  __mempcpy_sse2
> Length  432, alignment 27/ 0:  113.25          50.9531               64.0312          39.1406                   77.6719
> Length  432, alignment  0/27:  130.688         45.7969               63.9844          89.1562                   133.078
> Length  432, alignment 27/27:  118.266         34.4531               36               40.9688                   70.7812
> Length  448, alignment  0/ 0:  98.2969         34.3594               37.5             39.5156                   56.2969
> Length  448, alignment 28/ 0:  115.641         51.7969               64.6406          44.6719                   77.2031
> Length  448, alignment  0/28:  143.297         46.7812               64.9688          88.1719                   137.25
> Length  448, alignment 28/28:  118.453         34.4531               36.7969          40.0312                   70.3125
> Length  464, alignment  0/ 0:  101.156         36.0938               37.125           39.4688                   63.6562
> Length  464, alignment 29/ 0:  118.594         52.6875               69.1406          43.6875                   79.9219
> Length  464, alignment  0/29:  133.922         46.4062               71.0156          88.5                      142.922
> Length  464, alignment 29/29:  126.047         36.1406               39.375           39.3281                   71.3438
> Length  480, alignment  0/ 0:  104.203         36.1406               38.2969          39.2812                   59.5312
> Length  480, alignment 30/ 0:  120.375         53.2969               69.8438          47.25                     80.5312
> Length  480, alignment  0/30:  150             47.0625               69.9844          87.4219                   148.125
> Length  480, alignment 30/30:  126.375         37.9219               37.6875          39.2812                   70.8281
> Length  496, alignment  0/ 0:  107.016         37.5                  39.0938          39.5156                   67.2656
> Length  496, alignment 31/ 0:  119.719         169.078               71.4375          45.6562                   79.4531
> Length  496, alignment  0/31:  139.641         47.25                 71.2969          101.953                   155.062
> Length  496, alignment 31/31:  123.844         39.8438               40.6406          45.75                     70.5469
> Length 4096, alignment  0/ 0:  749.203         245.859               249.609          253.172                   292.078
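
One way to read the tables is as relative timings of
__mempcpy_sse2_unaligned against __mempcpy_sse2. The C sketch below does
that arithmetic for four rows copied from the tables above. The struct and
function names are hypothetical, and since the tables do not state units,
only ratios are computed. A ratio below 1.0 means the sse2_unaligned column
holds the smaller number for that row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: one benchmark row, keeping two of its columns.  */
struct bench_row {
    const char *cpu;
    const char *label;
    double sse2_unaligned;   /* __mempcpy_sse2_unaligned column */
    double sse2;             /* __mempcpy_sse2 column */
};

/* Values copied from the Ivy Bridge and Nehalem tables.  */
static const struct bench_row rows[] = {
    { "Ivy Bridge", "Length  432, alignment 27/ 0", 67.1719, 139.531 },
    { "Ivy Bridge", "Length 4096, alignment  0/ 0", 508.344, 591.062 },
    { "Nehalem",    "Length  432, alignment 27/ 0", 39.1406, 77.6719 },
    { "Nehalem",    "Length 4096, alignment  0/ 0", 253.172, 292.078 },
};

/* Ratio of a candidate timing to a baseline timing.  */
double relative_time(double candidate, double baseline)
{
    assert(baseline > 0.0);
    return candidate / baseline;
}

/* Print one ratio per row.  */
void print_relative_times(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-10s %s: %.3f\n", rows[i].cpu, rows[i].label,
               relative_time(rows[i].sse2_unaligned, rows[i].sse2));
}
```

Calling print_relative_times() from a main function prints one line per
row. The same helper works for any other pair of columns.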
>
> H.J. Lu (7):
>   Remove dead code from memcpy-sse2-unaligned.S
>   Don't use RAX as scratch register
>   Remove L(overlapping) from memcpy-sse2-unaligned.S
>   Add entry points for __mempcpy_sse2_unaligned and _chk functions
>   Enable __mempcpy_sse2_unaligned
>   Enable __mempcpy_chk_sse2_unaligned
>   Enable __memcpy_chk_sse2_unaligned
>
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c       |   6 ++
>  sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S | 125 ++++++++---------------
>  sysdeps/x86_64/multiarch/memcpy_chk.S            |  23 +++--
>  sysdeps/x86_64/multiarch/mempcpy.S               |  19 ++--
>  sysdeps/x86_64/multiarch/mempcpy_chk.S           |  19 ++--
>  5 files changed, 85 insertions(+), 107 deletions(-)
>
> --
> 2.5.0
>
--
H.J.