public inbox for libc-alpha@sourceware.org
* PING: [PATCH 0/7] [BZ #19776] Improve x86-64 memcpy-sse2-unaligned.S
@ 2016-03-18 11:46 H.J. Lu
From: H.J. Lu @ 2016-03-18 11:46 UTC
  To: GNU C Library; +Cc: Ondrej Bilka

Ping

On Mon, Mar 7, 2016 at 9:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> This set of patches improves x86-64 memcpy-sse2-unaligned.S by
>
> 1. Removing dead code.
> 2. Setting RAX to the return value at function entry.
> 3. Removing unnecessary code (the L(overlapping) path).
> 4. Adding entry points for __mempcpy_chk_sse2_unaligned,
> __mempcpy_sse2_unaligned and __memcpy_chk_sse2_unaligned.
> 5. Enabling __mempcpy_chk_sse2_unaligned, __mempcpy_sse2_unaligned and
> __memcpy_chk_sse2_unaligned.
>
> bench-mempcpy shows the following timings (lower is better):
>
> Ivy Bridge:
>
>                               simple_mempcpy __mempcpy_avx_unaligned __mempcpy_ssse3_back __mempcpy_ssse3 __mempcpy_sse2_unaligned __mempcpy_sse2
> Length  432, alignment 27/ 0:   1628.16 98.3906 73.5625 94.7344 67.1719 139.531
> Length  432, alignment  0/27:   1627.84 148.891 80.625  98.8281 104.766 142.625
> Length  432, alignment 27/27:   1631.03 90.5469 70.0312 69.5938 76.5469 123.969
> Length  448, alignment  0/ 0:   1685.95 79.1875 65.1719 72.9062 70.1406 116.688
> Length  448, alignment 28/ 0:   1685.84 89.4531 73.0156 99.5938 86.3594 138.203
> Length  448, alignment  0/28:   1684.52 148.016 82.2812 94.8438 103.344 147.578
> Length  448, alignment 28/28:   1684.42 86.4688 65.4062 70.0469 72.4688 123.422
> Length  464, alignment  0/ 0:   1740.77 70.1406 66.2812 69.2656 71.25   118.234
> Length  464, alignment 29/ 0:   1742.31 100.141 75.875  98.9219 83.375  145.594
> Length  464, alignment  0/29:   1742.31 148.016 80.2969 107.766 102.031 154.531
> Length  464, alignment 29/29:   1740.98 91.5469 64.8438 72.4531 71.4688 127.062
> Length  480, alignment  0/ 0:   1967.2  76.875  66.625  71.1406 71.25   123.641
> Length  480, alignment 30/ 0:   1799.02 94.3125 72.9062 103.797 80.2969 144.484
> Length  480, alignment  0/30:   1797.47 148.453 82.6094 102.906 102.25  158.062
> Length  480, alignment 30/30:   1799.02 90.8906 68.0469 69.5938 71.3594 124.844
> Length  496, alignment  0/ 0:   1853.83 71.25   68.3906 71.9219 69.1406 123.422
> Length  496, alignment 31/ 0:   1855.38 94.8438 74.3438 104.672 73.2344 148.125
> Length  496, alignment  0/31:   1853.59 148.906 80.2969 109.297 114.703 163.016
> Length  496, alignment 31/31:   1855.27 93.0781 71.4688 72.3438 84.2656 127.953
> Length 4096, alignment  0/ 0:   14559.7 509.891 506.469 474.156 508.344 591.062
>
> Nehalem:
>
>                              simple_mempcpy __mempcpy_ssse3_back __mempcpy_ssse3 __mempcpy_sse2_unaligned __mempcpy_sse2
> Length  432, alignment 27/ 0:   113.25  50.9531 64.0312 39.1406 77.6719
> Length  432, alignment  0/27:   130.688 45.7969 63.9844 89.1562 133.078
> Length  432, alignment 27/27:   118.266 34.4531 36      40.9688 70.7812
> Length  448, alignment  0/ 0:   98.2969 34.3594 37.5    39.5156 56.2969
> Length  448, alignment 28/ 0:   115.641 51.7969 64.6406 44.6719 77.2031
> Length  448, alignment  0/28:   143.297 46.7812 64.9688 88.1719 137.25
> Length  448, alignment 28/28:   118.453 34.4531 36.7969 40.0312 70.3125
> Length  464, alignment  0/ 0:   101.156 36.0938 37.125  39.4688 63.6562
> Length  464, alignment 29/ 0:   118.594 52.6875 69.1406 43.6875 79.9219
> Length  464, alignment  0/29:   133.922 46.4062 71.0156 88.5    142.922
> Length  464, alignment 29/29:   126.047 36.1406 39.375  39.3281 71.3438
> Length  480, alignment  0/ 0:   104.203 36.1406 38.2969 39.2812 59.5312
> Length  480, alignment 30/ 0:   120.375 53.2969 69.8438 47.25   80.5312
> Length  480, alignment  0/30:   150     47.0625 69.9844 87.4219 148.125
> Length  480, alignment 30/30:   126.375 37.9219 37.6875 39.2812 70.8281
> Length  496, alignment  0/ 0:   107.016 37.5    39.0938 39.5156 67.2656
> Length  496, alignment 31/ 0:   119.719 169.078 71.4375 45.6562 79.4531
> Length  496, alignment  0/31:   139.641 47.25   71.2969 101.953 155.062
> Length  496, alignment 31/31:   123.844 39.8438 40.6406 45.75   70.5469
> Length 4096, alignment  0/ 0:   749.203 245.859 249.609 253.172 292.078
>
> H.J. Lu (7):
>   Remove dead code from memcpy-sse2-unaligned.S
>   Don't use RAX as scratch register
>   Remove L(overlapping) from memcpy-sse2-unaligned.S
>   Add entry points for __mempcpy_sse2_unaligned and _chk functions
>   Enable __mempcpy_sse2_unaligned
>   Enable __mempcpy_chk_sse2_unaligned
>   Enable __memcpy_chk_sse2_unaligned
>
>  sysdeps/x86_64/multiarch/ifunc-impl-list.c       |   6 ++
>  sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S | 125 ++++++++---------------
>  sysdeps/x86_64/multiarch/memcpy_chk.S            |  23 +++--
>  sysdeps/x86_64/multiarch/mempcpy.S               |  19 ++--
>  sysdeps/x86_64/multiarch/mempcpy_chk.S           |  19 ++--
>  5 files changed, 85 insertions(+), 107 deletions(-)
>
> --
> 2.5.0
>
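
Items 2, 4 and 5 of the cover letter can be modelled in C. This is a minimal sketch of the semantics only, not glibc's assembly, and it assumes the usual contracts:

- memcpy returns the destination.
- mempcpy returns a pointer one past the last byte written.
- The _chk variants fail when the destination object size is smaller than the length.

The names `sketch_*` and `copy_body` are hypothetical. `copy_body` stands in for the shared SSE2 copy loop, and `abort()` stands in for glibc's `__chk_fail`. Each entry point computes its return value first and then falls into one common body. In the x86-64 calling convention that return value lives in RAX.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared copy loop.  It receives the
   precomputed return value, mirroring an entry point that sets RAX
   before jumping into the common body.  Byte-by-byte for clarity.  */
static void *copy_body(void *ret, void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return ret;
}

/* memcpy semantics: return value is the destination.  */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    return copy_body(dst, dst, src, n);
}

/* mempcpy semantics: return value is one past the last byte written.  */
void *sketch_mempcpy(void *dst, const void *src, size_t n)
{
    return copy_body((unsigned char *)dst + n, dst, src, n);
}

/* _chk semantics: destlen is the known size of the destination object.
   glibc calls __chk_fail on overflow; abort() is an assumed stand-in.  */
void *sketch_memcpy_chk(void *dst, const void *src, size_t n, size_t destlen)
{
    if (destlen < n)
        abort();
    return sketch_memcpy(dst, src, n);
}

void *sketch_mempcpy_chk(void *dst, const void *src, size_t n, size_t destlen)
{
    if (destlen < n)
        abort();
    return sketch_mempcpy(dst, src, n);
}
```

The size check and the return-value setup happen once per entry point, so all four functions can share one copy loop. This is presumably why the extra entry points cost little code in memcpy-sse2-unaligned.S.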



-- 
H.J.
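
The benchmark tables can be read as ratios. Dividing the __mempcpy_sse2 timing by the __mempcpy_sse2_unaligned timing gives a value above 1 when the unaligned variant took less time. The sketch below computes that ratio for a few rows copied verbatim from the tables above. It assumes the reported values are timings where lower is better. The struct and function names are illustrative only.

```c
#include <stdio.h>

/* One bench-mempcpy row, reduced to the two columns compared here.  */
struct row {
    const char *cpu;
    const char *label;
    double sse2;            /* __mempcpy_sse2 column */
    double sse2_unaligned;  /* __mempcpy_sse2_unaligned column */
};

/* Values copied from the Ivy Bridge and Nehalem tables.  */
static const struct row rows[] = {
    { "Ivy Bridge", "Length 4096, alignment  0/ 0", 591.062, 508.344 },
    { "Nehalem",    "Length 4096, alignment  0/ 0", 292.078, 253.172 },
    { "Nehalem",    "Length  432, alignment  0/27", 133.078, 89.1562 },
};

/* Ratio > 1 means the candidate took less time than the baseline.  */
static double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}

/* Print the sse2 / sse2_unaligned ratio for each row.  */
static void report(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-10s %s: %.3f\n", rows[i].cpu, rows[i].label,
               speedup(rows[i].sse2, rows[i].sse2_unaligned));
}
```

Calling `report()` prints one ratio per row. On these rows the unaligned variant is roughly 15% to 50% faster than __mempcpy_sse2.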
