On Fri, Apr 1, 2016 at 12:38 PM, H.J. Lu <hongjiu.lu@intel.com> wrote:
> Tested on Haswell, Ivy Bridge, Westmere and Penryn. Also tested with
> --disable-multi-arch.  Any comments or feedback?
>
>
> H.J.
> ---
> Since the new SSE2/AVX2 memcpy/memmove are faster than the previous ones,
> we can remove the previous SSE2/AVX2 memcpy/memmove and replace them with
> the new ones.
>
> No change in IFUNC selection if SSE2 and AVX2 memcpy/memmove weren't used
> before.  If SSE2 or AVX2 memcpy/memmove were used, the new SSE2 or AVX2
> memcpy/memmove optimized with Enhanced REP MOVSB will be used for
> processors with ERMS.  The new AVX512 memcpy/memmove will be used for
> processors with AVX512 which prefer vzeroupper.
>
> Since the new SSE2 memcpy/memmove are faster than the previous default
> memcpy/memmove used in libc.a and ld.so, we also remove the previous
> default memcpy/memmove and make the new SSE2 ones the default.
>
> Together, these changes reduce the size of libc.so by about 6 KB and the
> size of ld.so by about 2 KB.
>
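To make the selection rules in the description above a bit more concrete, here is a rough C model of the decision. It is only a sketch under stated assumptions: struct cpu_props, its fields (in particular avx512_preferred, a hypothetical stand-in for the processor preference that gates the AVX512 variant) and the enum labels (loosely modeled on glibc's variant naming) are all illustrative, not glibc's actual cpu_features or IFUNC resolver code. The other pre-existing variants, whose selection the patch leaves unchanged, are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the CPU properties the selector consults.
   This is a simplified model, not glibc's cpu_features structure. */
struct cpu_props {
    bool avx512;           /* AVX512 usable */
    bool avx512_preferred; /* hypothetical stand-in for the processor
                              preference that gates the AVX512 variant */
    bool avx2;             /* the AVX2 variant would have been chosen before */
    bool erms;             /* Enhanced REP MOVSB supported */
};

/* Illustrative variant labels, loosely following glibc naming. */
enum memmove_variant {
    MEMMOVE_SSE2_UNALIGNED,
    MEMMOVE_SSE2_UNALIGNED_ERMS,
    MEMMOVE_AVX_UNALIGNED,
    MEMMOVE_AVX_UNALIGNED_ERMS,
    MEMMOVE_AVX512_UNALIGNED
};

/* Pick a variant following the patch description: AVX512 when the
   processor qualifies, otherwise the new AVX2 or SSE2 code, switching to
   the ERMS-optimized version when Enhanced REP MOVSB is available.
   Other pre-existing variants are not modeled in this sketch. */
static enum memmove_variant select_memmove(const struct cpu_props *p)
{
    assert(p != NULL);

    if (p->avx512 && p->avx512_preferred)
        return MEMMOVE_AVX512_UNALIGNED;
    if (p->avx2)
        return p->erms ? MEMMOVE_AVX_UNALIGNED_ERMS : MEMMOVE_AVX_UNALIGNED;
    /* SSE2 is baseline on x86-64, so in this sketch it is the fallback. */
    return p->erms ? MEMMOVE_SSE2_UNALIGNED_ERMS : MEMMOVE_SSE2_UNALIGNED;
}
```

For example, in this model a CPU with AVX2 and ERMS but no AVX512 maps to MEMMOVE_AVX_UNALIGNED_ERMS, while one without AVX2 or ERMS falls back to MEMMOVE_SSE2_UNALIGNED.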

Here is the updated patch against master.  The current memcpy
performance data is at

https://sourceware.org/bugzilla/attachment.cgi?id=9184

The current memmove performance data is at

https://sourceware.org/bugzilla/attachment.cgi?id=9185

Any comments, feedback or objections?


-- 
H.J.