Now that the IFUNC infrastructure for aarch64 is in place, here is a patch
that uses it to create ThunderX-specific versions of memcpy and memmove.
This was part of my original patch before it was split in two, and a couple
of issues were raised at that time.

Siddhesh Poyarekar wanted to separate the generic and ThunderX copies of
memcpy/memmove instead of using ifdefs in a combined source file.  I prefer
the ifdef version as a cleaner implementation with less code duplication,
but I can change it if that is the consensus.

Also, Adhemerval Zanella did some benchmarking that showed the prefetching
done in the ThunderX version might be appropriate for the generic version.
However, if you look at the prefetching, we only do it every other time
through the loop.  This is because the loop copies 64 bytes per iteration
and the ThunderX cache line size is 128 bytes, so one prefetch covers two
iterations.  If other aarch64 chips have a 64-byte cache line, they might
want a different prefetching setup.

If people think we should use the ThunderX version of memcpy for all
aarch64 systems, I am happy to drop this patch and create one that just
changes memcpy.S to do the ThunderX-style prefetches for all aarch64
systems.

Steve Ellcey
sellcey@cavium.com


2017-03-24  Steve Ellcey  <sellcey@cavium.com>

	* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
	(memmove): Use MEMMOVE for name.
	(memcpy): Use MEMCPY for name.  Add loop with prefetching under
	USE_THUNDERX macro.
	* sysdeps/aarch64/multiarch/Makefile: New file.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
	* sysdeps/aarch64/multiarch/init-arch.h: Likewise.
	* sysdeps/aarch64/multiarch/memcpy.c: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
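
For readers who want to see the prefetch cadence from the message in concrete form, here is a rough C sketch.  It is not the code in the patch (that lives in memcpy.S as assembly); it only illustrates why a 64-byte copy loop on a 128-byte cache line needs one prefetch hint per two iterations.  The names copy_with_prefetch, prefetch_count and PREFETCH_AHEAD, and the 512-byte prefetch distance, are assumptions made up for this illustration.  __builtin_prefetch is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES      64   /* bytes copied per loop iteration (from the message) */
#define CACHE_LINE_BYTES 128  /* ThunderX cache line size (from the message) */
#define PREFETCH_AHEAD   512  /* hypothetical prefetch distance, illustration only */

/* Number of prefetch hints issued by the last call; illustration only.  */
static size_t prefetch_count;

/* Hypothetical helper: copy N bytes from SRC to DST (non-overlapping),
   issuing a prefetch hint once per cache line instead of once per block.  */
static void *
copy_with_prefetch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t iter = 0;

  assert (n == 0 || (dst != NULL && src != NULL));
  prefetch_count = 0;

  while (n >= BLOCK_BYTES)
    {
      /* One cache line spans CACHE_LINE_BYTES / BLOCK_BYTES = 2 iterations,
         so hinting on every second iteration touches each line once.
         Only hint while the target address is still inside the source.  */
      if (iter % (CACHE_LINE_BYTES / BLOCK_BYTES) == 0 && n > PREFETCH_AHEAD)
        {
          __builtin_prefetch (s + PREFETCH_AHEAD, 0, 3);
          prefetch_count++;
        }

      memcpy (d, s, BLOCK_BYTES);
      d += BLOCK_BYTES;
      s += BLOCK_BYTES;
      n -= BLOCK_BYTES;
      iter++;
    }

  /* Copy the remaining tail of fewer than BLOCK_BYTES bytes.  */
  memcpy (d, s, n);
  return dst;
}
```

On a chip with a 64-byte cache line, changing CACHE_LINE_BYTES to 64 in this sketch makes the modulus 1, so a hint is issued on every iteration.  That is the "different prefetching setup" the message refers to.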