On Tue, Mar 29, 2022 at 10:57 AM Noah Goldstein wrote:
> On Mon, Mar 28, 2022 at 9:51 PM Mayshao-oc wrote:
> >
> > On Mon, Mar 28, 2022 at 9:07 PM H.J. Lu wrote:
> > >
> > > On Mon, Mar 28, 2022 at 1:10 AM Mayshao-oc wrote:
> > > >
> > > > On Fri, Mar 25, 2022 at 6:36 PM Noah Goldstein wrote:
> > > > >
> > > > > With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
> > > > > SSSE3. As a result it's no longer worth the code size cost.
> > > > > ---
> > > > >  sysdeps/x86_64/multiarch/Makefile          |    2 -
> > > > >  sysdeps/x86_64/multiarch/ifunc-impl-list.c |   15 -
> > > > >  sysdeps/x86_64/multiarch/ifunc-memmove.h   |   18 +-
> > > > >  sysdeps/x86_64/multiarch/memcpy-ssse3.S    | 3151 --------------------
> > > > >  sysdeps/x86_64/multiarch/memmove-ssse3.S   |    4 -
> > > > >  5 files changed, 7 insertions(+), 3183 deletions(-)
> > > > >  delete mode 100644 sysdeps/x86_64/multiarch/memcpy-ssse3.S
> > > > >  delete mode 100644 sysdeps/x86_64/multiarch/memmove-ssse3.S
> > > >
> > > > On some platforms, such as Zhaoxin, the memcpy performance of SSSE3
> > > > is better than that of AVX2, and the current computer system has sufficient
> > > > disk capacity and memory capacity.
> > >
> > > How does the SSSE3 version compare against the SSE2 version?
> >
> > On some Zhaoxin processors, the overall performance of SSSE3 is about
> > 10% higher than that of SSE2.
> >
> >
> > Best Regards,
> > May Shao
>
> Any chance you can post the result from running `bench-memset` or some
> equivalent benchmark? Curious where the regressions are. Ideally we would
> fix the SSE2 version so it's optimal.

Bench-memcpy on the Zhaoxin KX-6000 processor shows that, when
length <= 4 or length >= 128, the SSSE3 memcpy achieves an average
performance improvement of 25% compared to SSE2.
I have attached the test results; I hope this is what you want to see.

> > > > It is strongly recommended to keep the SSSE3 version.
> > >
> > >
> > > --
> > > H.J.