We have changed strcpy to include both the strcpy and stpcpy
implementations, using USE_AS_STPCPY to distinguish the two functions:
the stpcpy source defines the related macros and includes the strcpy
source code.

See patch v2:
https://sourceware.org/pipermail/libc-alpha/2023-September/151531.html

On 2023-09-11 17:53, dengjianbo wrote:
> Tested strcpy-lasx against strcpy (calling stpcpy-lasx); the
> difference between the two timings is 0.28, with strcpy-lasx taking
> less time. When the data length is less than 32, it could reduce the
> runtime by more than 30%.
>
> See:
> https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_lasx_compare_generic_strcpy.out
>
> There is some code in strcpy duplicated from stpcpy, since the main
> part is almost the same. Maybe we can use one source file with a
> USE_AS_STPCPY macro to distinguish strcpy and stpcpy, like x86_64?
> That could avoid the performance degradation.
>
> On 2023-09-08 22:22, Xi Ruoyao wrote:
>> On Fri, 2023-09-08 at 17:33 +0800, dengjianbo wrote:
>>> According to the glibc strcpy microbenchmark results (changed to use
>>> generic_strcpy instead of strlen + memcpy), compared with
>>> generic_strcpy, this implementation reduces the runtime as follows:
>>>
>>> Name              Percent of runtime reduced
>>> strcpy-aligned    10%-45%
>>> strcpy-unaligned  10%-49%; compared with the aligned version, the
>>>                   unaligned version performs better when src and
>>>                   dest cannot both be aligned to 8 bytes
>>> strcpy-lsx        20%-80%
>>> strcpy-lasx       15%-86%
>> Generic strcpy calls stpcpy, so if we've optimized stpcpy maybe it's not
>> necessary to duplicate everything in strcpy. Is there a benchmark
>> result comparing the timing with and without this patch, but both with
>> the second patch (optimized stpcpy)?
>>
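For illustration only, here is a minimal C sketch of the single-source idea, under stated assumptions. The implementations discussed in this thread are written in assembly, and in glibc the stpcpy file would define USE_AS_STPCPY and then include the strcpy source. To keep the example in one compilable file, the hypothetical function-like macro DEFINE_COPY stands in for the shared source file, and its AS_STPCPY argument stands in for USE_AS_STPCPY. The names sketch_strcpy, sketch_stpcpy and sketch_strcpy_via_stpcpy are also hypothetical. The last function shows the alternative raised in the thread, where strcpy is a thin wrapper around stpcpy.

```c
/* Hypothetical stand-in for one shared strcpy/stpcpy source file.
   AS_STPCPY plays the role of USE_AS_STPCPY:
     0 -> strcpy semantics: return dest
     1 -> stpcpy semantics: return a pointer to the NUL written in dest  */
#define DEFINE_COPY(NAME, AS_STPCPY)                                  \
  char *NAME (char *dest, const char *src)                            \
  {                                                                   \
    char *d = dest;                                                   \
    /* Copy bytes up to and including the terminating NUL.  */        \
    while ((*d = *src) != '\0')                                       \
      {                                                               \
        d++;                                                          \
        src++;                                                        \
      }                                                               \
    /* AS_STPCPY is a compile-time constant, so an optimizing         \
       compiler folds this choice away.  */                           \
    return (AS_STPCPY) ? d : dest;                                    \
  }

/* One body, two entry points (like building strcpy.S twice,
   once with USE_AS_STPCPY defined).  */
DEFINE_COPY (sketch_strcpy, 0)
DEFINE_COPY (sketch_stpcpy, 1)

/* The alternative from the thread: strcpy implemented by calling
   stpcpy and discarding its result, which costs one extra call.  */
char *
sketch_strcpy_via_stpcpy (char *dest, const char *src)
{
  sketch_stpcpy (dest, src);
  return dest;
}
```

With both entry points expanded from the same body, strcpy gets its own code path with no extra call, while the source is still written once. The wrapper variant keeps only one copy loop but adds a call on every strcpy. That overhead is proportionally largest for short strings, which is consistent with the thread's observation that the gap exceeds 30% for lengths under 32.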