* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Alejandro Colomar @ 2022-12-12 13:44 UTC
To: Martin Liška, gcc, GNU C Library

Hi Martin,

On 12/12/22 14:37, Martin Liška wrote:
> On 12/9/22 18:11, Alejandro Colomar via Gcc wrote:
>> I expect the compiler to be knowledgeable enough to call whatever is fastest, whatever it is, but be consistent in both cases. However, here are the results:
>
> Hi.
>
> Note the glibc implementation of mempcpy typically uses (calls) memcpy, thus

Thanks for the info. I CCed glibc now, and copied my original email below for
completeness.

> I don't see any problem with the code snippets you provided.

Well, then the optimization may be the other way around (although I question
why it is implemented that way, and not the other way around, but I'm not a
hardware or libc guy, so there may be reasons).

If calling memcpy(3) is better, then the code calling mempcpy(3) could be
expanded inline to call it (but I doubt it).

If calling mempcpy(3) is better, then the hand-made pattern resembling
mempcpy(3) should probably be merged as a call to mempcpy(3).

But acting different on equivalent calls to both of them seems inconsistent to
me, unless you trust the programmer to know better how to optimize, that is...

Cheers,

Alex

-------- Forwarded Message --------
Subject: Missing optimization: mempcpy(3) vs memcpy(3)
Date: Fri, 9 Dec 2022 18:11:17 +0100
From: Alejandro Colomar <alx.manpages@gmail.com>
To: gcc@gcc.gnu.org

Hi!
I expect mempcpy(3) to be at least as fast as memcpy(3), since it performs the
same operations, with the exception that mempcpy(3) returns something useful (as
opposed to memcpy(3), which could perfectly return void), and in fact something
more likely to be in cache, if the copy is performed upwards.

The following two files are alternative implementations of a function, each one
written in terms of one of memcpy(3) and mempcpy(3):

$ cat usts2stp1.c
#include <string.h>

struct ustr_s {
        size_t  len;
        char    *ustr;
};

char *
usts2stp(char *restrict dst, const struct ustr_s *restrict src)
{
        memcpy(dst, src->ustr, src->len);
        dst[src->len] = '\0';

        return dst + src->len;
}

$ cat usts2stp3.c
#define _GNU_SOURCE
#include <string.h>

struct ustr_s {
        size_t  len;
        char    *ustr;
};

char *
usts2stp(char *restrict dst, const struct ustr_s *restrict src)
{
        char *end;

        end = mempcpy(dst, src->ustr, src->len);
        *end = '\0';

        return end;
}

I expect the compiler to be knowledgeable enough to call whatever is fastest,
whatever it is, but be consistent in both cases. However, here are the results:

$ cc -Wall -Wextra -O3 -S usts2stp*.c
$ diff -u usts2stp[13].s
--- usts2stp1.s 2022-12-09 18:06:11.708367061 +0100
+++ usts2stp3.s 2022-12-09 18:06:11.740366451 +0100
@@ -1,4 +1,4 @@
-       .file   "usts2stp1.c"
+       .file   "usts2stp3.c"
        .text
        .p2align 4
        .globl  usts2stp
@@ -6,16 +6,13 @@
 usts2stp:
 .LFB0:
        .cfi_startproc
-       pushq   %rbx
+       subq    $8, %rsp
        .cfi_def_cfa_offset 16
-       .cfi_offset 3, -16
-       movq    (%rsi), %rbx
+       movq    (%rsi), %rdx
        movq    8(%rsi), %rsi
-       movq    %rbx, %rdx
-       call    memcpy@PLT
-       leaq    (%rax,%rbx), %rax
+       call    mempcpy@PLT
        movb    $0, (%rax)
-       popq    %rbx
+       addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

The code with memcpy(3) seems to be worse (assuming both calls to be
equivalent). Shouldn't GCC produce the same code for both implementations?

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

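For readers who want to confirm that the two variants from the forwarded message really are interchangeable at the source level, here is a minimal sketch that places both in one translation unit. The function bodies follow the message; the suffixed names usts2stp_memcpy and usts2stp_mempcpy are introduced here only so both can coexist, and the sketch assumes a libc that declares mempcpy(3) under _GNU_SOURCE (glibc does).

```c
#define _GNU_SOURCE             /* mempcpy(3) is a GNU extension */
#include <stddef.h>
#include <string.h>

struct ustr_s {
    size_t  len;
    char    *ustr;
};

/* memcpy(3)-based variant, as in usts2stp1.c (name suffixed for this sketch). */
char *usts2stp_memcpy(char *restrict dst, const struct ustr_s *restrict src)
{
    memcpy(dst, src->ustr, src->len);
    dst[src->len] = '\0';

    return dst + src->len;
}

/* mempcpy(3)-based variant, as in usts2stp3.c (name suffixed for this sketch). */
char *usts2stp_mempcpy(char *restrict dst, const struct ustr_s *restrict src)
{
    char *end;

    end = mempcpy(dst, src->ustr, src->len);
    *end = '\0';

    return end;
}
```

Both functions write src->len bytes followed by a terminating NUL and return a pointer to that NUL, so any difference in the generated assembly is a code-generation choice rather than a semantic one.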
* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Jakub Jelinek @ 2022-12-12 13:56 UTC
To: Alejandro Colomar; +Cc: Martin Liška, gcc, GNU C Library

On Mon, Dec 12, 2022 at 02:44:04PM +0100, Alejandro Colomar via Gcc wrote:
> > I don't see any problem with the code snippets you provided.
>
> Well, then the optimization may be the other way around (although I question
> why it is implemented that way, and not the other way around, but I'm not a
> hardware or libc guy, so there may be reasons).
>
> If calling memcpy(3) is better, then the code calling mempcpy(3) could be
> expanded inline to call it (but I doubt it).
>
> If calling mempcpy(3) is better, then the hand-made pattern resembling
> mempcpy(3) should probably be merged as a call to mempcpy(3).
>
> But acting different on equivalent calls to both of them seems inconsistent
> to me, unless you trust the programmer to know better how to optimize, that
> is...

I think that is the case, plus the question if one can use a non-standard
function to implement a standard function (and if it would be triggered
by seeing an expected prototype for the non-standard function).

Otherwise, whether mempcpy in libc is implemented as memcpy + tweak return
value or has its own implementation is something that is heavily dependent
on the target and changes over time, so hardcoding that in gcc is
problematic. For -Os mempcpy call might be very well smaller even if the
library side is then slower.

        Jakub

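The "memcpy + tweak return value" shape mentioned above can be sketched in portable C as follows. This is only an illustration of the idea, not glibc's actual source; sketch_mempcpy is a hypothetical name chosen to avoid clashing with the library's own symbol.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration of "memcpy + tweak return value".
 * memcpy(3) returns dst, so adding n yields the byte just past the copy,
 * which is what mempcpy(3) is documented to return.
 */
void *sketch_mempcpy(void *restrict dst, const void *restrict src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}
```

Whether a given libc takes this route or ships a dedicated routine is, as the message says, target-dependent and subject to change.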
* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Alejandro Colomar @ 2022-12-12 14:05 UTC
To: Jakub Jelinek; +Cc: Martin Liška, gcc, GNU C Library

Hi Jakub,

On 12/12/22 14:56, Jakub Jelinek wrote:
> On Mon, Dec 12, 2022 at 02:44:04PM +0100, Alejandro Colomar via Gcc wrote:
>>> I don't see any problem with the code snippets you provided.
>>
>> Well, then the optimization may be the other way around (although I question
>> why it is implemented that way, and not the other way around, but I'm not a
>> hardware or libc guy, so there may be reasons).
>>
>> If calling memcpy(3) is better, then the code calling mempcpy(3) could be
>> expanded inline to call it (but I doubt it).
>>
>> If calling mempcpy(3) is better, then the hand-made pattern resembling
>> mempcpy(3) should probably be merged as a call to mempcpy(3).
>>
>> But acting different on equivalent calls to both of them seems inconsistent
>> to me, unless you trust the programmer to know better how to optimize, that
>> is...
>
> I think that is the case, plus the question if one can use a non-standard
> function to implement a standard function (and if it would be triggered
> by seeing an expected prototype for the non-standard function).

I guess implementing a standard function by calling a non-standard one is fine.
The implementation is free to do what it pleases, as long as it provides the
expected interface.

>
> Otherwise, whether mempcpy in libc is implemented as memcpy + tweak return
> value or has its own implementation is something that is heavily dependent
> on the target and changes over time, so hardcoding that in gcc is
> problematic.

Might be, although I'm guessing that if GCC collapses mempcpy(3)-like hand-made
patterns to mempcpy(3), the worst that can happen is that glibc undoes that; not
a horrible crime. In the best case, it saves a function call, or a few
assignments.

> For -Os mempcpy call might be very well smaller even if the
> library side is then slower.

Heh, you might be surprised with the following. Remember that the file ending
in 1 is a hand-made pattern around memcpy(3), while the file ending in 3 calls
mempcpy(3) directly; yet GCC emits more code for mempcpy(3). I don't see any
reason for this.

Cheers,

Alex

---

$ diff -u usts2stp[13].s
--- usts2stp1.s 2022-12-12 15:00:34.775119720 +0100
+++ usts2stp3.s 2022-12-12 15:00:34.807119072 +0100
@@ -1,12 +1,13 @@
-       .file   "usts2stp1.c"
+       .file   "usts2stp3.c"
        .text
        .globl  usts2stp
        .type   usts2stp, @function
 usts2stp:
 .LFB0:
        .cfi_startproc
-       movq    (%rsi), %rcx
+       movq    %rsi, %rax
        movq    8(%rsi), %rsi
+       movq    (%rax), %rcx
        rep movsb
        movb    $0, (%rdi)
        movq    %rdi, %rax

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Jonathan Wakely @ 2022-12-12 14:48 UTC
To: Alejandro Colomar; +Cc: Jakub Jelinek, Martin Liška, gcc, GNU C Library

On Mon, 12 Dec 2022 at 14:09, Alejandro Colomar wrote:
> On 12/12/22 14:56, Jakub Jelinek wrote:
> > I think that is the case, plus the question if one can use a non-standard
> > function to implement a standard function (and if it would be triggered
> > by seeing an expected prototype for the non-standard function).
>
> I guess implementing a standard function by calling a non-standard one is fine.
> The implementation is free to do what it pleases, as long as it provides the
> expected interface.

Even if the program provides a function called mempcpy?

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Jakub Jelinek @ 2022-12-12 14:53 UTC
To: Jonathan Wakely; +Cc: Alejandro Colomar, Martin Liška, gcc, GNU C Library

On Mon, Dec 12, 2022 at 02:48:35PM +0000, Jonathan Wakely wrote:
> On Mon, 12 Dec 2022 at 14:09, Alejandro Colomar wrote:
> > On 12/12/22 14:56, Jakub Jelinek wrote:
>
> > > I think that is the case, plus the question if one can use a non-standard
> > > function to implement a standard function (and if it would be triggered
> > > by seeing an expected prototype for the non-standard function).
> >
> > I guess implementing a standard function by calling a non-standard one is fine.
> > The implementation is free to do what it pleases, as long as it provides the
> > expected interface.
>
> Even if the program provides a function called mempcpy?

And even does something completely different...

        Jakub

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Alejandro Colomar @ 2022-12-12 15:56 UTC
To: Jakub Jelinek, Jonathan Wakely; +Cc: Martin Liška, gcc, GNU C Library

Hi Jonathan and Jakub,

On 12/12/22 15:53, Jakub Jelinek wrote:
> On Mon, Dec 12, 2022 at 02:48:35PM +0000, Jonathan Wakely wrote:
>> On Mon, 12 Dec 2022 at 14:09, Alejandro Colomar wrote:
>>> On 12/12/22 14:56, Jakub Jelinek wrote:
>>
>>>> I think that is the case, plus the question if one can use a non-standard
>>>> function to implement a standard function (and if it would be triggered
>>>> by seeing an expected prototype for the non-standard function).
>>>
>>> I guess implementing a standard function by calling a non-standard one is fine.
>>> The implementation is free to do what it pleases, as long as it provides the
>>> expected interface.
>>
>> Even if the program provides a function called mempcpy?

Yes. Quoting the glibc manual:

"The names of all library types, macros, variables and functions that come
from the ISO C standard are reserved unconditionally; your program may not
redefine these names."

And in case someone didn't know that mempcpy(3) was present in glibc, and could
try to argue that it's unnice of glibc to pretend to reserve a name not
specified by ISO C, the following applies:

"Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
are reserved for additional string and array functions. See String and Array
Utilities."

<https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html>

ISO C23 will relax that reserve a little bit, but functions defined by libc are
always reserved, no matter what.

>
> And even does something completely different...

So, redefining mempcpy(3) is UB. What happens then, only nasal demons know.

Cheers,

Alex

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Jakub Jelinek @ 2022-12-12 16:09 UTC
To: Alejandro Colomar; +Cc: Jonathan Wakely, Martin Liška, gcc, GNU C Library

On Mon, Dec 12, 2022 at 04:56:27PM +0100, Alejandro Colomar wrote:
> "Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
> are reserved for additional string and array functions. See String and Array
> Utilities."

It is not that simple.
mem*, str* and wcs* are just potentially reserved identifiers, they are only
reserved if the implementation provided them. And what we discuss here
is how to reliably find out if it was an implementation that provided them,
because in case of gcc the implementation is GCC and the C library and
perhaps some other libraries too.
gcc can be used with lots of different C libraries, and many don't implement
mempcpy.

        Jakub

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Alejandro Colomar @ 2022-12-12 17:15 UTC
To: Jakub Jelinek; +Cc: Jonathan Wakely, Martin Liška, gcc, GNU C Library

Hi Jakub,

On 12/12/22 17:09, Jakub Jelinek wrote:
> On Mon, Dec 12, 2022 at 04:56:27PM +0100, Alejandro Colomar wrote:
>> "Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
>> are reserved for additional string and array functions. See String and Array
>> Utilities."
>
> It is not that simple.
> mem*, str* and wcs* are just potentially reserved identifiers, they are only
> reserved if the implementation provided them.

To clarify: While ISO C up to C17 had them fully reserved, ISO C23 will make
them potentially reserved identifiers. POSIX further fully reserves them again
(maybe next POSIX aligns with C23 on that; I don't know).

> And what we discuss here
> is how to reliably find out if it was an implementation that provided them,
> because in case of gcc the implementation is GCC and the C library and
> perhaps some other libraries too.
> gcc can be used with lots of different C libraries, and many don't implement
> mempcpy.

Well, if GCC can't know what the implementation provides, then we're in big
trouble. Me, being just a user-space programmer, only know of _GNU_SOURCE for
determining if the function is available at compile-time. :)

Any of the POSIX or ISO C feature_test_macro(7)s prior to C23 should also be
enough to tell the compiler that mem* identifiers are reserved, and therefore
possibly provided by libc.

mempcpy(3)                 Library Functions Manual                 mempcpy(3)

NAME
       mempcpy, wmempcpy - copy memory area

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <string.h>

       void *mempcpy(void dest[restrict .n], const void src[restrict .n],
                     size_t n);

       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <wchar.h>

       wchar_t *wmempcpy(wchar_t dest[restrict .n],
                     const wchar_t src[restrict .n],
                     size_t n);

Cheers,

Alex

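From the user-space side, the availability question raised above is often handled with a small wrapper. The following is a rough sketch under stated assumptions: it treats the __GLIBC__ macro (defined by glibc's headers) together with _GNU_SOURCE as evidence that mempcpy(3) is declared, and otherwise falls back to memcpy(3) plus pointer arithmetic. HAVE_MEMPCPY and xmempcpy are hypothetical names; a real project might set such a macro from a build-system check instead.

```c
#define _GNU_SOURCE             /* ask glibc to declare mempcpy(3) */
#include <stddef.h>
#include <string.h>

/*
 * Assumption for this sketch: __GLIBC__ being defined after including a
 * libc header means mempcpy(3) is available.  HAVE_MEMPCPY and xmempcpy()
 * are hypothetical names, not part of any library.
 */
#if defined(__GLIBC__) && !defined(HAVE_MEMPCPY)
#define HAVE_MEMPCPY 1
#endif

void *xmempcpy(void *restrict dst, const void *restrict src, size_t n)
{
#if defined(HAVE_MEMPCPY)
    return mempcpy(dst, src, n);
#else
    /* Fallback for C libraries without mempcpy(3). */
    return (char *)memcpy(dst, src, n) + n;
#endif
}
```

This only addresses what the programmer can know at compile time; it does not settle the compiler-side question of whether GCC itself may assume the function exists.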
* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Jonathan Wakely @ 2022-12-12 17:42 UTC
To: Jakub Jelinek; +Cc: Alejandro Colomar, Martin Liška, gcc, GNU C Library

On Mon, 12 Dec 2022 at 16:10, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Dec 12, 2022 at 04:56:27PM +0100, Alejandro Colomar wrote:
> > "Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
> > are reserved for additional string and array functions. See String and Array
> > Utilities."
>
> It is not that simple.
> mem*, str* and wcs* are just potentially reserved identifiers, they are only
> reserved if the implementation provided them.

And only if the program includes <string.h>.

> And what we discuss here
> is how to reliably find out if it was an implementation that provided them,
> because in case of gcc the implementation is GCC and the C library and
> perhaps some other libraries too.
> gcc can be used with lots of different C libraries, and many don't implement
> mempcpy.
>
>         Jakub
>

* Missing optimization: mempcpy(3) vs memcpy(3)
From: Wilco Dijkstra @ 2022-12-12 14:34 UTC
To: Alejandro Colomar (man-pages); +Cc: 'GNU C Library', gcc

Hi,

I don't believe there is a missing optimization here: compilers expand mempcpy
by default into memcpy since that is the standard library call. That means even
if your source code contains mempcpy, there will never be any calls to mempcpy.

The reason is obvious: most targets support optimized memcpy in the C library
while very few optimize mempcpy. The same is true for bzero, bcmp and bcopy.

Targets can do it differently, IIRC x86 is the only target that emits calls both to
memcpy and mempcpy.

Cheers,
Wilco

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
From: Cristian Rodríguez @ 2022-12-12 14:57 UTC
To: Wilco Dijkstra; +Cc: Alejandro Colomar (man-pages), GNU C Library, gcc

On Mon, Dec 12, 2022 at 11:35 AM Wilco Dijkstra via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Hi,
>
> I don't believe there is a missing optimization here: compilers expand mempcpy
> by default into memcpy since that is the standard library call. That means even
> if your source code contains mempcpy, there will never be any calls to mempcpy.
>
> The reason is obvious: most targets support optimized memcpy in the C library
> while very few optimize mempcpy. The same is true for bzero, bcmp and bcopy.
>
> Targets can do it differently, IIRC x86 is the only target that emits calls both to
> memcpy and mempcpy.

Yeah, x86_64 at least uses both, but I think open-coded mempcpy needs to
be transformed into a library call anyway.

Other optimizations that are actually missing are:

- the cases where snprintf with %s could become memccpy
- open-coded memccpy could also be turned into library calls.

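To make the snprintf-to-memccpy idea concrete, here is a minimal sketch of a truncating string copy written both ways. The helper names copy_snprintf and copy_memccpy are hypothetical, and the sketch assumes a libc that declares memccpy(3) under _XOPEN_SOURCE 700 (it is an XSI function in POSIX.1-2008). It is not a description of any transformation GCC performs today.

```c
#define _XOPEN_SOURCE 700       /* memccpy(3) is an XSI function */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: truncating copy via snprintf(3).
 * Returns the length the full string would have had, excluding the NUL.
 */
int copy_snprintf(char *dst, size_t size, const char *src)
{
    return snprintf(dst, size, "%s", src);
}

/*
 * Hypothetical helper: the same copy via memccpy(3).
 * Returns the number of bytes actually stored, excluding the NUL.
 */
size_t copy_memccpy(char *dst, size_t size, const char *src)
{
    char *end;

    if (size == 0)
        return 0;

    /* Copies up to and including the first NUL, or size bytes if none. */
    end = memccpy(dst, src, '\0', size);
    if (end == NULL) {
        /* No NUL within size bytes: terminate by hand (truncation). */
        dst[size - 1] = '\0';
        return size - 1;
    }

    return (size_t)(end - dst) - 1;     /* end points just past the NUL */
}
```

The two helpers leave identical bytes in dst, but their return values differ on truncation: snprintf(3) reports the untruncated length, while this memccpy(3) wrapper reports what was stored. Any automatic rewrite would only be valid where the snprintf(3) return value is unused or can be recovered.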