public inbox for gcc@gcc.gnu.org
* Missing optimization: mempcpy(3) vs memcpy(3)
@ 2022-12-12 14:34 Wilco Dijkstra
  2022-12-12 14:57 ` Cristian Rodríguez
From: Wilco Dijkstra @ 2022-12-12 14:34 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: 'GNU C Library', gcc

Hi,

I don't believe there is a missing optimization here: compilers expand mempcpy
by default into memcpy since that is the standard library call. That means even
if your source code contains mempcpy, there will never be any calls to mempcpy.
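
As a rough illustration of the expansion described above (a sketch of the semantic equivalence, not GCC's actual internal transformation), mempcpy can be written purely in terms of memcpy. The name sketch_mempcpy is hypothetical and used only for this example:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mempcpy(3), written only in terms of memcpy(3).
 * Copies n bytes from src to dst and returns a pointer one past the last
 * byte written, i.e. memcpy's return value advanced by n.
 */
static void *
sketch_mempcpy(void *restrict dst, const void *restrict src, size_t n)
{
	return (char *) memcpy(dst, src, n) + n;
}
```

Under this reading, the only extra work compared with a plain memcpy call is the pointer addition after the call returns.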

The reason is obvious: most targets support optimized memcpy in the C library
while very few optimize mempcpy. The same is true for bzero, bcmp and bcopy.
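
For the legacy functions mentioned above, the ISO C counterparts are presumably memset, memcmp and memmove. A minimal sketch of those mappings follows; the sketch_* names are hypothetical, and the note about argument order is the main thing to watch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrappers; the names are illustrative only. */

/* bzero(s, n): zero n bytes, expressed with memset. */
static void
sketch_bzero(void *s, size_t n)
{
	memset(s, 0, n);
}

/* bcmp(a, b, n): zero if equal, nonzero otherwise; memcmp satisfies that. */
static int
sketch_bcmp(const void *a, const void *b, size_t n)
{
	return memcmp(a, b, n);
}

/* bcopy(src, dst, n): note the swapped argument order; overlap is
 * allowed, so memmove (not memcpy) is the matching call. */
static void
sketch_bcopy(const void *src, void *dst, size_t n)
{
	memmove(dst, src, n);
}
```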

Targets can do it differently; IIRC x86 is the only target that emits calls to both
memcpy and mempcpy.

Cheers,
Wilco

* Missing optimization: mempcpy(3) vs memcpy(3)
@ 2022-12-09 17:11 Alejandro Colomar
  2022-12-12 13:37 ` Martin Liška
From: Alejandro Colomar @ 2022-12-09 17:11 UTC (permalink / raw)
  To: gcc



Hi!

I expect mempcpy(3) to be at least as fast as memcpy(3), since it performs the
same operations, with the exception that mempcpy(3) returns something useful (as
opposed to memcpy(3), which could perfectly well return void), and in fact something
more likely to be in cache, if the copy is performed upwards.
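
To illustrate why a pointer to the end of the copy is the more useful return value, here is a small sketch that chains several copies. It uses a hypothetical helper, append_bytes, with mempcpy-like semantics built on memcpy, so it does not depend on the GNU extension; join3 is likewise a made-up name for this example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with mempcpy-like semantics: copy n bytes and
 * return a pointer just past the last byte written. */
static char *
append_bytes(char *restrict dst, const char *restrict src, size_t n)
{
	memcpy(dst, src, n);
	return dst + n;
}

/* Concatenate three strings into dst, which the caller must size to hold
 * all three plus the terminating NUL. Returns a pointer to that NUL. */
static char *
join3(char *dst, const char *a, const char *b, const char *c)
{
	char *p = dst;

	p = append_bytes(p, a, strlen(a));
	p = append_bytes(p, b, strlen(b));
	p = append_bytes(p, c, strlen(c));
	*p = '\0';

	return p;
}
```

Each call hands back the position where the next copy starts, so the caller never has to recompute dst plus the accumulated length.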

The following two files are alternative implementations of a function, each one 
written in terms of one of memcpy(3) and mempcpy(3):


$ cat usts2stp1.c
     #include <string.h>

     struct ustr_s {
     	size_t  len;
     	char    *ustr;
     };

     char *
     usts2stp(char *restrict dst, const struct ustr_s *restrict src)
     {
     	memcpy(dst, src->ustr, src->len);
     	dst[src->len] = '\0';

     	return dst + src->len;
     }

$ cat usts2stp3.c
     #define _GNU_SOURCE
     #include <string.h>

     struct ustr_s {
     	size_t  len;
     	char    *ustr;
     };

     char *
     usts2stp(char *restrict dst, const struct ustr_s *restrict src)
     {
     	char *end;

     	end = mempcpy(dst, src->ustr, src->len);
     	*end = '\0';

     	return end;
     }


I expect the compiler to be knowledgeable enough to call whichever is fastest,
and to be consistent in both cases.  However, here are the results:


$ cc -Wall -Wextra -O3 -S usts2stp*.c
$ diff -u usts2stp[13].s
--- usts2stp1.s	2022-12-09 18:06:11.708367061 +0100
+++ usts2stp3.s	2022-12-09 18:06:11.740366451 +0100
@@ -1,4 +1,4 @@
-	.file	"usts2stp1.c"
+	.file	"usts2stp3.c"
  	.text
  	.p2align 4
  	.globl	usts2stp
@@ -6,16 +6,13 @@
  usts2stp:
  .LFB0:
  	.cfi_startproc
-	pushq	%rbx
+	subq	$8, %rsp
  	.cfi_def_cfa_offset 16
-	.cfi_offset 3, -16
-	movq	(%rsi), %rbx
+	movq	(%rsi), %rdx
  	movq	8(%rsi), %rsi
-	movq	%rbx, %rdx
-	call	memcpy@PLT
-	leaq	(%rax,%rbx), %rax
+	call	mempcpy@PLT
  	movb	$0, (%rax)
-	popq	%rbx
+	addq	$8, %rsp
  	.cfi_def_cfa_offset 8
  	ret
  	.cfi_endproc


The code with memcpy(3) seems to be worse (assuming both calls to be 
equivalent).  Shouldn't GCC produce the same code for both implementations?

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>



Thread overview: 13+ messages
2022-12-12 14:34 Missing optimization: mempcpy(3) vs memcpy(3) Wilco Dijkstra
2022-12-12 14:57 ` Cristian Rodríguez
  -- strict thread matches above, loose matches on Subject: below --
2022-12-09 17:11 Alejandro Colomar
2022-12-12 13:37 ` Martin Liška
2022-12-12 13:44   ` Alejandro Colomar
2022-12-12 13:56     ` Jakub Jelinek
2022-12-12 14:05       ` Alejandro Colomar
2022-12-12 14:48         ` Jonathan Wakely
2022-12-12 14:53           ` Jakub Jelinek
2022-12-12 15:56             ` Alejandro Colomar
2022-12-12 16:09               ` Jakub Jelinek
2022-12-12 17:15                 ` Alejandro Colomar
2022-12-12 17:42                 ` Jonathan Wakely
