public inbox for gcc@gcc.gnu.org
* Missing optimization: mempcpy(3) vs memcpy(3)
@ 2022-12-12 14:34 Wilco Dijkstra
  2022-12-12 14:57 ` Cristian Rodríguez
  0 siblings, 1 reply; 13+ messages in thread
From: Wilco Dijkstra @ 2022-12-12 14:34 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: 'GNU C Library', gcc

Hi,

I don't believe there is a missing optimization here: compilers expand mempcpy
by default into memcpy since that is the standard library call. That means even
if your source code contains mempcpy, there will never be any calls to mempcpy.

The reason is obvious: most targets support optimized memcpy in the C library
while very few optimize mempcpy. The same is true for bzero, bcmp and bcopy.

Targets can do it differently; IIRC x86 is the only target that emits calls to
both memcpy and mempcpy.

Cheers,
Wilco
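
The equivalences described above can be sketched in C. This is only an
illustration of how each legacy or GNU function maps onto its ISO C
counterpart, not GCC's actual expansion code; the `*_via_*` names are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* mempcpy(3): like memcpy(3), but returns one past the last byte written. */
static inline void *
mempcpy_via_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	assert(dst != NULL && src != NULL);
	return (char *) memcpy(dst, src, n) + n;
}

/* bzero(3): set n bytes to zero. */
static inline void
bzero_via_memset(void *s, size_t n)
{
	memset(s, 0, n);
}

/* bcmp(3): zero if the regions are equal, nonzero otherwise. */
static inline int
bcmp_via_memcmp(const void *a, const void *b, size_t n)
{
	return memcmp(a, b, n) != 0;
}

/* bcopy(3): source comes first, and the regions may overlap,
 * so the ISO C counterpart is memmove(3), not memcpy(3). */
static inline void
bcopy_via_memmove(const void *src, void *dst, size_t n)
{
	memmove(dst, src, n);
}
```

Each wrapper is a single call plus, at most, a trivial adjustment of the
return value, which is why a compiler can make the substitution cheaply.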

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 14:34 Missing optimization: mempcpy(3) vs memcpy(3) Wilco Dijkstra
@ 2022-12-12 14:57 ` Cristian Rodríguez
  0 siblings, 0 replies; 13+ messages in thread
From: Cristian Rodríguez @ 2022-12-12 14:57 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Alejandro Colomar (man-pages), GNU C Library, gcc

On Mon, Dec 12, 2022 at 11:35 AM Wilco Dijkstra via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> Hi,
>
> I don't believe there is a missing optimization here: compilers expand mempcpy
> by default into memcpy since that is the standard library call. That means even
> if your source code contains mempcpy, there will never be any calls to mempcpy.
>
> The reason is obvious: most targets support optimized memcpy in the C library
> while very few optimize mempcpy. The same is true for bzero, bcmp and bcopy.
>
> Targets can do it differently, IIRC x86 is the only target that emits calls both to
> memcpy and mempcpy.

Yeah, x86_64 at least uses both, but I think open-coded mempcpy needs
to be transformed into a library call anyway.

Other optimizations that are actually missing are:

- the cases where snprintf %s could become memccpy
 
- open-coded memccpy could also be turned into library calls.
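
To make the first point concrete, here is a minimal sketch of a bounded
string copy written both ways. The function names and the "return the
stored length" contract are assumptions made for this example; nothing
here describes a transformation that GCC performs today.

```c
#define _XOPEN_SOURCE 700   /* exposes memccpy(3) in strict ISO C modes */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity size, size > 0) with snprintf(3).
 * Returns the length stored in dst, excluding the null byte. */
size_t
copy_snprintf(char *dst, size_t size, const char *src)
{
	int len;

	assert(size > 0);
	len = snprintf(dst, size, "%s", src);
	if (len < 0) {                  /* output error: store "" */
		dst[0] = '\0';
		return 0;
	}
	return ((size_t) len < size) ? (size_t) len : size - 1;
}

/* Same contract with memccpy(3): copy up to and including the first
 * null byte, but never more than size bytes. */
size_t
copy_memccpy(char *dst, size_t size, const char *src)
{
	char *end;

	assert(size > 0);
	end = memccpy(dst, src, '\0', size);
	if (end == NULL) {              /* no null byte seen: truncated */
		dst[size - 1] = '\0';
		return size - 1;
	}
	return (size_t) (end - dst) - 1;  /* end is one past the copied '\0' */
}
```

Both functions leave the same bytes in dst, truncated or not; the
memccpy(3) version simply avoids parsing a format string.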

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 16:09               ` Jakub Jelinek
  2022-12-12 17:15                 ` Alejandro Colomar
@ 2022-12-12 17:42                 ` Jonathan Wakely
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Wakely @ 2022-12-12 17:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Alejandro Colomar, Martin Liška, gcc, GNU C Library

On Mon, 12 Dec 2022 at 16:10, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Dec 12, 2022 at 04:56:27PM +0100, Alejandro Colomar wrote:
> > "Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
> > are reserved for additional string and array functions. See String and Array
> > Utilities."
>
> It is not that simple.
> mem*, str* and wcs* are just potentially reserved identifiers, they are only
> reserved if the implementation provided them.

And only if the program includes <string.h>.


> And what we discuss here
> is how to reliably find out if it was an implementation that provided them,
> because in case of gcc the implementation is GCC and the C library and
> perhaps some other libraries too.
> gcc can be used with lots of different C libraries, and many don't implement
> mempcpy.
>
>         Jakub
>

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 16:09               ` Jakub Jelinek
@ 2022-12-12 17:15                 ` Alejandro Colomar
  2022-12-12 17:42                 ` Jonathan Wakely
  1 sibling, 0 replies; 13+ messages in thread
From: Alejandro Colomar @ 2022-12-12 17:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jonathan Wakely, Martin Liška, gcc, GNU C Library


Hi Jakub,

On 12/12/22 17:09, Jakub Jelinek wrote:
> On Mon, Dec 12, 2022 at 04:56:27PM +0100, Alejandro Colomar wrote:
>> "Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
>> are reserved for additional string and array functions. See String and Array
>> Utilities."
> 
> It is not that simple.
> mem*, str* and wcs* are just potentially reserved identifiers, they are only
> reserved if the implementation provided them.

To clarify:
While ISO C up to C17 had them fully reserved, ISO C23 will make them 
potentially reserved identifiers.  POSIX further fully reserves them again 
(maybe next POSIX aligns with C23 on that; I don't know).

>  And what we discuss here
> is how to reliably find out if it was an implementation that provided them,
> because in case of gcc the implementation is GCC and the C library and
> perhaps some other libraries too.
> gcc can be used with lots of different C libraries, and many don't implement
> mempcpy.

Well, if GCC can't know what the implementation provides, then we're in big 
trouble.  Me, being just a user-space programmer, only know of _GNU_SOURCE for 
determining if the function is available at compile-time.  :)

Any of the POSIX or ISO C feature_test_macros(7) prior to C23 should also be 
enough to tell the compiler that mem* identifiers are reserved, and therefore 
possibly provided by libc.


mempcpy(3)                 Library Functions Manual                 mempcpy(3)

NAME
        mempcpy, wmempcpy  - copy memory area

LIBRARY
        Standard C library (libc, -lc)

SYNOPSIS
        #define _GNU_SOURCE         /* See feature_test_macros(7) */
        #include <string.h>

        void *mempcpy(void dest[restrict .n], const void src[restrict .n],
                      size_t n);

        #define _GNU_SOURCE         /* See feature_test_macros(7) */
        #include <wchar.h>

        wchar_t *wmempcpy(wchar_t dest[restrict .n],
                      const wchar_t src[restrict .n],
                      size_t n);
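
For user code that has to build against C libraries with and without
mempcpy(3), one common approach is to decide at build time and fall back
to memcpy(3). The sketch below assumes a hypothetical autoconf-style
macro, HAVE_MEMPCPY, and a hypothetical wrapper name, xmempcpy; neither
comes from glibc or GCC.

```c
/* A build system would define these on the command line after probing
 * the C library, e.g. -D_GNU_SOURCE -DHAVE_MEMPCPY.  HAVE_MEMPCPY is a
 * hypothetical macro used only in this example. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

static inline void *
xmempcpy(void *restrict dst, const void *restrict src, size_t n)
{
	assert(dst != NULL && src != NULL);
#if defined(HAVE_MEMPCPY)
	return mempcpy(dst, src, n);              /* provided by libc */
#else
	return (char *) memcpy(dst, src, n) + n;  /* portable fallback */
#endif
}
```

Note that _GNU_SOURCE has to be defined before the first system header is
included, which is why it is better passed by the build system than
defined in the middle of a source file.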



Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 15:56             ` Alejandro Colomar
@ 2022-12-12 16:09               ` Jakub Jelinek
  2022-12-12 17:15                 ` Alejandro Colomar
  2022-12-12 17:42                 ` Jonathan Wakely
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Jelinek @ 2022-12-12 16:09 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonathan Wakely, Martin Liška, gcc, GNU C Library

On Mon, Dec 12, 2022 at 04:56:27PM +0100, Alejandro Colomar wrote:
> "Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter
> are reserved for additional string and array functions. See String and Array
> Utilities."

It is not that simple.
mem*, str* and wcs* are just potentially reserved identifiers, they are only
reserved if the implementation provided them.  And what we discuss here
is how to reliably find out if it was an implementation that provided them,
because in case of gcc the implementation is GCC and the C library and
perhaps some other libraries too.
gcc can be used with lots of different C libraries, and many don't implement
mempcpy.

	Jakub


* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 14:53           ` Jakub Jelinek
@ 2022-12-12 15:56             ` Alejandro Colomar
  2022-12-12 16:09               ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Alejandro Colomar @ 2022-12-12 15:56 UTC (permalink / raw)
  To: Jakub Jelinek, Jonathan Wakely; +Cc: Martin Liška, gcc, GNU C Library


Hi Jonathan and Jakub,

On 12/12/22 15:53, Jakub Jelinek wrote:
> On Mon, Dec 12, 2022 at 02:48:35PM +0000, Jonathan Wakely wrote:
>> On Mon, 12 Dec 2022 at 14:09, Alejandro Colomar wrote:
>>> On 12/12/22 14:56, Jakub Jelinek wrote:
>>
>>>> I think that is the case, plus the question if one can use a non-standard
>>>> function to implement a standard function (and if it would be triggered
>>>> by seeing an expected prototype for the non-standard function).
>>>
>>> I guess implementing a standard function by calling a non-standard one is fine.
>>> The implementation is free to do what it pleases, as long as it provides the
>>> expected interface.
>>
>> Even if the program provides a function called mempcpy?

Yes.  Quoting the glibc manual:

"The names of all library types, macros, variables and functions that come from 
the ISO C standard are reserved unconditionally; your program may not redefine 
these names."

And in case someone didn't know that mempcpy(3) is present in glibc, and tried 
to argue that it's unfair of glibc to claim a name not specified by ISO C, the 
following applies:

"Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter are 
reserved for additional string and array functions. See String and Array Utilities."

<https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html>

ISO C23 will relax that reservation a little bit, but functions defined by libc 
are always reserved, no matter what.

> 
> And even does something completely different...

So, redefining mempcpy(3) is UB.  What happens then, only nasal demons know.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 14:48         ` Jonathan Wakely
@ 2022-12-12 14:53           ` Jakub Jelinek
  2022-12-12 15:56             ` Alejandro Colomar
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2022-12-12 14:53 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Alejandro Colomar, Martin Liška, gcc, GNU C Library

On Mon, Dec 12, 2022 at 02:48:35PM +0000, Jonathan Wakely wrote:
> On Mon, 12 Dec 2022 at 14:09, Alejandro Colomar wrote:
> > On 12/12/22 14:56, Jakub Jelinek wrote:
> 
> > > I think that is the case, plus the question if one can use a non-standard
> > > function to implement a standard function (and if it would be triggered
> > > by seeing an expected prototype for the non-standard function).
> >
> > I guess implementing a standard function by calling a non-standard one is fine.
> > The implementation is free to do what it pleases, as long as it provides the
> > expected interface.
> 
> Even if the program provides a function called mempcpy?

And even does something completely different...

	Jakub


* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 14:05       ` Alejandro Colomar
@ 2022-12-12 14:48         ` Jonathan Wakely
  2022-12-12 14:53           ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Wakely @ 2022-12-12 14:48 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jakub Jelinek, Martin Liška, gcc, GNU C Library

On Mon, 12 Dec 2022 at 14:09, Alejandro Colomar wrote:
> On 12/12/22 14:56, Jakub Jelinek wrote:

> > I think that is the case, plus the question if one can use a non-standard
> > function to implement a standard function (and if it would be triggered
> > by seeing an expected prototype for the non-standard function).
>
> I guess implementing a standard function by calling a non-standard one is fine.
> The implementation is free to do what it pleases, as long as it provides the
> expected interface.

Even if the program provides a function called mempcpy?

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 13:56     ` Jakub Jelinek
@ 2022-12-12 14:05       ` Alejandro Colomar
  2022-12-12 14:48         ` Jonathan Wakely
  0 siblings, 1 reply; 13+ messages in thread
From: Alejandro Colomar @ 2022-12-12 14:05 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Martin Liška, gcc, GNU C Library


Hi Jakub,

On 12/12/22 14:56, Jakub Jelinek wrote:
> On Mon, Dec 12, 2022 at 02:44:04PM +0100, Alejandro Colomar via Gcc wrote:
>>> I don't see any problem with the code snippets you provided.
>>
>> Well, then the optimization may be the other way around (although I question
>> why it is implemented that way, and not the other way around, but I'm not a
>> hardware or libc guy, so there may be reasons).
>>
>> If calling memcpy(3) is better, then the code calling mempcpy(3) could be
>> expanded inline to call it (but I doubt it).
>>
>> If calling mempcpy(3) is better, then the hand-made pattern resembling
>> mempcpy(3) should probably be merged as a call to mempcpy(3).
>>
>> But acting different on equivalent calls to both of them seems inconsistent
>> to me, unless you trust the programmer to know better how to optimize, that
>> is...
> 
> I think that is the case, plus the question if one can use a non-standard
> function to implement a standard function (and if it would be triggered
> by seeing an expected prototype for the non-standard function).

I guess implementing a standard function by calling a non-standard one is fine. 
The implementation is free to do what it pleases, as long as it provides the 
expected interface.

> 
> Otherwise, whether mempcpy in libc is implemented as memcpy + tweak return
> value or has its own implementation is something that is heavily dependent
> on the target and changes over time, so hardcoding that in gcc is
> problematic.

Might be, although I'm guessing that if GCC collapses mempcpy(3)-like hand-made 
patterns to mempcpy(3), the worst that can happen is that glibc undoes that; not 
a horrible crime.  In the best case, it saves a function call, or a few assignments.

>  For -Os mempcpy call might be very well smaller even if the
> library side is then slower.

Heh, you might be surprised by the following.  Remember that the file ending 
in 1 is a hand-made pattern around memcpy(3), while the file ending in 3 calls 
mempcpy(3) directly; yet GCC emits more code for mempcpy(3).  I don't see any 
reason for this.

Cheers,

Alex
---

$ diff -u usts2stp[13].s
--- usts2stp1.s	2022-12-12 15:00:34.775119720 +0100
+++ usts2stp3.s	2022-12-12 15:00:34.807119072 +0100
@@ -1,12 +1,13 @@
-	.file	"usts2stp1.c"
+	.file	"usts2stp3.c"
  	.text
  	.globl	usts2stp
  	.type	usts2stp, @function
  usts2stp:
  .LFB0:
  	.cfi_startproc
-	movq	(%rsi), %rcx
+	movq	%rsi, %rax
  	movq	8(%rsi), %rsi
+	movq	(%rax), %rcx
  	rep movsb
  	movb	$0, (%rdi)
  	movq	%rdi, %rax


-- 
<http://www.alejandro-colomar.es/>

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 13:44   ` Alejandro Colomar
@ 2022-12-12 13:56     ` Jakub Jelinek
  2022-12-12 14:05       ` Alejandro Colomar
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2022-12-12 13:56 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Martin Liška, gcc, GNU C Library

On Mon, Dec 12, 2022 at 02:44:04PM +0100, Alejandro Colomar via Gcc wrote:
> > I don't see any problem with the code snippets you provided.
> 
> Well, then the optimization may be the other way around (although I question
> why it is implemented that way, and not the other way around, but I'm not a
> hardware or libc guy, so there may be reasons).
> 
> If calling memcpy(3) is better, then the code calling mempcpy(3) could be
> expanded inline to call it (but I doubt it).
> 
> If calling mempcpy(3) is better, then the hand-made pattern resembling
> mempcpy(3) should probably be merged as a call to mempcpy(3).
> 
> But acting different on equivalent calls to both of them seems inconsistent
> to me, unless you trust the programmer to know better how to optimize, that
> is...

I think that is the case, plus the question if one can use a non-standard
function to implement a standard function (and if it would be triggered
by seeing an expected prototype for the non-standard function).

Otherwise, whether mempcpy in libc is implemented as memcpy + tweak return
value or has its own implementation is something that is heavily dependent
on the target and changes over time, so hardcoding that in gcc is
problematic.  For -Os mempcpy call might be very well smaller even if the
library side is then slower.

	Jakub


* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-12 13:37 ` Martin Liška
@ 2022-12-12 13:44   ` Alejandro Colomar
  2022-12-12 13:56     ` Jakub Jelinek
  0 siblings, 1 reply; 13+ messages in thread
From: Alejandro Colomar @ 2022-12-12 13:44 UTC (permalink / raw)
  To: Martin Liška, gcc, GNU C Library


Hi Martin,

On 12/12/22 14:37, Martin Liška wrote:
> On 12/9/22 18:11, Alejandro Colomar via Gcc wrote:
>> I expect the compiler to be knowledgeable enough to call whatever is fastest, whatever it is, but be consistent in both cases.  However, here are the results:
> 
> Hi.
> 
> Note the glibc implementation of mempcpy typically uses (calls) memcpy, thus

Thanks for the info.  I CCed glibc now, and copied my original email below for 
completeness.

> I don't see any problem with the code snippets you provided.

Well, then the optimization may be the other way around (although I question why 
it is implemented that way, and not the other way around, but I'm not a hardware 
or libc guy, so there may be reasons).

If calling memcpy(3) is better, then the code calling mempcpy(3) could be 
expanded inline to call it (but I doubt it).

If calling mempcpy(3) is better, then the hand-made pattern resembling 
mempcpy(3) should probably be merged as a call to mempcpy(3).

But acting different on equivalent calls to both of them seems inconsistent to 
me, unless you trust the programmer to know better how to optimize, that is...

Cheers,

Alex


-------- Forwarded Message --------
Subject: Missing optimization: mempcpy(3) vs memcpy(3)
Date: Fri, 9 Dec 2022 18:11:17 +0100
From: Alejandro Colomar <alx.manpages@gmail.com>
To: gcc@gcc.gnu.org

Hi!

I expect mempcpy(3) to be at least as fast as memcpy(3), since it performs the 
same operations, with the exception that mempcpy(3) returns something useful (as 
opposed to memcpy(3), which could perfectly well return void), and in fact 
something more likely to be in cache, if the copy is performed upwards.

The following two files are alternative implementations of a function, each one 
written in terms of one of memcpy(3) and mempcpy(3):


$ cat usts2stp1.c
      #include <string.h>

      struct ustr_s {
      	size_t  len;
      	char    *ustr;
      };

      char *
      usts2stp(char *restrict dst, const struct ustr_s *restrict src)
      {
      	memcpy(dst, src->ustr, src->len);
      	dst[src->len] = '\0';

      	return dst + src->len;
      }

$ cat usts2stp3.c
      #define _GNU_SOURCE
      #include <string.h>

      struct ustr_s {
      	size_t  len;
      	char    *ustr;
      };

      char *
      usts2stp(char *restrict dst, const struct ustr_s *restrict src)
      {
      	char *end;

      	end = mempcpy(dst, src->ustr, src->len);
      	*end = '\0';

      	return end;
      }


I expect the compiler to be knowledgeable enough to call whatever is fastest, 
whatever it is, but be consistent in both cases.  However, here are the results:


$ cc -Wall -Wextra -O3 -S usts2stp*.c
$ diff -u usts2stp[13].s
--- usts2stp1.s	2022-12-09 18:06:11.708367061 +0100
+++ usts2stp3.s	2022-12-09 18:06:11.740366451 +0100
@@ -1,4 +1,4 @@
-	.file	"usts2stp1.c"
+	.file	"usts2stp3.c"
   	.text
   	.p2align 4
   	.globl	usts2stp
@@ -6,16 +6,13 @@
   usts2stp:
   .LFB0:
   	.cfi_startproc
-	pushq	%rbx
+	subq	$8, %rsp
   	.cfi_def_cfa_offset 16
-	.cfi_offset 3, -16
-	movq	(%rsi), %rbx
+	movq	(%rsi), %rdx
   	movq	8(%rsi), %rsi
-	movq	%rbx, %rdx
-	call	memcpy@PLT
-	leaq	(%rax,%rbx), %rax
+	call	mempcpy@PLT
   	movb	$0, (%rax)
-	popq	%rbx
+	addq	$8, %rsp
   	.cfi_def_cfa_offset 8
   	ret
   	.cfi_endproc


The code with memcpy(3) seems to be worse (assuming both calls to be 
equivalent).  Shouldn't GCC produce the same code for both implementations?

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>

* Re: Missing optimization: mempcpy(3) vs memcpy(3)
  2022-12-09 17:11 Alejandro Colomar
@ 2022-12-12 13:37 ` Martin Liška
  2022-12-12 13:44   ` Alejandro Colomar
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Liška @ 2022-12-12 13:37 UTC (permalink / raw)
  To: Alejandro Colomar, gcc

On 12/9/22 18:11, Alejandro Colomar via Gcc wrote:
> I expect the compiler to be knowledgeable enough to call whatever is fastest, whatever it is, but be consistent in both cases.  However, here are the results:

Hi.

Note the glibc implementation of mempcpy typically uses (calls) memcpy, thus
I don't see any problem with the code snippets you provided.

Cheers,
Martin

* Missing optimization: mempcpy(3) vs memcpy(3)
@ 2022-12-09 17:11 Alejandro Colomar
  2022-12-12 13:37 ` Martin Liška
  0 siblings, 1 reply; 13+ messages in thread
From: Alejandro Colomar @ 2022-12-09 17:11 UTC (permalink / raw)
  To: gcc


Hi!

I expect mempcpy(3) to be at least as fast as memcpy(3), since it performs the 
same operations, with the exception that mempcpy(3) returns something useful (as 
opposed to memcpy(3), which could perfectly well return void), and in fact 
something more likely to be in cache, if the copy is performed upwards.

The following two files are alternative implementations of a function, each one 
written in terms of one of memcpy(3) and mempcpy(3):


$ cat usts2stp1.c
     #include <string.h>

     struct ustr_s {
     	size_t  len;
     	char    *ustr;
     };

     char *
     usts2stp(char *restrict dst, const struct ustr_s *restrict src)
     {
     	memcpy(dst, src->ustr, src->len);
     	dst[src->len] = '\0';

     	return dst + src->len;
     }

$ cat usts2stp3.c
     #define _GNU_SOURCE
     #include <string.h>

     struct ustr_s {
     	size_t  len;
     	char    *ustr;
     };

     char *
     usts2stp(char *restrict dst, const struct ustr_s *restrict src)
     {
     	char *end;

     	end = mempcpy(dst, src->ustr, src->len);
     	*end = '\0';

     	return end;
     }
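
As a usage sketch of why returning the end pointer is convenient: calls
to usts2stp() can be chained to concatenate several strings without
recomputing any length. The helper ustr_join() below is hypothetical and
not part of the files above; it reuses the memcpy(3)-based variant so
that it builds on any C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ustr_s {
	size_t  len;
	char    *ustr;
};

/* The memcpy(3)-based variant from usts2stp1.c. */
char *
usts2stp(char *restrict dst, const struct ustr_s *restrict src)
{
	memcpy(dst, src->ustr, src->len);
	dst[src->len] = '\0';

	return dst + src->len;
}

/* Hypothetical helper: concatenate n strings into dst, which must have
 * room for the sum of the lengths plus one byte.  Returns a pointer to
 * the terminating null byte. */
char *
ustr_join(char *dst, const struct ustr_s *v, size_t n)
{
	char *p = dst;

	assert(dst != NULL);
	*p = '\0';                      /* n == 0 yields the empty string */
	for (size_t i = 0; i < n; i++)
		p = usts2stp(p, &v[i]); /* resume where the last copy ended */

	return p;
}
```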


I expect the compiler to be knowledgeable enough to call whatever is fastest, 
whatever it is, but be consistent in both cases.  However, here are the results:


$ cc -Wall -Wextra -O3 -S usts2stp*.c
$ diff -u usts2stp[13].s
--- usts2stp1.s	2022-12-09 18:06:11.708367061 +0100
+++ usts2stp3.s	2022-12-09 18:06:11.740366451 +0100
@@ -1,4 +1,4 @@
-	.file	"usts2stp1.c"
+	.file	"usts2stp3.c"
  	.text
  	.p2align 4
  	.globl	usts2stp
@@ -6,16 +6,13 @@
  usts2stp:
  .LFB0:
  	.cfi_startproc
-	pushq	%rbx
+	subq	$8, %rsp
  	.cfi_def_cfa_offset 16
-	.cfi_offset 3, -16
-	movq	(%rsi), %rbx
+	movq	(%rsi), %rdx
  	movq	8(%rsi), %rsi
-	movq	%rbx, %rdx
-	call	memcpy@PLT
-	leaq	(%rax,%rbx), %rax
+	call	mempcpy@PLT
  	movb	$0, (%rax)
-	popq	%rbx
+	addq	$8, %rsp
  	.cfi_def_cfa_offset 8
  	ret
  	.cfi_endproc


The code with memcpy(3) seems to be worse (assuming both calls to be 
equivalent).  Shouldn't GCC produce the same code for both implementations?

Cheers,

Alex


-- 
<http://www.alejandro-colomar.es/>

Thread overview: 13+ messages
2022-12-12 14:34 Missing optimization: mempcpy(3) vs memcpy(3) Wilco Dijkstra
2022-12-12 14:57 ` Cristian Rodríguez
  -- strict thread matches above, loose matches on Subject: below --
2022-12-09 17:11 Alejandro Colomar
2022-12-12 13:37 ` Martin Liška
2022-12-12 13:44   ` Alejandro Colomar
2022-12-12 13:56     ` Jakub Jelinek
2022-12-12 14:05       ` Alejandro Colomar
2022-12-12 14:48         ` Jonathan Wakely
2022-12-12 14:53           ` Jakub Jelinek
2022-12-12 15:56             ` Alejandro Colomar
2022-12-12 16:09               ` Jakub Jelinek
2022-12-12 17:15                 ` Alejandro Colomar
2022-12-12 17:42                 ` Jonathan Wakely
