public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined
@ 2023-11-11 17:37 post+sourceware.org at ralfj dot de
  2023-11-11 18:41 ` [Bug string/31055] " sam at gentoo dot org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: post+sourceware.org at ralfj dot de @ 2023-11-11 17:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

            Bug ID: 31055
           Summary: Request: guarantee that memcpy(x, x, n) is
                    well-defined
           Product: glibc
           Version: 2.38
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: string
          Assignee: unassigned at sourceware dot org
          Reporter: post+sourceware.org at ralfj dot de
  Target Milestone: ---

The documentation for memcpy states

> The behavior of this function is undefined if the two arrays to and from overlap

However, this contract is suboptimal for several large consumers of this API,
including GCC (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667), clang
(https://reviews.llvm.org/D86993), and rustc
(https://doc.rust-lang.org/nightly/core/index.html#how-to-use-the-core-library).
All of these compilers assume that when both pointers are equal, memcpy is
effectively a NOP.

(I am not a compiler backend engineer, so I cannot fully explain why they
consider this contract insufficient and are amending it unilaterally. But there
seems to be a widespread consensus on the compiler side that adding an extra
conditional to guard against the case of both pointers being equal has
significant cost. Obviously compilers should have tried to work with libc
implementations and not just unilaterally assume a different contract than what
is documented; I cannot speak to the history of how we got here, I would just
like to improve the situation here and now.)

Efforts seem to be underway to change the C standard to make this API contract
more useful for these compilers (https://reviews.llvm.org/D86993#4585590), but
that is a slow process. So in the meantime it would be great if glibc could
guarantee that for the special case where to==from, behavior is indeed
well-defined. (Given that in particular the flagship compiler of the GNU
project itself makes this assumption, I hope this is a reasonable request.)

My assumption is that this only requires a change to the documentation, and
that the implementation is already well-defined for to==from. If, however, the
implementation of glibc on some targets is *not* producing well-defined results
for to==from, then that would also be very interesting to know.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
@ 2023-11-11 18:41 ` sam at gentoo dot org
  2023-11-13 18:47 ` adhemerval.zanella at linaro dot org
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: sam at gentoo dot org @ 2023-11-11 18:41 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=32667
                 CC|                            |sam at gentoo dot org
                URL|                            |https://reviews.llvm.org/D8
                   |                            |6993

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
  2023-11-11 18:41 ` [Bug string/31055] " sam at gentoo dot org
@ 2023-11-13 18:47 ` adhemerval.zanella at linaro dot org
  2023-11-18 17:56 ` post+sourceware.org at ralfj dot de
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-13 18:47 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |adhemerval.zanella at linaro dot o
                   |                            |rg

--- Comment #1 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Ralf Jung from comment #0)
> 
> My assumption is that this only requires a change to the documentation, and
> that the implementation is already well-defined for to==from. If, however,
> the implementation of glibc on some targets is *not* producing well-defined
> results for to==from, then that would also be very interesting to know.

From the draft proposal document [1], it means that memcpy and other
string/memory functions might accept NULL/invalid arguments. 

It would require glibc to remove the __nonnull on various prototypes and check
if the compiler builtin used on fortify would also handle NULL pointers
correctly.  It would require extending both default and fortify testing, and
adding appropriate support if required (on fortify wrappers for instance if
there are compiler bugs). 

I am also not sure if *all* the architecture memcpy implementation handles
memcpy(x, x, n) correctly, although I expect it would not be an issue (GCC is
already generating code with this assumption).

The PowerPC dcbz trick is not used on powerpc optimization, and cache fill
instructions are only used on memset implementations (and for zero fill, so I
don't think this would be an issue).

[1]
https://docs.google.com/document/d/1guH_HgibKrX7t9JfKGfWX2UCPyZOTLsnRfR6UleD1F8/edit

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
  2023-11-11 18:41 ` [Bug string/31055] " sam at gentoo dot org
  2023-11-13 18:47 ` adhemerval.zanella at linaro dot org
@ 2023-11-18 17:56 ` post+sourceware.org at ralfj dot de
  2023-11-21 15:30 ` adhemerval.zanella at linaro dot org
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: post+sourceware.org at ralfj dot de @ 2023-11-18 17:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #2 from Ralf Jung <post+sourceware.org at ralfj dot de> ---
> From the draft proposal document [1], it means that memcpy and other string/memory functions might accept NULL/invalid arguments. 

Yes, that too. Though I opened this issue specifically about the src==dest
case.

> I am also not sure if *all* the architecture memcpy implementation handles memcpy(x, x, n) correctly, although I expect it would not be an issue (GCC is already generating code with this assumption).

That's the thing. Either they fail for that case and we have a problem since
GCC (and clang and rustc) generate bad code. Or they are all fine for that case
and de-facto they can't change that anyway because GCC (and clang and rustc)
rely on it, and it's better to just officially document this rather than
continue with this annoying situation where something's clearly broken but
nobody wants to fix it.

(And this issue is not purely theoretical, it leads to real problems such as
https://bugs.kde.org/show_bug.cgi?id=148543. The valgrind devs are waiting for
*someone* to clarify what the contract for memcpy actually is.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (2 preceding siblings ...)
  2023-11-18 17:56 ` post+sourceware.org at ralfj dot de
@ 2023-11-21 15:30 ` adhemerval.zanella at linaro dot org
  2023-11-23  7:36 ` post+sourceware.org at ralfj dot de
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-21 15:30 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #3 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Ralf Jung from comment #2)
> > From the draft proposal document [1], it means that memcpy and other string/memory functions might accept NULL/invalid arguments. 
> 
> Yes, that too. Though I opened this issue specifically about the src==dest
> case.

I think with this extra constraint we can't really mark the memcpy as having
'restrict' arguments, which would also have extra performance implications.  It
would also mean deviating from C standard which I am not sure would be really
the correct approach.

(I may be reading the restrict keyword definition wrongly, and thus the C
standard does allow restrict with a pointer to alias to itself).

> 
> > I am also not sure if *all* the architecture memcpy implementation handles memcpy(x, x, n) correctly, although I expect it would not be an issue (GCC is already generating code with this assumption).
> 
> That's the thing. Either they fail for that case and we have a problem since
> GCC (and clang and rustc) generate bad code. Or they are all fine for that
> case and de-facto they can't change that anyway because GCC (and clang and
> rustc) rely on it, and it's better to just officially document this rather
> than continue with this annoying situation where something's clearly broken
> but nobody wants to fix it.

My understanding is if the compiler can not ensure the memory ranges don't
overlap, it should emit a memmove instead. But this is really a corner case
that won't really trigger the wrong result because afaiu all glibc memcpy
implementations do not issue any tricks as the powerpc one valgrind described.

> 
> (And this issue is not purely theoretical, it leads to real problems such as
> https://bugs.kde.org/show_bug.cgi?id=148543. The valgrind devs are waiting
> for *someone* to clarify what the contract for memcpy actually is.)

And I think valgrind will need to at least keep this warning for non-glibc
targets, which I am not it would simplify things.  Maybe the best way would to
follow the __memcmpeq route, where it was added solely for compile usage.

[1] https://sourceware.org/pipermail/libc-alpha/2021-September/131099.html

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (3 preceding siblings ...)
  2023-11-21 15:30 ` adhemerval.zanella at linaro dot org
@ 2023-11-23  7:36 ` post+sourceware.org at ralfj dot de
  2023-11-23 11:53 ` adhemerval.zanella at linaro dot org
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: post+sourceware.org at ralfj dot de @ 2023-11-23  7:36 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #4 from Ralf Jung <post+sourceware.org at ralfj dot de> ---
> I think with this extra constraint we can't really mark the memcpy as having 'restrict' arguments, which would also have extra performance implications.

Does anything change in the generated assembly for memcpy when the arguments
are vs are not `restrict`? If yes, that would be a sign that compilers should
probably stop using `memcpy(x, x, n)`... but it seems unlikely, given how
heavily hand-tuned that code already is.

> And I think valgrind will need to at least keep this warning for non-glibc targets, which I am not it would simplify things.

It would at least give solid motivation for the to support the option of a libc
that allows this, e.g. via a flag.

> Maybe the best way would to follow the __memcmpeq route, where it was added solely for compile usage.

That's also a possibility, of course. Though I assume adding a new symbol would
take forever to propagate through the ecosystem. Also, would that symbol be for
GCC only or also for other compilers? It would not be great to have a GCC-only
solution.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (4 preceding siblings ...)
  2023-11-23  7:36 ` post+sourceware.org at ralfj dot de
@ 2023-11-23 11:53 ` adhemerval.zanella at linaro dot org
  2023-11-23 14:48 ` sam at gentoo dot org
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: adhemerval.zanella at linaro dot org @ 2023-11-23 11:53 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #5 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
(In reply to Ralf Jung from comment #4)
> > I think with this extra constraint we can't really mark the memcpy as having 'restrict' arguments, which would also have extra performance implications.
> 
> Does anything change in the generated assembly for memcpy when the arguments
> are vs are not `restrict`? If yes, that would be a sign that compilers
> should probably stop using `memcpy(x, x, n)`... but it seems unlikely, given
> how heavily hand-tuned that code already is.

But it is still a deviation from the standard, isn't it? But from BZ#32667 it
should not matter anyway since GCC now assumes that source and destination can
be equal [1].

> 
> > And I think valgrind will need to at least keep this warning for non-glibc targets, which I am not it would simplify things.
> 
> It would at least give solid motivation for the to support the option of a
> libc that allows this, e.g. via a flag.
> 
> > Maybe the best way would to follow the __memcmpeq route, where it was added solely for compile usage.
> 
> That's also a possibility, of course. Though I assume adding a new symbol
> would take forever to propagate through the ecosystem. Also, would that
> symbol be for GCC only or also for other compilers? It would not be great to
> have a GCC-only solution.

This seems to be Jakub proposed solution [2], but indeed even __memcmpeq did
not have any adoption so far [3] [4].  I will bring this to libc-alpha to see
what other maintainers thing about it.

[1]
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=7758cb4b53e8a33642709402ce582f769eb9fd18;hp=6ce952188ab39e303e4f63e474b5cba83b5b12fd
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32667#c35
[3] https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596881.html
[4] https://reviews.llvm.org/D127461

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (5 preceding siblings ...)
  2023-11-23 11:53 ` adhemerval.zanella at linaro dot org
@ 2023-11-23 14:48 ` sam at gentoo dot org
  2023-11-23 15:18 ` post+sourceware.org at ralfj dot de
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: sam at gentoo dot org @ 2023-11-23 14:48 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://llvm.org/bugs/show_
                   |                            |bug.cgi?id=11763,
                   |                            |https://bugs.kde.org/show_b
                   |                            |ug.cgi?id=148543

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (6 preceding siblings ...)
  2023-11-23 14:48 ` sam at gentoo dot org
@ 2023-11-23 15:18 ` post+sourceware.org at ralfj dot de
  2023-11-24  7:46 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: post+sourceware.org at ralfj dot de @ 2023-11-23 15:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #6 from Ralf Jung <post+sourceware.org at ralfj dot de> ---
I didn't know there's a "restrict" in the signature of this function even in
the C standard.

Yes that would have to be removed to make memcpy(x, x, n) well-defined. But if
there are no benefits from having the restrict there (which can be determined
by checking if the generated assembly changes, or by benchmarking), then
there's no point in having it in the first place.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (7 preceding siblings ...)
  2023-11-23 15:18 ` post+sourceware.org at ralfj dot de
@ 2023-11-24  7:46 ` rguenth at gcc dot gnu.org
  2023-11-24  8:52 ` post+sourceware.org at ralfj dot de
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-24  7:46 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC now documents the requirement on the target C library to support memcpy
with exactly overlapping source/destination as not invoking undefined behavior
but working as expected.

It might be good to add a testcase to glibc that exercises this, verifying
it works so anybody trying to change memcpy in a way that violate this
would catch this during testing.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (8 preceding siblings ...)
  2023-11-24  7:46 ` rguenth at gcc dot gnu.org
@ 2023-11-24  8:52 ` post+sourceware.org at ralfj dot de
  2023-11-24  9:37 ` rguenth at gcc dot gnu.org
  2023-11-28  7:18 ` sam at gentoo dot org
  11 siblings, 0 replies; 13+ messages in thread
From: post+sourceware.org at ralfj dot de @ 2023-11-24  8:52 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #8 from Ralf Jung <post+sourceware.org at ralfj dot de> ---
Is there a URL where that documentation can be found? That could be useful to
reference in the future.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (9 preceding siblings ...)
  2023-11-24  8:52 ` post+sourceware.org at ralfj dot de
@ 2023-11-24  9:37 ` rguenth at gcc dot gnu.org
  2023-11-28  7:18 ` sam at gentoo dot org
  11 siblings, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-24  9:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Ralf Jung from comment #8)
> Is there a URL where that documentation can be found? That could be useful
> to reference in the future.

https://gcc.gnu.org/onlinedocs/gcc/Standards.html#C-Language

at its end it says

"Most of the compiler support routines used by GCC are present in libgcc, but
there are a few exceptions. GCC requires the freestanding environment provide
memcpy, memmove, memset and memcmp. Contrary to the standards covering memcpy
GCC expects the case of an exact overlap of source and destination to work and
not invoke undefined behavior. Finally, if __builtin_trap is used, and the
target does not implement the trap pattern, then GCC emits a call to abort."

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Bug string/31055] Request: guarantee that memcpy(x, x, n) is well-defined
  2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
                   ` (10 preceding siblings ...)
  2023-11-24  9:37 ` rguenth at gcc dot gnu.org
@ 2023-11-28  7:18 ` sam at gentoo dot org
  11 siblings, 0 replies; 13+ messages in thread
From: sam at gentoo dot org @ 2023-11-28  7:18 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31055

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://github.com/llvm/llv
                   |                            |m-project/issues/73516

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-11-28  7:18 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-11 17:37 [Bug string/31055] New: Request: guarantee that memcpy(x, x, n) is well-defined post+sourceware.org at ralfj dot de
2023-11-11 18:41 ` [Bug string/31055] " sam at gentoo dot org
2023-11-13 18:47 ` adhemerval.zanella at linaro dot org
2023-11-18 17:56 ` post+sourceware.org at ralfj dot de
2023-11-21 15:30 ` adhemerval.zanella at linaro dot org
2023-11-23  7:36 ` post+sourceware.org at ralfj dot de
2023-11-23 11:53 ` adhemerval.zanella at linaro dot org
2023-11-23 14:48 ` sam at gentoo dot org
2023-11-23 15:18 ` post+sourceware.org at ralfj dot de
2023-11-24  7:46 ` rguenth at gcc dot gnu.org
2023-11-24  8:52 ` post+sourceware.org at ralfj dot de
2023-11-24  9:37 ` rguenth at gcc dot gnu.org
2023-11-28  7:18 ` sam at gentoo dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).