public inbox for libc-alpha@sourceware.org
From: Carlos O'Donell <carlos@redhat.com>
To: Zack Weinberg <zack@owlfolio.org>
Cc: GNU libc development <libc-alpha@sourceware.org>
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
Date: Tue, 13 Dec 2022 23:16:19 -0500
Message-ID: <c0206fb2-4fe5-e893-7629-fb10ea4de457@redhat.com>
In-Reply-To: <ypiko7s6afca.fsf@owlfolio.org>

On 12/13/22 21:28, Zack Weinberg wrote:
> Carlos O'Donell <carlos@redhat.com> writes:
> 
>> On 12/13/22 15:56, Zack Weinberg via Libc-alpha wrote:
>>> I think it would be reasonable for glibc to make the following weaker guarantee:
>>> for any call `memcmp(a, b, n)`, if the data pointed to by `a` and/or `b` is being
>>> concurrently modified, the return value is unspecified but *not* indeterminate.
>>> Also, memcmp will never access memory outside the bounds [a, a+n) and [b, b+n),
>>> no matter what.
>>
>> I disagree strongly.
> 
> I’m really surprised to hear you say that.  To me this is a natural
> guarantee for memcmp — in fact, for *all* of the mem* functions — to
> make, to the point where my reaction was *of course* this is our bug!

Please let me expand on my answer.

We are talking about the C language, and when you write "unspecified" in that context
it means the language *does* have something to say about the behaviour but does not pick
one or another of the available behaviours. That is not the case here: the language very
clearly says this is undefined behaviour, so it says nothing about what should happen.
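
A minimal sketch of what "unspecified" means in this sense, assuming a hosted C11
compiler; the helper names are hypothetical and exist only for illustration. The
order in which the two arguments of a call are evaluated is unspecified, so the
language allows exactly two outcomes and the implementation picks one. A data race
has no such list of permitted outcomes.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only. */
static char order_log[3];   /* records which argument was evaluated first */
static size_t order_len;

static int note(char tag, int value)
{
    if (order_len < 2)
        order_log[order_len++] = tag;
    return value;
}

static int add(int x, int y)
{
    return x + y;
}

/* Unspecified behaviour: the standard permits either argument to be
   evaluated first, but it must be one of those two orders.  The sum is
   the same either way; only order_log ("LR" or "RL") may differ. */
static int unspecified_order_sum(void)
{
    order_len = 0;
    return add(note('L', 1), note('R', 2));
}
```

Whichever order the compiler chooses, the program stays within what the language
describes. Under undefined behaviour that bound no longer exists.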

My understanding was that you were trying to ascribe more determinism to the operation
of memcmp in the presence of data races than could be granted by UB.

I strongly disagree with ascribing more determinism than UB allows.

Could you expand on why you think this is a "natural" guarantee and what it
derives from? Is it that you view the input domain to the function as the "natural"
bytes upon which the function is allowed to operate?

>> These are advanced lockless techniques.
>> They should be hidden behind new APIs that provide the required guarantees.
> 
> That the application was doing “advanced lockless techniques” is, to me,

If the application does not follow the language requirements then it is UB.

There are some advanced lockless techniques that do not follow the C memory model.

My point is that such techniques and the requirements should be well understood
and implemented by APIs that provide the required guarantees, not by existing C
string and memory APIs.

> not relevant in the slightest.  The important thing to me is that the
> memory regions `memcmp` is allowed to access are wholly specified by the
> mathematical values of its three arguments, and *not* by the data
> pointed to by the first two arguments.  Nothing any other thread does
> can change the fact that memcmp has no business touching bytes at
> addresses outside [a, a+n) and [b, b+n).

I strongly disagree (though the quiet theoretician in me agrees with you).

The standards nowhere prescribe that memcmp shall not read or write memory
outside of the input domain.

> Why do you think it is important for the C library to have latitude to
> break that aspect of the mem* functions’ API contract?  Even if only
> under exotic circumstances?

Why do you think an application programmer has the right to ignore the requirements
of the language and expect the runtime to operate as intended?

Why would we as library authors enter into an API contract that is *stronger* than
the language guarantees?

I am empathetic to the yottadb developers, but we need new APIs for these requirements.
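
As a rough sketch of what such a new API might look like (the name
`racy_bytes_compare` is hypothetical; it is not a glibc interface and is not what
Noah's patch does), a byte-wise compare over `_Atomic unsigned char` objects using
relaxed loads stays inside the C11 memory model even while another thread stores to
the same bytes atomically. The result may reflect a mix of old and new data, but
every access is defined and stays within the first n bytes of each buffer.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical API sketch, not part of glibc.  Both buffers must be arrays
   of _Atomic unsigned char, and concurrent writers must use atomic stores.
   The pointers are non-const because C11 declares atomic_load on a
   non-const atomic object. */
static int racy_bytes_compare(_Atomic unsigned char *a,
                              _Atomic unsigned char *b,
                              size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Relaxed loads: no ordering is promised, but there is no data race. */
        unsigned char x = atomic_load_explicit(&a[i], memory_order_relaxed);
        unsigned char y = atomic_load_explicit(&b[i], memory_order_relaxed);
        if (x != y)
            return x < y ? -1 : 1;   /* sign matches memcmp's convention */
    }
    return 0;   /* the first n bytes compared equal as observed */
}
```

A loop like this gives up the wide vector loads that an optimized memcmp relies on,
which is one reason the guarantee fits a separate API better than the existing one.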

I will still review Noah's patch here:
https://sourceware.org/pipermail/libc-alpha/2022-December/144058.html

I do not have a sustained objection to modifying things to "just work" (tm) because
with UB you could do anything, and it doesn't impact the normal use cases.

That is to say for a specific patch, and a specific change, I can agree, but not
to the larger generalizations about memcmp.

-- 
Cheers,
Carlos.

