public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Zack Weinberg <zack@owlfolio.org>, Carlos O'Donell <carlos@redhat.com>
Cc: GNU libc development <libc-alpha@sourceware.org>
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
Date: Wed, 14 Dec 2022 18:36:44 +0100	[thread overview]
Message-ID: <7d2b4dd3-0583-3f2b-95ee-9538615386ac@redhat.com> (raw)
In-Reply-To: <663fab35-0f08-4b36-a653-0145c36ca7f8@app.fastmail.com>

On 12/14/22 15:16, Zack Weinberg via Libc-alpha wrote:
>> The standards are in no way prescriptive in saying that memcmp
>> shall not read or write to memory outside of the input domain.
>
> ... is (as I read it) contradicted by 7.1.4p5 (N1570) "A library
> function shall not directly or indirectly access objects accessible
> by threads other than the current thread unless the objects are
> accessed directly or indirectly via the function's arguments."  There
> is more wiggle room in this wording than I'd ideally like, but since
> memcmp has no way of knowing whether any particular piece of data
> outside the ranges supplied as arguments is "accessible by threads
> other than the current thread", it needs to be conservative and not
> touch any of it.

I don't think this applies here for two reasons though:

1) a SIGSEGV would always be acceptable (that's not a valid object), and 
so would an infinite loop

2) I think that the as-if rule would even allow reads to objects 
accessible by other threads, if they don't affect the result (so they 
are not observable by the calling thread) *and* are not observable by 
those other threads either.  The classic case here is strlen() doing 
aligned word (or larger) reads, even though those reads might trespass 
the NUL terminator.


Promising any kind of behavior when a data race happens involving the 
str*/mem* functions is harder than it seems.  As soon as the functions 
read a byte more than once, the view of memory that they operate on does 
not even obey causality.

The following code is admittedly a bit contrived but shows the pitfalls 
of reading more than once from the same location:

   while (*(u32*)s == *(u32*)d) {
     n-=4, s+=4, d+=4;
     if (n < 4) goto short;
   }

   // we know the loop will stop don't we?
   while (*s==*d) s++, d++;
   return *s < *d ? -1 : 1;

short:
   while (n && *s == *d) s++, d++, n--;
   return n ? (*s < *d ? -1 : 1) : 0;

... and you could access beyond the [a,a+n) [b,b+n) range if a 
concurrent mutator causes the second while loop to go off the cliff.

There are also compiler issues: it's also hard to ensure that the 
compiler won't decide to read twice for very down-to-Earth instruction 
selection reasons, for example by using a register-and-memory ALU 
operation.  Many mem*/str* routines do not have a single memory write, 
so the compiler has a *lot* of leeway to reorder and rematerialize 
memory accesses.  For example you could have something like this in a 
memmem():

   c = *p;
   ...
   p += table[c];

If the compiler changes the second statement to "p += table1[*p]", for 
example to avoid a spill, concurrent mutation can result in 
out-of-bounds accesses or other kinds of UB.  The standard certainly 
didn't require that str*/mem* be written in assembly, or that it does 
all of it's accesses as atomic loads or perhaps (argh!) volatile.

Paolo


  reply	other threads:[~2022-12-14 17:36 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13 18:20 Narayanan Iyer
2022-12-13 18:31 ` Andrew Pinski
2022-12-13 18:39   ` Narayanan Iyer
2022-12-13 18:39 ` Cristian Rodríguez
2022-12-13 19:08 ` Noah Goldstein
2022-12-13 19:13   ` Narayanan Iyer
2022-12-13 19:25     ` Noah Goldstein
2022-12-13 20:56       ` Zack Weinberg
2022-12-13 23:29         ` Carlos O'Donell
2022-12-14  2:28           ` Zack Weinberg
2022-12-14  4:16             ` Carlos O'Donell
2022-12-14 14:16               ` Zack Weinberg
2022-12-14 17:36                 ` Paolo Bonzini [this message]
2022-12-29  7:09                   ` Zack Weinberg
2022-12-29 19:32               ` “Undefined behavior” considered harmful (was Re: Bug 29863 - Segmentation fault in memcmp-sse2.S…) Zack Weinberg
2022-12-29 22:20                 ` Andreas Schwab
2022-12-30 13:28                   ` Florian Weimer
2022-12-30 15:09                 ` Florian Weimer
2022-12-13 22:52       ` Bug 29863 - Segmentation fault vs invalid results, memory models, and control/data dependencies Carlos O'Donell
2022-12-14 12:03         ` Florian Weimer
2022-12-13 21:20   ` Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change Florian Weimer
2022-12-13 22:59     ` Noah Goldstein
2022-12-14 12:06       ` Florian Weimer
     [not found] <PAWPR08MB89825887E12FF900540365F483E09@PAWPR08MB8982.eurprd08.prod.outlook.com>
     [not found] ` <PAWPR08MB898260DA844D695EA70ED3E483E09@PAWPR08MB8982.eurprd08.prod.outlook.com>
2022-12-14 21:56   ` Wilco Dijkstra
2022-12-29  7:21     ` Zack Weinberg
2022-12-29 20:02       ` Alejandro Colomar
2022-12-30 18:02         ` Joseph Myers
2023-03-20 15:40           ` Zack Weinberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d2b4dd3-0583-3f2b-95ee-9538615386ac@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=zack@owlfolio.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).