public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Zack Weinberg <zack@owlfolio.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>,
	GNU libc development <libc-alpha@sourceware.org>
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
Date: Thu, 29 Dec 2022 02:09:20 -0500	[thread overview]
Message-ID: <ypiksfgyofdr.wl-zack@owlfolio.org> (raw)
In-Reply-To: <7d2b4dd3-0583-3f2b-95ee-9538615386ac@redhat.com>

On Wed, 14 Dec 2022 12:36:44 -0500, Paolo Bonzini wrote: 
> On 12/14/22 15:16, Zack Weinberg via Libc-alpha wrote:
> > 7.1.4p5 (N1570) "A library
> > function shall not directly or indirectly access objects accessible
> > by threads other than the current thread unless the objects are
> > accessed directly or indirectly via the function's arguments."
>
> I don't think this applies here for two reasons though:
> 
> 1) a SIGSEGV would always be acceptable (that's not a valid object),
> and so would an infinite loop

An infinite loop would indeed be acceptable, but I don’t see any
justification for a SIGSEGV as long as the _page tables_ are not being
changed concurrently from another thread.  To put it another way,
given a call `memcmp(a, b, n)`, as long as the address ranges
`[a, a+n)` and `[b, b+n)` remain _readable_ for the duration of the
call, no, SIGSEGV is not an acceptable outcome in my book, no matter
how unstable the contents of those address ranges are.

 > 2) I think that the as-if rule would even allow reads to objects
> accessible by other threads, if they don't affect the result (so they
> are not observable by the calling thread) *and* are not observable by
> those other threads either.  The classic case here is strlen() doing
> aligned word (or larger) reads, even though those reads might trespass
> the NUL terminator.

I’m not entirely comfortable with that logic, because those reads are
forbidden by the _abstract_ machine’s memory model, in which reading
even a single byte beyond the bounds of an explicitly declared array
or malloc() block is UB.  Very few concrete machines can enforce that
rule precisely; the only ones I can think of are the valgrind VM and
_maybe_ CHERI.  But that hasn’t so far stopped the compiler people
from implementing optimizations that assume the concrete machine
_does_ enforce that rule precisely.

> The following code is admittedly a bit contrived but shows the
> pitfalls of reading more than once from the same location:
> 
>   while (*(u32*)s == *(u32*)d) {
>     n-=4, s+=4, d+=4;
>     if (n < 4) goto short;
>   }
> 
>   // we know the loop will stop don't we?
>   while (*s==*d) s++, d++;
>   return *s < *d ? -1 : 1;
> 
> short:
>   while (n && *s == *d) s++, d++, n--;
>   return n ? (*s < *d ? -1 : 1) : 0;

In my book, this falls clearly into “don’t do that then” territory.
It is our responsibility as library implementors to code memcmp so
that it _does not_ access beyond the [a,a+n) [b,b+n) range, _even if_
there is a concurrent mutator.

> There are also compiler issues: it's also hard to ensure that the
> compiler won't decide to read twice for very down-to-Earth instruction
> selection reasons, for example by using a register-and-memory ALU
> operation.  Many mem*/str* routines do not have a single memory write,
> so the compiler has a *lot* of leeway to reorder and rematerialize
> memory accesses.

Again, “don’t do that then”―not “don’t read twice,” but “don’t elide
bounds checks, specifically, based on the assumption that two reads
from the same location will return the same value.”

> For example you could have something like this in a
> memmem():
> 
>   c = *p;
>   ...
>   p += table[c];
> 
> If the compiler changes the second statement to "p += table1[*p]", for
> example to avoid a spill,

that should be _perfectly fine_, because the next line of code is
something like

  if (p > limit) break;

and not

  c = *p;

If the compiler deletes the bounds check, I’m prepared to argue both
that that’s an incorrect optimization, and that if the standard says
it’s fine then the _standard_ is wrong.

zw

  reply	other threads:[~2022-12-29  7:11 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-13 18:20 Narayanan Iyer
2022-12-13 18:31 ` Andrew Pinski
2022-12-13 18:39   ` Narayanan Iyer
2022-12-13 18:39 ` Cristian Rodríguez
2022-12-13 19:08 ` Noah Goldstein
2022-12-13 19:13   ` Narayanan Iyer
2022-12-13 19:25     ` Noah Goldstein
2022-12-13 20:56       ` Zack Weinberg
2022-12-13 23:29         ` Carlos O'Donell
2022-12-14  2:28           ` Zack Weinberg
2022-12-14  4:16             ` Carlos O'Donell
2022-12-14 14:16               ` Zack Weinberg
2022-12-14 17:36                 ` Paolo Bonzini
2022-12-29  7:09                   ` Zack Weinberg [this message]
2022-12-29 19:32               ` “Undefined behavior” considered harmful (was Re: Bug 29863 - Segmentation fault in memcmp-sse2.S…) Zack Weinberg
2022-12-29 22:20                 ` Andreas Schwab
2022-12-30 13:28                   ` Florian Weimer
2022-12-30 15:09                 ` Florian Weimer
2022-12-13 22:52       ` Bug 29863 - Segmentation fault vs invalid results, memory models, and control/data dependencies Carlos O'Donell
2022-12-14 12:03         ` Florian Weimer
2022-12-13 21:20   ` Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change Florian Weimer
2022-12-13 22:59     ` Noah Goldstein
2022-12-14 12:06       ` Florian Weimer
     [not found] <PAWPR08MB89825887E12FF900540365F483E09@PAWPR08MB8982.eurprd08.prod.outlook.com>
     [not found] ` <PAWPR08MB898260DA844D695EA70ED3E483E09@PAWPR08MB8982.eurprd08.prod.outlook.com>
2022-12-14 21:56   ` Wilco Dijkstra
2022-12-29  7:21     ` Zack Weinberg
2022-12-29 20:02       ` Alejandro Colomar
2022-12-30 18:02         ` Joseph Myers
2023-03-20 15:40           ` Zack Weinberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ypiksfgyofdr.wl-zack@owlfolio.org \
    --to=zack@owlfolio.org \
    --cc=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).