public inbox for libc-alpha@sourceware.org
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Zack Weinberg <zack@owlfolio.org>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: Carlos O'Donell <carlos@redhat.com>,
	'GNU C Library' <libc-alpha@sourceware.org>
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
Date: Thu, 29 Dec 2022 21:02:06 +0100
Message-ID: <2a6f6912-592a-b82b-0efb-ea985dea2548@gmail.com>
In-Reply-To: <ypikr0wioet2.wl-zack@owlfolio.org>

Hi Zack,

On 12/29/22 08:21, Zack Weinberg via Libc-alpha wrote:
> On Wed, 14 Dec 2022 16:56:28 -0500, Wilco Dijkstra wrote:
>> I'd expect that mem* functions will never read outside their bounds
>> since the bounds are explicitly defined by the arguments, not by the
>> data. So that should be easy to guarantee.
> 
> I concur.
> 
>> For the str* functions it may be harder since the data itself
>> defines when to stop reading.  So if an implementation uses multiple
>> accesses to the same address, you could potentially mistake the end
>> of a string (eg. first one detects a special case, while the 2nd
>> then verifies it).
> 
> I also concur here.
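The "multiple accesses to the same address" hazard can be sketched as follows.  This is only an illustration, not glibc's implementation: strcmp_single_load() is a hypothetical name, and the volatile qualifier is used merely to discourage the compiler from re-loading a byte.  It does not make a concurrent modification defined behavior in ISO C terms.

```c
#include <stddef.h>

/*
 * Hypothetical helper: each byte is loaded exactly once per iteration,
 * so the end-of-string test and the ordering test see the same value.
 * Bytes are compared as unsigned char, as strcmp(3) specifies.
 */
static int
strcmp_single_load(const char *s1, const char *s2)
{
	const volatile unsigned char *p1 = (const volatile unsigned char *) s1;
	const volatile unsigned char *p2 = (const volatile unsigned char *) s2;

	for (;; p1++, p2++) {
		unsigned char c1 = *p1;	/* the only load from *p1 */
		unsigned char c2 = *p2;	/* the only load from *p2 */

		if (c1 != c2)
			return (c1 > c2) - (c1 < c2);
		if (c1 == '\0')
			return 0;
	}
}
```

A single load per byte removes the "first access detects, second access verifies" mismatch, but it still cannot stop the scan from running on if the terminator is removed before the loop reaches it.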
>   
>> Still, I wouldn't expect totally random memory accesses even in this
>> case - you would read beyond the end of a string if the string end
>> is changed concurrently.
> 
> We may run into a problem where it’s difficult to _state_ the limits
> of the misbehavior, just because the C standard doesn’t itself try to
> put limits on misbehavior in the face of an incorrect program, so we
> don’t have any language for it (which I would argue is a bug in the
> standard, see the detailed reply to Carlos that I’ll be writing, er,
> tomorrow).

The standard already makes some kind of guarantee, in that it distinguishes
between bounded UB and critical UB (in the optional Annex L).  The problem is
that once you've hit bounded UB, it's hard not to turn it into critical UB in
the following lines of code.  Even a compiler assumption that would otherwise
be fine might result in critical UB.

> 
> Still, taking strcmp(a, b) for example, and assuming WLOG a flat
> address space in which a < b, it should be possible to guarantee
> 
>   - no accesses to any byte in the range [0, a) ever
>   - if an oracle for strlen(), capable of executing in zero cycles,
>     would return the same value for strlen(a) throughout the execution
>     of strcmp(), then no accesses to any byte in the range
>     [a+strlen(a), b)
>   - if an oracle for strlen(), capable of executing in zero cycles,
>     would return the same value for strlen(b) throughout the execution
>     of strcmp(), then no accesses to any byte in the range
>     [b+strlen(b), ADDR_MAX)
>   - however, if the oracle strlen() values _do_ change during the
>     execution of strcmp(), then accesses to bytes in the latter two
>     ranges are possible
>   - a SIGSEGV is permissible if and only if there was at least one
>     point during execution at which a call to the oracle strlen() would
>     have triggered a SIGSEGV

Do you mean that this would be an invalid implementation of strcmp(3)?

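int
strcmp(const char *s1, const char *s2)
{
	/* Walk both strings until both have reached their terminating NUL. */
	for (; *s1 != '\0' || *s2 != '\0'; s1++, s2++) {
		if (*s1 < *s2)
			return -1;
		if (*s1 > *s2)
			return 1;
	}
	return 0;	/* Both strings ended without a difference. */
}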

Okay, it's probably not the fastest one, but it's simple.  This one would
SIGSEGV in the following case:

Another thread might insert a NUL at the beginning of each string (after the
loop has passed over it), and in the next cycle remove the previously
terminating NUL from both strings.  The loop would then keep running past the
end of the strings until it crashes, although a strlen() oracle would never
have crashed at any point.
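The interleaving can be replayed deterministically in a single thread.  The sketch below is only a model: mutate(), scan_with_mutation(), demo() and BUF_SIZE are hypothetical names, the "other thread" is a function called after each loop step, and the scan is bounded by an artificial limit so that the demo itself never leaves its buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 16 };	/* hypothetical size of each buffer */

/* Stand-in for the other thread; called after each loop step. */
static void
mutate(char *a, char *b, size_t i, size_t orig_len)
{
	if (i == 1) {
		/* The loop is past index 0: truncate both strings there. */
		a[0] = '\0';
		b[0] = '\0';
	} else if (i == 2) {
		/* One step later: overwrite the original terminators. */
		a[orig_len] = 'x';
		b[orig_len] = 'x';
	}
}

/*
 * Same loop shape as the simple strcmp, but bounded by limit.
 * Returns the number of positions visited.
 */
static size_t
scan_with_mutation(char *a, char *b, size_t limit, size_t orig_len)
{
	size_t i = 0;

	assert(orig_len >= 2 && orig_len < limit);
	while (i < limit && (a[i] != '\0' || b[i] != '\0')) {
		if (a[i] != b[i])
			break;
		i++;
		mutate(a, b, i, orig_len);
	}
	return i;
}

/* Two equal strings "abc", each followed by non-NUL filler. */
static size_t
demo(size_t *oracle_len)
{
	char a[BUF_SIZE], b[BUF_SIZE];

	memset(a, 'x', sizeof(a));
	memset(b, 'x', sizeof(b));
	memcpy(a, "abc", 4);
	memcpy(b, "abc", 4);

	size_t visited = scan_with_mutation(a, b, BUF_SIZE, 3);
	*oracle_len = strlen(a);	/* a[0] is NUL by now */
	return visited;
}
```

In this model the scan only stops at the artificial limit of 16 positions, while strlen(a) was 3 before the mutations and is 0 afterwards.  Without the limit, the loop would continue into whatever memory follows the buffers.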

Cheers,

Alex

> 
> Ne?
> 
>> Finally it's worth mentioning that nscd does the exact same thing:
>> it uses memcmp and non-atomic accesses on shared data that is being
>> modified by other threads. It looks totally broken, especially with
>> weaker memory ordering, however this kind of insanity may actually
>> be a common design pattern...
> 
> I don’t want to hold up nscd as an example of quality design or
> implementation, but yeah, I share your concern re “may actually be a
> common design pattern”…
> 
> zw

