From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2a6f6912-592a-b82b-0efb-ea985dea2548@gmail.com>
Date: Thu, 29 Dec 2022 21:02:06 +0100
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
From: Alejandro Colomar
To: Zack Weinberg, Wilco Dijkstra
Cc: Carlos O'Donell, 'GNU C Library'

Hi Zack,

On 12/29/22 08:21, Zack Weinberg via Libc-alpha wrote:
> On Wed, 14 Dec 2022 16:56:28 -0500, Wilco Dijkstra wrote:
>> I'd expect that mem* functions will never read outside their bounds
>> since the bounds are explicitly defined by the arguments, not by the
>> data. So that should be easy to guarantee.
>
> I concur.
>
>> For the str* functions it may be harder since the data itself
>> defines when to stop reading.
>> So if an implementation uses multiple
>> accesses to the same address, you could potentially mistake the end
>> of a string (eg. first one detects a special case, while the 2nd
>> then verifies it).
>
> I also concur here.
>
>> Still, I wouldn't expect totally random memory accesses even in this
>> case - you would read beyond the end of a string if the string end
>> is changed concurrently.
>
> We may run into a problem where it’s difficult to _state_ the limits
> of the misbehavior, just because the C standard doesn’t itself try to
> put limits on misbehavior in the face of an incorrect program, so we
> don’t have any language for it (which I would argue is a bug in the
> standard, see the detailed reply to Carlos that I’ll be writing, er,
> tomorrow).

The standard already makes some kind of guarantee, when it distinguishes
between bounded UB and critical UB (C11 Annex L).  The problem is that
once you've hit bounded UB, it's hard to keep it from turning into
critical UB in the following lines of code.  Even a compiler assumption
that would otherwise be fine might result in critical UB.
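To make Wilco's quoted concern about "multiple accesses to the same address" concrete, here is a minimal sketch of a comparison loop that loads each byte exactly once per iteration, so the end-of-string check and the ordering check always see the same value.  The name strcmp_single_load is hypothetical and this is not glibc code.  The volatile qualifier is only an assumption-laden way to stop the compiler from re-reading a byte; it does not make concurrent modification well-defined.

```c
#include <stddef.h>

/* Hypothetical helper, not glibc code: compare two strings while
   loading each byte exactly once per iteration.  volatile only keeps
   the compiler from re-reading a byte; a data race is still UB.  */
static int
strcmp_single_load(const char *s1, const char *s2)
{
    const volatile unsigned char *p1 = (const volatile unsigned char *) s1;
    const volatile unsigned char *p2 = (const volatile unsigned char *) s2;

    for (;;) {
        /* One load per string per iteration; both tests below use
           these local copies, never the memory itself.  */
        unsigned char c1 = *p1++;
        unsigned char c2 = *p2++;

        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
}
```

Even with single loads, this loop can still walk past the original end of a string if the terminator is removed concurrently; it only rules out the "first access detects, second access verifies" mismatch.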
>
> Still, taking strcmp(a, b) for example, and assuming WLOG a flat
> address space in which a < b, it should be possible to guarantee
>
> - no accesses to any byte in the range [0, a) ever
> - if an oracle for strlen(), capable of executing in zero cycles,
>   would return the same value for strlen(a) throughout the execution
>   of strcmp(), then no accesses to any byte in the range
>   [a+strlen(a), b)
> - if an oracle for strlen(), capable of executing in zero cycles,
>   would return the same value for strlen(b) throughout the execution
>   of strcmp(), then no accesses to any byte in the range
>   [b+strlen(b), ADDR_MAX)
> - however, if the oracle strlen() values _do_ change during the
>   execution of strcmp(), then accesses to bytes in the latter two
>   ranges are possible
> - a SIGSEGV is permissible if and only if there was at least one
>   point during execution at which a call to the oracle strlen() would
>   have triggered a SIGSEGV

Do you mean that this would be an invalid implementation of strcmp(3)?:

int
strcmp(const char *s1, const char *s2)
{
	/* Stop only when both strings have reached their terminator. */
	for (; *s1 != '\0' || *s2 != '\0'; s1++, s2++) {
		if (*s1 < *s2)
			return -1;
		if (*s1 > *s2)
			return 1;
	}
	return 0;
}

Okay, it's probably not the fastest one, but it's simple.  This one
would SIGSEGV in the following case:

Another thread might insert a NUL at the beginning of each string
(after the loop has passed over it), and in the next iteration remove
the previously terminating NUL from the strings.  The loop would then
keep running past the end of both strings until it crashes.

Cheers,

Alex

>
> Ne?
>
>> Finally it's worth mentioning that nscd does the exact same thing:
>> it uses memcmp and non-atomic accesses on shared data that is being
>> modified by other threads. It looks totally broken, especially with
>> weaker memory ordering, however this kind of insanity may actually
>> be a common design pattern...
>
> I don’t want to hold up nscd as an example of quality design or
> implementation, but yeah, I share your concern re “may actually be a
> common design pattern”…
>
> zw
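
The interleaving described above for the simple strcmp() loop (a NUL inserted at the start of each string after the loop has passed it, then the old terminator removed one iteration later) can be replayed deterministically in a single thread.  This is only a sketch: racing_writer, replay_compare, demo and CAP are hypothetical names, the hook stands in for the other thread, and the CAP bound stands in for the crash so that the example never actually reads out of bounds.

```c
#include <stddef.h>
#include <string.h>

enum { CAP = 8 };  /* hypothetical buffer size; reaching it models the crash */

/* Hook called between iterations to replay the other thread's writes. */
typedef void (*step_hook)(size_t next, char *s1, char *s2);

static void
racing_writer(size_t next, char *s1, char *s2)
{
    if (next == 1)
        s1[0] = s2[0] = '\0';   /* insert a NUL behind the loop */
    else if (next == 2)
        s1[2] = s2[2] = 'x';    /* then remove the old terminator */
}

/* Same termination test as the simple loop, but bounded by CAP.
   Returns the index at which the walk stopped.  */
static size_t
replay_compare(char *s1, char *s2, step_hook hook)
{
    size_t i = 0;

    while (i < CAP && (s1[i] != '\0' || s2[i] != '\0')) {
        if (s1[i] != s2[i])
            break;
        i++;
        if (hook != NULL)
            hook(i, s1, s2);
    }
    return i;
}

/* Build two copies of "ab" followed by non-NUL filler and walk them. */
static size_t
demo(step_hook hook)
{
    char a[CAP], b[CAP];

    memset(a, 'x', sizeof a);
    memset(b, 'x', sizeof b);
    memcpy(a, "ab", 3);
    memcpy(b, "ab", 3);
    return replay_compare(a, b, hook);
}
```

demo(NULL) stops at index 2, the original terminator.  demo(racing_writer) runs all the way to CAP, even though at every step each buffer held a NUL at index 0 or index 2, so an oracle strlen() would never have faulted.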