Date: Thu, 29 Dec 2022 14:32:50 -0500
From: Zack Weinberg
To: Carlos O'Donell
Cc: GNU libc development
Subject: “Undefined behavior” considered harmful (was Re: Bug 29863 - Segmentation fault in memcmp-sse2.S…)

The original issue with memcmp() seems to have been resolved, but I’d like
to start a broader discussion about the C library’s responsibilities in
the face of a program that’s, to some degree, incorrect.

On Tue, 13 Dec 2022 23:16:19 -0500, Carlos O'Donell wrote:
> We are talking about the C language, and when you write
> "unspecified" in that context it means the language *does* have
> something to say about the behaviour but does not pick one or other
> of the available behaviours. This is not the case, the language very
> clearly says this is undefined behaviour, so it says nothing about
> what should happen.
>
> My understanding was that you were trying to ascribe more
> determinism to the operation of memcpy under the presence of data
> races than could be granted by UB.
>
> I strongly disagree to ascribing more determinism than UB.

Everything I’m about to say stems from the following three premises:

1. The C standard uses “undefined behavior” far more liberally than it
   ought to. In many cases of existing UB the committee could define
   the behavior (possibly as implementation-defined or unspecified)
   without any actual negative consequences. It seems the committee
   *is* moving in this direction as of C2x, for instance by dropping
   the allowances for non-two's-complement signed arithmetic, but they
   could and should go a lot farther down that road.

2. The remaining cases of UB are those where we still want to say that
   the program is *incorrect* if it does these things, but we don’t
   want to require the compiler to diagnose the incorrectness (usually
   because detection would be intractable). *Even in these cases*, the
   current concept of “undefined behavior,” licensing the
   implementation to do *anything*, is troublesome. The standard
   should replace the concept entirely, with something analogous to,
   say, the ARM ARM’s concept of “constrained unpredictable” behavior:
   A program that does X is incorrect, and doing X will have
   unpredictable consequences, BUT there are concrete limits on what
   those consequences can be.

3. Implementations can and should work in advance of the C committee
   on restricting the consequences of UB as outlined in (1) and (2).
   That is, the present state of the C standard should not stop _us_
   from specifying the behavior of GNU libc in cases where clause 7
   leaves behavior “undefined”.

The earlier thread provides a concrete example of a type 2 change that
I think is desirable: instead of 5.1.2.4p25 simply saying that if a
data race occurs, the behavior is undefined, it should say that if a
data race occurs, all calculations data- or control-dependent on the
conflicting expressions produce unpredictable results, where
“unpredictable result” is approximately the same thing as the current
“unspecified value” but might wind up needing to have a slightly
different definition. [Notice that this is already a significant
constraint on what actually happens, since unpredictability can no
longer propagate backwards in time.]

Then, also, 7.1.4 should say that if the set of memory locations that
are potentially accessed by a library function can be specified
without reference to the contents of any of those memory locations
(true for memXXX, not for strXXX), then a data race on the contents of
any of those memory locations cannot expand the set.

> Could you expand on why you think this is a "natural" guarantee and
> from what that derives from?

I am envisioning (pointer, length) 2-tuples as object capabilities. By
calling memcmp(a, b, n), the caller grants memcmp access to the
address ranges [a, a+n) and [b, b+n), but not a single byte more. Some
concrete machines (e.g. the valgrind and ASAN VMs, and to a lesser
extent CHERI) can actually enforce the limits of that grant. Even if
the hardware cannot enforce the limits, the callee should honor them.
The contents, and the stability of the contents, of those address
ranges *cannot* affect the limits of the grant, because they appear
nowhere in the expressions that define the limits.
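
To make the capability reading concrete, here is a minimal sketch of a
byte-wise comparison whose access set is fixed by (a, n) and (b, n)
alone. The name bounded_memcmp is hypothetical and this is not glibc's
memcmp. The volatile qualification is my own assumption, used only to
keep the compiler from reading a byte twice; under the current
standard a concurrent writer still makes this a data race.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference comparison, not glibc's implementation.
   The caller grants access to [a, a+n) and [b, b+n) and nothing else. */
int bounded_memcmp(const void *a, const void *b, size_t n)
{
    /* A non-empty grant must name real storage. */
    assert(n == 0 || (a != NULL && b != NULL));

    /* volatile (an assumption of this sketch): each byte is loaded
       at most once, so a changing buffer cannot cause a re-read. */
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;

    /* The loop bound depends only on n, never on buffer contents,
       so no index outside the grant can be formed. */
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = pa[i];
        unsigned char cb = pb[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return 0;
}
```

If another thread modifies either buffer during the call, the return
value of this sketch is unpredictable, but the index i never reaches
n, so the race cannot enlarge the set of bytes that are read. A
strXXX-style loop that stops at a NUL byte has no such property,
because its bound is read from the racing memory itself.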

> If the application does not follow the language requirements then it
> is UB.

As described above, arguments from what is or is not currently UB have
no weight from my perspective.

> Why would we as library authors enter into an API contract that is
> *stronger* than the language guarantees?

Because we believe that the language guarantees are too weak and need
to be strengthened, and the standards committees will want to see that
strengthening play out as “existing practice” before they make any
changes.

zw