Date: Thu, 29 Dec 2022 02:09:20 -0500
From: Zack Weinberg
To: Paolo Bonzini
Cc: Carlos O'Donell, GNU libc development
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
In-Reply-To: <7d2b4dd3-0583-3f2b-95ee-9538615386ac@redhat.com>

On Wed, 14
Dec 2022 12:36:44 -0500, Paolo Bonzini wrote:
> On 12/14/22 15:16, Zack Weinberg via Libc-alpha wrote:
> > 7.1.4p5 (N1570) "A library
> > function shall not directly or indirectly access objects accessible
> > by threads other than the current thread unless the objects are
> > accessed directly or indirectly via the function's arguments."
>
> I don't think this applies here for two reasons though:
>
> 1) a SIGSEGV would always be acceptable (that's not a valid object),
> and so would an infinite loop

An infinite loop would indeed be acceptable, but I don’t see any
justification for a SIGSEGV as long as the _page tables_ are not being
changed concurrently from another thread.  To put it another way,
given a call `memcmp(a, b, n)`, as long as the address ranges
`[a, a+n)` and `[b, b+n)` remain _readable_ for the duration of the
call, no, SIGSEGV is not an acceptable outcome in my book, no matter
how unstable the contents of those address ranges are.

> 2) I think that the as-if rule would even allow reads to objects
> accessible by other threads, if they don't affect the result (so they
> are not observable by the calling thread) *and* are not observable by
> those other threads either.  The classic case here is strlen() doing
> aligned word (or larger) reads, even though those reads might trespass
> the NUL terminator.

I’m not entirely comfortable with that logic, because those reads are
forbidden by the _abstract_ machine’s memory model, in which reading
even a single byte beyond the bounds of an explicitly declared array
or malloc() block is UB.  Very few concrete machines can enforce that
rule precisely; the only ones I can think of are the valgrind VM and
_maybe_ CHERI.  But that hasn’t so far stopped the compiler people
from implementing optimizations that assume the concrete machine
_does_ enforce that rule precisely.
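
To make the `[a, a+n)` / `[b, b+n)` requirement concrete, here is a
minimal sketch of a comparison whose loop is bounded by `n` alone and
which reads each byte once into a local before deciding anything.  The
name `bounded_memcmp` is hypothetical and this is not glibc's code.
The `volatile` qualifier is an assumption on my part, used only to
discourage the compiler from re-reading memory; it does not make a
racing access well-defined in ISO C.

```c
#include <stddef.h>

/* Hypothetical sketch, not glibc's memcmp.  Compares n bytes as
   unsigned char, like memcmp, and returns -1, 0, or 1. */
int bounded_memcmp(const void *a, const void *b, size_t n)
{
    /* volatile: each s[i] / d[i] below is one load (an assumption
       about intent, not a fix for the formal data race). */
    const volatile unsigned char *s = a;
    const volatile unsigned char *d = b;

    /* Termination depends only on i and n, never on buffer contents,
       so no access falls outside [a, a+n) or [b, b+n). */
    for (size_t i = 0; i < n; i++) {
        unsigned char cs = s[i];   /* read once into a local */
        unsigned char cd = d[i];   /* read once into a local */
        if (cs != cd)
            return cs < cd ? -1 : 1;   /* decided from the locals only */
    }
    return 0;
}
```

With a concurrent writer the return value may be meaningless, but as
long as both ranges stay readable the function cannot fault or run off
the end of either buffer.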

> The following code is admittedly a bit contrived but shows the
> pitfalls of reading more than once from the same location:
>
>     while (*(u32*)s == *(u32*)d) {
>         n-=4, s+=4, d+=4;
>         if (n < 4) goto short;
>     }
>
>     // we know the loop will stop don't we?
>     while (*s==*d) s++, d++;
>     return *s < *d ? -1 : 1;
>
> short:
>     while (n && *s == *d) s++, d++, n--;
>     return n ? (*s < *d ? -1 : 1) : 0;

In my book, this falls clearly into “don’t do that then” territory.
It is our responsibility as library implementors to code memcmp so
that it _does not_ access beyond the [a,a+n) [b,b+n) range, _even if_
there is a concurrent mutator.

> There are also compiler issues: it's also hard to ensure that the
> compiler won't decide to read twice for very down-to-Earth instruction
> selection reasons, for example by using a register-and-memory ALU
> operation.  Many mem*/str* routines do not have a single memory write,
> so the compiler has a *lot* of leeway to reorder and rematerialize
> memory accesses.

Again, “don’t do that then” -- not “don’t read twice,” but “don’t
elide bounds checks, specifically, based on the assumption that two
reads from the same location will return the same value.”

> For example you could have something like this in a
> memmem():
>
>     c = *p;
>     ...
>     p += table[c];
>
> If the compiler changes the second statement to "p += table1[*p]", for
> example to avoid a spill,

that should be _perfectly fine_, because the next line of code is
something like

    if (p > limit) break;

and not

    c = *p;

If the compiler deletes the bounds check, I’m prepared to argue both
that that’s an incorrect optimization, and that if the standard says
it’s fine then the _standard_ is wrong.

zw
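
A minimal sketch of the skip-table pattern discussed above, with the
bounds check applied to the computed step rather than inferred from
anything previously read.  The function `skip_scan`, its parameters,
and the convention that a table entry of 0 means "stop here" are all
hypothetical, chosen only for illustration; real memmem()
implementations are structured differently.  As before, `volatile` is
an assumption used to express "one load per iteration" and does not
make a racing read well-defined.

```c
#include <stddef.h>

/* Hypothetical sketch.  Walks buf[0..len) using a skip table and
   returns the offset of the first byte whose table entry is 0, or len
   if the scan reaches the end of the buffer. */
size_t skip_scan(const unsigned char *buf, size_t len,
                 const unsigned char table[256])
{
    const volatile unsigned char *p = buf;
    size_t pos = 0;

    while (pos < len) {                /* check before every load */
        unsigned char c = p[pos];      /* the only read of this byte */
        size_t step = table[c];
        if (step == 0)
            break;                     /* hypothetical stop marker */
        if (step > len - pos) {        /* explicit check on the step */
            pos = len;
            break;
        }
        pos += step;
    }
    return pos;
}
```

Whether `c` comes from the first load or from a re-read, the next
access is still guarded by `pos < len`, so a changed byte can alter
the answer but cannot move the scan outside the buffer.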