From: Paolo Bonzini
To: Zack Weinberg, Carlos O'Donell
Cc: GNU libc development
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
Date: Wed, 14 Dec 2022 18:36:44 +0100
Message-ID: <7d2b4dd3-0583-3f2b-95ee-9538615386ac@redhat.com>
In-Reply-To: <663fab35-0f08-4b36-a653-0145c36ca7f8@app.fastmail.com>

On 12/14/22 15:16, Zack Weinberg via Libc-alpha wrote:
>> The standards are in no way prescriptive in saying that memcmp
>> shall not read or write to memory outside of the input domain.
>
> ... is (as I read it) contradicted by 7.1.4p5 (N1570) "A library
> function shall not directly or indirectly access objects accessible
> by threads other than the current thread unless the objects are
> accessed directly or indirectly via the function's arguments." There
> is more wiggle room in this wording than I'd ideally like, but since
> memcmp has no way of knowing whether any particular piece of data
> outside the ranges supplied as arguments is "accessible by threads
> other than the current thread", it needs to be conservative and not
> touch any of it.

I don't think this applies here for two reasons though:

1) a SIGSEGV would always be acceptable (that's not a valid object),
and so would an infinite loop

2) I think that the as-if rule would even allow reads to objects
accessible by other threads, if they don't affect the result (so they
are not observable by the calling thread) *and* are not observable by
those other threads either. The classic case here is strlen() doing
aligned word (or larger) reads, even though those reads might trespass
the NUL terminator.

Promising any kind of behavior when a data race happens involving the
str*/mem* functions is harder than it seems. As soon as the functions
read a byte more than once, the view of memory that they operate on
does not even obey causality.

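To make the point about reading a byte more than once concrete, here is
a minimal, single-threaded sketch. Everything in it is hypothetical and
not taken from glibc: the names TABLE_SIZE, NO_INDEX, mutator_fn,
set_to_200, index_double_read and index_single_read are made up for
illustration. The "concurrent writer" is simulated by a callback that
runs between the check and the use, so the outcome is reproducible
without threads, and the pointer is volatile only so that the compiler
performs exactly the reads that are written.

```c
#include <stddef.h>

#define TABLE_SIZE 4
#define NO_INDEX ((size_t)-1)   /* returned when the bounds check rejects */

/* Hypothetical stand-in for another thread writing to the shared byte. */
typedef void (*mutator_fn)(volatile unsigned char *p);

static void set_to_200(volatile unsigned char *p)
{
    *p = 200;
}

/* Checks the shared byte, then reads it AGAIN to obtain the index.
   The second read can see a value that never passed the check. */
static size_t index_double_read(volatile unsigned char *shared,
                                mutator_fn mutate)
{
    if (*shared >= TABLE_SIZE)      /* first read: bounds check */
        return NO_INDEX;
    mutate(shared);                 /* the simulated concurrent write */
    return *shared;                 /* second read: value actually used */
}

/* Reads the shared byte once into a local; check and use agree. */
static size_t index_single_read(volatile unsigned char *shared,
                                mutator_fn mutate)
{
    unsigned char c = *shared;      /* the only read */
    if (c >= TABLE_SIZE)
        return NO_INDEX;
    mutate(shared);                 /* too late to affect c */
    return c;
}
```

Starting from a byte holding 3, index_double_read() hands back 200,
which would index far outside a 4-entry table, while
index_single_read() hands back 3. Note that the single-read version is
only safe here because of the volatile qualifier; with a plain pointer
the compiler is free to turn it back into the double-read version.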
The following code is admittedly a bit contrived but shows the pitfalls
of reading more than once from the same location:

    while (*(u32*)s == *(u32*)d) {
        n -= 4, s += 4, d += 4;
        if (n < 4)
            goto tail;
    }
    // we know the loop will stop, don't we?
    while (*s == *d)
        s++, d++;
    return *s < *d ? -1 : 1;

    tail:
    while (n && *s == *d)
        s++, d++, n--;
    return n ? (*s < *d ? -1 : 1) : 0;

... and you could access beyond the [s,s+n) and [d,d+n) ranges of the
original arguments if a concurrent mutator causes the second while loop
to go off the cliff.

There are also compiler issues: it's hard to ensure that the compiler
won't decide to read twice for very down-to-Earth instruction selection
reasons, for example by using a register-and-memory ALU operation.
Many mem*/str* routines do not have a single memory write, so the
compiler has a *lot* of leeway to reorder and rematerialize memory
accesses. For example you could have something like this in a
memmem():

    c = *p;
    ...
    p += table[c];

If the compiler changes the second statement to "p += table[*p]", for
example to avoid a spill, concurrent mutation can result in
out-of-bounds accesses or other kinds of UB.

The standard certainly didn't require that str*/mem* be written in
assembly, or that they do all of their accesses as atomic loads or
perhaps (argh!) volatile.

Paolo