From: Carlos O'Donell
Organization: Red Hat
Date: Tue, 13 Dec 2022 23:16:19 -0500
Subject: Re: Bug 29863 - Segmentation fault in memcmp-sse2.S if memory contents can concurrently change
To: Zack Weinberg
Cc: GNU libc development

On 12/13/22 21:28, Zack Weinberg wrote:
> Carlos O'Donell writes:
>
>> On 12/13/22 15:56, Zack Weinberg via Libc-alpha wrote:
>>> I think it would be reasonable for glibc to make the following weaker guarantee:
>>> for any call `memcmp(a, b, n)`, if the data pointed to by `a` and/or `b` is being
>>> concurrently modified, the return value is unspecified but *not* indeterminate.
>>> Also, memcmp will never access memory outside the bounds [a, a+n) and [b, b+n),
>>> no matter what.
>>
>> I disagree strongly.
>
> I’m really surprised to hear you say that.  To me this is a natural
> guarantee for memcmp (in fact, for *all* of the mem* functions) to
> make, to the point where my reaction was *of course* this is our bug!

Please let me expand on my answer.

We are talking about the C language, and when you write "unspecified" in
that context it means the language *does* have something to say about the
behaviour but does not pick one or another of the available behaviours.
That is not the case here: the language very clearly says this is
undefined behaviour, so it says nothing about what should happen.

My understanding was that you were trying to ascribe more determinism to
the operation of memcmp in the presence of data races than UB can grant.

I strongly disagree with ascribing more determinism than UB.

Could you expand on why you think this is a "natural" guarantee and what
it derives from?

Is it that you view the input domain to the function as the "natural"
bytes upon which the function is allowed to operate?

>> These are advanced lockless techniques.
>> They should be hidden behind new APIs that provide the required guarantees.
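
As a rough illustration of what such a new API could look like, here is a
minimal C11 sketch. The name `racy_memcmp`, and the requirement that
callers declare their shared buffers as arrays of `_Atomic unsigned char`,
are assumptions made for this example; nothing like it exists in glibc or
ISO C. Every byte is read with a relaxed atomic load, so concurrent
*atomic* stores to the buffers are not data races in the C11 sense. The
result may reflect a mix of old and new bytes, but the behaviour stays
defined, and the loop never reads outside [a, a+n) and [b, b+n).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical API for illustration only; not a glibc or ISO C interface.
   Compares n bytes of a and b, returning <0, 0 or >0 like memcmp.
   Assumption: all writers also use atomic stores on these buffers.  */
int racy_memcmp(_Atomic unsigned char *a, _Atomic unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Each byte is loaded exactly once, and only at indices below n.  */
        unsigned char x = atomic_load_explicit(&a[i], memory_order_relaxed);
        unsigned char y = atomic_load_explicit(&b[i], memory_order_relaxed);
        if (x != y)
            return x < y ? -1 : 1;
    }
    return 0;
}
```

A byte-at-a-time loop like this gives up the vectorised speed of the real
memcmp; that cost is part of why such guarantees arguably belong in a
separate, opt-in interface.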
>
> That the application was doing “advanced lockless techniques” is, to me,

If the application does not follow the language requirements then it is UB.

There are some advanced lockless techniques that do not follow the C
memory model.

My point is that such techniques and their requirements should be well
understood and implemented by APIs that provide the required guarantees,
not by the existing C string and memory APIs.

> not relevant in the slightest.  The important thing to me is that the
> memory regions `memcmp` is allowed to access are wholly specified by the
> mathematical values of its three arguments, and *not* by the data
> pointed to by the first two arguments.  Nothing any other thread does
> can change the fact that memcmp has no business touching bytes at
> addresses outside [a, a+n) and [b, b+n).

I strongly disagree (though the quiet theoretician in me agrees with you).

The standards are in no way prescriptive in saying that memcmp shall not
read or write to memory outside of the input domain.

> Why do you think it is important for the C library to have latitude to
> break that aspect of the mem* functions’ API contract?  Even if only
> under exotic circumstances?

Why do you think an application programmer has the right to ignore the
requirements of the language and expect the runtime to operate as intended?

Why would we as library authors enter into an API contract that is
*stronger* than the language guarantees?

I am empathetic to the yottadb developers, but we need new APIs for these
requirements.

I will still review Noah's patch here:
https://sourceware.org/pipermail/libc-alpha/2022-December/144058.html

I do not have a sustained objection to modifying things to "just work" (tm)
because with UB you could do anything, and it doesn't impact the normal
use cases.

That is to say, for a specific patch and a specific change I can agree,
but not to the larger generalizations about memcmp.

-- 
Cheers,
Carlos.