From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mailrelay.tugraz.at (mailrelay.tugraz.at [129.27.2.202])
 by sourceware.org (Postfix) with ESMTPS id 6DD433857C45
 for ; Fri, 24 Feb 2023 16:37:47 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6DD433857C45
Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=tugraz.at
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tugraz.at
Received: from vra-173-101.tugraz.at (vra-173-101.tugraz.at [129.27.173.101])
 by mailrelay.tugraz.at (Postfix) with ESMTPSA id 4PNbCn0P3fz1LM0W;
 Fri, 24 Feb 2023 17:37:40 +0100 (CET)
DKIM-Filter: OpenDKIM Filter v2.11.0 mailrelay.tugraz.at 4PNbCn0P3fz1LM0W
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tugraz.at;
 s=mailrelay; t=1677256663;
 bh=h8nUQZQLZM6QP36i79AHQONhG/jQ4xKnT+xqgN6d1E8=;
 h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
 b=hj7nQQD+SaJ70YY4QG8PZ2EQxe3QnCn1ybKC+Uhk24EuebiYgHLCRcR+QgE3EXmc7
 XsFSf7F8IJrC8U6dr0zJcjBWRvEBmQIJJ//KiFodrjOuX48XcqvRJZQeznOGIcCGoS
 mP9khMLlCzEA8Tu7oIOEH1pGvVpDsjieluLmKrtM=
Message-ID: <499fd21da32ccd8d6207f004c39a2261acbb49ea.camel@tugraz.at>
Subject: Re: Missed warning (-Wuse-after-free)
From: Martin Uecker
To: "Serge E. Hallyn"
Cc: Alex Colomar, GCC, Iker Pedrosa, Florian Weimer, Paul Eggert,
 Michael Kerrisk, =?UTF-8?Q?J=E2=82=91=E2=82=99=E2=82=9B?= Gustedt,
 David Malcolm, Sam James, Jonathan Wakely
Date: Fri, 24 Feb 2023 17:37:40 +0100
In-Reply-To: <20230224160145.GA374298@mail.hallyn.com>
References: <38e7e994a81d2a18666404dbaeb556f3508a6bd6.camel@redhat.com>
 <23d3a3ff-adad-ac2e-92a6-4e19f4093143@gmail.com>
 <2148ef80dee2a034ee531d662fc8709d26159ec5.camel@tugraz.at>
 <0049730a-e28c-0e0f-8d92-695395f1ec21@gmail.com>
 <6edeb3c197c327c1c6639d322c53ec6056039a33.camel@tugraz.at>
 <20230224012114.GA360078@mail.hallyn.com>
 <9d34a5da747601b0d9a3512cddfaf113726620ee.camel@tugraz.at>
 <20230224160145.GA374298@mail.hallyn.com>
Content-Type: text/plain; charset="UTF-8"
User-Agent: Evolution 3.38.3-1+deb11u1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-TUG-Backscatter-control: G/VXY7/6zeyuAY/PU2/0qw
X-Spam-Scanner: SpamAssassin 3.003001
X-Spam-Score-relay: -1.9
X-Scanned-By: MIMEDefang 2.74 on 129.27.10.117
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,BODY_8BITS,
 DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_MSPIKE_H3,
 RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham
 autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

On Friday, 24.02.2023 at 10:01 -0600, Serge E. Hallyn wrote:
> On Fri, Feb 24, 2023 at 09:36:45AM +0100, Martin Uecker wrote:
> > On Thursday, 23.02.2023 at 19:21 -0600, Serge E. Hallyn wrote:
...
> >
> > Yes, but one comment about terminology: The C standard
> > differentiates between the representation, i.e. the bytes on
> > the stack, and the value. The representation is converted to
> > a value during lvalue conversion. For an invalid pointer
> > the representation is indeterminate because it now does not
> > point to a valid object anymore. So it is not possible to
> > convert the representation to a value during lvalue conversion.
> > In other words, it does not make sense to speak of the value
> > of the pointer anymore.
>
> I'm sure there are, especially from an implementer's point of view,
> great reasons for this.
>
> However, as just a user, the "value" of 'void *p' should absolutely
> not be tied to whatever is at that address.

Think about it this way: the set of possible values for a pointer
is the set of objects that exist at a given point in time. If an
object disappears, a pointer cannot point to it anymore. So it is
not the pointer that changes, but the set of valid values.

> I'm given a simple
> linear memory space, under which sits an entirely different view
> obfuscated by page tables, but that doesn't concern me. if I say
> void *p = -1, then if I print p, then I expect to see that value.

If you store an integer into a pointer (you need a cast), then
the result is implementation-defined and may also be an invalid
pointer.

>
> Since I'm complaining about standards I'm picking and choosing here,
> but I'll still point at the printf(3) manpage :) :
>
>        p The void * pointer argument is printed in hexadecimal (as if by %#x
>               or %#lx).

This is valid if the pointer is valid, but if the pointer
is invalid, this is undefined behavior.

In C one should not think of pointers as addresses. They are
abstract handles that point to objects, and compilers do
exploit this for optimization. If you need an address, you
can cast the pointer to uintptr_t (but see below).

> >
> > > I realize C11 appears to have changed that. I fear that in doing so it
> > > actually risks increasing the confusion about pointers. IMO it's much
> > > easier to reason about
> > >
> > > o = realloc(p, X);
> > >
> > > (and more baroque constructions) when keeping in mind that o, p, and the
> > > object pointed to by either one are all different things.
> > >
> >
> > What did change in C11? As far as I know, the pointer model
> > did not change in C11.
>
> I haven't looked in more detail, and don't really plan to, but my
> understanding is that the text of:
>
>   The lifetime of an object is the portion of program execution during which storage is
>   guaranteed to be reserved for it. An object exists, has a constant address, and retains
>   its last-stored value throughout its lifetime. If an object is referred to outside of its
>   lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
>   the object it points to (or just past) reaches the end of its lifetime.
>
> (especially the last sentence) was new.

This is not new.

C99: "The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime."

C90: "The value of a pointer that referred to an object with automatic
storage duration that is no longer guaranteed to be reserved is indeterminate."
and "The value of a pointer that refers to freed space is indeterminate."

> Maybe the words "value of a pointer" don't mean what I think they
> mean. But that's the phrase to which I object. The n bytes on
> the stack, p, are not changed just because something happened with
> the accounting for the memory at the address represented by that
> value. If they do, then that's not 'C' any more.

It is not about the bytes of the pointer changing. But once
the object is freed, they no longer represent a valid pointer.
There were CPUs that trapped when an invalid address was
loaded, e.g. because the data segment for the object had been
removed from the segment tables. So this has been a rule in
portable C for more than 30 years.

Nowadays compilers exploit the knowledge that the object has
been freed, so you cannot reliably use such a pointer. If you
do, your code will be broken on most modern compilers.

>
> > > > > Reading an uninitialized value of automatic storage whose
> > > > > address was not taken is undefined behavior, so everything
> > > > > is possible afterwards.
> > > > >
> > > > > An uninitialized variable whose address was taken has a
> > > > > representation which can represent an unspecified value
> > > > > or a no-value (trap) representation. Reading the
> > > > > representation itself is always ok and gives consistent
> > > > > results. Reading the variable can be undefined behavior
> > > > > iff it is a trap representation, otherwise you get
> > > > > the unspecified value which is stored there.
> > > > >
> > > > > At least this is my reading of the C standard. Compilers
> > > > > are not full conformant.
> > > >
> > > > Does all this imply that the following is well defined behavior (and shall
> > > > print what one would expect)?
> > > >
> > > >   free(p);
> > > >
> > > >   (void) &p; // take the address
> > > >   // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > > >
> > > >   printf("%p\n", p); // we took previously its address,
> > > >                       // so now it has to hold consistently
> > > >                       // the previous value
> > > >
> > > >
> >
> > No, the printf is not well defined, because the lvalue conversion
> > of the pointer with indeterminate representation may lead to
>
> Sorry, my eyes glaze over at 'lvalue conversion of the pointer' - is
> that referring to what's at the target address?

I meant the load of the pointer itself, not the load of the object
the pointer points to.

> Because to print the
> hex value of p, it doesn't matter what's at *p - even if that memory
> is no longer mapped, it does not affect the %x of p.
>
> BTW I appreciate all the insight. I'm arguing my case, but I'm quite
> certain I'll walk away finally understanding why I'm wrong.

I hope my answers help. We thought about these questions a lot
when working on the proposed memory model for C (TS 6010) and
there is not much room for changing the rules.

We could make it well defined to pass around such invalid
pointers (and allow printing them).
But because one cannot reliably use them for
anything, this would do more harm than good.

The proposed TS 6010 clarifies that casting to uintptr_t
gives you an address that behaves as you would expect.
Modern compilers do not yet conform to this, but I
assume this will get fixed at some point.

Martin
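
A minimal sketch of the uintptr_t route mentioned above, assuming an implementation that provides the optional uintptr_t type. The helper name address_then_free is hypothetical and does not come from this thread. The conversion is done while the pointer is still valid, so after free() only an ordinary integer is read, never the pointer itself:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the thread: writes the address of a
   heap block into buf as hex text, then frees the block. */
int address_then_free(void *p, char *buf, size_t len)
{
    assert(p != NULL && buf != NULL);

    /* Convert while p is still valid. The result is an ordinary
       integer; how it maps to addresses is implementation-defined. */
    uintptr_t addr = (uintptr_t)p;

    free(p);
    /* From here on p is indeterminate: even printf("%p", p)
       would be undefined behavior. */

    /* Reading the integer saved earlier is fine. */
    return snprintf(buf, len, "0x%" PRIxPTR, addr);
}
```

Called with a successfully allocated block and a 32-byte buffer, this leaves a hex string beginning with 0x in the buffer; the digits themselves depend on the implementation.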