From: "Serge E. Hallyn"
To: Martin Uecker
Cc: "Serge E. Hallyn", Alex Colomar, GCC, Iker Pedrosa, Florian Weimer,
    Paul Eggert, Michael Kerrisk, Jₑₙₛ Gustedt, David Malcolm, Sam James,
    Jonathan Wakely
Date: Fri, 24 Feb 2023 10:01:45 -0600
Subject: Re: Missed warning (-Wuse-after-free)
Message-ID: <20230224160145.GA374298@mail.hallyn.com>
In-Reply-To: <9d34a5da747601b0d9a3512cddfaf113726620ee.camel@tugraz.at>

On Fri, Feb 24, 2023 at 09:36:45AM +0100, Martin Uecker wrote:
> On Thursday, 23.02.2023 at 19:21 -0600, Serge E.
Hallyn:
> > On Fri, Feb 24, 2023 at 01:02:54AM +0100, Alex Colomar wrote:
> > > Hi Martin,
> > >
> > > On 2/23/23 20:57, Martin Uecker wrote:
> > > > On Thursday, 23.02.2023 at 20:23 +0100, Alex Colomar wrote:
> > > > > Hi Martin,
> > > > >
> > > > > On 2/17/23 14:48, Martin Uecker wrote:
> > > > > > > This new wording doesn't even allow one to use memcmp(3);
> > > > > > > just reading the pointer value, however you do it, is UB.
> > > > > >
> > > > > > memcmp would not use the pointer value but work
> > > > > > on the representation bytes and is still allowed.
> > > > >
> > > > > Hmm, interesting.  It's rather unspecified behavior. Still
> > > > > unpredictable: (memcmp(&p, &p, sizeof(p)) == 0) might evaluate to true or
> > > > > false randomly; the compiler may compile out the call to memcmp(3),
> > > > > since it knows it won't produce any observable behavior.
> > > > >
> > > > >
> > > >
> > > > No, I think several things get mixed up here.
> > > >
> > > > The representation of a pointer that becomes invalid
> > > > does not change.
> > > >
> > > > So (0 == memcmp(&p, &p, sizeof(p))) always
> > > > evaluates to true.
> > > >
> > > > Also in general, an unspecified value is simply unspecified
> > > > but does not change anymore.
> >
> > Right.  p is its own thing - n bytes on the stack containing some value.
> > Once it comes into scope, it doesn't change on its own.  And if I do
> > free(p) or o = realloc(p), then the value of p itself - the n bytes on
> > the stack - does not change.
>
> Yes, but one comment about terminology: The C standard
> differentiates between the representation, i.e. the bytes on
> the stack, and the value. The representation is converted to
> a value during lvalue conversion. For an invalid pointer
> the representation is indeterminate because it now does not
> point to a valid object anymore. So it is not possible to
> convert the representation to a value during lvalue conversion.
> In other words, it does not make sense to speak of the value
> of the pointer anymore.

I'm sure there are, especially from an implementer's point of view,
great reasons for this.  However, as just a user, the "value" of
'void *p' should absolutely not be tied to whatever is at that address.
I'm given a simple linear memory space, under which sits an entirely
different view obfuscated by page tables, but that doesn't concern me.
If I say void *p = (void *)-1, and then print p, I expect to see that
value.

Since I'm complaining about standards I'm picking and choosing here,
but I'll still point at the printf(3) manpage :) :

       p      The void * pointer argument is printed in hexadecimal
              (as if by %#x or %#lx).

> > I realize C11 appears to have changed that.  I fear that in doing so it
> > actually risks increasing the confusion about pointers.  IMO it's much
> > easier to reason about
> >
> >    o = realloc(p, X);
> >
> > (and more baroque constructions) when keeping in mind that o, p, and the
> > object pointed to by either one are all different things.
> >
>
> What did change in C11? As far as I know, the pointer model
> did not change in C11.

I haven't looked in more detail, and don't really plan to, but my
understanding is that the text of:

  The lifetime of an object is the portion of program execution during
  which storage is guaranteed to be reserved for it. An object exists,
  has a constant address, and retains its last-stored value throughout
  its lifetime. If an object is referred to outside of its lifetime,
  the behavior is undefined. The value of a pointer becomes
  indeterminate when the object it points to (or just past) reaches
  the end of its lifetime.

(especially the last sentence) was new.  Maybe the words "value of a
pointer" don't mean what I think they mean.  But that's the phrase to
which I object.  The n bytes on the stack, p, are not changed just
because something happened with the accounting for the memory at the
address represented by that value.
If they do, then that's not 'C' any more.

> > > > Reading an uninitialized value of automatic storage whose
> > > > address was not taken is undefined behavior, so everything
> > > > is possible afterwards.
> > > >
> > > > An uninitialized variable whose address was taken has a
> > > > representation which can represent an unspecified value
> > > > or a no-value (trap) representation. Reading the
> > > > representation itself is always ok and gives consistent
> > > > results. Reading the variable can be undefined behavior
> > > > iff it is a trap representation, otherwise you get
> > > > the unspecified value which is stored there.
> > > >
> > > > At least this is my reading of the C standard. Compilers
> > > > are not fully conformant.
> > >
> > > Does all this imply that the following is well defined behavior (and shall
> > > print what one would expect)?
> > >
> > >     free(p);
> > >
> > >     (void) &p;  // take the address
> > >     // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > >
> > >     printf("%p\n", p);  // we took previously its address,
> > >                         // so now it has to hold consistently
> > >                         // the previous value
> > >
> > >
>
> No, the printf is not well defined, because the lvalue conversion
> of the pointer with indeterminate representation may lead to

Sorry, my eyes glaze over at 'lvalue conversion of the pointer' - is
that referring to what's at the target address?  Because to print the
hex value of p, it doesn't matter what's at *p - even if that memory
is no longer mapped, it does not affect the %x of p.

BTW I appreciate all the insight.  I'm arguing my case, but I'm quite
certain I'll walk away finally understanding why I'm wrong.

> undefined behavior.
>
>
> Martin
>
>
> > > This feels weird.  And a bit of a Schroedinger's pointer.  I'm not entirely
> > > convinced, but might be.
> >
> > Again, p is just an n byte variable which happens to have (one hopes)
> > pointed at a previously malloc'd address.
> >
> > And I'd argue that pre-C11, this was not confusing, and would not have
> > felt weird to you.
> >
> > But I am most grateful to you for having brought this to my attention.
> > I may not agree with it and not like it, but it's right there in the
> > spec, so time for me to adjust :)
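
The representation-versus-value distinction argued over in this thread can be sketched in a few lines of C. This is only an illustration under the reading Martin gives (inspecting the bytes of p is allowed, reading p itself after free is not); the helper name repr_unchanged_after_free and the 16-byte allocation are made up for the example and do not come from the thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the thread.
 * Returns 1 if the bytes stored in p are identical before and after
 * free(p), 0 if they differ, and -1 if malloc fails. */
int repr_unchanged_after_free(void)
{
    unsigned char before[sizeof(void *)];
    void *p = malloc(16);   /* arbitrary size, chosen for illustration */

    if (p == NULL)
        return -1;

    /* Snapshot the representation: the n bytes that make up p. */
    memcpy(before, &p, sizeof p);

    free(p);   /* from here on, the value of p is indeterminate */

    /* Only the bytes of p are inspected via its address. p is never
     * lvalue-converted, so no pointer value is read after the free. */
    return memcmp(before, &p, sizeof p) == 0;
}
```

On the implementations I would expect most readers to use, this returns 1: the stored bytes stay put. What the standard text withdraws after free(p) is the ability to use p as a pointer value, for example in printf("%p\n", p), not the ability to look at its bytes.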