public inbox for gcc@gcc.gnu.org
From: Martin Uecker <uecker@tugraz.at>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: "Alex Colomar" <alx.manpages@gmail.com>, GCC <gcc@gcc.gnu.org>,
	"Iker Pedrosa" <ipedrosa@redhat.com>,
	"Florian Weimer" <fweimer@redhat.com>,
	"Paul Eggert" <eggert@cs.ucla.edu>,
	"Michael Kerrisk" <mtk.manpages@gmail.com>,
	"Jₑₙₛ Gustedt" <jens.gustedt@inria.fr>,
	"David Malcolm" <dmalcolm@redhat.com>,
	"Sam James" <sam@gentoo.org>,
	"Jonathan Wakely" <jwakely.gcc@gmail.com>
Subject: Re: Missed warning (-Wuse-after-free)
Date: Fri, 24 Feb 2023 17:37:40 +0100
Message-ID: <499fd21da32ccd8d6207f004c39a2261acbb49ea.camel@tugraz.at>
In-Reply-To: <20230224160145.GA374298@mail.hallyn.com>

On Friday, 24.02.2023 at 10:01 -0600, Serge E. Hallyn wrote:
> On Fri, Feb 24, 2023 at 09:36:45AM +0100, Martin Uecker wrote:
> > On Thursday, 23.02.2023 at 19:21 -0600, Serge E. Hallyn wrote:

...
> > 
> > Yes, but one comment about terminology: the C standard
> > differentiates between the representation, i.e. the bytes on
> > the stack, and the value.  The representation is converted to
> > a value during lvalue conversion.  For an invalid pointer
> > the representation is indeterminate because it now does not
> > point to a valid object anymore.  So it is not possible to
> > convert the representation to a value during lvalue conversion.
> > In other words, it does not make sense to speak of the value
> > of the pointer anymore.
> 
> I'm sure there are, especially from an implementer's point of view,
> great reasons for this.
> 
> However, as just a user, the "value" of 'void *p' should absolutely
> not be tied to whatever is at that address.

Think about it this way: the set of possible values for a pointer
is the set of objects that exist at a point in time. If one object
disappears, a pointer cannot point to it anymore. So it is not the
pointer that changes, but the set of valid values.

>   I'm given a simple
> linear memory space, under which sits an entirely different view
> obfuscated by page tables, but that doesn't concern me.  if I say
> void *p = -1, then if I print p, then I expect to see that value.

If you store an integer into a pointer (you need a cast), then
this is implementation-defined and may also produce an invalid
pointer.

> 
> Since I'm complaining about standards I'm picking and choosing here,
> but I'll still point at the printf(3) manpage :)  :
> 
>        p      The  void * pointer argument is printed in hexadecimal (as if by %#x
>               or %#lx).

This is valid if the pointer is valid, but if the pointer
is invalid, this is undefined behavior.

In C one should not think about pointers as addresses. They
are abstract handles that point to objects, and compilers
do exploit this for optimization.

If you need an address, you can cast it to uintptr_t
(but see below).
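
A minimal sketch of that idea, assuming the implementation provides
uintptr_t (it is an optional type). The helper name
freed_block_address is hypothetical and only for illustration. The
conversion is done while the pointer is still valid, and after free()
only the integer is used, never the pointer itself:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a block, record its address as an
   integer while the pointer is still valid, free the block, and
   return the integer.  Returns 0 if the allocation fails. */
static uintptr_t freed_block_address(size_t size)
{
    assert(size > 0);               /* malloc(0) may legitimately return NULL */

    void *p = malloc(size);
    if (p == NULL)
        return 0;

    /* Implementation-defined conversion, done before the free. */
    uintptr_t addr = (uintptr_t)p;

    free(p);
    /* From here on p is indeterminate and must not be read;
       addr is an ordinary integer and can still be used. */
    return addr;
}
```

The result can then be printed as an integer, for example with
printf("%#" PRIxPTR "\n", addr) from <inttypes.h>, without ever
passing the invalid pointer to %p.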

> 
> > > I realize C11 appears to have changed that.  I fear that in doing so it
> > > actually risks increasing the confusion about pointers.  IMO it's much
> > > easier to reason about
> > > 
> > > 	o = realloc(p, X);
> > > 
> > > (and more baroque constructions) when keeping in mind that o, p, and the
> > > object pointed to by either one are all different things.
> > > 
> > 
> > What did change in C11? As far as I know, the pointer model
> > did not change in C11.
> 
> I haven't looked in more detail, and don't really plan to, but my
> understanding is that the text of:
> 
>   The lifetime of an object is the portion of program execution during which storage is
>   guaranteed to be reserved for it. An object exists, has a constant address, and retains
>   its last-stored value throughout its lifetime. If an object is referred to outside of its
>   lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
>   the object it points to (or just past) reaches the end of its lifetime.
> 
> (especially the last sentence) was new.

This is not new.

C99: "The value of a pointer becomes indeterminate when
the object it points to reaches the end of its lifetime."

C90: "The value of a pointer that referred to an object
with automatic storage duration that is no longer
guaranteed to be reserved is indeterminate."

and

"The value of a pointer that refers to freed space is
indeterminate."

> Maybe the words "value of a pointer" don't mean what I think they
> mean.  But that's the phrase to which I object.  The n bytes on
> the stack, p, are not changed just because something happened with
> the accounting for the memory at the address represented by that
> value.  If they do, then that's not 'C' any more.

It is not about the bytes of the pointer changing. But if
the object is freed they do not represent a valid pointer
anymore.  There were CPUs that trapped when an invalid
address was loaded, e.g. because the data segment for the
object was removed from the segment tables. So this has been
a rule in portable 'C' for more than 30 years.

Nowadays compilers exploit the knowledge that the
object is freed. So you cannot reliably use such
a pointer. If you do, your code will be broken on
most modern compilers.
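
As an illustration with the realloc case from earlier in the thread,
here is a hedged sketch of how one might ask "did the block move?"
without ever reading the old pointer after the call. The function
name resize_and_report is hypothetical, uintptr_t is assumed to be
available, and comparing the two integers is implementation-defined
rather than something the standard spells out:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: resize a block and report whether its address
   changed.  The old address is captured as an integer BEFORE realloc,
   so the old pointer is never read once it may have become invalid. */
static void *resize_and_report(void *p, size_t new_size, int *moved)
{
    assert(new_size > 0);           /* a zero size is deliberately not handled here */

    uintptr_t before = (uintptr_t)p;    /* p is still valid at this point */

    void *q = realloc(p, new_size);
    if (q == NULL) {
        /* On failure the original block is untouched and p stays valid. */
        *moved = 0;
        return NULL;
    }

    /* Compare integers only; p itself is not touched again. */
    *moved = ((uintptr_t)q != before);
    return q;
}
```

The tempting alternative, writing "if (q != p)" after the realloc,
reads p after the object it pointed to may have been freed, which is
exactly the pattern described above.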


> 
> > > > > Reading an uninitialized value of automatic storage whose
> > > > > address was not taken is undefined behavior, so everything
> > > > > is possible afterwards.
> > > > > 
> > > > > An uninitialized variable whose address was taken has a
> > > > > representation which can represent an unspecified value
> > > > > or a no-value (trap) representation. Reading the
> > > > > representation itself is always ok and gives consistent
> > > > > results. Reading the variable can be undefined behavior
> > > > > iff it is a trap representation, otherwise you get
> > > > > the unspecified value which is stored there.
> > > > > 
> > > > > At least this is my reading of the C standard. Compilers
> > > > > are not fully conformant.
> > > > 
> > > > Does all this imply that the following is well defined behavior (and shall
> > > > print what one would expect)?
> > > > 
> > > >   free(p);
> > > > 
> > > >   (void) &p;  // take the address
> > > >   // or maybe we should (void) memcmp(&p, &p, sizeof(p)); ?
> > > > 
> > > >   printf("%p\n", p);  // we took previously its address,
> > > >                       // so now it has to hold consistently
> > > >                       // the previous value
> > > > 
> > > > 
> > 
> > No, the printf is not well defined, because the lvalue conversion
> > of the pointer with indeterminate representation may lead to
> 
> Sorry, my eyes glaze over at 'lvalue conversion of the pointer' - is
> that referring to what's at the target address? 

I meant the load of the pointer itself, not the load of
the object the pointer points to.
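
To make the distinction concrete, here is a small sketch under the
reading of the standard given in this thread. It copies the bytes of
the pointer variable with memcpy before and after free() and compares
them. That accesses only the representation, as unsigned char, and
never performs an lvalue conversion of p after the free. The function
name representation_unchanged_by_free is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: returns 1 if the bytes stored in the pointer
   variable are the same before and after free(), 0 if they differ,
   and -1 if the allocation fails. */
static int representation_unchanged_by_free(size_t size)
{
    unsigned char before[sizeof(void *)];
    unsigned char after[sizeof(void *)];

    assert(size > 0);

    void *p = malloc(size);
    if (p == NULL)
        return -1;

    memcpy(before, &p, sizeof p);   /* bytes of p while it is valid */
    free(p);
    memcpy(after, &p, sizeof p);    /* bytes only; the value of p is not loaded */

    return memcmp(before, after, sizeof p) == 0;
}
```

Passing p itself to printf("%p") after the free would instead require
loading the pointer value, which is the step that is not defined.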

>  Because to print the
> hex value of p, it doesn't matter what's at *p - even if that memory
> is no longer mapped, it does not affect the %x of p.
> 
> BTW I appreciate all the insight.  I'm arguing my case, but I'm quite
> certain I'll walk away finally understanding why I'm wrong.

I hope my answers help. We thought about these questions a lot
when working on the proposed memory model for C (TS 6010)
and there is not much room for changing the rules. 

We could make it well defined to pass around such invalid
pointers (and allow printing them). But because one can
not reliably use them for anything, this would do more harm
than good.

The proposed TS 6010 clarifies that casting to uintptr_t
gives you an address that behaves as you would expect.
Modern compilers do not yet conform to this, but I assume
this will get fixed at some point.

Martin






