public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Peter.Sewell@cl.cam.ac.uk
Cc: Jeff Law <law@redhat.com>, GCC Development <gcc@gcc.gnu.org>,
		cl-c-memory-object-model
	<cl-c-memory-object-model@lists.cam.ac.uk>
Subject: Re: C provenance semantics proposal
Date: Wed, 17 Apr 2019 09:41:00 -0000	[thread overview]
Message-ID: <CAFiYyc3Ri_U5Sqsv1gm6JhsOv=DYLB6LxtSLy7smP9sr-g+LWA@mail.gmail.com> (raw)
In-Reply-To: <CAHWkzRTU_qoKe375UrOb9eej757XHGq4TkdF7vuCzFp=T4wqqg@mail.gmail.com>

On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
>
> On 17/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk>
> > wrote:
> >>
> >> On Fri, 12 Apr 2019 at 15:51, Jeff Law <law@redhat.com> wrote:
> >> >
> >> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> >> > > Dear all,
> >> > >
> >> > > continuing the discussion from the 2018 GNU Tools Cauldron, we
> >> > > (the WG14 C memory object model study group) now
> >> > > have a detailed proposal for pointer provenance semantics, refining
> >> > > the "provenance not via integers (PNVI)" model presented there.
> >> > > This will be discussed at the ISO WG14 C standards committee at the
> >> > > end of April, and comments from the GCC community before then would
> >> > > be very welcome.   The proposal reconciles the needs of existing code
> >> > > and the behaviour of existing compilers as well as we can, but it
> >> > > doesn't
> >> > > exactly match any of the latter, so we'd especially like to know
> >> > > whether
> >> > > it would be feasible to implement - our hope is that it would only
> >> > > require
> >> > > minor changes.  It's presented in three documents:
> >> > >
> >> > > N2362  Moving to a provenance-aware memory model for C: proposal for
> >> > > C2x
> >> > > by the memory object model study group.  Jens Gustedt, Peter Sewell,
> >> > > Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
> >> > > This introduces the proposal and gives the proposed change to the
> >> > > standard
> >> > > text, presented as change-highlighted pages of the standard
> >> > > (though one might want to read the N2363 examples before going into
> >> > > that).
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf
> >> > >
> >> > > N2363  C provenance semantics: examples.
> >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt,
> >> > > Martin Uecker.
> >> > > This explains the proposal and its design choices with discussion of
> >> > > a
> >> > > series of examples.
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf
> >> > >
> >> > > N2364  C provenance semantics: detailed semantics.
> >> > > Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
> >> > > This gives a detailed mathematical semantics for the proposal
> >> > > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf
> >> > >
> >> > > In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
> >> > > executable version of the semantics, with a web interface that
> >> > > allows one to explore and visualise the behaviour of small test
> >> > > programs, stepping through and seeing the abstract-machine
> >> > > memory state including provenance information.   N2363 compares
> >> > > the results of this for the example programs with gcc, clang, and icc
> >> > > results, though the tests are really intended as tests of the
> >> > > semantics
> >> > > rather than compiler tests, so one has to interpret this with care.
> >> > THanks.  I just noticed this came up in EuroLLVM as well.    Getting
> >> > some standards clarity in this space would be good.
> >> >
> >> > Richi is in the best position to cover for GCC, but I suspect he's
> >> > buried with gcc-9 issues as we approach the upcoming release.
> >> > Hopefully
> >> > he'll have time to review this once crunch time has past.  I think more
> >> > than anything sanity checking the proposal's requirements vs what can
> >> > be
> >> > reasonably implmemented is most important at this stage.
> >>
> >> Indeed.  We talked with him at the GNU cauldron, without uncovering
> >> any serious problems, but more detailed review from an implementability
> >> point of view would be great.   For the UB mailing list we just made
> >> a brief plain-text summary of the proposal (leaving out all the examples
> >> and standards diff, and glossing over some details).  I'll paste that
> >> in below in case it's helpful.  The next WG14 meeting is the week of
> >> April 29; comments before then would be particularly useful if that's
> >> possible.
> >>
> >> best,
> >> Peter
> >>
> >> C pointer values are typically represented at runtime as simple
> >> concrete numeric values, but mainstream compilers routinely exploit
> >> information about the "provenance" of pointers to reason that they
> >> cannot alias, and hence to justify optimisations.  This is
> >> long-standing practice, but exactly what it means (what programmers
> >> can rely on, and what provenance-based alias analysis is allowed to
> >> do), has never been nailed down.   That's what the proposal does.
> >>
> >>
> >> The basic idea is to associate a *provenance* with every pointer
> >> value, identifying the original storage instance (or allocation, in
> >> other words) that the pointer is derived from.  In more detail:
> >>
> >> - We take abstract-machine pointer values to be pairs (pi,a), adding a
> >>   provenance pi, either @i where i is a storage instance ID, or the
> >>   *empty* provenance, to their concrete address a.
> >>
> >> - On every storage instance creation (of objects with static, thread,
> >>   automatic, and allocated storage duration), the abstract machine
> >>   nondeterministically chooses a fresh storage instance ID i (unique
> >>   across the entire execution), and the resulting pointer value
> >>   carries that single storage instance ID as its provenance @i.
> >>
> >> - Provenance is preserved by pointer arithmetic that adds or subtracts
> >>   an integer to a pointer.
> >>
> >> - At any access via a pointer value, its numeric address must be
> >>   consistent with its provenance, with undefined behaviour
> >>   otherwise. In particular:
> >>
> >>   -- access via a pointer value which has provenance a single storage
> >>      instance ID @i must be within the memory footprint of the
> >>      corresponding original storage instance, which must still be
> >>      live.
> >>
> >>   -- all other accesses, including those via a pointer value with
> >>      empty provenance, are undefined behaviour.
> >>
> >> Regarding such accesses as undefined behaviour is necessary to make
> >> optimisation based on provenance alias analysis sound: if the standard
> >> did define behaviour for programs that make provenance-violating
> >> accesses, e.g.~by adopting a concrete semantics, optimisation based on
> >> provenance-aware alias analysis would not be sound.  In other words,
> >> the provenance lets one distinguish a one-past pointer from a pointer
> >> to the start of an adjacently-allocated object, which otherwise are
> >> indistinguishable.
> >>
> >> All this is for the C abstract machine as defined in the standard:
> >> compilers might rely on provenance in their alias analysis and
> >> optimisation, but one would not expect normal implementations to
> >> record or manipulate provenance at runtime (though dynamic or static
> >> analysis tools might).
> >>
> >>
> >> Then, to support low-level systems programming, C provides many other
> >> ways to construct and manipulate pointer values:
> >>
> >> - casts of pointers to integer types and back, possibly with integer
> >>   arithmetic, e.g.~to force alignment, or to store information in
> >>   unused bits of pointers;
> >>
> >> - copying pointer values with memcpy;
> >>
> >> - manipulation of the representation bytes of pointers, e.g.~via user
> >>   code that copies them via char* or unsigned char* accesses;
> >>
> >> - type punning between pointer and integer values;
> >>
> >> - I/O, using either fprintf/fscanf and the %p format, fwrite/fread on
> >>   the pointer representation bytes, or pointer/integer casts and
> >>   integer I/O;
> >>
> >> - copying pointer values with realloc; and
> >>
> >> - constructing pointer values that embody knowledge established from
> >>   linking, and from constants that represent the addresses of
> >>   memory-mapped devices.
> >>
> >>
> >> A satisfactory semantics has to address all these, together with the
> >> implications on optimisation.  We've explored several, but our main
> >> proposal is "PNVI-ae-udi" (provenance not via integers,
> >> address-exposed, user-disambiguation).
> >>
> >> This semantics does not track provenance via integers.  Instead, at
> >> integer-to-pointer cast points, it checks whether the given address
> >> points within a live object that has previously been *exposed* and, if
> >> so, recreates the corresponding provenance.
> >>
> >> A storage instance is deemed exposed by a cast of a pointer to it to
> >> an integer type, by a read (at non-pointer type) of the representation
> >> of the pointer, or by an output of the pointer using %p.
> >
> > So this is not what GCC implements which tracks provenance through
> > non-pointer types to a limited extent when only copying is taking place.
> >
> > Your proposal makes
> >
> >  int a, b;
> >  int *p = &a;
> >  int *q = &b;
> >  uintptr_t pi = (uintptr_t)p; //expose
> >  uintptr_t qi = (uintptr_t)q; //expose
> >  pi += 4;
> >  if (pi == qi)
> >    *(int *)pi = 1;
> >
> > well-defined since (int *)pi now has the provenance of &b.
>
> Yes.  (Just to be clear: it's not that we think the above example is
> desirable in itself, but it's well-defined as a consequence of what
> we do to make other common idioms, eg pointer bit manipulation,
> well-defined.)
>
> > Note GCC, when tracking provenance of non-pointer type
> > adds like in
> >
> >   int *p = &a;
> >   uintptr_t pi = (uintptr_t)p;
> >   pi += 4;
> >
> > considers pi to have provenance "anything" (not sure if you
> > have something like that) since we add 4 which has provenance
> > "anything" to pi which has provenance &a.
>
> We don't at present have a provenance "anything", but if the gcc
> "anything" means that it's assumed that it might alias with anything,
> then it looks like gcc's implementing a sound approximation to
> the proposal here?

GCC makes the code well-defined whereas the proposal would make
dereferencing a pointer based on pi invoke undefined behavior?  Since
your proposal is based on an abstract machine there isn't anything
like a pointer with multiple provenances (which "anything" is), just
pointers with no provenance (pointing outside of any object), right?
For points-to analysis we of course have to track all possible
provenances of a pointer (and if we know it doesn't point inside
any object we make it point to nothing).

Btw, GCC changed its behavior here to support optimizing matlab
generated C code which passes pointers to arrays across functions
by marshalling them in two float typed halves (yikes!).  GCC is able
to properly track provenance across the decomposition / recomposition
when doing points-to analysis ;)

Btw, one thing GCC struggles is when it applies rules that clearly
apply to pointer dereferences to pointer equality compares where
the standard has that special casing of comparing two pointers
where one points one after an object requiring the comparison
to evaluate to true when the objects are adjacent.  GCC
currently statically optimizes if (&x + 1 == &y) to false for
this reason (but not the corresponding integer comparison).

Richard.

>
> best,
> Peter
>
>
> >> The user-disambiguation refinement adds some complexity but supports
> >> roundtrip casts, from pointer to integer and back, of pointers that
> >> are one-past a storage instance.
> >

  reply	other threads:[~2019-04-17  9:41 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-02  8:11 Peter Sewell
2019-04-12 14:51 ` Jeff Law
2019-04-12 15:31   ` Peter Sewell
2019-04-17  9:06     ` Richard Biener
2019-04-17  9:15       ` Peter Sewell
2019-04-17  9:41         ` Richard Biener [this message]
2019-04-17 11:53           ` Uecker, Martin
2019-04-17 12:41             ` Richard Biener
2019-04-17 12:56               ` Uecker, Martin
2019-04-17 13:35                 ` Richard Biener
2019-04-17 14:12                   ` Uecker, Martin
2019-04-17 17:31                     ` Peter Sewell
2019-04-18  9:32                     ` Richard Biener
2019-04-18  9:56                       ` Richard Biener
2019-04-18 10:48                         ` Peter Sewell
2019-04-18 11:57                         ` Uecker, Martin
2019-04-18 12:31                           ` Richard Biener
2019-04-18 13:25                             ` Uecker, Martin
2019-04-18 10:45                       ` Peter Sewell
2019-04-18 12:20                         ` Uecker, Martin
2019-04-18 12:42                           ` Richard Biener
2019-04-18 12:47                             ` Jakub Jelinek
2019-04-18 12:51                               ` Jakub Jelinek
2019-04-18 13:29                                 ` Jeff Law
2019-04-24 10:12                                   ` Richard Biener
2019-04-18 13:49                             ` Uecker, Martin
2019-04-19  8:19                             ` Jens Gustedt
2019-04-19  8:49                               ` Jakub Jelinek
2019-04-19  9:09                                 ` Jens Gustedt
2019-04-19  9:34                                   ` Jakub Jelinek
2019-04-21  8:15                                     ` Jens Gustedt
2019-04-24 10:24                                   ` Richard Biener
2019-04-24 18:43                                     ` Jeff Law
2019-04-24 19:21                                       ` Jens Gustedt
2019-04-19  9:11                                 ` Peter Sewell
2019-04-19  9:15                                   ` Jens Gustedt
2019-04-19  9:35                                     ` Peter Sewell
2019-04-19 10:35                                       ` Uecker, Martin
2019-04-19 10:01                               ` Uecker, Martin
2019-04-18 13:42                           ` Jeff Law
2019-04-18 13:54                             ` Uecker, Martin
2019-04-18 14:49                               ` Peter Sewell
2019-04-18 15:09                                 ` Uecker, Martin
2019-04-24 10:19                             ` Richard Biener
2019-04-24 18:41                               ` Jeff Law
2019-04-24 19:30                                 ` Philipp Klaus Krause
2019-04-24 19:55                                   ` Uecker, Martin
2019-04-24 19:33                                 ` Jakub Jelinek
2019-04-24 21:19                                 ` Peter Sewell
2019-04-25 12:42                                   ` Richard Biener
2019-04-25 13:03                                     ` Peter Sewell
2019-04-25 13:13                                       ` Richard Biener
2019-04-25 13:20                                         ` Peter Sewell
2019-04-29 14:31                                       ` Joseph Myers
2019-04-25 12:39                                 ` Richard Biener
2019-05-09 11:26                                   ` Ralf Jung

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc3Ri_U5Sqsv1gm6JhsOv=DYLB6LxtSLy7smP9sr-g+LWA@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=Peter.Sewell@cl.cam.ac.uk \
    --cc=cl-c-memory-object-model@lists.cam.ac.uk \
    --cc=gcc@gcc.gnu.org \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).