public inbox for gcc@gcc.gnu.org
* C provenance semantics proposal
@ 2019-04-02  8:11 Peter Sewell
  2019-04-12 14:51 ` Jeff Law
From: Peter Sewell @ 2019-04-02  8:11 UTC
  To: gcc; +Cc: cl-c-memory-object-model

Dear all,

continuing the discussion from the 2018 GNU Tools Cauldron, we
(the WG14 C memory object model study group) now
have a detailed proposal for pointer provenance semantics, refining
the "provenance not via integers (PNVI)" model presented there.
This will be discussed at the ISO WG14 C standards committee at the
end of April, and comments from the GCC community before then would
be very welcome.   The proposal reconciles the needs of existing code
and the behaviour of existing compilers as well as we can, but it doesn't
exactly match any of the latter, so we'd especially like to know whether
it would be feasible to implement; our hope is that it would require only
minor changes.  It's presented in three documents:

N2362  Moving to a provenance-aware memory model for C: proposal for C2x
by the memory object model study group.  Jens Gustedt, Peter Sewell,
Kayvan Memarian, Victor B. F. Gomes, Martin Uecker.
This introduces the proposal and gives the proposed change to the standard
text, presented as change-highlighted pages of the standard
(though one might want to read the N2363 examples before going into that).
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2362.pdf

N2363  C provenance semantics: examples.
Peter Sewell, Kayvan Memarian, Victor B. F. Gomes, Jens Gustedt, Martin Uecker.
This explains the proposal and its design choices with discussion of a
series of examples.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2363.pdf

N2364  C provenance semantics: detailed semantics.
Peter Sewell, Kayvan Memarian, Victor B. F. Gomes.
This gives a detailed mathematical semantics for the proposal.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf

In addition, at http://cerberus.cl.cam.ac.uk/cerberus we provide an
executable version of the semantics, with a web interface that
allows one to explore and visualise the behaviour of small test
programs, stepping through and seeing the abstract-machine
memory state including provenance information.   N2363 compares
the results of this for the example programs with gcc, clang, and icc
results, though the tests are really intended as tests of the semantics
rather than compiler tests, so one has to interpret this with care.

best,
Peter (for the study group)


* Re: C provenance semantics proposal
  2019-04-02  8:11 C provenance semantics proposal Peter Sewell
@ 2019-04-12 14:51 ` Jeff Law
  2019-04-12 15:31   ` Peter Sewell
From: Jeff Law @ 2019-04-12 14:51 UTC
  To: Peter.Sewell, gcc; +Cc: cl-c-memory-object-model

On 4/2/19 2:11 AM, Peter Sewell wrote:
> [...]
Thanks.  I just noticed this came up at EuroLLVM as well.  Getting
some standards clarity in this space would be good.

Richi is in the best position to cover for GCC, but I suspect he's
buried with gcc-9 issues as we approach the upcoming release.  Hopefully
he'll have time to review this once crunch time has passed.  I think that,
more than anything, sanity-checking the proposal's requirements against
what can reasonably be implemented is most important at this stage.

Jeff


* Re: C provenance semantics proposal
  2019-04-12 14:51 ` Jeff Law
@ 2019-04-12 15:31   ` Peter Sewell
  2019-04-17  9:06     ` Richard Biener
From: Peter Sewell @ 2019-04-12 15:31 UTC
  To: Jeff Law; +Cc: gcc, cl-c-memory-object-model

On Fri, 12 Apr 2019 at 15:51, Jeff Law <law@redhat.com> wrote:
>
> On 4/2/19 2:11 AM, Peter Sewell wrote:
> > [...]
> Thanks.  I just noticed this came up at EuroLLVM as well.  Getting
> some standards clarity in this space would be good.
>
> Richi is in the best position to cover for GCC, but I suspect he's
> buried with gcc-9 issues as we approach the upcoming release.  Hopefully
> he'll have time to review this once crunch time has passed.  I think that,
> more than anything, sanity-checking the proposal's requirements against
> what can reasonably be implemented is most important at this stage.

Indeed.  We talked with him at the GNU Tools Cauldron, without uncovering
any serious problems, but more detailed review from an implementability
point of view would be great.   For the UB mailing list we just made
a brief plain-text summary of the proposal (leaving out all the examples
and standards diff, and glossing over some details).  I'll paste that
in below in case it's helpful.  The next WG14 meeting is the week of
April 29; comments before then would be particularly useful if that's possible.

best,
Peter

C pointer values are typically represented at runtime as simple
concrete numeric values, but mainstream compilers routinely exploit
information about the "provenance" of pointers to reason that they
cannot alias, and hence to justify optimisations.  This is
long-standing practice, but exactly what it means (what programmers
can rely on, and what provenance-based alias analysis is allowed to
do), has never been nailed down.   That's what the proposal does.


The basic idea is to associate a *provenance* with every pointer
value, identifying the original storage instance (or allocation, in
other words) that the pointer is derived from.  In more detail:

- We take abstract-machine pointer values to be pairs (pi,a), adding a
  provenance pi, either @i where i is a storage instance ID, or the
  *empty* provenance, to their concrete address a.

- On every storage instance creation (of objects with static, thread,
  automatic, and allocated storage duration), the abstract machine
  nondeterministically chooses a fresh storage instance ID i (unique
  across the entire execution), and the resulting pointer value
  carries that single storage instance ID as its provenance @i.

- Provenance is preserved by pointer arithmetic that adds or subtracts
  an integer to a pointer.

- At any access via a pointer value, its numeric address must be
  consistent with its provenance, with undefined behaviour
  otherwise. In particular:

  -- access via a pointer value whose provenance is a single storage
     instance ID @i must be within the memory footprint of the
     corresponding original storage instance, which must still be
     live.

  -- all other accesses, including those via a pointer value with
     empty provenance, are undefined behaviour.
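The rules above can be made concrete with a small executable toy model. This is a hypothetical sketch, much simpler than N2364 or Cerberus: addresses are plain integers, instance IDs are handed out sequentially rather than nondeterministically, and all names (prov_ptr, alloc_instance, ptr_add, access_ok, end_instance) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of abstract-machine pointer values; all names hypothetical. */
#define MAX_INSTANCES 16
#define PROV_EMPTY 0u /* the empty provenance */

/* One storage instance (allocation) of the simulated abstract machine. */
typedef struct {
    uintptr_t base; /* concrete start address */
    size_t size;    /* footprint in bytes */
    bool live;      /* false once the lifetime has ended */
} instance;

/* Abstract-machine pointer value: the pair (pi, a). */
typedef struct {
    unsigned prov;  /* @i for i >= 1, or PROV_EMPTY */
    uintptr_t addr; /* concrete address a */
} prov_ptr;

static instance instances[MAX_INSTANCES + 1]; /* slot 0 unused */
static unsigned next_id = 1;

/* Storage instance creation: fresh ID i, result carries provenance @i.
   (The real abstract machine chooses IDs nondeterministically.) */
static prov_ptr alloc_instance(uintptr_t base, size_t size) {
    assert(next_id <= MAX_INSTANCES);
    unsigned id = next_id++;
    instances[id] = (instance){ base, size, true };
    return (prov_ptr){ id, base };
}

/* End of lifetime (e.g. free, or leaving a block). */
static void end_instance(prov_ptr p) {
    if (p.prov != PROV_EMPTY)
        instances[p.prov].live = false;
}

/* Adding or subtracting an integer preserves provenance. */
static prov_ptr ptr_add(prov_ptr p, ptrdiff_t bytes) {
    return (prov_ptr){ p.prov, p.addr + (uintptr_t)bytes };
}

/* An access of len bytes is defined only if the provenance is some @i
   and [addr, addr + len) lies within instance i, which is still live. */
static bool access_ok(prov_ptr p, size_t len) {
    if (p.prov == PROV_EMPTY)
        return false; /* empty provenance: undefined behaviour */
    const instance *in = &instances[p.prov];
    return in->live && p.addr >= in->base &&
           p.addr - in->base + len <= in->size;
}
```

For example, with two adjacent 4-byte instances at 0x1000 and 0x1004, ptr_add on the first by 4 gives the same address as the second, but it keeps the first instance's provenance, so access_ok rejects a 4-byte access through it.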

Treating such accesses as undefined behaviour is necessary to make
optimisation based on provenance-aware alias analysis sound: if the
standard did define behaviour for programs that make
provenance-violating accesses, e.g. by adopting a concrete semantics,
such optimisation would not be sound.  In particular, provenance is
what lets one distinguish a one-past pointer from a pointer to the
start of an adjacently allocated object, which otherwise are
indistinguishable.
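A sketch of the kind of code where the one-past case matters. The globals x and y and the function store_then_read_y are hypothetical; the called code itself is defined behaviour, and the comment describes what a provenance-aware compiler may do, not what any particular compiler is claimed to do.

```c
/* Hypothetical globals; their relative placement in memory is up to
   the compiler and linker. */
int y = 2, x = 1;

/* Where the compiler can see that p_from_x is derived from &x (for
   instance after inlining), provenance-based alias analysis lets it
   assume the store cannot modify y and fold the return value to 2.
   If p_from_x were &x + 1 and y happened to sit immediately after x,
   a concrete, address-only semantics would require returning 11; under
   the proposal that store is undefined behaviour, which is what keeps
   the folding sound. */
int store_then_read_y(int *p_from_x) {
    *p_from_x = 11;
    return y;
}
```

Called as store_then_read_y(&x), the store stays within x, so the function returns 2 and x becomes 11 under any of the semantics discussed.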

All this is for the C abstract machine as defined in the standard:
compilers might rely on provenance in their alias analysis and
optimisation, but one would not expect normal implementations to
record or manipulate provenance at runtime (though dynamic or static
analysis tools might).


Then, to support low-level systems programming, C provides many other
ways to construct and manipulate pointer values:

- casts of pointers to integer types and back, possibly with integer
  arithmetic, e.g. to force alignment, or to store information in
  unused bits of pointers;

- copying pointer values with memcpy;

- manipulation of the representation bytes of pointers, e.g. via user
  code that copies them via char* or unsigned char* accesses;

- type punning between pointer and integer values;

- I/O, using either fprintf/fscanf and the %p format, fwrite/fread on
  the pointer representation bytes, or pointer/integer casts and
  integer I/O;

- copying pointer values with realloc; and

- constructing pointer values that embody knowledge established from
  linking, and from constants that represent the addresses of
  memory-mapped devices.
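As an illustration of the first item, storing information in unused pointer bits, here is a minimal sketch. The helpers tag_ptr, tag_of and untag_ptr are hypothetical, and the code assumes, as on mainstream platforms, that converting a pointer to uintptr_t yields its address and that int is at least 2-byte aligned; the C standard leaves that mapping implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: keep a 1-bit tag in the low bit of a pointer
   to an int.  Assumes the uintptr_t value is the address and that int
   is at least 2-byte aligned, so the low bit is always zero. */
static uintptr_t tag_ptr(int *p, unsigned tag) {
    uintptr_t u = (uintptr_t)(void *)p; /* pointer-to-integer cast */
    assert((u & 1u) == 0);              /* low bit must be unused */
    return u | (uintptr_t)(tag & 1u);
}

static unsigned tag_of(uintptr_t u) {
    return (unsigned)(u & 1u);
}

static int *untag_ptr(uintptr_t u) {
    return (int *)(void *)(u & ~(uintptr_t)1); /* integer-to-pointer cast */
}
```

Under a "provenance not via integers" model with exposure, the cast in tag_ptr exposes the pointed-to object, so the cast back in untag_ptr can recover its provenance and an access through the result is defined.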


A satisfactory semantics has to address all these, together with the
implications for optimisation.  We've explored several, but our main
proposal is "PNVI-ae-udi" (provenance not via integers,
address-exposed, user-disambiguation).

This semantics does not track provenance via integers.  Instead, at
integer-to-pointer cast points, it checks whether the given address
points within a live object that has previously been *exposed* and, if
so, recreates the corresponding provenance.
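A hypothetical toy model of the cast rules just described, extending the earlier (pi, a) idea with an "exposed" flag per storage instance. All names are made up, addresses are simulated integers, and the user-disambiguation refinement is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of PNVI-ae casts; all names hypothetical. */
#define MAX_INSTANCES 16
#define PROV_EMPTY 0u

typedef struct {
    uintptr_t base;
    size_t size;
    bool live;
    bool exposed; /* set once the instance has been exposed */
} instance;

typedef struct {
    unsigned prov; /* @i for i >= 1, or PROV_EMPTY */
    uintptr_t addr;
} prov_ptr;

static instance instances[MAX_INSTANCES + 1]; /* slot 0 unused */
static unsigned next_id = 1;

static prov_ptr alloc_instance(uintptr_t base, size_t size) {
    assert(next_id <= MAX_INSTANCES);
    unsigned id = next_id++;
    instances[id] = (instance){ base, size, true, false };
    return (prov_ptr){ id, base };
}

/* Pointer-to-integer cast: marks the instance exposed and returns the
   bare address; the integer itself carries no provenance. */
static uintptr_t ptr_to_int(prov_ptr p) {
    if (p.prov != PROV_EMPTY)
        instances[p.prov].exposed = true;
    return p.addr;
}

/* Integer-to-pointer cast: if addr lies within a live, exposed
   instance, recreate that instance's provenance; otherwise the result
   has empty provenance.  A one-past address is not matched here; that
   case is what the user-disambiguation refinement handles. */
static prov_ptr int_to_ptr(uintptr_t addr) {
    for (unsigned id = 1; id < next_id; ++id) {
        const instance *in = &instances[id];
        if (in->live && in->exposed && addr >= in->base &&
            addr - in->base < in->size)
            return (prov_ptr){ id, addr };
    }
    return (prov_ptr){ PROV_EMPTY, addr };
}
```

For instance, with 4-byte instances a at 0x1000 and b at 0x1004, both exposed, converting the address of a plus 4 back to a pointer yields b's provenance, while the address of a never-exposed instance yields the empty provenance.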

A storage instance is deemed exposed when a pointer to it is cast to
an integer type, when the representation of such a pointer is read at
a non-pointer type, or when such a pointer is output using %p.
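A sketch of the three kinds of exposure in ordinary C. The objects obj_a, obj_b, obj_c and the three functions are hypothetical; the code only marks where, on this summary of PNVI-ae-udi, the corresponding storage instance would become exposed in the abstract machine. Nothing at runtime records the exposure.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical objects, one per kind of exposure. */
static int obj_a, obj_b, obj_c;

/* Exposure by casting a pointer to an integer type. */
uintptr_t expose_by_cast(void) {
    return (uintptr_t)(void *)&obj_a;
}

/* Exposure by reading the pointer's representation at a non-pointer
   type (here, its first byte as unsigned char). */
unsigned char expose_by_repr_read(void) {
    int *p = &obj_b;
    const unsigned char *bytes = (const unsigned char *)&p;
    return bytes[0];
}

/* Exposure by outputting the pointer with %p. */
int expose_by_print(FILE *out) {
    return fprintf(out, "%p\n", (void *)&obj_c);
}
```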

The user-disambiguation refinement adds some complexity but supports
roundtrip casts, from pointer to integer and back, of pointers that
are one-past a storage instance.
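A minimal sketch of the kind of one-past round trip the refinement is meant to support. The helper sum_via_roundtrip_end is hypothetical, and the comment about how the ambiguity is resolved reflects our reading of this summary rather than the precise rules in N2362.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum n ints using an end pointer that has been
   round-tripped through uintptr_t.  The end pointer is one past the
   array, so its address may coincide with the start of some other,
   adjacently allocated object.  It is only ever compared with pointers
   into arr, which, as we read the summary, is the kind of use the
   user-disambiguation refinement is intended to permit. */
int sum_via_roundtrip_end(const int *arr, size_t n) {
    uintptr_t end_int = (uintptr_t)(const void *)(arr + n); /* ptr -> int */
    const int *end = (const int *)(const void *)end_int;    /* int -> ptr */
    int sum = 0;
    for (const int *q = arr; q != end; ++q)
        sum += *q;
    return sum;
}
```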


* Re: C provenance semantics proposal
  2019-04-12 15:31   ` Peter Sewell
@ 2019-04-17  9:06     ` Richard Biener
  2019-04-17  9:15       ` Peter Sewell
From: Richard Biener @ 2019-04-17  9:06 UTC
  To: Peter.Sewell; +Cc: Jeff Law, GCC Development, cl-c-memory-object-model

On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
> [...]
>
> This semantics does not track provenance via integers.  Instead, at
> integer-to-pointer cast points, it checks whether the given address
> points within a live object that has previously been *exposed* and, if
> so, recreates the corresponding provenance.
>
> A storage instance is deemed exposed by a cast of a pointer to it to
> an integer type, by a read (at non-pointer type) of the representation
> of the pointer, or by an output of the pointer using %p.

So this is not what GCC implements: GCC tracks provenance through
non-pointer types, to a limited extent, when only copying is taking place.

Your proposal makes

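 #include <stdint.h>

 void example(void)
 {
   int a, b;
   int *p = &a;
   int *q = &b;
   uintptr_t pi = (uintptr_t)p; //expose
   uintptr_t qi = (uintptr_t)q; //expose
   pi += 4;                     // one past a, if sizeof(int) == 4
   if (pi == qi)                // only if b happens to follow a
     *(int *)pi = 1;
 }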

well-defined since (int *)pi now has the provenance of &b.

Note that GCC, when tracking provenance through non-pointer-typed
additions, as in

  int a;
  int *p = &a;
  uintptr_t pi = (uintptr_t)p;
  pi += 4;

considers pi to have provenance "anything" (not sure whether you
have something like that), since we add 4, which has provenance
"anything", to pi, which has provenance &a.
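A toy model of the propagation described above, assuming "anything" means "may point to any object". The pt_info representation and all function names are hypothetical; GCC's actual points-to analysis is considerably more involved and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified points-to info for a value. */
typedef struct {
    bool anything; /* may point to any object */
    int obj;       /* the single object pointed to, if !anything */
} pt_info;

static pt_info pt_object(int obj) { return (pt_info){ false, obj }; }
static pt_info pt_anything(void) { return (pt_info){ true, -1 }; }

/* A plain copy keeps the points-to info. */
static pt_info pt_copy(pt_info v) { return v; }

/* Integer addition merges both operands' info.  Since a constant such
   as 4 is treated as "anything", the sum is "anything" as well; two
   different objects are also widened to "anything" in this toy. */
static pt_info pt_plus(pt_info x, pt_info y) {
    if (x.anything || y.anything || x.obj != y.obj)
        return pt_anything();
    return x;
}

/* Conservative query: may a pointer with this info refer to obj? */
static bool pt_may_alias(pt_info v, int obj) {
    return v.anything || v.obj == obj;
}
```

In this toy, once pi has been incremented it may alias every object, including b in the earlier example, so an analysis of this shape never concludes that (int *)pi cannot refer to b.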



* Re: C provenance semantics proposal
  2019-04-17  9:06     ` Richard Biener
@ 2019-04-17  9:15       ` Peter Sewell
  2019-04-17  9:41         ` Richard Biener
From: Peter Sewell @ 2019-04-17  9:15 UTC
  To: Richard Biener; +Cc: Jeff Law, GCC Development, cl-c-memory-object-model

On 17/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk>
> wrote:
>>
>> [...]
>>
>> This semantics does not track provenance via integers.  Instead, at
>> integer-to-pointer cast points, it checks whether the given address
>> points within a live object that has previously been *exposed* and, if
>> so, recreates the corresponding provenance.
>>
>> A storage instance is deemed exposed by a cast of a pointer to it to
>> an integer type, by a read (at non-pointer type) of the representation
>> of the pointer, or by an output of the pointer using %p.
>
> So this is not what GCC implements which tracks provenance through
> non-pointer types to a limited extent when only copying is taking place.
>
> Your proposal makes
>
>  int a, b;
>  int *p = &a;
>  int *q = &b;
>  uintptr_t pi = (uintptr_t)p; //expose
>  uintptr_t qi = (uintptr_t)q; //expose
>  pi += 4;
>  if (pi == qi)
>    *(int *)pi = 1;
>
> well-defined since (int *)pi now has the provenance of &b.

Yes.  (Just to be clear: it's not that we think the above example is
desirable in itself, but it's well-defined as a consequence of what
we do to make other common idioms, eg pointer bit manipulation,
well-defined.)
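
The pointer bit-manipulation idiom mentioned here might look like the following. This is a hypothetical sketch: the helper names are invented, and it assumes `int` objects are at least 2-byte aligned, so bit 0 of their address is free to hold a tag. Under PNVI-ae-udi, the cast to uintptr_t exposes the object. Casting the masked integer back therefore recreates the object's provenance, and the write through p is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not from the proposal): keep a 1-bit tag in the
   low bit of a pointer. Assumes _Alignof(int) >= 2, so bit 0 of a valid
   int * is always zero. */
static uintptr_t tag_ptr(int *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;      /* pointer-to-integer cast: exposes the object */
    assert((bits & 1u) == 0);
    return bits | (uintptr_t)(tag & 1u);
}

static int *untag_ptr(uintptr_t bits, unsigned *tag)
{
    *tag = (unsigned)(bits & 1u);
    /* Integer-to-pointer cast: the address lies inside an exposed, live
       object, so PNVI-ae-udi recreates that object's provenance. */
    return (int *)(bits & ~(uintptr_t)1u);
}

/* Round-trips a pointer through a tagged integer and writes through it.
   Returns the updated value; *tag_out receives the recovered tag. */
static int tagging_demo(unsigned *tag_out)
{
    static int value;
    value = 42;
    int *p = untag_ptr(tag_ptr(&value, 1), tag_out);
    *p += 1;                            /* p carries the provenance of value */
    return value;
}
```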

> Note GCC, when tracking provenance of non-pointer type
> adds like in
>
>   int *p = &a;
>   uintptr_t pi = (uintptr_t)p;
>   pi += 4;
>
> considers pi to have provenance "anything" (not sure if you
> have something like that) since we add 4 which has provenance
> "anything" to pi which has provenance &a.

We don't at present have a provenance "anything", but if the gcc
"anything" means that it's assumed that it might alias with anything,
then it looks like gcc's implementing a sound approximation to
the proposal here?

best,
Peter


>> The user-disambiguation refinement adds some complexity but supports
>> roundtrip casts, from pointer to integer and back, of pointers that
>> are one-past a storage instance.
>

* Re: C provenance semantics proposal
  2019-04-17  9:15       ` Peter Sewell
@ 2019-04-17  9:41         ` Richard Biener
  2019-04-17 11:53           ` Uecker, Martin
From: Richard Biener @ 2019-04-17  9:41 UTC
  To: Peter.Sewell; +Cc: Jeff Law, GCC Development, cl-c-memory-object-model

On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
>
> On 17/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk>
> > wrote:
> >>
> >> On Fri, 12 Apr 2019 at 15:51, Jeff Law <law@redhat.com> wrote:
> >> >
> >> > On 4/2/19 2:11 AM, Peter Sewell wrote:
> > >> > > [...]
> > >> > Thanks.  I just noticed this came up at EuroLLVM as well.  Getting
> > >> > some standards clarity in this space would be good.
> > >> >
> > >> > Richi is in the best position to cover for GCC, but I suspect he's
> > >> > buried with gcc-9 issues as we approach the upcoming release.
> > >> > Hopefully he'll have time to review this once crunch time has passed.
> > >> > I think more than anything, sanity-checking the proposal's requirements
> > >> > against what can reasonably be implemented is most important at this
> > >> > stage.
> >>
> >> Indeed.  We talked with him at the GNU cauldron, without uncovering
> >> any serious problems, but more detailed review from an implementability
> >> point of view would be great.   For the UB mailing list we just made
> >> a brief plain-text summary of the proposal (leaving out all the examples
> >> and standards diff, and glossing over some details).  I'll paste that
> >> in below in case it's helpful.  The next WG14 meeting is the week of
> >> April 29; comments before then would be particularly useful if that's
> >> possible.
> >>
> >> best,
> >> Peter
> >>
> > >> [...]
> > >>
> >> This semantics does not track provenance via integers.  Instead, at
> >> integer-to-pointer cast points, it checks whether the given address
> >> points within a live object that has previously been *exposed* and, if
> >> so, recreates the corresponding provenance.
> >>
> >> A storage instance is deemed exposed by a cast of a pointer to it to
> >> an integer type, by a read (at non-pointer type) of the representation
> >> of the pointer, or by an output of the pointer using %p.
> >
> > So this is not what GCC implements which tracks provenance through
> > non-pointer types to a limited extent when only copying is taking place.
> >
> > Your proposal makes
> >
> >  int a, b;
> >  int *p = &a;
> >  int *q = &b;
> >  uintptr_t pi = (uintptr_t)p; //expose
> >  uintptr_t qi = (uintptr_t)q; //expose
> >  pi += 4;
> >  if (pi == qi)
> >    *(int *)pi = 1;
> >
> > well-defined since (int *)pi now has the provenance of &b.
>
> Yes.  (Just to be clear: it's not that we think the above example is
> desirable in itself, but it's well-defined as a consequence of what
> we do to make other common idioms, eg pointer bit manipulation,
> well-defined.)
>
> > Note GCC, when tracking provenance of non-pointer type
> > adds like in
> >
> >   int *p = &a;
> >   uintptr_t pi = (uintptr_t)p;
> >   pi += 4;
> >
> > considers pi to have provenance "anything" (not sure if you
> > have something like that) since we add 4 which has provenance
> > "anything" to pi which has provenance &a.
>
> We don't at present have a provenance "anything", but if the gcc
> "anything" means that it's assumed that it might alias with anything,
> then it looks like gcc's implementing a sound approximation to
> the proposal here?

GCC makes the code well-defined whereas the proposal would make
dereferencing a pointer based on pi invoke undefined behavior?  Since
your proposal is based on an abstract machine there isn't anything
like a pointer with multiple provenances (which "anything" is), just
pointers with no provenance (pointing outside of any object), right?
For points-to analysis we of course have to track all possible
provenances of a pointer (and if we know it doesn't point inside
any object we make it point to nothing).
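
To illustrate, here is a toy points-to lattice of the kind described above. It is a hypothetical sketch, not GCC's actual solver. Sets are bitmasks over two named objects plus an ANYTHING element. A plain copy, even through an integer type, keeps the set. An addition takes the union of its operands' sets, so adding a constant whose set is ANYTHING gives ANYTHING, as in the pi += 4 case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: a points-to set is a bitmask with one bit per
   named object plus a distinguished ANYTHING element. */
typedef uint32_t pts_t;

#define PT_NOTHING  ((pts_t)0)          /* known not to point into any object */
#define PT_A        ((pts_t)1 << 0)     /* may point into a */
#define PT_B        ((pts_t)1 << 1)     /* may point into b */
#define PT_ANYTHING ((pts_t)1 << 31)    /* may point to any address-taken object */

/* A plain copy, including one through an integer type, keeps the set. */
static pts_t pts_copy(pts_t src)
{
    return src;
}

/* The result of x + y may be based on either operand: take the union,
   and let ANYTHING absorb every other element. */
static pts_t pts_add(pts_t x, pts_t y)
{
    pts_t r = x | y;
    return (r & PT_ANYTHING) ? PT_ANYTHING : r;
}

/* In this model an integer constant such as 4 contributes ANYTHING. */
static pts_t pts_constant(void)
{
    return PT_ANYTHING;
}
```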

Btw, GCC changed its behavior here to support optimizing matlab
generated C code which passes pointers to arrays across functions
by marshalling them in two float typed halves (yikes!).  GCC is able
to properly track provenance across the decomposition / recomposition
when doing points-to analysis ;)

Btw, one thing GCC struggles with is applying rules that clearly
apply to pointer dereferences to pointer equality compares, where
the standard special-cases comparing two pointers when one points
one past the end of an object, requiring the comparison to evaluate
to true when the objects are adjacent.  GCC currently statically
optimizes if (&x + 1 == &y) to false for this reason (but not the
corresponding integer comparison).
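
The two comparisons can be sketched as follows, assuming x and y are file-scope int objects. Whether y is placed right after x is up to the implementation, so neither function has a fixed result. The only point is that the first compares pointers, which GCC may fold to 0 at compile time, while the second compares the corresponding integers, which it does not fold.

```c
#include <stdint.h>

int x, y;  /* whether y directly follows x is implementation-dependent */

/* Pointer comparison: the standard requires this to be true if y happens
   to be placed immediately after x, yet GCC may statically fold it to 0. */
int ptr_adjacent(void)
{
    return &x + 1 == &y;
}

/* The corresponding integer comparison on the converted addresses. */
int int_adjacent(void)
{
    return (uintptr_t)(&x + 1) == (uintptr_t)&y;
}
```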

Richard.

>
> best,
> Peter
>
>
> >> The user-disambiguation refinement adds some complexity but supports
> >> roundtrip casts, from pointer to integer and back, of pointers that
> >> are one-past a storage instance.
> >

* Re: C provenance semantics proposal
  2019-04-17  9:41         ` Richard Biener
@ 2019-04-17 11:53           ` Uecker, Martin
  2019-04-17 12:41             ` Richard Biener
From: Uecker, Martin @ 2019-04-17 11:53 UTC
  To: Peter.Sewell, richard.guenther; +Cc: gcc, law, cl-c-memory-object-model


Hi Richard,

On Wednesday, 17.04.2019, at 11:41 +0200, Richard Biener wrote:
> On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
> > 
> > On 17/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk>
> > > wrote:

...
> > > So this is not what GCC implements which tracks provenance through
> > > non-pointer types to a limited extent when only copying is taking place.
> > > 
> > > Your proposal makes
> > > 
> > >  int a, b;
> > >  int *p = &a;
> > >  int *q = &b;
> > >  uintptr_t pi = (uintptr_t)p; //expose
> > >  uintptr_t qi = (uintptr_t)q; //expose
> > >  pi += 4;
> > >  if (pi == qi)
> > >    *(int *)pi = 1;
> > > 
> > > well-defined since (int *)pi now has the provenance of &b.
> > 
> > Yes.  (Just to be clear: it's not that we think the above example is
> > desirable in itself, but it's well-defined as a consequence of what
> > we do to make other common idioms, eg pointer bit manipulation,
> > well-defined.)
> > 
> > > Note GCC, when tracking provenance of non-pointer type
> > > adds like in
> > > 
> > >   int *p = &a;
> > >   uintptr_t pi = (uintptr_t)p;
> > >   pi += 4;
> > > 
> > > considers pi to have provenance "anything" (not sure if you
> > > have something like that) since we add 4 which has provenance
> > > "anything" to pi which has provenance &a.
> > 
> > We don't at present have a provenance "anything", but if the gcc
> > "anything" means that it's assumed that it might alias with anything,
> > then it looks like gcc's implementing a sound approximation to
> > the proposal here?
> 
> GCC makes the code well-defined whereas the proposal would make
> dereferencing a pointer based on pi invoke undefined behavior?

No: if pi points to an exposed object, the behaviour is
defined.

>  Since
> your proposal is based on an abstract machine there isn't anything
> like a pointer with multiple provenances (which "anything" is), just
> pointers with no provenance (pointing outside of any object), right?

This is correct. What the proposal does though is put a limit
on where pointers obtained from integers are allowed to point
to: They cannot point to non-exposed objects. I assume GCC
"anything" provenances also cannot point to all possible
objects.

> For points-to analysis we of course have to track all possible
> provenances of a pointer (and if we know it doesn't point inside
> any object we make it point to nothing).

Yes, a compiler should track what it knows (it could also track
if it knows that some pointers point to the same object, etc.)
while the abstract machine knows everything there is to know.

> Btw, GCC changed its behavior here to support optimizing matlab
> generated C code which passes pointers to arrays across functions
> by marshalling them in two float typed halves (yikes!).  GCC is able
> to properly track provenance across the decomposition / recomposition
> when doing points-to analysis ;)

Impressive ;-)  I would have thought that such encoding
happens at ABI boundaries, where you cannot track anyway.
But this seems to occur inside compiled code?

While our proposal does not attach a provenance to integers,
this does not imply that a compiler is not allowed to track
such information. Whether that is valid then depends on how
the compiler uses it.

For example,

int z;
int x;
uintptr_t pi = (uintptr_t)&x;

// encode in two floats ;-)

// pass floats around

// decode

int* p = (int*)pi;

If the compiler can prove that the address is still
the same, it can also reattach the original provenance
under some conditions.

But there is a caveat: it can only do this if the address
cannot also be a one-past pointer for z (or some other object).
If the address of 'z' is not exposed, it may be able to
assume this.
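
A self-contained version of this sketch might look as follows. The helper names are hypothetical. The encoding is an assumption about what such generated code does: it copies the bytes of a uintptr_t into two floats with memcpy (a value conversion would lose bits), and it assumes uintptr_t is no wider than two floats. Because z's address is never exposed here, the final cast can only recover x under the proposal.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the representation of a uintptr_t fits in two floats. */
_Static_assert(sizeof(uintptr_t) <= 2 * sizeof(float),
               "uintptr_t must fit in two floats");

int z;  /* another object, as in the sketch above; its address is never exposed */
int x;

/* Hypothetical encoder: copy the bytes of the integer into two float-typed
   halves (a byte copy, not a value conversion, so no bits are lost). */
static void encode_in_floats(uintptr_t bits, float halves[2])
{
    memset(halves, 0, 2 * sizeof(float));
    memcpy(halves, &bits, sizeof bits);
}

/* Hypothetical decoder: reassemble the integer from the two halves. */
static uintptr_t decode_from_floats(const float halves[2])
{
    uintptr_t bits;
    memcpy(&bits, halves, sizeof bits);
    return bits;
}

/* Round-trips &x through two floats and stores through the result. */
int float_roundtrip_demo(void)
{
    float halves[2];                          /* "pass floats around" */
    x = 0;
    encode_in_floats((uintptr_t)&x, halves);  /* the cast exposes x */
    int *p = (int *)decode_from_floats(halves);
    *p = 7;                                   /* same address as &x */
    return x;
}
```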

> Btw, one thing GCC struggles is when it applies rules that clearly
> apply to pointer dereferences to pointer equality compares where
> the standard has that special casing of comparing two pointers
> where one points one after an object requiring the comparison
> to evaluate to true when the objects are adjacent.  GCC
> currently statically optimizes if (&x + 1 == &y) to false for
> this reason (but not the corresponding integer comparison).

Yes, according to the current rules (and this doesn't change in
the proposal), the fact that two pointers compare equal does not
imply that they are interchangeable. Making the comparison
unspecified (as in C++) would not help. We could make it undefined,
which would make all optimizations based on the assumption that
the pointers are interchangeable valid. But I fear that this
would introduce a corner case that could lead to subtle
and hard-to-detect bugs.

Martin

> Richard.
> 
> > 
> > best,
> > Peter
> > 
> > 
> > > > The user-disambiguation refinement adds some complexity but supports
> > > > roundtrip casts, from pointer to integer and back, of pointers that
> > > > are one-past a storage instance.
> 
> 

* Re: C provenance semantics proposal
  2019-04-17 11:53           ` Uecker, Martin
@ 2019-04-17 12:41             ` Richard Biener
  2019-04-17 12:56               ` Uecker, Martin
From: Richard Biener @ 2019-04-17 12:41 UTC
  To: Uecker, Martin; +Cc: Peter.Sewell, gcc, law, cl-c-memory-object-model

On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
>
> Hi Richard,
>
> On Wednesday, 17.04.2019, at 11:41 +0200, Richard Biener wrote:
> > On Wed, Apr 17, 2019 at 11:15 AM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
> > >
> > > On 17/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> > > > On Fri, Apr 12, 2019 at 5:31 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk>
> > > > wrote:
>
> ...
> > > > So this is not what GCC implements which tracks provenance through
> > > > non-pointer types to a limited extent when only copying is taking place.
> > > >
> > > > Your proposal makes
> > > >
> > > >  int a, b;
> > > >  int *p = &a;
> > > >  int *q = &b;
> > > >  uintptr_t pi = (uintptr_t)p; //expose
> > > >  uintptr_t qi = (uintptr_t)q; //expose
> > > >  pi += 4;
> > > >  if (pi == qi)
> > > >    *(int *)pi = 1;
> > > >
> > > > well-defined since (int *)pi now has the provenance of &b.
> > >
> > > Yes.  (Just to be clear: it's not that we think the above example is
> > > desirable in itself, but it's well-defined as a consequence of what
> > > we do to make other common idioms, eg pointer bit manipulation,
> > > well-defined.)
> > >
> > > > Note GCC, when tracking provenance of non-pointer type
> > > > adds like in
> > > >
> > > >   int *p = &a;
> > > >   uintptr_t pi = (uintptr_t)p;
> > > >   pi += 4;
> > > >
> > > > considers pi to have provenance "anything" (not sure if you
> > > > have something like that) since we add 4 which has provenance
> > > > "anything" to pi which has provenance &a.
> > >
> > > We don't at present have a provenance "anything", but if the gcc
> > > "anything" means that it's assumed that it might alias with anything,
> > > then it looks like gcc's implementing a sound approximation to
> > > the proposal here?
> >
> > GCC makes the code well-defined whereas the proposal would make
> > dereferencing a pointer based on pi invoke undefined behavior?
>
> No, if there is an exposed object where pi points to, it is
> defined behaviour.
>
> >  Since
> > your proposal is based on an abstract machine there isn't anything
> > like a pointer with multiple provenances (which "anything" is), just
> > pointers with no provenance (pointing outside of any object), right?
>
> This is correct. What the proposal does though is put a limit
> on where pointers obtained from integers are allowed to point
> to: They cannot point to non-exposed objects. I assume GCC
> "anything" provenances also cannot point to all possible
> objects.

Yes.  We exclude objects that do not have their address taken
though (so somewhat similar to your "exposed").

> > For points-to analysis we of course have to track all possible
> > provenances of a pointer (and if we know it doesn't point inside
> > any object we make it point to nothing).
>
> Yes, a compiler should track what it knows (it could also track
> if it knows that some pointers point to the same object, etc.)
> while the abstract machine knows everything there is to know.
>
> > Btw, GCC changed its behavior here to support optimizing matlab
> > generated C code which passes pointers to arrays across functions
> > by marshalling them in two float typed halves (yikes!).  GCC is able
> > to properly track provenance across the decomposition / recomposition
> > when doing points-to analysis ;)
>
> Impressive ;-)  I would have thought that such encoding
> happens at ABI boundaries, where you cannot track anyway.
> But this seems to occur inside compiled code?

It occurs when matlab generates C code for an expression.  They
seem to use floating-point for everything at their self-invented ABI
boundary so "obviously" that includes pointers.

> While we do not attach a provenance to integers
> in our proposal, it does not necessarily imply that a compiler
> is not allowed to track such information. It then depends on
> how it uses it.
>
> For example,
>
> int z;
> int x;
> uintptr_t pi = (uintptr_t)&x;
>
> // encode in two floats ;-)
>
> // pass floats around
>
> // decode
>
> int* p = (int*)pi;
>
> If the compiler can prove that the address is still
> the same, it can also reattach the original provenance
> under some conditions.
>
> But there is a caveat: it can only do this if the address
> cannot also be a one-past pointer for z (or some other object).
> If the address of 'z' is not exposed, it may be able to
> assume this.
>
> > Btw, one thing GCC struggles is when it applies rules that clearly
> > apply to pointer dereferences to pointer equality compares where
> > the standard has that special casing of comparing two pointers
> > where one points one after an object requiring the comparison
> > to evaluate to true when the objects are adjacent.  GCC
> > currently statically optimizes if (&x + 1 == &y) to false for
> > this reason (but not the corresponding integer comparison).
>
> Yes, according to the current rules (and this doesn't change in
> the proposal), the fact that two pointers compare equal does not
> imply that they are interchangeable. Making the comparison
> unspecified (as in C++) would not help. We could make it undefined,
> which would make all optimizations based on the assumption that
> the pointers are interchangeable valid. But I fear that this
> would introduce a corner case that could lead to subtle
> and hard-to-detect bugs.

Yeah.

Richard.

> Martin
>
> > Richard.
> >
> > >
> > > best,
> > > Peter
> > >
> > >
> > > > > The user-disambiguation refinement adds some complexity but supports
> > > > > roundtrip casts, from pointer to integer and back, of pointers that
> > > > > are one-past a storage instance.
> >
> >

* Re: C provenance semantics proposal
  2019-04-17 12:41             ` Richard Biener
@ 2019-04-17 12:56               ` Uecker, Martin
  2019-04-17 13:35                 ` Richard Biener
From: Uecker, Martin @ 2019-04-17 12:56 UTC
  To: richard.guenther; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:

> > 
> > >  Since
> > > your proposal is based on an abstract machine there isn't anything
> > > like a pointer with multiple provenances (which "anything" is), just
> > > pointers with no provenance (pointing outside of any object), right?
> > 
> > This is correct. What the proposal does though is put a limit
> > on where pointers obtained from integers are allowed to point
> > to: They cannot point to non-exposed objects. I assume GCC
> > "anything" provenances also cannot point to all possible
> > objects.
> 
> Yes.  We exclude objects that do not have their address taken
> though (so somewhat similar to your "exposed").

Also if the address never escapes?

Using address-taken as the criterion is one option we considered,
but we felt this exposes too many objects, such as automatic
arrays or locally used malloc'ed/alloca'ed data.

Using integer casts as the criterion means that all objects whose
address is taken can still be assumed safe, provided that (a) the
pointer is not seen to be cast to an integer and (b) the pointer
never escapes.
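
To make the distinction concrete, here is a hypothetical function (all names are illustrative). Both arrays have their address taken, but only `shared` is cast to an integer. Under the integer-cast criterion, only `shared` is exposed, so a pointer rebuilt from an integer may alias it while `local` can still be treated as private. Under the address-taken criterion, both would count as exposed.

```c
#include <stdint.h>

/* Hypothetical example: stores through a pointer rebuilt from an integer,
   then reads both arrays. */
int exposure_demo(uintptr_t (*launder)(uintptr_t))
{
    int local[4] = {1, 2, 3, 4};
    int shared[4] = {0};

    int *lp = local;                     /* address taken, never cast or escaped */
    uintptr_t bits = (uintptr_t)shared;  /* cast: shared is now exposed */

    /* launder() stands in for integer manipulation the compiler cannot see
       through. Under the proposal the rebuilt pointer may only point into an
       exposed object, so a compiler may assume it does not alias local. */
    int *q = (int *)launder(bits);
    q[0] = 42;

    return lp[0] + shared[0];            /* local[0] may be assumed to still be 1 */
}

/* A trivial laundering function for testing. */
uintptr_t identity_launder(uintptr_t v)
{
    return v;
}
```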

Best,
Martin

* Re: C provenance semantics proposal
  2019-04-17 12:56               ` Uecker, Martin
@ 2019-04-17 13:35                 ` Richard Biener
  2019-04-17 14:12                   ` Uecker, Martin
From: Richard Biener @ 2019-04-17 13:35 UTC
  To: Uecker, Martin; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
>
> > >
> > > >  Since
> > > > your proposal is based on an abstract machine there isn't anything
> > > > like a pointer with multiple provenances (which "anything" is), just
> > > > pointers with no provenance (pointing outside of any object), right?
> > >
> > > This is correct. What the proposal does though is put a limit
> > > on where pointers obtained from integers are allowed to point
> > > to: They cannot point to non-exposed objects. I assume GCC
> > > "anything" provenances also cannot point to all possible
> > > objects.
> >
> > Yes.  We exclude objects that do not have their address taken
> > though (so somewhat similar to your "exposed").
>
> Also if the address never escapes?

Yes.

> Using address-taken as the criterion is one option we considered,
> but we felt this exposes too many objects, like automatic
> arrays or locally used malloced/alloced data etc.
>
> Using integer-casts as criterion means that all
> objects whose address is taken but where (a) it is not
> seen that the pointer is cast to an integer and
> where (b) the pointer never escapes can be assumed safe.

Yeah, since the abstract machine sees everything, using whatever
criterion seems fit is possible.

Richard.

> Best,
> Martin

* Re: C provenance semantics proposal
  2019-04-17 13:35                 ` Richard Biener
@ 2019-04-17 14:12                   ` Uecker, Martin
  2019-04-17 17:31                     ` Peter Sewell
  2019-04-18  9:32                     ` Richard Biener
From: Uecker, Martin @ 2019-04-17 14:12 UTC
  To: richard.guenther; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > 
> > > > >  Since
> > > > > your proposal is based on an abstract machine there isn't anything
> > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > pointers with no provenance (pointing outside of any object), right?
> > > > 
> > > > This is correct. What the proposal does though is put a limit
> > > > on where pointers obtained from integers are allowed to point
> > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > "anything" provenances also cannot point to all possible
> > > > objects.
> > > 
> > > Yes.  We exclude objects that do not have their address taken
> > > though (so somewhat similar to your "exposed").
> > 
> > Also if the address never escapes?
> 
> Yes.

Then with respect to "expose" it seems GCC implements
a superset which means it allows some behavior which
is undefined according to the proposal. So all seems
well with respect to this part.


With respect to tracking provenance through integers
some changes might be required.

Let's consider this example:
   
int x;
int y;
uintptr_t pi = (uintptr_t)&x;
uintptr_t pj = (uintptr_t)&y;
 
if (pi + 4 == pj) {
                
   int* p = (int*)pj; // can be one-after pointer of 'x'
   p[-1] = 1;         // well defined?
}

If I understand correctly, a pointer obtained from
pi + 4 would have an "anything" provenance (which is
fine). But the pointer obtained from 'pj' would have the
provenance of 'y', so the access to 'x' would not
be allowed. But according to the preferred version of
our proposal, the pointer could also be used to
access 'x' because it is also exposed.

GCC could make pj have an "anything" provenance
even though it is not modified. (This would break
some optimizations, such as the one for Matlab.)

Maybe one could also refine this optimization to check
for additional conditions which rule out the case
that there is another object the pointer could point
to.
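
A compilable variant of the example above can be sketched as follows. The wrapper function `one_past_demo` is invented for illustration. Also, `sizeof x` replaces the hard-coded 4, so the adjacency test does not assume a 4-byte int. Whether the branch is taken depends entirely on how the implementation lays out x and y. As discussed in this thread, a compiler that tracks provenance through integer copies may treat p as pointing only to y.

```c
#include <stdint.h>

int x;
int y;

/* Returns 1 if y happened to be placed right after x (and the store
   through the cast pointer was performed), 0 otherwise. */
int one_past_demo(void)
{
    uintptr_t pi = (uintptr_t)&x;   /* exposes x */
    uintptr_t pj = (uintptr_t)&y;   /* exposes y */

    if (pi + sizeof x == pj) {
        int *p = (int *)pj;         /* one past x and the start of y */
        p[-1] = 1;                  /* PNVI-ae-udi: this use resolves p to x */
        return 1;
    }
    return 0;
}
```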

Best,
Martin

* Re: C provenance semantics proposal
  2019-04-17 14:12                   ` Uecker, Martin
@ 2019-04-17 17:31                     ` Peter Sewell
  2019-04-18  9:32                     ` Richard Biener
From: Peter Sewell @ 2019-04-17 17:31 UTC
  To: Uecker, Martin; +Cc: richard.guenther, gcc, law, cl-c-memory-object-model

On Wed, 17 Apr 2019 at 15:12, Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > >
> > > On Wednesday, 17.04.2019, at 14:41 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > >
> > > > > >  Since
> > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > >
> > > > > This is correct. What the proposal does though is put a limit
> > > > > on where pointers obtained from integers are allowed to point
> > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > "anything" provenances also cannot point to all possible
> > > > > objects.
> > > >
> > > > Yes.  We exclude objects that do not have their address taken
> > > > though (so somewhat similar to your "exposed").
> > >
> > > Also if the address never escapes?
> >
> > Yes.

Just for reference, here's Richard's above example in Cerberus,
changed slightly so the Cerberus default allocator puts b after a.
One can step through to see that this execution is allowed.

https://cerberus.cl.cam.ac.uk/?short/9b83be



> Then with respect to "expose" it seems GCC implements
> a superset which means it allows some behavior which
> is undefined according to the proposal. So all seems
> well with respect to this part.

Yes.

For the float thing (which is truly horrible :-), that reinterpretation
of a pointer value as two floats would also (in our terms) expose
the pointer, so the conversion the other way would just work.
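A minimal C sketch of such a representation round trip, assuming a target where sizeof(int *) is a multiple of sizeof(float); the helper name is hypothetical. At run time the bytes simply come back unchanged. The statement that reading the representation exposes the object is the proposal's semantics as described in this message, not something the program itself can observe.

```c
#include <assert.h>
#include <string.h>

/* Assumption: the pointer's bytes fit exactly into an array of floats. */
_Static_assert(sizeof(int *) % sizeof(float) == 0,
               "sizeof(int *) must be a multiple of sizeof(float)");

/* Hypothetical helper: copy a pointer's object representation into floats
   and back. memcpy moves raw bytes and no floating-point arithmetic is done,
   so the bit pattern survives even if it would be a NaN as a float. */
static int *roundtrip_via_floats(int *p)
{
    float f[sizeof(int *) / sizeof(float)];
    int *q;

    memcpy(f, &p, sizeof p);   /* reads p's representation ("expose" in PNVI-ae terms) */
    memcpy(&q, f, sizeof q);   /* rebuilds a pointer value from those bytes */
    return q;
}
```

On common targets the rebuilt pointer compares equal to the original and can be dereferenced.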

> With respect to tracking provenance through integers
> some changes might be required.
>
> Let's consider this example:
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
>
> If I understand correctly, a pointer obtained from
> pi + 4 would have an "anything" provenance (which is
> fine). But the pointer obtained from 'pj' would have the
> provenance of 'y' so the access to 'x' would not
> be allowed. But according to the preferred version of
> our proposal, the pointer could also be used to
> access 'x' because it is also exposed.

That's an interesting example.  Here's a live version
in Cerberus, which shows (at step 19) the ambiguous
provenance of the result of the (int*) cast, resolved
in the next steps by the pointer subtraction.

https://cerberus.cl.cam.ac.uk/?short/a23efd

So it's allowed in PNVI-ae-udi.  For interest,
PNVI-plain makes this UB (as one can see by changing
the selected option under the Model dropdown).
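To make the ambiguity concrete, here is a toy model (not Cerberus; all names and addresses are hypothetical) of an integer-to-pointer cast under a simplified reading of PNVI-ae-udi: every exposed object whose footprint, including its one-past address, contains the integer becomes a candidate, and the first access keeps only the candidate that contains it. Cerberus already resolves the ambiguity at the pointer arithmetic; resolving at the access is a simplification assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROV_EMPTY (-1)

/* Toy storage instance; the fields are illustrative assumptions. */
typedef struct {
    uintptr_t base;   /* address of the first byte */
    size_t size;      /* footprint in bytes */
    bool exposed;     /* set by a pointer-to-integer cast or representation read */
} toy_object;

/* Toy pointer: up to two candidate provenances until first use. */
typedef struct {
    uintptr_t addr;
    int cand[2];
} toy_ptr;

/* (int *)i: every exposed object whose footprint, including its one-past
   address, contains i is a candidate. At most two can match. */
static toy_ptr toy_int_to_ptr(const toy_object *objs, int n, uintptr_t i)
{
    toy_ptr p = { i, { PROV_EMPTY, PROV_EMPTY } };
    int k = 0;
    for (int o = 0; o < n && k < 2; o++) {
        if (objs[o].exposed && objs[o].base <= i && i <= objs[o].base + objs[o].size)
            p.cand[k++] = o;
    }
    return p;
}

/* An access of len bytes at p->addr + off keeps only the candidate whose
   footprint contains the whole access. Returns that object's index, or
   PROV_EMPTY if no candidate fits (undefined behaviour in the model). */
static int toy_resolve_access(const toy_object *objs, toy_ptr *p, intptr_t off, size_t len)
{
    uintptr_t a = p->addr + (uintptr_t)off;   /* modular arithmetic handles off < 0 */
    for (int k = 0; k < 2; k++) {
        int o = p->cand[k];
        if (o != PROV_EMPTY && objs[o].base <= a && a + len <= objs[o].base + objs[o].size) {
            p->cand[0] = o;
            p->cand[1] = PROV_EMPTY;
            return o;
        }
    }
    return PROV_EMPTY;
}
```

With x at 0x1000 and y at 0x1004 (4-byte ints, both exposed), casting 0x1004 yields candidates {x, y}, and p[-1] resolves to x. If x is not exposed, only y remains and the access is undefined.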

> GCC could make pj have an "anything" provenance
> even though it is not modified. (This would break
> some optimizations, such as the one for Matlab.)
>
> Maybe one could also refine this optimization to check
> for additional conditions which rule out the case
> that there is another object the pointer could point
> to.
>
> Best,
> Martin

* Re: C provenance semantics proposal
  2019-04-17 14:12                   ` Uecker, Martin
  2019-04-17 17:31                     ` Peter Sewell
@ 2019-04-18  9:32                     ` Richard Biener
  2019-04-18  9:56                       ` Richard Biener
  2019-04-18 10:45                       ` Peter Sewell
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Biener @ 2019-04-18  9:32 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> On Wednesday, 17.04.2019 at 15:34 +0200, Richard Biener wrote:
> > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > >
> > > On Wednesday, 17.04.2019 at 14:41 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > >
> > > > > >  Since
> > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > >
> > > > > This is correct. What the proposal does though is put a limit
> > > > > on where pointers obtained from integers are allowed to point
> > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > "anything" provenances also cannot point to all possible
> > > > > objects.
> > > >
> > > > Yes.  We exclude objects that do not have their address taken
> > > > though (so somewhat similar to your "exposed").
> > >
> > > Also if the address never escapes?
> >
> > Yes.
>
> Then with respect to "expose" it seems GCC implements
> a superset which means it allows some behavior which
> is undefined according to the proposal. So all seems
> well with respect to this part.
>
>
> With respect to tracking provenance through integers
> some changes might be required.
>
> Let's consider this example:
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
>
> If I understand correctly, a pointer obtained from
> pi + 4 would have an "anything" provenance (which is
> fine). But the pointer obtained from 'pj' would have the
> provenance of 'y' so the access to 'x' would not
> be allowed.

Correct.  This is the most difficult case for us to handle
exactly, also because of the following (is it also valid for the proposal?):

int x;
int y;
uintptr_t pi = (uintptr_t)&x;
uintptr_t pj = (uintptr_t)&y;

if (pi + 4 == pj) {

   int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
   p[-1] = 1;         // well defined?
}

While this is handled well by GCC in the written form (as you
say, pi + 4 yields "anything" provenance), GCC itself
may transform it into the first variant by noticing
the conditional equivalence and substituting pj for
pi + 4.

> But according to the preferred version of
> our proposal, the pointer could also be used to
> access 'x' because it is also exposed.
>
> GCC could make pj have an "anything" provenance
> even though it is not modified. (This would break
> some optimizations, such as the one for Matlab.)
>
> Maybe one could also refine this optimization to check
> for additional conditions which rule out the case
> that there is another object the pointer could point
> to.

The only feasible solution would be to not track
provenance through non-pointers and make
conversions of non-pointers to pointers have
"anything" provenance.

The additional issue that appears here though
is that we cannot even turn (int *)(uintptr_t)p
into p anymore since with the conditional
substitution we can then still arrive at
effectively (&y)[-1] = 1 which is of course
undefined behavior.

That is, your proposal makes

 ((int *)(uintptr_t)&y)[-1] = 1

well-defined (if &y - 1 == &x) but keeps

  (&y)[-1] = 1

as undefined which strikes me as a little bit
inconsistent.  If that's true it's IMHO worth
a defect report and second consideration.
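A toy illustration of the substitution problem, assuming a simplified points-to analysis with one bit per variable (this is not GCC's actual implementation; all names are hypothetical). When provenance is tracked through integers, (int*)pj and (int*)(pi + 4) get different points-to sets, so substituting one for the other under the conditional equivalence silently drops x. With the "no tracking through non-pointers" rule, both conversions get "anything" and the substitution is harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy points-to sets: one bit per variable; "anything" is every
   address-taken variable (here just x and y). */
typedef uint32_t pt_set;
enum { PT_X = 1u << 0, PT_Y = 1u << 1, PT_ANYTHING = PT_X | PT_Y };

/* A tiny expression language covering the examples in this thread. */
typedef enum { E_ADDR, E_PTR_TO_INT, E_INT_ADD, E_INT_TO_PTR } expr_kind;

typedef struct expr {
    expr_kind kind;
    pt_set var;               /* used by E_ADDR only */
    const struct expr *sub;   /* operand for the other kinds */
} expr;

/* track_ints = true : integers carry the set of the pointer they came from
   track_ints = false: every integer-to-pointer conversion yields "anything" */
static pt_set points_to(const expr *e, bool track_ints)
{
    switch (e->kind) {
    case E_ADDR:
        return e->var;
    case E_PTR_TO_INT:
        return points_to(e->sub, track_ints);
    case E_INT_ADD:
        return PT_ANYTHING;   /* integer arithmetic: give up, as for pi + 4 */
    case E_INT_TO_PTR:
        return track_ints ? points_to(e->sub, track_ints) : PT_ANYTHING;
    }
    return PT_ANYTHING;
}
```

Under tracking, (int*)pj points only to y although the branch makes it equal to (int*)(pi + 4), which may point to x.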

Richard.

> Best,
> Martin

* Re: C provenance semantics proposal
  2019-04-18  9:32                     ` Richard Biener
@ 2019-04-18  9:56                       ` Richard Biener
  2019-04-18 10:48                         ` Peter Sewell
  2019-04-18 11:57                         ` Uecker, Martin
  2019-04-18 10:45                       ` Peter Sewell
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Biener @ 2019-04-18  9:56 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> >
> > On Wednesday, 17.04.2019 at 15:34 +0200, Richard Biener wrote:
> > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > >
> > > > On Wednesday, 17.04.2019 at 14:41 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > > >
> > > > > > >  Since
> > > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > > >
> > > > > > This is correct. What the proposal does though is put a limit
> > > > > > on where pointers obtained from integers are allowed to point
> > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > "anything" provenances also cannot point to all possible
> > > > > > objects.
> > > > >
> > > > > Yes.  We exclude objects that do not have their address taken
> > > > > though (so somewhat similar to your "exposed").
> > > >
> > > > Also if the address never escapes?
> > >
> > > Yes.
> >
> > Then with respect to "expose" it seems GCC implements
> > a superset which means it allows some behavior which
> > is undefined according to the proposal. So all seems
> > well with respect to this part.
> >
> >
> > With respect to tracking provenance through integers
> > some changes might be required.
> >
> > Let's consider this example:
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1;         // well defined?
> > }
> >
> > If I understand correctly, a pointer obtained from
> > pi + 4 would have an "anything" provenance (which is
> > fine). But the pointer obtained from 'pj' would have the
> > provenance of 'y' so the access to 'x' would not
> > be allowed.
>
> Correct.  This is the most difficult case for us to handle
> exactly, also because of the following (is it also valid for the proposal?):
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
>
> While this is handled well by GCC in the written form (as you
> say, pi + 4 yields "anything" provenance), GCC itself
> may transform it into the first variant by noticing
> the conditional equivalence and substituting pj for
> pi + 4.
>
> > But according to the preferred version of
> > our proposal, the pointer could also be used to
> > access 'x' because it is also exposed.
> >
> > GCC could make pj have an "anything" provenance
> > even though it is not modified. (This would break
> > some optimizations, such as the one for Matlab.)
> >
> > Maybe one could also refine this optimization to check
> > for additional conditions which rule out the case
> > that there is another object the pointer could point
> > to.
>
> The only feasible solution would be to not track
> provenance through non-pointers and make
> conversions of non-pointers to pointers have
> "anything" provenance.
>
> The additional issue that appears here though
> is that we cannot even turn (int *)(uintptr_t)p
> into p anymore since with the conditional
> substitution we can then still arrive at
> effectively (&y)[-1] = 1 which is of course
> undefined behavior.
>
> That is, your proposal makes
>
>  ((int *)(uintptr_t)&y)[-1] = 1
>
> well-defined (if &y - 1 == &x) but keeps
>
>   (&y)[-1] = 1
>
> as undefined which strikes me as a little bit
> inconsistent.  If that's true it's IMHO worth
> a defect report and second consideration.

Similarly, it is undesirable that

int x;
int y;
uintptr_t pj = (uintptr_t)&y;

if (&x + 1 == &y) {

   int* p = (int*)pj; // can be one-after pointer of 'x'
   p[-1] = 1;         // well defined?
}

is undefined but becomes well-defined when I add a no-op

 (uintptr_t)&x;

Can this no-op
stmt appear in another function?  Or even in
another translation unit (if x and y are global variables)?
And does such stmt have to be present (in another
TU) to make the example valid in this case?

To me all this makes requiring exposal through a cast
to a non-pointer (or accessing its representation) not
in any way more "useful" for an optimizing compiler than
modeling exposal through address-taking.

Richard.

> Richard.
>
> > Best,
> > Martin

* Re: C provenance semantics proposal
  2019-04-18  9:32                     ` Richard Biener
  2019-04-18  9:56                       ` Richard Biener
@ 2019-04-18 10:45                       ` Peter Sewell
  2019-04-18 12:20                         ` Uecker, Martin
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Sewell @ 2019-04-18 10:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uecker, Martin, gcc, law, cl-c-memory-object-model

On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> >
> > On Wednesday, 17.04.2019 at 15:34 +0200, Richard Biener wrote:
> > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > >
> > > > On Wednesday, 17.04.2019 at 14:41 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > > >
> > > > > > >  Since
> > > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > > >
> > > > > > This is correct. What the proposal does though is put a limit
> > > > > > on where pointers obtained from integers are allowed to point
> > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > "anything" provenances also cannot point to all possible
> > > > > > objects.
> > > > >
> > > > > Yes.  We exclude objects that do not have their address taken
> > > > > though (so somewhat similar to your "exposed").
> > > >
> > > > Also if the address never escapes?
> > >
> > > Yes.
> >
> > Then with respect to "expose" it seems GCC implements
> > a superset which means it allows some behavior which
> > is undefined according to the proposal. So all seems
> > well with respect to this part.
> >
> >
> > With respect to tracking provenance through integers
> > some changes might be required.
> >
> > Let's consider this example:
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1;         // well defined?
> > }
> >
> > If I understand correctly, a pointer obtained from
> > pi + 4 would have an "anything" provenance (which is
> > fine). But the pointer obtained from 'pj' would have the
> > provenance of 'y' so the access to 'x' would not
> > be allowed.
>
> Correct.  This is the most difficult case for us to handle
> exactly, also because of the following (is it also valid for the proposal?):
>
> int x;
> int y;
> uintptr_t pi = (uintptr_t)&x;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
>
> While this is handled well by GCC in the written form (as you
> say, pi + 4 yields "anything" provenance), GCC itself
> may transform it into the first variant by noticing
> the conditional equivalence and substituting pj for
> pi + 4.

In the proposed semantics, the integers have no provenance
info at all, so pj and pi+4 are interchangeable inside the conditional.

An equality test of two pointers, on the other hand, doesn't necessarily
mean that they are interchangeable.  I don't see any good way to
avoid that in a provenance semantics, where a one-past
pointer might sometimes compare equal to a pointer to an
adjacent object but be illegal for accessing it.


> > But according to the preferred version of
> > our proposal, the pointer could also be used to
> > access 'x' because it is also exposed.
> >
> > GCC could make pj have an "anything" provenance
> > even though it is not modified. (This would break
> > some optimizations, such as the one for Matlab.)
> >
> > Maybe one could also refine this optimization to check
> > for additional conditions which rule out the case
> > that there is another object the pointer could point
> > to.
>
> The only feasible solution would be to not track
> provenance through non-pointers and make
> conversions of non-pointers to pointers have
> "anything" provenance.
>
> The additional issue that appears here though
> is that we cannot even turn (int *)(uintptr_t)p
> into p anymore since with the conditional
> substitution we can then still arrive at
> effectively (&y)[-1] = 1 which is of course
> undefined behavior.
>
> That is, your proposal makes
>
>  ((int *)(uintptr_t)&y)[-1] = 1
>
> well-defined (if &y - 1 == &x) but keeps
>
>   (&y)[-1] = 1
>
> as undefined

That's true (if x has been exposed).

>which strikes me as a little bit
> inconsistent.  If that's true it's IMHO worth
> a defect report and second consideration.

There's a trade-off here. We could make roundtrips
of pointer-to-integer-to-pointer recover provenance only
if the pointer is properly within the object, giving empty
provenance for a one-past pointer.  That would fix the
above, but it's not clear whether this would be a bad
restriction for existing code.
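A toy comparison of the two roundtrip rules, with hypothetical names and toy addresses. The first rule is the proposal as written, where an object's footprint also includes its one-past address. The second is the variant described above, where only addresses strictly inside an object recover provenance, so a one-past pointer gets empty provenance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t base;   /* first byte */
    size_t size;      /* footprint in bytes */
    bool exposed;
} toy_object;

/* Collects into out[] (at most 2 entries) the exposed objects from which an
   integer-to-pointer cast of address a may take provenance; returns the count.
   one_past_ok = true : proposal as written (one-past address included)
   one_past_ok = false: variant (strictly inside an object only) */
static int toy_roundtrip_candidates(const toy_object *objs, int n, uintptr_t a,
                                    bool one_past_ok, int out[2])
{
    int k = 0;
    for (int o = 0; o < n && k < 2; o++) {
        uintptr_t end = objs[o].base + objs[o].size;
        bool inside = objs[o].base <= a && a < end;
        bool one_past = (a == end);
        if (objs[o].exposed && (inside || (one_past_ok && one_past)))
            out[k++] = o;
    }
    return k;
}
```

With x at 0x1000 and y at 0x1004, the variant gives (int *)(uintptr_t)&y only y's provenance, so p[-1] is undefined just like (&y)[-1]. A roundtripped one-past pointer of y (0x1008, nothing after it) gets no provenance at all.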

best,
Peter


> Richard.
>
> > Best,
> > Martin

* Re: C provenance semantics proposal
  2019-04-18  9:56                       ` Richard Biener
@ 2019-04-18 10:48                         ` Peter Sewell
  2019-04-18 11:57                         ` Uecker, Martin
  1 sibling, 0 replies; 56+ messages in thread
From: Peter Sewell @ 2019-04-18 10:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uecker, Martin, gcc, law, cl-c-memory-object-model

On Thu, 18 Apr 2019 at 10:56, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > >
> > > On Wednesday, 17.04.2019 at 15:34 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > >
> > > > > On Wednesday, 17.04.2019 at 14:41 +0200, Richard Biener wrote:
> > > > > > On Wed, Apr 17, 2019 at 1:53 PM Uecker, Martin
> > > > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > > > > >
> > > > > > > >  Since
> > > > > > > > your proposal is based on an abstract machine there isn't anything
> > > > > > > > like a pointer with multiple provenances (which "anything" is), just
> > > > > > > > pointers with no provenance (pointing outside of any object), right?
> > > > > > >
> > > > > > > This is correct. What the proposal does though is put a limit
> > > > > > > on where pointers obtained from integers are allowed to point
> > > > > > > to: They cannot point to non-exposed objects. I assume GCC
> > > > > > > "anything" provenances also cannot point to all possible
> > > > > > > objects.
> > > > > >
> > > > > > Yes.  We exclude objects that do not have their address taken
> > > > > > though (so somewhat similar to your "exposed").
> > > > >
> > > > > Also if the address never escapes?
> > > >
> > > > Yes.
> > >
> > > Then with respect to "expose" it seems GCC implements
> > > a superset which means it allows some behavior which
> > > is undefined according to the proposal. So all seems
> > > well with respect to this part.
> > >
> > >
> > > With respect to tracking provenance through integers
> > > some changes might be required.
> > >
> > > Let's consider this example:
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > >
> > > If I understand correctly, a pointer obtained from
> > > pi + 4 would have an "anything" provenance (which is
> > > fine). But the pointer obtained from 'pj' would have the
> > > provenance of 'y' so the access to 'x' would not
> > > be allowed.
> >
> > Correct.  This is the most difficult case for us to handle
> > exactly, also because of the following (is it also valid for the proposal?):
> >
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (pi + 4 == pj) {
> >
> >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> >    p[-1] = 1;         // well defined?
> > }
> >
> > While this is handled well by GCC in the written form (as you
> > say, pi + 4 yields "anything" provenance), GCC itself
> > may transform it into the first variant by noticing
> > the conditional equivalence and substituting pj for
> > pi + 4.
> >
> > > But according to the preferred version of
> > > our proposal, the pointer could also be used to
> > > access 'x' because it is also exposed.
> > >
> > > GCC could make pj have an "anything" provenance
> > > even though it is not modified. (This would break
> > > some optimizations, such as the one for Matlab.)
> > >
> > > Maybe one could also refine this optimization to check
> > > for additional conditions which rule out the case
> > > that there is another object the pointer could point
> > > to.
> >
> > The only feasible solution would be to not track
> > provenance through non-pointers and make
> > conversions of non-pointers to pointers have
> > "anything" provenance.
> >
> > The additional issue that appears here though
> > is that we cannot even turn (int *)(uintptr_t)p
> > into p anymore since with the conditional
> > substitution we can then still arrive at
> > effectively (&y)[-1] = 1 which is of course
> > undefined behavior.
> >
> > That is, your proposal makes
> >
> >  ((int *)(uintptr_t)&y)[-1] = 1
> >
> > well-defined (if &y - 1 == &x) but keeps
> >
> >   (&y)[-1] = 1
> >
> > as undefined which strikes me as a little bit
> > inconsistent.  If that's true it's IMHO worth
> > a defect report and second consideration.
>
> Similarly, it is undesirable that
>
> int x;
> int y;
> uintptr_t pj = (uintptr_t)&y;
>
> if (&x + 1 == &y) {
>
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
>
> is undefined but becomes well-defined when I add a no-op
>
>  (uintptr_t)&x;
>
> Can this no-op
> stmt appear in another function?  Or even in
> another translation unit (if x and y are global variables)?
> And does such stmt have to be present (in another
> TU) to make the example valid in this case?

Yes to all of that; again, in the variant in which
roundtrips of a one-past pointer are supported.

> To me all this makes requiring exposal through a cast
> to a non-pointer (or accessing its representation) not
> in any way more "useful" for an optimizing compiler than
> modeling exposal through address-taking.

Interesting, thanks.

best,
Peter


> Richard.
>
> > Richard.
> >
> > > Best,
> > > Martin

* Re: C provenance semantics proposal
  2019-04-18  9:56                       ` Richard Biener
  2019-04-18 10:48                         ` Peter Sewell
@ 2019-04-18 11:57                         ` Uecker, Martin
  2019-04-18 12:31                           ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Uecker, Martin @ 2019-04-18 11:57 UTC (permalink / raw)
  To: richard.guenther; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Thursday, 18.04.2019 at 11:56 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> > 
> > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > 
> > > On Wednesday, 17.04.2019 at 15:34 +0200, Richard Biener wrote:
> > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > <Martin.Uecker@med.uni-goettingen.de> wrote:

....
> > > Let's consider this example:
> > > 
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > > 
> > > if (pi + 4 == pj) {
> > > 
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > > 
> > > If I understand correctly, a pointer obtained from
> > > pi + 4 would have an "anything" provenance (which is
> > > fine). But the pointer obtained from 'pj' would have the
> > > provenance of 'y' so the access to 'x' would not
> > > be allowed.
> > 
> > Correct.  This is the most difficult case for us to handle
> > exactly, also because of the following (is it also valid for the proposal?):
> > 
> > int x;
> > int y;
> > uintptr_t pi = (uintptr_t)&x;
> > uintptr_t pj = (uintptr_t)&y;
> > 
> > if (pi + 4 == pj) {
> > 
> >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> >    p[-1] = 1;         // well defined?
> > }
> > 
> > While this is handled well by GCC in the written form (as you
> > say, pi + 4 yields "anything" provenance), GCC itself
> > may transform it into the first variant by noticing
> > the conditional equivalence and substituting pj for
> > pi + 4.

Integers are just integers in the proposal, so conditional
equivalence is not a problem for them. In my opinion this
is a strength of the proposal. Tracking provenance for
integers would mean that all computations would be affected
by such subtle semantic issues (where you cannot even
replace an integer by an equivalent one). In this
proposal this is limited to pointers, where it at least
makes some sense.

> > > But according to the preferred version of
> > > our proposal, the pointer could also be used to
> > > access 'x' because it is also exposed.
> > > 
> > > GCC could make pj have an "anything" provenance
> > > even though it is not modified. (This would break
> > > some optimizations, such as the one for Matlab.)
> > > 
> > > Maybe one could also refine this optimization to check
> > > for additional conditions which rule out the case
> > > that there is another object the pointer could point
> > > to.
> > 
> > The only feasible solution would be to not track
> > provenance through non-pointers and make
> > conversions of non-pointers to pointers have
> > "anything" provenance.

This would be one solution, yes. But you could
reattach the same provenance if you know that the
pointer points into the middle of an object (so it is
neither a pointer to its start nor a one-after pointer),
or if you know that there is no exposed object directly
adjacent to this object, etc.
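A sketch of the kind of check described here, under stated assumptions: the compiler knows which object a pointer came from, its byte offset into that object, and whether an exposed object may lie directly before or after it. The function name and parameters are hypothetical. It answers whether folding (T *)(uintptr_t)p back into p can keep p's provenance without ruling out a behaviour that PNVI-ae-udi allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* off: byte offset of the pointer within its object; size: object size.
   neighbour_before: an exposed object may end exactly at this object's start.
   neighbour_after:  an exposed object may begin exactly at its one-past address. */
static bool toy_can_keep_provenance(size_t off, size_t size,
                                    bool neighbour_before, bool neighbour_after)
{
    assert(size > 0);                /* zero-sized objects are out of scope here */
    if (off > 0 && off < size)
        return true;                 /* strictly inside: only one candidate */
    if (off == 0)
        return !neighbour_before;    /* could also be the one-past pointer of the object before */
    if (off == size)
        return !neighbour_after;     /* could also point to the start of the object after */
    return false;                    /* out of bounds: not covered by this sketch */
}
```

Only the start and one-past positions need the adjacency information. Every interior offset can keep its provenance unconditionally.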

> > The additional issue that appears here though
> > is that we cannot even turn (int *)(uintptr_t)p
> > into p anymore since with the conditional
> > substitution we can then still arrive at
> > effectively (&y)[-1] = 1 which is of course
> > undefined behavior.
> > 
> > That is, your proposal makes
> > 
> >  ((int *)(uintptr_t)&y)[-1] = 1
> > 
> > well-defined (if &y - 1 == &x) but keeps
> > 
> >   (&y)[-1] = 1
> > 
> > as undefined which strikes me as a little bit
> > inconsistent.  If that's true it's IMHO worth
> > a defect report and second consideration.

This is true. But I would not call it inconsistent.
It is just unusual if you expect that casts to integers
and back are no-ops.  In this proposal a round-trip has
the effect of stripping the original provenance and
attaching a new one (which could be the same as the
old one).

While in this specific scenario this might seem
unreasonable, there are other examples where you may
want to be able to get from one object to another,
and using casts to integers would then be the
blessed way to express this.

In my opinion, this is also intuitive:
by casting to an integer, one gets simple discrete
pointer semantics without provenance.


> Similarly, it is undesirable that
> 
> int x;
> int y;
> uintptr_t pj = (uintptr_t)&y;
> 
> if (&x + 1 == &y) {
> 
>    int* p = (int*)pj; // can be one-after pointer of 'x'
>    p[-1] = 1;         // well defined?
> }
> 
> is undefined but becomes well-defined when I add a no-op
> 
>  (uintptr_t)&x;
> 
> Can this no-op
> stmt appear in another function?  Or even in
> another translation unit (if x and y are global variables)?
> And does such stmt have to be present (in another
> TU) to make the example valid in this case?

Without that statement, the example is not valid as the
address of 'x' is not exposed. With the statement this
becomes valid and it does not matter where this statement
appears. Again, I agree that the fact that such a statement
has a side effect is something one needs to get used to.

But taking the address already has a side effect which could be
surprising, doesn't it? If I understood your answer
above correctly, for GCC you get this side effect already
without the cast:

&x;


For the statement to appear elsewhere, the address must
escape first. I would expect a compiler to treat a
cast to an integer identically to an escaped address.

> To me all this makes requiring exposal through a cast
> to a non-pointer (or accessing its representation) not
> in any way more "useful" for an optimizing compiler than
> modeling exposal through address-taking.

There would be a difference for cases like this:

int x[3];
int y;

x[0] = 1;
uintptr_t pj = (uintptr_t)&y;

if (x + 3 == &y) {   // pointer comparison: does not expose 'x'

  int* p = (int*)pj;
  p[-1] = 1;
}

Here 'x' is not exposed in our proposal, so the assignment
via 'p' is invalid, even though the address of 'x' is taken implicitly.

Other examples are storage allocated via malloc/alloca,
where there is always a pointer involved but which is
not automatically exposed in our proposal.
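A toy sketch of the distinction drawn here, assuming two per-object flags: "address taken" (roughly the criterion Richard describes for GCC) and "exposed" in the proposal's sense. The type, event, and function names are hypothetical. Taking an address, whether explicitly, through array-to-pointer conversion as in x[0] = 1, or by receiving a pointer from malloc, sets only the first flag. A pointer-to-integer cast or a read of the pointer's representation sets both.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool address_taken;   /* conservative, compiler-style notion */
    bool exposed;         /* notion used by PNVI-ae(-udi) */
} toy_flags;

typedef enum {
    EV_TAKE_ADDRESS,   /* &x, array decay (x[0] = 1), or a pointer from malloc */
    EV_PTR_TO_INT,     /* (uintptr_t)&x */
    EV_READ_REPR       /* reading the bytes of a pointer to x, e.g. via memcpy */
} toy_event;

static void toy_apply(toy_flags *f, toy_event ev)
{
    f->address_taken = true;   /* every event involves a pointer to the object */
    if (ev == EV_PTR_TO_INT || ev == EV_READ_REPR)
        f->exposed = true;
}

/* May a pointer obtained from an integer be used to access the object? */
static bool may_access_via_int_address_taken(const toy_flags *f) { return f->address_taken; }
static bool may_access_via_int_exposed(const toy_flags *f) { return f->exposed; }
```

After only EV_TAKE_ADDRESS, which corresponds to the x[3] example, the address-taken rule must assume the store through p may hit x, while the exposure rule does not. This is why the address-taken criterion is a superset of exposure.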


Best,
Martin




* Re: C provenance semantics proposal
  2019-04-18 10:45                       ` Peter Sewell
@ 2019-04-18 12:20                         ` Uecker, Martin
  2019-04-18 12:42                           ` Richard Biener
  2019-04-18 13:42                           ` Jeff Law
  0 siblings, 2 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-18 12:20 UTC (permalink / raw)
  To: Peter.Sewell, richard.guenther; +Cc: gcc, law, cl-c-memory-object-model

On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:


> An equality test of two pointers, on the other hand, doesn't necessarily
> mean that they are interchangeable.  I don't see any good way to
> avoid that in a provenance semantics, where a one-past
> pointer might sometimes compare equal to a pointer to an
> adjacent object but be illegal for accessing it.

As I see it, there are essentially four options:

1.) Compilers do not use conditional equivalences for
optimizations of pointers (or only when additional
conditions apply which make it safe)

2.) We make pointer comparison between a pointer
and a one-after pointer of a different object
undefined behaviour.

3.) We make comparison have the side effect that
afterwards either of the two pointers could have either
of the two provenances (with disambiguation
similar to what we have for casts).

4.) Compilers make sure that exposed objects are never
allocated next to each other (as Jens proposed).


None of these options is great.
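Option 4 can be sketched with a toy bump allocator. It is hypothetical and not how GCC lays out objects. When two consecutive objects may both be exposed, it leaves a gap so that the one-past address of the first is never the base address of the second. The cost is padding, which grows with the alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t next;      /* next free (toy) address */
    bool last_exposed;   /* whether the previous allocation may be exposed */
} toy_arena;

/* align must be a power of two. */
static uintptr_t toy_alloc(toy_arena *a, size_t size, size_t align, bool may_be_exposed)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t p = a->next;
    if (may_be_exposed && a->last_exposed)
        p += 1;   /* skip at least one byte: previous one-past != our base */
    p = (p + align - 1) & ~(uintptr_t)(align - 1);   /* round up to alignment */
    a->next = p + size;
    a->last_exposed = may_be_exposed;
    return p;
}
```

Two exposed 4-byte ints starting at 0x1000 land at 0x1000 and 0x1008. A following non-exposed object may still sit directly after the second one.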


Best,
Martin

* Re: C provenance semantics proposal
  2019-04-18 11:57                         ` Uecker, Martin
@ 2019-04-18 12:31                           ` Richard Biener
  2019-04-18 13:25                             ` Uecker, Martin
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2019-04-18 12:31 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> On Thursday, 18.04.2019 at 11:56 +0200, Richard Biener wrote:
> > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > >
> > > > On Wednesday, 17.04.2019 at 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > > <Martin.Uecker@med.uni-goettingen.de> wrote:
>
> ....
> > > > Let's consider this example:
> > > >
> > > > int x;
> > > > int y;
> > > > uintptr_t pi = (uintptr_t)&x;
> > > > uintptr_t pj = (uintptr_t)&y;
> > > >
> > > > if (pi + 4 == pj) {
> > > >
> > > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > > >    p[-1] = 1;         // well defined?
> > > > }
> > > >
> > > > If I understand correctly, a pointer obtained from
> > > > pi + 4 would have an "anything" provenance (which is
> > > > fine). But the pointer obtained from 'pj' would have the
> > > > provenance of 'y' so the access to 'x' would not
> > > > be allowed.
> > >
> > > Correct.  This is the most difficult case for us to handle
> > > exactly also because (also valid for the proposal?)
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >    int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > >
> > > while well-handled by GCC in the written form (as you
> > > say, pi + 4 yields "anything" provenance), GCC itself
> > > may transform it into the first variant by noticing
> > > the conditional equivalence and substituting pj for
> > > pi + 4.
>
> Integers are just integers in the proposal, so conditional
> equivalence is not a problem for them. In my opinion this
> is a strength of the proposal. Tracking provenance for
> integers would mean that all computations would be affected
> by such subtle semantics issues (where you can not even
> replace an integer by an equivalent one). In this
> proposal this is limited to pointers where it at least
> makes some sense.
>
> > > > But according to the preferred version of
> > > > our proposal, the pointer could also be used to
> > > > access 'x' because it is also exposed.
> > > >
> > > > GCC could make pj have an "anything" provenance
> > > > even though it is not modified. (This would break
> > > > some optimization such as the one for Matlab.)
> > > >
> > > > Maybe one could also refine this optimization to check
> > > > for additional conditions which rule out the case
> > > > that there is another object the pointer could point
> > > > to.
> > >
> > > The only feasible solution would be to not track
> > > provenance through non-pointers and make
> > > conversions of non-pointers to pointers have
> > > "anything" provenance.
>
> This would be one solution, yes. But you could
> reattach the same provenance if you know that the
> pointer points in the middle of an object (so is
> not a first or one-after pointer) or if you know
> that there is no exposed object directly adjacent
> to this object, etc.
>
> > > The additional issue that appears here though
> > > is that we cannot even turn (int *)(uintptr_t)p
> > > into p anymore since with the conditional
> > > substitution we can then still arrive at
> > > effectively (&y)[-1] = 1 which is of course
> > > undefined behavior.
> > >
> > > That is, your proposal makes
> > >
> > >  ((int *)(uintptr_t)&y)[-1] = 1
> > >
> > > well-defined (if &y - 1 == &x) but keeps
> > >
> > >   (&y)[-1] = 1
> > >
> > > as undefined which strikes me as a little bit
> > > inconsistent.  If that's true it's IMHO worth
> > > a defect report and second consideration.
>
> This is true. But I would not call it inconsistent.
> It is just unusual if you expect that casts to integers
> and back are no-ops.  In this proposal a round-trip has
> the effect of stripping the original provenance and
> attaching a new one (which could be the same as the
> old one).

Well, the standard explicitly says that if you convert
a pointer to an integer (with the same or more precision)
and back you get the same pointer back.  That suggests
(int *)(uintptr_t)&y is a semantic no-op?

> While in this specific scenario this might seem
> unreasonable, there are other examples where you may
> want to be able to get from one object to the others,
> and using casts to integers would then be the
> blessed way to express this.

Sure, no arguing about this.  So far this all has been in
the hands of implementors to make uses of this idiom work,
now users will be able to wield the standards sword :/

> In my opinion, this is also intuitive:
> By casting to an integer one then gets simple discrete
> pointer semantics where one does not have provenance.
>
>
> > Similarly that
> >
> > int x;
> > int y;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (&x + 1 == &y) {
> >
> >    int* p = (int*)pj; // can be one-after pointer of 'x'
> >    p[-1] = 1;         // well defined?
> > }
> >
> > is undefined but when I add a no-op
> >
> >  (uintptr_t)&x;
> >
> > it is well-defined is undesirable.  Can this no-op
> > stmt appear in another function?  Or even in
> > another translation unit (if x and y are global variables)?
> > And does such stmt have to be present (in another
> > TU) to make the example valid in this case?
>
> Without that statement, the example is not valid as the
> address of 'x' is not exposed. With the statement this
> becomes valid and it does not matter where this statement
> appears. Again, I agree that the fact that such a statement
> has a side-effect is something one needs to get used to.
>
> But address-taking already has a side-effect which could be
> surprising, doesn't it? If I understood your answer
> above correctly, for GCC you get this side-effect already
> without the cast:
>
> &x;

Well, yes.  But for GCC the important issue is whether
this address-taking is still done after optimization
(at the point we use provenance info to compute points-to sets).
So this plain stmt wouldn't survive and would not make
the example valid.  It's of course a lot harder to write this
down into standard wording ;)  (if not impossible...)

I guess there has to be a data dependence between an address-taken
operation and recreating that address (or a derived one to the same
object).  That is, we're trying to support delta-compressing pointers
as often used in shared memory data structures.
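Here is a minimal sketch of the delta-compressed pointer idiom. It assumes a flat address space in which uintptr_t arithmetic mirrors byte offsets, which is implementation-defined but holds on the usual targets. `struct node`, `encode`, `decode` and `NO_NEXT` are hypothetical names. Every rebuilt pointer is computed from `(uintptr_t)base`, which is the kind of data dependence between taking an address and recreating it that is meant here.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_NEXT UINTPTR_MAX  /* hypothetical sentinel for "no successor" */

/* A node that stores its link as a byte offset from the start of the
 * region instead of as an absolute pointer. */
struct node {
    int value;
    uintptr_t next_off;
};

/* Encode: absolute pointer -> base-relative integer.  Under PNVI the
 * (uintptr_t)base cast is also what exposes the region. */
static uintptr_t encode(const void *base, const struct node *n)
{
    return n ? (uintptr_t)n - (uintptr_t)base : NO_NEXT;
}

/* Decode: base-relative integer -> pointer.  The result is computed
 * from (uintptr_t)base, so it is data-dependent on the base address. */
static struct node *decode(void *base, uintptr_t off)
{
    return off == NO_NEXT ? NULL : (struct node *)((uintptr_t)base + off);
}
```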

But as you've seen already conditional "dependences" are prone
to break.

> For the statement to appear elsewhere, the address must
> escape first. I would expect a compiler to treat a
> cast to an integer identically to an escaped address.

Sure, (uintptr_t)&a also takes the address of a and passing
that integer to a function makes the address of the object a
escape.

> > To me all this makes requiring exposal through a cast
> > to a non-pointer (or accessing its representation) not
> > in any way more "useful" for an optimizing compiler than
> > modeling exposal through address-taking.
>
> There would be a difference for cases like this:
>
> int x[3];
> int y;
>
> x[0] = 1;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>   int* p = (int*)(pi + 4);
>   p[-1] = 1;
> }
>
> Here 'x' is not exposed in our proposal so the assignment
> via 'p' is invalid but the address is taken implicitly.

Via the x[0] - yes.  Unfortunate details of the C standard ;)

> Another example is storage allocated via malloc/alloca
> where there is always a pointer involved but which is
> not automatically exposed in our proposal.

True, but the compiler nevertheless has to assume it is exposed
once that pointer escapes the current function (or TU).  It's
hard to make the validity decision at parsing time and at
optimization time a stmt like

 (uintptr_t)ptr;

is gone very quickly.

Richard.

>
> Best,
> Martin
>
>
>

* Re: C provenance semantics proposal
  2019-04-18 12:20                         ` Uecker, Martin
@ 2019-04-18 12:42                           ` Richard Biener
  2019-04-18 12:47                             ` Jakub Jelinek
                                               ` (2 more replies)
  2019-04-18 13:42                           ` Jeff Law
  1 sibling, 3 replies; 56+ messages in thread
From: Richard Biener @ 2019-04-18 12:42 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: Peter.Sewell, gcc, law, cl-c-memory-object-model

On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> > On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
>
>
> > An equality test of two pointers, on the other hand, doesn't necessarily
> > mean that they are interchangeable.  I don't see any good way to
> > avoid that in a provenance semantics, where a one-past
> > pointer might sometimes compare equal to a pointer to an
> > adjacent object but be illegal for accessing it.
>
> As I see it, there are essentially four options:
>
> 1.) Compilers do not use conditional equivalences for
> optimizations of pointers (or only when additional
> conditions apply which make it safe)
>
> 2.) We make pointer comparison between a pointer
> and a one-after pointer of a different object
> undefined behaviour.

Yes please!  OTOH GCC transforms
(uintptr_t)&a != (uintptr_t)(&b+1)
into &a != &b + 1 (for equality compares) and then
doesn't follow this C rule anyways.

> 3.) We make comparison have the side effect that
> afterwards any of the two pointers could have any
> of the two provenances. (with disambiguation
> similar to what we have for casts).
>
> 4.) Compilers make sure that exposed objects never
> are allocated next to each other (as Jens proposed).

5.) While the standard guarantees that (int *)(uintptr_t)p == p
it does not guarantee that (uintptr_t)&a and (uintptr_t)&b
have a specific relation to each other.  To me this means
that (uintptr_t)(&b + 1) - (uintptr_t)&b is not necessarily
equal to sizeof(b).  (of course it's a QOI issue if that doesn't
hold)
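The property this option relies on can be checked directly. In this minimal sketch, `int_span` is a hypothetical helper. The expectation that it returns sizeof(int) is the quality-of-implementation assumption that holds on conventional flat-address targets; it is not an ISO C guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer distance between an object's address and its one-past
 * address.  ISO C does not require this to equal sizeof(int). */
static size_t int_span(const int *p)
{
    return (size_t)((uintptr_t)(p + 1) - (uintptr_t)p);
}
```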

> None of these options is great.

Indeed.  But you are now writing down one specific variant
(which isn't great either).  Sometimes having no written-down variant
is better than a not-so-great one, even if there isn't an obviously
better one.

That said, GCC's implementation of the proposal might be
to require -fno-tree-pta to follow it.  And even that might not
fully rescue us because of that (int *)(uintptr_t) stripping...

At least I see no way to make use of the "exposed"ness
and thus we have to assume every variable is exposed.
Of course similar if the address-taken variant would be
written down in the standard given the standard applies
to the source form and not some intermediate (optimized)
compiler language.

Richard.

>
>
> Best,
> Martin

* Re: C provenance semantics proposal
  2019-04-18 12:42                           ` Richard Biener
@ 2019-04-18 12:47                             ` Jakub Jelinek
  2019-04-18 12:51                               ` Jakub Jelinek
  2019-04-18 13:49                             ` Uecker, Martin
  2019-04-19  8:19                             ` Jens Gustedt
  2 siblings, 1 reply; 56+ messages in thread
From: Jakub Jelinek @ 2019-04-18 12:47 UTC (permalink / raw)
  To: Richard Biener
  Cc: Uecker, Martin, Peter.Sewell, gcc, law, cl-c-memory-object-model

On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> >
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.
> 
> Yes please!  OTOH GCC transforms
> (uintptr_t)&a != (uintptr_t)(&b+1)
> into &a != &b + 1 (for equality compares) and then

I think we don't.  It was http://gcc.gnu.org/PR88775, but we haven't applied
those changes, because we don't consider the pointer to start of one object
vs. pointer to end of another one case in pointer comparisons (but do
consider it in integral comparisons).

	Jakub

* Re: C provenance semantics proposal
  2019-04-18 12:47                             ` Jakub Jelinek
@ 2019-04-18 12:51                               ` Jakub Jelinek
  2019-04-18 13:29                                 ` Jeff Law
  0 siblings, 1 reply; 56+ messages in thread
From: Jakub Jelinek @ 2019-04-18 12:51 UTC (permalink / raw)
  To: Richard Biener
  Cc: Uecker, Martin, Peter.Sewell, gcc, law, cl-c-memory-object-model

On Thu, Apr 18, 2019 at 02:47:18PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
> > > 1.) Compilers do not use conditional equivalences for
> > > optimizations of pointers (or only when additional
> > > conditions apply which make it safe)
> > >
> > > 2.) We make pointer comparison between a pointer
> > > and a one-after pointer of a different object
> > > undefined behaviour.
> > 
> > Yes please!  OTOH GCC transforms
> > (uintptr_t)&a != (uintptr_t)(&b+1)
> > into &a != &b + 1 (for equality compares) and then
> 
> I think we don't.  It was http://gcc.gnu.org/PR88775, but we haven't applied
> those changes, because we don't consider the pointer to start of one object
> vs. pointer to end of another one case in pointer comparisons (but do
> consider it in integral comparisons).

That said, in RTL we really don't differentiate between pointers and
integers and we'll need to do something about that one day.

	Jakub

* Re: C provenance semantics proposal
  2019-04-18 12:31                           ` Richard Biener
@ 2019-04-18 13:25                             ` Uecker, Martin
  0 siblings, 0 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-18 13:25 UTC (permalink / raw)
  To: richard.guenther; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Thursday, 18.04.2019 at 14:30 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > On Thursday, 18.04.2019 at 11:56 +0200, Richard Biener wrote:
> > > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > > 

> > > > The additional issue that appears here though
> > > > is that we cannot even turn (int *)(uintptr_t)p
> > > > into p anymore since with the conditional
> > > > substitution we can then still arrive at
> > > > effectively (&y)[-1] = 1 which is of course
> > > > undefined behavior.
> > > > 
> > > > That is, your proposal makes
> > > > 
> > > >  ((int *)(uintptr_t)&y)[-1] = 1
> > > > 
> > > > well-defined (if &y - 1 == &x) but keeps
> > > > 
> > > >   (&y)[-1] = 1
> > > > 
> > > > as undefined which strikes me as a little bit
> > > > inconsistent.  If that's true it's IMHO worth
> > > > a defect report and second consideration.
> > 
> > This is true. But I would not call it inconsistent.
> > It is just unusual if you expect that casts to integers
> > and back are no-ops.  In this proposal a round-trip has
> > the effect of stripping the original provenance and
> > attaching a new one (which could be the same as the
> > old one).
> 
> Well, the standard explicitly says that if you convert
> a pointer to an integer (with the same or more precision)
> and back you get the same pointer back.  That suggests
> (int *)(uintptr_t)&y is a semantic no-op?

Not quite, it only guarantees that it compares equal
(7.20.1.4) which for pointers is (sadly) not the same.

But our proposal would make it work perfectly from a
programmer's point of view: The pointer you get back
can always be used instead of the original pointer.
But because it is not always clear whether this was
a pointer to a first element or a one-after pointer,
it has to work for both. For the compiler writer this
means that it is not the same pointer but a pointer
one knows less about.
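The round trip in question, written out as a minimal sketch. `round_trip` is a hypothetical name. C11 7.20.1.4 states the guarantee for `void *`, so the sketch converts through `void *` explicitly. Access through the result to y itself is fine under every reading. The disagreement in this thread is only about using such a pointer to reach a neighbouring object.

```c
#include <stdint.h>

/* Pointer -> uintptr_t -> pointer.  The result is guaranteed to compare
 * equal to the original; the open question is which provenance it has. */
static int *round_trip(int *p)
{
    uintptr_t i = (uintptr_t)(void *)p;
    return (int *)(void *)i;
}
```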

> > While in this specific scenario this might seem
> > unreasonable, there are other examples where you may
> > want to be able to get from one object to the others,
> > and using casts to integers would then be the
> > blessed way to express this.
> 
> Sure, no arguing about this.  So far this all has been in
> the hands of implementors to make uses of this idiom work,
> now users will be able to wield the standards sword :/

Well, isn't this the point of a standard? But we want
to get this right and this is why we are talking to you.

> > In my opinion, this is also intuitive:
> > By casting to an integer one then gets simple discrete
> > pointer semantics where one does not have provenance.
> > 
> > 
> > > Similarly that
> > > 
> > > int x;
> > > int y;
> > > uintptr_t pj = (uintptr_t)&y;
> > > 
> > > if (&x + 1 == &y) {
> > > 
> > >    int* p = (int*)pj; // can be one-after pointer of 'x'
> > >    p[-1] = 1;         // well defined?
> > > }
> > > 
> > > is undefined but when I add a no-op
> > > 
> > >  (uintptr_t)&x;
> > > 
> > > it is well-defined is undesirable.  Can this no-op
> > > stmt appear in another function?  Or even in
> > > another translation unit (if x and y are global variables)?
> > > And does such stmt have to be present (in another
> > > TU) to make the example valid in this case?
> > 
> > Without that statement, the example is not valid as the
> > address of 'x' is not exposed. With the statement this
> > becomes valid and it does not matter where this statement
> > appears. Again, I agree that the fact that such a statement
> > has a side-effect is something one needs to get used to.
> > 
> > But address-taking already has a side-effect which could be
> > surprising, doesn't it? If I understood your answer
> > above correctly, for GCC you get this side-effect already
> > without the cast:
> > 
> > &x;
> 
> Well, yes.  But for GCC the important issue is whether
> this address-taking is still done after optimization
> (at the point we use provenance info to compute points-to sets).
> So this plain stmt wouldn't survive and would not make
> the example valid.  It's of course a lot harder to write this
> down into standard wording ;)  (if not impossible...)

"it has a side-effect whenever GCC does not optimize it away"
seems unlikely to get accepted in the standard ;-)

One could make a special rule about the statements with
unused results or add some language about "observability".

But couldn't the frontend simply mark the relevant casts?
(e.g. transform into __builtin_expose() or something)  
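`__builtin_expose()` is only a suggestion here, not an existing GCC builtin. Until something like it exists, source code can approximate it by routing the cast into a volatile object. An access to a volatile object is an observable side effect, so the optimizer has to keep the pointer-to-integer cast instead of deleting it like the bare `(uintptr_t)ptr;` statement. `EXPOSE` and `expose_sink` are hypothetical names. This only keeps the cast alive; how a compiler's points-to analysis then treats the object remains an implementation question.

```c
#include <stdint.h>

/* Hypothetical approximation of __builtin_expose(): the volatile store
 * cannot be removed, so the integer cast of p survives optimization. */
static volatile uintptr_t expose_sink;

#define EXPOSE(p) ((void)(expose_sink = (uintptr_t)(const void *)(p)))
```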

> I guess there has to be a data dependence between an address-taken
> operation and recreating that address (or a derived one to the same
> object).  That is, we're trying to support delta-compressing pointers
> as often used in shared memory data structures.
> 
> But as you've seen already conditional "dependences" are prone
> to break.

Yes, this is why we do not like it. Even assuming we could
make this sound, it would add a lot of complexity.

Limiting "provenance tracking" to pointers, where there
is only a very limited number of possible operations to begin
with and where we get 99% of the benefits, makes a lot of
sense to me. But then every cast to an integer means we do
not track and the pointer has escaped.

> > For the statement to appear elsewhere, the address must
> > escape first. I would expect a compiler to treat a
> > cast to an integer identically to an escaped address.
> 
> Sure, (uintptr_t)&a also takes the address of a and passing
> that integer to a function makes the address of the object a
> escape.

My point is that even without the integer escaping, the
integer cast would imply it has escaped. But casts elsewhere
do not need to be considered, because this means the address
has escaped anyway.

> > > To me all this makes requiring exposal through a cast
> > > to a non-pointer (or accessing its representation) not
> > > in any way more "useful" for an optimizing compiler than
> > > modeling exposal through address-taking.
> > 
> > There would be a difference for cases like this:
> > 
> > int x[3];
> > int y;
> > 
> > x[0] = 1;
> > uintptr_t pj = (uintptr_t)&y;
> > 
> > if (pi + 4 == pj) {
> > 
> >   int* p = (int*)(pi + 4);
> >   p[-1] = 1;
> > }
> > 
> > Here 'x' is not exposed in our proposal so the assignment
> > via 'p' is invalid but the address is taken implicitly.
> 
> Via the x[0] - yes.  Unfortunate details of the C standard ;)
> 
> > Another example is storage allocated via malloc/alloca
> > where there is always a pointer involved but which is
> > not automatically exposed in our proposal.
> 
> True, but the compiler nevertheless has to assume it is exposed
> once that pointer escapes the current function (or TU).  It's
> hard to make the validity decision at parsing time and at
> optimization time a stmt like
> 
>  (uintptr_t)ptr;
> 
> is gone very quickly.

Why not transform it into __builtin_expose in the frontend?

Best,
Martin

* Re: C provenance semantics proposal
  2019-04-18 12:51                               ` Jakub Jelinek
@ 2019-04-18 13:29                                 ` Jeff Law
  2019-04-24 10:12                                   ` Richard Biener
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Law @ 2019-04-18 13:29 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener
  Cc: Uecker, Martin, Peter.Sewell, gcc, cl-c-memory-object-model

On 4/18/19 6:50 AM, Jakub Jelinek wrote:
> On Thu, Apr 18, 2019 at 02:47:18PM +0200, Jakub Jelinek wrote:
>> On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
>>>> 1.) Compilers do not use conditional equivalences for
>>>> optimizations of pointers (or only when additional
>>>> conditions apply which make it safe)
>>>>
>>>> 2.) We make pointer comparison between a pointer
>>>> and a one-after pointer of a different object
>>>> undefined behaviour.
>>>
>>> Yes please!  OTOH GCC transforms
>>> (uintptr_t)&a != (uintptr_t)(&b+1)
>>> into &a != &b + 1 (for equality compares) and then
>>
>> I think we don't.  It was http://gcc.gnu.org/PR88775, but we haven't applied
>> those changes, because we don't consider the pointer to start of one object
>> vs. pointer to end of another one case in pointer comparisons (but do
>> consider it in integral comparisons).
> 
> That said, in RTL we really don't differentiate between pointers and
> integers and we'll need to do something about that one day.
I'd be happy to get things sorted out up to the RTL transition,
particularly the cases involving equivalences.  Distinguishing between
pointers and same-sized integers in RTL will be difficult.
jeff

* Re: C provenance semantics proposal
  2019-04-18 12:20                         ` Uecker, Martin
  2019-04-18 12:42                           ` Richard Biener
@ 2019-04-18 13:42                           ` Jeff Law
  2019-04-18 13:54                             ` Uecker, Martin
  2019-04-24 10:19                             ` Richard Biener
  1 sibling, 2 replies; 56+ messages in thread
From: Jeff Law @ 2019-04-18 13:42 UTC (permalink / raw)
  To: Uecker, Martin, Peter.Sewell, richard.guenther
  Cc: gcc, cl-c-memory-object-model

On 4/18/19 6:20 AM, Uecker, Martin wrote:
> On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
>> On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> 
>> An equality test of two pointers, on the other hand, doesn't necessarily
>> mean that they are interchangeable.  I don't see any good way to
>> avoid that in a provenance semantics, where a one-past
>> pointer might sometimes compare equal to a pointer to an
>> adjacent object but be illegal for accessing it.
> 
> As I see it, there are essentially four options:
> 
> 1.) Compilers do not use conditional equivalences for
> optimizations of pointers (or only when additional
> conditions apply which make it safe)
I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
VRP as well, maybe PTA.  It seems simple enough though :-)

> 
> 2.) We make pointer comparison between a pointer
> and a one-after pointer of a different object
> undefined behaviour.
I generally like this as well, though I suspect it probably makes a lot
of currently well defined code undefined.

> 
> 3.) We make comparison have the side effect that
> afterwards any of the two pointers could have any
> of the two provenances. (with disambiguation
> similar to what we have for casts).
This could have some interesting effects on PTA.  Richi?


> 
> 4.) Compilers make sure that exposed objects never
> are allocated next to each other (as Jens proposed).
Ugh.  Not sure how you enforce that.  Consider that the compiler may
ultimately have no control over layout of data in static storage.

jeff

* Re: C provenance semantics proposal
  2019-04-18 12:42                           ` Richard Biener
  2019-04-18 12:47                             ` Jakub Jelinek
@ 2019-04-18 13:49                             ` Uecker, Martin
  2019-04-19  8:19                             ` Jens Gustedt
  2 siblings, 0 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-18 13:49 UTC (permalink / raw)
  To: richard.guenther; +Cc: gcc, Peter.Sewell, law, cl-c-memory-object-model

On Thursday, 18.04.2019 at 14:42 +0200, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> > > On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
> > 
> > 
> > > An equality test of two pointers, on the other hand, doesn't necessarily
> > > mean that they are interchangeable.  I don't see any good way to
> > > avoid that in a provenance semantics, where a one-past
> > > pointer might sometimes compare equal to a pointer to an
> > > adjacent object but be illegal for accessing it.
> > 
> > As I see it, there are essentially four options:
> > 
> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> > 
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.
> 
> Yes please!  OTOH GCC transforms
> (uintptr_t)&a != (uintptr_t)(&b+1)
> into &a != &b + 1 (for equality compares) and then
> doesn't follow this C rule anyways.

I know this would be the best option from the point of
view of a compiler writer.

My concern is that this adds a trap for programmers. You
can then compare arbitrary pointers except in this specific
special case.
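This minimal sketch shows where the trap would sit in ordinary code. `is_one_past` is a hypothetical helper. Today, passing a q that points to an unrelated object is harmless. The comparison is then false, or it may be true if that object happens to follow the array directly, which C11 6.5.9 explicitly permits. Under option 2, that adjacent case would turn the same call into undefined behaviour.

```c
#include <stddef.h>

/* Is q the one-past-the-end pointer of arr[0..n-1]? */
static int is_one_past(const int *arr, size_t n, const int *q)
{
    return q == arr + n;
}
```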

> > 3.) We make comparison have the side effect that
> > afterwards any of the two pointers could have any
> > of the two provenances. (with disambiguation
> > similar to what we have for casts).
> > 
> > 4.) Compilers make sure that exposed objects never
> > are allocated next to each other (as Jens proposed).
> 
> 5.) While the standard guarantees that (int *)(uintptr_t)p == p
> it does not guarantee that (uintptr_t)&a and (uintptr_t)&b
> have a specific relation to each other.  To me this means
> that (uintptr_t)(&b + 1) - (uintptr_t)&b is not necessarily
> equal to sizeof(b).  (of course it's a QOI issue if that doesn't
> hold)

I think a direct mapping from addresses to integer is the
only thing which is feasible in most cases. But maybe the
compiler could actually move 'a' and 'b' away from each other?

> > None of these options is great.
> 
> Indeed.  But you are now writing down one specific variant
> (which isn't great either).  Sometimes having no written-down variant
> is better than a not-so-great one, even if there isn't an obviously
> better one.

I am not sure everybody would agree.

> That said, GCC's implementation of the proposal might be
> to require -fno-tree-pta to follow it.  And even that might not
> fully rescue us because of that (int *)(uintptr_t) stripping...

Don't strip it then? The FE could add a marker.

Best,
Martin

> At least I see no way to make use of the "exposed"ness
> and thus we have to assume every variable is exposed.
> Of course similar if the address-taken variant would be
> written down in the standard given the standard applies
> to the source form and not some intermediate (optimized)
> compiler language.




* Re: C provenance semantics proposal
  2019-04-18 13:42                           ` Jeff Law
@ 2019-04-18 13:54                             ` Uecker, Martin
  2019-04-18 14:49                               ` Peter Sewell
  2019-04-24 10:19                             ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Uecker, Martin @ 2019-04-18 13:54 UTC (permalink / raw)
  To: law, Peter.Sewell, richard.guenther; +Cc: gcc, cl-c-memory-object-model

On Thursday, 18.04.2019 at 07:42 -0600, Jeff Law wrote:
> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> > > On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:

...
> > 4.) Compilers make sure that exposed objects never
> > are allocated next to each other (as Jens proposed).
> 
> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> ultimately have no control over layout of data in static storage.

Maybe one could do this only where it matters? I assume the biggest benefit
is for local variables and there the compiler has full control.
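For locals, option 4 could be as simple as never laying out two exposed objects back to back. The minimal sketch below models this by hand. A hypothetical struct stands in for the stack frame, and an illustrative gap member, never accessed, keeps x and y apart. Because members are laid out in increasing address order, the one-past address of x can then never equal the address of y.

```c
#include <stdint.h>

/* Hand-written stand-in for a frame layout that keeps exposed objects
 * apart; 'gap' is a hypothetical padding slot that is never accessed. */
struct frame {
    int x;
    int gap;
    int y;
};

/* Does the one-past address of x coincide with the address of y? */
static int x_touches_y(const struct frame *f)
{
    return (uintptr_t)(&f->x + 1) == (uintptr_t)&f->y;
}
```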

For an arbitrary pointer coming from somewhere, one has no provenance
information anyway.

Best,
Martin

* Re: C provenance semantics proposal
  2019-04-18 13:54                             ` Uecker, Martin
@ 2019-04-18 14:49                               ` Peter Sewell
  2019-04-18 15:09                                 ` Uecker, Martin
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Sewell @ 2019-04-18 14:49 UTC (permalink / raw)
  To: Uecker, Martin; +Cc: law, richard.guenther, gcc, cl-c-memory-object-model

On Thu, 18 Apr 2019 at 14:54, Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> On Thursday, 18.04.2019 at 07:42 -0600, Jeff Law wrote:
> > On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > > On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> > > > On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
>
> ...
> > > 4.) Compilers make sure that exposed objects never
> > > are allocated next to each other (as Jens proposed).
> >
> > Ugh.  Not sure how you enforce that.  Consider that the compiler may
> > ultimately have no control over layout of data in static storage.
>
> Maybe one could do this only where it matters? I assume the biggest benefit
> is for local variables and there the compiler has full control.
>
> For an arbitrary pointer coming from somewhere, one has no provenance
> information anyway.

that's not quite true - one does know that it can't have the same provenance
as anything created more recently than the incoming pointer
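This point can be put in code as a minimal sketch, with hypothetical names throughout. The local's address escapes, yet the incoming pointer p was created before the local existed, so it cannot carry the local's provenance. A compiler may therefore assume the store through p leaves the local unchanged.

```c
#include <stddef.h>

static int *leaked;  /* hypothetical global through which an address escapes */

static int store_through_older_pointer(int *p)
{
    int local = 1;
    leaked = &local;  /* local is now address-taken and has escaped */
    *p = 2;           /* p predates local, so it cannot point to it */
    int r = local;    /* a compiler may fold this load to 1 */
    leaked = NULL;    /* do not leave a dangling pointer behind */
    return r;
}
```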

> Best,
> Martin

* Re: C provenance semantics proposal
  2019-04-18 14:49                               ` Peter Sewell
@ 2019-04-18 15:09                                 ` Uecker, Martin
  0 siblings, 0 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-18 15:09 UTC (permalink / raw)
  To: Peter.Sewell; +Cc: gcc, law, richard.guenther, cl-c-memory-object-model

On Thursday, 18.04.2019 at 15:49 +0100, Peter Sewell wrote:
> On Thu, 18 Apr 2019 at 14:54, Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > On Thursday, 18.04.2019 at 07:42 -0600, Jeff Law wrote:
> > > On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > > > On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> > > > > On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
> > 
> > ...
> > > > 4.) Compilers make sure that exposed objects never
> > > > are allocated next to each other (as Jens proposed).
> > > 
> > > Ugh.  Not sure how you enforce that.  Consider that the compiler may
> > > ultimately have no control over layout of data in static storage.
> > 
> > Maybe only where it matters? I assume the biggest benefit
> > is for local variables, and there the compiler has full control.
> >
> > For arbitrary pointers coming from somewhere, one has no provenance
> > information anyway.
> 
> that's not quite true - one does know that it can't have the same provenance
> as anything created more recently than the incoming pointer

Good point. But then the objects cannot be next to each other anyway.

Best,
Martin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-18 12:42                           ` Richard Biener
  2019-04-18 12:47                             ` Jakub Jelinek
  2019-04-18 13:49                             ` Uecker, Martin
@ 2019-04-19  8:19                             ` Jens Gustedt
  2019-04-19  8:49                               ` Jakub Jelinek
  2019-04-19 10:01                               ` Uecker, Martin
  2 siblings, 2 replies; 56+ messages in thread
From: Jens Gustedt @ 2019-04-19  8:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc, cl-c-memory-object-model, law

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

Hello,

On Thu, 18 Apr 2019 14:42:22 +0200 Richard Biener
<richard.guenther@gmail.com> wrote:

> On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:

> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> >
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.  
> 
> Yes please!

No please don't, not UB.

If any of this, make the result unspecified but not UB, please.

> OTOH GCC transforms
> (uintptr_t)&a != (uintptr_t)(&b+1)
> into &a != &b + 1 (for equality compares) and then
> doesn't follow this C rule anyways.

Actually our proposal we are discussing here goes exactly the other
way around. It basically reduces

  &a != &b + 1

to

  (uintptr_t)&a != (uintptr_t)(&b+1)

with only an exception for null pointers, but which probably don't
matter for a platform where null pointers are just all bits 0.
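To make the address-only reading concrete, here is a minimal C sketch. `ptr_ne_by_address` and `demo_one_past_within_array` are hypothetical names; the helper models the rule described above rather than any compiler's implementation. The demo stays inside a single array, because whether two separate objects `a` and `b` end up adjacent is unspecified, so `&a != &b + 1` has no portable expected result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compare two pointers by address only, ignoring
   provenance, which is how the proposal reads == and != (null pointers
   aside). A model of the semantics, not GCC's implementation. */
bool ptr_ne_by_address(const void *p, const void *q)
{
    return (uintptr_t)p != (uintptr_t)q;
}

/* Inside one array the layout is fixed, so the address just past
   arr[0] and the start of arr[1] are guaranteed to coincide. */
void demo_one_past_within_array(void)
{
    int arr[2] = {0, 0};
    const int *end0 = &arr[0] + 1;   /* address just past arr[0] */
    const int *start1 = &arr[1];

    assert(!ptr_ne_by_address(end0, start1));
    assert(end0 == start1);          /* built-in == agrees here */
    assert(ptr_ne_by_address(&arr[0], start1));
}
```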

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  8:19                             ` Jens Gustedt
@ 2019-04-19  8:49                               ` Jakub Jelinek
  2019-04-19  9:09                                 ` Jens Gustedt
  2019-04-19  9:11                                 ` Peter Sewell
  2019-04-19 10:01                               ` Uecker, Martin
  1 sibling, 2 replies; 56+ messages in thread
From: Jakub Jelinek @ 2019-04-19  8:49 UTC (permalink / raw)
  To: Jens Gustedt; +Cc: Richard Biener, gcc, cl-c-memory-object-model, law

On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
> > OTOH GCC transforms
> > (uintptr_t)&a != (uintptr_t)(&b+1)
> > into &a != &b + 1 (for equality compares) and then
> > doesn't follow this C rule anyways.
> 
> Actually our proposal we are discussing here goes exactly the other
> way around. It basically reduces
> 
>   &a != &b + 1
> 
> to
> 
>   (uintptr_t)&a != (uintptr_t)(&b+1)
> 
> with only an exception for null pointers, but which probably don't
> matter for a platform where null pointers are just all bits 0.

That penalizes quite a few optimizations though.
If you have
ptr != ptr2
and points-to analysis finds the sets of variables that ptr and ptr2 point
to, and those sets are disjoint, it would be nice to be able to optimize
that comparison away (gcc does); similarly if one of the pointers is
&object or &object + sizeof (object).
By requiring what you request above, it can pretty much never be optimized,
unless the points-to analysis is also able to record whether the pointer points
to the start, middle or end of an object. Only if it is known to be in the
middle can it safely optimize; for start or end it would need to prove the
other pointer is to end or start and that only non-zero-sized objects are
involved.
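A toy sketch of the extra information described here: a points-to result that also records whether each pointer is at the start, in the middle, or one past the end of its object. The enum and `can_fold_ne_to_true` are hypothetical, and the rule assumes the analysis has already shown the two pointers refer to distinct objects of non-zero size (zero-sized objects, a GNU C extension, would need separate handling). Read as a byte offset, as in GIMPLE's POINTER_PLUS_EXPR, `&object + sizeof (object)` presumably denotes the one-past-the-end address; in C source that address is `&object + 1`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of where a pointer points within its object. */
enum ptr_pos {
    POS_START,    /* offset 0                       */
    POS_MIDDLE,   /* 0 < offset < size              */
    POS_END,      /* offset == size (one past end)  */
    POS_UNKNOWN
};

/* Assumed precondition: ptr and ptr2 refer to distinct objects, each of
   non-zero size. Objects do not overlap, so their addresses can only
   coincide when a one-past pointer of one object meets the start of an
   object placed directly after it. */
bool can_fold_ne_to_true(enum ptr_pos p1, enum ptr_pos p2)
{
    if (p1 == POS_UNKNOWN || p2 == POS_UNKNOWN)
        return false;   /* no position info: keep the runtime test */
    if ((p1 == POS_START && p2 == POS_END) ||
        (p1 == POS_END && p2 == POS_START))
        return false;   /* the objects might be adjacent */
    return true;
}

void demo_fold_decisions(void)
{
    assert(can_fold_ne_to_true(POS_MIDDLE, POS_END));
    assert(can_fold_ne_to_true(POS_START, POS_START));
    assert(!can_fold_ne_to_true(POS_END, POS_START));
}
```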

	Jakub

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  8:49                               ` Jakub Jelinek
@ 2019-04-19  9:09                                 ` Jens Gustedt
  2019-04-19  9:34                                   ` Jakub Jelinek
  2019-04-24 10:24                                   ` Richard Biener
  2019-04-19  9:11                                 ` Peter Sewell
  1 sibling, 2 replies; 56+ messages in thread
From: Jens Gustedt @ 2019-04-19  9:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc, cl-c-memory-object-model, law

[-- Attachment #1: Type: text/plain, Size: 2507 bytes --]

Hello Jakub,

On Fri, 19 Apr 2019 10:49:08 +0200 Jakub Jelinek <jakub@redhat.com>
wrote:

> On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
> > > OTOH GCC transforms
> > > (uintptr_t)&a != (uintptr_t)(&b+1)
> > > into &a != &b + 1 (for equality compares) and then
> > > doesn't follow this C rule anyways.  
> > 
> > Actually our proposal we are discussing here goes exactly the other
> > way around. It basically reduces
> > 
> >   &a != &b + 1
> > 
> > to
> > 
> >   (uintptr_t)&a != (uintptr_t)(&b+1)
> > 
> > with only an exception for null pointers, but which probably don't
> > matter for a platform where null pointers are just all bits 0.  
> 
> That penalizes quite a few optimizations though.
> If you have
> ptr != ptr2
> and points-to analysis finds a set of variables ptr as well as ptr2
> points to and the sets would be disjoint, it would be nice to be able
> to optimize that comparison away

yes

> (gcc does);

great

> similarly, if one of the
> pointers is &object or &object + sizeof (object).

Here I don't follow. Why would one waste brain and resources to
optimize code that does such tricks?

> By requiring what you request above, it can be pretty much never
> optimized, unless the points-to analysis is able to also record if
> the pointer points to the start, middle or end of object and only if
> it is known to be in the middle it can safely optimize, for start or
> end it would need to prove the other pointer is to end or start and
> only non-zero sized objects are involved.

I have the impression that you just propose an inversion of the
roles. What you require is that the user keep track of this kind of
information, and know when they can (or should not) compare a
one-past pointer to something with a different provenance.

I just don't feel that it is adequate to impose such detailed
knowledge on users for what is basically a marginal use
case. One-off pointers don't occur "naturally" in many places, I'd
guess. Using them for anything other than testing bounds in array
traversal is insane, and there "usually" the test is with `<` anyhow,
which has different rules.
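The two bounds-testing idioms, sketched in C for comparison (`sum_lt` and `sum_ne` are illustrative names). Both compare a running pointer with the one-past pointer `a + n` and never dereference it; the second uses `!=`, the shape that C++ iterator loops take.

```c
#include <assert.h>
#include <stddef.h>

/* Bounds test with <. Precondition: a points to an array of n ints. */
long sum_lt(const int *a, size_t n)
{
    assert(a != NULL);
    long s = 0;
    for (const int *p = a; p < a + n; ++p)
        s += *p;
    return s;
}

/* The same traversal written against a one-past "end" pointer with !=.
   end is only compared, never dereferenced. */
long sum_ne(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;
    long s = 0;
    for (const int *p = a; p != end; ++p)
        s += *p;
    return s;
}
```

In both loops the one-past pointer is only ever compared with pointers into the same array, so neither depends on how equality behaves across distinct objects.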

Jens


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  8:49                               ` Jakub Jelinek
  2019-04-19  9:09                                 ` Jens Gustedt
@ 2019-04-19  9:11                                 ` Peter Sewell
  2019-04-19  9:15                                   ` Jens Gustedt
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Sewell @ 2019-04-19  9:11 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jens Gustedt, Richard Biener, gcc, law, cl-c-memory-object-model

On 19/04/2019, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
>> > OTOH GCC transforms
>> > (uintptr_t)&a != (uintptr_t)(&b+1)
>> > into &a != &b + 1 (for equality compares) and then
>> > doesn't follow this C rule anyways.
>>
>> Actually our proposal we are discussing here goes exactly the other
>> way around. It basically reduces
>>
>>   &a != &b + 1
>>
>> to
>>
>>   (uintptr_t)&a != (uintptr_t)(&b+1)
>>
>> with only an exception for null pointers, but which probably don't
>> matter for a platform where null pointers are just all bits 0.
>
> That penalizes quite a few optimizations though.
> If you have
> ptr != ptr2
> and points-to analysis finds a set of variables ptr as well as ptr2 points
> to and the sets would be disjoint, it would be nice to be able to optimize
> that comparison away (gcc does); similarly, if one of the pointers is
> &object or &object + sizeof (object).
> By requiring what you request above, it can be pretty much never optimized,
> unless the points-to analysis is able to also record if the pointer points
> to the start, middle or end of object and only if it is known to be in the
> middle it can safely optimize, for start or end it would need to prove the
> other pointer is to end or start and only non-zero sized objects are
> involved.

A possible compromise position might be to make it implementation-defined
whether round-trip casts of a one-past pointer into integer and back preserve
provenance.   I don't know whether that corner case crops up in real code...

best,
Peter

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  9:11                                 ` Peter Sewell
@ 2019-04-19  9:15                                   ` Jens Gustedt
  2019-04-19  9:35                                     ` Peter Sewell
  0 siblings, 1 reply; 56+ messages in thread
From: Jens Gustedt @ 2019-04-19  9:15 UTC (permalink / raw)
  Cc: Jakub Jelinek, Richard Biener, gcc, law, cl-c-memory-object-model

[-- Attachment #1: Type: text/plain, Size: 1660 bytes --]

Hello Peter,

On Fri, 19 Apr 2019 10:11:43 +0100 Peter Sewell
<Peter.Sewell@cl.cam.ac.uk> wrote:

> On 19/04/2019, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:  
>  [...]  

> > That penalizes quite a few optimizations though.
> > If you have
> > ptr != ptr2
> > and points-to analysis finds a set of variables ptr as well as ptr2
> > points to and the sets would be disjoint, it would be nice to be
> > able to optimize that comparison away (gcc does); similarly, if one
> > of the pointers is &object or &object + sizeof (object).
> > By requiring what you request above, it can be pretty much never
> > optimized, unless the points-to analysis is able to also record if
> > the pointer points to the start, middle or end of object and only
> > if it is known to be in the middle it can safely optimize, for
> > start or end it would need to prove the other pointer is to end or
> > start and only non-zero sized objects are involved.  
> 
> A possible compromise position might be to make it
> implementation-defined whether round-trip casts of a one-past pointer
> into integer and back preserve provenance.   I don't know whether
> that corner case crops up in real code...

Wouldn't that require keeping track of some provenance information in
integers?

Jens


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  9:09                                 ` Jens Gustedt
@ 2019-04-19  9:34                                   ` Jakub Jelinek
  2019-04-21  8:15                                     ` Jens Gustedt
  2019-04-24 10:24                                   ` Richard Biener
  1 sibling, 1 reply; 56+ messages in thread
From: Jakub Jelinek @ 2019-04-19  9:34 UTC (permalink / raw)
  To: Jens Gustedt; +Cc: Richard Biener, gcc, cl-c-memory-object-model, law

On Fri, Apr 19, 2019 at 11:09:27AM +0200, Jens Gustedt wrote:
> > similarly, if one of the
> > pointers is &object or &object + sizeof (object).
> 
> Here I don't follow. Why would one waste brain and resources to
> optimize code that does such tricks?

What tricks?  A normal pointer comparison either of two pointers
or a pointer and address of something is something that happens in
real-world code all the time, and in many cases it is essential
that optimizing compilers attempt to optimize such tests as much as
possible.
In the http://gcc.gnu.org/PR88775 (yes, it is C++, not C, but I don't see
significant differences there), we started using uintptr_t comparisons
instead of pointer comparisons in std::less etc., because the C++ standard
requires those to be a total order even for pointers. Because we do
optimize those ptr != ptr2 comparisons but not the uintptr_t ones (ptr
might point to the end of obj1 and ptr2 to an obj2 allocated adjacent
to it), we generate significantly worse code.
Now you are suggesting we must generate such bad code even for the pointer
comparisons.
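For readers coming from C, a rough analogue of why std::less ends up with integer comparisons: in C, applying `<` to pointers into different objects is undefined behaviour, so code that needs a total order over arbitrary pointers typically compares them as `uintptr_t`. `ptr_less_total` is a hypothetical name, and the sketch assumes a flat address space where the pointer-to-integer mapping preserves order, which is implementation-defined but holds on GCC's usual targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Total order over arbitrary object pointers via their integer values.
   p < q itself is only defined when both point into the same array. */
bool ptr_less_total(const void *p, const void *q)
{
    return (uintptr_t)p < (uintptr_t)q;
}

void demo_total_order(void)
{
    int x = 0, y = 0;
    /* Distinct objects: exactly one ordering holds, whichever it is. */
    assert(ptr_less_total(&x, &y) != ptr_less_total(&y, &x));
    /* Irreflexive, as a strict order must be. */
    assert(!ptr_less_total(&x, &x));
}
```

The cost, as described above, is that GCC does not fold these integer comparisons the way it folds ptr != ptr2.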

	Jakub

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  9:15                                   ` Jens Gustedt
@ 2019-04-19  9:35                                     ` Peter Sewell
  2019-04-19 10:35                                       ` Uecker, Martin
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Sewell @ 2019-04-19  9:35 UTC (permalink / raw)
  To: Jens Gustedt
  Cc: Jakub Jelinek, Richard Biener, law, cl-c-memory-object-model, gcc

On 19/04/2019, Jens Gustedt <jens.gustedt@inria.fr> wrote:
> Hello Peter,
>
> On Fri, 19 Apr 2019 10:11:43 +0100 Peter Sewell
> <Peter.Sewell@cl.cam.ac.uk> wrote:
>
>> On 19/04/2019, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
>>  [...]
>
>> > That penalizes quite a few optimizations though.
>> > If you have
>> > ptr != ptr2
>> > and points-to analysis finds a set of variables ptr as well as ptr2
>> > points to and the sets would be disjoint, it would be nice to be
>> > able to optimize that comparison away (gcc does); similarly, if one
>> > of the pointers is &object or &object + sizeof (object).
>> > By requiring what you request above, it can be pretty much never
>> > optimized, unless the points-to analysis is able to also record if
>> > the pointer points to the start, middle or end of object and only
>> > if it is known to be in the middle it can safely optimize, for
>> > start or end it would need to prove the other pointer is to end or
>> > start and only non-zero sized objects are involved.
>>
>> A possible compromise position might be to make it
>> implementation-defined whether round-trip casts of a one-past pointer
>> into integer and back preserve provenance.   I don't know whether
>> that corner case crops up in real code...
>
> Wouldn't that require keeping track of some provenance information in
> integers?

I was conflating two things, sorry.  I meant an adaptation of PNVI-ae-udi
that would let implementations turn off the udi part if they wish.   Then
for those, casting a one-past pointer into integer and back would give an
empty-provenance pointer that couldn't be used for accesses, which helps
with the p[-1] examples that Richard was thinking of.  As we think this
roundtrip casting of a one-past pointer might be an exotic corner case,
this might be reasonable.
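A sketch of the round trip in question, under stated assumptions: `roundtrip` and `last_element` are hypothetical helpers, and the comments describe PNVI-ae-udi (provenance not via integers, with address-exposed objects and user disambiguation) as proposed, not current compiler behaviour. The tests only dereference a round-tripped pointer into the middle of an array, where there is exactly one candidate object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round-trip a pointer through uintptr_t. Under PNVI-ae-udi the cast to
   an integer exposes the object, and the cast back picks up the
   provenance of an exposed object live at that address. For a one-past
   pointer there can be two candidates (the array it came from, or an
   object that happens to start right after it); "udi" lets later use
   decide which one was meant. */
const int *roundtrip(const int *p)
{
    return (const int *)(uintptr_t)p;
}

/* With "udi" switched off, roundtrip(base + n) would carry empty
   provenance, so roundtrip(base + n)[-1] could not be used for an access.
   Code that wants to be safe in both modes can carry an index instead. */
int last_element(const int *base, size_t n)
{
    assert(base != NULL && n > 0);
    return base[n - 1];
}
```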

This:
>> > If you have
>> > ptr != ptr2
>> > and points-to analysis finds a set of variables ptr as well as ptr2
>> > points to and the sets would be disjoint, it would be nice to be
>> > able to optimize that comparison away (gcc does)

seems to be an argument to keep pointer == nondeterministically
provenance-sensitive or not, though whether it outweighs the
simplicity gain of making == just examine the address isn't clear to me.
My inclination would still be to the latter.
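The two readings of `==` weighed here can be written down as a small executable model. Everything in it is hypothetical: `abs_ptr` pairs an address with a provenance id, `eq_address_only` is the address-only rule, and `eq_nd_allows` reports whether a given result is permitted under the nondeterministic, provenance-sensitive rule. The addresses are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Abstract pointer: an address plus the id of the object it came from. */
typedef struct {
    uintptr_t addr;
    int prov;
} abs_ptr;

/* Address-only ==: the simpler rule. */
bool eq_address_only(abs_ptr p, abs_ptr q)
{
    return p.addr == q.addr;
}

/* Nondeterministic ==: is `result` an allowed outcome of p == q? */
bool eq_nd_allows(abs_ptr p, abs_ptr q, bool result)
{
    if (p.addr != q.addr)
        return !result;   /* different addresses: only false allowed  */
    if (p.prov == q.prov)
        return result;    /* same address, same object: only true     */
    return true;          /* same address, different objects: either  */
}

void demo_adjacent_objects(void)
{
    abs_ptr one_past_x = {0x1004, 1};  /* &x + 1, int x at 0x1000 */
    abs_ptr start_y    = {0x1004, 2};  /* &y, y placed right after x */

    assert(eq_address_only(one_past_x, start_y));
    assert(eq_nd_allows(one_past_x, start_y, true));
    assert(eq_nd_allows(one_past_x, start_y, false));
}
```

Under the first rule a compiler may never fold `&x + 1 == &y` to false on provenance grounds; under the second it may, at the price of `==` no longer being a deterministic function of its operands.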

best,
Peter




> Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  8:19                             ` Jens Gustedt
  2019-04-19  8:49                               ` Jakub Jelinek
@ 2019-04-19 10:01                               ` Uecker, Martin
  1 sibling, 0 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-19 10:01 UTC (permalink / raw)
  To: jens.gustedt, richard.guenther; +Cc: gcc, law, cl-c-memory-object-model

On Friday, 19.04.2019 at 10:19 +0200, Jens Gustedt wrote:
> Hello,
> 
> On Thu, 18 Apr 2019 14:42:22 +0200 Richard Biener
> <richard.guenther@gmail.com> wrote:
> 
> > On Thu, Apr 18, 2019 at 2:20 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > > 1.) Compilers do not use conditional equivalences for
> > > optimizations of pointers (or only when additional
> > > conditions apply which make it safe)
> > > 
> > > 2.) We make pointer comparison between a pointer
> > > and a one-after pointer of a different object
> > > undefined behaviour.  
> > 
> > Yes please!
> 
> No please don't, not UB.
> 
> If any of this, make the result unspecified but not UB, please.

Making it unspecified would not help with
conditional equivalences. The problem here
is that compilers would like to assume (and
sometimes, incorrectly, already do) that in the
following conditional code one
may substitute 'a' for 'b':

if (a == b) {

	*b = 1;
}

This is guaranteed to work only if comparison
implies identical semantics but this isn't
the case for pointers with provenance.
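The example above, expanded into a compilable C sketch (function names are hypothetical). `store_through_b` is what the programmer wrote; `store_through_a` is what a compiler treating `a == b` as a conditional equivalence might turn it into. If `a` is `&x + 1` and `b` is `&y` for an adjacent `y`, the comparison can succeed; the first version then performs a valid store to `y`, while the second stores through a pointer with `x`'s provenance past the end of `x`, which provenance semantics make undefined. The demo only uses same-object arguments, where the two versions agree.

```c
#include <assert.h>

/* As written: the store goes through b. */
void store_through_b(int *a, int *b)
{
    if (a == b)
        *b = 1;
}

/* After substituting a for b under the condition a == b. */
void store_through_a(int *a, int *b)
{
    if (a == b)
        *a = 1;
}

void demo_same_object(void)
{
    int v = 0, w = 0;
    store_through_b(&v, &v);
    store_through_a(&w, &w);
    assert(v == 1 && w == 1);   /* same provenance: both agree */
}
```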

Best,
Martin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  9:35                                     ` Peter Sewell
@ 2019-04-19 10:35                                       ` Uecker, Martin
  0 siblings, 0 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-19 10:35 UTC (permalink / raw)
  To: Peter.Sewell, jens.gustedt
  Cc: gcc, law, jakub, richard.guenther, cl-c-memory-object-model

On Friday, 19.04.2019 at 10:35 +0100, Peter Sewell wrote:
> On 19/04/2019, Jens Gustedt <jens.gustedt@inria.fr> wrote:
> > Hello Peter,
> > 
> > On Fri, 19 Apr 2019 10:11:43 +0100 Peter Sewell
> > <Peter.Sewell@cl.cam.ac.uk> wrote:
> > 
> > > On 19/04/2019, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
 [...]
> I was conflating two things, sorry.  I meant an adaptation of PNVI-ae-udi
> that would let implementations turn off the udi part if they wish.   Then
> for those, casting a one-past pointer into integer and back would give an
> empty-provenance pointer that couldn't be used for accesses, which helps
> with the p[-1] examples that Richard was thinking of.  As we think this
> roundtrip casting of a one-past pointer might be an exotic corner case,
> this might be reasonable.

The point of making such corner cases "just work" from a programmer's
point of view is that programmers should not need to know and
constantly worry about them just to avoid accidentally writing broken code.
Making it implementation-defined just makes it harder to understand
when it might work or not.

So in my opinion we should either make it work always or we
remove it completely and make the limitations of one-after
pointers very explicit. But my preference is still the first.

> 
> This:
> > > > If you have
> > > > ptr != ptr2
> > > > and points-to analysis finds a set of variables ptr as well as ptr2
> > > > points to and the sets would be disjoint, it would be nice to be
> > > > able to optimize that comparison away (gcc does)

BTW: This always holds. It is just that in the one-after case
the sets of addresses overlap in some cases where one
may (incorrectly) assume they don't when considering
only provenance.

> seems to be an argument to keep pointer == nondeterministically
> provenance-sensitive or not, though whether it outweighs the
> simplicity gain of making == just examine the address isn't clear to me.
> My inclination would still be to the latter.

This would be my preference too. In fact, I believe that most
optimizations could still be implemented.
Certainly, this adds complexity to the compiler
because it then also has to do more sophisticated
checks and maybe also have some rules about
which objects could be adjacent to each other.

And as a last resort, there could be a compiler flag, off
by default, that enables optimizations based on
the assumption that one-after pointers are never compared
to pointers pointing to different objects. 

Best,
Martin



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  9:34                                   ` Jakub Jelinek
@ 2019-04-21  8:15                                     ` Jens Gustedt
  0 siblings, 0 replies; 56+ messages in thread
From: Jens Gustedt @ 2019-04-21  8:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, gcc, cl-c-memory-object-model, law

[-- Attachment #1: Type: text/plain, Size: 2002 bytes --]

Hello Jakub,

On Fri, 19 Apr 2019 11:34:33 +0200 Jakub Jelinek <jakub@redhat.com>
wrote:

> On Fri, Apr 19, 2019 at 11:09:27AM +0200, Jens Gustedt wrote:
> > > similarly, if one of the
> > > pointers is &object or &object + sizeof (object).  
> > 
> > Here I don't follow. Why would one waste brain and resources to
> > optimize code that does such tricks?  
> 
> What tricks?

&object + sizeof (object)

> A normal pointer comparison either of two pointers
> or a pointer and address of something is something that happens in
> real-world code all the time, and in many cases it is essential
> that optimizing compilers attempt to optimize such tests as much as
> possible.

Yes, but not if one of the addresses is a one-past pointer; that is
a marginal use case.

> In the http://gcc.gnu.org/PR88775 (yes, it is C++, not C, but I don't
> see significant differences there),

Hm, probably my C++ is a bit rusty, but I see huge differences
here. What I understand is that you have difficulties with some C++ code
that uses `operator=` overloading (instead of initialization) when
optimizing that assignment. I see a lot of difficulties here, ranging
from some that are common to C and C++ (the lack of proper treatment of
string literals as constants, for example) to purely C++ ones, e.g.
not being able to model `restrict` pointer arguments, and to deal with
possible (or impossible) aliasing between a just-created object and a
string literal.

So, I see a whole chain of reasoning breaking down with that code, but
nothing that convinces me that `operator==` for pointer types is the
culprit. The harm is probably already done by the time it gets there.


Jens


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-18 13:29                                 ` Jeff Law
@ 2019-04-24 10:12                                   ` Richard Biener
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Biener @ 2019-04-24 10:12 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Uecker, Martin, Peter.Sewell, gcc,
	cl-c-memory-object-model

On Thu, Apr 18, 2019 at 3:29 PM Jeff Law <law@redhat.com> wrote:
>
> On 4/18/19 6:50 AM, Jakub Jelinek wrote:
> > On Thu, Apr 18, 2019 at 02:47:18PM +0200, Jakub Jelinek wrote:
> >> On Thu, Apr 18, 2019 at 02:42:22PM +0200, Richard Biener wrote:
> >>>> 1.) Compilers do not use conditional equivalences for
> >>>> optimizations of pointers (or only when additional
> >>>> conditions apply which make it safe)
> >>>>
> >>>> 2.) We make pointer comparison between a pointer
> >>>> and a one-after pointer of a different object
> >>>> undefined behaviour.
> >>>
> >>> Yes please!  OTOH GCC transforms
> >>> (uintptr_t)&a != (uintptr_t)(&b+1)
> >>> into &a != &b + 1 (for equality compares) and then
> >>
> >> I think we don't.  It was http://gcc.gnu.org/PR88775, but we haven't applied
> >> those changes, because we don't consider the pointer to start of one object
> >> vs. pointer to end of another one case in pointer comparisons (but do
> >> consider it in integral comparisons).
> >
> > That said, in RTL we really don't differentiate between pointers and
> > integers and we'll need to do something about that one day.
> I'd be happy to get things sorted out up to the RTL transition,
> particularly the cases involving equivalences.  Distinguishing between
> pointer and same sized integers in RTL will be difficult.

But we run into this with very simple testcases so we do have to fix it.
There's no point trying to "enhance" the GIMPLE side when RTL makes
it break so easily.

Richard.

> jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-18 13:42                           ` Jeff Law
  2019-04-18 13:54                             ` Uecker, Martin
@ 2019-04-24 10:19                             ` Richard Biener
  2019-04-24 18:41                               ` Jeff Law
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Biener @ 2019-04-24 10:19 UTC (permalink / raw)
  To: Jeff Law; +Cc: Uecker, Martin, Peter.Sewell, gcc, cl-c-memory-object-model

On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
>
> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> > On Thursday, 18.04.2019 at 11:45 +0100, Peter Sewell wrote:
> >> On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> >
> >> An equality test of two pointers, on the other hand, doesn't necessarily
> >> mean that they are interchangeable.  I don't see any good way to
> >> avoid that in a provenance semantics, where a one-past
> >> pointer might sometimes compare equal to a pointer to an
> >> adjacent object but be illegal for accessing it.
> >
> > As I see it, there are essentially four options:
> >
> > 1.) Compilers do not use conditional equivalences for
> > optimizations of pointers (or only when additional
> > conditions apply which make it safe)
> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
> VRP as well, maybe PTA.  It seems simple enough though :-)

Also touches fundamental PHI-OPT transforms like

 if (a == b)
...

 # c = PHI <a, b>

where we'd lose eliding such a conditional.  IMHO that's bad
and very undesirable.
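The PHI-OPT pattern, written out in C under the assumption that the transform in question is the fold of `a == b ? a : b` to `b` (function names are hypothetical). Both functions always return the same address; what the fold can change, under a provenance semantics, is which object's provenance the result carries when `a` and `b` are equal addresses derived from different objects.

```c
#include <assert.h>

/* Source shape behind "if (a == b) ... # c = PHI <a, b>". */
int *select_before(int *a, int *b)
{
    int *c;
    if (a == b)
        c = a;
    else
        c = b;
    return c;
}

/* After the fold: on the equal path a and b hold the same address, so
   the branch is elided and b is returned unconditionally. Addresses are
   unchanged, but when a and b are equal addresses derived from different
   objects, c now carries b's provenance where the source gave it a's. */
int *select_after(int *a, int *b)
{
    (void)a;
    return b;
}

void demo_phiopt(void)
{
    int x = 0;
    assert(select_before(&x, &x) == select_after(&x, &x));
}
```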

> >
> > 2.) We make pointer comparison between a pointer
> > and a one-after pointer of a different object
> > undefined behaviour.
> I generally like this as well, though I suspect it probably makes a lot
> of currently well defined code undefined.
>
> >
> > 3.) We make comparison have the side effect that
> > afterwards any of the two pointers could have any
> > of the two provenances (with disambiguation
> > similar to what we have for casts).
> This could have some interesting effects on PTA.  Richi?

I played with this and doing this in an incomplete way like
just handling

  if (a == b)

as two-way assignment during constraint building is possible.
But that's not enough of course since every call is implicitly
producing equivalences between everything [escaped] ...
which makes points-to degrade to a point where it is useless.
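A toy model of why this degrades, assuming points-to sets are bitmasks over a handful of objects (`pts_set`, `equate`, `may_alias` and `call_equates_escaped` are hypothetical names, not GCC's constraint solver). Treating `a == b` as the two-way assignment `a = b; b = a` merges the two sets; if every call is taken to equate all escaped pointers, each escaped pointer ends up with the same, maximal set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit i set means "may point to object i". */
typedef unsigned pts_set;

/* if (a == b) as a two-way assignment: each pointer may now point to
   anything the other could. */
void equate(pts_set *a, pts_set *b)
{
    pts_set u = *a | *b;
    *a = u;
    *b = u;
}

bool may_alias(pts_set a, pts_set b)
{
    return (a & b) != 0;
}

/* A call that may compare any escaped pointers with each other: equate
   every pair, which collapses all of them to one shared set. */
void call_equates_escaped(pts_set *escaped, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            equate(&escaped[i], &escaped[j]);
}

void demo_equate(void)
{
    pts_set p = 1u << 0, q = 1u << 1;
    assert(!may_alias(p, q));
    equate(&p, &q);
    assert(may_alias(p, q) && p == 3u && q == 3u);
}
```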

So I think we need a working scheme where points-to doesn't
degrade from equivalencies being computed and the compiler
being free to introduce equivalences as well as copy-propagate
those.

Honestly I can't come up with a working solution to this
problem.

>
> >
> > 4.) Compilers make sure that exposed objects are never
> > allocated next to each other (as Jens proposed).
> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> ultimately have no control over layout of data in static storage.

Make everything 1 byte larger.
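At the source level, that suggestion amounts to something like the following sketch (`struct padded_int` and `one_past_stays_inside` are hypothetical; a compiler would do the equivalent internally when laying out exposed objects). Because the padding belongs to the object itself, it works even where the compiler does not control where static data ends up: a one-past pointer to the value lands on the pad byte rather than possibly at the start of a neighbouring object. In this struct form the cost is more than one byte, since sizeof rounds up to the alignment of int.

```c
#include <assert.h>
#include <stddef.h>

/* The value plus one trailing byte that nothing else may occupy. */
struct padded_int {
    int value;
    unsigned char pad;
};

static_assert(sizeof(struct padded_int) > sizeof(int),
              "the pad byte must enlarge the object");

/* Is the one-past pointer for s->value still inside *s? */
int one_past_stays_inside(const struct padded_int *s)
{
    const unsigned char *base = (const unsigned char *)s;
    const unsigned char *one_past = (const unsigned char *)(&s->value + 1);
    return one_past < base + sizeof *s;
}
```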

> jeff

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: C provenance semantics proposal
  2019-04-19  9:09                                 ` Jens Gustedt
  2019-04-19  9:34                                   ` Jakub Jelinek
@ 2019-04-24 10:24                                   ` Richard Biener
  2019-04-24 18:43                                     ` Jeff Law
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Biener @ 2019-04-24 10:24 UTC (permalink / raw)
  To: Jens Gustedt; +Cc: Jakub Jelinek, gcc, cl-c-memory-object-model, law

On Fri, Apr 19, 2019 at 11:09 AM Jens Gustedt <jens.gustedt@inria.fr> wrote:
>
> Hello Jakub,
>
> On Fri, 19 Apr 2019 10:49:08 +0200 Jakub Jelinek <jakub@redhat.com>
> wrote:
>
> > On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
> > > > OTOH GCC transforms
> > > > (uintptr_t)&a != (uintptr_t)(&b+1)
> > > > into &a != &b + 1 (for equality compares) and then
> > > > doesn't follow this C rule anyways.
> > >
> > > Actually our proposal we are discussing here goes exactly the other
> > > way around. It basically reduces
> > >
> > >   &a != &b + 1
> > >
> > > to
> > >
> > >   (uintptr_t)&a != (uintptr_t)(&b+1)
> > >
> > > with only an exception for null pointers, but which probably don't
> > > matter for a platform where null pointers are just all bits 0.
> >
> > That penalizes quite a few optimizations though.
> > If you have
> > ptr != ptr2
> > and points-to analysis finds a set of variables ptr as well as ptr2
> > points to and the sets would be disjoint, it would be nice to be able
> > to optimize that comparison away
>
> yes
>
> > (gcc does);
>
> great
>
> > similarly, if one of the
> > pointers is &object or &object + sizeof (object).
>
> Here I don't follow. Why would one waste brain and resources to
> optimize code that does such tricks?
>
> > By requiring what you request above, it can be pretty much never
> > optimized, unless the points-to analysis is able to also record if
> > the pointer points to the start, middle or end of object and only if
> > it is known to be in the middle it can safely optimize, for start or
> > end it would need to prove the other pointer is to end or start and
> > only non-zero sized objects are involved.
>
> I have the impression that you just propose an inversion of the
> roles. What you require is the user to keep track of this kind of
> information, and to know when they do (or should not) compare a
> one-passed pointer to something with a different provenance.
>
> I just don't feel that it is adequate to impose such a detailed
> knowledge on users, which is basically about a marginal use
> case. One-off pointers don't occur "naturally" in many places,

They occur in the single important place - loop IV tests in
C++ style iterator != end where end is a "pointer" to one after
the last valid iterator value.

> I'd guess. Using them for anything else than to test bounds for array
> traversal is insane, and there "usually" the test is with `<`, anyhow,
> which has different rules.

Unfortunately then C++ arrived and compilers were expected to
also optimize that nasty code.

Richard.

>
> Jens
>
> --
> :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
> :: ::::::::::::::: office Strasbourg : +33 368854536   ::
> :: :::::::::::::::::::::: gsm France : +33 651400183   ::
> :: ::::::::::::::: gsm international : +49 15737185122 ::
> :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::


* Re: C provenance semantics proposal
  2019-04-24 10:19                             ` Richard Biener
@ 2019-04-24 18:41                               ` Jeff Law
  2019-04-24 19:30                                 ` Philipp Klaus Krause
                                                   ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Jeff Law @ 2019-04-24 18:41 UTC (permalink / raw)
  To: Richard Biener
  Cc: Uecker, Martin, Peter.Sewell, gcc, cl-c-memory-object-model

On 4/24/19 4:19 AM, Richard Biener wrote:
> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
>>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
>>>
>>>
>>>> An equality test of two pointers, on the other hand, doesn't necessarily
>>>> mean that they are interchangeable.  I don't see any good way to
>>>> avoid that in a provenance semantics, where a one-past
>>>> pointer might sometimes compare equal to a pointer to an
>>>> adjacent object but be illegal for accessing it.
>>>
>>> As I see it, there are essentially four options:
>>>
>>> 1.) Compilers do not use conditional equivalences for
>>> optimizations of pointers (or only when additional
>>> conditions apply which make it safe)
>> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
>> VRP as well, maybe PTA.  It seems simple enough though :-)
> 
> Also touches fundamental PHI-OPT transforms like
> 
>  if (a == b)
> ...
> 
>  # c = PHI <a, b>
> 
> where we'd lose eliding such a conditional.  IMHO that's bad
> and very undesirable.
But if we only suppress this optimization for pointers is it that terrible?
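A rough C-level sketch of the PHI-OPT case above, assuming the if/else below is lowered to `c = PHI <a, b>` in GIMPLE; the function names are hypothetical. Both versions always return the same address. The difference only shows up under a provenance model: if a is a one-past pointer such as &x + 1 that happens to equal b = &y, the first version returns a pointer with x's provenance on the equal path, while the second always returns one with b's provenance.

```c
/* Before the transform: on the a == b path c is a, otherwise b.
   At the join point this is roughly  c = PHI <a, b>.  */
int *select_before(int *a, int *b)
{
    int *c;
    if (a == b)
        c = a;
    else
        c = b;
    return c;
}

/* After eliding the conditional: a == b holds on the first path, so a is
   replaced by b there and the PHI collapses to b.  */
int *select_after(int *a, int *b)
{
    (void) a;   /* no longer needed */
    return b;
}
```

Under option 1 a compiler would have to keep `select_before` as written whenever a and b are pointers.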



>>>
>>> 3.) We make comparison have the side effect that
>>> afterwards any of the two pointers could have any
>>> of the two provenances. (with disambiguation
>>> similar to what we have for casts).
>> This could have some interesting effects on PTA.  Richi?
> 
> I played with this and doing this in an incomplete way like
> just handling
> 
>   if (a == b)
> 
> as two-way assignment during constraint building is possible.
> But that's not enough of course since every call is implicitly
> producing equivalences between everything [escaped] ...
> which makes points-to degrade to a point where it is useless.
But the calls aren't generating conditional equivalences.  I must be
missing something here.  You're the expert in this space, so if you say
it totally degrades PTA, then it's a non-starter.

> 
> So I think we need a working scheme where points-to doesn't
> degrade from equivalencies being computed and the compiler
> being free to introduce equivalences as well as copy-propagate
> those.
> 
> Honestly I can't come up with a working solution to this
> problem.
> 
>>
>>>
>>> 4.) Compilers make sure that exposed objects never
>>> are allocated next to each other (as Jens proposed).
>> Ugh.  Not sure how you enforce that.  Consider that the compiler may
>> ultimately have no control over layout of data in static storage.
> 
> Make everything 1 byte larger.
Not a bad idea.  I suspect the embedded folks would go bananas though.

jeff


* Re: C provenance semantics proposal
  2019-04-24 10:24                                   ` Richard Biener
@ 2019-04-24 18:43                                     ` Jeff Law
  2019-04-24 19:21                                       ` Jens Gustedt
  0 siblings, 1 reply; 56+ messages in thread
From: Jeff Law @ 2019-04-24 18:43 UTC (permalink / raw)
  To: Richard Biener, Jens Gustedt; +Cc: Jakub Jelinek, gcc, cl-c-memory-object-model

On 4/24/19 4:24 AM, Richard Biener wrote:
> On Fri, Apr 19, 2019 at 11:09 AM Jens Gustedt <jens.gustedt@inria.fr> wrote:
>>
>> Hello Jakub,
>>
>> On Fri, 19 Apr 2019 10:49:08 +0200 Jakub Jelinek <jakub@redhat.com>
>> wrote:
>>
>>> On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
>>>>> OTOH GCC transforms
>>>>> (uintptr_t)&a != (uintptr_t)(&b+1)
>>>>> into &a != &b + 1 (for equality compares) and then
>>>>> doesn't follow this C rule anyways.
>>>>
>>>> Actually our proposal we are discussing here goes exactly the other
>>>> way around. It basically reduces
>>>>
>>>>   &a != &b + 1
>>>>
>>>> to
>>>>
>>>>   (uintptr_t)&a != (uintptr_t)(&b+1)
>>>>
>>>> with only an exception for null pointers, but which probably don't
>>>> matter for a platform where null pointers are just all bits 0.
>>>
>>> That penalizes quite a few optimizations though.
>>> If you have
>>> ptr != ptr2
>>> and points-to analysis finds a set of variables ptr as well as ptr2
>>> points to and the sets would be disjoint, it would be nice to be able
>>> to optimize that comparison away
>>
>> yes
>>
>>> (gcc does);
>>
>> great
>>
>>> similarly, if one of the
>>> pointers is &object or &object + sizeof (object).
>>
>> Here I don't follow. Why would one waste brain and resources to
>> optimize code that does such tricks?
>>
>>> By requiring what you request above, it can be pretty much never
>>> optimized, unless the points-to analysis is able to also record if
>>> the pointer points to the start, middle or end of object and only if
>>> it is known to be in the middle it can safely optimize, for start or
>>> end it would need to prove the other pointer is to end or start and
>>> only non-zero sized objects are involved.
>>
>> I have the impression that you just propose an inversion of the
>> roles. What you require is the user to keep track of this kind of
>> information, and to know when they do (or should not) compare a
>> one-past pointer to something with a different provenance.
>>
>> I just don't feel that it is adequate to impose such a detailed
>> knowledge on users, which is basically about a marginal use
>> case. One-off pointers don't occur "naturally" in many places,
> 
> They occur in the single important place - loop IV tests in
> C++ style iterator != end where end is a "pointer" to one after
> the last valid iterator value.
I don't think this is limited to C++.
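For instance, the plain C loop below uses the same idiom: `end` points one past the last element and is only ever compared, never dereferenced. This is a minimal sketch with a hypothetical function name; if the array happens to be immediately followed by another object, `end` may also compare equal to that object's address, which is exactly the case under discussion.

```c
/* Sum the elements in [begin, end); end is a one-past-the-end pointer. */
long sum_range(const int *begin, const int *end)
{
    long total = 0;
    for (const int *it = begin; it != end; ++it)   /* loop IV test: it != end */
        total += *it;                              /* only pointers before end are dereferenced */
    return total;
}
```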

Jeff


* Re: C provenance semantics proposal
  2019-04-24 18:43                                     ` Jeff Law
@ 2019-04-24 19:21                                       ` Jens Gustedt
  0 siblings, 0 replies; 56+ messages in thread
From: Jens Gustedt @ 2019-04-24 19:21 UTC (permalink / raw)
  To: Jeff Law, Richard Biener; +Cc: Jakub Jelinek, gcc, cl-c-memory-object-model

On 24 April 2019 20:43:03 CEST, Jeff Law <law@redhat.com> wrote:
>On 4/24/19 4:24 AM, Richard Biener wrote:
>> On Fri, Apr 19, 2019 at 11:09 AM Jens Gustedt <jens.gustedt@inria.fr> wrote:
>>>
>>> Hello Jakub,
>>>
>>> On Fri, 19 Apr 2019 10:49:08 +0200 Jakub Jelinek <jakub@redhat.com>
>>> wrote:
>>>
>>>> On Fri, Apr 19, 2019 at 10:19:28AM +0200, Jens Gustedt wrote:
>>>>>> OTOH GCC transforms
>>>>>> (uintptr_t)&a != (uintptr_t)(&b+1)
>>>>>> into &a != &b + 1 (for equality compares) and then
>>>>>> doesn't follow this C rule anyways.
>>>>>
>>>>> Actually our proposal we are discussing here goes exactly the other
>>>>> way around. It basically reduces
>>>>>
>>>>>   &a != &b + 1
>>>>>
>>>>> to
>>>>>
>>>>>   (uintptr_t)&a != (uintptr_t)(&b+1)
>>>>>
>>>>> with only an exception for null pointers, but which probably don't
>>>>> matter for a platform where null pointers are just all bits 0.
>>>>
>>>> That penalizes quite a few optimizations though.
>>>> If you have
>>>> ptr != ptr2
>>>> and points-to analysis finds a set of variables ptr as well as ptr2
>>>> points to and the sets would be disjoint, it would be nice to be able
>>>> to optimize that comparison away
>>>
>>> yes
>>>
>>>> (gcc does);
>>>
>>> great
>>>
>>>> similarly, if one of the
>>>> pointers is &object or &object + sizeof (object).
>>>
>>> Here I don't follow. Why would one waste brain and resources to
>>> optimize code that does such tricks?
>>>
>>>> By requiring what you request above, it can be pretty much never
>>>> optimized, unless the points-to analysis is able to also record if
>>>> the pointer points to the start, middle or end of object and only if
>>>> it is known to be in the middle it can safely optimize, for start or
>>>> end it would need to prove the other pointer is to end or start and
>>>> only non-zero sized objects are involved.
>>>
>>> I have the impression that you just propose an inversion of the
>>> roles. What you require is the user to keep track of this kind of
>>> information, and to know when they do (or should not) compare a
>>> one-past pointer to something with a different provenance.
>>>
>>> I just don't feel that it is adequate to impose such a detailed
>>> knowledge on users, which is basically about a marginal use
>>> case. One-off pointers don't occur "naturally" in many places,
>> 
>> They occur in the single important place - loop IV tests in
>> C++ style iterator != end where end is a "pointer" to one after
>> the last valid iterator value.
>I don't think this is limited to C++.

Sure, but still this is "usually" compared to a pointer into the array. If it is not, something has fundamentally gone wrong. If a compiler detects this, great, but that is a QoI (quality of implementation) issue. I would not expect a compiler to optimize much in such a situation; the first objective would be to point out the logical error.

So I really would like to hear about other aspects of our proposal. It would be good if we could agree on the fundamentals first, and then sort out marginal cases later.

Jens


-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France


* Re: C provenance semantics proposal
  2019-04-24 18:41                               ` Jeff Law
@ 2019-04-24 19:30                                 ` Philipp Klaus Krause
  2019-04-24 19:55                                   ` Uecker, Martin
  2019-04-24 19:33                                 ` Jakub Jelinek
                                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 56+ messages in thread
From: Philipp Klaus Krause @ 2019-04-24 19:30 UTC (permalink / raw)
  To: Jeff Law, Richard Biener
  Cc: Uecker, Martin, Peter.Sewell, cl-c-memory-object-model, gcc

On 24.04.19 at 20:41, Jeff Law wrote:
>>>> 4.) Compilers make sure that exposed objects never
>>>> are allocated next to each other (as Jens proposed).
>>> Ugh.  Not sure how you enforce that.  Consider that the compiler may
>>> ultimately have no control over layout of data in static storage.
>>
>> Make everything 1 byte larger.
> Not a bad idea.  I suspect the embedded folks would go bananas though.
> 

Some of the systems the Small Device C compiler targets have just 60 B
of RAM. And we are currently considering adding support for a multicore
microcontroller, of which also variants with just 60 B of RAM exist.
But even for the other architectures supported in SDCC, where devices
tend to have much bigger RAM, in the range of 128B to even a few KB,
wasting memory like that is not acceptable.

Philipp


* Re: C provenance semantics proposal
  2019-04-24 18:41                               ` Jeff Law
  2019-04-24 19:30                                 ` Philipp Klaus Krause
@ 2019-04-24 19:33                                 ` Jakub Jelinek
  2019-04-24 21:19                                 ` Peter Sewell
  2019-04-25 12:39                                 ` Richard Biener
  3 siblings, 0 replies; 56+ messages in thread
From: Jakub Jelinek @ 2019-04-24 19:33 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Uecker, Martin, Peter.Sewell, gcc,
	cl-c-memory-object-model

On Wed, Apr 24, 2019 at 12:41:25PM -0600, Jeff Law wrote:
> >>> 4.) Compilers make sure that exposed objects never
> >>> are allocated next to each other (as Jens proposed).
> >> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> >> ultimately have no control over layout of data in static storage.
> > 
> > Make everything 1 byte larger.
> Not a bad idea.  I suspect the embedded folks would go bananas though.

With variables with larger alignment that can be significantly more
than one byte.  That is not going to fly even for non-embedded.
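A back-of-the-envelope sketch of that point, assuming each guarded object must still start on a multiple of its alignment A, so that one guard byte after an object of size S rounds its footprint up to the next multiple of A. The helper and type names are hypothetical.

```c
#include <stddef.h>

/* Bytes occupied by an object of `size` bytes plus one guard byte,
   rounded up to the next multiple of `align` (align must be > 0).  */
size_t guarded_footprint(size_t size, size_t align)
{
    return (size + 1 + align - 1) / align * align;
}

/* The same effect expressed with C11 types: a 16-byte, 16-aligned object
   and a wrapper that appends a single guard byte.  */
typedef struct { _Alignas(16) unsigned char data[16]; } obj16;
typedef struct { obj16 obj; unsigned char guard; } obj16_guarded;
```

With 1-byte alignment the cost really is one byte. A 4-byte int already costs 100% (8 bytes instead of 4), and a 16-byte-aligned object doubles to 32 bytes.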

	Jakub


* Re: C provenance semantics proposal
  2019-04-24 19:30                                 ` Philipp Klaus Krause
@ 2019-04-24 19:55                                   ` Uecker, Martin
  0 siblings, 0 replies; 56+ messages in thread
From: Uecker, Martin @ 2019-04-24 19:55 UTC (permalink / raw)
  To: krauseph, law, richard.guenther
  Cc: gcc, Peter.Sewell, cl-c-memory-object-model

On Wednesday, 24.04.2019, 21:30 +0200, Philipp Klaus Krause wrote:
> On 24.04.19 at 20:41, Jeff Law wrote:
> > > > > 4.) Compilers make sure that exposed objects never
> > > > > are allocated next to each other (as Jens proposed).
> > > > 
> > > > Ugh.  Not sure how you enforce that.  Consider that the compiler may
> > > > ultimately have no control over layout of data in static storage.
> > > 
> > > Make everything 1 byte larger.
> > 
> > Not a bad idea.  I suspect the embedded folks would go bananas though.
> > 
> 
> Some of the systems the Small Device C compiler targets have just 60 B
> of RAM. And we are currently considering adding support for a multicore
> microcontroller, of which also variants with just 60 B of RAM exist.
> But even for the other architectures supported in SDCC, where devices
> tend to have much bigger RAM, in the range of 128B to even a few KB,
> wasting memory like that is not acceptable.

It would not be a requirement: you could always do the pointer
comparison at run time, or write a smarter compiler which not
only uses provenance for optimization but also takes
neighborhood relationships into account.

Martin




* Re: C provenance semantics proposal
  2019-04-24 18:41                               ` Jeff Law
  2019-04-24 19:30                                 ` Philipp Klaus Krause
  2019-04-24 19:33                                 ` Jakub Jelinek
@ 2019-04-24 21:19                                 ` Peter Sewell
  2019-04-25 12:42                                   ` Richard Biener
  2019-04-25 12:39                                 ` Richard Biener
  3 siblings, 1 reply; 56+ messages in thread
From: Peter Sewell @ 2019-04-24 21:19 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Uecker, Martin, gcc, cl-c-memory-object-model

On 24/04/2019, Jeff Law <law@redhat.com> wrote:
> On 4/24/19 4:19 AM, Richard Biener wrote:
>> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
>>>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
>>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>
>>>>
>>>>> An equality test of two pointers, on the other hand, doesn't
>>>>> necessarily
>>>>> mean that they are interchangeable.  I don't see any good way to
>>>>> avoid that in a provenance semantics, where a one-past
>>>>> pointer might sometimes compare equal to a pointer to an
>>>>> adjacent object but be illegal for accessing it.
>>>>
>>>> As I see it, there are essentially four options:
>>>>
>>>> 1.) Compilers do not use conditional equivalences for
>>>> optimizations of pointers (or only when additional
>>>> conditions apply which make it safe)
>>> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
>>> VRP as well, maybe PTA.  It seems simple enough though :-)
>>
>> Also touches fundamental PHI-OPT transforms like
>>
>>  if (a == b)
>> ...
>>
>>  # c = PHI <a, b>
>>
>> where we'd lose eliding such a conditional.  IMHO that's bad
>> and very undesirable.
> But if we only suppress this optimization for pointers is it that terrible?

As far as I can see right now, there isn't a serious alternative.
Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
be true (either in a semantics for the source-language == that just
compares the concrete representations or in one that's allowed
but not required to be provenance-sensitive).   It's not possible
to simultaneously have *p UB (which AIUI the compiler has to
have in the intermediate language, to make alias analysis sound),
*q not UB, and p interchangeable with q.    Am I missing something?
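A minimal sketch of that situation, with hypothetical globals x and y. Whether y immediately follows x is entirely up to the compiler and linker, so either comparison result is possible, and GCC may fold both comparisons to false at compile time, since (as noted earlier in the thread) it turns the uintptr_t form back into a pointer comparison.

```c
#include <stdint.h>

/* Hypothetical candidates for adjacency; their relative placement is
   chosen by the toolchain, not by this code.  */
int x = 1, y = 2;

/* Pointer equality: p has x's provenance, q has y's.  */
int ptr_equal(void)
{
    int *p = &x + 1;   /* may be formed and compared, never dereferenced */
    int *q = &y;       /* may be dereferenced */
    return p == q;
}

/* The same test on integer representations, which is what the proposal
   reduces pointer equality to (null pointers aside).  */
int addr_equal(void)
{
    return (uintptr_t) (&x + 1) == (uintptr_t) &y;
}

/* Even when p == q holds at run time, only the store through q is defined.
   Substituting p for q would turn it into an access through a pointer with
   x's provenance: UB, and something alias analysis may assume cannot
   modify y.  */
void store_if_equal(int *p, int *q)
{
    if (p == q)
        *q = 11;
}
```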

Peter


>
>>>>
>>>> 3.) We make comparison have the side effect that
>>>> afterwards any of the two pointers could have any
>>>> of the two provenances. (with disambiguation
>>>> similar to what we have for casts).
>>> This could have some interesting effects on PTA.  Richi?
>>
>> I played with this and doing this in an incomplete way like
>> just handling
>>
>>   if (a == b)
>>
>> as two-way assignment during constraint building is possible.
>> But that's not enough of course since every call is implicitly
>> producing equivalences between everything [escaped] ...
>> which makes points-to degrade to a point where it is useless.
> But the calls aren't generating conditional equivalences.  I must be
> missing something here.  You're the expert in this space, so if you say
> it totally degrades PTA, then it's a non-starter.
>
>>
>> So I think we need a working scheme where points-to doesn't
>> degrade from equivalencies being computed and the compiler
>> being free to introduce equivalences as well as copy-propagate
>> those.
>>
>> Honestly I can't come up with a working solution to this
>> problem.
>>
>>>
>>>>
>>>> 4.) Compilers make sure that exposed objects never
>>>> are allocated next to each other (as Jens proposed).
>>> Ugh.  Not sure how you enforce that.  Consider that the compiler may
>>> ultimately have no control over layout of data in static storage.
>>
>> Make everything 1 byte larger.
> Not a bad idea.  I suspect the embedded folks would go bananas though.
>
> jeff
>
>


* Re: C provenance semantics proposal
  2019-04-24 18:41                               ` Jeff Law
                                                   ` (2 preceding siblings ...)
  2019-04-24 21:19                                 ` Peter Sewell
@ 2019-04-25 12:39                                 ` Richard Biener
  2019-05-09 11:26                                   ` Ralf Jung
  3 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2019-04-25 12:39 UTC (permalink / raw)
  To: Jeff Law; +Cc: Uecker, Martin, Peter.Sewell, gcc, cl-c-memory-object-model

On Wed, Apr 24, 2019 at 8:41 PM Jeff Law <law@redhat.com> wrote:
>
> On 4/24/19 4:19 AM, Richard Biener wrote:
> > On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
> >>
> >> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> >>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
> >>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener <richard.guenther@gmail.com> wrote:
> >>>
> >>>
> >>>> An equality test of two pointers, on the other hand, doesn't necessarily
> >>>> mean that they are interchangeable.  I don't see any good way to
> >>>> avoid that in a provenance semantics, where a one-past
> >>>> pointer might sometimes compare equal to a pointer to an
> >>>> adjacent object but be illegal for accessing it.
> >>>
> >>> As I see it, there are essentially four options:
> >>>
> >>> 1.) Compilers do not use conditional equivalences for
> >>> optimizations of pointers (or only when additional
> >>> conditions apply which make it safe)
> >> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
> >> VRP as well, maybe PTA.  It seems simple enough though :-)
> >
> > Also touches fundamental PHI-OPT transforms like
> >
> >  if (a == b)
> > ...
> >
> >  # c = PHI <a, b>
> >
> > where we'd lose eliding such a conditional.  IMHO that's bad
> > and very undesirable.
> But if we only suppress this optimization for pointers is it that terrible?

I've at least seen a lot of cases with c = PHI <a, 0> for null pointer
checks.  It's just that we're going to chase down a lot of cases even
knowing RTL will fuck them up later big time.
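The null-check shape looks roughly like this at the C level, assuming the usual lowering to `c = PHI <a, 0>`; the function names are hypothetical. Folding the PHI to a is harmless with respect to provenance, since a null pointer carries none, yet a restriction applied to all pointer comparisons, without a carve-out for null, would also lose it.

```c
#include <stddef.h>

/* Before: c = PHI <a, 0>.  On the else path a == NULL is known.  */
int *keep_or_null_before(int *a)
{
    int *c;
    if (a != NULL)
        c = a;
    else
        c = NULL;
    return c;
}

/* After: the else arm yields a value equal to a, so the PHI folds to a.  */
int *keep_or_null_after(int *a)
{
    return a;
}
```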

>
>
> >>>
> >>> 3.) We make comparison have the side effect that
> >>> afterwards any of the two pointers could have any
> >>> of the two provenances. (with disambiguation
> >>> similar to what we have for casts).
> >> This could have some interesting effects on PTA.  Richi?
> >
> > I played with this and doing this in an incomplete way like
> > just handling
> >
> >   if (a == b)
> >
> > as two-way assignment during constraint building is possible.
> > But that's not enough of course since every call is implicitly
> > producing equivalences between everything [escaped] ...
> > which makes points-to degrade to a point where it is useless.
> But the calls aren't generating conditional equivalences.  I must be
> missing something here.

if (compare_a_and_b (a, b))
  ...

yes, they are not creating conditional equivalences that can be
propagated out (w/o IPA info).  But we compute points-to
early, then inline (exposing the propagation opportunity),
preserving the points-to result.

>  You're the expert in this space, so if you say
> it totally degrades PTA, then it's a non-starter.

Well, it's possible to fix all testcases that get thrown to us but
what I have difficulties with is designing a way to follow the
proposed standard.

Btw, I've tried the trivial points-to patch for conditionals only
and even that regressed points-to testcases.
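A toy illustration of treating `if (a == b)` as a two-way assignment during constraint building. This is a hypothetical Andersen-style model, not GCC's points-to implementation. Abstract objects are bits in a mask, and each copy constraint says pts(dst) must include pts(src). Before the equality constraints, a and b have disjoint points-to sets and cannot alias. Afterwards both point to {x, y}, which is the kind of precision loss described above once such equivalences show up everywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abstract memory objects, one bit each (hypothetical model).  */
enum { OBJ_X = 1u << 0, OBJ_Y = 1u << 1 };

/* Pointer variables being analysed.  */
enum { VAR_A, VAR_B, NVARS };

/* Copy constraint: pts[dst] must include pts[src].  */
typedef struct { int dst, src; } copy_constraint;

/* Propagate copy constraints until no set changes (a fixpoint).  */
void solve(uint32_t pts[], const copy_constraint *cs, int n)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            uint32_t merged = pts[cs[i].dst] | pts[cs[i].src];
            if (merged != pts[cs[i].dst]) {
                pts[cs[i].dst] = merged;
                changed = true;
            }
        }
    }
}

/* Two pointers may alias iff their points-to sets intersect.  */
bool may_alias(const uint32_t pts[], int v, int w)
{
    return (pts[v] & pts[w]) != 0;
}
```

Usage: start with `pts[VAR_A] = OBJ_X` (a = &x + 1) and `pts[VAR_B] = OBJ_Y` (b = &y), where `may_alias` is false. Solving with the two constraints `{VAR_A, VAR_B}` and `{VAR_B, VAR_A}` for `a == b` makes both sets `OBJ_X | OBJ_Y`, and `may_alias` becomes true.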

> >
> > So I think we need a working scheme where points-to doesn't
> > degrade from equivalencies being computed and the compiler
> > being free to introduce equivalences as well as copy-propagate
> > those.
> >
> > Honestly I can't come up with a working solution to this
> > problem.
> >
> >>
> >>>
> >>> 4.) Compilers make sure that exposed objects never
> >>> are allocated next to each other (as Jens proposed).
> >> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> >> ultimately have no control over layout of data in static storage.
> >
> > Make everything 1 byte larger.
> Not a bad idea.  I suspect the embedded folks would go bananas though.

Maybe, but those folks are also using -fno-strict-aliasing ...

Anyhow, my issue is that I don't see a clean design that would follow
the proposed standard wording (even our current desired implementation
behavior btw!) and not degrade simple testcases :/

Richard.

> jeff
>


* Re: C provenance semantics proposal
  2019-04-24 21:19                                 ` Peter Sewell
@ 2019-04-25 12:42                                   ` Richard Biener
  2019-04-25 13:03                                     ` Peter Sewell
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Biener @ 2019-04-25 12:42 UTC (permalink / raw)
  To: Peter.Sewell; +Cc: Jeff Law, Uecker, Martin, gcc, cl-c-memory-object-model

On Wed, Apr 24, 2019 at 11:18 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
>
> On 24/04/2019, Jeff Law <law@redhat.com> wrote:
> > On 4/24/19 4:19 AM, Richard Biener wrote:
> >> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
> >>>
> >>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> >>>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
> >>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
> >>>>> <richard.guenther@gmail.com> wrote:
> >>>>
> >>>>
> >>>>> An equality test of two pointers, on the other hand, doesn't
> >>>>> necessarily
> >>>>> mean that they are interchangeable.  I don't see any good way to
> >>>>> avoid that in a provenance semantics, where a one-past
> >>>>> pointer might sometimes compare equal to a pointer to an
> >>>>> adjacent object but be illegal for accessing it.
> >>>>
> >>>> As I see it, there are essentially four options:
> >>>>
> >>>> 1.) Compilers do not use conditional equivalences for
> >>>> optimizations of pointers (or only when additional
> >>>> conditions apply which make it safe)
> >>> I know this will hit DOM and CSE.  I wouldn't be surprised if it touches
> >>> VRP as well, maybe PTA.  It seems simple enough though :-)
> >>
> >> Also touches fundamental PHI-OPT transforms like
> >>
> >>  if (a == b)
> >> ...
> >>
> >>  # c = PHI <a, b>
> >>
> >> where we'd lose eliding such a conditional.  IMHO that's bad
> >> and very undesirable.
> > But if we only suppress this optimization for pointers is it that terrible?
>
> As far as I can see right now, there isn't a serious alternative.
> Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
> be true (either in a semantics for the source-language == that just
> compares the concrete representations or in one that's allowed
> but not required to be provenance-sensitive).   It's not possible
> to simultaneously have *p UB (which AIUI the compiler has to
> have in the intermediate language, to make alias analysis sound),
> *q not UB, and p interchangeable with q.    Am I missing something?

No, you are not missing anything.  We do have this issue right now,
independent of standard wordings.  But the standard has that, too,
not allowing *(&x + 1), allowing the compare and allowing *&y.
Isn't that a defect as well?

Richard.

> Peter
>
>
> >
> >>>>
> >>>> 3.) We make comparison have the side effect that
> >>>> afterwards any of the two pointers could have any
> >>>> of the two provenances. (with disambiguation
> >>>> similar to what we have for casts).
> >>> This could have some interesting effects on PTA.  Richi?
> >>
> >> I played with this and doing this in an incomplete way like
> >> just handling
> >>
> >>   if (a == b)
> >>
> >> as two-way assignment during constraint building is possible.
> >> But that's not enough of course since every call is implicitely
> >> producing equivalences between everything [escaped] ...
> >> which makes points-to degrade to a point where it is useless.
> > But the calls aren't generating conditional equivalences.  I must be
> > missing something here.  You're the expert in this space, so if you say
> > it totally degrades PTA, then it's a non-starter.
> >
> >>
> >> So I think we need a working scheme where points-to doesn't
> >> degrade from equivalencies being computed and the compiler
> >> being free to introduce equivalences as well as copy-propagate
> >> those.
> >>
> >> Honestly I can't come up with a working solution to this
> >> problem.
> >>
> >>>
> >>>>
> >>>> 4.) Compilers make sure that exposed objects never
> >>>> are allocated next to each other (as Jens proposed).
> >>> Ugh.  Not sure how you enforce that.  Consider that the compiler may
> >>> ultimately have no control over layout of data in static storage.
> >>
> >> Make everything 1 byte larger.
> > Not a bad idea.  I suspect the embedded folks would go bananas though.
> >
> > jeff
> >
> >


* Re: C provenance semantics proposal
  2019-04-25 12:42                                   ` Richard Biener
@ 2019-04-25 13:03                                     ` Peter Sewell
  2019-04-25 13:13                                       ` Richard Biener
  2019-04-29 14:31                                       ` Joseph Myers
  0 siblings, 2 replies; 56+ messages in thread
From: Peter Sewell @ 2019-04-25 13:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Uecker, Martin, gcc, cl-c-memory-object-model

On 25/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Apr 24, 2019 at 11:18 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk>
> wrote:
>>
>> On 24/04/2019, Jeff Law <law@redhat.com> wrote:
>> > On 4/24/19 4:19 AM, Richard Biener wrote:
>> >> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
>> >>>
>> >>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
>> >>>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
>> >>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
>> >>>>> <richard.guenther@gmail.com> wrote:
>> >>>>
>> >>>>
>> >>>>> An equality test of two pointers, on the other hand, doesn't
>> >>>>> necessarily
>> >>>>> mean that they are interchangeable.  I don't see any good way to
>> >>>>> avoid that in a provenance semantics, where a one-past
>> >>>>> pointer might sometimes compare equal to a pointer to an
>> >>>>> adjacent object but be illegal for accessing it.
>> >>>>
>> >>>> As I see it, there are essentially four options:
>> >>>>
>> >>>> 1.) Compilers do not use conditional equivalences for
>> >>>> optimizations of pointers (or only when additional
>> >>>> conditions apply which make it safe)
>> >>> I know this will hit DOM and CSE.  I wouldn't be surprised if it
>> >>> touches
>> >>> VRP as well, maybe PTA.  It seems simple enough though :-)
>> >>
>> >> Also touches fundamental PHI-OPT transforms like
>> >>
>> >>  if (a == b)
>> >> ...
>> >>
>> >>  # c = PHI <a, b>
>> >>
>> >> where we'd lose eliding such a conditional.  IMHO that's bad
>> >> and very undesirable.
>> > But if we only suppress this optimization for pointers is it that
>> > terrible?
>>
>> As far as I can see right now, there isn't a serious alternative.
>> Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
>> be true (either in a semantics for the source-language == that just
>> compares the concrete representations or in one that's allowed
>> but not required to be provenance-sensitive).   It's not possible
>> to simultaneously have *p UB (which AIUI the compiler has to
>> have in the intermediate language, to make alias analysis sound),
>> *q not UB, and p interchangeable with q.    Am I missing something?
>
> No, you are not missing anything.  We do have this issue right now,
> independent of standard wordings.  But the standard has that, too,
> not allowing *(&x + 1), allowing the compare and allowing *&y.
> Isn't that a defect as well?

In the source-language semantics, it's ok for p==q to not imply
that p and q are interchangeable, and if compilers are doing
provenance-based alias analysis (so address equality doesn't
imply equally-readable/writable), it's pretty much inescapable.

Hence why (without knowing much about the optimisations that
actually go on) it's tempting to suggest that for pointer equality
comparison one could just not infer that interchangeability. I'd be
very interested to know the actual cost of that.

(The standard does also have a defect in its definition of equality - on
the one hand, it says that &x+1==&y comparison must be true
if they are adjacent, but on the other (in DR260) that everything
might be provenance-aware.   My preference would be to resolve
that by requiring source-language == to not be provenance aware,
but I think this is a more-or-less independent thing.)

Peter


* Re: C provenance semantics proposal
  2019-04-25 13:03                                     ` Peter Sewell
@ 2019-04-25 13:13                                       ` Richard Biener
  2019-04-25 13:20                                         ` Peter Sewell
  2019-04-29 14:31                                       ` Joseph Myers
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Biener @ 2019-04-25 13:13 UTC (permalink / raw)
  To: Peter.Sewell; +Cc: Jeff Law, Uecker, Martin, gcc, cl-c-memory-object-model

On Thu, Apr 25, 2019 at 3:03 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
>
> On 25/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> > On Wed, Apr 24, 2019 at 11:18 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
> >>
> >> On 24/04/2019, Jeff Law <law@redhat.com> wrote:
> >> > On 4/24/19 4:19 AM, Richard Biener wrote:
> >> >> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
> >> >>>
> >> >>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
> >> >>>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
> >> >>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
> >> >>>>> <richard.guenther@gmail.com> wrote:
> >> >>>>
> >> >>>>
> >> >>>>> An equality test of two pointers, on the other hand, doesn't
> >> >>>>> necessarily mean that they are interchangeable.  I don't see any
> >> >>>>> good way to avoid that in a provenance semantics, where a one-past
> >> >>>>> pointer might sometimes compare equal to a pointer to an
> >> >>>>> adjacent object but be illegal for accessing it.
> >> >>>>
> >> >>>> As I see it, there are essentially four options:
> >> >>>>
> >> >>>> 1.) Compilers do not use conditional equivalences for
> >> >>>> optimizations of pointers (or only when additional
> >> >>>> conditions apply which make it safe)
> >> >>> I know this will hit DOM and CSE.  I wouldn't be surprised if it
> >> >>> touches VRP as well, maybe PTA.  It seems simple enough though :-)
> >> >>
> >> >> Also touches fundamental PHI-OPT transforms like
> >> >>
> >> >>  if (a == b)
> >> >> ...
> >> >>
> >> >>  # c = PHI <a, b>
> >> >>
> >> >> where we'd lose eliding such a conditional.  IMHO that's bad
> >> >> and very undesirable.
> >> > But if we only suppress this optimization for pointers is it that
> >> > terrible?
> >>
> >> As far as I can see right now, there isn't a serious alternative.
> >> Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
> >> be true (either in a semantics for the source-language == that just
> >> compares the concrete representations or in one that's allowed
> >> but not required to be provenance-sensitive).   It's not possible
> >> to simultaneously have *p UB (which AIUI the compiler has to
> >> have in the intermediate language, to make alias analysis sound),
> >> *q not UB, and p interchangeable with q.    Am I missing something?
> >
> > No, you are not missing anything.  We do have this issue right now,
> > independent of standard wordings.  But the standard has that, too,
> > not allowing *(&x + 1), allowing the compare and allowing *&y.
> > Isn't that a defect as well?
>
> In the source-language semantics, it's ok for p==q to not imply
> that p and q are interchangeable, and if compilers are doing
> provenance-based alias analysis (so address equality doesn't
> imply equally-readable/writable), it's pretty much inescapable.
>
> Hence why (without knowing much about the optimisations that
> actually go on) it's tempting to suggest that for pointer equality
> comparison one could just not infer that interchangeability. I'd be
> very interested to know the actual cost of that.

Since we currently track provenance through non-pointers,
we cannot do this for non-pointer equivalences either.
So doing this means no longer tracking provenance through
non-pointers.
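To illustrate why integer equivalences are affected too, here is a minimal sketch assuming an implementation that provides `uintptr_t` (optional in ISO C, but available with GCC on mainstream targets). The function name is hypothetical. The source only turns `iq` back into a pointer; if the optimiser both tracks provenance through integers and uses `ip == iq` to replace `iq` by `ip`, the rebuilt pointer would carry p's provenance instead of q's, reproducing the pointer problem one level down.

```c
#include <assert.h>
#include <stdint.h>

int x = 1, y = 2;

/* Source program: the pointer rebuilt inside the branch comes from
   q's address, so the load is fine whenever ip == iq holds. */
int read_via_integers(int *p, int *q)
{
    uintptr_t ip = (uintptr_t)p;
    uintptr_t iq = (uintptr_t)q;
    if (ip == iq) {
        /* An optimiser that propagates ip == iq might rewrite this as
           *(int *)ip.  If it also treats ip as carrying p's provenance
           (e.g. &x + 1), the load would be classed as out of bounds. */
        return *(int *)iq;
    }
    return 0;
}
```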

> (The standard does also have a defect in its definition of equality - on
> the one hand, it says that &x+1==&y comparison must be true
> if they are adjacent, but on the other (in DR260) that everything
> might be provenance-aware.   My preference would be to resolve
> that by requiring source-language == to not be provenance aware,
> but I think this is a more-or-less independent thing.)

I think it's related at least to us using provenance to optimize
pointer comparisons.

Richard.

> Peter

* Re: C provenance semantics proposal
  2019-04-25 13:13                                       ` Richard Biener
@ 2019-04-25 13:20                                         ` Peter Sewell
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Sewell @ 2019-04-25 13:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Uecker, Martin, gcc, cl-c-memory-object-model

On 25/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Apr 25, 2019 at 3:03 PM Peter Sewell <Peter.Sewell@cl.cam.ac.uk> wrote:
>>
>> On 25/04/2019, Richard Biener <richard.guenther@gmail.com> wrote:
>> > On Wed, Apr 24, 2019 at 11:18 PM Peter Sewell
>> > <Peter.Sewell@cl.cam.ac.uk> wrote:
>> >>
>> >> On 24/04/2019, Jeff Law <law@redhat.com> wrote:
>> >> > On 4/24/19 4:19 AM, Richard Biener wrote:
>> >> >> On Thu, Apr 18, 2019 at 3:42 PM Jeff Law <law@redhat.com> wrote:
>> >> >>>
>> >> >>> On 4/18/19 6:20 AM, Uecker, Martin wrote:
>> >> >>>> On Thursday, 18.04.2019, 11:45 +0100, Peter Sewell wrote:
>> >> >>>>> On Thu, 18 Apr 2019 at 10:32, Richard Biener
>> >> >>>>> <richard.guenther@gmail.com> wrote:
>> >> >>>>
>> >> >>>>
>> >> >>>>> An equality test of two pointers, on the other hand, doesn't
>> >> >>>>> necessarily mean that they are interchangeable.  I don't see any
>> >> >>>>> good way to avoid that in a provenance semantics, where a one-past
>> >> >>>>> pointer might sometimes compare equal to a pointer to an
>> >> >>>>> adjacent object but be illegal for accessing it.
>> >> >>>>
>> >> >>>> As I see it, there are essentially four options:
>> >> >>>>
>> >> >>>> 1.) Compilers do not use conditional equivalences for
>> >> >>>> optimizations of pointers (or only when additional
>> >> >>>> conditions apply which make it safe)
>> >> >>> I know this will hit DOM and CSE.  I wouldn't be surprised if it
>> >> >>> touches VRP as well, maybe PTA.  It seems simple enough though :-)
>> >> >>
>> >> >> Also touches fundamental PHI-OPT transforms like
>> >> >>
>> >> >>  if (a == b)
>> >> >> ...
>> >> >>
>> >> >>  # c = PHI <a, b>
>> >> >>
>> >> >> where we'd lose eliding such a conditional.  IMHO that's bad
>> >> >> and very undesirable.
>> >> > But if we only suppress this optimization for pointers is it that
>> >> > terrible?
>> >>
>> >> As far as I can see right now, there isn't a serious alternative.
>> >> Suppose x and y are adjacent, p=&x+1, and q=&y, so p==q might
>> >> be true (either in a semantics for the source-language == that just
>> >> compares the concrete representations or in one that's allowed
>> >> but not required to be provenance-sensitive).   It's not possible
>> >> to simultaneously have *p UB (which AIUI the compiler has to
>> >> have in the intermediate language, to make alias analysis sound),
>> >> *q not UB, and p interchangeable with q.    Am I missing something?
>> >
>> > No, you are not missing anything.  We do have this issue right now,
>> > independent of standard wordings.  But the standard has that, too,
>> > not allowing *(&x + 1), allowing the compare and allowing *&y.
>> > Isn't that a defect as well?
>>
>> In the source-language semantics, it's ok for p==q to not imply
>> that p and q are interchangeable, and if compilers are doing
>> provenance-based alias analysis (so address equality doesn't
>> imply equally-readable/writable), it's pretty much inescapable.
>>
>> Hence why (without knowing much about the optimisations that
>> actually go on) it's tempting to suggest that for pointer equality
>> comparison one could just not infer that interchangeability. I'd be
>> very interested to know the actual cost of that.
>
> Since we currently track provenance through non-pointers,
> we cannot do this for non-pointer equivalences either.
> So doing this means no longer tracking provenance through
> non-pointers.

Yes, it would mean that.

(As you may recall, we did earlier propose a source-language
semantics that did track provenance through integers, so that's
not inconceivable - but it does get complicated, and the
current consensus seems to be towards the not-via-integer
options.)

>> (The standard does also have a defect in its definition of equality - on
>> the one hand, it says that &x+1==&y comparison must be true
>> if they are adjacent, but on the other (in DR260) that everything
>> might be provenance-aware.   My preference would be to resolve
>> that by requiring source-language == to not be provenance aware,
>> but I think this is a more-or-less independent thing.)
>
> I think it's related at least to us using provenance to optimize
> pointer comparisons.

Yes.  If that's a significant win, one would want to keep allowing
(but not requiring) == to be provenance-aware.  The argument
in the other direction is that it would simplify the source semantics.

Peter

> Richard.
>
>> Peter
>

* Re: C provenance semantics proposal
  2019-04-25 13:03                                     ` Peter Sewell
  2019-04-25 13:13                                       ` Richard Biener
@ 2019-04-29 14:31                                       ` Joseph Myers
  1 sibling, 0 replies; 56+ messages in thread
From: Joseph Myers @ 2019-04-29 14:31 UTC (permalink / raw)
  To: Peter Sewell
  Cc: Richard Biener, Jeff Law, Uecker, Martin, gcc, cl-c-memory-object-model

On Thu, 25 Apr 2019, Peter Sewell wrote:

> (The standard does also have a defect in its definition of equality - on
> the one hand, it says that &x+1==&y comparison must be true
> if they are adjacent, but on the other (in DR260) that everything
> might be provenance-aware.   My preference would be to resolve
> that by requiring source-language == to not be provenance aware,
> but I think this is a more-or-less independent thing.)

I've argued (in bug 61502 which you reported) that whether two objects 
follow each other in the address space need not be constant for the 
lifetime of those objects (that following in the address space, for 
separate objects, means nothing other than certain properties of 
comparisons, and, in particular, need not be constant as a property 
applied to constant addresses).
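A sketch of how the same adjacency question can get different answers, assuming GCC-like behaviour of the kind reported in the bug 61502 Joseph mentions: a comparison the compiler can see through may be folded using provenance (x and y are distinct objects), while one it cannot see through compares raw addresses at run time. The function names are hypothetical, and which values you observe depends on compiler, flags and layout.

```c
#include <assert.h>
#include <stdbool.h>

int x = 1, y = 2;

/* The compiler knows both operands' provenance here and may fold the
   comparison to false because they point into different objects. */
bool adjacent_folded(void)
{
    return &x + 1 == &y;
}

/* Reading p back through a volatile object stops the compiler from
   knowing where p came from, so the addresses are compared at run time. */
bool adjacent_runtime(void)
{
    int *volatile p = &x + 1;
    return p == &y;
}
```

Under the reading Joseph argues for, a program that sees `adjacent_runtime()` return true while `adjacent_folded()` returns false would not be observing a compiler bug: "x is followed by y" is only a property of individual comparisons, not a fixed fact about the two objects.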

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: C provenance semantics proposal
  2019-04-25 12:39                                 ` Richard Biener
@ 2019-05-09 11:26                                   ` Ralf Jung
  0 siblings, 0 replies; 56+ messages in thread
From: Ralf Jung @ 2019-05-09 11:26 UTC (permalink / raw)
  To: Richard Biener, Jeff Law
  Cc: Uecker, Martin, Peter.Sewell, cl-c-memory-object-model, gcc

Hi all,

>>>  if (a == b)
>>> ...
>>>
>>>  # c = PHI <a, b>
>>>
>>> where we'd lose eliding such a conditional.  IMHO that's bad
>>> and very undesirable.
>> But if we only suppress this optimization for pointers is it that terrible?
> 
> I've at least seen a lot of cases with c = PHI <a, 0> for null pointer
> checks.  It's just we're going to chase a lot of cases down even
> knowing RTL will fuck up later big time.

Even if pointer substitution based on "==" is not possible in general, it can
still be possible (depending on other choices in the IR semantics) to do
substitution for "== NULL" pointer tests.  This is the conclusion we came to
when studying similar questions for LLVM IR in
<https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>.
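The difference can be seen at the source level in a minimal sketch with hypothetical function names. The first pair is the `c = PHI <a, b>` elision quoted above, which changes which pointer, and hence which provenance, flows out of the equal branch. The second pair is the `c = PHI <a, 0>` null-check case, where the substitution is harmless under the assumption (which, as noted above, depends on other IR design choices) that a pointer comparing equal to null can be treated as the null pointer, which carries no provenance to lose.

```c
#include <assert.h>
#include <stddef.h>

/* c = PHI <a, b> after "if (a == b)": returns a when they compare equal. */
int *pick_general(int *a, int *b)
{
    return a == b ? a : b;
}

/* Elided form: always b.  Same address, but when a == b holds the result
   now carries b's provenance rather than a's, which a provenance-based
   alias analysis cannot ignore. */
int *pick_general_elided(int *a, int *b)
{
    (void)a;
    return b;
}

/* c = PHI <a, 0> after a null check. */
int *pick_nonnull(int *a)
{
    return a != NULL ? a : NULL;
}

/* Elided form: when a == NULL, a already is the null pointer, so
   returning a instead of NULL loses no provenance. */
int *pick_nonnull_elided(int *a)
{
    return a;
}
```

The returned addresses always agree; what differs between the general pair is only the provenance the result carries, which is why that elision is the problematic one.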

Kind regards,
Ralf

end of thread

Thread overview: 56+ messages
2019-04-02  8:11 C provenance semantics proposal Peter Sewell
2019-04-12 14:51 ` Jeff Law
2019-04-12 15:31   ` Peter Sewell
2019-04-17  9:06     ` Richard Biener
2019-04-17  9:15       ` Peter Sewell
2019-04-17  9:41         ` Richard Biener
2019-04-17 11:53           ` Uecker, Martin
2019-04-17 12:41             ` Richard Biener
2019-04-17 12:56               ` Uecker, Martin
2019-04-17 13:35                 ` Richard Biener
2019-04-17 14:12                   ` Uecker, Martin
2019-04-17 17:31                     ` Peter Sewell
2019-04-18  9:32                     ` Richard Biener
2019-04-18  9:56                       ` Richard Biener
2019-04-18 10:48                         ` Peter Sewell
2019-04-18 11:57                         ` Uecker, Martin
2019-04-18 12:31                           ` Richard Biener
2019-04-18 13:25                             ` Uecker, Martin
2019-04-18 10:45                       ` Peter Sewell
2019-04-18 12:20                         ` Uecker, Martin
2019-04-18 12:42                           ` Richard Biener
2019-04-18 12:47                             ` Jakub Jelinek
2019-04-18 12:51                               ` Jakub Jelinek
2019-04-18 13:29                                 ` Jeff Law
2019-04-24 10:12                                   ` Richard Biener
2019-04-18 13:49                             ` Uecker, Martin
2019-04-19  8:19                             ` Jens Gustedt
2019-04-19  8:49                               ` Jakub Jelinek
2019-04-19  9:09                                 ` Jens Gustedt
2019-04-19  9:34                                   ` Jakub Jelinek
2019-04-21  8:15                                     ` Jens Gustedt
2019-04-24 10:24                                   ` Richard Biener
2019-04-24 18:43                                     ` Jeff Law
2019-04-24 19:21                                       ` Jens Gustedt
2019-04-19  9:11                                 ` Peter Sewell
2019-04-19  9:15                                   ` Jens Gustedt
2019-04-19  9:35                                     ` Peter Sewell
2019-04-19 10:35                                       ` Uecker, Martin
2019-04-19 10:01                               ` Uecker, Martin
2019-04-18 13:42                           ` Jeff Law
2019-04-18 13:54                             ` Uecker, Martin
2019-04-18 14:49                               ` Peter Sewell
2019-04-18 15:09                                 ` Uecker, Martin
2019-04-24 10:19                             ` Richard Biener
2019-04-24 18:41                               ` Jeff Law
2019-04-24 19:30                                 ` Philipp Klaus Krause
2019-04-24 19:55                                   ` Uecker, Martin
2019-04-24 19:33                                 ` Jakub Jelinek
2019-04-24 21:19                                 ` Peter Sewell
2019-04-25 12:42                                   ` Richard Biener
2019-04-25 13:03                                     ` Peter Sewell
2019-04-25 13:13                                       ` Richard Biener
2019-04-25 13:20                                         ` Peter Sewell
2019-04-29 14:31                                       ` Joseph Myers
2019-04-25 12:39                                 ` Richard Biener
2019-05-09 11:26                                   ` Ralf Jung
