aliasing

public inbox for gcc@gcc.gnu.org

* aliasing
From: Martin Uecker @ 2024-03-18  7:03 UTC
  To: gcc; +Cc: Richard Biener

Hi,

can you please take a quick look at this? This is intended to align
the C standard with existing practice with respect to aliasing by
removing the special rules for "objects with no declared type" and
making it fully symmetric and only based on types with non-atomic
character types being able to alias everything.

Unrelated to this change, I have another question: I wonder if GCC
(or any other compiler) actually exploits the "or is copied as an
array of byte type" rule to make assumptions about the effective
types of the target array? I know compilers do this with memcpy...
Maybe also if a loop is transformed to memcpy?

Martin

Add the following definition after 3.5, paragraph 2:

byte array
object having either no declared type or an array of objects declared with a byte type

byte type
non-atomic character type

Modify 6.5, paragraph 6:
The effective type of an object that is not a byte array, for an access to its
stored value, is the declared type of the object.97) If a value is
stored into a byte array through an lvalue having a byte type, then
the type of the lvalue becomes the effective type of the object for that
access and for subsequent accesses that do not modify the stored value.
If a value is copied into a byte array using memcpy or memmove, or is 
copied as an array of byte type, then the effective type of the
modified object for that access and for subsequent accesses that do not
modify the value is the effective type of the object from which the
value is copied, if it has one. For all other accesses to a byte array,
the effective type of the object is simply the type of the lvalue used
for the access.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
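A minimal sketch of the memcpy clause in the proposed wording, assuming a file-scope unsigned char array plays the role of the byte array (the names store, put_float and get_float are hypothetical). The copy itself is valid C today; what the proposal changes is that afterwards the array's effective type for later reads would be float, the effective type of the source object.

```c
#include <string.h>

/* Backing store with declared type unsigned char[], aligned for float.
   In C23 this is an object with a declared type; under the proposed
   wording it would be a "byte array". */
static _Alignas(float) unsigned char store[sizeof(float)];

/* Copy a float into the byte array with memcpy. Under the proposal, the
   effective type of 'store' for this access and later non-modifying
   accesses becomes float (the effective type of 'v'). */
void put_float(float v)
{
    memcpy(store, &v, sizeof v);
}

/* Read the value back with memcpy, which is well-defined today. The
   proposal would additionally make a direct read through a float lvalue,
   *(float *)store, defined, since the effective type is now float. */
float get_float(void)
{
    float v;
    memcpy(&v, store, sizeof v);
    return v;
}
```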

* Re: aliasing
From: Richard Biener @ 2024-03-18  8:26 UTC
  To: Martin Uecker; +Cc: gcc

On Mon, Mar 18, 2024 at 8:03 AM Martin Uecker <uecker@tugraz.at> wrote:
>
>
> Hi,
>
> can you please take a quick look at this? This is intended to align
> the C standard with existing practice with respect to aliasing by
> removing the special rules for "objects with no declared type" and
> making it fully symmetric and only based on types with non-atomic
> character types being able to alias everything.
>
>
> Unrelated to this change, I have another question:  I wonder if GCC
> (or any other compiler) actually exploits the "or is copied as an
> array of byte type" rule to make assumptions about the effective
> types of the target array?

We do not make assumptions about this anymore.  We did in the
past (might be a distant past) transform say

struct X { int i; float f; } a, b;

void foo ()
{
  __builtin_memcpy (&a, &b, sizeof (struct X));
}

into

  a = b;

which has an lvalue of type struct X.  But this assumed b's effective
type was X.  Nowadays we treat the copy as using alias set zero.
That effectively means the destination gets its effective type "cleared"
(all subsequent accesses are valid to access storage with the effective
type of a byte array).
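For illustration, the two source-level forms contrasted above, as a compilable sketch (the helper names are hypothetical, and this is not GCC's internal representation). The memcpy form implies nothing about the effective type of the source, while the assignment form reads *src through an lvalue of type struct X.

```c
#include <string.h>

struct X { int i; float f; };

/* Byte-wise copy: per the explanation above, GCC now treats this as an
   access using alias set zero, like a character-type access. */
void copy_bytes(struct X *dst, const struct X *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Aggregate assignment: the access is through an lvalue of type
   struct X, which implicitly assumes *src really holds a struct X. */
void copy_assign(struct X *dst, const struct X *src)
{
    *dst = *src;
}
```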

> I know compilers do this with memcpy...
> Maybe also if a loop is transformed to memcpy?

We currently do not preserve the original effective type of the destination
(or the effective type used to access the source) when doing this.  With
some tricks we could (we also lose alignment guarantees of the original
accesses).
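To make the loop case concrete, a sketch of a plain byte-copy loop of the kind GCC's loop distribution (the -ftree-loop-distribute-patterns option, enabled at higher optimization levels) may replace with a call to memcpy. The function name is hypothetical, and whether the replacement actually happens depends on the compiler version and flags.

```c
#include <stddef.h>

/* Copies n bytes through character-type lvalues. If the compiler turns
   this loop into memcpy, the resulting copy is handled like any other
   memcpy: the element type of the original accesses is not preserved. */
void byte_copy(unsigned char *restrict dst,
               const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```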

> Martin
>
>
> Add the following definition after 3.5, paragraph 2:
>
> byte array
> object having either no declared type or an array of objects declared with a byte type
>
> byte type
> non-atomic character type
>
> Modify 6.5,paragraph 6:
> The effective type of an object that is not a byte array, for an access to its
> stored value, is the declared type of the object.97) If a value is
> stored into a byte array through an lvalue having a byte type, then
> the type of the lvalue becomes the effective type of the object for that
> access and for subsequent accesses that do not modify the stored value.
> If a value is copied into a byte array using memcpy or memmove, or is
> copied as an array of byte type, then the effective type of the
> modified object for that access and for subsequent accesses that do not
> modify the value is the effective type of the object from which the
> value is copied, if it has one. For all other accesses to a byte array,
> the effective type of the object is simply the type of the lvalue used
> for the access.

What's the purpose of this change?  To me this reads more confusing and
complicated than what I find in the C23 draft from April last year.

I'll note that GCC does not take advantage of "The effective type of an
object for an access to its stored value is the declared type of the object",
instead it always relies on the type of the lvalue (treating non-atomic
character types specially, as well as treating all string ops like memcpy
or strcpy as using a character type for the access) and the effective type
of the object for that access and for subsequent accesses that do not
modify the stored value always becomes that of the lvalue type used for
the access.

Let me give you a complicating example, one that is valid in C++:

struct B { float x; float y; };
struct X { int n; char buf[8]; } x, y;

void foo(struct B *b)
{
  memcpy (x.buf, b, sizeof (struct B)); // in C++:  new (x.buf) B (*b);
  y = x; // (*)
}

What's the effective type of 'x' in the 'y = x' copy?  With your new
wording, does 'B' transfer to x.buf with memcpy?  What's the
frankenstein effective type of 'x' then?  What's the effective type
of 'y' after the copy?  Can an lvalue of type 'B' access y.buf?
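A compilable version of the example above, for experimentation. It assumes struct B fits in buf (checked at compile time) and adds a hypothetical read_back helper that extracts the B from y.buf with memcpy. That sidesteps, rather than answers, the question of whether an lvalue of type struct B may access y.buf directly.

```c
#include <string.h>

struct B { float x; float y; };
struct X { int n; char buf[8]; } x, y;

/* The example relies on a struct B fitting into buf. */
_Static_assert(sizeof(struct B) <= sizeof x.buf, "struct B must fit in buf");

void foo(struct B *b)
{
    memcpy(x.buf, b, sizeof(struct B)); /* in C++: new (x.buf) B(*b); */
    y = x;                              /* (*) aggregate copy of x */
}

/* Copy the bytes of y.buf back into a real struct B object. */
struct B read_back(void)
{
    struct B out;
    memcpy(&out, y.buf, sizeof out);
    return out;
}
```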

Richard.

> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
>
>

* Re: aliasing
From: David Brown @ 2024-03-18  9:00 UTC
  To: Martin Uecker, gcc; +Cc: Richard Biener

Hi,

I would be very glad to see this change in the standards.

Should "byte type" include all character types (signed, unsigned and 
plain), or should it be restricted to "unsigned char" since that is the 
"byte" type ?  (I think allowing all character types makes sense, but 
only unsigned char is guaranteed to be suitable for general object 
backing store.)

Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an 
alias for "unsigned char", but it could be something different, like an 
alias for __UINT8_TYPE__, or "unsigned int 
__attribute__((mode(QImode)))", which is used in the AVR gcc port.

In my line of work - small-systems embedded development - it is common 
to have "home-made" or specialised memory allocation systems rather than 
relying on a generic heap.  This is, I think, some of the "existing 
practice" that you are considering here - there is a "backing store" of 
some sort that can be allocated and used as objects of a type other than 
the declared type of the backing store.  While a simple unsigned char 
array is a very common kind of backing store, there are others that are 
used, and it would be good to be sure of the correctness guarantees for 
these.  Possibilities that I have seen include:

unsigned char heap1[N];

uint8_t heap2[N];

union {
	double dummy_for_alignment;
	char heap[N];
} heap3;

struct {
	uint32_t capacity;
	uint8_t * p_next_free;
	uint8_t heap[N];
} heap4;

uint32_t heap5[N];

Apart from this last one, if "uint8_t" is guaranteed to be a "byte 
type", then I believe your wording means that these unions and structs 
would also work as "byte arrays".  But it might be useful to add a 
footnote clarifying that.
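A sketch of the kind of home-made allocator described above, using the first backing-store variant (a plain unsigned char array). HEAP_SIZE, heap_alloc and the rounding policy are illustrative assumptions, not taken from any particular embedded library.

```c
#include <stdalign.h>
#include <stddef.h>

#define HEAP_SIZE 256 /* hypothetical size standing in for N */

/* A plain unsigned char backing store, like heap1, aligned so that any
   standard type can be placed at its start. */
static alignas(max_align_t) unsigned char heap[HEAP_SIZE];
static size_t heap_used;

/* Hypothetical bump allocator: hands out the next free chunk of 'heap',
   rounding every size up to alignof(max_align_t) so that each returned
   pointer stays suitably aligned. Returns NULL if the request does not
   fit. Memory is never freed. */
void *heap_alloc(size_t size)
{
    const size_t align = alignof(max_align_t);
    if (size > HEAP_SIZE)
        return NULL;
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > HEAP_SIZE - heap_used)
        return NULL;
    void *p = &heap[heap_used];
    heap_used += rounded;
    return p;
}
```

A caller would then write, for example, `double *d = heap_alloc(sizeof *d); *d = 3.0;`. That store goes through a double lvalue into an array whose declared type is unsigned char, which the C23 effective-type rules do not sanction; under the proposed wording heap is a byte array, so the lvalue type simply becomes the effective type.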

(It is also not uncommon to have the backing space allocated by the 
linker, but then it falls under the existing "no declared type" case.)

I would not want uint32_t to be considered an "alias anything" type, but 
I have occasionally seen such types used for memory store backings.  It 
is perhaps worth considering defining "byte type" as "non-atomic 
character type, [u]int8_t (if they exist), or other 
implementation-defined types".

Some other compilers might guarantee not to do type-based alias analysis 
and thus view all types as "byte types" in this way.  For gcc, there 
could be a kind of reverse "may_alias" type attribute to create such types.
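For reference, the existing GCC mechanism in this area is the may_alias type attribute, which lets accesses through a type alias objects of any type; the "reverse" attribute suggested here (applied to the backing store rather than to the accessing type) does not exist. A minimal sketch, assuming GCC or Clang and a 32-bit IEEE float; u32_alias and float_bits are hypothetical names.

```c
#include <stdint.h>

/* Accesses through u32_alias are exempt from type-based alias analysis,
   much like character-type accesses. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Read the bit pattern of a float through a may_alias lvalue. */
uint32_t float_bits(const float *f)
{
    _Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");
    return *(const u32_alias *)f;
}
```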

There are a number of other features that could make allocation 
functions more efficient and safer in use, and which could ideally be 
standardised in the C standards or at least added as gcc extensions, but 
I think that's more than you are looking for here!

David


* Re: aliasing
From: Jonathan Wakely @ 2024-03-18 10:09 UTC
  To: David Brown; +Cc: Martin Uecker, gcc, Richard Biener

On Mon, 18 Mar 2024 at 09:01, David Brown wrote:
> Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an
> alias for "unsigned char", but it could be something different, like an
> alias for __UINT8_TYPE__, or "unsigned int
> __attribute__((mode(QImode)))", which is used in the AVR gcc port.

N.B. __UINT8_TYPE__ is not a type, it's just a macro that expands to
something else (like unsigned char).
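A small check of this point, assuming GCC or Clang (both predefine __UINT8_TYPE__): the macro only spells the underlying type, and _Generic can report which type that is on a given target. The function name is hypothetical.

```c
/* Reports the type that __UINT8_TYPE__ expands to. _Generic matches the
   type of the cast expression without integer promotion. */
const char *uint8_underlying(void)
{
#ifdef __UINT8_TYPE__
    return _Generic((__UINT8_TYPE__)0,
                    unsigned char: "unsigned char",
                    default: "something else");
#else
    return "macro not predefined";
#endif
}
```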

* Re: aliasing
From: Martin Uecker @ 2024-03-18 10:55 UTC
  To: Richard Biener; +Cc: gcc

On Monday, 18.03.2024 at 09:26 +0100, Richard Biener wrote:
> On Mon, Mar 18, 2024 at 8:03 AM Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > 
> > Hi,
> > 
> > can you please take a quick look at this? This is intended to align
> > the C standard with existing practice with respect to aliasing by
> > removing the special rules for "objects with no declared type" and
> > making it fully symmetric and only based on types with non-atomic
> > character types being able to alias everything.
> > 
> > 
> > Unrelated to this change, I have another question:  I wonder if GCC
> > (or any other compiler) actually exploits the "or is copied as an
> > array of byte type" rule to make assumptions about the effective
> > types of the target array?
> 
> We do not make assumptions about this anymore.  We did in the
> past (might be a distant past) transform say
> 
> struct X { int i; float f; } a, b;
> 
> void foo ()
> {
>   __builtin_memcpy (&a, &b, sizeof (struct X));
> }
> 
> into
> 
>   a = b;
> 
> which has an lvalue of type struct X.  But this assumed b's effective
> type was X.  Nowadays we treat the copy as using alias set zero.
> That effectively means the destination gets its effective type "cleared"
> (all subsequent accesses are valid to access storage with the effective
> type of a byte array).

Ok, thanks!  I wonder whether we should remove this special rule
from the standard.  I am mostly worried about the "copied as an
array of byte type" wording, which seems difficult to define
precisely.

> 
> > I know compilers do this with memcpy...
> > Maybe also if a loop is transformed to memcpy?
> 
> We currently do not preserve the original effective type of the destination
> (or the effective type used to access the source) when doing this.  With
> some tricks we could (we also lose alignment guarantees of the original
> accesses).
> 
> > Martin
> > 
> > 
> > Add the following definition after 3.5, paragraph 2:
> > 
> > byte array
> > object having either no declared type or an array of objects declared with a byte type
> > 
> > byte type
> > non-atomic character type

This essentially becomes the "alias anything" type.

> > 
> > Modify 6.5,paragraph 6:
> > The effective type of an object that is not a byte array, for an access to its
> > stored value, is the declared type of the object.97) If a value is
> > stored into a byte array through an lvalue having a byte type, then
> > the type of the lvalue becomes the effective type of the object for that
> > access and for subsequent accesses that do not modify the stored value.
> > If a value is copied into a byte array using memcpy or memmove, or is
> > copied as an array of byte type, then the effective type of the
> > modified object for that access and for subsequent accesses that do not
> > modify the value is the effective type of the object from which the
> > value is copied, if it has one. For all other accesses to a byte array,
> > the effective type of the object is simply the type of the lvalue used
> > for the access.
> 
> What's the purpose of this change?  To me this reads more confusing and
> complicated than what I find in the C23 draft from April last year.

Note that C23 has been finalized. This change is proposed for the
revision after C23.

> 
> I'll note that GCC does not take advantage of "The effective type of an
> object for an access to its stored value is the declared type of the object",
> instead it always relies on the type of the lvalue (treating non-atomic
> character types specially, as well as treating all string ops like memcpy
> or strcpy as using a character type for the access) and the effective type
> of the object for that access and for subsequent accesses that do not
> modify the stored value always becomes that of the lvalue type used for
> the access.

Understood.

> 
> Let me give you a complicating example, one that is valid in C++:
> 
> struct B { float x; float y; };
> struct X { int n; char buf[8]; } x, y;
> 
> void foo(struct B *b)
> {
>   memcpy (x.buf, b, sizeof (struct B)); // in C++:  new (x.buf) B (*b);

Let's make it an explicit store for the moment
(should not make a difference though):

    *(struct B*)x.buf = *b;

>   y = x; // (*)
> }
> 
> What's the effective type of 'x' in the 'y = x' copy? 

Good point. The existing wording would take the declared
type of x as the effective type, but this may not be
what you are interested in. Let's assume that x has no declared
type but that it had effective type struct X before the
store to x.buf (because of an even earlier store to 
x with type struct X).

There is a general question how stores to subobjects
affect effective types and I do not think this is clear
even before this proposed change.


>  With your new
> wording, does 'B' transfer to x.buf with memcpy?  

Yes, it would. At least this is the intention.

Note that this would currently be undefined behavior
because x.buf has a declared type. So this is the main
thing we want to change, i.e. making this defined.

> What's the
> frankenstein effective type of 'x' then?  What's the effective type
> of 'y' after the copy?  Can an lvalue of type 'B' access y.buf?

All good questions, but unfortunately not clear even
in the current wording I think.

Martin


> 
> Richard.
> 
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
> > 
> > 




* Re: aliasing
From: Martin Uecker @ 2024-03-18 11:41 UTC
  To: David Brown; +Cc: gcc



Hi David,

On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
> Hi,
> 
> I would be very glad to see this change in the standards.
> 
> 
> Should "byte type" include all character types (signed, unsigned and 
> plain), or should it be restricted to "unsigned char" since that is the 
> "byte" type ?  (I think allowing all character types makes sense, but 
> only unsigned char is guaranteed to be suitable for general object 
> backing store.)

At the moment, the special types that can access all others are
the non-atomic character types.  So for symmetry reasons, it
seems that this is also what we want for backing store.

I am not sure what you mean by "only unsigned char". Are you talking
about C++?  "unsigned char" has no special role in C.

> 
> Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an 
> alias for "unsigned char", but it could be something different, like an 
> alias for __UINT8_TYPE__, or "unsigned int 
> __attribute__((mode(QImode)))", which is used in the AVR gcc port.

I think this might be a reason to not include it, as it could
affect aliasing analysis. At least, this would be a different
independent change to consider.

> 
> In my line of work - small-systems embedded development - it is common 
> to have "home-made" or specialised memory allocation systems rather than 
> relying on a generic heap.  This is, I think, some of the "existing 
> practice" that you are considering here - there is a "backing store" of 
> some sort that can be allocated and used as objects of a type other than 
> the declared type of the backing store.  While a simple unsigned char 
> array is a very common kind of backing store, there are others that are 
> used, and it would be good to be sure of the correctness guarantees for 
> these.  Possibilities that I have seen include:
> 
> unsigned char heap1[N];
> 
> uint8_t heap2[N];
> 
> union {
> 	double dummy_for_alignment;
> 	char heap[N];
> } heap3;
> 
> struct {
> 	uint32_t capacity;
> 	uint8_t * p_next_free;
> 	uint8_t heap[N];
> } heap4;
> 
> uint32_t heap5[N];
> 
> Apart from this last one, if "uint8_t" is guaranteed to be a "byte 
> type", then I believe your wording means that these unions and structs 
> would also work as "byte arrays".  But it might be useful to add a 
> footnote clarifying that.
> 

I need to think about this. 

> (It is also not uncommon to have the backing space allocated by the 
> linker, but then it falls under the existing "no declared type" case.)

Yes, although with the change we would make the "no declared type" also 
be byte arrays, so there is then simply no difference anymore.

> 
> 
> I would not want uint32_t to be considered an "alias anything" type, but 
> I have occasionally seen such types used for memory store backings.  It 
> is perhaps worth considering defining "byte type" as "non-atomic 
> character type, [u]int8_t (if they exist), or other 
> implementation-defined types".

This could make sense, the question is whether we want to encourage
the use of other types for this use case, as this would then not
be portable.

Are there important reasons for not using "unsigned char"?

> 
> Some other compilers might guarantee not to do type-based alias analysis 
> and thus view all types as "byte types" in this way.  For gcc, there 
> could be a kind of reverse "may_alias" type attribute to create such types.
> 
> 
> 
> There are a number of other features that could make allocation 
> functions more efficient and safer in use, and which could ideally be
> standardised in the C standards or at least added as gcc extensions, but 
> I think that's more than you are looking for here!

It is possible to submit proposals to WG14.

Martin



-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging



* Re: aliasing
From: Martin Uecker @ 2024-03-18 11:56 UTC
  To: Richard Biener; +Cc: gcc

On Monday, 18.03.2024 at 11:55 +0100, Martin Uecker wrote:
> On Monday, 18.03.2024 at 09:26 +0100, Richard Biener wrote:
> > On Mon, Mar 18, 2024 at 8:03 AM Martin Uecker <uecker@tugraz.at> wrote:
> > 
> 
> > 
> > Let me give you a complicating example, one that is valid in C++:
> > 
> > struct B { float x; float y; };
> > struct X { int n; char buf[8]; } x, y;
> > 
> > void foo(struct B *b)
> > {
> >   memcpy (x.buf, b, sizeof (struct B)); // in C++:  new (x.buf) B (*b);
> 
> Let's make it an explicit store for the moment
> (should not make a difference though):
> 
>     *(struct B*)x.buf = *b;
> 
> >   y = x; // (*)
> > }
> > 
> > What's the effective type of 'x' in the 'y = x' copy? 
> 
> Good point. The existing wording would take the declared
> type of x as the effective type, but this may not be
> what you are interested in. Let's assume that x has no declared
> type but that it had effective type struct X before the
> store to x.buf (because of an even earlier store to 
> x with type struct X).
> 
> There is a general question how stores to subobjects
> affect effective types and I do not think this is clear
> even before this proposed change.

Actually, I think this is not allowed because:

"An object shall have its stored value accessed only by an
lvalue expression that has one of the following types:

— a type compatible with the effective type of the object,
...
— an aggregate or union type that includes one of the
aforementioned types among its members (including,
recursively, a member of a subaggregate or contained union), or

— a character type."

... and we would need to move "a character type" above
in the list to make it defined.
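To make the last bullet concrete: copying an object byte by byte through character-type lvalues is always permitted by the quoted rule, whatever the effective type of the source. This hypothetical helper is presumably also the sort of copy that the proposal's "copied as an array of byte type" phrase is meant to cover, although that phrase is not precisely defined.

```c
#include <stddef.h>

/* Copy n bytes from src to dst using unsigned char lvalues only. */
void copy_as_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}
```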

Martin



* Re: aliasing
From: Richard Biener @ 2024-03-18 13:21 UTC
  To: Martin Uecker; +Cc: gcc

On Mon, Mar 18, 2024 at 12:56 PM Martin Uecker <uecker@tugraz.at> wrote:
>
> On Monday, 18.03.2024 at 11:55 +0100, Martin Uecker wrote:
> > On Monday, 18.03.2024 at 09:26 +0100, Richard Biener wrote:
> > > On Mon, Mar 18, 2024 at 8:03 AM Martin Uecker <uecker@tugraz.at> wrote:
> > >
> >
> > >
> > > Let me give you a complicating example, one that is valid in C++:
> > >
> > > struct B { float x; float y; };
> > > struct X { int n; char buf[8]; } x, y;
> > >
> > > void foo(struct B *b)
> > > {
> > >   memcpy (x.buf, b, sizeof (struct B)); // in C++:  new (x.buf) B (*b);
> >
> > Let's make it an explicit store for the moment
> > (should not make a difference though):
> >
> >     *(struct B*)x.buf = *b;
> >
> > >   y = x; // (*)
> > > }
> > >
> > > What's the effective type of 'x' in the 'y = x' copy?
> >
> > Good point. The existing wording would take the declared
> > type of x as the effective type, but this may not be
> > what you are interested in. Let's assume that x has no declared
> > type but that it had effective type struct X before the
> > store to x.buf (because of an even earlier store to
> > x with type struct X).
> >
> > There is a general question how stores to subobjects
> > affect effective types and I do not think this is clear
> > even before this proposed change.
>
> Actually, I think this is not allowed because:
>
> "An object shall have its stored value accessed only by an
> lvalue expression that has one of the following types:
>
> — a type compatible with the effective type of the object,
> ...
> — an aggregate or union type that includes one of the
> aforementioned types among its members (including,
> recursively, a member of a subaggregate or contained union), or
>
> — a character type."
>
> ... and we would need to move "a character type" above
> in the list to make it defined.

So after

*(struct B*)x.buf = *b;

'x' cannot be used to access itself?  In particular also
an access to 'x.n' is affected by this?

You are right that the current wording of the standard doesn't
clarify any of this but this kind of storage abstraction is used
commonly in the embedded world when there's no runtime
library providing allocation.  And you said you want to make
the standard closer to implementation practice ...

Elsewhere when doing 'y = x' people refer to the wording that
aggregates are copied elementwise but it's not specified how
those elementwise accesses work - the lvalues are still of type
X here or are new lvalues implicitly formed and fall under the
other wordings?  Can I thus form an effective type of X by
storing its subobjects at respective offsets (ignoring padding,
for example) and can I then use an lvalue of type 'X' to access
the whole aggregate?
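A sketch that makes the question concrete, under stated assumptions: storage from malloc (so it has no declared type), the members of struct X stored individually at their offsets, then the whole object read through an lvalue of type struct X. build_piecewise is a hypothetical name. That the code runs says nothing about whether the final read is defined; that is exactly the open question.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct X { int n; char buf[8]; };

/* 'text' must point to at least sizeof(buf) == 8 bytes. */
struct X build_piecewise(int n, const char *text)
{
    unsigned char *p = malloc(sizeof(struct X));
    if (!p)
        abort();
    /* Store each member separately at its offset. */
    *(int *)(p + offsetof(struct X, n)) = n;
    memcpy(p + offsetof(struct X, buf), text, sizeof ((struct X *)0)->buf);
    /* The questionable access: reading the whole thing as a struct X. */
    struct X result = *(struct X *)p;
    free(p);
    return result;
}
```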

Richard.

> Martin
>
>

* Re: aliasing
From: David Brown @ 2024-03-18 13:29 UTC
  To: Martin Uecker; +Cc: gcc

On 18/03/2024 12:41, Martin Uecker wrote:
> 
> 
> Hi David,
> 
> On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
>> Hi,
>>
>> I would be very glad to see this change in the standards.
>>
>>
>> Should "byte type" include all character types (signed, unsigned and
>> plain), or should it be restricted to "unsigned char" since that is the
>> "byte" type ?  (I think allowing all character types makes sense, but
>> only unsigned char is guaranteed to be suitable for general object
>> backing store.)
> 
> At the moment, the special types that can access all others are
> the non-atomic character types.  So for symmetry reasons, it
> seems that this is also what we want for backing store.
> 
> I am not sure what you mean by "only unsigned char". Are you talking
> about C++?  "unsigned char" has no special role in C.
> 

"unsigned char" does have a special role in C - in 6.2.6.1p4 it 
describes any object as being able to be copied to an array of unsigned 
char to get the "object representation".  The same is not true for an 
array of "signed char".  I think it would be possible to have an 
implementation where "signed char" was 8-bit two's complement except 
that 0x80 would be a trap representation rather than -128.  I am not 
sure of the consequences of such an implementation (assuming I am even 
correct in it being allowed).
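A minimal sketch of the 6.2.6.1p4 guarantee referred to here: the value of any object can be copied into an array of unsigned char of the same size, giving its object representation, and copying those bytes back into an object of the same type reproduces the value. The function name is hypothetical.

```c
#include <string.h>

double roundtrip_via_bytes(double d)
{
    unsigned char rep[sizeof d];   /* object representation of d */
    memcpy(rep, &d, sizeof d);
    double back;
    memcpy(&back, rep, sizeof back);
    return back;
}
```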

>>
>> Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an
>> alias for "unsigned char", but it could be something different, like an
>> alias for __UINT8_TYPE__, or "unsigned int
>> __attribute__((mode(QImode)))", which is used in the AVR gcc port.
> 
> I think this might be a reason to not include it, as it could
> affect aliasing analysis. At least, this would be a different
> independent change to consider.
> 

I think it is important that there is a guarantee here, because people 
do use uint8_t as a generic "raw memory" type.  Embedded standards like 
MISRA strongly discourage the use of "unsized" types such as "unsigned 
char", and it is generally assumed that "uint8_t" has the aliasing 
superpowers of a character type.  But it is possible that a change
would be better put in the library section on <stdint.h> rather than 
this section.
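Until the standard says something here, code that relies on uint8_t having character-type aliasing properties could guard that assumption at build time. A sketch, assuming C11 _Generic and _Static_assert; UINT8_IS_UCHAR and uint8_is_uchar are hypothetical names. On a target where uint8_t is not unsigned char, the assertion fails to compile, which is the intent.

```c
#include <stdint.h>

/* 1 if uint8_t is unsigned char on this target, else 0. _Generic does
   not apply integer promotions to its controlling expression, so the
   type of the cast is what gets matched. */
#define UINT8_IS_UCHAR _Generic((uint8_t)0, unsigned char: 1, default: 0)

_Static_assert(UINT8_IS_UCHAR, "uint8_t is not unsigned char on this target");

int uint8_is_uchar(void)
{
    return UINT8_IS_UCHAR;
}
```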

>>
>> In my line of work - small-systems embedded development - it is common
>> to have "home-made" or specialised memory allocation systems rather than
>> relying on a generic heap.  This is, I think, some of the "existing
>> practice" that you are considering here - there is a "backing store" of
>> some sort that can be allocated and used as objects of a type other than
>> the declared type of the backing store.  While a simple unsigned char
>> array is a very common kind of backing store, there are others that are
>> used, and it would be good to be sure of the correctness guarantees for
>> these.  Possibilities that I have seen include:
>>
>> unsigned char heap1[N];
>>
>> uint8_t heap2[N];
>>
>> union {
>> 	double dummy_for_alignment;
>> 	char heap[N];
>> } heap3;
>>
>> struct {
>> 	uint32_t capacity;
>> 	uint8_t * p_next_free;
>> 	uint8_t heap[N];
>> } heap4;
>>
>> uint32_t heap5[N];
>>
>> Apart from this last one, if "uint8_t" is guaranteed to be a "byte
>> type", then I believe your wording means that these unions and structs
>> would also work as "byte arrays".  But it might be useful to add a
>> footnote clarifying that.
>>
> 
> I need to think about this.
> 

Thank you.

I see people making a lot of assumptions in their embedded programming 
that are not fully justified in the C standards.  Sometimes the 
assumptions are just bad, or it would be easy to write code without the 
assumptions.  But at other times it would be very awkward or inefficient 
to write code that is completely "safe" (in terms of having fully 
defined behaviour from the C standards or from implementation-dependent 
behaviour).  Making your own dynamic memory allocation functions is one 
such case.  So I have a tendency to jump on any suggestion of changes to 
the C (or C++) standards that could let people write such essential code 
in a safer or more efficient manner.

>> (It is also not uncommon to have the backing space allocated by the
>> linker, but then it falls under the existing "no declared type" case.)
> 
> Yes, although with the change we would make the "no declared type" also
> be byte arrays, so there is then simply no difference anymore.
> 

Fair enough.  (Linker-defined storage does not just have no declared 
type, it has no directly declared size or other properties either.  The 
start and the stop of the storage area is typically declared as "extern 
uint8_t __space_start[], __space_stop[];", or perhaps as single 
characters or uint32_t types.  The space in between is just calculated 
as the difference between pointers to these.)

>>
>>
>> I would not want uint32_t to be considered an "alias anything" type, but
>> I have occasionally seen such types used for memory store backings.  It
>> is perhaps worth considering defining "byte type" as "non-atomic
>> character type, [u]int8_t (if they exist), or other
>> implementation-defined types".
> 
> This could make sense, the question is whether we want to encourage
> the use of other types for this use case, as this would then not
> be portable.

I think uint8_t should be highly portable, except to targets where it 
does not exist (and in this day and age, that basically means some DSP 
devices that have 16-bit, 24-bit or 32-bit char).

There is precedent for this wording, however, in 6.7.2.1p5 for 
bit-fields - "A bit-field shall have a type that is a qualified or 
unqualified version of _Bool, signed int, unsigned int, or some other 
implementation-defined type".

I think it should be clear enough that using an implementation-defined 
type rather than a character type would potentially limit portability. 
For the kinds of systems I am thinking of, extreme portability is 
normally not of prime concern - efficiency on a particular target with a 
particular compiler is often more important.

> 
> Are there important reason for not using "unsigned char" ?
> 

What is "important" is often a subjective matter.  One reason many 
people use "uint8_t" is that they prefer to be explicit about sizes, and 
would rather have a hard error if the code is used on a target that 
doesn't support the size.  Some coding standards, such as the very 
common (though IMHO somewhat flawed) MISRA standard, strongly encourage 
size-specific types and consider the use of "int" or "unsigned char" as 
a violation of their rules and directives.  Many libraries and code 
bases with a history older than C99 have their own typedef names for 
size-specific types or low-level storage types, such as "sys_uint8", 
"BYTE", "u8", and so on, and users may prefer these for consistency. 
And for people with a background in hardware or assembly (not uncommon 
for small systems embedded programming), or other languages such as 
Rust, "unsigned char" sounds vague, poorly defined, and somewhat 
meaningless as a type name for a raw byte of memory or a minimal sized 
unsigned integer.

Of course most alternative names for bytes would be typedefs of 
"unsigned char" and therefore work just the same way.  But as noted 
before, uint8_t could be defined in another manner on some systems (and 
on GCC for the AVR, it /is/ defined in a different way - though I have 
no idea why).

And bigger types, such as uint32_t, have been used to force alignment 
for backing store (either because the compiler did not support _Alignas, 
or the programmer did not know about it).  (But I am not suggesting that 
plain "uint32_t" should be considered a "byte type" for aliasing purposes.)
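For comparison, here is a backing store that gets its alignment from _Alignas rather than from being declared as uint32_t, with a trivial bump allocator on top. It is a sketch with hypothetical names and requires C11 or later. Whether using the returned storage as some other type is fully defined is exactly what the proposed "byte array" wording is meant to settle.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical fixed-size backing store.  _Alignas gives the byte
   array the strictest fundamental alignment, instead of declaring
   it as an array of uint32_t just to force alignment. */
#define HEAP_SIZE 1024u

static _Alignas(max_align_t) unsigned char heap[HEAP_SIZE];
static size_t heap_used;

/* Return size bytes aligned to alignof(max_align_t), or NULL when
   the store is exhausted.  There is no free operation. */
static void *heap_alloc(size_t size)
{
    const size_t align = alignof(max_align_t);
    size_t start = (heap_used + align - 1) / align * align;

    if (start > HEAP_SIZE || size > HEAP_SIZE - start)
        return NULL;

    heap_used = start + size;
    return &heap[start];
}
```

The free operation is left out on purpose. Allocators in this space usually build a free list or fixed-size pools on top of the same kind of array.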

>>
>> Some other compilers might guarantee not to do type-based alias analysis
>> and thus view all types as "byte types" in this way.  For gcc, there
>> could be a kind of reverse "may_alias" type attribute to create such types.
>>
>>
>>
>> There are a number of other features that could make allocation
>> functions more efficient and safer in use, and which could ideally be
>> standardised in the C standards or at least added as gcc extensions, but
>> I think that's more than you are looking for here!
> 
> It is possible to submit a proposal to WG14.
> 

Yes, I know.  But giving you some feedback here is a step in that 
direction - even if it turns out that it doesn't affect your wording in 
the end.

David

> Martin
> 
> 
>>
>> David
>>
>>
>>
>> On 18/03/2024 08:03, Martin Uecker via Gcc wrote:
>>>
>>> Hi,
>>>
>>> can you please take a quick look at this? This is intended to align
>>> the C standard with existing practice with respect to aliasing by
>>> removing the special rules for "objects with no declared type" and
>>> making it fully symmetric and only based on types with non-atomic
>>> character types being able to alias everything.
>>>
>>>
>>> Unrelated to this change, I have another question:  I wonder if GCC
>>> (or any other compiler) actually exploits the " or is copied as an
>>> array of byte type, " rule to make assumptions about the effective
>>> types of the target array? I know compilers do this with memcpy...
>>> Maybe also if a loop is transformed to memcpy?
>>>
>>> Martin
>>>
>>>
>>> Add the following definition after 3.5, paragraph 2:
>>>
>>> byte array
>>> object having either no declared type or an array of objects declared with a byte type
>>>
>>> byte type
>>> non-atomic character type
>>>
>>> Modify 6.5, paragraph 6:
>>> The effective type of an object that is not a byte array, for an access to its
>>> stored value, is the declared type of the object.97) If a value is
>>> stored into a byte array through an lvalue having a byte type, then
>>> the type of the lvalue becomes the effective type of the object for that
>>> access and for subsequent accesses that do not modify the stored value.
>>> If a value is copied into a byte array using memcpy or memmove, or is
>>> copied as an array of byte type, then the effective type of the
>>> modified object for that access and for subsequent accesses that do not
>>> modify the value is the effective type of the object from which the
>>> value is copied, if it has one. For all other accesses to a byte array,
>>> the effective type of the object is simply the type of the lvalue used
>>> for the access.
>>>
>>> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
>>>
>>>
>>>
>>
> 

* Re: aliasing
  2024-03-18 13:29     ` aliasing David Brown
@ 2024-03-18 13:54       ` Andreas Schwab
  2024-03-18 16:46         ` aliasing David Brown
  2024-03-18 15:00       ` aliasing Martin Uecker
  1 sibling, 1 reply; 19+ messages in thread
From: Andreas Schwab @ 2024-03-18 13:54 UTC
  To: David Brown; +Cc: Martin Uecker, gcc

On Mar 18 2024, David Brown wrote:

> I think it would be possible to have an implementation where "signed
> char" was 8-bit two's complement except that 0x80 would be a trap
> representation rather than -128.

signed char cannot have padding bits, thus it cannot have a trap
representation.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: aliasing
  2024-03-18 13:29     ` aliasing David Brown
  2024-03-18 13:54       ` aliasing Andreas Schwab
@ 2024-03-18 15:00       ` Martin Uecker
  2024-03-18 17:11         ` aliasing David Brown
  1 sibling, 1 reply; 19+ messages in thread
From: Martin Uecker @ 2024-03-18 15:00 UTC
  To: David Brown; +Cc: gcc

On Monday, 18.03.2024 at 14:29 +0100, David Brown wrote:
> 
> On 18/03/2024 12:41, Martin Uecker wrote:
> > 
> > 
> > Hi David,
> > 
> > On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
> > > Hi,
> > > 
> > > I would be very glad to see this change in the standards.
> > > 
> > > 
> > > Should "byte type" include all character types (signed, unsigned and
> > > plain), or should it be restricted to "unsigned char" since that is the
> > > "byte" type ?  (I think allowing all character types makes sense, but
> > > only unsigned char is guaranteed to be suitable for general object
> > > backing store.)
> > 
> > At the moment, the special types that can access all others are
> > all non-atomic character types.  So for symmetry reasons, it
> > seems that this is also what we want for backing store.
> > 
> > I am not sure what you mean by "only unsigned char". Are you talking
> > about C++?  "unsigned char" has no special role in C.
> > 
> 
> "unsigned char" does have a special role in C - in 6.2.6.1p4 it 
> describes any object as being able to be copied to an array of unsigned 
> char to get the "object representation".  The same is not true for an
> array of "signed char".  I think it would be possible to have an 
> implementation where "signed char" was 8-bit two's complement except 
> that 0x80 would be a trap representation rather than -128.  I am not 
> sure of the consequences of such an implementation (assuming I am even 
> correct in it being allowed).

Yes, but with C23 this is not possible anymore. I think signed
char or char should work equally well now. 

> 
> > > 
> > > Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an
> > > alias for "unsigned char", but it could be something different, like an
> > > alias for __UINT8_TYPE__, or "unsigned int
> > > __attribute__((mode(QImode)))", which is used in the AVR gcc port.
> > 
> > I think this might be a reason to not include it, as it could
> > affect aliasing analysis. At least, this would be a different
> > independent change to consider.
> > 
> 
> I think it is important that there is a guarantee here, because people 
> do use uint8_t as a generic "raw memory" type.  Embedded standards like 
> MISRA strongly discourage the use of "unsized" types such as "unsigned 
> char", and it is generally assumed that "uint8_t" has the aliasing 
> superpowers of a character type.  But it is possible that a change
> would be better put in the library section on <stdint.h> rather than 
> this section.
> 
> > > 
> > > In my line of work - small-systems embedded development - it is common
> > > to have "home-made" or specialised memory allocation systems rather than
> > > relying on a generic heap.  This is, I think, some of the "existing
> > > practice" that you are considering here - there is a "backing store" of
> > > some sort that can be allocated and used as objects of a type other than
> > > the declared type of the backing store.  While a simple unsigned char
> > > array is a very common kind of backing store, there are others that are
> > > used, and it would be good to be sure of the correctness guarantees for
> > > these.  Possibilities that I have seen include:
> > > 
> > > unsigned char heap1[N];
> > > 
> > > uint8_t heap2[N];
> > > 
> > > union {
> > > 	double dummy_for_alignment;
> > > 	char heap[N];
> > > } heap3;
> > > 
> > > struct {
> > > 	uint32_t capacity;
> > > 	uint8_t * p_next_free;
> > > 	uint8_t heap[N];
> > > } heap4;
> > > 
> > > uint32_t heap5[N];
> > > 
> > > Apart from this last one, if "uint8_t" is guaranteed to be a "byte
> > > type", then I believe your wording means that these unions and structs
> > > would also work as "byte arrays".  But it might be useful to add a
> > > footnote clarifying that.
> > > 
> > 
> > I need to think about this.
> > 
> 
> Thank you.
> 
> I see people making a lot of assumptions in their embedded programming 
> that are not fully justified in the C standards.  Sometimes the 
> assumptions are just bad, or it would be easy to write code without the 
> assumptions.  But at other times it would be very awkward or inefficient 
> to write code that is completely "safe" (in terms of having fully 
> defined behaviour from the C standards or from implementation-dependent 
> behaviour).  Making your own dynamic memory allocation functions is one 
> such case.  So I have a tendency to jump on any suggestion of changes to 
> the C (or C++) standards that could let people write such essential code 
> in a safer or more efficient manner.

That something is undefined does not automatically mean it is 
forbidden or unsafe.  It simply means it is not portable.  I think
in the embedded space it will be difficult to make everything well
defined.  But I fully agree that widely used techniques should
ideally be based on defined behavior and we should change the
standard accordingly.

> 
> > > (It is also not uncommon to have the backing space allocated by the
> > > linker, but then it falls under the existing "no declared type" case.)
> > 
> > Yes, although with the change we would make the "no declared type" also
> > be byte arrays, so there is then simply no difference anymore.
> > 
> 
> Fair enough.  (Linker-defined storage does not just have no declared 
> type, it has no directly declared size or other properties either.  The 
> start and the stop of the storage area is typically declared as "extern 
> uint8_t __space_start[], __space_stop[];", or perhaps as single 
> characters or uint32_t types.  The space in between is just calculated 
> as the difference between pointers to these.)
> 
> > > 
> > > 
> > > I would not want uint32_t to be considered an "alias anything" type, but
> > > I have occasionally seen such types used for memory store backings.  It
> > > is perhaps worth considering defining "byte type" as "non-atomic
> > > character type, [u]int8_t (if they exist), or other
> > > implementation-defined types".
> > 
> > This could make sense, the question is whether we want to encourage
> > the use of other types for this use case, as this would then not
> > be portable.
> 
> I think uint8_t should be highly portable, except to targets where it 
> does not exist (and in this day and age, that basically means some DSP 
> devices that have 16-bit, 24-bit or 32-bit char).
> 
> There is precedent for this wording, however, in 6.7.2.1p5 for 
> bit-fields - "A bit-field shall have a type that is a qualified or 
> unqualified version of _Bool, signed int, unsigned int, or some other 
> implementation-defined type".
> 
> I think it should be clear enough that using an implementation-defined 
> type rather than a character type would potentially limit portability. 
> For the kinds of systems I am thinking of, extreme portability is 
> normally not of prime concern - efficiency on a particular target with a 
> particular compiler is often more important.

Thanks, I will bring back this information to WG14.
> 
> > 
> > Are there important reason for not using "unsigned char" ?
> > 
> 
> What is "important" is often a subjective matter.  One reason many 
> people use "uint8_t" is that they prefer to be explicit about sizes, and 
> would rather have a hard error if the code is used on a target that 
> doesn't support the size.  Some coding standards, such as the very 
> common (though IMHO somewhat flawed) MISRA standard, strongly encourage 
> size-specific types and consider the use of "int" or "unsigned char" as 
> a violation of their rules and directives.  Many libraries and code 
> bases with a history older than C99 have their own typedef names for 
> size-specific types or low-level storage types, such as "sys_uint8", 
> "BYTE", "u8", and so on, and users may prefer these for consistency. 
> And for people with a background in hardware or assembly (not uncommon 
> for small systems embedded programming), or other languages such as 
> Rust, "unsigned char" sounds vague, poorly defined, and somewhat 
> meaningless as a type name for a raw byte of memory or a minimal sized 
> unsigned integer.
> 
> Of course most alternative names for bytes would be typedefs of 
> "unsigned char" and therefore work just the same way.  But as noted 
> before, uint8_t could be defined in another manner on some systems (and 
> on GCC for the AVR, it /is/ defined in a different way - though I have 
> no idea why).
> 
> And bigger types, such as uint32_t, have been used to force alignment 
> for backing store (either because the compiler did not support _Alignas, 
> or the programmer did not know about it).  (But I am not suggesting that 
> plain "uint32_t" should be considered a "byte type" for aliasing purposes.)
> 
> > > 
> > > Some other compilers might guarantee not to do type-based alias analysis
> > > and thus view all types as "byte types" in this way.  For gcc, there
> > > could be a kind of reverse "may_alias" type attribute to create such types.
> > > 
> > > 
> > > 
> > > There are a number of other features that could make allocation
> > > functions more efficient and safer in use, and which could ideally be
> > > standardised in the C standards or at least added as gcc extensions, but
> > > I think that's more than you are looking for here!
> > 
> > It is possible to submit a proposal to WG14.
> > 
> 
> Yes, I know.  But giving you some feedback here is a step in that 
> direction - even if it turns out that it doesn't affect your wording in 
> the end.

Any kind of feedback is very welcome. Thank you!

Martin

> > > On 18/03/2024 08:03, Martin Uecker via Gcc wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > can you please take a quick look at this? This is intended to align
> > > > the C standard with existing practice with respect to aliasing by
> > > > removing the special rules for "objects with no declared type" and
> > > > making it fully symmetric and only based on types with non-atomic
> > > > character types being able to alias everything.
> > > > 
> > > > 
> > > > Unrelated to this change, I have another question:  I wonder if GCC
> > > > (or any other compiler) actually exploits the " or is copied as an
> > > > array of byte type, " rule to make assumptions about the effective
> > > > types of the target array? I know compilers do this with memcpy...
> > > > Maybe also if a loop is transformed to memcpy?
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > Add the following definition after 3.5, paragraph 2:
> > > > 
> > > > byte array
> > > > object having either no declared type or an array of objects declared with a byte type
> > > > 
> > > > byte type
> > > > non-atomic character type
> > > > 
> > > > Modify 6.5, paragraph 6:
> > > > The effective type of an object that is not a byte array, for an access to its
> > > > stored value, is the declared type of the object.97) If a value is
> > > > stored into a byte array through an lvalue having a byte type, then
> > > > the type of the lvalue becomes the effective type of the object for that
> > > > access and for subsequent accesses that do not modify the stored value.
> > > > If a value is copied into a byte array using memcpy or memmove, or is
> > > > copied as an array of byte type, then the effective type of the
> > > > modified object for that access and for subsequent accesses that do not
> > > > modify the value is the effective type of the object from which the
> > > > value is copied, if it has one. For all other accesses to a byte array,
> > > > the effective type of the object is simply the type of the lvalue used
> > > > for the access.
> > > > 
> > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
> > > > 
> > > > 
> > > > 
> > > 
> > 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging



* Re: aliasing
  2024-03-18 13:21       ` aliasing Richard Biener
@ 2024-03-18 15:13         ` Martin Uecker
  0 siblings, 0 replies; 19+ messages in thread
From: Martin Uecker @ 2024-03-18 15:13 UTC
  To: Richard Biener; +Cc: gcc

On Monday, 18.03.2024 at 14:21 +0100, Richard Biener wrote:
> On Mon, Mar 18, 2024 at 12:56 PM Martin Uecker <uecker@tugraz.at> wrote:
> > 
> > On Monday, 18.03.2024 at 11:55 +0100, Martin Uecker wrote:
> > > On Monday, 18.03.2024 at 09:26 +0100, Richard Biener wrote:
> > > > On Mon, Mar 18, 2024 at 8:03 AM Martin Uecker <uecker@tugraz.at> wrote:
> > > > 
> > > 
> > > > 
> > > > Let me give you a complicated example that is made valid in C++:
> > > > 
> > > > struct B { float x; float y; };
> > > > struct X { int n; char buf[8]; } x, y;
> > > > 
> > > > void foo(struct B *b)
> > > > {
> > > >   memcpy (x.buf, b, sizeof (struct B)); // in C++:  new (x.buf) B (*b);
> > > 
> > > Let's make it an explicit store for the moment
> > > (should not make a difference though):
> > > 
> > >     *(struct B*)x.buf = *b;
> > > 
> > > >   y = x; // (*)
> > > > }
> > > > 
> > > > What's the effective type of 'x' in the 'y = x' copy?
> > > 
> > > Good point. The existing wording would take the declared
> > > type of x as the effective type, but this may not be
> > > what you are interested in. Let's assume that x has no declared
> > > type but that it had effective type struct X before the
> > > store to x.buf (because of an even earlier store to
> > > x with type struct X).
> > > 
> > > There is a general question how stores to subobjects
> > > affect effective types and I do not think this is clear
> > > even before this proposed change.
> > 
> > Actually, I think this is not allowed because:
> > 
> > "An object shall have its stored value accessed only by an
> > lvalue expression that has one of the following types:
> > 
> > — a type compatible with the effective type of the object,
> > ...
> > — an aggregate or union type that includes one of the
> > aforementioned types among its members (including,
> > recursively, a member of a subaggregate or contained union), or
> > 
> > — a character type."
> > 
> > ... and we would need to move "a character type" above
> > in the list to make it defined.
> 
> So after
> 
> *(struct B*)x.buf = *b;
> 
> 'x' cannot be used to access itself?  In particular also
> an access to 'x.n' is affected by this?

According to the current wording and assuming x has no
declared type, x.buf would acquire an effective
type of struct B. Then if x.buf is read as part of
x, it is accessed with an lvalue of type struct X (which
does not include a struct B but a character buffer).

So yes, currently it would be undefined behavior 
and the proposed wording would not change this. Clearly,
we should include an additional change to fix this.

> 
> You are right that the current wording of the standard doesn't
> clarify any of this but this kind of storage abstraction is used
> commonly in the embedded world when there's no runtime
> library providing allocation.  And you said you want to make
> the standard closer to implementation practice ...

Well, we are working on it... Any help is much appreciated.

> 
> Elsewhere, when doing 'y = x', people refer to the wording that
> aggregates are copied elementwise, but it's not specified how
> those elementwise accesses work: are the lvalues still of type
> X here, or are new lvalues implicitly formed that fall under the
> other wordings?

I think there is no wording for elementwise copy.

My understanding is that the 

"...an aggregate or union type that includes..."

wording above is supposed to define this via an lvalue
access with aggregate or union type.  It blesses the
implied access to the elements via the access with 
an lvalue which has the type of the aggregate.  


>  Can I thus form an effective type of X by
> storing its subobjects at respective offsets (ignoring padding,
> for example) and can I then use an lvalue of type 'X' to access
> the whole aggregate?

I think this is defined behavior.  The subobjects get
their effective types via the individual stores and then 
the access using lvalue of type 'X' is ok according to
the "..an aggregate or union type that includes.."
rule.
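This sketches the case discussed above, assuming allocated storage (which has no declared type) and Martin's reading of the rules. Each member is stored separately at its offset, and the whole struct is then read through an lvalue of type struct X. The function name build_piecewise is hypothetical; struct X is taken from Richard's example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct X { int n; char buf[8]; };   /* as in Richard's example */

/* Build a struct X in malloc'd storage member by member, then read
   the whole aggregate through an lvalue of type struct X.  Under the
   reading above, each member store gives that member its effective
   type, and the aggregate read is covered by the "aggregate or union
   type that includes ..." rule.  Padding bytes are never written. */
static struct X build_piecewise(int n, const char *text)
{
    unsigned char *raw = malloc(sizeof(struct X));
    struct X result = {0};

    if (raw == NULL)
        return result;

    int *pn = (int *)(raw + offsetof(struct X, n));
    *pn = n;                                   /* store through an int lvalue */

    char *pbuf = (char *)(raw + offsetof(struct X, buf));
    strncpy(pbuf, text, sizeof result.buf);    /* stored as characters */

    result = *(struct X *)raw;                 /* read as struct X */
    free(raw);
    return result;
}
```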


Martin



* Re: aliasing
  2024-03-18 13:54       ` aliasing Andreas Schwab
@ 2024-03-18 16:46         ` David Brown
  2024-03-18 16:55           ` aliasing David Brown
  0 siblings, 1 reply; 19+ messages in thread
From: David Brown @ 2024-03-18 16:46 UTC
  To: gcc

On 18/03/2024 14:54, Andreas Schwab via Gcc wrote:
> On Mar 18 2024, David Brown wrote:
> 
>> I think it would be possible to have an implementation where "signed
>> char" was 8-bit two's complement except that 0x80 would be a trap
>> representation rather than -128.
> 
> signed char cannot have padding bits, thus it cannot have a trap
> representation.
> 

The premise is correct (no padding bits are allowed in signed char), but 
does it follow that it cannot have a trap representation?  I don't think 
the standards are clear either way here - I think the committee missed a 
chance to tidy up the description a bit more when C23 removed formats 
other than two's complement for signed integer types.

I also feel slightly uneasy using signed char for accessing object 
representations since the object representation is defined in terms of 
an unsigned char array, and conversion from unsigned char to signed char 
is implementation-defined.  (This too could have been tightened in C23, 
as there is unlikely to be any implementation that does not do the 
conversion in the obvious manner.)

But I am perhaps worrying too much here.

* Re: aliasing
  2024-03-18 16:46         ` aliasing David Brown
@ 2024-03-18 16:55           ` David Brown
  0 siblings, 0 replies; 19+ messages in thread
From: David Brown @ 2024-03-18 16:55 UTC
  To: gcc

On 18/03/2024 17:46, David Brown via Gcc wrote:
> On 18/03/2024 14:54, Andreas Schwab via Gcc wrote:
>> On Mar 18 2024, David Brown wrote:
>>
>>> I think it would be possible to have an implementation where "signed
>>> char" was 8-bit two's complement except that 0x80 would be a trap
>>> representation rather than -128.
>>
>> signed char cannot have padding bits, thus it cannot have a trap
>> representation.
>>
> 
> The premise is correct (no padding bits are allowed in signed char), but 
> does it follow that it cannot have a trap representation?

5.2.4.2.1p3 in C23 makes the range of a signed integer type of width N
go from -(2^(N-1)) to 2^(N-1) - 1, which means every bit pattern is a
valid value and there can be no trap value if there are no padding bits.
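The consequence for signed char can be written down as compile-time checks. This is a small sketch assuming C11 _Static_assert; the function name is hypothetical.

```c
#include <limits.h>

/* signed char has no padding bits, so its CHAR_BIT bits are one sign
   bit and CHAR_BIT - 1 value bits.  C23 then fixes its range at
   -2^(CHAR_BIT-1) .. 2^(CHAR_BIT-1) - 1, i.e. two's complement. */
_Static_assert(SCHAR_MIN == -SCHAR_MAX - 1,
               "signed char range is not two's complement");
_Static_assert(SCHAR_MAX == (1L << (CHAR_BIT - 1)) - 1,
               "signed char does not have CHAR_BIT - 1 value bits");

/* Number of distinct signed char values; with the range above,
   every one of the 2^CHAR_BIT bit patterns is a valid value. */
static long signed_char_value_count(void)
{
    return (long)SCHAR_MAX - (long)SCHAR_MIN + 1;
}
```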

> I don't think 
> the standards are clear either way here - I think the committee missed a 
> chance to tidy up the description a bit more when C23 removed formats 
> other than two's complement for signed integer types.
> 
> I also feel slightly uneasy using signed char for accessing object 
> representations since the object representation is defined in terms of 
> an unsigned char array, and conversion from unsigned char to signed char 
> is implementation-defined.  (This too could have been tightened in C23, 
> as there is unlikely to be any implementation that does not do the 
> conversion in the obvious manner.)
> 
> But I am perhaps worrying too much here.

* Re: aliasing
  2024-03-18 15:00       ` aliasing Martin Uecker
@ 2024-03-18 17:11         ` David Brown
  0 siblings, 0 replies; 19+ messages in thread
From: David Brown @ 2024-03-18 17:11 UTC
  To: Martin Uecker, David Brown; +Cc: gcc

On 18/03/2024 16:00, Martin Uecker via Gcc wrote:
> On Monday, 18.03.2024 at 14:29 +0100, David Brown wrote:
>>
>> On 18/03/2024 12:41, Martin Uecker wrote:
>>>
>>>
>>> Hi David,
>>>
>>> On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
>>>> Hi,
>>>>
>>>> I would be very glad to see this change in the standards.
>>>>
>>>>
>>>> Should "byte type" include all character types (signed, unsigned and
>>>> plain), or should it be restricted to "unsigned char" since that is the
>>>> "byte" type ?  (I think allowing all character types makes sense, but
>>>> only unsigned char is guaranteed to be suitable for general object
>>>> backing store.)
>>>
>>> At the moment, the special types that can access all others are
>>> all non-atomic character types.  So for symmetry reasons, it
>>> seems that this is also what we want for backing store.
>>>
>>> I am not sure what you mean by "only unsigned char". Are you talking
>>> about C++?  "unsigned char" has no special role in C.
>>>
>>
>> "unsigned char" does have a special role in C - in 6.2.6.1p4 it
>> describes any object as being able to be copied to an array of unsigned
>> char to get the "object representation".  The same is not true for an
>> array of "signed char".  I think it would be possible to have an
>> implementation where "signed char" was 8-bit two's complement except
>> that 0x80 would be a trap representation rather than -128.  I am not
>> sure of the consequences of such an implementation (assuming I am even
>> correct in it being allowed).
> 
> Yes, but with C23 this is not possible anymore. I think signed
> char or char should work equally well now.

I have just noticed that in C23, SCHAR_MIN is -128 (or -2^(N-1) in 
general), eliminating the possibility of having a trap value for signed 
char (or any other integer type without padding bits).  There's always a 
bit of jumping around in the C standards to get the complete picture!

But as I said in another post, I still worry a little about the unsigned 
to signed conversion being implementation-defined, and therefore not 
guaranteed to work in a way that preserves the underlying object 
representation.  I think it should be possible to make a small change to 
the description of unsigned to signed conversions to eliminate that.
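This sketch illustrates the distinction (the function names are hypothetical). Converting unsigned char values to signed char goes through the implementation-defined conversion. memcpy copies the bytes without any value conversion.

```c
#include <string.h>

/* by_conversion: converts each unsigned char value to signed char.
   For values above SCHAR_MAX the result is implementation-defined
   (GCC documents reduction modulo 2^N). */
static void by_conversion(const unsigned char *in, signed char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (signed char)in[i];
}

/* by_copy: copies the bytes, so each signed char object receives the
   same bit pattern without any value conversion taking place. */
static void by_copy(const unsigned char *in, signed char *out, size_t n)
{
    memcpy(out, in, n);
}
```

With GCC, which documents the modulo behaviour, both functions should give identical results. Only the memcpy version avoids depending on that documentation.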

>>
>> I see people making a lot of assumptions in their embedded programming
>> that are not fully justified in the C standards.  Sometimes the
>> assumptions are just bad, or it would be easy to write code without the
>> assumptions.  But at other times it would be very awkward or inefficient
>> to write code that is completely "safe" (in terms of having fully
>> defined behaviour from the C standards or from implementation-dependent
>> behaviour).  Making your own dynamic memory allocation functions is one
>> such case.  So I have a tendency to jump on any suggestion of changes to
>> the C (or C++) standards that could let people write such essential code
>> in a safer or more efficient manner.
> 
> That something is undefined does not automatically mean it is
> forbidden or unsafe.  It simply means it is not portable.  

That is the case for things that are not defined in the C standards, but 
defined elsewhere.  If the behaviour of a piece of code is not defined 
anywhere for the toolchain you are using, then it is inherently unsafe 
to use.  ("Forbidden" is another matter.  It might be "forbidden" by 
your coding standards, or your boss, but the language and tools don't 
forbid things!)

Something that is not defined in the C standards, but defined in your 
compiler manual or additional standards (such as POSIX) is safe to use 
but limited in portability.
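One example of behaviour defined by a compiler manual rather than by the C standards is GCC's may_alias type attribute. Accesses through a type carrying it are not subject to type-based alias analysis, much like character-type accesses. The following is a minimal sketch assuming GCC or Clang, a 32-bit IEEE 754 float, and suitably aligned input. xor_words is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC extension (also accepted by Clang): lvalues of this type may alias
   objects of any other type. Alignment requirements still apply. */
typedef uint32_t __attribute__((may_alias)) u32_alias;

/* XOR together the first nwords 32-bit words of an arbitrary object.
   p must be suitably aligned for uint32_t. */
uint32_t xor_words(const void *p, size_t nwords)
{
    const u32_alias *w = p;
    uint32_t acc = 0;
    for (size_t i = 0; i < nwords; i++)
        acc ^= w[i];
    return acc;
}
```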

And of course something that is "inherently unsafe" may be considered 
safe in practice, by analysing the generated object code or doing 
exhaustive testing.

> I think
> in the embedded space it will be difficult to make everything well
> defined.  

Yes, that is absolutely true.  (And it is even more difficult if you try 
to restrict yourself to things with full definitions in the C standards 
or explicit implementation-defined behaviour documented by toolchains. 
You almost invariably need some degree of compiler extensions for parts 
of the code.)

But I want to reduce, as far as is practical, the amount of code that 
merely "works in practice" rather than being "known to work by design".

> But I fully agree that widely used techniques should
> ideally be based on defined behavior and we should  change the
> standard accordingly.
> 

Yes, where possible and practical, the standard should provide the guarantees 
that programmers need.  Failing that, compiler extensions are good too - 
I'd be very happy with a GCC variable __attribute__ "backing_store" that 
could be applied to allocator backing stores and provide the aliasing 
guarantees needed.  (It might even be needed anyway, to work well with 
the "malloc" attribute, even with your change to the standard.)
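Here is a minimal sketch of the kind of allocator backing store under discussion: a bump allocator that hands out pieces of a static unsigned char array. Under the C17 effective-type rules, the array's declared type is unsigned char[]. Strictly read, this means using the returned memory as, say, an int is not allowed, which is why an attribute or a standard change is wanted. The proposal at the start of this thread would class such an array as a "byte array". POOL_SIZE, pool_alloc and the other names are hypothetical, and there is deliberately no free function.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical fixed-size pool, aligned for any object type. */
#define POOL_SIZE 1024
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Returns n bytes aligned to max_align_t, or NULL when the pool is full. */
void *pool_alloc(size_t n)
{
    const size_t a = alignof(max_align_t);
    size_t start = (pool_used + a - 1) / a * a;  /* round up to alignment */
    if (start > POOL_SIZE || n > POOL_SIZE - start)
        return NULL;                             /* out of space */
    pool_used = start + n;
    return &pool[start];
}
```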

David


* Re: aliasing
  1999-08-21  9:46 ` aliasing Mark Mitchell
@ 1999-08-31 23:20   ` Mark Mitchell
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Mitchell @ 1999-08-31 23:20 UTC (permalink / raw)
  To: jlm; +Cc: gcc

The relevant paragraph is in [basic.lval] of the C++ standard.  The
paragraph in the C standard is nearly identical.  Here it is.  Perhaps
someone would like to HTML-ify this, and make a FAQ entry out of it?

  If a program attempts to access the stored value of an object
  through an lvalue of other than one of the following types the
  behavior is undefined:

  --the dynamic type of the object,

You can access an object using the type it really has.  (I.e., you can
use an `int *' to refer to an `int'.)

  --a cv-qualified version of the dynamic type of the object,

You can also use a `const int *' to read an `int'.

  --a type that is the signed or unsigned type corresponding to the
    dynamic type of the object,

Or an `unsigned int *'.

  --a type that is the signed or unsigned type corresponding to a
    cv-qualified version of the dynamic type of the object,

Or a `const unsigned int *'.

  --an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    sub-aggregate or contained union),

You can read or write an entire structure, thereby accessing all of
its fields.

  --a type that is a (possibly cv-qualified) base class type of the
    dynamic type of the object,

This one is C++-specific.  You can read or write an entire base class
of the actual type of the object.

  --a char or unsigned char type.

You can use a `char *', `unsigned char *', `volatile char *',
`unsigned const volatile char *', etc. to read or write from anywhere.

All pointer types here can be replaced with reference types as well.
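Here is a minimal C sketch of the first few cases. It relies on int and unsigned having the same representation for values representable in both, which the standard guarantees. The struct pair type and the function names are hypothetical.

```c
struct pair { int key; double value; };

/* Writes an int object through its corresponding unsigned type, then reads
   it back through a const-qualified int pointer. Both are permitted lvalue
   types for an int. v should not exceed INT_MAX. */
int store_via_unsigned(int *p, unsigned v)
{
    unsigned *up = (unsigned *)p;   /* signed/unsigned counterpart */
    const int *cp = p;              /* cv-qualified version of int */
    *up = v;
    return *cp;
}

/* Copying a whole struct accesses its int member through the aggregate type. */
int key_after_copy(const struct pair *src)
{
    struct pair dst = *src;
    return dst.key;
}
```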

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com


* aliasing
  1999-08-21  9:23 aliasing Jason Moxham
  1999-08-21  9:46 ` aliasing Mark Mitchell
@ 1999-08-31 23:20 ` Jason Moxham
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Moxham @ 1999-08-31 23:20 UTC (permalink / raw)
  To: gcc

I have a C++ program (really a C program with a couple of overloaded
functions) which I'm fairly sure breaks the strict aliasing rules of
GCC 2.95.1.

Where can I find a copy of these rules, so I can fix my code?


For a temporary fix I added -fno-strict-aliasing to the options and
expected a performance drop. However, the program now runs 1% faster???
(I assure you 1% is not a trivial amount.)
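For reference, here is a minimal C sketch of the most common violation and its usual fix, assuming a 32-bit IEEE 754 float. Reading a float through a uint32_t pointer breaks the aliasing rules, while memcpy copies the representation legally. Current GCC versions usually compile such a small fixed-size memcpy to a plain move, so the fix need not cost speed. float_to_bits is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float assumed");

/* Violates the aliasing rules (a float object read through a uint32_t lvalue):
 *     uint32_t bits = *(uint32_t *)&f;
 * Well-defined replacement: copy the object representation instead. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```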

Using Slackware 3.9 + GCC 2.95.1 on AMD K6 (x86)
options -O3 -fno-exceptions -funroll-loops -march=k6
-fomit-frame-pointer -ffast-math   ... etc

I can send the source (100K), executables (50K), data, etc.


Jason Moxham
jlm@maths.soton.ac.uk


* Re: aliasing
  1999-08-21  9:23 aliasing Jason Moxham
@ 1999-08-21  9:46 ` Mark Mitchell
  1999-08-31 23:20   ` aliasing Mark Mitchell
  1999-08-31 23:20 ` aliasing Jason Moxham
  1 sibling, 1 reply; 19+ messages in thread
From: Mark Mitchell @ 1999-08-21  9:46 UTC (permalink / raw)
  To: jlm; +Cc: gcc

The relevant paragraph is in [basic.lval] of the C++ standard.  The
paragraph in the C standard is nearly identical.  Here it is.  Perhaps
someone would like to HTML-ify this, and make a FAQ entry out of it?

  If a program attempts to access the stored value of an object
  through an lvalue of other than one of the following types the
  behavior is undefined:

  --the dynamic type of the object,

You can access an object using the type it really has.  (I.e., you can
use an `int *' to refer to an `int'.)

  --a cv-qualified version of the dynamic type of the object,

You can also use a `const int *' to read an `int'.

  --a type that is the signed or unsigned type corresponding to the
    dynamic type of the object,

Or an `unsigned int *'.

  --a type that is the signed or unsigned type corresponding to a
    cv-qualified version of the dynamic type of the object,

Or a `const unsigned int *'.

  --an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    sub-aggregate or contained union),

You can read or write an entire structure, thereby accessing all of
its fields.

  --a type that is a (possibly cv-qualified) base class type of the
    dynamic type of the object,

This one is C++-specific.  You can read or write an entire base class
of the actual type of the object.

  --a char or unsigned char type.

You can use a `char *', `unsigned char *', `volatile char *',
`unsigned const volatile char *', etc. to read or write from anywhere.

All pointer types here can be replaced with reference types as well.
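The last case is what makes byte-wise inspection of any object possible. Here is a minimal C sketch that compares object representations through unsigned char. count_differing_bytes is a hypothetical name. With IEEE 754 doubles, 0.0 and -0.0 compare equal as values but differ in exactly one byte, the one holding the sign bit.

```c
#include <stddef.h>

/* Counts the bytes that differ between two objects of the same size.
   Reading through unsigned char is permitted for any object type. */
size_t count_differing_bytes(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a, *pb = b;
    size_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff += pa[i] != pb[i];
    return diff;
}
```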

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com


* aliasing
@ 1999-08-21  9:23 Jason Moxham
  1999-08-21  9:46 ` aliasing Mark Mitchell
  1999-08-31 23:20 ` aliasing Jason Moxham
  0 siblings, 2 replies; 19+ messages in thread
From: Jason Moxham @ 1999-08-21  9:23 UTC (permalink / raw)
  To: gcc

I have a C++ program (really a C program with a couple of overloaded
functions) which I'm fairly sure breaks the strict aliasing rules of
GCC 2.95.1.

Where can I find a copy of these rules, so I can fix my code?


For a temporary fix I added -fno-strict-aliasing to the options and
expected a performance drop. However, the program now runs 1% faster???
(I assure you 1% is not a trivial amount.)
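Another common fix is type punning through a union. The GCC manual explicitly documents this as supported even with -fstrict-aliasing, provided the memory is accessed through the union type. Here is a minimal C sketch, assuming a 32-bit IEEE 754 float. The union and function names are hypothetical.

```c
#include <stdint.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "32-bit float assumed");

/* Write one member, read the other: supported by GCC as long as both
   accesses go through the union object itself. */
union float_word {
    float f;
    uint32_t u;
};

uint32_t word_of_float(float f)
{
    union float_word fw;
    fw.f = f;
    return fw.u;
}
```

By contrast, that guarantee does not cover taking &fw.f and then reading through a uint32_t pointer.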

Using Slackware 3.9 + GCC 2.95.1 on AMD K6 (x86)
options -O3 -fno-exceptions -funroll-loops -march=k6
-fomit-frame-pointer -ffast-math   ... etc

I can send the source (100K), executables (50K), data, etc.


Jason Moxham
jlm@maths.soton.ac.uk



Thread overview: 19+ messages
2024-03-18  7:03 aliasing Martin Uecker
2024-03-18  8:26 ` aliasing Richard Biener
2024-03-18 10:55   ` aliasing Martin Uecker
2024-03-18 11:56     ` aliasing Martin Uecker
2024-03-18 13:21       ` aliasing Richard Biener
2024-03-18 15:13         ` aliasing Martin Uecker
2024-03-18  9:00 ` aliasing David Brown
2024-03-18 10:09   ` aliasing Jonathan Wakely
2024-03-18 11:41   ` aliasing Martin Uecker
2024-03-18 13:29     ` aliasing David Brown
2024-03-18 13:54       ` aliasing Andreas Schwab
2024-03-18 16:46         ` aliasing David Brown
2024-03-18 16:55           ` aliasing David Brown
2024-03-18 15:00       ` aliasing Martin Uecker
2024-03-18 17:11         ` aliasing David Brown
  -- strict thread matches above, loose matches on Subject: below --
1999-08-21  9:23 aliasing Jason Moxham
1999-08-21  9:46 ` aliasing Mark Mitchell
1999-08-31 23:20   ` aliasing Mark Mitchell
1999-08-31 23:20 ` aliasing Jason Moxham
