From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <6bff9afd-3e84-4260-9d05-8faec5f3ebe2@hesbynett.no>
Date: Mon, 18 Mar 2024 14:29:15 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: aliasing
Content-Language: en-GB
To: Martin Uecker <uecker@tugraz.at>
Cc: gcc@gcc.gnu.org
References: <b57d2094edb09f4e6db5080276ce043bc481b4d0.camel@tugraz.at>
 <c29bbf97-6dd3-717f-17ee-e2bf6ffdb18b@hesbynett.no>
 <089a1300d266a3921feab4efb911987ca465e5c9.camel@tugraz.at>
From: David Brown <david.brown@hesbynett.no>
In-Reply-To: <089a1300d266a3921feab4efb911987ca465e5c9.camel@tugraz.at>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gcc.gcc.gnu.org>


On 18/03/2024 12:41, Martin Uecker wrote:
> 
> 
> Hi David,
> 
> On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
>> Hi,
>>
>> I would be very glad to see this change in the standards.
>>
>>
>> Should "byte type" include all character types (signed, unsigned and
>> plain), or should it be restricted to "unsigned char" since that is the
>> "byte" type ?  (I think allowing all character types makes sense, but
>> only unsigned char is guaranteed to be suitable for general object
>> backing store.)
> 
> At the moment, the special types that can access all others are
> all non-atomic character types.  So for symmetry reasons, it
> seems that this is also what we want for backing store.
> 
> I am not sure what you mean by "only unsigned char". Are you talking
> about C++?  "unsigned char" has no special role in C.
> 

"unsigned char" does have a special role in C - in 6.2.6.1p4 it 
describes any object as being able to be copied to an array of unsigned 
char to get the "object representation".  The same is not true for an 
array of "signed char".  I think it would be possible to have an 
implementation where "signed char" was 8-bit two's complement except 
that 0x80 would be a trap representation rather than -128.  I am not 
sure of the consequences of such an implementation (assuming I am even 
correct in it being allowed).
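
A minimal sketch of what 6.2.6.1p4 allows, with hypothetical helper
names: the object representation of any object can be copied into an
array of unsigned char and back again without losing information.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the object representation of *obj (size bytes) into out, which
   must have room for at least size elements. */
static void get_representation(const void *obj, size_t size,
                               unsigned char *out)
{
    assert(obj != NULL && out != NULL);
    memcpy(out, obj, size);
}

/* Round-trip a double through an array of unsigned char. */
static double round_trip(double value)
{
    unsigned char bytes[sizeof value];
    double copy;

    get_representation(&value, sizeof value, bytes);  /* object -> bytes */
    memcpy(&copy, bytes, sizeof copy);                /* bytes -> object */
    return copy;
}
```

The standard makes this promise for unsigned char only, which is the
asymmetry with signed char that I am pointing at.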

>>
>> Should it also include "uint8_t" (if it exists) ?  "uint8_t" is often an
>> alias for "unsigned char", but it could be something different, like an
>> alias for __UINT8_TYPE__, or "unsigned int
>> __attribute__((mode(QImode)))", which is used in the AVR gcc port.
> 
> I think this might be a reason to not include it, as it could
> affect aliasing analysis. At least, this would be a different
> independent change to consider.
> 

I think it is important that there is a guarantee here, because people 
do use uint8_t as a generic "raw memory" type.  Embedded standards like 
MISRA strongly discourage the use of "unsized" types such as "unsigned 
char", and it is generally assumed that "uint8_t" has the aliasing 
superpowers of a character type.  But it is possible that a change 
would be better put in the library section on <stdint.h> rather than 
this section.
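
Until there is such a guarantee, code that relies on it can at least
check the assumption at compile time.  This is only a sketch (the macro
name is made up), and I have not checked what it reports on every
target:

```c
#include <assert.h>
#include <stdint.h>

/* 1 if uint8_t is the very same type as unsigned char on this
   implementation, 0 if it is some other 8-bit integer type. */
#define UINT8_IS_UNSIGNED_CHAR \
    _Generic((uint8_t)0, unsigned char: 1, default: 0)

/* Refuse to build where the assumption does not hold. */
static_assert(UINT8_IS_UNSIGNED_CHAR,
              "uint8_t is not unsigned char on this target");
```

Where the assertion passes, uint8_t is simply another name for a
character type and gets its aliasing behaviour for free.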

>>
>> In my line of work - small-systems embedded development - it is common
>> to have "home-made" or specialised memory allocation systems rather than
>> relying on a generic heap.  This is, I think, some of the "existing
>> practice" that you are considering here - there is a "backing store" of
>> some sort that can be allocated and used as objects of a type other than
>> the declared type of the backing store.  While a simple unsigned char
>> array is a very common kind of backing store, there are others that are
>> used, and it would be good to be sure of the correctness guarantees for
>> these.  Possibilities that I have seen include:
>>
>> unsigned char heap1[N];
>>
>> uint8_t heap2[N];
>>
>> union {
>> 	double dummy_for_alignment;
>> 	char heap[N];
>> } heap3;
>>
>> struct {
>> 	uint32_t capacity;
>> 	uint8_t * p_next_free;
>> 	uint8_t heap[N];
>> } heap4;
>>
>> uint32_t heap5[N];
>>
>> Apart from this last one, if "uint8_t" is guaranteed to be a "byte
>> type", then I believe your wording means that these unions and structs
>> would also work as "byte arrays".  But it might be useful to add a
>> footnote clarifying that.
>>
> 
> I need to think about this.
> 

Thank you.

I see people making a lot of assumptions in their embedded programming 
that are not fully justified in the C standards.  Sometimes the 
assumptions are just bad, or it would be easy to write code without the 
assumptions.  But at other times it would be very awkward or inefficient 
to write code that is completely "safe" (in terms of having fully 
defined behaviour from the C standards or from implementation-dependent 
behaviour).  Making your own dynamic memory allocation functions is one 
such case.  So I have a tendency to jump on any suggestion of changes to 
the C (or C++) standards that could let people write such essential code 
in a safer or more efficient manner.
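
As an illustration of the kind of code I mean, here is a minimal bump
allocator over an unsigned char array.  All names and the pool size are
hypothetical.  Note that with the current wording of 6.5p6 the array
keeps its declared type as its effective type, so using the returned
memory as other types is exactly the existing practice that the
proposed wording would cover.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define POOL_SIZE 1024u   /* hypothetical capacity, pick to suit */

/* Backing store: a byte array, aligned for any standard type. */
static alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Hand out size bytes aligned to align (a power of two no larger than
   alignof(max_align_t)); returns NULL when the pool is exhausted.
   There is no free(): memory is only ever released all at once. */
static void *pool_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= alignof(max_align_t));

    /* Round the current offset up to the requested alignment. */
    size_t start = (pool_used + align - 1) & ~(align - 1);
    if (start > POOL_SIZE || size > POOL_SIZE - start)
        return NULL;

    pool_used = start + size;
    return &pool[start];
}

/* Release everything that has been allocated from the pool. */
static void pool_reset(void)
{
    pool_used = 0;
}
```

A caller would write something like
"double *d = pool_alloc(sizeof *d, alignof(double));" and then store
through d.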

>> (It is also not uncommon to have the backing space allocated by the
>> linker, but then it falls under the existing "no declared type" case.)
> 
> Yes, although with the change we would make the "no declared type" also
> be byte arrays, so there is then simply no difference anymore.
> 

Fair enough.  (Linker-defined storage does not just have no declared 
type, it has no directly declared size or other properties either.  The 
start and the stop of the storage area is typically declared as "extern 
uint8_t __space_start[], __space_stop[];", or perhaps as single 
characters or uint32_t types.  The space in between is just calculated 
as the difference between pointers to these.)
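
A sketch of that size calculation, with a hypothetical function name.
The bounds are passed in as parameters so that the example does not
depend on any particular linker script:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In a real project the bounds would come from the linker script:
       extern uint8_t __space_start[], __space_stop[];
   and the call would be space_size(__space_start, __space_stop). */
static size_t space_size(const uint8_t *start, const uint8_t *stop)
{
    assert(start != NULL && stop != NULL && start <= stop);
    return (size_t)(stop - start);
}
```

Strictly speaking the standard only defines pointer subtraction within
a single array object, so with two separately declared symbols this is
another place where the code relies on the implementation.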

>>
>>
>> I would not want uint32_t to be considered an "alias anything" type, but
>> I have occasionally seen such types used for memory store backings.  It
>> is perhaps worth considering defining "byte type" as "non-atomic
>> character type, [u]int8_t (if they exist), or other
>> implementation-defined types".
> 
> This could make sense, the question is whether we want to encourage
> the use of other types for this use case, as this would then not
> be portable.

I think uint8_t should be highly portable, except to targets where it 
does not exist (and in this day and age, that basically means some DSP 
devices that have 16-bit, 24-bit or 32-bit char).

There is precedent for this wording, however, in 6.7.2.1p5 for 
bit-fields - "A bit-field shall have a type that is a qualified or 
unqualified version of _Bool, signed int, unsigned int, or some other 
implementation-defined type".

I think it should be clear enough that using an implementation-defined 
type rather than a character type would potentially limit portability. 
For the kinds of systems I am thinking of, extreme portability is 
normally not of prime concern - efficiency on a particular target with a 
particular compiler is often more important.

> 
> Are there important reasons for not using "unsigned char" ?
> 

What is "important" is often a subjective matter.  One reason many 
people use "uint8_t" is that they prefer to be explicit about sizes, and 
would rather have a hard error if the code is used on a target that 
doesn't support the size.  Some coding standards, such as the very 
common (though IMHO somewhat flawed) MISRA standard, strongly encourage 
size-specific types and consider the use of "int" or "unsigned char" as 
a violation of their rules and directives.  Many libraries and code 
bases with a history older than C99 have their own typedef names for 
size-specific types or low-level storage types, such as "sys_uint8", 
"BYTE", "u8", and so on, and users may prefer these for consistency. 
And for people with a background in hardware or assembly (not uncommon 
for small systems embedded programming), or other languages such as 
Rust, "unsigned char" sounds vague, poorly defined, and somewhat 
meaningless as a type name for a raw byte of memory or a minimal sized 
unsigned integer.

Of course most alternative names for bytes would be typedefs of 
"unsigned char" and therefore work just the same way.  But as noted 
before, uint8_t could be defined in another manner on some systems (and 
on GCC for the AVR, it /is/ defined in a different way - though I have 
no idea why).

And bigger types, such as uint32_t, have been used to force alignment 
for backing store (either because the compiler did not support _Alignas, 
or the programmer did not know about it).  (But I am not suggesting that 
plain "uint32_t" should be considered a "byte type" for aliasing purposes.)
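
For comparison, a sketch of the two styles side by side (the names and
the size are hypothetical).  Both give storage aligned for uint32_t,
but only the second is declared with a character type:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define HEAP_BYTES 256u   /* hypothetical size */

/* Older style: a uint32_t array used only to get 4-byte alignment. */
static uint32_t heap_words[HEAP_BYTES / sizeof(uint32_t)];

/* C11 style: a byte array with the alignment requested directly. */
static alignas(uint32_t) unsigned char heap_bytes[HEAP_BYTES];

static_assert(sizeof heap_words == sizeof heap_bytes,
              "both stores have the same capacity");

/* 1 if p is suitably aligned for uint32_t, 0 otherwise. */
static int is_word_aligned(const void *p)
{
    return (uintptr_t)p % alignof(uint32_t) == 0;
}
```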

>>
>> Some other compilers might guarantee not to do type-based alias analysis
>> and thus view all types as "byte types" in this way.  For gcc, there
>> could be a kind of reverse "may_alias" type attribute to create such types.
>>
>>
>>
>> There are a number of other features that could make allocation
>> functions more efficient and safer in use, and which could ideally be
>> standardised in the C standards or at least added as gcc extensions, but
>> I think that's more than you are looking for here!
> 
> It is possible to submit a proposal to WG14.
> 

Yes, I know.  But giving you some feedback here is a step in that 
direction - even if it turns out that it doesn't affect your wording in 
the end.

David


> Martin
> 
> 
>>
>> David
>>
>>
>>
>> On 18/03/2024 08:03, Martin Uecker via Gcc wrote:
>>>
>>> Hi,
>>>
>>> can you please take a quick look at this? This is intended to align
>>> the C standard with existing practice with respect to aliasing by
>>> removing the special rules for "objects with no declared type" and
>>> making it fully symmetric and only based on types with non-atomic
>>> character types being able to alias everything.
>>>
>>>
>>> Unrelated to this change, I have another question:  I wonder if GCC
>>> (or any other compiler) actually exploits the "or is copied as an
>>> array of byte type," rule to make assumptions about the effective
>>> types of the target array? I know compilers do this for memcpy...
>>> Maybe also if a loop is transformed to memcpy?
>>>
>>> Martin
>>>
>>>
>>> Add the following definition after 3.5, paragraph 2:
>>>
>>> byte array
>>> object having either no declared type or an array of objects declared with a byte type
>>>
>>> byte type
>>> non-atomic character type
>>>
>>> Modify 6.5, paragraph 6:
>>> The effective type of an object that is not a byte array, for an access to its
>>> stored value, is the declared type of the object.97) If a value is
>>> stored into a byte array through an lvalue having a byte type, then
>>> the type of the lvalue becomes the effective type of the object for that
>>> access and for subsequent accesses that do not modify the stored value.
>>> If a value is copied into a byte array using memcpy or memmove, or is
>>> copied as an array of byte type, then the effective type of the
>>> modified object for that access and for subsequent accesses that do not
>>> modify the value is the effective type of the object from which the
>>> value is copied, if it has one. For all other accesses to a byte array,
>>> the effective type of the object is simply the type of the lvalue used
>>> for the access.
>>>
>>> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
>>>
>>>
>>>
>>
>