From: Martin Uecker <uecker@tugraz.at>
To: David Brown <david.brown@hesbynett.no>
Cc: gcc@gcc.gnu.org
Subject: Re: aliasing
Date: Mon, 18 Mar 2024 16:00:50 +0100
Message-ID: <09f7c87f08fc25083975dd9fd5250cdcbb02993e.camel@tugraz.at>
In-Reply-To: <6bff9afd-3e84-4260-9d05-8faec5f3ebe2@hesbynett.no>
On Monday, 18.03.2024 at 14:29 +0100, David Brown wrote:
>
> On 18/03/2024 12:41, Martin Uecker wrote:
> >
> >
> > Hi David,
> >
> > On Monday, 18.03.2024 at 10:00 +0100, David Brown wrote:
> > > Hi,
> > >
> > > I would be very glad to see this change in the standards.
> > >
> > >
> > > Should "byte type" include all character types (signed, unsigned and
> > > plain), or should it be restricted to "unsigned char" since that is the
> > > "byte" type ? (I think allowing all character types makes sense, but
> > > only unsigned char is guaranteed to be suitable for general object
> > > backing store.)
> >
> > At the moment, the special types that can access all others are
> > all non-atomic character types. So for symmetry reasons, it
> > seems that this is also what we want for backing store.
> >
> > I am not sure what you mean by "only unsigned char". Are you talking
> > about C++? "unsigned char" has no special role in C.
> >
>
> "unsigned char" does have a special role in C - in 6.2.6.1p4 it
> describes any object as being able to be copied to an array of unsigned
> char to get the "object representation". The same is not true for an
> array of "signed char". I think it would be possible to have an
> implementation where "signed char" was 8-bit two's complement except
> that 0x80 would be a trap representation rather than -128. I am not
> sure of the consequences of such an implementation (assuming I am even
> correct in it being allowed).

Yes, but with C23 this is not possible anymore. I think signed
char or char should work equally well now.
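
A small illustration of this point (a sketch, not code from the thread; the function name same_bytes_either_way is made up for the example). It reads the bytes of an object once through unsigned char and once through signed char, and checks that converting each signed value back to unsigned char gives the same byte. This holds on a two's complement implementation where every bit pattern of signed char is a valid value, which is the situation the reply above describes for C23:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Range check: on a two's complement implementation with no excluded
   bit pattern, signed char covers exactly -SCHAR_MAX-1 .. SCHAR_MAX. */
_Static_assert(SCHAR_MIN == -SCHAR_MAX - 1,
               "signed char does not use the full two's complement range");

/* Returns 1 if the n bytes at obj look the same whether they are read
   through unsigned char or through signed char, 0 otherwise. */
int same_bytes_either_way(const void *obj, size_t n)
{
    const unsigned char *u = obj;
    const signed char *s = obj;

    for (size_t i = 0; i < n; i++) {
        /* conversion to unsigned char is modulo 2^CHAR_BIT */
        if ((unsigned char)s[i] != u[i])
            return 0;
    }
    return 1;
}
```

Calling it on any object, for example an unsigned long holding a mix of high and low bytes, should return 1 under these assumptions.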
>
> > >
> > > Should it also include "uint8_t" (if it exists) ? "uint8_t" is often an
> > > alias for "unsigned char", but it could be something different, like an
> > > alias for __UINT8_TYPE__, or "unsigned int
> > > __attribute__((mode(QImode)))", which is used in the AVR gcc port.
> >
> > I think this might be a reason to not include it, as it could
> > affect aliasing analysis. At least, this would be a different
> > independent change to consider.
> >
>
> I think it is important that there is a guarantee here, because people
> do use uint8_t as a generic "raw memory" type. Embedded standards like
> MISRA strongly discourage the use of "unsized" types such as "unsigned
> char", and it is generally assumed that "uint8_t" has the aliasing
> superpowers of a character type. But it is possible that a change
> would be better put in the library section on <stdint.h> rather than
> this section.
>
> > >
> > > In my line of work - small-systems embedded development - it is common
> > > to have "home-made" or specialised memory allocation systems rather than
> > > relying on a generic heap. This is, I think, some of the "existing
> > > practice" that you are considering here - there is a "backing store" of
> > > some sort that can be allocated and used as objects of a type other than
> > > the declared type of the backing store. While a simple unsigned char
> > > array is a very common kind of backing store, there are others that are
> > > used, and it would be good to be sure of the correctness guarantees for
> > > these. Possibilities that I have seen include:
> > >
> > > unsigned char heap1[N];
> > >
> > > uint8_t heap2[N];
> > >
> > > union {
> > > double dummy_for_alignment;
> > > char heap[N];
> > > } heap3;
> > >
> > > struct {
> > > uint32_t capacity;
> > > uint8_t * p_next_free;
> > > uint8_t heap[N];
> > > } heap4;
> > >
> > > uint32_t heap5[N];
> > >
> > > Apart from this last one, if "uint8_t" is guaranteed to be a "byte
> > > type", then I believe your wording means that these unions and structs
> > > would also work as "byte arrays". But it might be useful to add a
> > > footnote clarifying that.
> > >
> >
> > I need to think about this.
> >
>
> Thank you.
>
> I see people making a lot of assumptions in their embedded programming
> that are not fully justified in the C standards. Sometimes the
> assumptions are just bad, or it would be easy to write code without the
> assumptions. But at other times it would be very awkward or inefficient
> to write code that is completely "safe" (in terms of having fully
> defined behaviour from the C standards or from implementation-dependent
> behaviour). Making your own dynamic memory allocation functions is one
> such case. So I have a tendency to jump on any suggestion of changes to
> the C (or C++) standards that could let people write such essential code
> in a safer or more efficient manner.

That something is undefined does not automatically mean it is
forbidden or unsafe. It simply means it is not portable. I think
in the embedded space it will be difficult to make everything well
defined. But I fully agree that widely used techniques should
ideally be based on defined behavior and we should change the
standard accordingly.
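
For readers unfamiliar with the technique under discussion, here is a minimal sketch of a home-made allocator on top of an unsigned char backing store, similar in spirit to the "heap1" declaration quoted above. It is not code from the thread: the names pool, POOL_SIZE, pool_alloc and pool_reset are hypothetical, and it assumes a C11 compiler for _Alignas and max_align_t. Whether using the returned memory as objects of another type is fully defined is exactly the question the proposed "byte array" wording is meant to settle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024u   /* hypothetical capacity in bytes */

/* Backing store: an array of unsigned char, aligned for any object type. */
static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static size_t pool_used;  /* offset of the first free byte */

/* Hand out size bytes aligned to align, or NULL if the pool is exhausted.
   align must be a power of two no larger than _Alignof(max_align_t). */
void *pool_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align <= _Alignof(max_align_t));

    /* round the current offset up to the requested alignment */
    size_t start = (pool_used + align - 1) & ~(align - 1);

    if (start > POOL_SIZE || size > POOL_SIZE - start)
        return NULL;

    pool_used = start + size;
    return pool + start;
}

/* Release everything at once; a bump allocator has no per-object free. */
void pool_reset(void)
{
    pool_used = 0;
}
```

A caller would write something like "struct msg *m = pool_alloc(sizeof *m, _Alignof(struct msg));" and check the result for NULL before use.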
>
> > > (It is also not uncommon to have the backing space allocated by the
> > > linker, but then it falls under the existing "no declared type" case.)
> >
> > Yes, although with the change we would make the "no declared type" also
> > be byte arrays, so there is then simply no difference anymore.
> >
>
> Fair enough. (Linker-defined storage does not just have no declared
> type, it has no directly declared size or other properties either. The
> start and the stop of the storage area is typically declared as "extern
> uint8_t __space_start[], __space_stop[];", or perhaps as single
> characters or uint32_t types. The space in between is just calculated
> as the difference between pointers to these.)
>
> > >
> > >
> > > I would not want uint32_t to be considered an "alias anything" type, but
> > > I have occasionally seen such types used for memory store backings. It
> > > is perhaps worth considering defining "byte type" as "non-atomic
> > > character type, [u]int8_t (if they exist), or other
> > > implementation-defined types".
> >
> > This could make sense, the question is whether we want to encourage
> > the use of other types for this use case, as this would then not
> > be portable.
>
> I think uint8_t should be highly portable, except to targets where it
> does not exist (and in this day and age, that basically means some DSP
> devices that have 16-bit, 24-bit or 32-bit char).
>
> There is precedent for this wording, however, in 6.7.2.1p5 for
> bit-fields - "A bit-field shall have a type that is a qualified or
> unqualified version of _Bool, signed int, unsigned int, or some other
> implementation-defined type".
>
> I think it should be clear enough that using an implementation-defined
> type rather than a character type would potentially limit portability.
> For the kinds of systems I am thinking of, extreme portability is
> normally not of prime concern - efficiency on a particular target with a
> particular compiler is often more important.

Thanks, I will bring back this information to WG14.
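
As a side note on the uint8_t question: code that relies on uint8_t behaving like a character type can at least check how the implementation defines it. The following is a sketch only, assuming a C11 compiler; the function name uint8_is_unsigned_char is made up for the example. It uses _Generic to test whether uint8_t is the very same type as unsigned char. It does not tell you what a compiler's alias analysis does with a distinct uint8_t type.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* If uint8_t exists at all, it is exactly 8 bits wide with no padding,
   so it occupies a single byte and CHAR_BIT is 8. */
_Static_assert(sizeof(uint8_t) == 1 && CHAR_BIT == 8,
               "uint8_t is expected to be a single 8-bit byte");

/* Returns 1 if uint8_t is the same type as unsigned char on this
   implementation, 0 if it is some other (distinct) type. */
int uint8_is_unsigned_char(void)
{
    return _Generic((uint8_t)0, unsigned char: 1, default: 0);
}
```

A project that wants a hard error instead of a run-time answer could put the same _Generic expression inside a _Static_assert.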
>
> >
> > Is there an important reason for not using "unsigned char" ?
> >
>
> What is "important" is often a subjective matter. One reason many
> people use "uint8_t" is that they prefer to be explicit about sizes, and
> would rather have a hard error if the code is used on a target that
> doesn't support the size. Some coding standards, such as the very
> common (though IMHO somewhat flawed) MISRA standard, strongly encourage
> size-specific types and consider the use of "int" or "unsigned char" as
> a violation of their rules and directives. Many libraries and code
> bases with a history older than C99 have their own typedef names for
> size-specific types or low-level storage types, such as "sys_uint8",
> "BYTE", "u8", and so on, and users may prefer these for consistency.
> And for people with a background in hardware or assembly (not uncommon
> for small systems embedded programming), or other languages such as
> Rust, "unsigned char" sounds vague, poorly defined, and somewhat
> meaningless as a type name for a raw byte of memory or a minimal sized
> unsigned integer.
>
> Of course most alternative names for bytes would be typedefs of
> "unsigned char" and therefore work just the same way. But as noted
> before, uint8_t could be defined in another manner on some systems (and
> on GCC for the AVR, it /is/ defined in a different way - though I have
> no idea why).
>
> And bigger types, such as uint32_t, have been used to force alignment
> for backing store (either because the compiler did not support _Alignas,
> or the programmer did not know about it). (But I am not suggesting that
> plain "uint32_t" should be considered a "byte type" for aliasing purposes.)
>
> > >
> > > Some other compilers might guarantee not to do type-based alias analysis
> > > and thus view all types as "byte types" in this way. For gcc, there
> > > could be a kind of reverse "may_alias" type attribute to create such types.
> > >
> > >
> > >
> > > There are a number of other features that could make allocation
> > > functions more efficient and safer in use, and which could ideally be
> > > standardised in the C standards or at least added as gcc extensions, but
> > > I think that's more than you are looking for here!
> >
> > It is possible to submit a proposal to WG14.
> >
>
> Yes, I know. But giving you some feedback here is a step in that
> direction - even if it turns out that it doesn't affect your wording in
> the end.

Any kind of feedback is very welcome. Thank you!

Martin

> > > On 18/03/2024 08:03, Martin Uecker via Gcc wrote:
> > > >
> > > > Hi,
> > > >
> > > > can you please take a quick look at this? This is intended to align
> > > > the C standard with existing practice with respect to aliasing by
> > > > removing the special rules for "objects with no declared type" and
> > > > making it fully symmetric and only based on types with non-atomic
> > > > character types being able to alias everything.
> > > >
> > > >
> > > > Unrelated to this change, I have another question: I wonder if GCC
> > > > (or any other compiler) actually exploits the " or is copied as an
> > > > array of byte type, " rule to make assumptions about the effective
> > > > types of the target array? I know compilers do this for memcpy...
> > > > Maybe also if a loop is transformed to memcpy?
> > > >
> > > > Martin
> > > >
> > > >
> > > > Add the following definition after 3.5, paragraph 2:
> > > >
> > > > byte array
> > > > object having either no declared type or an array of objects declared with a byte type
> > > >
> > > > byte type
> > > > non-atomic character type
> > > >
> > > > Modify 6.5, paragraph 6:
> > > > The effective type of an object that is not a byte array, for an access to its
> > > > stored value, is the declared type of the object.97) If a value is
> > > > stored into a byte array through an lvalue having a type that is not
> > > > a byte type, then the type of the lvalue becomes the effective type of
> > > > the object for that access and for subsequent accesses that do not
> > > > modify the stored value.
> > > > If a value is copied into a byte array using memcpy or memmove, or is
> > > > copied as an array of byte type, then the effective type of the
> > > > modified object for that access and for subsequent accesses that do not
> > > > modify the value is the effective type of the object from which the
> > > > value is copied, if it has one. For all other accesses to a byte array,
> > > > the effective type of the object is simply the type of the lvalue used
> > > > for the access.
> > > >
> > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3230.pdf
> > > >
> > > >
> > > >
> > >
> >
--
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging
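
Illustration of the two copy forms named in the quoted 6.5 paragraph 6 wording ("copied ... using memcpy or memmove, or ... copied as an array of byte type"). This is a sketch for orientation, not code from the thread, and the function names roundtrip_memcpy and roundtrip_bytewise are made up. Both functions move a uint32_t into an unsigned char array and back without ever accessing the array through a uint32_t lvalue. Whether a compiler treats the second form like the first, for instance by turning the loop into a memcpy, is the open question asked in the original message and is not answered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy through a byte array using memcpy in both directions. */
uint32_t roundtrip_memcpy(uint32_t value)
{
    unsigned char store[sizeof value];
    uint32_t result;

    memcpy(store, &value, sizeof value);    /* object -> byte array */
    memcpy(&result, store, sizeof result);  /* byte array -> object */
    return result;
}

/* The same copy written as explicit loops over unsigned char lvalues. */
uint32_t roundtrip_bytewise(uint32_t value)
{
    unsigned char store[sizeof value];
    uint32_t result;
    const unsigned char *src = (const unsigned char *)&value;
    unsigned char *dst = (unsigned char *)&result;

    for (size_t i = 0; i < sizeof value; i++)
        store[i] = src[i];
    for (size_t i = 0; i < sizeof result; i++)
        dst[i] = store[i];
    return result;
}
```

Both functions return their argument unchanged, since every byte of the object representation is copied.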