* Assignment of union containing const-qualifier member @ 2024-01-30 21:45 Alejandro Colomar 2024-01-31 18:14 ` Alejandro Colomar 0 siblings, 1 reply; 12+ messages in thread From: Alejandro Colomar @ 2024-01-30 21:45 UTC (permalink / raw) To: gcc-help [-- Attachment #1: Type: text/plain, Size: 1564 bytes --] Hi, I'm trying to do something like the following: $ cat const.c union u { int a; const int b; }; int main(void) { union u u, v; u.a = 42; v = u; } $ cc -Wall -Wextra const.c const.c: In function ‘main’: const.c:12:11: error: assignment of read-only variable ‘v’ 12 | v = u; | ^ const.c:9:21: warning: variable ‘v’ set but not used [-Wunused-but-set-variable] 9 | union u u, v; | ^ The actual data I'm using is not just an int, but that serves to reproduce the problem easily. In reality, the union is more like this: struct rstr { const size_t length; const char *const start; }; union str { struct { size_t length; char *start; } w; struct rstr r; }; I don't see anywhere in C11 that makes this a constraint violation, and considering that the const member is fully overlapped by a non-const member, I don't see why this assignment would be bad. I couldn't find what's the exact semantics of assignment to a union, but I guess it's similar to initialization, and since the first member is non-const, this should be doable. Maybe it's a bit of unspecified behavior, precisely because nothing seems to specify it. Is that really a constraint violation, or should the compiler accept the code? Have a lovely day, Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-01-30 21:45 Assignment of union containing const-qualifier member Alejandro Colomar @ 2024-01-31 18:14 ` Alejandro Colomar 2024-02-04 7:33 ` Amol Surati 0 siblings, 1 reply; 12+ messages in thread From: Alejandro Colomar @ 2024-01-31 18:14 UTC (permalink / raw) To: gcc-help [-- Attachment #1: Type: text/plain, Size: 2311 bytes --] On Tue, Jan 30, 2024 at 10:45:11PM +0100, Alejandro Colomar wrote: > Hi, > > I'm trying to do something like the following: > > $ cat const.c > union u { > int a; > const int b; > }; > > int > main(void) > { > union u u, v; > > u.a = 42; > v = u; > } > $ cc -Wall -Wextra const.c > const.c: In function ‘main’: > const.c:12:11: error: assignment of read-only variable ‘v’ > 12 | v = u; > | ^ > const.c:9:21: warning: variable ‘v’ set but not used [-Wunused-but-set-variable] > 9 | union u u, v; > | ^ > > The actual data I'm using is not just an int, but that serves to > reproduce the problem easily. In reality, the union is more like this: > > struct rstr { > const size_t length; > const char *const start; > }; > > union str { > struct { > size_t length; > char *start; > } w; > struct rstr r; > }; > > I don't see anywhere in C11 that makes this a constraint violation, and Ahh, I didn't find it. It's specified undex 6.3 (Conversions), 6.3.2.1 (Lvalues, arrays, and function designators) <http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1>: < A modifiable lvalue is an lvalue that does not have array type, does < not have an incomplete type, does not have a const- qualified type, < and if it is a structure or union, does not have any member < (including, recursively, any member or element of all contained < aggregates or unions) with a const- qualified type. I'm wondering if this is a bit conservative, and a union like this could be a modifiable lvalue. I find it useful, since it allows inheriting a non-modifiable version of the string which cannot be made modifiable again, but would let you edit the string as long as you keep the union. I added the named member 'w' to allow copying v.w = u.w, but when the union is part of a larger structure and I need to copy the entire structure, that doesn't help. memcpy(3) does help, but it looses all type safety. Maybe this could be allowed as an extension. Any thoughts? Cheers, Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-01-31 18:14 ` Alejandro Colomar @ 2024-02-04 7:33 ` Amol Surati 2024-02-04 8:18 ` Amol Surati 2024-02-04 18:40 ` Alejandro Colomar 0 siblings, 2 replies; 12+ messages in thread From: Amol Surati @ 2024-02-04 7:33 UTC (permalink / raw) To: Alejandro Colomar; +Cc: gcc-help On Wed, 31 Jan 2024 at 23:46, Alejandro Colomar via Gcc-help <gcc-help@gcc.gnu.org> wrote: > > On Tue, Jan 30, 2024 at 10:45:11PM +0100, Alejandro Colomar wrote: > > Hi, > > [ ... ] > structure, that doesn't help. memcpy(3) does help, but it looses all > type safety. > > Maybe this could be allowed as an extension. Any thoughts? > Does it make sense to propose that, if the first top-level member of a union is completely (i.e. recursively) writable, then a non-const union object as a whole is writable? If so, then, for union objects a and b of a union that has such const members, a = b can be expected to not raise errors about const-correctness. It seems that a union only provides a view of the object. The union object doesn't automatically become const qualified if a member of the union is const-qualified. This seems to be the reason v.w = u.w works; otherwise, that modification can also be viewed as the modification of an object (v.r) defined with a const-qualified type through the use of an lvalue (v.w) with non-const-qualified type - something that's forbidden by the std. More towards the use of the string as described: If there are multiple such union objects that point to the same string, and if a piece of code decides to modify the string, other consumers of this string remain unaware of the modification, unless they check for it, for e.g., by keeping a copy, calc. hash, etc., to ensure that the string was indeed not silently modified behind their backs. I think it is better to have a 'class' and associated APIs. See [1], for e.g., or the implementation of c++ std::string. The ownership of an object of such a class can be passed by passing a non-const pointer to the object. Functions that are not supposed to own the object can be passed a const pointer. Despite that, if such functions need to modify it for local needs, they can create a copy to work with. One can additionally maintain a ref-count on the char pointer, to avoid having to unnecessarily copy a string if it is going to be placed in several stay-resident-after-return data-structures. -Amol [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3210.pdf > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> > Looking for a remote C programming job at the moment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-04 7:33 ` Amol Surati @ 2024-02-04 8:18 ` Amol Surati 2024-02-04 18:40 ` Alejandro Colomar 1 sibling, 0 replies; 12+ messages in thread From: Amol Surati @ 2024-02-04 8:18 UTC (permalink / raw) To: Alejandro Colomar; +Cc: gcc-help On Sun, 4 Feb 2024 at 13:03, Amol Surati <suratiamol@gmail.com> wrote: > > On Wed, 31 Jan 2024 at 23:46, Alejandro Colomar via Gcc-help > <gcc-help@gcc.gnu.org> wrote: > > > > On Tue, Jan 30, 2024 at 10:45:11PM +0100, Alejandro Colomar wrote: > > > Hi, > > > > > [ ... ] > > > structure, that doesn't help. memcpy(3) does help, but it looses all > > type safety. > > > > Maybe this could be allowed as an extension. Any thoughts? > > > > Does it make sense to propose that, if the first top-level member of a > union is completely (i.e. recursively) writable, then a non-const union > object as a whole is writable? If so, then, for union objects a and b of > a union that has such const members, a = b can be expected to not > raise errors about const-correctness. This doesn't look okay- what if the last written-to member was not the first member of the union? The assignment can either copy the full size of the union (i.e. sizeof (b)) or copy just the member that was last written to within b. The latter requires the compiler to maintain an 'active' member of unions, which I believe isn't expected from the compilers. I believe the compiler performs an assignment a=b through memcpy (or through statements that have similar effects as that of memcpy) that, as you noticed, ignores type-safety. It may be because of such obstacles that the presence of a const member in the union prevents a union object from being a modifiable lvalue. > > It seems that a union only provides a view of the object. The union > object doesn't automatically become const qualified if a member > of the union is const-qualified. This seems to be the reason v.w = u.w > works; otherwise, that modification can also be viewed as the > modification of an object (v.r) defined with a const-qualified type through > the use of an lvalue (v.w) with non-const-qualified type - something that's > forbidden by the std. > > More towards the use of the string as described: > If there are multiple such union objects that point to the same string, > and if a piece of code decides to modify the string, other consumers of > this string remain unaware of the modification, unless they check for it, > for e.g., by keeping a copy, calc. hash, etc., to ensure that the string was > indeed not silently modified behind their backs. > > I think it is better to have a 'class' and associated APIs. > See [1], for e.g., or the implementation of c++ std::string. > > The ownership of an object of such a class can be passed by passing > a non-const pointer to the object. > > Functions that are not supposed to own the object can be passed a > const pointer. Despite that, if such functions need to modify it for local > needs, they can create a copy to work with. > > One can additionally maintain a ref-count on the char pointer, to avoid > having to unnecessarily copy a string if it is going to be placed in several > stay-resident-after-return data-structures. > > -Amol > > [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3210.pdf > > > Cheers, > > Alex > > > > -- > > <https://www.alejandro-colomar.es/> > > Looking for a remote C programming job at the moment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-04 7:33 ` Amol Surati 2024-02-04 8:18 ` Amol Surati @ 2024-02-04 18:40 ` Alejandro Colomar 2024-02-04 20:37 ` Alejandro Colomar 2024-02-07 4:13 ` Amol Surati 1 sibling, 2 replies; 12+ messages in thread From: Alejandro Colomar @ 2024-02-04 18:40 UTC (permalink / raw) To: Amol Surati; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 8760 bytes --] Hi Amol, On Sun, Feb 04, 2024 at 01:03:48PM +0530, Amol Surati wrote: > On Wed, 31 Jan 2024 at 23:46, Alejandro Colomar via Gcc-help > <gcc-help@gcc.gnu.org> wrote: > > > > On Tue, Jan 30, 2024 at 10:45:11PM +0100, Alejandro Colomar wrote: > > > Hi, > > > > > [ ... ] > > > structure, that doesn't help. memcpy(3) does help, but it looses all > > type safety. > > > > Maybe this could be allowed as an extension. Any thoughts? > > > > Does it make sense to propose that, if the first top-level member of a > union is completely (i.e. recursively) writable, then a non-const union > object as a whole is writable? If so, then, for union objects a and b of > a union that has such const members, a = b can be expected to not > raise errors about const-correctness. To have a specific proposal, I'll specify it as a diff of ISO C11: $ diff -u c11 suggestion --- c11 2024-02-04 19:37:27.520851005 +0100 +++ suggestion 2024-02-04 19:38:56.785402567 +0100 @@ -8,8 +8,8 @@ does not have array type, does not have an incomplete type, does not have a const- qualified type, -and if it is a structure or union, -does not have any member +and if it is a structure does not have any member, +or if it is a union does not have all members, (including, recursively, any member or element of all contained aggregates or unions) with a const- qualified type. (Modifying <http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p1>) > > It seems that a union only provides a view of the object. The union > object doesn't automatically become const qualified if a member > of the union is const-qualified. This seems to be the reason v.w = u.w > works; otherwise, that modification can also be viewed as the > modification of an object (v.r) defined with a const-qualified type through > the use of an lvalue (v.w) with non-const-qualified type - something that's > forbidden by the std. Modifying a union via a non-const member is fine in C, I believe. I think you're creating a new object, and discarding the old one, so you don't need to care if there was an old object defined via a const-qualified type. That is, the following code is valid C, AFAIK: alx@debian:~/tmp$ cat u.c union u { int a; const int b; }; int main(void) { union u u = {.b = 42}; u.a = 7; return u.b; } alx@debian:~/tmp$ gcc-14 -Wall -Wextra u.c alx@debian:~/tmp$ ./a.out ; echo $? 7 alx@debian:~/tmp$ clang-17 -Weverything u.c alx@debian:~/tmp$ ./a.out ; echo $? 7 > More towards the use of the string as described: > If there are multiple such union objects that point to the same string, > and if a piece of code decides to modify the string, other consumers of > this string remain unaware of the modification, unless they check for it, > for e.g., by keeping a copy, calc. hash, etc., to ensure that the string was > indeed not silently modified behind their backs. `const` only guarantees that an object is not modified through that pointer. As long as you keep another pointer to the same object, it can be modified via that other pointer. To guarantee that an object is really constant --at least for what concerns a function--, you need to also specify `restrict`. If you have a `const type* restrict`, then you know for sure it is constant, as far as the current function is concerned. If you're worried about multi-threaded programs, well, unions aren't any more problematic here than passing a `const T *restrict` to a function, and modifying it in another thread via a non-const lvalue. As long as the original object wasn't const, that's fair game. It's the programmer's task to make sure the functions behave well if that can happen. > > I think it is better to have a 'class' and associated APIs. But we can't have that in C. > See [1], for e.g., or the implementation of c++ std::string. > > The ownership of an object of such a class can be passed by passing > a non-const pointer to the object. > > Functions that are not supposed to own the object can be passed a > const pointer. Despite that, if such functions need to modify it for local > needs, they can create a copy to work with. > > One can additionally maintain a ref-count on the char pointer, to avoid > having to unnecessarily copy a string if it is going to be placed in several > stay-resident-after-return data-structures. I normally prefer simple C strings, with a simple pointer. The reason I'm using this struct+union is performance. In nginx, to reduce memory consumption (you can get substrings by copying a pointer and specifying a length), and also avoid calculating lengths of strings more than once, we use these structures. So far, we were using a simple struct: typedef struct string { size_t length; u_char *start; } string_t; // it has a different name, but let's keep it simple But that means we basically can't use `const` at all with our strings. Because if you specify void foo(const string_t *str); that means that you can't modify the pointer, but you can actually modify the pointee. Which means that you can't guarantee that a string isn't corrupted after some call, unless you inspect all the code that the function calls, recursively. I started working on a way to improve these strings around a year ago, and have recently come up with something. > > -Amol > > [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3210.pdf Maybe you can get something from what I've learnt with strings in Nginx, since they're quite close to what that proposal has. The main concern I have with that proposal is the same concern I've had with strings in Nginx so far: you can't really make them `const`. Unless you make the type opaque, and only provide accessors via functions that protect the strings even if they could modify them. You can only make them const, if you use two distinct types: a read-only version, let's call it rstring, and a read-write version, let's call it string. struct rstring_s { size_t length; const char *start; }; union nxt_string_u { struct { size_t length; char *start; }; struct { size_t length; char *start; } w; const rstring_t r; }; In Nginx we have another complexity: we don't necessarily terminate our strings: this allows getting a substring in the middle of another string without needing to make an actual copy of the memory. But then it means we need more types to have type safety. I haven't finished developing that, so I can't tell you if the code below does work, but this is what I'm really working with at the moment: struct nxt_rstr_s { size_t length; const u_char *start; }; union nxt_str_u { struct { size_t length; u_char *start; }; struct { size_t length; u_char *start; } w; const nxt_rstr_t r; }; union nxt_rstrz_u { struct { size_t length; union { const u_char *start; const char *cstrz; }; }; struct { size_t length; const u_char *start; } w; const nxt_rstr_t r; }; union nxt_strz_u { struct { size_t length; union { u_char *start; char *cstrz; }; }; struct { size_t length; u_char *start; } w; const nxt_rstr_t r; const nxt_rstrz_t rz; }; Structures `***z` contain null-terminated strings, while the other ones don't. You can read terminated strings as non-terminated ones, but not the other way. And you can access writable strings as read-only strings, but not the other way around. (We use `u_char` to avoid the problems that `char` has due to its ambiguous sign; I would personally prefer using -funsigned-char, but that's what it is, for historic reasons.) Anyway, that `u_char` makes sure we don't mix our strings with libc calls accidentally, and I only provide the `cstrz` member in unions that actually provide a libc-compatible string view. Have a lovely day, Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-04 18:40 ` Alejandro Colomar @ 2024-02-04 20:37 ` Alejandro Colomar 2024-02-07 4:13 ` Amol Surati 1 sibling, 0 replies; 12+ messages in thread From: Alejandro Colomar @ 2024-02-04 20:37 UTC (permalink / raw) To: Amol Surati; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 2524 bytes --] On Sun, Feb 04, 2024 at 07:40:23PM +0100, Alejandro Colomar wrote: > > It seems that a union only provides a view of the object. The union > > object doesn't automatically become const qualified if a member > > of the union is const-qualified. This seems to be the reason v.w = u.w > > works; otherwise, that modification can also be viewed as the > > modification of an object (v.r) defined with a const-qualified type through > > the use of an lvalue (v.w) with non-const-qualified type - something that's > > forbidden by the std. > > Modifying a union via a non-const member is fine in C, I believe. I > think you're creating a new object, and discarding the old one, so you > don't need to care if there was an old object defined via a > const-qualified type. That is, the following code is valid C, AFAIK: > > alx@debian:~/tmp$ cat u.c > union u { > int a; > const int b; > }; > > int > main(void) > { > union u u = {.b = 42}; > > u.a = 7; > return u.b; > } > alx@debian:~/tmp$ gcc-14 -Wall -Wextra u.c > alx@debian:~/tmp$ ./a.out ; echo $? > 7 > alx@debian:~/tmp$ clang-17 -Weverything u.c > alx@debian:~/tmp$ ./a.out ; echo $? > 7 Although I would have doubts about what happens if the union contains an aggregate, and you modify part of it: union u { int a[2]; const int b[2]; }; int main(void) { union u u = {.b = {42, 13}}; u.a[0] = 7; return u.b[0]; } or union u { struct { int a0; int a1; } a; const struct { const int b0; const int b1; } b; }; int main(void) { union u u = {.b = {42, 13}}; u.a.a0 = 7; return u.b.b0; } In these cases, we're mnodifying a const object (.b), via a non-const lvalue (.a.a0, and .a.a1). We're modifying it, because we keep the original value at .b.b1, but the new one at .b.b0, so in this case we can't say it's an entire new object. Is this legal? Both GCC and Clang accept it, but I have doubts. Maybe you could warn if the program sets a union via a const member when there are non-const members, because it seems like something wrong to do. But as long as you don't set the const member, you should be fine according to ISO C. That is, if you initialize the union via .a, then anything is valid, as the object is defined non-const. Have a lovely night, Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-04 18:40 ` Alejandro Colomar 2024-02-04 20:37 ` Alejandro Colomar @ 2024-02-07 4:13 ` Amol Surati 2024-02-07 13:29 ` Alejandro Colomar 1 sibling, 1 reply; 12+ messages in thread From: Amol Surati @ 2024-02-07 4:13 UTC (permalink / raw) To: Alejandro Colomar; +Cc: gcc-help Hi Alex, On Mon, 5 Feb 2024 at 00:10, Alejandro Colomar <alx@kernel.org> wrote: > > Hi Amol, > > On Sun, Feb 04, 2024 at 01:03:48PM +0530, Amol Surati wrote: > > On Wed, 31 Jan 2024 at 23:46, Alejandro Colomar via Gcc-help > > <gcc-help@gcc.gnu.org> wrote: > > > > > > On Tue, Jan 30, 2024 at 10:45:11PM +0100, Alejandro Colomar wrote: > > > > Hi, > > > > > > > > [ ... ] > > > > > structure, that doesn't help. memcpy(3) does help, but it looses all > > > type safety. > > > > > > Maybe this could be allowed as an extension. Any thoughts? > > > > > > > Does it make sense to propose that, if the first top-level member of a > > union is completely (i.e. recursively) writable, then a non-const union > > object as a whole is writable? If so, then, for union objects a and b of > > a union that has such const members, a = b can be expected to not > > raise errors about const-correctness. > > To have a specific proposal, I'll specify it as a diff of ISO C11: > > $ diff -u c11 suggestion > --- c11 2024-02-04 19:37:27.520851005 +0100 > +++ suggestion 2024-02-04 19:38:56.785402567 +0100 > @@ -8,8 +8,8 @@ > does not have array type, > does not have an incomplete type, > does not have a const- qualified type, > -and if it is a structure or union, > -does not have any member > +and if it is a structure does not have any member, > +or if it is a union does not have all members, As mentioned in my other reply, the assignment from one object to another (of the same union type that has a const member) will have to copy the size of the largest member, regardless of which member was active the last time the source object was written to. Such a copy, I believe, is a sort of memcpy that the compiler inserts. And as you had already noticed before, memcpy loses all type-safety. These might be the reasons the std flags an error when such union objects are assigned to - assignment implies type-erasing memcpy. > (including, recursively, > any member or element of all contained aggregates or unions) > with a const- qualified type. > > (Modifying <http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p1>) [ ... ] > Modifying a union via a non-const member is fine in C, I believe. I That's true, but as you noted in your other reply, unions allow diluting the type-safety to a great extent. Linux kernel has a piece of code that does modify such a union although it possesses decorations for the sparse semantic checker. In its essence, it is similar to: union { const int flags; int __attribute__ ((noderef)) __flags; }; See https://sparse.docs.kernel.org/en/latest/annotations.html#noderef [ ... ] > > I think it is better to have a 'class' and associated APIs. > > But we can't have that in C. By 'class', I meant a rigid way of accessing the struct strictly through the provided APIs. [ ... ] > The main concern I have with that proposal is the same concern I've had > with strings in Nginx so far: you can't really make them `const`. > Unless you make the type opaque, and only provide accessors via > functions that protect the strings even if they could modify them. The type need not be opaque, to allow for optimizations. But the users of the type must behave as if the type were opaque. The bulk of the API can be 'static inline' functions within the header, to allow for optimizations and inlining within a single translation unit. There's also LTO and thin-LTOs to carry out optimizations across translation units, if needed. > > You can only make them const, if you use two distinct types: a read-only > version, let's call it rstring, and a read-write version, let's call it > string. But do we need both views (r/w and r/o) at the same time? > > struct rstring_s { > size_t length; > const char *start; > }; > > > union nxt_string_u { > struct { > size_t length; > char *start; > }; > struct { > size_t length; > char *start; > } w; > const rstring_t r; > }; > > In Nginx we have another complexity: we don't necessarily terminate our > strings: this allows getting a substring in the middle of another string > without needing to make an actual copy of the memory. But then it means > we need more types to have type safety. I haven't finished developing > that, so I can't tell you if the code below does work, but this is what > I'm really working with at the moment: > > struct nxt_rstr_s { > size_t length; > const u_char *start; > }; > > > union nxt_str_u { > struct { > size_t length; > u_char *start; > }; > struct { > size_t length; > u_char *start; > } w; > const nxt_rstr_t r; > }; > > > union nxt_rstrz_u { > struct { > size_t length; > union { > const u_char *start; > const char *cstrz; > }; > }; > struct { > size_t length; > const u_char *start; > } w; > const nxt_rstr_t r; > }; > > > union nxt_strz_u { > struct { > size_t length; > union { > u_char *start; > char *cstrz; > }; > }; > struct { > size_t length; > u_char *start; > } w; > const nxt_rstr_t r; > const nxt_rstrz_t rz; > }; > > Structures `***z` contain null-terminated strings, while the other ones > don't. You can read terminated strings as non-terminated ones, but not > the other way. And you can access writable strings as read-only > strings, but not the other way around. > > (We use `u_char` to avoid the problems that `char` has due to its > ambiguous sign; I would personally prefer using -funsigned-char, but > that's what it is, for historic reasons.) > Anyway, that `u_char` makes sure we don't mix our strings with libc > calls accidentally, and I only provide the `cstrz` member in unions that > actually provide a libc-compatible string view. Passing these structs+unions is still passing the string as r/w. It does not seem too far away from passing non-const pointers and sizes. In a sort of a project of mine, I had to implement a buffer that can be used to store ref-counted (non-terminated) strings and binary data. I did initially create two different types, buf_rw and buf_ro, but then scrapped that for what I have now, a skeleton of which is shown below: struct string { size_t size; const char *data; }; data[i] is now marked as non-modifiable. Passing 'const struct string *' to a function implies that the function cannot modify (in addition to data[i]), the fields size and data, through the passed ptr. A function that is passed 'struct string *' can still be allowed to inplace-modify data[i] by invoking appropriate string APIs that temporarily casts away the constness of data[i]. The SEI CERT coding standard for C allows for an exception for such modifications. See https://wiki.sei.cmu.edu/confluence/display/c/EXP05-C.+Do+not+cast+away+a+const+qualification#:~:text=end%20is%20undefined%20*/-,EXP05%2DC%2DEX3,-%3A%20Because%20const Some other options: Make use of the sparse semantic checker. It might help to name the fields with underscores to highlight the fact that they are internal fields. For e.g. 'str_size__' and 'str_data__'. Any use of these fields outside of the dedicated API functions can now be easily traced by simple grep searches. It might also help to create a visible type such as struct string { const uintptr_t raw[2]; }; and let the API manage the translation between uintptr_t and size_t/char *. -Amol [1] https://wiki.sei.cmu.edu/confluence/display/c/EXP05-C.+Do+not+cast+away+a+const+qualification#:~:text=end%20is%20undefined%20*/-,EXP05%2DC%2DEX3,-%3A%20Because%20const > > Have a lovely day, > Alex > > -- > <https://www.alejandro-colomar.es/> > Looking for a remote C programming job at the moment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-07 4:13 ` Amol Surati @ 2024-02-07 13:29 ` Alejandro Colomar 2024-02-12 10:45 ` Amol Surati 0 siblings, 1 reply; 12+ messages in thread From: Alejandro Colomar @ 2024-02-07 13:29 UTC (permalink / raw) To: Amol Surati; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 12172 bytes --] Hi Amol, On Wed, Feb 07, 2024 at 09:43:55AM +0530, Amol Surati wrote: > > To have a specific proposal, I'll specify it as a diff of ISO C11: > > > > $ diff -u c11 suggestion > > --- c11 2024-02-04 19:37:27.520851005 +0100 > > +++ suggestion 2024-02-04 19:38:56.785402567 +0100 > > @@ -8,8 +8,8 @@ > > does not have array type, > > does not have an incomplete type, > > does not have a const- qualified type, > > -and if it is a structure or union, > > -does not have any member > > +and if it is a structure does not have any member, > > +or if it is a union does not have all members, > > As mentioned in my other reply, the assignment from one object to > another (of the same union type that has a const member) will have > to copy the size of the largest member, regardless of which member > was active the last time the source object was written to. Such a > copy, I believe, is a sort of memcpy that the compiler inserts. And > as you had already noticed before, memcpy loses all type-safety. The fact that the compiler does the equivalent of memcpy(3) doesn't imply a removal of type safety. The copiler is aware of the union type at the assignment, and so can emit diagnostics if anything is wrong. When assigning to a union member, only that member needs to be copied, and I'm not sure what happens to other members that have a larger size. As far as I can read in the standard, it's Undefined Behavior. Maybe someone can correct me. When assigning to a union (not its members), the standard doesn't seem to specify at all the semantics, so I can't really tell. Depending on the semantics of assigning to a member, I can tell. At first glance, I don't see anything we should worry. > These might be the reasons the std flags an error when such union > objects are assigned to - assignment implies type-erasing memcpy. > > > (including, recursively, > > any member or element of all contained aggregates or unions) > > with a const- qualified type. > > > > (Modifying <http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p1>) > > [ ... ] > > > Modifying a union via a non-const member is fine in C, I believe. I > > That's true, but as you noted in your other reply, unions allow diluting > the type-safety to a great extent. Not really. If well used, they do actually improve type safety. Think of my string unions vs the struct proposed by Martin. My union has const correctness, which is impossible to achieve via structs. A similar problem happens with struct sockaddr, which for historic reasons is a series of unrelated structures. If they had been designed as a hierarchic union, it wouldn't be so problematic, and you wouldn't need to cast everywhere to use them. > > Linux kernel has a piece of code that does modify such a union > although it possesses decorations for the sparse semantic checker. > In its essence, it is similar to: > > union { > const int flags; > int __attribute__ ((noderef)) __flags; > }; > See https://sparse.docs.kernel.org/en/latest/annotations.html#noderef > > [ ... ] > > > > I think it is better to have a 'class' and associated APIs. > > > > But we can't have that in C. > > By 'class', I meant a rigid way of accessing the struct strictly through the > provided APIs. That's precisely what a union offers. You can access it via a union member, which then restricts you to use the functions that accept that type, or you can keep the entire union for unlimited use. > > [ ... ] > > > The main concern I have with that proposal is the same concern I've had > > with strings in Nginx so far: you can't really make them `const`. > > Unless you make the type opaque, and only provide accessors via > > functions that protect the strings even if they could modify them. > > The type need not be opaque, to allow for optimizations. But the users of > the type must behave as if the type were opaque. The bulk of the API can > be 'static inline' functions within the header, to allow for optimizations and > inlining within a single translation unit. There's also LTO and thin-LTOs to > carry out optimizations across translation units, if needed. Indeed. But I'm not sure I'd like a string type that I must treat as a FILE*. Also, how do you initialize such strings? With an EMPTY_STRING macro? Do you always get a heap pointer, so you can't declare a string in the stack as this? struct string s; What's the content of that string? Is a NULL pointer a valid state of the string? In NGINX we found this week several cases where we were doing pointer arithmetics on strings that were holding NULL, precisely due to this (or it was after some calloc(3), more likely, I don't remember). It was NULL + 0, which is the least of undefined behaviors I'm concerned about, but hey, there's another NULL to worry about. So, you'll need to do struct string s = EMPTY_STRING; to avoid ever having a NULL, which is yet another thing to worry about. Having suffered these strings, I wouldn't recommend them as something better than a char*. In NGINX, we have them due to historic reasons, and rewriting the entire code base would be unreasonable, and while they may have been a good optimization 2 decades ago, I bet that it's a useless micro-optimization today, and we could just use char* everywhere to have less problems. > > You can only make them const, if you use two distinct types: a read-only > > version, let's call it rstring, and a read-write version, let's call it > > string. > > But do we need both views (r/w and r/o) at the same time? Yes: I want to be able to take a r/w string and pass it to a function that accepts a r/o string. Since distinct types are incompatible, and this is the main problem with things like sockaddr, you can't just cast; that's likely to invoke Undefined Behavior. It's the same as I can pass a char* to a function that accepts a const char*, without doing a cast. > > > > > struct rstring_s { > > size_t length; > > const char *start; > > }; > > > > > > union nxt_string_u { > > struct { > > size_t length; > > char *start; > > }; > > struct { > > size_t length; > > char *start; > > } w; > > const rstring_t r; > > }; > > > > In Nginx we have another complexity: we don't necessarily terminate our > > strings: this allows getting a substring in the middle of another string > > without needing to make an actual copy of the memory. But then it means > > we need more types to have type safety. I haven't finished developing > > that, so I can't tell you if the code below does work, but this is what > > I'm really working with at the moment: > > > > struct nxt_rstr_s { > > size_t length; > > const u_char *start; > > }; > > > > > > union nxt_str_u { > > struct { > > size_t length; > > u_char *start; > > }; > > struct { > > size_t length; > > u_char *start; > > } w; > > const nxt_rstr_t r; > > }; > > > > > > union nxt_rstrz_u { > > struct { > > size_t length; > > union { > > const u_char *start; > > const char *cstrz; > > }; > > }; > > struct { > > size_t length; > > const u_char *start; > > } w; > > const nxt_rstr_t r; > > }; > > > > > > union nxt_strz_u { > > struct { > > size_t length; > > union { > > u_char *start; > > char *cstrz; > > }; > > }; > > struct { > > size_t length; > > u_char *start; > > } w; > > const nxt_rstr_t r; > > const nxt_rstrz_t rz; > > }; > > > > Structures `***z` contain null-terminated strings, while the other ones > > don't. You can read terminated strings as non-terminated ones, but not > > the other way. And you can access writable strings as read-only > > strings, but not the other way around. > > > > (We use `u_char` to avoid the problems that `char` has due to its > > ambiguous sign; I would personally prefer using -funsigned-char, but > > that's what it is, for historic reasons.) > > Anyway, that `u_char` makes sure we don't mix our strings with libc > > calls accidentally, and I only provide the `cstrz` member in unions that > > actually provide a libc-compatible string view. > > Passing these structs+unions is still passing the string as r/w. It does > not seem too far away from passing non-const pointers and sizes. The thing is, I don't always pass the union. If I want to write the string in a function, I pass the union. If I want it r/o, I pass the const member. > In a sort of a project of mine, I had to implement a buffer that can be used > to store ref-counted (non-terminated) strings and binary data. I did initially > create two different types, buf_rw and buf_ro, but then scrapped that for > what I have now, a skeleton of which is shown below: > > struct string { > size_t size; > const char *data; > }; > > data[i] is now marked as non-modifiable. > > Passing 'const struct string *' to a function implies that the function cannot > modify (in addition to data[i]), the fields size and data, through the > passed ptr. > > A function that is passed 'struct string *' can still be allowed to > inplace-modify data[i] by invoking appropriate string APIs that temporarily > casts away the constness of data[i]. That would be a violation of const correctness. If necessary, yeah, go ahead. But personally, I'd prefer avoiding that. For something that is proposed as an inclusion to ISO C, I'd rather not have it. I would prefer that a libstring adds those APIs, rather than having them in the standard. They aren't uncontroversial. Neither char* is uncontroversial, but at least it's simple. > The SEI CERT coding standard for C allows for an exception for such > modifications. See > https://wiki.sei.cmu.edu/confluence/display/c/EXP05-C.+Do+not+cast+away+a+const+qualification#:~:text=end%20is%20undefined%20*/-,EXP05%2DC%2DEX3,-%3A%20Because%20const > > Some other options: > Make use of the sparse semantic checker. > > It might help to name the fields with underscores to highlight the fact that > they are internal fields. For e.g. 'str_size__' and 'str_data__'. Any use of > these fields outside of the dedicated API functions can now be easily > traced by simple grep searches. > > It might also help to create a visible type such as > > struct string { > const uintptr_t raw[2]; > }; > > and let the API manage the translation between uintptr_t and size_t/char *. But you'll need a transparent union to do the magic. All of that shows that it isn't an easy thing through structures. unions at least have it easier. Cheers, Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-07 13:29 ` Alejandro Colomar @ 2024-02-12 10:45 ` Amol Surati 2024-02-12 11:18 ` Amol Surati 2024-03-18 9:19 ` Alejandro Colomar 0 siblings, 2 replies; 12+ messages in thread From: Amol Surati @ 2024-02-12 10:45 UTC (permalink / raw) To: Alejandro Colomar; +Cc: gcc-help Hi Alex, On Wed, 7 Feb 2024 at 18:59, Alejandro Colomar <alx@kernel.org> wrote: > > Hi Amol, > > On Wed, Feb 07, 2024 at 09:43:55AM +0530, Amol Surati wrote: [ ... ] > The fact that the compiler does the equivalent of memcpy(3) doesn't > imply a removal of type safety. > > The copiler is aware of the union type at the assignment, and so can > emit diagnostics if anything is wrong. True. That's the diagnostics that was emitted when running 'v=u'. > > When assigning to a union member, only that member needs to be copied, > and I'm not sure what happens to other members that have a larger size. > As far as I can read in the standard, it's Undefined Behavior. Maybe > someone can correct me. The std says: "When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values." > When assigning to a union (not its members), the standard doesn't seem > to specify at all the semantics, so I can't really tell. Depending on > the semantics of assigning to a member, I can tell. At first glance, I > don't see anything we should worry. [ ... ] > Not really. If well used, they do actually improve type safety. Think > of my string unions vs the struct proposed by Martin. My union has > const correctness, which is impossible to achieve via structs. A I am not sure I fully understand how using structs forbids achieving const-correctness (vs the unions as declared), but I am also not aware of all the ways in which nginx uses strings. [ ... ] > Indeed. But I'm not sure I'd like a string type that I must treat as a > FILE*. Also, how do you initialize such strings? With an EMPTY_STRING > macro? Do you always get a heap pointer, so you can't declare a string > in the stack as this? > > struct string s; > > What's the content of that string? Is a NULL pointer a valid state of > the string? In NGINX we found this week several cases where we were > doing pointer arithmetics on strings that were holding NULL, precisely > due to this (or it was after some calloc(3), more likely, I don't > remember). It was NULL + 0, which is the least of undefined behaviors > I'm concerned about, but hey, there's another NULL to worry about. > > So, you'll need to do > > struct string s = EMPTY_STRING; > > to avoid ever having a NULL, which is yet another thing to worry about. > > Having suffered these strings, I wouldn't recommend them as something > better than a char*. In NGINX, we have them due to historic reasons, > and rewriting the entire code base would be unreasonable, and while they > may have been a good optimization 2 decades ago, I bet that it's a > useless micro-optimization today, and we could just use char* everywhere > to have less problems. If a variable is going to be overwritten without being read first, then there's no need to initialize it. Within my implementation, the state resulting from zero-init, i.e. 'struct string s = {0};', is deliberately chosen to be well-defined for the implementation. For e.g. appending a char to a zero-init'd string is well-defined. > > > You can only make them const, if you use two distinct types: a read-only > > > version, let's call it rstring, and a read-write version, let's call it > > > string. > > > > But do we need both views (r/w and r/o) at the same time? > > Yes: I want to be able to take a r/w string and pass it to a function > that accepts a r/o string. Since distinct types are incompatible, and > this is the main problem with things like sockaddr, you can't just cast; > that's likely to invoke Undefined Behavior. The std says: "A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer." So typecasting between object types is well-defined, given that the resulting pointer is indeed correctly aligned for the target type. The various sock_addr types all impose a common default alignment, because they all begin with a member of the same type. The std. also says: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: . . . — a character type." For e.g., the _sys_bind system-call under Linux kernel receives the same 'const struct sockaddr *', along with the size of the data, from glibc. The kernel then copies the data using memcpy (or equivalent routines). Does the pointer conversion forbid the kernel from memcpy'ing the object using that converted pointer? (It is likely that the converted pointer, if it is not of compatible type, may not be dereferenced using that type; but here the kernel isn't dereferencing the pointer using the 'const struct sockaddr *' type; it is reading the obj-representation using memcpy.) If no, then this behaviour is well-defined. If yes, then this behaviour is undefined, although, in practice, both gcc and clang (the compilers that can compile Linux kernels) must be allowing the expected behaviour here, since the compiled kernel binaries do indeed work wherever they are deployed. This only means that, *if* the behaviour turns out to be theoretically undefined, *then* the std. can be augmented to specifically allow reading of representations of objects pointed to by such suitably converted pointers, if it does not already allow such reads. [ ... ] > > A function that is passed 'struct string *' can still be allowed to > > inplace-modify data[i] by invoking appropriate string APIs that temporarily > > casts away the constness of data[i]. > > That would be a violation of const correctness. If necessary, yeah, go > ahead. But personally, I'd prefer avoiding that. For something that is > proposed as an inclusion to ISO C, I'd rather not have it. I keep two bits of info within the string: (1) whether the 'struct string' is heap-allocated, or not. (2) whether the data (const char *) it points to is heap-allocated, or not. The std. says: "If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined." Given that a malloc'd object doesn't have its own declaration or a declared type to begin with, it also doesn't have a definition, let alone a const-qualified definition. Modifying malloc'd objects should not cause const-correctness to be violated, even when casting away const on a pointer that points to such an object. Modifiable strings can be created from statically allocated data (for e.g., pointing inside the .rodata section), and the data can be malloc'd and changed/appended-to on a copy-on-write basis. -Amol > > I would prefer that a libstring adds those APIs, rather than having them > in the standard. They aren't uncontroversial. Neither char* is > uncontroversial, but at least it's simple. > > > The SEI CERT coding standard for C allows for an exception for such > > modifications. See > > https://wiki.sei.cmu.edu/confluence/display/c/EXP05-C.+Do+not+cast+away+a+const+qualification#:~:text=end%20is%20undefined%20*/-,EXP05%2DC%2DEX3,-%3A%20Because%20const > > > > Some other options: > > Make use of the sparse semantic checker. > > > > It might help to name the fields with underscores to highlight the fact that > > they are internal fields. For e.g. 'str_size__' and 'str_data__'. Any use of > > these fields outside of the dedicated API functions can now be easily > > traced by simple grep searches. > > > > It might also help to create a visible type such as > > > > struct string { > > const uintptr_t raw[2]; > > }; > > > > and let the API manage the translation between uintptr_t and size_t/char *. > > But you'll need a transparent union to do the magic. All of that shows > that it isn't an easy thing through structures. unions at least have it > easier. > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> > Looking for a remote C programming job at the moment. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-12 10:45 ` Amol Surati @ 2024-02-12 11:18 ` Amol Surati 2024-03-18 9:19 ` Alejandro Colomar 1 sibling, 0 replies; 12+ messages in thread From: Amol Surati @ 2024-02-12 11:18 UTC (permalink / raw) To: Alejandro Colomar; +Cc: gcc-help Typo in the sock_addr section. Please read "So typecasting between object types is well-defined, given that the" as "So typecasting between pointer types is well-defined, given that the" > The std says: > "A pointer to an object type may be converted to a pointer to a different > object type. If the resulting pointer is not correctly aligned for the > referenced type, the behavior is undefined. Otherwise, when converted > back again, the result shall compare equal to the original pointer." > > So typecasting between object types is well-defined, given that the > resulting pointer is indeed correctly aligned for the target type. The > various sock_addr types all impose a common default alignment, > because they all begin with a member of the same type. -Amol ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-02-12 10:45 ` Amol Surati 2024-02-12 11:18 ` Amol Surati @ 2024-03-18 9:19 ` Alejandro Colomar 2024-03-18 9:23 ` Alejandro Colomar 1 sibling, 1 reply; 12+ messages in thread From: Alejandro Colomar @ 2024-03-18 9:19 UTC (permalink / raw) To: Amol Surati; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 8728 bytes --] Hi Amol, On Mon, Feb 12, 2024 at 04:15:24PM +0530, Amol Surati wrote: > Hi Alex, > > On Wed, 7 Feb 2024 at 18:59, Alejandro Colomar <alx@kernel.org> wrote: > > > > Hi Amol, > > > > On Wed, Feb 07, 2024 at 09:43:55AM +0530, Amol Surati wrote: > [ ... ] > > The fact that the compiler does the equivalent of memcpy(3) doesn't > > imply a removal of type safety. > > > > The copiler is aware of the union type at the assignment, and so can > > emit diagnostics if anything is wrong. > > True. That's the diagnostics that was emitted when running 'v=u'. > > > > > When assigning to a union member, only that member needs to be copied, > > and I'm not sure what happens to other members that have a larger size. > > As far as I can read in the standard, it's Undefined Behavior. Maybe > > someone can correct me. > > The std says: > "When a value is stored in a member of an object of union type, the bytes > of the object representation that do not correspond to that member but > do correspond to other members take unspecified values." Hmm, interesting. > > When assigning to a union (not its members), the standard doesn't seem > > to specify at all the semantics, so I can't really tell. Depending on > > the semantics of assigning to a member, I can tell. At first glance, I > > don't see anything we should worry. > > [ ... ] > > Not really. If well used, they do actually improve type safety. Think > > of my string unions vs the struct proposed by Martin. My union has > > const correctness, which is impossible to achieve via structs. A > > I am not sure I fully understand how using structs forbids achieving > const-correctness (vs the unions as declared), > but I am also not aware > of all the ways in which nginx uses strings. > > [ ... ] > > > Indeed. But I'm not sure I'd like a string type that I must treat as a > > FILE*. Also, how do you initialize such strings? With an EMPTY_STRING > > macro? Do you always get a heap pointer, so you can't declare a string > > in the stack as this? > > > > struct string s; > > > > What's the content of that string? Is a NULL pointer a valid state of > > the string? In NGINX we found this week several cases where we were > > doing pointer arithmetics on strings that were holding NULL, precisely > > due to this (or it was after some calloc(3), more likely, I don't > > remember). It was NULL + 0, which is the least of undefined behaviors > > I'm concerned about, but hey, there's another NULL to worry about. > > > > So, you'll need to do > > > > struct string s = EMPTY_STRING; > > > > to avoid ever having a NULL, which is yet another thing to worry about. > > > > Having suffered these strings, I wouldn't recommend them as something > > better than a char*. In NGINX, we have them due to historic reasons, > > and rewriting the entire code base would be unreasonable, and while they > > may have been a good optimization 2 decades ago, I bet that it's a > > useless micro-optimization today, and we could just use char* everywhere > > to have less problems. > > If a variable is going to be overwritten without being read first, then there's > no need to initialize it. Within my implementation, the state resulting from > zero-init, i.e. 'struct string s = {0};', is deliberately chosen to be > well-defined > for the implementation. For e.g. appending a char to a zero-init'd string is > well-defined. Good. Reading a zero-init'd string would return something like "", I guess. Probably needs some special-case in the code, but ok. > > > > You can only make them const, if you use two distinct types: a read-only > > > > version, let's call it rstring, and a read-write version, let's call it > > > > string. > > > > > > But do we need both views (r/w and r/o) at the same time? > > > > Yes: I want to be able to take a r/w string and pass it to a function > > that accepts a r/o string. Since distinct types are incompatible, and > > this is the main problem with things like sockaddr, you can't just cast; > > that's likely to invoke Undefined Behavior. > > The std says: > "A pointer to an object type may be converted to a pointer to a different > object type. If the resulting pointer is not correctly aligned for the > referenced type, the behavior is undefined. Otherwise, when converted > back again, the result shall compare equal to the original pointer." > > So typecasting between object types is well-defined, given that the Yes, casting pointers is fine. > resulting pointer is indeed correctly aligned for the target type. The But having the resulting pointer be correctly aligned is enough for allowing a pointer conversion. But the only thing that the standard allows to do with that pointer is to convert it back. It is not allowed to dereference it, even if the alignment is correct. <http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7> > various sock_addr types all impose a common default alignment, > because they all begin with a member of the same type. > > The std. also says: > "An object shall have its stored value accessed only by an lvalue > expression that has one of the following types: > . . . > — a character type." > > For e.g., the _sys_bind system-call under Linux kernel receives the > same 'const struct sockaddr *', along with the size of the data, from > glibc. The kernel then copies the data using memcpy (or equivalent > routines). > > Does the pointer conversion forbid the kernel from memcpy'ing the > object using that converted pointer? (It is likely that the converted > pointer, if it is not of compatible type, may not be dereferenced using > that type; but here the kernel isn't dereferencing the pointer using > the 'const struct sockaddr *' type; it is reading the obj-representation > using memcpy.) > > If no, then this behaviour is well-defined. Nah, memcpy'ing is fair game. > If yes, then this behaviour is undefined, although, in practice, both > gcc and clang (the compilers that can compile Linux kernels) must be > allowing the expected behaviour here, since the compiled kernel > binaries do indeed work wherever they are deployed. This only means > that, *if* the behaviour turns out to be theoretically undefined, *then* > the std. can be augmented to specifically allow reading of > representations of objects pointed to by such suitably converted pointers, > if it does not already allow such reads. However, when the kernel writes data that the user should read, it does so via memcpy on a sockaddr_storage. How is the user supposed to read that data? Say we define a sockaddr_storage, pass it to getpeername(2), where the kernel writes to it, and then we want to read it. Which type should we use? Imagine that it's a Unix socket, so it should be as sockaddr_un. But it was never written as that type, so we can't. <https://austingroupbugs.net/view.php?id=1641> > [ ... ] > > > > A function that is passed 'struct string *' can still be allowed to > > > inplace-modify data[i] by invoking appropriate string APIs that temporarily > > > casts away the constness of data[i]. > > > > That would be a violation of const correctness. If necessary, yeah, go > > ahead. But personally, I'd prefer avoiding that. For something that is > > proposed as an inclusion to ISO C, I'd rather not have it. > > I keep two bits of info within the string: > (1) whether the 'struct string' is heap-allocated, or not. > (2) whether the data (const char *) it points to is heap-allocated, or not. > > The std. says: > "If an attempt is made to modify an object defined with a const-qualified > type through use of an lvalue with non-const-qualified type, the behavior > is undefined." > > Given that a malloc'd object doesn't have its own declaration or a > declared type to begin with, it also doesn't have a definition, let alone > a const-qualified definition. Modifying malloc'd objects should not cause > const-correctness to be violated, even when casting away const on a > pointer that points to such an object. > > Modifiable strings can be created from statically allocated data (for e.g., > pointing inside the .rodata section), and the data can be malloc'd and > changed/appended-to on a copy-on-write basis. Hmmm, while casting away const is valid, it's not safe. I'm not sure I'd call that const-correct. It's not UB, that I agree. But you're bypassing type safety when you do a cast. Have a lovely day! Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Assignment of union containing const-qualifier member 2024-03-18 9:19 ` Alejandro Colomar @ 2024-03-18 9:23 ` Alejandro Colomar 0 siblings, 0 replies; 12+ messages in thread From: Alejandro Colomar @ 2024-03-18 9:23 UTC (permalink / raw) To: Amol Surati; +Cc: gcc-help [-- Attachment #1: Type: text/plain, Size: 478 bytes --] Hi Amol, Let me ask these questions: Why does ISO C need a string structure? Why isn't it enough that some library implements such a string structure? At least for some decades, until time has proven that this is the string we all want. If it's a good type, and the library is portable, there shouldn't be a real difference if it's standard or not. Cheers, Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-03-18 9:23 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-01-30 21:45 Assignment of union containing const-qualifier member Alejandro Colomar 2024-01-31 18:14 ` Alejandro Colomar 2024-02-04 7:33 ` Amol Surati 2024-02-04 8:18 ` Amol Surati 2024-02-04 18:40 ` Alejandro Colomar 2024-02-04 20:37 ` Alejandro Colomar 2024-02-07 4:13 ` Amol Surati 2024-02-07 13:29 ` Alejandro Colomar 2024-02-12 10:45 ` Amol Surati 2024-02-12 11:18 ` Amol Surati 2024-03-18 9:19 ` Alejandro Colomar 2024-03-18 9:23 ` Alejandro Colomar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).