* [whish] -Wunterminated-string-initialization: new warning @ 2022-11-13 21:56 Alejandro Colomar 2022-11-13 22:12 ` Andrew Pinski 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar @ 2022-11-13 21:56 UTC (permalink / raw) To: gcc [-- Attachment #1.1: Type: text/plain, Size: 1453 bytes --] Hi! I'd like to get warnings if I write the following code: char foo[3] = "foo"; It's hard to keep track of sizes to make sure that the string literals always initialize to terminated strings. It seems something that should be easy to implement in the compiler. A morecomplex case where it's harder to keep track of sizes is: static const char log_levels[][8] = { "alert", "error", "warn", "notice", "info", "debug", }; Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8). If tomorrow we add or change an entry, It'll be hard to keep it safe. Such a warning would help a lot. An example program is: $ cat str.c char two[2] = "foo"; // 'f' 'o' char three[3] = "foo"; // 'f' 'o' 'o' char four[4] = "foo"; // 'f' 'o' 'o' '\0' char five[5] = "foo"; // 'f' 'o' 'o' '\0' '\0' char implicit[] = "foo"; // 'f' 'o' 'o' '\0' $ cc -Wall -Wextra str.c str.c:1:19: warning: initializer-string for array of ‘char’ is too long 1 | char two[2] = "foo"; // 'f' 'o' | ^~~~~ /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o: in function `_start': (.text+0x17): undefined reference to `main' collect2: error: ld returned 1 exit status Here, I'd like that with the new warning, 'three' would also get warned. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning 2022-11-13 21:56 [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar @ 2022-11-13 22:12 ` Andrew Pinski 2022-11-14 11:37 ` Alejandro Colomar 0 siblings, 1 reply; 6+ messages in thread From: Andrew Pinski @ 2022-11-13 22:12 UTC (permalink / raw) To: Alejandro Colomar; +Cc: gcc On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote: > > Hi! > > I'd like to get warnings if I write the following code: > > char foo[3] = "foo"; This should be easy to add as it is already part of the -Wc++-compat option as for C++ it is invalid code. <source>:2:19: warning: initializer-string for array of 'char' is too long 2 | char two[2] = "foo"; // 'f' 'o' | ^~~~~ <source>:3:19: warning: initializer-string for array of 'char' is too long for C++ [-Wc++-compat] 3 | char three[3] = "foo"; // 'f' 'o' 'o' | ^~~~~ ... (for your more complex case [though I needed to modify one of the strings to exactly 8] <source>:5:7: warning: initializer-string for array of 'char' is too long for C++ [-Wc++-compat] 5 | "01234567", | ^~~~~~~~~~ else if (warn_cxx_compat && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) warning_at (init_loc, OPT_Wc___compat, ("initializer-string for array of %qT " "is too long for C++"), typ1); That is the current code which does this warning even so it is just a matter of adding an option to c-family/c.opt and then having c++-compat enable it and using that new option here. Thanks, Andrew Pinski > > It's hard to keep track of sizes to make sure that the string literals always > initialize to terminated strings. It seems something that should be easy to > implement in the compiler. > > A morecomplex case where it's harder to keep track of sizes is: > > static const char log_levels[][8] = { > "alert", > "error", > "warn", > "notice", > "info", > "debug", > }; > > Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8). If tomorrow > we add or change an entry, It'll be hard to keep it safe. Such a warning would > help a lot. > > > An example program is: > > $ cat str.c > char two[2] = "foo"; // 'f' 'o' > char three[3] = "foo"; // 'f' 'o' 'o' > char four[4] = "foo"; // 'f' 'o' 'o' '\0' > char five[5] = "foo"; // 'f' 'o' 'o' '\0' '\0' > char implicit[] = "foo"; // 'f' 'o' 'o' '\0' > > $ cc -Wall -Wextra str.c > str.c:1:19: warning: initializer-string for array of ‘char’ is too long > 1 | char two[2] = "foo"; // 'f' 'o' > | ^~~~~ > /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o: > in function `_start': > (.text+0x17): undefined reference to `main' > collect2: error: ld returned 1 exit status > > > Here, I'd like that with the new warning, 'three' would also get warned. > > Cheers, > > Alex > -- > <http://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning 2022-11-13 22:12 ` Andrew Pinski @ 2022-11-14 11:37 ` Alejandro Colomar 2022-11-14 13:14 ` Jonathan Wakely 0 siblings, 1 reply; 6+ messages in thread From: Alejandro Colomar @ 2022-11-14 11:37 UTC (permalink / raw) To: Andrew Pinski; +Cc: gcc, Douglas McIlroy [-- Attachment #1.1: Type: text/plain, Size: 3881 bytes --] Hi Andrew! On 11/13/22 23:12, Andrew Pinski wrote: > On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc > <gcc@gcc.gnu.org> wrote: >> >> Hi! >> >> I'd like to get warnings if I write the following code: >> >> char foo[3] = "foo"; > > This should be easy to add as it is already part of the -Wc++-compat > option as for C++ it is invalid code. > > <source>:2:19: warning: initializer-string for array of 'char' is too long > 2 | char two[2] = "foo"; // 'f' 'o' > | ^~~~~ > <source>:3:19: warning: initializer-string for array of 'char' is too > long for C++ [-Wc++-compat] > 3 | char three[3] = "foo"; // 'f' 'o' 'o' > | ^~~~~ > > > ... (for your more complex case [though I needed to modify one of the > strings to exactly 8] > > <source>:5:7: warning: initializer-string for array of 'char' is too > long for C++ [-Wc++-compat] > 5 | "01234567", > | ^~~~~~~~~~ > > else if (warn_cxx_compat > && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) > warning_at (init_loc, OPT_Wc___compat, > ("initializer-string for array of %qT " > "is too long for C++"), typ1); > > That is the current code which does this warning even so it is just a > matter of adding an option to c-family/c.opt and then having > c++-compat enable it and using that new option here. > > Thanks, > Andrew Pinski Great! I'd like to implement it myself, as I've never written any GCC code yet, so it's interesting to me. If you recall any (hopefully recent) case where a similar thing happened (the warning was already implemented and only needed a name), it might help me check how it was done. BTW, I had another idea to add a suffix to string literals to make them unterminated: char foo[3] = "foo"u; // OK char bar[4] = "bar"; // OK char baz[4] = "baz"u; // Warning: initializer is too short. char etc[3] = "etc"; // Warning: unterminated string. Is that doable? Do you think it makes sense? I have a code base that uses a mix of terminated and unterminated strings, and it would be nice to be able to tell the one I want in each case. Cheers, Alex > >> >> It's hard to keep track of sizes to make sure that the string literals always >> initialize to terminated strings. It seems something that should be easy to >> implement in the compiler. >> >> A morecomplex case where it's harder to keep track of sizes is: >> >> static const char log_levels[][8] = { >> "alert", >> "error", >> "warn", >> "notice", >> "info", >> "debug", >> }; >> >> Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8). If tomorrow >> we add or change an entry, It'll be hard to keep it safe. Such a warning would >> help a lot. >> >> >> An example program is: >> >> $ cat str.c >> char two[2] = "foo"; // 'f' 'o' >> char three[3] = "foo"; // 'f' 'o' 'o' >> char four[4] = "foo"; // 'f' 'o' 'o' '\0' >> char five[5] = "foo"; // 'f' 'o' 'o' '\0' '\0' >> char implicit[] = "foo"; // 'f' 'o' 'o' '\0' >> >> $ cc -Wall -Wextra str.c >> str.c:1:19: warning: initializer-string for array of ‘char’ is too long >> 1 | char two[2] = "foo"; // 'f' 'o' >> | ^~~~~ >> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o: >> in function `_start': >> (.text+0x17): undefined reference to `main' >> collect2: error: ld returned 1 exit status >> >> >> Here, I'd like that with the new warning, 'three' would also get warned. >> >> Cheers, >> >> Alex >> -- >> <http://www.alejandro-colomar.es/> -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning 2022-11-14 11:37 ` Alejandro Colomar @ 2022-11-14 13:14 ` Jonathan Wakely 2022-11-14 13:41 ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar 2022-11-14 13:46 ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar 0 siblings, 2 replies; 6+ messages in thread From: Jonathan Wakely @ 2022-11-14 13:14 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Andrew Pinski, gcc, Douglas McIlroy On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote: > > Hi Andrew! > > On 11/13/22 23:12, Andrew Pinski wrote: > > On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc > > <gcc@gcc.gnu.org> wrote: > >> > >> Hi! > >> > >> I'd like to get warnings if I write the following code: > >> > >> char foo[3] = "foo"; > > > > This should be easy to add as it is already part of the -Wc++-compat > > option as for C++ it is invalid code. > > > > <source>:2:19: warning: initializer-string for array of 'char' is too long > > 2 | char two[2] = "foo"; // 'f' 'o' > > | ^~~~~ > > <source>:3:19: warning: initializer-string for array of 'char' is too > > long for C++ [-Wc++-compat] > > 3 | char three[3] = "foo"; // 'f' 'o' 'o' > > | ^~~~~ > > > > > > ... (for your more complex case [though I needed to modify one of the > > strings to exactly 8] > > > > <source>:5:7: warning: initializer-string for array of 'char' is too > > long for C++ [-Wc++-compat] > > 5 | "01234567", > > | ^~~~~~~~~~ > > > > else if (warn_cxx_compat > > && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) > > warning_at (init_loc, OPT_Wc___compat, > > ("initializer-string for array of %qT " > > "is too long for C++"), typ1); > > > > That is the current code which does this warning even so it is just a > > matter of adding an option to c-family/c.opt and then having > > c++-compat enable it and using that new option here. > > > > Thanks, > > Andrew Pinski > > Great! I'd like to implement it myself, as I've never written any GCC code yet, > so it's interesting to me. If you recall any (hopefully recent) case where a > similar thing happened (the warning was already implemented and only needed a > name), it might help me check how it was done. `git log gcc/c-family/c.opt` will show loads of changes adding warnings. > > BTW, I had another idea to add a suffix to string literals to make them > unterminated: > > char foo[3] = "foo"u; // OK > char bar[4] = "bar"; // OK > > char baz[4] = "baz"u; // Warning: initializer is too short. > char etc[3] = "etc"; // Warning: unterminated string. > > Is that doable? Do you think it makes sense? IMHO no. This is not useful enough to add a language extension, it's an incredibly niche use case. Your suggested syntax also looks very confusing with UTF-16 string literals, and is not sufficiently distinct from a normal string literal to be obvious when quickly reading the code. People expect string literals in C to be null-terminated, having a subtle suffix that changes that would be a bug farm. You can do {'b', 'a', 'z'} if you want an explicitly unterminated array of char. ^ permalink raw reply [flat|nested] 6+ messages in thread
* unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) 2022-11-14 13:14 ` Jonathan Wakely @ 2022-11-14 13:41 ` Alejandro Colomar 2022-11-14 13:46 ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar 1 sibling, 0 replies; 6+ messages in thread From: Alejandro Colomar @ 2022-11-14 13:41 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Andrew Pinski, gcc, Douglas McIlroy [-- Attachment #1.1: Type: text/plain, Size: 2388 bytes --] Hi Jonathan, On 11/14/22 14:14, Jonathan Wakely wrote: > On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote: >> BTW, I had another idea to add a suffix to string literals to make them >> unterminated: >> >> char foo[3] = "foo"u; // OK >> char bar[4] = "bar"; // OK >> >> char baz[4] = "baz"u; // Warning: initializer is too short. >> char etc[3] = "etc"; // Warning: unterminated string. >> >> Is that doable? Do you think it makes sense? > > IMHO no. This is not useful enough to add a language extension, it's > an incredibly niche use case. I agree it's way too niche. > Your suggested syntax also looks very > confusing with UTF-16 string literals, Maybe. > and is not sufficiently > distinct from a normal string literal to be obvious when quickly > reading the code. People expect string literals in C to be > null-terminated, having a subtle suffix that changes that would be a > bug farm. But, you have to combine both the suffix with the corresponding size (one less than for normal strings). A programmer needs to be consciously doing this. For readers of the code, maybe there's a bit more of a readability issue, especially if you don't know the extension. But when you stop a little bit to check what that suffix is doing and then realize the size is weird, a reasonable programmer should at least ask or check the documentation for that thing. Regarding safety, I also have that thing very present in my mind, and in an attempt to get the compiler on my side, I decided to use 'char *' for NUL-terminated strings, and 'u_char *' for u_nterminated strings. That helps the compiler know when we're using one in place of another, which as you say would be a source of bugs. Maybe having the type of these new strings be u_char[] instead of char[] would help have more type safety. I didn't suggest this because that would not be how strings in C have always been. However, considering that they are not really strings, it could make sense. > > You can do {'b', 'a', 'z'} if you want an explicitly unterminated array of char. A bit unreadable :) I think I'll keep using normal literals, and maybe some workaround to disable the warnings for specific cases. Not my preference, but it can work. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning 2022-11-14 13:14 ` Jonathan Wakely 2022-11-14 13:41 ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar @ 2022-11-14 13:46 ` Alejandro Colomar 1 sibling, 0 replies; 6+ messages in thread From: Alejandro Colomar @ 2022-11-14 13:46 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Andrew Pinski, gcc, Douglas McIlroy [-- Attachment #1.1: Type: text/plain, Size: 2310 bytes --] Hi Jonathan, On 11/14/22 14:14, Jonathan Wakely wrote: > On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote: >> >> Hi Andrew! >> >> On 11/13/22 23:12, Andrew Pinski wrote: >>> On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc >>> <gcc@gcc.gnu.org> wrote: >>>> >>>> Hi! >>>> >>>> I'd like to get warnings if I write the following code: >>>> >>>> char foo[3] = "foo"; >>> >>> This should be easy to add as it is already part of the -Wc++-compat >>> option as for C++ it is invalid code. >>> >>> <source>:2:19: warning: initializer-string for array of 'char' is too long >>> 2 | char two[2] = "foo"; // 'f' 'o' >>> | ^~~~~ >>> <source>:3:19: warning: initializer-string for array of 'char' is too >>> long for C++ [-Wc++-compat] >>> 3 | char three[3] = "foo"; // 'f' 'o' 'o' >>> | ^~~~~ >>> >>> >>> ... (for your more complex case [though I needed to modify one of the >>> strings to exactly 8] >>> >>> <source>:5:7: warning: initializer-string for array of 'char' is too >>> long for C++ [-Wc++-compat] >>> 5 | "01234567", >>> | ^~~~~~~~~~ >>> >>> else if (warn_cxx_compat >>> && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) >>> warning_at (init_loc, OPT_Wc___compat, >>> ("initializer-string for array of %qT " >>> "is too long for C++"), typ1); >>> >>> That is the current code which does this warning even so it is just a >>> matter of adding an option to c-family/c.opt and then having >>> c++-compat enable it and using that new option here. >>> >>> Thanks, >>> Andrew Pinski >> >> Great! I'd like to implement it myself, as I've never written any GCC code yet, >> so it's interesting to me. If you recall any (hopefully recent) case where a >> similar thing happened (the warning was already implemented and only needed a >> name), it might help me check how it was done. > > `git log gcc/c-family/c.opt` will show loads of changes adding warnings. Thanks, I will check and try it, and come back if I have any doubts. Cheers, Alex -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-11-14 13:47 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-13 21:56 [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar 2022-11-13 22:12 ` Andrew Pinski 2022-11-14 11:37 ` Alejandro Colomar 2022-11-14 13:14 ` Jonathan Wakely 2022-11-14 13:41 ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar 2022-11-14 13:46 ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).