* [whish] -Wunterminated-string-initialization: new warning
@ 2022-11-13 21:56 Alejandro Colomar
2022-11-13 22:12 ` Andrew Pinski
0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-13 21:56 UTC (permalink / raw)
To: gcc
[-- Attachment #1.1: Type: text/plain, Size: 1453 bytes --]
Hi!
I'd like to get warnings if I write the following code:
char foo[3] = "foo";
It's hard to keep track of sizes to make sure that the string literals always
initialize to terminated strings. It seems something that should be easy to
implement in the compiler.
A morecomplex case where it's harder to keep track of sizes is:
static const char log_levels[][8] = {
"alert",
"error",
"warn",
"notice",
"info",
"debug",
};
Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8). If tomorrow
we add or change an entry, It'll be hard to keep it safe. Such a warning would
help a lot.
An example program is:
$ cat str.c
char two[2] = "foo"; // 'f' 'o'
char three[3] = "foo"; // 'f' 'o' 'o'
char four[4] = "foo"; // 'f' 'o' 'o' '\0'
char five[5] = "foo"; // 'f' 'o' 'o' '\0' '\0'
char implicit[] = "foo"; // 'f' 'o' 'o' '\0'
$ cc -Wall -Wextra str.c
str.c:1:19: warning: initializer-string for array of ‘char’ is too long
1 | char two[2] = "foo"; // 'f' 'o'
| ^~~~~
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o:
in function `_start':
(.text+0x17): undefined reference to `main'
collect2: error: ld returned 1 exit status
Here, I'd like that with the new warning, 'three' would also get warned.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning
2022-11-13 21:56 [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
@ 2022-11-13 22:12 ` Andrew Pinski
2022-11-14 11:37 ` Alejandro Colomar
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Pinski @ 2022-11-13 22:12 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: gcc
On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
<gcc@gcc.gnu.org> wrote:
>
> Hi!
>
> I'd like to get warnings if I write the following code:
>
> char foo[3] = "foo";
This should be easy to add as it is already part of the -Wc++-compat
option as for C++ it is invalid code.
<source>:2:19: warning: initializer-string for array of 'char' is too long
2 | char two[2] = "foo"; // 'f' 'o'
| ^~~~~
<source>:3:19: warning: initializer-string for array of 'char' is too
long for C++ [-Wc++-compat]
3 | char three[3] = "foo"; // 'f' 'o' 'o'
| ^~~~~
... (for your more complex case [though I needed to modify one of the
strings to exactly 8]
<source>:5:7: warning: initializer-string for array of 'char' is too
long for C++ [-Wc++-compat]
5 | "01234567",
| ^~~~~~~~~~
else if (warn_cxx_compat
&& compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
warning_at (init_loc, OPT_Wc___compat,
("initializer-string for array of %qT "
"is too long for C++"), typ1);
That is the current code which does this warning even so it is just a
matter of adding an option to c-family/c.opt and then having
c++-compat enable it and using that new option here.
Thanks,
Andrew Pinski
>
> It's hard to keep track of sizes to make sure that the string literals always
> initialize to terminated strings. It seems something that should be easy to
> implement in the compiler.
>
> A morecomplex case where it's harder to keep track of sizes is:
>
> static const char log_levels[][8] = {
> "alert",
> "error",
> "warn",
> "notice",
> "info",
> "debug",
> };
>
> Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8). If tomorrow
> we add or change an entry, It'll be hard to keep it safe. Such a warning would
> help a lot.
>
>
> An example program is:
>
> $ cat str.c
> char two[2] = "foo"; // 'f' 'o'
> char three[3] = "foo"; // 'f' 'o' 'o'
> char four[4] = "foo"; // 'f' 'o' 'o' '\0'
> char five[5] = "foo"; // 'f' 'o' 'o' '\0' '\0'
> char implicit[] = "foo"; // 'f' 'o' 'o' '\0'
>
> $ cc -Wall -Wextra str.c
> str.c:1:19: warning: initializer-string for array of ‘char’ is too long
> 1 | char two[2] = "foo"; // 'f' 'o'
> | ^~~~~
> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o:
> in function `_start':
> (.text+0x17): undefined reference to `main'
> collect2: error: ld returned 1 exit status
>
>
> Here, I'd like that with the new warning, 'three' would also get warned.
>
> Cheers,
>
> Alex
> --
> <http://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning
2022-11-13 22:12 ` Andrew Pinski
@ 2022-11-14 11:37 ` Alejandro Colomar
2022-11-14 13:14 ` Jonathan Wakely
0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-14 11:37 UTC (permalink / raw)
To: Andrew Pinski; +Cc: gcc, Douglas McIlroy
[-- Attachment #1.1: Type: text/plain, Size: 3881 bytes --]
Hi Andrew!
On 11/13/22 23:12, Andrew Pinski wrote:
> On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
> <gcc@gcc.gnu.org> wrote:
>>
>> Hi!
>>
>> I'd like to get warnings if I write the following code:
>>
>> char foo[3] = "foo";
>
> This should be easy to add as it is already part of the -Wc++-compat
> option as for C++ it is invalid code.
>
> <source>:2:19: warning: initializer-string for array of 'char' is too long
> 2 | char two[2] = "foo"; // 'f' 'o'
> | ^~~~~
> <source>:3:19: warning: initializer-string for array of 'char' is too
> long for C++ [-Wc++-compat]
> 3 | char three[3] = "foo"; // 'f' 'o' 'o'
> | ^~~~~
>
>
> ... (for your more complex case [though I needed to modify one of the
> strings to exactly 8]
>
> <source>:5:7: warning: initializer-string for array of 'char' is too
> long for C++ [-Wc++-compat]
> 5 | "01234567",
> | ^~~~~~~~~~
>
> else if (warn_cxx_compat
> && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> warning_at (init_loc, OPT_Wc___compat,
> ("initializer-string for array of %qT "
> "is too long for C++"), typ1);
>
> That is the current code which does this warning even so it is just a
> matter of adding an option to c-family/c.opt and then having
> c++-compat enable it and using that new option here.
>
> Thanks,
> Andrew Pinski
Great! I'd like to implement it myself, as I've never written any GCC code yet,
so it's interesting to me. If you recall any (hopefully recent) case where a
similar thing happened (the warning was already implemented and only needed a
name), it might help me check how it was done.
BTW, I had another idea to add a suffix to string literals to make them
unterminated:
char foo[3] = "foo"u; // OK
char bar[4] = "bar"; // OK
char baz[4] = "baz"u; // Warning: initializer is too short.
char etc[3] = "etc"; // Warning: unterminated string.
Is that doable? Do you think it makes sense?
I have a code base that uses a mix of terminated and unterminated strings, and
it would be nice to be able to tell the one I want in each case.
Cheers,
Alex
>
>>
>> It's hard to keep track of sizes to make sure that the string literals always
>> initialize to terminated strings. It seems something that should be easy to
>> implement in the compiler.
>>
>> A morecomplex case where it's harder to keep track of sizes is:
>>
>> static const char log_levels[][8] = {
>> "alert",
>> "error",
>> "warn",
>> "notice",
>> "info",
>> "debug",
>> };
>>
>> Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8). If tomorrow
>> we add or change an entry, It'll be hard to keep it safe. Such a warning would
>> help a lot.
>>
>>
>> An example program is:
>>
>> $ cat str.c
>> char two[2] = "foo"; // 'f' 'o'
>> char three[3] = "foo"; // 'f' 'o' 'o'
>> char four[4] = "foo"; // 'f' 'o' 'o' '\0'
>> char five[5] = "foo"; // 'f' 'o' 'o' '\0' '\0'
>> char implicit[] = "foo"; // 'f' 'o' 'o' '\0'
>>
>> $ cc -Wall -Wextra str.c
>> str.c:1:19: warning: initializer-string for array of ‘char’ is too long
>> 1 | char two[2] = "foo"; // 'f' 'o'
>> | ^~~~~
>> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o:
>> in function `_start':
>> (.text+0x17): undefined reference to `main'
>> collect2: error: ld returned 1 exit status
>>
>>
>> Here, I'd like that with the new warning, 'three' would also get warned.
>>
>> Cheers,
>>
>> Alex
>> --
>> <http://www.alejandro-colomar.es/>
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning
2022-11-14 11:37 ` Alejandro Colomar
@ 2022-11-14 13:14 ` Jonathan Wakely
2022-11-14 13:41 ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar
2022-11-14 13:46 ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Wakely @ 2022-11-14 13:14 UTC (permalink / raw)
To: Alejandro Colomar; +Cc: Andrew Pinski, gcc, Douglas McIlroy
On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi Andrew!
>
> On 11/13/22 23:12, Andrew Pinski wrote:
> > On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
> > <gcc@gcc.gnu.org> wrote:
> >>
> >> Hi!
> >>
> >> I'd like to get warnings if I write the following code:
> >>
> >> char foo[3] = "foo";
> >
> > This should be easy to add as it is already part of the -Wc++-compat
> > option as for C++ it is invalid code.
> >
> > <source>:2:19: warning: initializer-string for array of 'char' is too long
> > 2 | char two[2] = "foo"; // 'f' 'o'
> > | ^~~~~
> > <source>:3:19: warning: initializer-string for array of 'char' is too
> > long for C++ [-Wc++-compat]
> > 3 | char three[3] = "foo"; // 'f' 'o' 'o'
> > | ^~~~~
> >
> >
> > ... (for your more complex case [though I needed to modify one of the
> > strings to exactly 8]
> >
> > <source>:5:7: warning: initializer-string for array of 'char' is too
> > long for C++ [-Wc++-compat]
> > 5 | "01234567",
> > | ^~~~~~~~~~
> >
> > else if (warn_cxx_compat
> > && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> > warning_at (init_loc, OPT_Wc___compat,
> > ("initializer-string for array of %qT "
> > "is too long for C++"), typ1);
> >
> > That is the current code which does this warning even so it is just a
> > matter of adding an option to c-family/c.opt and then having
> > c++-compat enable it and using that new option here.
> >
> > Thanks,
> > Andrew Pinski
>
> Great! I'd like to implement it myself, as I've never written any GCC code yet,
> so it's interesting to me. If you recall any (hopefully recent) case where a
> similar thing happened (the warning was already implemented and only needed a
> name), it might help me check how it was done.
`git log gcc/c-family/c.opt` will show loads of changes adding warnings.
>
> BTW, I had another idea to add a suffix to string literals to make them
> unterminated:
>
> char foo[3] = "foo"u; // OK
> char bar[4] = "bar"; // OK
>
> char baz[4] = "baz"u; // Warning: initializer is too short.
> char etc[3] = "etc"; // Warning: unterminated string.
>
> Is that doable? Do you think it makes sense?
IMHO no. This is not useful enough to add a language extension, it's
an incredibly niche use case. Your suggested syntax also looks very
confusing with UTF-16 string literals, and is not sufficiently
distinct from a normal string literal to be obvious when quickly
reading the code. People expect string literals in C to be
null-terminated, having a subtle suffix that changes that would be a
bug farm.
You can do {'b', 'a', 'z'} if you want an explicitly unterminated array of char.
^ permalink raw reply [flat|nested] 6+ messages in thread
* unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning)
2022-11-14 13:14 ` Jonathan Wakely
@ 2022-11-14 13:41 ` Alejandro Colomar
2022-11-14 13:46 ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
1 sibling, 0 replies; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-14 13:41 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: Andrew Pinski, gcc, Douglas McIlroy
[-- Attachment #1.1: Type: text/plain, Size: 2388 bytes --]
Hi Jonathan,
On 11/14/22 14:14, Jonathan Wakely wrote:
> On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote:
>> BTW, I had another idea to add a suffix to string literals to make them
>> unterminated:
>>
>> char foo[3] = "foo"u; // OK
>> char bar[4] = "bar"; // OK
>>
>> char baz[4] = "baz"u; // Warning: initializer is too short.
>> char etc[3] = "etc"; // Warning: unterminated string.
>>
>> Is that doable? Do you think it makes sense?
>
> IMHO no. This is not useful enough to add a language extension, it's
> an incredibly niche use case.
I agree it's way too niche.
> Your suggested syntax also looks very
> confusing with UTF-16 string literals,
Maybe.
> and is not sufficiently
> distinct from a normal string literal to be obvious when quickly
> reading the code. People expect string literals in C to be
> null-terminated, having a subtle suffix that changes that would be a
> bug farm.
But, you have to combine both the suffix with the corresponding size (one less
than for normal strings). A programmer needs to be consciously doing this. For
readers of the code, maybe there's a bit more of a readability issue, especially
if you don't know the extension. But when you stop a little bit to check what
that suffix is doing and then realize the size is weird, a reasonable programmer
should at least ask or check the documentation for that thing.
Regarding safety, I also have that thing very present in my mind, and in an
attempt to get the compiler on my side, I decided to use 'char *' for
NUL-terminated strings, and 'u_char *' for u_nterminated strings. That helps
the compiler know when we're using one in place of another, which as you say
would be a source of bugs.
Maybe having the type of these new strings be u_char[] instead of char[] would
help have more type safety. I didn't suggest this because that would not be how
strings in C have always been. However, considering that they are not really
strings, it could make sense.
>
> You can do {'b', 'a', 'z'} if you want an explicitly unterminated array of char.
A bit unreadable :)
I think I'll keep using normal literals, and maybe some workaround to disable
the warnings for specific cases. Not my preference, but it can work.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [whish] -Wunterminated-string-initialization: new warning
2022-11-14 13:14 ` Jonathan Wakely
2022-11-14 13:41 ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar
@ 2022-11-14 13:46 ` Alejandro Colomar
1 sibling, 0 replies; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-14 13:46 UTC (permalink / raw)
To: Jonathan Wakely; +Cc: Andrew Pinski, gcc, Douglas McIlroy
[-- Attachment #1.1: Type: text/plain, Size: 2310 bytes --]
Hi Jonathan,
On 11/14/22 14:14, Jonathan Wakely wrote:
> On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> Hi Andrew!
>>
>> On 11/13/22 23:12, Andrew Pinski wrote:
>>> On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
>>> <gcc@gcc.gnu.org> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I'd like to get warnings if I write the following code:
>>>>
>>>> char foo[3] = "foo";
>>>
>>> This should be easy to add as it is already part of the -Wc++-compat
>>> option as for C++ it is invalid code.
>>>
>>> <source>:2:19: warning: initializer-string for array of 'char' is too long
>>> 2 | char two[2] = "foo"; // 'f' 'o'
>>> | ^~~~~
>>> <source>:3:19: warning: initializer-string for array of 'char' is too
>>> long for C++ [-Wc++-compat]
>>> 3 | char three[3] = "foo"; // 'f' 'o' 'o'
>>> | ^~~~~
>>>
>>>
>>> ... (for your more complex case [though I needed to modify one of the
>>> strings to exactly 8]
>>>
>>> <source>:5:7: warning: initializer-string for array of 'char' is too
>>> long for C++ [-Wc++-compat]
>>> 5 | "01234567",
>>> | ^~~~~~~~~~
>>>
>>> else if (warn_cxx_compat
>>> && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
>>> warning_at (init_loc, OPT_Wc___compat,
>>> ("initializer-string for array of %qT "
>>> "is too long for C++"), typ1);
>>>
>>> That is the current code which does this warning even so it is just a
>>> matter of adding an option to c-family/c.opt and then having
>>> c++-compat enable it and using that new option here.
>>>
>>> Thanks,
>>> Andrew Pinski
>>
>> Great! I'd like to implement it myself, as I've never written any GCC code yet,
>> so it's interesting to me. If you recall any (hopefully recent) case where a
>> similar thing happened (the warning was already implemented and only needed a
>> name), it might help me check how it was done.
>
> `git log gcc/c-family/c.opt` will show loads of changes adding warnings.
Thanks, I will check and try it, and come back if I have any doubts.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-11-14 13:47 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-13 21:56 [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
2022-11-13 22:12 ` Andrew Pinski
2022-11-14 11:37 ` Alejandro Colomar
2022-11-14 13:14 ` Jonathan Wakely
2022-11-14 13:41 ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar
2022-11-14 13:46 ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).