public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [whish] -Wunterminated-string-initialization: new warning
@ 2022-11-13 21:56 Alejandro Colomar
  2022-11-13 22:12 ` Andrew Pinski
  0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-13 21:56 UTC (permalink / raw)
  To: gcc


[-- Attachment #1.1: Type: text/plain, Size: 1453 bytes --]

Hi!

I'd like to get warnings if I write the following code:

char foo[3] = "foo";

It's hard to keep track of sizes to make sure that the string literals always 
initialize to terminated strings.  It seems something that should be easy to 
implement in the compiler.

A morecomplex case where it's harder to keep track of sizes is:

static const char  log_levels[][8] = {
     "alert",
     "error",
     "warn",
     "notice",
     "info",
     "debug",
};

Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8).  If tomorrow 
we add or change an entry, It'll be hard to keep it safe.  Such a warning would 
help a lot.


An example program is:

$ cat str.c
char     two[2] = "foo";   // 'f' 'o'
char   three[3] = "foo";   // 'f' 'o' 'o'
char    four[4] = "foo";   // 'f' 'o' 'o' '\0'
char    five[5] = "foo";   // 'f' 'o' 'o' '\0' '\0'
char implicit[] = "foo";   // 'f' 'o' 'o' '\0'

$ cc -Wall -Wextra str.c
str.c:1:19: warning: initializer-string for array of ‘char’ is too long
     1 | char     two[2] = "foo";   // 'f' 'o'
       |                   ^~~~~
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o: 
in function `_start':
(.text+0x17): undefined reference to `main'
collect2: error: ld returned 1 exit status


Here, I'd like that with the new warning, 'three' would also get warned.

Cheers,

Alex
-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [whish] -Wunterminated-string-initialization: new warning
  2022-11-13 21:56 [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
@ 2022-11-13 22:12 ` Andrew Pinski
  2022-11-14 11:37   ` Alejandro Colomar
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Pinski @ 2022-11-13 22:12 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: gcc

On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
<gcc@gcc.gnu.org> wrote:
>
> Hi!
>
> I'd like to get warnings if I write the following code:
>
> char foo[3] = "foo";

This should be easy to add as it is already part of the -Wc++-compat
option as for C++ it is invalid code.

<source>:2:19: warning: initializer-string for array of 'char' is too long
    2 | char     two[2] = "foo";   // 'f' 'o'
      |                   ^~~~~
<source>:3:19: warning: initializer-string for array of 'char' is too
long for C++ [-Wc++-compat]
    3 | char   three[3] = "foo";   // 'f' 'o' 'o'
      |                   ^~~~~


... (for your more complex case [though I needed to modify one of the
strings to exactly 8]

<source>:5:7: warning: initializer-string for array of 'char' is too
long for C++ [-Wc++-compat]
    5 |       "01234567",
      |       ^~~~~~~~~~

              else if (warn_cxx_compat
                       && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
                warning_at (init_loc, OPT_Wc___compat,
                            ("initializer-string for array of %qT "
                             "is too long for C++"), typ1);

That is the current code which does this warning even so it is just a
matter of adding an option to c-family/c.opt and then having
c++-compat enable it and using that new option here.

Thanks,
Andrew Pinski

>
> It's hard to keep track of sizes to make sure that the string literals always
> initialize to terminated strings.  It seems something that should be easy to
> implement in the compiler.
>
> A morecomplex case where it's harder to keep track of sizes is:
>
> static const char  log_levels[][8] = {
>      "alert",
>      "error",
>      "warn",
>      "notice",
>      "info",
>      "debug",
> };
>
> Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8).  If tomorrow
> we add or change an entry, It'll be hard to keep it safe.  Such a warning would
> help a lot.
>
>
> An example program is:
>
> $ cat str.c
> char     two[2] = "foo";   // 'f' 'o'
> char   three[3] = "foo";   // 'f' 'o' 'o'
> char    four[4] = "foo";   // 'f' 'o' 'o' '\0'
> char    five[5] = "foo";   // 'f' 'o' 'o' '\0' '\0'
> char implicit[] = "foo";   // 'f' 'o' 'o' '\0'
>
> $ cc -Wall -Wextra str.c
> str.c:1:19: warning: initializer-string for array of ‘char’ is too long
>      1 | char     two[2] = "foo";   // 'f' 'o'
>        |                   ^~~~~
> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o:
> in function `_start':
> (.text+0x17): undefined reference to `main'
> collect2: error: ld returned 1 exit status
>
>
> Here, I'd like that with the new warning, 'three' would also get warned.
>
> Cheers,
>
> Alex
> --
> <http://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [whish] -Wunterminated-string-initialization: new warning
  2022-11-13 22:12 ` Andrew Pinski
@ 2022-11-14 11:37   ` Alejandro Colomar
  2022-11-14 13:14     ` Jonathan Wakely
  0 siblings, 1 reply; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-14 11:37 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc, Douglas McIlroy


[-- Attachment #1.1: Type: text/plain, Size: 3881 bytes --]

Hi Andrew!

On 11/13/22 23:12, Andrew Pinski wrote:
> On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
> <gcc@gcc.gnu.org> wrote:
>>
>> Hi!
>>
>> I'd like to get warnings if I write the following code:
>>
>> char foo[3] = "foo";
> 
> This should be easy to add as it is already part of the -Wc++-compat
> option as for C++ it is invalid code.
> 
> <source>:2:19: warning: initializer-string for array of 'char' is too long
>      2 | char     two[2] = "foo";   // 'f' 'o'
>        |                   ^~~~~
> <source>:3:19: warning: initializer-string for array of 'char' is too
> long for C++ [-Wc++-compat]
>      3 | char   three[3] = "foo";   // 'f' 'o' 'o'
>        |                   ^~~~~
> 
> 
> ... (for your more complex case [though I needed to modify one of the
> strings to exactly 8]
> 
> <source>:5:7: warning: initializer-string for array of 'char' is too
> long for C++ [-Wc++-compat]
>      5 |       "01234567",
>        |       ^~~~~~~~~~
> 
>                else if (warn_cxx_compat
>                         && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
>                  warning_at (init_loc, OPT_Wc___compat,
>                              ("initializer-string for array of %qT "
>                               "is too long for C++"), typ1);
> 
> That is the current code which does this warning even so it is just a
> matter of adding an option to c-family/c.opt and then having
> c++-compat enable it and using that new option here.
> 
> Thanks,
> Andrew Pinski

Great!  I'd like to implement it myself, as I've never written any GCC code yet, 
so it's interesting to me.  If you recall any (hopefully recent) case where a 
similar thing happened (the warning was already implemented and only needed a 
name), it might help me check how it was done.

BTW, I had another idea to add a suffix to string literals to make them 
unterminated:

char foo[3] = "foo"u;  // OK
char bar[4] = "bar";   // OK

char baz[4] = "baz"u;  // Warning: initializer is too short.
char etc[3] = "etc";   // Warning: unterminated string.

Is that doable?  Do you think it makes sense?

I have a code base that uses a mix of terminated and unterminated strings, and 
it would be nice to be able to tell the one I want in each case.

Cheers,

Alex

> 
>>
>> It's hard to keep track of sizes to make sure that the string literals always
>> initialize to terminated strings.  It seems something that should be easy to
>> implement in the compiler.
>>
>> A morecomplex case where it's harder to keep track of sizes is:
>>
>> static const char  log_levels[][8] = {
>>       "alert",
>>       "error",
>>       "warn",
>>       "notice",
>>       "info",
>>       "debug",
>> };
>>
>> Here, 8 works now (and 7 too, but for aligmnent reasons I chose 8).  If tomorrow
>> we add or change an entry, It'll be hard to keep it safe.  Such a warning would
>> help a lot.
>>
>>
>> An example program is:
>>
>> $ cat str.c
>> char     two[2] = "foo";   // 'f' 'o'
>> char   three[3] = "foo";   // 'f' 'o' 'o'
>> char    four[4] = "foo";   // 'f' 'o' 'o' '\0'
>> char    five[5] = "foo";   // 'f' 'o' 'o' '\0' '\0'
>> char implicit[] = "foo";   // 'f' 'o' 'o' '\0'
>>
>> $ cc -Wall -Wextra str.c
>> str.c:1:19: warning: initializer-string for array of ‘char’ is too long
>>       1 | char     two[2] = "foo";   // 'f' 'o'
>>         |                   ^~~~~
>> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/12/../../../x86_64-linux-gnu/Scrt1.o:
>> in function `_start':
>> (.text+0x17): undefined reference to `main'
>> collect2: error: ld returned 1 exit status
>>
>>
>> Here, I'd like that with the new warning, 'three' would also get warned.
>>
>> Cheers,
>>
>> Alex
>> --
>> <http://www.alejandro-colomar.es/>

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [whish] -Wunterminated-string-initialization: new warning
  2022-11-14 11:37   ` Alejandro Colomar
@ 2022-11-14 13:14     ` Jonathan Wakely
  2022-11-14 13:41       ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar
  2022-11-14 13:46       ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Wakely @ 2022-11-14 13:14 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Andrew Pinski, gcc, Douglas McIlroy

On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi Andrew!
>
> On 11/13/22 23:12, Andrew Pinski wrote:
> > On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
> > <gcc@gcc.gnu.org> wrote:
> >>
> >> Hi!
> >>
> >> I'd like to get warnings if I write the following code:
> >>
> >> char foo[3] = "foo";
> >
> > This should be easy to add as it is already part of the -Wc++-compat
> > option as for C++ it is invalid code.
> >
> > <source>:2:19: warning: initializer-string for array of 'char' is too long
> >      2 | char     two[2] = "foo";   // 'f' 'o'
> >        |                   ^~~~~
> > <source>:3:19: warning: initializer-string for array of 'char' is too
> > long for C++ [-Wc++-compat]
> >      3 | char   three[3] = "foo";   // 'f' 'o' 'o'
> >        |                   ^~~~~
> >
> >
> > ... (for your more complex case [though I needed to modify one of the
> > strings to exactly 8]
> >
> > <source>:5:7: warning: initializer-string for array of 'char' is too
> > long for C++ [-Wc++-compat]
> >      5 |       "01234567",
> >        |       ^~~~~~~~~~
> >
> >                else if (warn_cxx_compat
> >                         && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> >                  warning_at (init_loc, OPT_Wc___compat,
> >                              ("initializer-string for array of %qT "
> >                               "is too long for C++"), typ1);
> >
> > That is the current code which does this warning even so it is just a
> > matter of adding an option to c-family/c.opt and then having
> > c++-compat enable it and using that new option here.
> >
> > Thanks,
> > Andrew Pinski
>
> Great!  I'd like to implement it myself, as I've never written any GCC code yet,
> so it's interesting to me.  If you recall any (hopefully recent) case where a
> similar thing happened (the warning was already implemented and only needed a
> name), it might help me check how it was done.

`git log gcc/c-family/c.opt` will show loads of changes adding warnings.

>
> BTW, I had another idea to add a suffix to string literals to make them
> unterminated:
>
> char foo[3] = "foo"u;  // OK
> char bar[4] = "bar";   // OK
>
> char baz[4] = "baz"u;  // Warning: initializer is too short.
> char etc[3] = "etc";   // Warning: unterminated string.
>
> Is that doable?  Do you think it makes sense?

IMHO no. This is not useful enough to add a language extension, it's
an incredibly niche use case. Your suggested syntax also looks very
confusing with UTF-16 string literals, and is not sufficiently
distinct from a normal string literal to be obvious when quickly
reading the code. People expect string literals in C to be
null-terminated, having a subtle suffix that changes that would be a
bug farm.

You can do {'b', 'a', 'z'} if you want an explicitly unterminated array of char.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning)
  2022-11-14 13:14     ` Jonathan Wakely
@ 2022-11-14 13:41       ` Alejandro Colomar
  2022-11-14 13:46       ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
  1 sibling, 0 replies; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-14 13:41 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Andrew Pinski, gcc, Douglas McIlroy


[-- Attachment #1.1: Type: text/plain, Size: 2388 bytes --]

Hi Jonathan,

On 11/14/22 14:14, Jonathan Wakely wrote:
> On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote:
>> BTW, I had another idea to add a suffix to string literals to make them
>> unterminated:
>>
>> char foo[3] = "foo"u;  // OK
>> char bar[4] = "bar";   // OK
>>
>> char baz[4] = "baz"u;  // Warning: initializer is too short.
>> char etc[3] = "etc";   // Warning: unterminated string.
>>
>> Is that doable?  Do you think it makes sense?
> 
> IMHO no. This is not useful enough to add a language extension, it's
> an incredibly niche use case.

I agree it's way too niche.

> Your suggested syntax also looks very
> confusing with UTF-16 string literals,

Maybe.

> and is not sufficiently
> distinct from a normal string literal to be obvious when quickly
> reading the code. People expect string literals in C to be
> null-terminated, having a subtle suffix that changes that would be a
> bug farm.

But, you have to combine both the suffix with the corresponding size (one less 
than for normal strings).  A programmer needs to be consciously doing this.  For 
readers of the code, maybe there's a bit more of a readability issue, especially 
if you don't know the extension.  But when you stop a little bit to check what 
that suffix is doing and then realize the size is weird, a reasonable programmer 
should at least ask or check the documentation for that thing.

Regarding safety, I also have that thing very present in my mind, and in an 
attempt to get the compiler on my side, I decided to use 'char *' for 
NUL-terminated strings, and 'u_char *' for u_nterminated strings.  That helps 
the compiler know when we're using one in place of another, which as you say 
would be a source of bugs.

Maybe having the type of these new strings be u_char[] instead of char[] would 
help have more type safety.  I didn't suggest this because that would not be how 
strings in C have always been.  However, considering that they are not really 
strings, it could make sense.

> 
> You can do {'b', 'a', 'z'} if you want an explicitly unterminated array of char.

A bit unreadable :)
I think I'll keep using normal literals, and maybe some workaround to disable 
the warnings for specific cases.  Not my preference, but it can work.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [whish] -Wunterminated-string-initialization: new warning
  2022-11-14 13:14     ` Jonathan Wakely
  2022-11-14 13:41       ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar
@ 2022-11-14 13:46       ` Alejandro Colomar
  1 sibling, 0 replies; 6+ messages in thread
From: Alejandro Colomar @ 2022-11-14 13:46 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Andrew Pinski, gcc, Douglas McIlroy


[-- Attachment #1.1: Type: text/plain, Size: 2310 bytes --]

Hi Jonathan,

On 11/14/22 14:14, Jonathan Wakely wrote:
> On Mon, 14 Nov 2022 at 11:38, Alejandro Colomar via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> Hi Andrew!
>>
>> On 11/13/22 23:12, Andrew Pinski wrote:
>>> On Sun, Nov 13, 2022 at 1:57 PM Alejandro Colomar via Gcc
>>> <gcc@gcc.gnu.org> wrote:
>>>>
>>>> Hi!
>>>>
>>>> I'd like to get warnings if I write the following code:
>>>>
>>>> char foo[3] = "foo";
>>>
>>> This should be easy to add as it is already part of the -Wc++-compat
>>> option as for C++ it is invalid code.
>>>
>>> <source>:2:19: warning: initializer-string for array of 'char' is too long
>>>       2 | char     two[2] = "foo";   // 'f' 'o'
>>>         |                   ^~~~~
>>> <source>:3:19: warning: initializer-string for array of 'char' is too
>>> long for C++ [-Wc++-compat]
>>>       3 | char   three[3] = "foo";   // 'f' 'o' 'o'
>>>         |                   ^~~~~
>>>
>>>
>>> ... (for your more complex case [though I needed to modify one of the
>>> strings to exactly 8]
>>>
>>> <source>:5:7: warning: initializer-string for array of 'char' is too
>>> long for C++ [-Wc++-compat]
>>>       5 |       "01234567",
>>>         |       ^~~~~~~~~~
>>>
>>>                 else if (warn_cxx_compat
>>>                          && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
>>>                   warning_at (init_loc, OPT_Wc___compat,
>>>                               ("initializer-string for array of %qT "
>>>                                "is too long for C++"), typ1);
>>>
>>> That is the current code which does this warning even so it is just a
>>> matter of adding an option to c-family/c.opt and then having
>>> c++-compat enable it and using that new option here.
>>>
>>> Thanks,
>>> Andrew Pinski
>>
>> Great!  I'd like to implement it myself, as I've never written any GCC code yet,
>> so it's interesting to me.  If you recall any (hopefully recent) case where a
>> similar thing happened (the warning was already implemented and only needed a
>> name), it might help me check how it was done.
> 
> `git log gcc/c-family/c.opt` will show loads of changes adding warnings.

Thanks, I will check and try it, and come back if I have any doubts.

Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-14 13:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-13 21:56 [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar
2022-11-13 22:12 ` Andrew Pinski
2022-11-14 11:37   ` Alejandro Colomar
2022-11-14 13:14     ` Jonathan Wakely
2022-11-14 13:41       ` unterminated string literals (was: [whish] -Wunterminated-string-initialization: new warning) Alejandro Colomar
2022-11-14 13:46       ` [whish] -Wunterminated-string-initialization: new warning Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).