public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] C, ObjC: Add -Wunterminated-string-initialization
@ 2023-03-17  1:12 Alejandro Colomar
  2023-03-17  1:27 ` Alejandro Colomar
  0 siblings, 1 reply; 2+ messages in thread
From: Alejandro Colomar @ 2023-03-17  1:12 UTC (permalink / raw)
  To: gcc
  Cc: Alejandro Colomar, Doug McIlroy, G. Branden Robinson,
	Ralph Corderoy, Dave Kemper, Larry McVoy, Andrew Pinski,
	Jonathan Wakely, Andrew Clayton

Warn about the following:

    char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

    char  *log_levels[]   = { "info", "warning", "err" };
vs.
    char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

    char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Link: <https://lists.gnu.org/archive/html/groff/2022-11/msg00059.html>
Link: <https://lists.gnu.org/archive/html/groff/2022-11/msg00063.html>
Link: <https://inbox.sourceware.org/gcc/36da94eb-1cac-5ae8-7fea-ec66160cf413@gmail.com/T/>
Acked-by: Doug McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Ralph Corderoy <ralph@inputplus.co.uk>
Cc: Dave Kemper <saint.snit@gmail.com>
Cc: Larry McVoy <lm@mcvoy.com>
Cc: Andrew Pinski <pinskia@gmail.com>
Cc: Jonathan Wakely <jwakely.gcc@gmail.com>
Cc: Andrew Clayton <andrew@digital-domain.net>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Hi!

I finally have a working patch for this warning :-)
Tested with the following code:

	$ cat str.c 
	int main(void)
	{
		char a[2] = "foo";
		char b[3] = "bar";
		char c[4] = "baz";
		char d[5] = "qwe";
		char log_levels[][N] = {  // -DN=7
			"info",
			"warning",
			"err"
		};
		return *a + *b + *c + *d + log_levels[0][0];
	}

One thing which doesn't make me fully happy about this warning is that
the message is a bit worse than the one in C++.  See:

	$ /opt/local/gnu/gcc/wusi/1/bin/gcc str.c \
	      -Wall -Wunterminated-string-initialization -DN=8
	str.c: In function ‘main’:
	str.c:4:21: warning: initializer-string for array of ‘char’ is too long
	    4 |         char a[2] = "foo";
	      |                     ^~~~~
	str.c:5:21: warning: initializer-string for array of ‘char’ is too long for C++ [-Wunterminated-string-initialization]
	    5 |         char b[3] = "bar";
	      |                     ^~~~~
	$ /opt/local/gnu/gcc/wusi/1/bin/g++ str.c \
	      -Wall -Wunterminated-string-initialization -DN=8
	str.c: In function ‘int main()’:
	str.c:4:21: error: initializer-string for ‘char [2]’ is too long [-fpermissive]
	    4 |         char a[2] = "foo";
	      |                     ^~~~~
	str.c:5:21: error: initializer-string for ‘char [3]’ is too long [-fpermissive]
	    5 |         char b[3] = "bar";
	      |                     ^~~~~

In C++ we see the complete type in the error message, which is more
informative than "array of 'char'".  This is especially relevant for
multiline definitions, where the shown line may not contain the type,
but only the string.  However, that was already the case previously with
-Wc++-compat, so a fix for that might be better as a different patch.

	$ /opt/local/gnu/gcc/wusi/1/bin/gcc str.c \
	      -Wall -Wunterminated-string-initialization -DN=7
	str.c: In function ‘main’:
	str.c:4:21: warning: initializer-string for array of ‘char’ is too long
	    4 |         char a[2] = "foo";
	      |                     ^~~~~
	str.c:5:21: warning: initializer-string for array of ‘char’ is too long for C++ [-Wunterminated-string-initialization]
	    5 |         char b[3] = "bar";
	      |                     ^~~~~
	str.c:10:17: warning: initializer-string for array of ‘char’ is too long for C++ [-Wunterminated-string-initialization]
	   10 |                 "warning",
	      |                 ^~~~~~~~~
	$ /opt/local/gnu/gcc/wusi/1/bin/g++ str.c \
	      -Wall -Wunterminated-string-initialization -DN=7
	str.c: In function ‘int main()’:
	str.c:4:21: error: initializer-string for ‘char [2]’ is too long [-fpermissive]
	    4 |         char a[2] = "foo";
	      |                     ^~~~~
	str.c:5:21: error: initializer-string for ‘char [3]’ is too long [-fpermissive]
	    5 |         char b[3] = "bar";
	      |                     ^~~~~
	str.c:10:17: error: initializer-string for ‘char [7]’ is too long [-fpermissive]
	   10 |                 "warning",
	      |                 ^~~~~~~~~


BTW, I only tested C; not ObjC.  I never in my life used Objective C, so
I don't even know how relevant this is for that language.  I just found
that it has -Wc++-compat, and so I guessed that this warning would also
trigger in that language, so I did the same as for C.  I hope that's
correct.

Cheers,

Alex

 gcc/c-family/c.opt | 4 ++++
 gcc/c/c-typeck.cc  | 6 +++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 3333cddeece..7f1fccfe02b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1382,6 +1382,10 @@ Wunsuffixed-float-constants
 C ObjC Var(warn_unsuffixed_float_constants) Warning
 Warn about unsuffixed float constants.
 
+Wunterminated-string-initialization
+C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C ObjC,Wextra || Wc++-compat)
+Warn about character arrays initialized as unterminated character sequences by a string literal.
+
 Wunused
 C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wall)
 ; documented in common.opt
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 45bacc06c47..ce2750f98bb 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -8420,11 +8420,11 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
 		pedwarn_init (init_loc, 0,
 			      ("initializer-string for array of %qT "
 			       "is too long"), typ1);
-	      else if (warn_cxx_compat
+	      else if (warn_unterminated_string_initialization
 		       && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
-		warning_at (init_loc, OPT_Wc___compat,
+		warning_at (init_loc, OPT_Wunterminated_string_initialization,
 			    ("initializer-string for array of %qT "
-			     "is too long for C++"), typ1);
+			     "is too long"), typ1);
 	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
 		{
 		  unsigned HOST_WIDE_INT size
-- 
2.39.2


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] C, ObjC: Add -Wunterminated-string-initialization
  2023-03-17  1:12 [PATCH] C, ObjC: Add -Wunterminated-string-initialization Alejandro Colomar
@ 2023-03-17  1:27 ` Alejandro Colomar
  0 siblings, 0 replies; 2+ messages in thread
From: Alejandro Colomar @ 2023-03-17  1:27 UTC (permalink / raw)
  To: gcc
  Cc: Alejandro Colomar, Doug McIlroy, G. Branden Robinson,
	Ralph Corderoy, Dave Kemper, Larry McVoy, Andrew Pinski,
	Jonathan Wakely, Andrew Clayton


[-- Attachment #1.1: Type: text/plain, Size: 7632 bytes --]



On 3/17/23 02:12, Alejandro Colomar wrote:
> Warn about the following:
> 
>     char  s[3] = "foo";
> 
> Initializing a char array with a string literal of the same length as
> the size of the array is usually a mistake.  Rarely is the case where
> one wants to create a non-terminated character sequence from a string
> literal.
> 
> In some cases, for writing faster code, one may want to use arrays
> instead of pointers, since that removes the need for storing an array of
> pointers apart from the strings themselves.
> 
>     char  *log_levels[]   = { "info", "warning", "err" };
> vs.
>     char  log_levels[][7] = { "info", "warning", "err" };
> 
> This forces the programmer to specify a size, which might change if a
> new entry is later added.  Having no way to enforce null termination is
> very dangerous, however, so it is useful to have a warning for this, so
> that the compiler can make sure that the programmer didn't make any
> mistakes.  This warning catches the bug above, so that the programmer
> will be able to fix it and write:
> 
>     char  log_levels[][8] = { "info", "warning", "err" };
> 
> This warning already existed as part of -Wc++-compat, but this patch
> allows enabling it separately.  It is also included in -Wextra, since
> it may not always be desired (when unterminated character sequences are
> wanted), but it's likely to be desired in most cases.
> 
> Link: <https://lists.gnu.org/archive/html/groff/2022-11/msg00059.html>
> Link: <https://lists.gnu.org/archive/html/groff/2022-11/msg00063.html>
> Link: <https://inbox.sourceware.org/gcc/36da94eb-1cac-5ae8-7fea-ec66160cf413@gmail.com/T/>
> Acked-by: Doug McIlroy <douglas.mcilroy@dartmouth.edu>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Ralph Corderoy <ralph@inputplus.co.uk>
> Cc: Dave Kemper <saint.snit@gmail.com>
> Cc: Larry McVoy <lm@mcvoy.com>
> Cc: Andrew Pinski <pinskia@gmail.com>
> Cc: Jonathan Wakely <jwakely.gcc@gmail.com>
> Cc: Andrew Clayton <andrew@digital-domain.net>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
> 
> Hi!
> 
> I finally have a working patch for this warning :-)
> Tested with the following code:
> 
> 	$ cat str.c 
> 	int main(void)
> 	{
> 		char a[2] = "foo";
> 		char b[3] = "bar";
> 		char c[4] = "baz";
> 		char d[5] = "qwe";
> 		char log_levels[][N] = {  // -DN=7
> 			"info",
> 			"warning",
> 			"err"
> 		};
> 		return *a + *b + *c + *d + log_levels[0][0];
> 	}
> 
> One thing which doesn't make me fully happy about this warning is that
> the message is a bit worse than the one in C++.  See:
> 
> 	$ /opt/local/gnu/gcc/wusi/1/bin/gcc str.c \
> 	      -Wall -Wunterminated-string-initialization -DN=8
> 	str.c: In function ‘main’:
> 	str.c:4:21: warning: initializer-string for array of ‘char’ is too long
> 	    4 |         char a[2] = "foo";
> 	      |                     ^~~~~
> 	str.c:5:21: warning: initializer-string for array of ‘char’ is too long for C++ [-Wunterminated-string-initialization]

You may notice that these messages still have the "for C++" thingy.
I removed that after testing, but since it's just text I didn't test again.

> 	    5 |         char b[3] = "bar";
> 	      |                     ^~~~~
> 	$ /opt/local/gnu/gcc/wusi/1/bin/g++ str.c \
> 	      -Wall -Wunterminated-string-initialization -DN=8
> 	str.c: In function ‘int main()’:
> 	str.c:4:21: error: initializer-string for ‘char [2]’ is too long [-fpermissive]
> 	    4 |         char a[2] = "foo";
> 	      |                     ^~~~~
> 	str.c:5:21: error: initializer-string for ‘char [3]’ is too long [-fpermissive]
> 	    5 |         char b[3] = "bar";
> 	      |                     ^~~~~
> 
> In C++ we see the complete type in the error message, which is more
> informative than "array of 'char'".  This is especially relevant for
> multiline definitions, where the shown line may not contain the type,
> but only the string.  However, that was already the case previously with
> -Wc++-compat, so a fix for that might be better as a different patch.
> 
> 	$ /opt/local/gnu/gcc/wusi/1/bin/gcc str.c \
> 	      -Wall -Wunterminated-string-initialization -DN=7
> 	str.c: In function ‘main’:
> 	str.c:4:21: warning: initializer-string for array of ‘char’ is too long
> 	    4 |         char a[2] = "foo";
> 	      |                     ^~~~~
> 	str.c:5:21: warning: initializer-string for array of ‘char’ is too long for C++ [-Wunterminated-string-initialization]
> 	    5 |         char b[3] = "bar";
> 	      |                     ^~~~~
> 	str.c:10:17: warning: initializer-string for array of ‘char’ is too long for C++ [-Wunterminated-string-initialization]
> 	   10 |                 "warning",
> 	      |                 ^~~~~~~~~
> 	$ /opt/local/gnu/gcc/wusi/1/bin/g++ str.c \
> 	      -Wall -Wunterminated-string-initialization -DN=7
> 	str.c: In function ‘int main()’:
> 	str.c:4:21: error: initializer-string for ‘char [2]’ is too long [-fpermissive]
> 	    4 |         char a[2] = "foo";
> 	      |                     ^~~~~
> 	str.c:5:21: error: initializer-string for ‘char [3]’ is too long [-fpermissive]
> 	    5 |         char b[3] = "bar";
> 	      |                     ^~~~~
> 	str.c:10:17: error: initializer-string for ‘char [7]’ is too long [-fpermissive]
> 	   10 |                 "warning",
> 	      |                 ^~~~~~~~~
> 
> 
> BTW, I only tested C; not ObjC.  I never in my life used Objective C, so
> I don't even know how relevant this is for that language.  I just found
> that it has -Wc++-compat, and so I guessed that this warning would also
> trigger in that language, so I did the same as for C.  I hope that's
> correct.
> 
> Cheers,
> 
> Alex
> 
>  gcc/c-family/c.opt | 4 ++++
>  gcc/c/c-typeck.cc  | 6 +++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 3333cddeece..7f1fccfe02b 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1382,6 +1382,10 @@ Wunsuffixed-float-constants
>  C ObjC Var(warn_unsuffixed_float_constants) Warning
>  Warn about unsuffixed float constants.
>  
> +Wunterminated-string-initialization
> +C ObjC Var(warn_unterminated_string_initialization) Warning LangEnabledBy(C ObjC,Wextra || Wc++-compat)
> +Warn about character arrays initialized as unterminated character sequences by a string literal.
> +
>  Wunused
>  C ObjC C++ ObjC++ LangEnabledBy(C ObjC C++ ObjC++,Wall)
>  ; documented in common.opt
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 45bacc06c47..ce2750f98bb 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -8420,11 +8420,11 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
>  		pedwarn_init (init_loc, 0,
>  			      ("initializer-string for array of %qT "
>  			       "is too long"), typ1);
> -	      else if (warn_cxx_compat
> +	      else if (warn_unterminated_string_initialization
>  		       && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> -		warning_at (init_loc, OPT_Wc___compat,
> +		warning_at (init_loc, OPT_Wunterminated_string_initialization,
>  			    ("initializer-string for array of %qT "
> -			     "is too long for C++"), typ1);
> +			     "is too long"), typ1);
>  	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
>  		{
>  		  unsigned HOST_WIDE_INT size

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-03-17  1:27 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17  1:12 [PATCH] C, ObjC: Add -Wunterminated-string-initialization Alejandro Colomar
2023-03-17  1:27 ` Alejandro Colomar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).