* -Wformat and u8""
From: Ulrich Drepper @ 2022-05-09 9:15 UTC
To: gcc

I have a C++20+ code base which forces the program to run using a UTF-8
locale and then uses u8"" strings internally. This causes warnings with
-Wformat.

#include <stdio.h>

int main()
{
  printf((const char*) u8"test %d\n", 1);
  return 0;
}

Compile with

  g++ -std=gnu++20 -c -O -Wall t.cc

and you'll see:

t.cc: In function ‘int main()’:
t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
    5 |   printf((const char*) u8"test %d\n", 1);
      |                        ^~~~~~~~~~~~~

I would say it is not gcc's business to question my use of u8"" given
that I use a cast and the u8"" string can be parsed by the -Wformat
handling.

Before filing a report I'd like to take the temperature and see whether
people agree with this.

Thanks.

* Re: -Wformat and u8""
From: Florian Weimer @ 2022-05-09 9:26 UTC
To: Ulrich Drepper via Gcc; +Cc: Ulrich Drepper

* Ulrich Drepper via Gcc:

> t.cc: In function ‘int main()’:
> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>     5 |   printf((const char*) u8"test %d\n", 1);
>       |                        ^~~~~~~~~~~~~

This is not an aliasing violation because of the exception for char,
right? So the warning does not even highlight theoretical undefined
behavior.

On the other hand, that cast is still quite ugly. All string-related
functions in the C library currently need it. It might obscure real
type errors. Isn't this a problem with char8_t?

Thanks,
Florian

* Re: -Wformat and u8""
From: Ulrich Drepper @ 2022-05-09 10:04 UTC
To: Florian Weimer; +Cc: Ulrich Drepper via Gcc

On Mon, May 9, 2022 at 11:26 AM Florian Weimer <fweimer@redhat.com> wrote:

> On the other hand, that cast is still quite ugly.

Yes, there aren't yet any I/O functions defined for char8_t and
therefore that's the best we can do right now. I have all kinds of ugly
macros to hide these casts.

> All string-related
> functions in the C library currently need it.

Yes, but the cast isn't the issue. Or more correctly: gcc disregarding
the cast for -Wformat is.

Anyway, I'm not concerned about the non-I/O functions. This is all C++
code after all and there are functions for all the rest.

> Isn't this a problem with char8_t?

Well, yes, the problem is that gcc seems to just see the u8"" type
(char8_t) even though I tell it with the cast to regard it as a const
char. Again, I ensure that the encoding matches and putting UTF-8 in
char strings is actually incorrect (in theory).

* Re: -Wformat and u8""
From: Andreas Schwab @ 2022-05-09 10:07 UTC
To: Florian Weimer via Gcc; +Cc: Florian Weimer, Ulrich Drepper

On May 09 2022, Florian Weimer via Gcc wrote:

> * Ulrich Drepper via Gcc:
>
>> t.cc: In function ‘int main()’:
>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>     5 |   printf((const char*) u8"test %d\n", 1);
>>       |                        ^~~~~~~~~~~~~
>
> This is not an aliasing violation because of the exception for char,
> right? So the warning does not even highlight theoretical undefined
> behavior.
>
> On the other hand, that cast is still quite ugly. All string-related
> functions in the C library currently need it. It might obscure real
> type errors. Isn't this a problem with char8_t?

In C++20, u8 literals have a distinct type, which is an incompatible
change from C++17.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
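
The type change described above can be checked at compile time. The following is a minimal sketch, assuming a compiler that defines the feature-test macro `__cpp_char8_t` when char8_t is enabled; the alias `U8Elem` and the function `u8_literal_is_char` are hypothetical names used only here.

```cpp
#include <type_traits>

// Element type of a u8 literal, with the reference and cv-qualifiers
// stripped: decltype(u8"x"[0]) is an lvalue reference to a const element.
using U8Elem = std::remove_cv_t<std::remove_reference_t<decltype(u8"x"[0])>>;

// Hypothetical helper: true when u8 literals are plain char arrays
// (the C++17 behaviour), false when they use the distinct char8_t type.
constexpr bool u8_literal_is_char() {
    return std::is_same_v<U8Elem, char>;
}

#ifdef __cpp_char8_t
// C++20 mode: the literal's element type is char8_t, not char ...
static_assert(std::is_same_v<U8Elem, char8_t>, "u8 literal is char8_t");
static_assert(!u8_literal_is_char(), "u8 literal is no longer char");
// ... and there is no implicit pointer conversion to const char*,
// which is why the test case needs an explicit cast.
static_assert(!std::is_convertible_v<const char8_t*, const char*>,
              "no implicit char8_t* to char* conversion");
#else
// Pre-C++20 mode: u8 literals are ordinary char arrays.
static_assert(u8_literal_is_char(), "u8 literal is char before C++20");
#endif
```

Compiling this with -std=gnu++17 and then with -std=gnu++20 exercises both branches.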

* Re: -Wformat and u8""
From: Jonathan Wakely @ 2022-05-10 13:27 UTC
To: Andreas Schwab; +Cc: Florian Weimer via Gcc, Florian Weimer, Ulrich Drepper

On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>
> On May 09 2022, Florian Weimer via Gcc wrote:
>
> > * Ulrich Drepper via Gcc:
> >
> >> t.cc: In function ‘int main()’:
> >> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
> >>     5 |   printf((const char*) u8"test %d\n", 1);
> >>       |                        ^~~~~~~~~~~~~
> >
> > This is not an aliasing violation because of the exception for char,
> > right? So the warning does not even highlight theoretical undefined
> > behavior.
> >
> > On the other hand, that cast is still quite ugly. All string-related
> > functions in the C library currently need it. It might obscure real
> > type errors. Isn't this a problem with char8_t?
>
> In C++20, u8 literals have a distinct type, which is an incompatible
> change from C++17.

And the recommended way to deal with it is to use a cast as Ulrich did.

* Re: -Wformat and u8""
From: Tom Honermann @ 2022-05-10 17:48 UTC
To: Jonathan Wakely, Andreas Schwab
Cc: Florian Weimer via Gcc, Florian Weimer, Ulrich Drepper

On 5/10/22 9:27 AM, Jonathan Wakely wrote:
> On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>> On May 09 2022, Florian Weimer via Gcc wrote:
>>
>>> * Ulrich Drepper via Gcc:
>>>
>>>> t.cc: In function ‘int main()’:
>>>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>>>     5 |   printf((const char*) u8"test %d\n", 1);
>>>>       |                        ^~~~~~~~~~~~~
>>> This is not an aliasing violation because of the exception for char,
>>> right? So the warning does not even highlight theoretical undefined
>>> behavior.
>>>
>>> On the other hand, that cast is still quite ugly. All string-related
>>> functions in the C library currently need it. It might obscure real
>>> type errors. Isn't this a problem with char8_t?
>> In C++20, u8 literals have a distinct type, which is an incompatible
>> change from C++17.
> And the recommended way to deal with it is to use a cast as Ulrich did.

Thanks for copying me, Jonathan.

From the perspective of the standard, printf() expects its format
string to be specified in the locale-dependent multibyte encoding, so
passing a UTF-8 encoded string is, of course, not guaranteed to produce
a useful result (and certainly would not on, for example, an
EBCDIC-based platform).

I would not recommend use of a cast in this case, but would rather ask
why there is a perceived need to specify a u8-prefixed string literal
at all. If the locale is expected/required to be UTF-8 for the program
to work as intended, then the execution character set is presumably set
to be (or should be) UTF-8 as well, in which case an ordinary string
literal will be UTF-8 encoded and there is no need to use a u8-prefixed
string literal.

So, instead of adding a cast, I would recommend removing the u8 prefix.

Tom.
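
One way to act on this recommendation is to drop the u8 prefix and have the build fail if ordinary literals are not UTF-8 encoded. The sketch below is an assumption-laden illustration, not something proposed in the thread: it probes the encoding of a single character (U+00E9) rather than proving the whole execution character set is UTF-8, and the names `ordinary_literals_are_utf8` and `report` are hypothetical.

```cpp
#include <cstdio>

// Hypothetical probe: "\u00e9" in an ordinary literal is encoded in the
// execution character set. In UTF-8 that is the two bytes 0xC3 0xA9,
// followed by the terminating null, so the array has three elements.
constexpr bool ordinary_literals_are_utf8() {
    constexpr const char probe[] = "\u00e9";
    return sizeof(probe) == 3
        && static_cast<unsigned char>(probe[0]) == 0xC3
        && static_cast<unsigned char>(probe[1]) == 0xA9;
}

// Fail the build early when the assumption does not hold. With GCC the
// execution character set is selected with -fexec-charset (UTF-8 is the
// default).
static_assert(ordinary_literals_are_utf8(),
              "ordinary string literals are expected to be UTF-8 encoded");

// The original test case with the u8 prefix and the cast removed.
// Returns the number of bytes written, as printf reports it.
inline int report(int n) {
    return std::printf("test %d\n", n);
}
```

With the prefix gone, the format string is a plain `char` array again, so -Wformat can check it against the arguments in the usual way.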