public inbox for gcc@gcc.gnu.org
* -Wformat and u8""
@ 2022-05-09  9:15 Ulrich Drepper
  2022-05-09  9:26 ` Florian Weimer
  0 siblings, 1 reply; 6+ messages in thread
From: Ulrich Drepper @ 2022-05-09  9:15 UTC
  To: gcc

I have a C++20+ code base which forces the program to run using a UTF-8
locale and then uses u8"" strings internally.  This causes warnings with
-Wformat.

#include <stdio.h>

int main()
{
  printf((const char*) u8"test %d\n", 1);
  return 0;
}

Compile with
   g++ -std=gnu++20 -c -O -Wall t.cc

and you'll see:
t.cc: In function ‘int main()’:
t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
    5 |   printf((const char*) u8"test %d\n", 1);
      |                        ^~~~~~~~~~~~~

I would say it is not gcc's business to question my use of u8"" given that
I use a cast and the u8"" string can be parsed by the -Wformat handling.

Before filing a report I'd like to take the temperature and see whether
people agree with this.

Thanks.


* Re: -Wformat and u8""
  2022-05-09  9:15 -Wformat and u8"" Ulrich Drepper
@ 2022-05-09  9:26 ` Florian Weimer
  2022-05-09 10:04   ` Ulrich Drepper
  2022-05-09 10:07   ` Andreas Schwab
  0 siblings, 2 replies; 6+ messages in thread
From: Florian Weimer @ 2022-05-09  9:26 UTC
  To: Ulrich Drepper via Gcc; +Cc: Ulrich Drepper

* Ulrich Drepper via Gcc:

> t.cc: In function ‘int main()’:
> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>     5 |   printf((const char*) u8"test %d\n", 1);
>       |                        ^~~~~~~~~~~~~

This is not an aliasing violation because of the exception for char,
right?  So the warning does not even highlight theoretical undefined
behavior.
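
A minimal sketch of the aliasing point, not taken from the thread: the
helper name as_plain_chars is hypothetical. It views a UTF-8 literal
through a const char pointer, which the aliasing exception for char
permits whether the literal's element type is char8_t (C++20) or char
(earlier standards).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (an assumption for illustration): view a narrow or
// UTF-8 string literal as plain char. Reading any object through a
// `const char*` is allowed by the aliasing rules, so this is well defined
// for both `const char[N]` and `const char8_t[N]`.
template <typename CharT, std::size_t N>
const char* as_plain_chars(const CharT (&literal)[N])
{
    static_assert(sizeof(CharT) == 1, "expects a narrow or UTF-8 literal");
    return reinterpret_cast<const char*>(literal);
}

// Length in bytes of the format string from the thread, measured through
// the char view.
inline std::size_t byte_length_via_char_view()
{
    return std::strlen(as_plain_chars(u8"test %d\n"));
}
```

The cast changes only how the bytes are viewed. It does not convert the
encoding, so the result is only meaningful to printf when the locale
encoding is UTF-8.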

On the other hand, that cast is still quite ugly.  All string-related
functions in the C library currently need it.  It might obscure real
type errors.  Isn't this a problem with char8_t?

Thanks,
Florian



* Re: -Wformat and u8""
  2022-05-09  9:26 ` Florian Weimer
@ 2022-05-09 10:04   ` Ulrich Drepper
  2022-05-09 10:07   ` Andreas Schwab
  1 sibling, 0 replies; 6+ messages in thread
From: Ulrich Drepper @ 2022-05-09 10:04 UTC
  To: Florian Weimer; +Cc: Ulrich Drepper via Gcc

On Mon, May 9, 2022 at 11:26 AM Florian Weimer <fweimer@redhat.com> wrote:

> On the other hand, that cast is still quite ugly.


Yes, there aren't yet any I/O functions defined for char8_t and therefore
that's the best we can do right now.  I have all kinds of ugly macros to
hide these casts.
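
The macros themselves are not shown in the thread. As a rough sketch of
what such a wrapper could look like, assuming a hypothetical macro named
U8 and a hypothetical function format_with_u8:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical macro (an assumption, not the code base's actual macro):
// paste the u8 prefix onto an ordinary string literal and cast the result
// to const char*, the same C-style cast used in the original example.
#define U8(str) ((const char*) u8##str)

// Format `value` into `buf` using a u8 format string hidden behind U8().
// Returns what snprintf returns: the number of characters that would have
// been written, not counting the terminating null.
inline int format_with_u8(char* buf, std::size_t size, int value)
{
    return std::snprintf(buf, size, U8("test %d\n"), value);
}
```

A macro like this only hides the cast at the call site. The expression
the compiler sees is the same as in the original example, so the -Wformat
diagnostic quoted above would be expected to appear here as well.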


> All string-related
> functions in the C library currently need it.


Yes, but the cast isn't the issue.  Or more correctly: gcc disregarding the
cast for -Wformat is.

Anyway, I'm not concerned about the non-I/O functions.  This is all C++
code after all and there are functions for all the rest.


> Isn't this a problem with char8_t?
>

Well, yes, the problem is that gcc seems to just see the u8"" type
(char8_t) even though I tell it with the cast to regard it as a const
char*.  Again, I ensure that the encoding matches and putting UTF-8 in char
strings is actually incorrect (in theory).


* Re: -Wformat and u8""
  2022-05-09  9:26 ` Florian Weimer
  2022-05-09 10:04   ` Ulrich Drepper
@ 2022-05-09 10:07   ` Andreas Schwab
  2022-05-10 13:27     ` Jonathan Wakely
  1 sibling, 1 reply; 6+ messages in thread
From: Andreas Schwab @ 2022-05-09 10:07 UTC
  To: Florian Weimer via Gcc; +Cc: Florian Weimer, Ulrich Drepper

On May 09 2022, Florian Weimer via Gcc wrote:

> * Ulrich Drepper via Gcc:
>
>> t.cc: In function ‘int main()’:
>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>     5 |   printf((const char*) u8"test %d\n", 1);
>>       |                        ^~~~~~~~~~~~~
>
> This is not an aliasing violation because of the exception for char,
> right?  So the warning does not even highlight theoretical undefined
> behavior.
>
> On the other hand, that cast is still quite ugly.  All string-related
> functions in the C library currently need it.  It might obscure real
> type errors.  Isn't this a problem with char8_t?

In C++20, u8 literals have a distinct type, which is an incompatible
change from C++17.
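
The type change can be checked at compile time. The following is a small
sketch, with the names u8_element and u8_is_distinct_type chosen here for
illustration. It relies on the standard feature-test macro __cpp_char8_t,
which is defined when char8_t support is enabled.

```cpp
#include <type_traits>

// Element type of a u8 literal under the language mode in effect.
// `u8"x"[0]` is an lvalue, so decltype yields a reference to const.
using u8_element =
    std::remove_cv_t<std::remove_reference_t<decltype(u8"x"[0])>>;

#if defined(__cpp_char8_t)
// C++20 (or -fchar8_t): u8 literals are arrays of the distinct type char8_t.
static_assert(std::is_same<u8_element, char8_t>::value,
              "u8 literal elements are char8_t");
constexpr bool u8_is_distinct_type = true;
#else
// C++17 and earlier (or -fno-char8_t): u8 literals are arrays of char.
static_assert(std::is_same<u8_element, char>::value,
              "u8 literal elements are char");
constexpr bool u8_is_distinct_type = false;
#endif
```

Compiling the same file with -std=gnu++17 and -std=gnu++20 takes the two
different branches, which is the incompatibility described above.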

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: -Wformat and u8""
  2022-05-09 10:07   ` Andreas Schwab
@ 2022-05-10 13:27     ` Jonathan Wakely
  2022-05-10 17:48       ` Tom Honermann
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Wakely @ 2022-05-10 13:27 UTC
  To: Andreas Schwab; +Cc: Florian Weimer via Gcc, Florian Weimer, Ulrich Drepper

On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>
> On May 09 2022, Florian Weimer via Gcc wrote:
>
> > * Ulrich Drepper via Gcc:
> >
> >> t.cc: In function ‘int main()’:
> >> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
> >>     5 |   printf((const char*) u8"test %d\n", 1);
> >>       |                        ^~~~~~~~~~~~~
> >
> > This is not an aliasing violation because of the exception for char,
> > right?  So the warning does not even highlight theoretical undefined
> > behavior.
> >
> > On the other hand, that cast is still quite ugly.  All string-related
> > functions in the C library currently need it.  It might obscure real
> > type errors.  Isn't this a problem with char8_t?
>
> In C++20, u8 literals have a distinct type, which is an incompatible
> change from C++17.

And the recommended way to deal with it is to use a cast as Ulrich did.


* Re: -Wformat and u8""
  2022-05-10 13:27     ` Jonathan Wakely
@ 2022-05-10 17:48       ` Tom Honermann
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Honermann @ 2022-05-10 17:48 UTC
  To: Jonathan Wakely, Andreas Schwab
  Cc: Florian Weimer via Gcc, Florian Weimer, Ulrich Drepper

On 5/10/22 9:27 AM, Jonathan Wakely wrote:
> On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>> On May 09 2022, Florian Weimer via Gcc wrote:
>>
>>> * Ulrich Drepper via Gcc:
>>>
>>>> t.cc: In function ‘int main()’:
>>>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>>>      5 |   printf((const char*) u8"test %d\n", 1);
>>>>        |                        ^~~~~~~~~~~~~
>>> This is not an aliasing violation because of the exception for char,
>>> right?  So the warning does not even highlight theoretical undefined
>>> behavior.
>>>
>>> On the other hand, that cast is still quite ugly.  All string-related
>>> functions in the C library currently need it.  It might obscure real
>>> type errors.  Isn't this a problem with char8_t?
>> In C++20, u8 literals have a distinct type, which is an incompatible
>> change from C++17.
> And the recommended way to deal with it is to use a cast as Ulrich did.

Thanks for copying me, Jonathan.

From the perspective of the standard, printf() expects its format
string to be specified in the locale dependent multibyte encoding, so 
passing a UTF-8 encoded string is, of course, not guaranteed to produce 
a useful result (and certainly would not on, for example, an 
EBCDIC-based platform).

I would not recommend use of a cast in this case, but would rather ask 
why there is a perceived need to specify a u8 prefixed string literal at 
all. If the locale is expected/required to be UTF-8 for the program to 
work as intended, then the execution character set is presumably set to 
be (or should be) UTF-8 as well in which case an ordinary string literal 
will be UTF-8 encoded and there is no need to use a u8 prefixed string 
literal. So, instead of adding a cast, I would recommend removing the u8 
prefix.

Tom.
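
A sketch of the suggestion above, under the assumption that the execution
character set is UTF-8 (GCC's default unless -fexec-charset says
otherwise). The function names ordinary_literals_are_utf8 and
format_plain are made up for this example.

```cpp
#include <cstddef>
#include <cstdio>

// Returns true when ordinary string literals are UTF-8 encoded: U+00E9
// must then be stored as the two bytes 0xC3 0xA9 plus the terminator.
// Assumption: this holds only if the execution character set is UTF-8.
inline bool ordinary_literals_are_utf8()
{
    const char e_acute[] = "\u00e9";
    return sizeof(e_acute) == 3
        && static_cast<unsigned char>(e_acute[0]) == 0xC3
        && static_cast<unsigned char>(e_acute[1]) == 0xA9;
}

// Same call as in the original example, but with an ordinary string
// literal: the format string is an array of char, so no cast is needed.
inline int format_plain(char* buf, std::size_t size, int value)
{
    return std::snprintf(buf, size, "test %d\n", value);
}
```

With an ordinary literal the format argument already has the type that
-Wformat expects, so the diagnostic from the start of the thread should
not arise. A program that requires a UTF-8 locale could call a check
like ordinary_literals_are_utf8() at startup to confirm the build
settings match that requirement.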


