public inbox for gcc@gcc.gnu.org
From: Tom Honermann <tom@honermann.net>
To: Jonathan Wakely <jwakely.gcc@gmail.com>,
	Andreas Schwab <schwab@linux-m68k.org>
Cc: Florian Weimer via Gcc <gcc@gcc.gnu.org>,
	Florian Weimer <fweimer@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>
Subject: Re: -Wformat and u8""
Date: Tue, 10 May 2022 13:48:31 -0400
Message-ID: <57dcebc2-6ba0-895d-d520-f6ac292c0e32@honermann.net>
In-Reply-To: <CAH6eHdRFxHYe_sAQgvswX+Owpzhw_bh0MNsD++o9bVc-OePR9Q@mail.gmail.com>

On 5/10/22 9:27 AM, Jonathan Wakely wrote:
> On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>> On May 09 2022, Florian Weimer via Gcc wrote:
>>
>>> * Ulrich Drepper via Gcc:
>>>
>>>> t.cc: In function ‘int main()’:
>>>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>>>      5 |   printf((const char*) u8"test %d\n", 1);
>>>>        |                        ^~~~~~~~~~~~~
>>> This is not an aliasing violation because of the exception for char,
>>> right?  So the warning does not even highlight theoretical undefined
>>> behavior.
>>>
>>> On the other hand, that cast is still quite ugly.  All string-related
>>> functions in the C library currently need it.  It might obscure real
>>> type errors.  Isn't this a problem with char8_t?
>> In C++20, u8 literals have a distinct type, which is an incompatible
>> change from C++17.
> And the recommended way to deal with it is to use a cast as Ulrich did.

Thanks for copying me, Jonathan.

From the perspective of the standard, printf() expects its format
string to be in the locale-dependent multibyte encoding, so passing a
UTF-8 encoded string is, of course, not guaranteed to produce a useful
result (and certainly would not on, for example, an EBCDIC-based
platform).
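
As a rough illustration of the encoding point, here is a minimal
sketch. The helper name utf8_byte_of_e_acute is hypothetical and not
from this thread. It shows that a u8 literal always holds UTF-8 code
units, whatever the locale or execution character set is:

```cpp
#include <cstddef>

// Hypothetical helper (an assumption for illustration): returns code unit i
// of a u8 literal holding U+00E9. Valid indices are 0..2.
// A u8 literal is UTF-8 by definition, so these values do not depend on the
// locale or on the execution character set.
inline unsigned char utf8_byte_of_e_acute(std::size_t i) {
    // The element type is char before C++20 and char8_t from C++20 on;
    // the cast to unsigned char works for both.
    return static_cast<unsigned char>(u8"\u00e9"[i]);
}
```

The first two code units are 0xC3 and 0xA9, followed by the
terminating null. Whether those bytes mean U+00E9 to printf() and to
the output device depends on the locale in effect at run time, which
is the mismatch described above.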

I would not recommend a cast in this case. I would rather ask why
there is a perceived need for a u8-prefixed string literal at all. If
the program expects or requires a UTF-8 locale to work as intended,
then the execution character set is presumably (or should be) UTF-8 as
well. In that case an ordinary string literal is already UTF-8 encoded
and the u8 prefix is unnecessary. So, instead of adding a cast, I
would recommend removing the u8 prefix.
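
A minimal sketch of that alternative, assuming the execution character
set is UTF-8 (the GCC default, which -fexec-charset= can change). The
helper names below are hypothetical and only serve the illustration:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical check: true when the ordinary literal encoding turns U+00E9
// into the UTF-8 sequence C3 A9, i.e. when ordinary literals are UTF-8.
inline bool ordinary_literals_are_utf8() {
    const char s[] = "\u00e9";
    return sizeof s == 3
        && static_cast<unsigned char>(s[0]) == 0xC3
        && static_cast<unsigned char>(s[1]) == 0xA9;
}

// Hypothetical wrapper: the format string is an ordinary literal of type
// const char[], so there is no u8 prefix, no cast, and -Wformat can still
// check the arguments against the format.
inline int format_test(char* buf, std::size_t size, int value) {
    return std::snprintf(buf, size, "test %d\n", value);
}
```

With the u8 prefix gone, the call compiles as C++17 and as C++20
without a cast, and the literal is encoded in whatever execution
character set the compiler was told to use.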

Tom.


Thread overview: 6+ messages
2022-05-09  9:15 Ulrich Drepper
2022-05-09  9:26 ` Florian Weimer
2022-05-09 10:04   ` Ulrich Drepper
2022-05-09 10:07   ` Andreas Schwab
2022-05-10 13:27     ` Jonathan Wakely
2022-05-10 17:48       ` Tom Honermann [this message]
