From: Tom Honermann
Date: Tue, 10 May 2022 13:48:31 -0400
Subject: Re: -Wformat and u8""
To: Jonathan Wakely, Andreas Schwab
Cc: Florian Weimer via Gcc, Florian Weimer, Ulrich Drepper
List-Id: Gcc mailing list

On 5/10/22 9:27 AM, Jonathan Wakely wrote:
> On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>> On Mai 09 2022, Florian Weimer via Gcc wrote:
>>
>>> * Ulrich Drepper via Gcc:
>>>
>>>> t.cc: In function ‘int main()’:
>>>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>>>     5 |   printf((const char*) u8"test %d\n", 1);
>>>>       |                        ^~~~~~~~~~~~~
>>> This is not an aliasing violation because of the exception for char,
>>> right? So the warning does not even highlight theoretical undefined
>>> behavior.
>>>
>>> On the other hand, that cast is still quite ugly. All string-related
>>> functions in the C library currently need it. It might obscure real
>>> type errors. Isn't this a problem with char8_t?
>> In C++20, u8 literals have a distinct type, which is an incompatible
>> change from C++17.
> And the recommended way to deal with it is to use a cast as Ulrich did.

Thanks for copying me, Jonathan.

From the perspective of the standard, printf() expects its format string
to be specified in the locale dependent multibyte encoding, so passing a
UTF-8 encoded string is, of course, not guaranteed to produce a useful
result (and certainly would not on, for example, an EBCDIC-based
platform).

I would not recommend use of a cast in this case, but would rather ask
why there is a perceived need to specify a u8 prefixed string literal at
all. If the locale is expected/required to be UTF-8 for the program to
work as intended, then the execution character set is presumably set to
be (or should be) UTF-8 as well in which case an ordinary string literal
will be UTF-8 encoded and there is no need to use a u8 prefixed string
literal.

So, instead of adding a cast, I would recommend removing the u8 prefix.

Tom.