public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/102615] [C++23] P2316R2 - Consistent character literal encoding
Date: Thu, 07 Oct 2021 13:17:23 +0000
Message-ID: <bug-102615-4-7QgsDhSDLc@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102615-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102615

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:348b426be3fc99453b42e79a18331c7bf24ee0dc

commit r12-4226-g348b426be3fc99453b42e79a18331c7bf24ee0dc
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Oct 7 15:16:13 2021 +0200

    c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

    I believe we need no changes to the compiler for P2316R2, seems we treat
    character literals the same between preprocessor and C++ expressions,
    here is a testcase that should verify it.

    Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
    but I bet it is very hard (at least for me) to actually test the latter.
    I'd guess one needs all system headers to be in EBCDIC and the gcc sources
    too.

    But looking around the source, I'm a little bit worried about the
    UTF-EBCDIC case.  One is:

    #if '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
       && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
    #  define HOST_CHARSET HOST_CHARSET_ASCII
    #else
    # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
       && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
    #  define HOST_CHARSET HOST_CHARSET_EBCDIC
    # else
    #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
    # endif
    #endif

    in include/safe-ctype.h, does that mean we only support EBCDIC if
    -funsigned-char and otherwise fail to build gcc?  Because with
    -fsigned-char, '0' is -0x10 rather than 0xF0, 'A' is -0x3F rather than
    0xC1 and 'a' is -0x7F rather than 0x81.

    And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the
    libcpp/lex.c

    static const cppchar_t utf8_signifier = 0xC0;
    ...
          if (*buffer->cur >= utf8_signifier)
            {
              if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
                                   state, &s))
                return true;
            }

    work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
    multi-byte character, it is more complicated and seems _cpp_valid_utf8
    assumes UTF-8 as the host charset.

    2021-10-07  Jakub Jelinek  <jakub@redhat.com>

            PR c++/102615
            * g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.
Thread overview: 4+ messages
2021-10-05 16:47 [Bug c++/102615] New: " mpolacek at gcc dot gnu.org
2021-10-05 16:48 ` [Bug c++/102615] " jakub at gcc dot gnu.org
2021-10-07 13:17 ` cvs-commit at gcc dot gnu.org [this message]
2021-10-07 13:24 ` jakub at gcc dot gnu.org