From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/102615] [C++23] P2316R2 - Consistent character literal encoding
Date: Thu, 07 Oct 2021 13:17:23 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102615

--- Comment #2 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:348b426be3fc99453b42e79a18331c7bf24ee0dc

commit r12-4226-g348b426be3fc99453b42e79a18331c7bf24ee0dc
Author: Jakub Jelinek
Date:   Thu Oct 7 15:16:13 2021 +0200

    c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

    I believe we need no changes to the compiler for P2316R2, seems we
    treat character literals the same between preprocessor and C++
    expressions, here is a testcase that should verify it.
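
The property being tested can be illustrated with a small sketch. This is not the contents of charlit-encoding1.C; the macro and variable names (PP_A_IS_97, PP_XFF_NEGATIVE, expr_a_is_97, expr_xff_negative) are made up for illustration. It records how the preprocessor evaluates two character literals in #if and checks that a C++ constant expression gives the same answer.

```cpp
// Hypothetical sketch, not the actual charlit-encoding1.C testcase.
// Record how the preprocessor evaluates character literals inside #if.
#if 'a' == 97
# define PP_A_IS_97 1
#else
# define PP_A_IS_97 0
#endif

#if '\xff' < 0
# define PP_XFF_NEGATIVE 1
#else
# define PP_XFF_NEGATIVE 0
#endif

// Evaluate the same comparisons as ordinary C++ constant expressions.
constexpr bool expr_a_is_97 = ('a' == 97);
constexpr bool expr_xff_negative = ('\xff' < 0);

// P2316R2 requires both phases to agree, whatever the literal encoding
// and whatever the signedness of plain char.
static_assert(PP_A_IS_97 == (expr_a_is_97 ? 1 : 0),
              "#if and C++ expressions disagree on the value of 'a'");
static_assert(PP_XFF_NEGATIVE == (expr_xff_negative ? 1 : 0),
              "#if and C++ expressions disagree on the sign of '\\xff'");
```

Compiling the sketch with both -fsigned-char and -funsigned-char should succeed if the two phases are consistent, since the checks compare the phases with each other rather than against fixed values.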

    Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
    but I bet it is very hard (at least for me) to actually test the latter.
    I'd guess one needs all system headers to be in EBCDIC and the gcc sources
    too.

    But looking around the source, I'm a little bit worried about the
    UTF-EBCDIC case.
    One is:
     #if '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
        && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
     #  define HOST_CHARSET HOST_CHARSET_ASCII
     #else
     # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
        && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
     #  define HOST_CHARSET HOST_CHARSET_EBCDIC
     # else
     #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
     # endif
     #endif
    in include/safe-ctype.h, does that mean we only support EBCDIC if
    -funsigned-char and otherwise fail to build gcc?  Because with
    -fsigned-char, '0' is -0x10 rather than 0xF0, 'A' is -0x3F rather than
    0xC1 and 'a' is -0x7F rather than 0x81.

    And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the
    libcpp/lex.c
      static const cppchar_t utf8_signifier = 0xC0;
    ...
            if (*buffer->cur >= utf8_signifier)
              {
                if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
                                     state, &s))
                  return true;
              }
    work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
    multi-byte character, it is more complicated and seems _cpp_valid_utf8
    assumes UTF-8 as the host charset.

    2021-10-07  Jakub Jelinek

            PR c++/102615
            * g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.
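
The -fsigned-char arithmetic in the safe-ctype.h note above can be checked with a short sketch. The helper name as_signed_char is hypothetical, and the sketch only reproduces the value conversion on an 8-bit two's complement signed char (guaranteed since C++20, and what GCC does in practice); it does not model an EBCDIC host.

```cpp
// Hypothetical helper: the value an 8-bit code unit takes when stored in a
// signed char (assumes two's complement, as GCC implements it).
constexpr int as_signed_char(unsigned code_unit) {
  return static_cast<signed char>(code_unit);
}

// EBCDIC code points with the high bit set, taken from the #if in
// include/safe-ctype.h, become negative when plain char is signed.
static_assert(as_signed_char(0xF0) == -0x10, "'0' in EBCDIC");
static_assert(as_signed_char(0xC1) == -0x3F, "'A' in EBCDIC");
static_assert(as_signed_char(0x81) == -0x7F, "'a' in EBCDIC");

// Code points below 0x80 are unaffected by the signedness of char.
static_assert(as_signed_char(0x15) == 0x15, "'\\n' in EBCDIC");
static_assert(as_signed_char(0x40) == 0x40, "' ' in EBCDIC");
static_assert(as_signed_char(0x5A) == 0x5A, "'!' in EBCDIC");
```

Under these assumptions three of the six comparisons in the EBCDIC branch would be false with a signed plain char, so HOST_CHARSET would fall through to HOST_CHARSET_UNKNOWN, which is the concern the note raises.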