From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 975CA3858419; Thu,  7 Oct 2021 13:17:23 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 975CA3858419
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/102615] [C++23] P2316R2 - Consistent character literal
 encoding
Date: Thu, 07 Oct 2021 13:17:23 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-102615-4-7QgsDhSDLc@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102615-4@http.gcc.gnu.org/bugzilla/>
References: <bug-102615-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 07 Oct 2021 13:17:23 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102615
--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:348b426be3fc99453b42e79a18331c7bf24ee0dc

commit r12-4226-g348b426be3fc99453b42e79a18331c7bf24ee0dc
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Oct 7 15:16:13 2021 +0200

    c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]

    I believe we need no changes to the compiler for P2316R2, seems we treat
    character literals the same between preprocessor and C++ expressions,
    here is a testcase that should verify it.

    Note, seems the internal charset for GCC can be either UTF-8 or UTF-EBCDIC,
    but I bet it is very hard (at least for me) to actually test the latter.
    I'd guess one needs all system headers to be in EBCDIC and the gcc sources
    too.
    But looking around the source, I'm a little bit worried about the UTF-EBCDIC
    case.
    One is:
     #if  '\n' == 0x0A && ' ' == 0x20 && '0' == 0x30 \
        && 'A' == 0x41 && 'a' == 0x61 && '!' == 0x21
     #  define HOST_CHARSET HOST_CHARSET_ASCII
     #else
     # if '\n' == 0x15 && ' ' == 0x40 && '0' == 0xF0 \
        && 'A' == 0xC1 && 'a' == 0x81 && '!' == 0x5A
     #  define HOST_CHARSET HOST_CHARSET_EBCDIC
     # else
     #  define HOST_CHARSET HOST_CHARSET_UNKNOWN
     # endif
     #endif
    in include/safe-ctype.h, does that mean we only support EBCDIC if
    -funsigned-char and otherwise fail to build gcc?  Because with
    -fsigned-char, '0' is -0x10 rather than 0xF0, 'A' is -0x3F rather than
    0xC1 and 'a' is -0x7F rather than 0x81.
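
The arithmetic behind the -fsigned-char concern can be checked on an ordinary
ASCII host with a small sketch. The helper names below are hypothetical and
not part of GCC; the code assumes 8-bit char and two's-complement wraparound,
and it simulates the EBCDIC byte values instead of using real EBCDIC
character constants.

```c
#include <assert.h>

/* Value a one-byte character constant would have as an int, given its
   byte in the execution charset and whether plain char is signed.
   Hypothetical helper for illustration, not part of GCC.  */
int
char_constant_value (unsigned int byte, int char_is_signed)
{
  assert (byte <= 0xFF);
  if (char_is_signed && byte >= 0x80)
    return (int) byte - 0x100;	/* wraps to a negative value */
  return (int) byte;
}

/* Mirrors three comparisons from the EBCDIC branch of the quoted
   safe-ctype.h check: '0' == 0xF0, 'A' == 0xC1, 'a' == 0x81.  */
int
ebcdic_check_passes (int char_is_signed)
{
  return char_constant_value (0xF0, char_is_signed) == 0xF0
	 && char_constant_value (0xC1, char_is_signed) == 0xC1
	 && char_constant_value (0x81, char_is_signed) == 0x81;
}
```

Under these assumptions, ebcdic_check_passes (0) holds while
ebcdic_check_passes (1) does not, which is the situation the question above
is asking about.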
    And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the
    libcpp/lex.c
    static const cppchar_t utf8_signifier = 0xC0;
    ...
          if (*buffer->cur >= utf8_signifier)
            {
              if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
                                   state, &s))
                return true;
            }
    work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
    multi-byte character, it is more complicated and seems _cpp_valid_utf8
    assumes UTF-8 as the host charset.
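
A minimal sketch of why the threshold test looks UTF-8 specific. It reuses
only the 0xC0 constant from the snippet above and the EBCDIC byte values from
the safe-ctype.h check ('A' == 0xC1, '0' == 0xF0); the function name is
hypothetical, and the assumption that UTF-EBCDIC keeps those single-byte
EBCDIC values is mine, not something the commit states.

```c
#include <assert.h>

/* Same threshold as utf8_signifier in the quoted libcpp/lex.c code.  */
#define UTF8_SIGNIFIER 0xC0u

/* Returns 1 if a source byte would take the ">= utf8_signifier" branch.
   Hypothetical helper for illustration, not part of libcpp.  */
int
takes_utf8_branch (unsigned int byte)
{
  assert (byte <= 0xFF);
  return byte >= UTF8_SIGNIFIER;
}
```

With UTF-8 input, 0x41 (ASCII 'A') skips the branch and 0xC3 (a two-byte lead
byte) takes it. With EBCDIC values, 0xC1 ('A') and 0xF0 ('0') would also take
it even though they are single-byte characters.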

    2021-10-07  Jakub Jelinek  <jakub@redhat.com>

            PR c++/102615
            * g++.dg/cpp23/charlit-encoding1.C: New testcase for C++23 P2316R2.