From mboxrd@z Thu Jan 1 00:00:00 1970
From: "joseph at codesourcery dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
Date: Wed, 04 Aug 2021 18:34:59 +0000
List-Id: Gcc-bugs mailing list

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977

--- Comment #4 from joseph at codesourcery dot com ---
On Wed, 4 Aug 2021, jakub at gcc dot gnu.org via Gcc-bugs wrote:

> plus various changes in the check_nfc function.
> So, the first question is if the C11/N11/C99 etc. stuff should use Unicode 4.1
> (or what was used when it was generated) tables and only CXX20/NXX20 should use
> Unicode 13.0 tables (what about NFC/NKC?), or if it is ok to just regenerate
> everything using Unicode 13.0 files, add parsing of the
> DerivedCoreProperties.txt file too (and pick XID_Start and XID_Continue
> properties there, throw away everything < 0x80 and otherwise compute CXX20 flag
> as XID_Continue and NXX20 flag as XID_Continue \ XID_Start.

I think it's fine for the normalization tests for older standard versions
to use the latest Unicode version, so changing each time we update from
newer Unicode data (as per I used
Unicode 6.3.0 at that time).

A trickier question is whether the XID_Start and XID_Continue sets of
characters used for C++23 are meant to be fixed to a particular Unicode
version (possibly updated for future C++ versions) or whether the set used
for C++23 is meant to be updated for each future Unicode release as it
comes out.  (Note also that identifiers not in NFC become ill-formed, i.e.
-Wnormalized=nfc needs to be a pedwarn for C++23.)
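
For illustration only, the flag computation described in the quoted text (parse DerivedCoreProperties.txt, keep XID_Start and XID_Continue, discard everything below 0x80, set CXX20 for XID_Continue and NXX20 for XID_Continue \ XID_Start) could be sketched as below. This is not GCC's actual table generator: the flag values, function names, and the SAMPLE lines are hypothetical, and the sketch assumes only the usual `range ; property # comment` line layout of that file.

```python
import re

# Hypothetical flag values; only the names come from the quoted text.
CXX20 = 1  # character may appear in an identifier (XID_Continue)
NXX20 = 2  # character may not start an identifier (XID_Continue \ XID_Start)

# Matches "XXXX ; Prop" or "XXXX..YYYY ; Prop" after comments are stripped.
LINE_RE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")

def parse_properties(lines, wanted):
    """Collect the code points of each wanted property."""
    sets = {name: set() for name in wanted}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        m = LINE_RE.match(line)
        if not m or m.group(3) not in sets:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        sets[m.group(3)].update(range(first, last + 1))
    return sets

def compute_flags(lines):
    """Map each non-ASCII XID_Continue code point to its flag bits."""
    sets = parse_properties(lines, ("XID_Start", "XID_Continue"))
    flags = {}
    for cp in sets["XID_Continue"]:
        if cp < 0x80:  # throw away everything < 0x80
            continue
        bits = CXX20
        if cp not in sets["XID_Start"]:
            bits |= NXX20
        flags[cp] = bits
    return flags

# A few hand-written lines in the file's layout (not real file contents).
SAMPLE = [
    "0041..005A    ; XID_Start # uppercase Latin letters",
    "00AA          ; XID_Start # FEMININE ORDINAL INDICATOR",
    "0030..0039    ; XID_Continue # ASCII digits",
    "0041..005A    ; XID_Continue # uppercase Latin letters",
    "00AA          ; XID_Continue # FEMININE ORDINAL INDICATOR",
    "0300..0302    ; XID_Continue # combining accents",
]

if __name__ == "__main__":
    for cp, bits in sorted(compute_flags(SAMPLE).items()):
        print(f"U+{cp:04X} flags={bits}")
```

Whether such tables are regenerated per Unicode release or pinned to one version is exactly the open question above; the sketch takes no position on it and simply processes whichever file it is given.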