public inbox for gcc-bugs@sourceware.org
* [Bug c++/106648] New: [C++23] P2071 - Named universal character escapes
@ 2022-08-16 16:57 mpolacek at gcc dot gnu.org
  2022-08-26  7:28 ` [Bug c++/106648] " cvs-commit at gcc dot gnu.org
  2022-08-26  7:32 ` jakub at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: mpolacek at gcc dot gnu.org @ 2022-08-16 16:57 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106648

            Bug ID: 106648
           Summary: [C++23] P2071 - Named universal character escapes
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

See <https://wg21.link/p2071>.


* [Bug c++/106648] [C++23] P2071 - Named universal character escapes
  2022-08-16 16:57 [Bug c++/106648] New: [C++23] P2071 - Named universal character escapes mpolacek at gcc dot gnu.org
@ 2022-08-26  7:28 ` cvs-commit at gcc dot gnu.org
  2022-08-26  7:32 ` jakub at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-08-26  7:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106648

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:eb4879ab9053085a59b8d1594ef76487948bba7e

commit r13-2212-geb4879ab9053085a59b8d1594ef76487948bba7e
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Aug 26 09:24:56 2022 +0200

    c++: Implement C++23 P2071R2 - Named universal character escapes [PR106648]

    The following patch implements the
    C++23 P2071R2 - Named universal character escapes
    paper to support \N{LATIN SMALL LETTER E} etc.
    I've used Unicode 14.0.  There are 144803 character name properties
    (including the ones generated by Unicode NR1 and NR2 rules)
    and correction/control/alternate aliases; stored as plain strings with
    zero terminators they would take 3884745 bytes, which is clearly
    unacceptable for libcpp.
    This patch instead contains a generator which reads the UnicodeData.txt
    and NameAliases.txt files and emits a space-optimized radix tree (208765
    bytes long for 14.0), a single string literal dictionary (59418 bytes),
    the maximum name length (currently 88 chars) and two small helper arrays
    for the NR1/NR2 name generation.
    The radix tree needs 2 to 9 bytes per node; the exact format is
    described in the generator program.  There could be ways to shrink
    the dictionary size somewhat at the expense of slightly slower lookups.
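
For readers unfamiliar with the NR1 and NR2 rules mentioned above, here is a
minimal sketch of how such names are derived from a code point.  It is not the
code from the patch: libcpp has to go the other way (name to code point), and
the function and table names below (hangul_syllable_name, nr2_name, kLead,
kVowel, kTrail) are hypothetical.  The jamo short names and constants follow
the Hangul syllable name algorithm in the Unicode Standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Jamo short names used by the Unicode Hangul syllable name algorithm:
// leading consonants, vowels and (optional) trailing consonants.
static const char *const kLead[19] = {
  "G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
  "SS", "", "J", "JJ", "C", "K", "T", "P", "H" };
static const char *const kVowel[21] = {
  "A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
  "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I" };
static const char *const kTrail[28] = {
  "", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB", "LS",
  "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C", "K", "T",
  "P", "H" };

// NR1 (hypothetical helper): name of a precomposed Hangul syllable,
// U+AC00..U+D7A3, built from its arithmetic decomposition.
std::string hangul_syllable_name(char32_t cp) {
  const std::uint32_t SBase = 0xAC00, VCount = 21, TCount = 28;
  const std::uint32_t NCount = VCount * TCount;  // 588
  const std::uint32_t SCount = 19 * NCount;      // 11172
  assert(cp >= SBase && cp < SBase + SCount);
  const std::uint32_t s = cp - SBase;
  std::string name = "HANGUL SYLLABLE ";
  name += kLead[s / NCount];
  name += kVowel[(s % NCount) / TCount];
  name += kTrail[s % TCount];
  return name;
}

// NR2 (hypothetical helper): a fixed prefix followed by the code point
// in uppercase hexadecimal, padded to at least four digits.
std::string nr2_name(const std::string &prefix, char32_t cp) {
  char buf[16];
  std::snprintf(buf, sizeof buf, "%04lX", static_cast<unsigned long>(cp));
  return prefix + buf;
}
```

With these helpers, hangul_syllable_name(0xD55C) yields "HANGUL SYLLABLE HAN"
and nr2_name("CJK UNIFIED IDEOGRAPH-", 0x4E00) yields
"CJK UNIFIED IDEOGRAPH-4E00".  Because such names are computable, they do not
need to be stored one by one in a lookup structure.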

    Currently the patch implements strict matching (which is all that is
    needed to implement the feature for valid code) and loose matching
    per the Unicode UAX44-LM2 algorithm to provide hints.  That algorithm
    does case insensitive matching and essentially ignores spaces,
    underscores and hyphens in between two alphanumeric characters
    (with one exception for hyphen).
    The attachment is a WIP patch that shows how to also implement
    spellcheck.{h,cc} style discovery of misspellings, but I'll need to talk
    to David Malcolm about it, as spellcheck.{h,cc} is in the gcc/ subdir
    (so the WIP incremental patch instead prints all the names to stderr).
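
The loose matching described above can be approximated as follows.  This is
only a sketch under stated assumptions, not the _cpp_uname2c_uax44_lm2
implementation: loose_key and loose_match are hypothetical names, input is
assumed to be ASCII, and the single hyphen exception the commit message
mentions is deliberately not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Fold a name for loose comparison (hypothetical helper): drop spaces and
// underscores, drop a hyphen that sits directly between two alphanumeric
// characters, and uppercase everything that remains.
// Assumption: the one hyphen exception of UAX44-LM2 is NOT handled here.
std::string loose_key(const std::string &name) {
  std::string key;
  for (std::size_t i = 0; i < name.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(name[i]);
    if (c == ' ' || c == '_')
      continue;
    if (c == '-' && i > 0 && i + 1 < name.size()
        && std::isalnum(static_cast<unsigned char>(name[i - 1]))
        && std::isalnum(static_cast<unsigned char>(name[i + 1])))
      continue;
    key += static_cast<char>(std::toupper(c));
  }
  return key;
}

// Two spellings match loosely when their folded keys are identical.
bool loose_match(const std::string &a, const std::string &b) {
  return loose_key(a) == loose_key(b);
}
```

Under this sketch "latin_small_letter_e" loosely matches
"LATIN SMALL LETTER E", while "TIBETAN LETTER -A" does not match
"TIBETAN LETTER A", because that hyphen follows a space rather than an
alphanumeric character.  A compiler can use such a match to suggest the
strict spelling in a diagnostic, since only the strict form is valid.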

    2022-08-26  Jakub Jelinek  <jakub@redhat.com>

            PR c++/106648
    libcpp/
            * charset.cc: Implement C++23 P2071R2 - Named universal character
            escapes.  Include uname2c.h.
            (hangul_syllables, hangul_count): New variables.
            (struct uname2c_data): New type.
            (_cpp_uname2c, _cpp_uname2c_uax44_lm2): New functions.
            (_cpp_valid_ucn): Use them.  Handle named universal character
            escapes.
            (convert_ucn): Adjust comment.
            (convert_escape): Call convert_ucn even for \N.
            (_cpp_interpret_identifier): Handle named universal character
            escapes.
            * lex.cc (get_bidi_ucn): Fix up function comment formatting.
            (get_bidi_named): New function.
            (forms_identifier_p, lex_string): Handle named universal character
            escapes.
            * makeuname2c.cc: New file.  Small parts copied from makeucnid.cc.
            * uname2c.h: New generated file.
    gcc/c-family/
            * c-cppbuiltin.cc (c_cpp_builtins): Predefine
            __cpp_named_character_escapes to 202207L.
    gcc/testsuite/
            * c-c++-common/cpp/named-universal-char-escape-1.c: New test.
            * c-c++-common/cpp/named-universal-char-escape-2.c: New test.
            * c-c++-common/cpp/named-universal-char-escape-3.c: New test.
            * c-c++-common/cpp/named-universal-char-escape-4.c: New test.
            * c-c++-common/Wbidi-chars-25.c: New test.
            * gcc.dg/cpp/named-universal-char-escape-1.c: New test.
            * gcc.dg/cpp/named-universal-char-escape-2.c: New test.
            * g++.dg/cpp/named-universal-char-escape-1.C: New test.
            * g++.dg/cpp/named-universal-char-escape-2.C: New test.
            * g++.dg/cpp23/feat-cxx2b.C: Test __cpp_named_character_escapes.
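
As an illustration of the macro predefined above, user code could guard a
named escape as sketched below.  The constant names (kEAcute,
kHaveNamedEscapes) are hypothetical; the macro name and its value come from
the ChangeLog entry, and the fallback branch uses an ordinary numeric
universal character name.

```cpp
// Use a named escape when the compiler advertises support for it,
// otherwise fall back to the equivalent numeric escape.
#if defined(__cpp_named_character_escapes) \
    && __cpp_named_character_escapes >= 202207L
constexpr char32_t kEAcute = U'\N{LATIN SMALL LETTER E WITH ACUTE}';
constexpr bool kHaveNamedEscapes = true;
#else
constexpr char32_t kEAcute = U'\u00E9';
constexpr bool kHaveNamedEscapes = false;
#endif

// Both spellings denote the same code point, U+00E9.
static_assert(kEAcute == 0xE9, "expected U+00E9");
```

Either branch produces the same value, so the rest of the program does not
need to know which spelling the compiler accepted.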


* [Bug c++/106648] [C++23] P2071 - Named universal character escapes
  2022-08-16 16:57 [Bug c++/106648] New: [C++23] P2071 - Named universal character escapes mpolacek at gcc dot gnu.org
  2022-08-26  7:28 ` [Bug c++/106648] " cvs-commit at gcc dot gnu.org
@ 2022-08-26  7:32 ` jakub at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-08-26  7:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106648

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Implemented for GCC 13.

