public inbox for gcc-bugs@sourceware.org
* [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
@ 2021-06-08 18:15 jason at gcc dot gnu.org
2021-06-08 18:19 ` [Bug c++/100977] " mpolacek at gcc dot gnu.org
` (15 more replies)
0 siblings, 16 replies; 17+ messages in thread
From: jason at gcc dot gnu.org @ 2021-06-08 18:15 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
Bug ID: 100977
Summary: [C++23] Implement C++ Identifier Syntax using Unicode
Standard Annex 31
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: jason at gcc dot gnu.org
Blocks: 98940
Target Milestone: ---
https://wg21.link/p1949r7
This seems to be largely a matter of adding another category to
libcpp/ucnid.tab.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98940
[Bug 98940] Implement C++23 language features
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
From: mpolacek at gcc dot gnu.org @ 2021-06-08 18:19 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
Marek Polacek <mpolacek at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2021-06-08
CC| |mpolacek at gcc dot gnu.org
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
From: jakub at gcc dot gnu.org @ 2021-08-04 13:39 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think it might be better to make makeucnid also parse the
https://www.unicode.org/Public/13.0.0/ucd/DerivedCoreProperties.txt
file and read the XID_Start and XID_Continue properties from there.
But even just regenerating ucnid.h from the Unicode 13.0.0 .txt files produces
the following difference:
--- /usr/src/gcc/libcpp/ucnid.h 2021-08-04 15:04:46.053701822 +0200
+++ ucnid.h 2021-08-04 15:05:36.773996631 +0200
@@ -505,6 +505,7 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x07f0 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x07f1 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x07f2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x07fc },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0815 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x0816 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x0817 },
@@ -529,7 +530,23 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0858 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x0859 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x085a },
-{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x08e3 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x08d2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x08d3 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08d4 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08d5 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08d6 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08d7 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08d8 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08d9 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08da },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08db },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08dc },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08dd },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08de },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08df },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08e0 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x08e2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x08e3 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08e4 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08e5 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x08e6 },
@@ -556,6 +573,7 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08fb },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08fc },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08fd },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x08fe },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0900 },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0903 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0904 },
@@ -615,6 +633,7 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x09e5 },
{ C99|N99| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x09ef },
{ C99| 0|CXX|C11| 0|CID|NFC|NKC| 0, 0, 0x09f1 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x09fd },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0a01 },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0a02 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0a04 },
@@ -820,6 +839,8 @@ static const struct ucnrange ucnranges[]
{ C99| 0|CXX|C11| 0|CID|NFC|NKC| 0, 0, 0x0d28 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0d29 },
{ C99| 0|CXX|C11| 0|CID|NFC|NKC| 0, 0, 0x0d39 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0d3a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x0d3b },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0d3d },
{ C99| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x0d3e },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0d43 },
@@ -894,7 +915,7 @@ static const struct ucnrange ucnranges[]
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0eb7 },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 118, 0x0eb8 },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 118, 0x0eb9 },
-{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0eba },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x0eba },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0ebc },
{ C99| 0|CXX|C11| 0|CID|NFC|NKC| 0, 0, 0x0ebd },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0ebf },
@@ -1031,6 +1052,22 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1a7a },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1a7b },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1a7e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1aaf },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1ab0 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1ab1 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1ab2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1ab3 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1ab4 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1ab5 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1ab6 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1ab7 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1ab8 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1ab9 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1aba },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1abb },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1abc },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1abe },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1abf },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1b05 },
{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x1b06 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1b07 },
@@ -1094,6 +1131,8 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 1, 0x1ce7 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1cec },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1cf3 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1cf7 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1cf8 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1d2b },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1d2e },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1d2f },
@@ -1144,7 +1183,27 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de3 },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de4 },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de5 },
-{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 0, 0x1dfb },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de6 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de7 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de8 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1de9 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1dea },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1deb },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1dec },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1ded },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1dee },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1def },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1df0 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1df1 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1df2 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1df3 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1df4 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1df5 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 232, 0x1df6 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 228, 0x1df7 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 228, 0x1df8 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 0, 0x1dfa },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1dfb },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 233, 0x1dfc },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0x1dfd },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0x1dfe },
@@ -1527,8 +1586,6 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x324f },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x327e },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x327f },
-{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x32fe },
-{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x32ff },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x33ff },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x4dff },
{ C99| 0|CXX|C11| 0|CID|NFC|NKC| 0, 0, 0x9fa5 },
@@ -1543,7 +1600,9 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xa67a },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xa67b },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xa67c },
-{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa69e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa69b },
+{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0xa69d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xa69e },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa6ef },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xa6f0 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa76f },
@@ -1551,6 +1610,7 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa7f7 },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0xa7f9 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa805 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa82b },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa8c3 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xa8df },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xa8e0 },
@@ -1586,6 +1646,10 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0xaabe },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xaac0 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xaaf5 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xab5b },
+{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0xab5f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xab68 },
+{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0xab69 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xabec },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xabff },
{ C99| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xd7a3 },
@@ -1650,7 +1714,16 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0xfe23 },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0xfe24 },
{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0xfe25 },
-{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 0, 0xfe2f },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0xfe26 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe27 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe28 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe29 },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe2a },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe2b },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe2c },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 220, 0xfe2d },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0xfe2e },
+{ 0| 0| 0|C11|N11|CID|NFC|NKC| 0, 230, 0xfe2f },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0xfe44 },
{ 0| 0| 0| 0| 0|CID|NFC|NKC| 0, 0, 0xfe46 },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0xfe52 },
@@ -1686,13 +1759,39 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0xfffd },
{ 0| 0| 0| 0| 0|CID|NFC|NKC| 0, 0, 0xffff },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x101fc },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x102df },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10375 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10376 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10377 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10378 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10379 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10a0c },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10a0e },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10a37 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10a38 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 1, 0x10a39 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10a3e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10ae4 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10ae5 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10d23 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10d24 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10d25 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10d26 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10eaa },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10eab },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x10f45 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x10f46 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x10f47 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10f48 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10f49 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10f4a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x10f4b },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x10f4c },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x10f4d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x10f4e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x10f4f },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11045 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1107e },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11099 },
{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x1109a },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1109b },
@@ -1711,9 +1810,88 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x1112f },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11132 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x11133 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11172 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x111bf },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x111c9 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11234 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x11235 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x112e8 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 7, 0x112e9 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1133a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 7, 0x1133b },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1133d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x1133e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1134a },
+{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x1134c },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11356 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x11357 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11365 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11366 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11367 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11368 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11369 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1136a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1136b },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1136f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11370 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11371 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11372 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x11373 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11441 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11445 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1145d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x114af },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x114b0 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x114b9 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x114ba },
+{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x114bc },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x114bd },
+{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x114be },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x114c1 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x114c2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x115ae },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x115af },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x115b9 },
+{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x115bb },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x115be },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x115bf },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1163e },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x116b5 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x116b6 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1172a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11838 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x11839 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1192f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC|CTX, 0, 0x11930 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11937 },
+{ 0| 0| 0|C11| 0| 0|NFC|NKC| 0, 0, 0x11938 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1193c },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x1193d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11942 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x119df },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11a33 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11a46 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11a98 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11c3e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11d41 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11d43 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 9, 0x11d44 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x11d96 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x16aef },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 1, 0x16af0 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 1, 0x16af1 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 1, 0x16af2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 1, 0x16af3 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x16b2f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x16b30 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x16b31 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x16b32 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x16b33 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x16b34 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x16b35 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x16fef },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 6, 0x16ff0 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1bc9d },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1d15d },
{ 0| 0| 0|C11| 0| 0| 0| 0| 0, 0, 0x1d164 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 216, 0x1d165 },
@@ -1792,6 +1970,69 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1d7cb },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1d7cd },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1d7ff },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1dfff },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e000 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e001 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e002 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e003 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e004 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e005 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e007 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e008 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e009 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e00a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e00b },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e00c },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e00d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e00e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e00f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e010 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e011 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e012 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e013 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e014 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e015 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e016 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e017 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e01a },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e01b },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e01c },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e01d },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e01e },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e01f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e020 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e022 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e023 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e025 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e026 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e027 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e028 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e029 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e12f },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e130 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e131 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e132 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e133 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e134 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e135 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e2eb },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e2ec },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e2ed },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e2ee },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e8cf },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1e8d0 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1e8d1 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1e8d2 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1e8d3 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1e8d4 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 220, 0x1e8d5 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1e943 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e944 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e945 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e946 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e947 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e948 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x1e949 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1edff },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1ee03 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1ee04 },
@@ -1865,17 +2106,19 @@ static const struct ucnrange ucnranges[]
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f12f },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f14f },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f169 },
-{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f16b },
+{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f16c },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f18f },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f190 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f1ff },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f202 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f20f },
-{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f23a },
+{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f23b },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f23f },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f248 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1f24f },
{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1f251 },
+{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1fbef },
+{ 0| 0| 0|C11| 0|CID|NFC| 0| 0, 0, 0x1fbf9 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x1fffd },
{ 0| 0| 0| 0| 0|CID|NFC|NKC| 0, 0, 0x1ffff },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x2f7ff },
plus various changes in the check_nfc function.
So the first question is whether the C11/N11/C99 etc. flags should keep using the
Unicode 4.1 tables (or whatever version was used when they were generated) while
only CXX20/NXX20 use the Unicode 13.0 tables (and what about NFC/NKC?), or whether
it is OK to just regenerate everything from the Unicode 13.0 files and also parse
DerivedCoreProperties.txt (picking the XID_Start and XID_Continue properties from
there, throwing away everything < 0x80, and otherwise computing the CXX20 flag as
XID_Continue and the NXX20 flag as XID_Continue \ XID_Start).
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
From: jakub at gcc dot gnu.org @ 2021-08-04 14:08 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51258
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51258&action=edit
gcc12-pr100977-1.patch
I think I found a bug in the makeucnid.c program: sometimes ranges are split
even when adjacent entries have identical flags and combining value (which
results in an unnecessarily large table), while in other cases the range
boundaries are wrong. For example, U+0483 to U+0487 inclusive have combining
class 230 and U+0488 has combining class 0, but the generated file had:
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x0482 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x0483 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x0484 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x0485 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 230, 0x0486 },
{ 0| 0| 0|C11| 0|CID|NFC|NKC| 0, 0, 0x048f },
i.e. 0x0487 would be handled as non-combining.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
From: jakub at gcc dot gnu.org @ 2021-08-04 16:14 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Here is an incremental makeucnid.c patch that also emits the CXX23 and NXX23
flags (CXX23: valid in a C++23 identifier; NXX23: valid in a C++23 identifier
but not as the first character). It does not yet contain the libcpp changes
needed to actually handle them.
--- libcpp/makeucnid.c.jj 2021-08-04 17:35:35.995944075 +0200
+++ libcpp/makeucnid.c 2021-08-04 18:13:56.399062234 +0200
@@ -17,7 +17,7 @@ along with this program; see the file CO
/* Run this program as
./makeucnid ucnid.tab UnicodeData.txt DerivedNormalizationProps.txt \
- > ucnid.h
+ DerivedCoreProperties.txt > ucnid.h
*/
#include <stdio.h>
@@ -32,10 +32,12 @@ enum {
N99 = 4,
C11 = 8,
N11 = 16,
- all_languages = C99 | CXX | C11,
- not_NFC = 32,
- not_NFKC = 64,
- maybe_not_NFC = 128
+ CXX23 = 32,
+ NXX23 = 64,
+ all_languages = C99 | CXX | C11 | CXX23 | NXX23,
+ not_NFC = 128,
+ not_NFKC = 256,
+ maybe_not_NFC = 512
};
#define NUM_CODE_POINTS 0x110000
@@ -241,6 +243,74 @@ read_derived (const char *fname)
fclose (f);
}
+/* Read DerivedCoreProperties.txt and fill in languages version in
+ flags from the XID_Start and XID_Continue properties. */
+
+static void
+read_derivedcore (char *fname)
+{
+ FILE * f = fopen (fname, "r");
+
+ if (!f)
+ fail ("opening DerivedCoreProperties.txt");
+ for (;;)
+ {
+ char line[256];
+ unsigned long codepoint_start, codepoint_end;
+ char *l;
+ int i, j;
+
+ if (!fgets (line, sizeof (line), f))
+ break;
+ if (line[0] == '#' || line[0] == '\n' || line[0] == '\r')
+ continue;
+ codepoint_start = strtoul (line, &l, 16);
+ if (l == line)
+ fail ("parsing DerivedCoreProperties.txt, reading code point");
+ if (codepoint_start > MAX_CODE_POINT)
+ fail ("parsing DerivedCoreProperties.txt, code point too large");
+
+ if (*l == '.' && l[1] == '.')
+ {
+ char *l2 = l + 2;
+ codepoint_end = strtoul (l + 2, &l, 16);
+ if (l == l2 || codepoint_end < codepoint_start)
+ fail ("parsing DerivedCoreProperties.txt, reading code point");
+ if (codepoint_end > MAX_CODE_POINT)
+ fail ("parsing DerivedCoreProperties.txt, code point too large");
+ }
+ else
+ codepoint_end = codepoint_start;
+
+ while (*l == ' ')
+ l++;
+ if (*l++ != ';')
+ fail ("parsing DerivedCoreProperties.txt, reading code point");
+
+ while (*l == ' ')
+ l++;
+
+ if (codepoint_end < 0x80)
+ continue;
+
+ if (strncmp (l, "XID_Start ", 10) == 0)
+ {
+ for (; codepoint_start <= codepoint_end; codepoint_start++)
+ flags[codepoint_start]
+ = (flags[codepoint_start] | CXX23) & ~NXX23;
+ }
+ else if (strncmp (l, "XID_Continue ", 13) == 0)
+ {
+ for (; codepoint_start <= codepoint_end; codepoint_start++)
+ if ((flags[codepoint_start] & CXX23) == 0)
+ flags[codepoint_start] |= CXX23 | NXX23;
+ }
+ }
+ if (ferror (f))
+ fail ("reading DerivedCoreProperties.txt");
+ fclose (f);
+}
+
/* Write out the table.
The table consists of two words per entry. The first word is the flags
for the unicode code points up to and including the second word. */
@@ -261,12 +331,14 @@ write_table (void)
|| really_safe != (decomp[i][0] == 0)
|| combining_value[i] != last_combine)
{
- printf ("{ %s|%s|%s|%s|%s|%s|%s|%s|%s, %3d, %#06x },\n",
+ printf ("{ %s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s, %3d, %#06x },\n",
last_flag & C99 ? "C99" : " 0",
last_flag & N99 ? "N99" : " 0",
last_flag & CXX ? "CXX" : " 0",
last_flag & C11 ? "C11" : " 0",
last_flag & N11 ? "N11" : " 0",
+ last_flag & CXX23 ? "CXX23" : " 0",
+ last_flag & NXX23 ? "NXX23" : " 0",
really_safe ? "CID" : " 0",
last_flag & not_NFC ? " 0" : "NFC",
last_flag & not_NFKC ? " 0" : "NKC",
@@ -439,11 +511,12 @@ write_copyright (void)
int
main(int argc, char ** argv)
{
- if (argc != 4)
+ if (argc != 5)
fail ("too few arguments to makeucn");
read_ucnid (argv[1]);
read_table (argv[2]);
read_derived (argv[3]);
+ read_derivedcore (argv[4]);
write_copyright ();
write_table ();
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
From: joseph at codesourcery dot com @ 2021-08-04 18:34 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #4 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
On Wed, 4 Aug 2021, jakub at gcc dot gnu.org via Gcc-bugs wrote:
> plus various changes in the check_nfc function.
> So the first question is whether the C11/N11/C99 etc. flags should keep using the
> Unicode 4.1 tables (or whatever version was used when they were generated) while
> only CXX20/NXX20 use the Unicode 13.0 tables (and what about NFC/NKC?), or whether
> it is OK to just regenerate everything from the Unicode 13.0 files and also parse
> DerivedCoreProperties.txt (picking the XID_Start and XID_Continue properties from
> there, throwing away everything < 0x80, and otherwise computing the CXX20 flag as
> XID_Continue and the NXX20 flag as XID_Continue \ XID_Start).
I think it's fine for the normalization tests for older standard versions
to use the latest Unicode version, and thus to change each time we update to
newer Unicode data (as per
<https://gcc.gnu.org/legacy-ml/gcc-patches/2013-11/msg01901.html>, I used
Unicode 6.3.0 at that time).
A trickier question is whether the XID_Start and XID_Continue sets of
characters used for C++23 are meant to be fixed to a particular Unicode
version (possibly updated for future C++ versions) or whether the set used
for C++23 is meant to be updated for each future Unicode release as it
comes out.
(Note also that identifiers not in NFC become ill-formed, i.e.
-Wnormalized=nfc needs to be a pedwarn for C++23.)
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
From: jakub at gcc dot gnu.org @ 2021-08-04 18:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51260
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51260&action=edit
gcc12-pr100977-2-wip.patch
Here is a WIP incremental patch, but I'd prefer to do it in steps: first the
above-mentioned bug, then separately update to the latest Unicode, then do the
cxx23_identifiers change, adding some testsuite coverage there and dealing
with the NFC handling.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (5 preceding siblings ...)
2021-08-04 18:40 ` jakub at gcc dot gnu.org
@ 2021-08-04 19:06 ` ubizjak at gmail dot com
2021-08-04 19:20 ` jakub at gcc dot gnu.org
` (8 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: ubizjak at gmail dot com @ 2021-08-04 19:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #3)
> - printf ("{ %s|%s|%s|%s|%s|%s|%s|%s|%s, %3d, %#06x },\n",
> + printf ("{ %s|%s|%s|%s|%s|%s|%s|%s|%s|%s|%s, %3d, %#06x },\n",
BTW: You can also use a width with strings (e.g. %3s) to avoid the spaces below.
> last_flag & C99 ? "C99" : " 0",
> last_flag & N99 ? "N99" : " 0",
> last_flag & CXX ? "CXX" : " 0",
> last_flag & C11 ? "C11" : " 0",
> last_flag & N11 ? "N11" : " 0",
> + last_flag & CXX23 ? "CXX23" : " 0",
> + last_flag & NXX23 ? "NXX23" : " 0",
> really_safe ? "CID" : " 0",
> last_flag & not_NFC ? " 0" : "NFC",
> last_flag & not_NFKC ? " 0" : "NKC",
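A minimal sketch of the suggestion in the reply above: with a field width of 3, `%3s` right-justifies short strings, so the placeholder can be a plain "0" instead of a hand-padded " 0". The format_flags helper and its three-column layout are hypothetical. Note that a width only pads and never truncates, so longer names such as "CXX23" would still widen their column.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical three-column row formatter.  Each %3s field is padded on
   the left to 3 characters, so "0" and "C99" occupy the same width.
   Returns the length of the string stored in BUF.  */
static size_t
format_flags (char *buf, size_t size, int c99, int n99, int cxx)
{
  assert (size > 0);
  snprintf (buf, size, "{ %3s|%3s|%3s }",
            c99 ? "C99" : "0",
            n99 ? "N99" : "0",
            cxx ? "CXX" : "0");
  return strlen (buf);
}
```

With a large enough buffer, format_flags (buf, sizeof buf, 1, 0, 1) stores "{ C99|  0|CXX }", and every row has the same length whichever flags are set.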
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (6 preceding siblings ...)
2021-08-04 19:06 ` ubizjak at gmail dot com
@ 2021-08-04 19:20 ` jakub at gcc dot gnu.org
2021-08-04 19:25 ` ubizjak at gmail dot com
` (7 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-08-04 19:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
True, but is it worth changing that in a tool that is run twice in a decade?
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (7 preceding siblings ...)
2021-08-04 19:20 ` jakub at gcc dot gnu.org
@ 2021-08-04 19:25 ` ubizjak at gmail dot com
2021-08-05 10:17 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: ubizjak at gmail dot com @ 2021-08-04 19:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #7)
> True, but is it worth changing that in a tool that is run twice in a decade?
Well, the question is self-answering ;)
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (8 preceding siblings ...)
2021-08-04 19:25 ` ubizjak at gmail dot com
@ 2021-08-05 10:17 ` jakub at gcc dot gnu.org
2021-08-05 15:34 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-08-05 10:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #51260|0 |1
is obsolete| |
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51265
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51265&action=edit
gcc12-pr100977-2.patch
Here is an updated (but so far only very lightly tested) patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576748.html
and
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576749.html
patches, including the pedwarn for "is not in NFC" and testsuite coverage.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (9 preceding siblings ...)
2021-08-05 10:17 ` jakub at gcc dot gnu.org
@ 2021-08-05 15:34 ` cvs-commit at gcc dot gnu.org
2021-08-05 15:35 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-08-05 15:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:4805b92a32637b987f924463d6af9dcf95b21f63
commit r12-2771-g4805b92a32637b987f924463d6af9dcf95b21f63
Author: Jakub Jelinek <jakub@redhat.com>
Date: Thu Aug 5 17:34:16 2021 +0200
libcpp: Fix makeucnid bug with combining values [PR100977]
I've noticed two adjacent lines in ucnid.h that had identical flags and
combine values and as such should have been merged.
This is due to a bug in makeucnid.c, which records last_flag,
last_combine and really_safe of what has just been printed, but
because of a typo mishandles last_combine: it always compares against
combining_value[0], which is 0.
This has two effects on the table.  One is that the table is often
unnecessarily large, as for non-zero .combine every character gets its
own record instead of adjacent characters with the same flags and
combine value being merged.
The other is that sometimes the last char that has combine set doesn't
actually have it in the tables, because the code prints entries only
upon seeing the next character, and if that character has a
combining_value of 0 and otherwise the same flags as previously
printed, nothing is printed.
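The run-merging behaviour described above can be sketched as follows. The struct cp_info type and write_runs function are hypothetical (makeucnid.c keeps its data in separate arrays and also tracks really_safe); the point illustrated is that each code point is compared with the values recorded for the previous one, not with a fixed element such as combining_value[0].

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-code-point data, for illustration only.  */
struct cp_info { unsigned flags; unsigned char combine; };

/* Print one "{ flags, combine, last_code_point }" record per run of
   adjacent code points sharing the same flags and combining value.
   A run is printed only once the next differing code point (or the
   end of the table) is seen.  Returns the number of records printed.  */
static int
write_runs (FILE *out, const struct cp_info *info, unsigned long n)
{
  int records = 0;

  assert (n > 0);
  unsigned last_flags = info[0].flags;
  unsigned char last_combine = info[0].combine;
  for (unsigned long i = 1; i <= n; i++)
    if (i == n
        || info[i].flags != last_flags
        || info[i].combine != last_combine)
      {
        fprintf (out, "{ %#x, %3d, %#06lx },\n",
                 last_flags, last_combine, i - 1);
        records++;
        if (i < n)
          {
            /* Remember what the next run starts with.  */
            last_flags = info[i].flags;
            last_combine = info[i].combine;
          }
      }
  return records;
}
```

For four code points with combining values 0, 230, 230, 0 and equal flags this prints three records, with the two 230 entries merged into one.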
The following patch fixes that.  To make clear exactly what it affects,
I've regenerated ucnid.h with the same Unicode files as were used the
last time it was regenerated.
2021-08-05 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* makeucnid.c (write_table): Fix computation of last_combine.
* ucnid.h: Regenerated using Unicode 6.3.0 files.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (10 preceding siblings ...)
2021-08-05 15:34 ` cvs-commit at gcc dot gnu.org
@ 2021-08-05 15:35 ` cvs-commit at gcc dot gnu.org
2021-09-01 20:37 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-08-05 15:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:4739344d36e6d24764cbedde44a3fff6edc70f6c
commit r12-2772-g4739344d36e6d24764cbedde44a3fff6edc70f6c
Author: Jakub Jelinek <jakub@redhat.com>
Date: Thu Aug 5 17:35:20 2021 +0200
libcpp: Regenerate ucnid.h using Unicode 13.0.0 files [PR100977]
The following patch (incremental to the makeucnid.c fix) regenerates
ucnid.h with https://www.unicode.org/Public/13.0.0/ucd/ files.
2021-08-05 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* ucnid.h: Regenerated using Unicode 13.0.0 files.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (11 preceding siblings ...)
2021-08-05 15:35 ` cvs-commit at gcc dot gnu.org
@ 2021-09-01 20:37 ` cvs-commit at gcc dot gnu.org
2021-09-01 20:38 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-01 20:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:c4d6dcacfca1b804504515496e6d9de176d7f51e
commit r12-3302-gc4d6dcacfca1b804504515496e6d9de176d7f51e
Author: Jakub Jelinek <jakub@redhat.com>
Date: Wed Sep 1 22:33:06 2021 +0200
libcpp: Implement C++23 P1949R7 - C++ Identifier Syntax using Unicode
Standard Annex 31
The following patch implements the
P1949R7 - C++ Identifier Syntax using Unicode Standard Annex 31
paper.  We already allow UTF-8 characters in the source, so that part
is already implemented; IMHO all we need to do is pedwarn instead of
just warn for the (default) -Wnormalized=nfc (or for -Wnormalized={id,nfkc})
if the character is not in NFC, and use the Unicode XID_Start and
XID_Continue derived core properties to find out what characters are
allowed (the standard actually adds U+005F to XID_Start, but we already
handle the ASCII compatible characters differently and they aren't allowed
in UCNs in identifiers).  Instead of hardcoding the large tables
in ucnid.tab, this patch makes makeucnid.c read them from the Unicode
tables (13.0.0 version at this point).
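The ChangeLog names a new read_derivedcore function, but its code is not shown here. The following hypothetical parse_derivedcore_line is only a sketch of how such a reader might split one line of the UCD file DerivedCoreProperties.txt, assuming the usual `code[..code] ; Property # comment` layout; the name and signature are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of DerivedCoreProperties.txt.  Data lines look like
     0041..005A    ; XID_Start # L&  [26] LATIN CAPITAL LETTER A..Z
     00AA          ; XID_Start # Lo       FEMININE ORDINAL INDICATOR
   On success store the inclusive code point range in *LO/*HI and the
   property name in PROP, and return 1.  Return 0 for comment lines,
   blank lines, malformed lines or a name that does not fit in PROP.  */
static int
parse_derivedcore_line (const char *line, unsigned long *lo,
                        unsigned long *hi, char *prop, size_t prop_size)
{
  char name[64];

  assert (prop_size > 0);
  if (sscanf (line, "%lx..%lx ; %63[A-Za-z_]", lo, hi, name) == 3)
    ;                           /* A range of code points.  */
  else if (sscanf (line, "%lx ; %63[A-Za-z_]", lo, name) == 2)
    *hi = *lo;                  /* A single code point.  */
  else
    return 0;

  if (strlen (name) >= prop_size)
    return 0;
  strcpy (prop, name);
  return 1;
}
```

A caller would then keep only the XID_Start and XID_Continue lines and set the corresponding property for every code point in [*lo, *hi].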
In non-pedantic mode, we accept as the second and later characters of an
identifier the union of the characters valid in all supported modes, but
for the first character we were actually pedantically requiring that it
not be any of the characters that may not appear as the first character
in the currently chosen standard.
This patch changes that, so that what is allowed at the start of an
identifier is likewise the union of the characters valid at the start of
an identifier in any of the pedantic modes.
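A simplified model of the start-character change described above, under loudly hypothetical assumptions: only two language modes, made-up VALID_*/NOSTART_* bits (loosely modelled on the C11/N11 and CXX23/NXX23 pairs), and a ucn_ok helper that is not the real ucn_valid_in_identifier from charset.c. In non-pedantic mode it accepts a character whenever any mode accepts it at that position, for the first character as well as later ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits: "valid in identifiers" and "valid, but not as
   the first character" for each language mode.  */
enum {
  VALID_C11 = 1 << 0, NOSTART_C11 = 1 << 1,
  VALID_CXX23 = 1 << 2, NOSTART_CXX23 = 1 << 3
};

struct mode { unsigned valid, nostart; };
static const struct mode modes[] = {
  { VALID_C11, NOSTART_C11 },
  { VALID_CXX23, NOSTART_CXX23 },
};

/* Return true if a character with FLAGS may appear in an identifier,
   at the start if AT_START.  Pedantic: only CUR_MODE's rules apply.
   Non-pedantic: the union over all modes, at every position.  */
static bool
ucn_ok (unsigned flags, bool at_start, bool pedantic, int cur_mode)
{
  int n = (int) (sizeof modes / sizeof modes[0]);

  assert (cur_mode >= 0 && cur_mode < n);
  for (int i = 0; i < n; i++)
    {
      if (pedantic && i != cur_mode)
        continue;
      if ((flags & modes[i].valid)
          && !(at_start && (flags & modes[i].nostart)))
        return true;
    }
  return false;
}
```

Under this model, a character that C11 allows at the start but C++23 allows only as a continuation character is rejected at the start in pedantic C++23 mode, yet accepted there in non-pedantic mode.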
2021-09-01 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
libcpp/
* include/cpplib.h (struct cpp_options): Add cxx23_identifiers.
* charset.c (CXX23, NXX23): New enumerators.
(CID, NFC, NKC, CTX): Renumber.
(ucn_valid_in_identifier): Implement P1949R7 - use CXX23 and
NXX23 flags for cxx23_identifiers. For start character in
non-pedantic mode, allow characters that are allowed as start
characters in any of the supported language modes, rather than
disallowing characters allowed only as non-start characters in
current mode but for characters from other language modes allowing
them even if they are never allowed at start.
* init.c (struct lang_flags): Add cxx23_identifiers.
(lang_defaults): Add cxx23_identifiers column.
(cpp_set_lang): Initialize CPP_OPTION (pfile, cxx23_identifiers).
* lex.c (warn_about_normalization): If cxx23_identifiers, use
cpp_pedwarning_with_line instead of cpp_warning_with_line for
"is not in NFC" diagnostics.
* makeucnid.c: Adjust usage comment.
(CXX23, NXX23): New enumerators.
(all_languages): Add CXX23.
(not_NFC, not_NFKC, maybe_not_NFC): Renumber.
(read_derivedcore): New function.
(write_table): Print also CXX23 and NXX23 columns.
(main): Require 5 arguments instead of 4, call read_derivedcore.
* ucnid.h: Regenerated using Unicode 13.0.0 files.
gcc/testsuite/
* g++.dg/cpp23/normalize1.C: New test.
* g++.dg/cpp23/normalize2.C: New test.
* g++.dg/cpp23/normalize3.C: New test.
* g++.dg/cpp23/normalize4.C: New test.
* g++.dg/cpp23/normalize5.C: New test.
* g++.dg/cpp23/normalize6.C: New test.
* g++.dg/cpp23/normalize7.C: New test.
* g++.dg/cpp23/ucnid-1-utf8.C: New test.
* g++.dg/cpp23/ucnid-2-utf8.C: New test.
* gcc.dg/cpp/ucnid-4.c: Don't expect
"not valid at the start of an identifier" errors.
* gcc.dg/cpp/ucnid-4-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-5-utf8.c: New test.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (12 preceding siblings ...)
2021-09-01 20:37 ` cvs-commit at gcc dot gnu.org
@ 2021-09-01 20:38 ` jakub at gcc dot gnu.org
2021-11-30 8:51 ` cvs-commit at gcc dot gnu.org
2021-12-01 9:22 ` cvs-commit at gcc dot gnu.org
15 siblings, 0 replies; 17+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-09-01 20:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Implemented for GCC 12.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (13 preceding siblings ...)
2021-09-01 20:38 ` jakub at gcc dot gnu.org
@ 2021-11-30 8:51 ` cvs-commit at gcc dot gnu.org
2021-12-01 9:22 ` cvs-commit at gcc dot gnu.org
15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-30 8:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:7abcc9ca20d4e17deabb308b5f483aaccc3dc02c
commit r12-5597-g7abcc9ca20d4e17deabb308b5f483aaccc3dc02c
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Nov 30 09:50:52 2021 +0100
libcpp: Enable P1949R7 for C++11 and up as it was a DR [PR100977]
Jonathan mentioned on IRC that:
"Accept P1949R7 (C++ Identifier Syntax using Unicode Standard Annex 31) as
a Defect Report and apply the changes therein to the C++ working paper."
whereas I had implemented it only for -std={gnu,c}++{23,2b}.
As the C++98 rules were significantly different, I'm not trying to change
anything for C++98.
2021-11-30 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* init.c (lang_defaults): Enable cxx23_identifiers for
-std={gnu,c}++{11,14,17,20} too.
* c-c++-common/cpp/ucnid-2011-1-utf8.c: Expect errors in C++.
* c-c++-common/cpp/ucnid-2011-1.c: Likewise.
* g++.dg/cpp/ucnid-4-utf8.C: Add missing space to dg-options.
* g++.dg/cpp23/normalize3.C: Enable for c++11 rather than just
c++23.
* g++.dg/cpp23/normalize4.C: Likewise.
* g++.dg/cpp23/normalize5.C: Likewise.
* g++.dg/cpp23/normalize7.C: Expect errors rather than just
warnings for c++11 and up rather than just c++23.
* g++.dg/cpp23/ucnid-2-utf8.C: Expect errors even for c++11 ..
c++20.
* [Bug c++/100977] [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
` (14 preceding siblings ...)
2021-11-30 8:51 ` cvs-commit at gcc dot gnu.org
@ 2021-12-01 9:22 ` cvs-commit at gcc dot gnu.org
15 siblings, 0 replies; 17+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-12-01 9:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100977
--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:c264208e161830a5642ee3125871c23110508462
commit r12-5653-gc264208e161830a5642ee3125871c23110508462
Author: Jakub Jelinek <jakub@redhat.com>
Date: Wed Dec 1 10:21:20 2021 +0100
libcpp: Enable P1949R7 for C++98 too [PR100977]
On Mon, Nov 29, 2021 at 05:53:58PM -0500, Jason Merrill wrote:
> I'm inclined to go ahead and change C++98 as well; I doubt anyone is relying
> on the particular C++98 extended character set rules, and we already accept
> the union of the different sets when not pedantic.
Ok, here is an incremental patch to do that also for -std={c,gnu}++98.
2021-12-01 Jakub Jelinek <jakub@redhat.com>
PR c++/100977
* init.c (struct lang_flags): Remove cxx23_identifiers.
(lang_defaults): Remove cxx23_identifiers initializers.
(cpp_set_lang): Don't copy cxx23_identifiers.
* include/cpplib.h (struct cpp_options): Adjust comment about
c11_identifiers. Remove cxx23_identifiers field.
* lex.c (warn_about_normalization): Use cplusplus instead of
cxx23_identifiers.
* charset.c (ucn_valid_in_identifier): Likewise.
* g++.dg/cpp/ucnid-1.C: Adjust expected diagnostics.
* g++.dg/cpp/ucnid-1-utf8.C: Likewise.
Thread overview: 17+ messages
2021-06-08 18:15 [Bug c++/100977] New: [C++23] Implement C++ Identifier Syntax using Unicode Standard Annex 31 jason at gcc dot gnu.org
2021-06-08 18:19 ` [Bug c++/100977] " mpolacek at gcc dot gnu.org
2021-08-04 13:39 ` jakub at gcc dot gnu.org
2021-08-04 14:08 ` jakub at gcc dot gnu.org
2021-08-04 16:14 ` jakub at gcc dot gnu.org
2021-08-04 18:34 ` joseph at codesourcery dot com
2021-08-04 18:40 ` jakub at gcc dot gnu.org
2021-08-04 19:06 ` ubizjak at gmail dot com
2021-08-04 19:20 ` jakub at gcc dot gnu.org
2021-08-04 19:25 ` ubizjak at gmail dot com
2021-08-05 10:17 ` jakub at gcc dot gnu.org
2021-08-05 15:34 ` cvs-commit at gcc dot gnu.org
2021-08-05 15:35 ` cvs-commit at gcc dot gnu.org
2021-09-01 20:37 ` cvs-commit at gcc dot gnu.org
2021-09-01 20:38 ` jakub at gcc dot gnu.org
2021-11-30 8:51 ` cvs-commit at gcc dot gnu.org
2021-12-01 9:22 ` cvs-commit at gcc dot gnu.org