[Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set

public inbox for gcc-bugs@sourceware.org

* [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set
@ 2023-06-21 16:13 mpolacek at gcc dot gnu.org
  2023-08-23 12:45 ` [Bug c++/110343] " jakub at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: mpolacek at gcc dot gnu.org @ 2023-06-21 16:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

            Bug ID: 110343
           Summary: [C++26] P2558R2 - Add @, $, and ` to the basic
                    character set
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

See <https://wg21.link/P2558R2>.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
@ 2023-08-23 12:45 ` jakub at gcc dot gnu.org
  2024-01-09 18:35 ` emsr at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-08-23 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
clang claims to implement this, but that doesn't seem to be the case. I think
e.g.
const char *p = R"abc`@$(foobar)abc`@$";
should be accepted for -std=c++2c.
I'm lost as to what we need to do about the statement
"This is currently rejected by GCC ‘error: universal character is not valid in
an identifier’, although this seems to be a bug, and the code is accepted by
clang and msvc."
in the paper (3.1).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
  2023-08-23 12:45 ` [Bug c++/110343] " jakub at gcc dot gnu.org
@ 2024-01-09 18:35 ` emsr at gcc dot gnu.org
  2024-01-09 18:39 ` emsr at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: emsr at gcc dot gnu.org @ 2024-01-09 18:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #2 from Ed Smith-Rowland <emsr at gcc dot gnu.org> ---
The example in the paper:
----------------------------------------------------------
/*
gcc -E charset.c > charhelp.c
gcc -o charhelp charhelp.c
*/

#include <stdio.h>

#define STR(x) #x

int main()
{
  printf("%s", STR(\u0060)); // U+0060 is ` GRAVE ACCENT
  printf("%s", "\u0060"); // U+0060 is ` GRAVE ACCENT
}
----------------------------------------------------------
Does give an error, but when I preprocess the file with -E the result
compiles just fine with either C or C++.

Preprocessor bug?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
  2023-08-23 12:45 ` [Bug c++/110343] " jakub at gcc dot gnu.org
  2024-01-09 18:35 ` emsr at gcc dot gnu.org
@ 2024-01-09 18:39 ` emsr at gcc dot gnu.org
  2024-01-09 18:44 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: emsr at gcc dot gnu.org @ 2024-01-09 18:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #3 from Ed Smith-Rowland <emsr at gcc dot gnu.org> ---
Created attachment 57018
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57018&action=edit
Get the raw string literal to compile.

I just added the new characters to lex_raw_string and got

const char *p = R"abc`@$(foobar)abc`@$";

to go.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-01-09 18:39 ` emsr at gcc dot gnu.org
@ 2024-01-09 18:44 ` jakub at gcc dot gnu.org
  2024-01-09 19:42 ` emsr at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-01-09 18:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Ed Smith-Rowland from comment #3)
> Created attachment 57018 [details]
> Get the raw string literal to compile.
> 
> I just added the new characters to lex_raw_string and got
> 
> const char *p = R"abc`@$(foobar)abc`@$";
> 
> to go.

Shouldn't those be added conditionally on whether it is -std=c++26 compilation
or not?
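
To make the question concrete, here is a minimal sketch of a delimiter check
that depends on the selected standard. It is not the libcpp code: is_d_char,
is_valid_delimiter and the cxx26 parameter are hypothetical names, and the
sketch assumes an ASCII-compatible character set.

```cpp
#include <string_view>

// Hypothetical helper, not libcpp code: may `c` appear in a raw string
// delimiter?  `cxx26` models compiling with -std=c++26.
constexpr bool is_d_char(char c, bool cxx26) {
  // The P2558R2 characters join the basic character set only in C++26.
  if (c == '$' || c == '@' || c == '`')
    return cxx26;
  // Explicitly excluded from delimiters: space, parentheses, backslash.
  if (c == ' ' || c == '(' || c == ')' || c == '\\')
    return false;
  // Remaining printable ASCII; tab, newline etc. fall outside this range.
  return c >= '!' && c <= '~';
}

// A delimiter is at most 16 characters, each of which must be a d-char.
constexpr bool is_valid_delimiter(std::string_view delim, bool cxx26) {
  if (delim.size() > 16)
    return false;
  for (char c : delim)
    if (!is_d_char(c, cxx26))
      return false;
  return true;
}

// The delimiter from comment #1 passes only in the C++26 mode of the sketch.
static_assert(is_valid_delimiter("abc`@$", true));
static_assert(!is_valid_delimiter("abc`@$", false));
```

Passing the mode as a plain bool keeps the sketch independent of how the
real flag ends up being named in libcpp.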

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-01-09 18:44 ` jakub at gcc dot gnu.org
@ 2024-01-09 19:42 ` emsr at gcc dot gnu.org
  2024-01-09 22:01 ` emsr at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: emsr at gcc dot gnu.org @ 2024-01-09 19:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #5 from Ed Smith-Rowland <emsr at gcc dot gnu.org> ---
Probably should. I'll see how to do that.
I might have to set up the lang flag and all that unless someone beats me to
it.

I was going to say that the error on the stringification is possibly correct.
The paper says usage in things _other_ than literals is illegal.

Plus, over on clang I think they have the equivalent thing implemented and I
assume they would know.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-01-09 19:42 ` emsr at gcc dot gnu.org
@ 2024-01-09 22:01 ` emsr at gcc dot gnu.org
  2024-05-02  8:28 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: emsr at gcc dot gnu.org @ 2024-01-09 22:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #6 from Ed Smith-Rowland <emsr at gcc dot gnu.org> ---
Created attachment 57019
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57019&action=edit
Add a flag to only allow new chars in c++26.

Here's a patch that adds and checks a flag in libcpp and also adds a test.

If you have a better idea for the flag name let me know.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-01-09 22:01 ` emsr at gcc dot gnu.org
@ 2024-05-02  8:28 ` jakub at gcc dot gnu.org
  2024-07-02 15:24 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-02  8:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, GCC 15 stage1 is open, so feel free to post your patch.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-05-02  8:28 ` jakub at gcc dot gnu.org
@ 2024-07-02 15:24 ` jakub at gcc dot gnu.org
  2024-07-17 17:49 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-07-02 15:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ed, ping again, will you post this to gcc-patches?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-07-02 15:24 ` jakub at gcc dot gnu.org
@ 2024-07-17 17:49 ` jakub at gcc dot gnu.org
  2024-07-18 18:42 ` emsr at gcc dot gnu.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-07-17 17:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |redi at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I've tried to understand the preprocessor issue mentioned in the paper, but am
confused on what is the right behavior and why.

Consider
#define STR(x) #x
const char *a = "\u00b7";
const char *b = STR(\u00b7);
const char *c = "\u0041";
const char *d = STR(\u0041);
const char *e = STR(a\u00b7);
const char *f = STR(a\u0041);
const char *g = STR(a \u00b7);
const char *h = STR(a \u0041);
const char *i = "\u066d";
const char *j = STR(\u066d);
const char *k = "\u0040";
const char *l = STR(\u0040);
const char *m = STR(a\u066d);
const char *n = STR(a\u0040);
const char *o = STR(a \u066d);
const char *p = STR(a \u0040);

Neither clang nor gcc emits any diagnostics on the a, c, i and k initializers;
those are certainly valid.
With -pedantic-errors, g++ emits errors on all the others, while clang++ does
so on the ones with STR involving \u0041, \u0040 and a\u066d.
The chosen values are \u0040 '@' as something being changed by this paper,
\u0041 'A' as a basic character set character valid in identifiers before and
after, \u00b7 as an example of a character which is pedantically valid in
identifiers if not at the start, and \u066d as something pedantically not valid
in identifiers.

Now, https://eel.is/c++draft/lex.charset#6 says that a UCN used outside of a
string/character literal which corresponds to a basic character set character
(or a control character) is ill-formed; that would make the d, f, h cases
invalid for C++ and the l, n, p cases invalid for C++26.
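
As a rough restatement of that rule, the sketch below classifies a code point
the way the paragraph above reads lex.charset#6. The function names and the
cxx26 parameter are hypothetical and nothing here is taken from libcpp; the
part of the rule about values that are not Unicode scalar values is left out.

```cpp
// Is `c` in the basic character set?  `cxx26` selects the C++26 set,
// which additionally contains $, @ and `.
constexpr bool in_basic_character_set(char32_t c, bool cxx26) {
  if (c == U'\t' || c == U'\v' || c == U'\f' || c == U'\n' || c == U' ')
    return true;
  if (c == 0x24 || c == 0x40 || c == 0x60)  // $ @ `
    return cxx26;
  return c >= 0x21 && c <= 0x7E;            // remaining printable ASCII
}

// Control characters: U+0000..U+001F and U+007F..U+009F.
constexpr bool is_control_character(char32_t c) {
  return c <= 0x1F || (c >= 0x7F && c <= 0x9F);
}

// True if a UCN naming `c` is ill-formed outside a character or string
// literal under the rule discussed above.
constexpr bool ucn_ill_formed_outside_literal(char32_t c, bool cxx26) {
  return is_control_character(c) || in_basic_character_set(c, cxx26);
}

static_assert(ucn_ill_formed_outside_literal(0x0041, false));  // d, f, h
static_assert(!ucn_ill_formed_outside_literal(0x0040, false)); // l, n, p before C++26
static_assert(ucn_ill_formed_outside_literal(0x0040, true));   // l, n, p in C++26
static_assert(!ucn_ill_formed_outside_literal(0x00B7, true));
static_assert(!ucn_ill_formed_outside_literal(0x066D, true));
```

This only covers the lex.charset rule; whether the remaining cases are valid
still depends on the identifier and preprocessing-token rules quoted next.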

https://eel.is/c++draft/lex.name states which characters can appear at the
start of the identifier and which can appear after the start.
And https://eel.is/c++draft/lex.pptoken states that preprocessing-token is
either identifier, or tons of other things, or
"each non-whitespace character that cannot be one of the above"

Then https://eel.is/c++draft/lex.pptoken#1 says that this last category is
invalid if the preprocessing token is being converted into token.

And https://eel.is/c++draft/lex.pptoken#2 includes
"If any character not in the basic character set matches the last category, the
program is ill-formed."

Now, e.g. for the C++23 STR(\u0040) case, \u0040 is not in the basic
character set there, so it is valid outside of literals (no longer the case in
C++26), but it isn't a nondigit and doesn't have the XID_Start property, so IMHO
it isn't an identifier and so must be the "each non-whitespace character that
cannot be one of the above" case.
Why doesn't the above mentioned https://eel.is/c++draft/lex.pptoken#2 sentence
make that invalid?  Ignoring that, I'd say it would then be stringized, and that
feels like what clang++ is doing.
Now, e.g. for the STR(a\u066d) case, I wonder why that isn't lexed as an
identifier followed by a \u066d "each non-whitespace character that cannot be
one of the above" token and stringified similarly; clang++ rejects that.

What GCC libcpp seems to be doing is that forms_identifier_p calls
_cpp_valid_utf8 or _cpp_valid_ucn with an argument which tells whether the
character is first or second+ in the identifier, and e.g. _cpp_valid_ucn then,
for UCNs valid in string literals, runs
  else if (identifier_pos)
    {
      int validity = ucn_valid_in_identifier (pfile, result, nst);

      if (validity == 0)
        cpp_error (pfile, CPP_DL_ERROR,
                   "universal character %.*s is not valid in an identifier",
                   (int) (str - base), base);
      else if (validity == 2 && identifier_pos == 1)
        cpp_error (pfile, CPP_DL_ERROR,
   "universal character %.*s is not valid at the start of an identifier",
                   (int) (str - base), base);
    }
so basically all those invalid in identifiers cases emit an error and pretend
to be valid in identifiers, rather than what e.g. _cpp_valid_utf8 does for C
but not for C++ and only for the chars completely invalid in identifiers rather
than just valid in identifiers but not at the start:
          /* In C++, this is an error for invalid character in an identifier
             because logically, the UTF-8 was converted to a UCN during
             translation phase 1 (even though we don't physically do it that
             way).  In C, this byte rather becomes grammatically a separate
             token.  */

          if (CPP_OPTION (pfile, cplusplus))
            cpp_error (pfile, CPP_DL_ERROR,
                       "extended character %.*s is not valid in an identifier",
                       (int) (*pstr - base), base);
          else
            {
              *pstr = base;
              return false;
            }
The comment doesn't really match what is done in recent C++ versions because
there UCNs are translated to characters and not the other way around.
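
The diagnostic decision in the _cpp_valid_ucn fragment quoted above can be
restated in isolation roughly as follows. This is a sketch, not libcpp code:
ucn_identifier_diagnostic is a hypothetical name, and the reading of the
validity values (0 = not valid in identifiers, 2 = valid but not at the start,
anything else = valid anywhere) is an assumption inferred from the fragment.

```cpp
#include <string_view>

// identifier_pos: 0 = not lexing an identifier, 1 = first character,
// 2 = any later character (as described in the text above).
// Returns the diagnostic text, or an empty view if nothing is diagnosed.
constexpr std::string_view ucn_identifier_diagnostic(int validity,
                                                     int identifier_pos) {
  if (identifier_pos == 0)
    return {};  // the identifier checks are skipped entirely
  if (validity == 0)
    return "not valid in an identifier";
  if (validity == 2 && identifier_pos == 1)
    return "not valid at the start of an identifier";
  return {};    // accepted without a diagnostic
}

static_assert(ucn_identifier_diagnostic(0, 2) == "not valid in an identifier");
static_assert(ucn_identifier_diagnostic(2, 1) ==
              "not valid at the start of an identifier");
static_assert(ucn_identifier_diagnostic(2, 2).empty());
```

Note that in both error branches the character is still consumed as part of
the identifier, which is the "emit an error and pretend to be valid" behavior
described above, as opposed to ending the token before it.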

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-07-17 17:49 ` jakub at gcc dot gnu.org
@ 2024-07-18 18:42 ` emsr at gcc dot gnu.org
  2024-07-18 20:10 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: emsr at gcc dot gnu.org @ 2024-07-18 18:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #10 from Ed Smith-Rowland <emsr at gcc dot gnu.org> ---
Sorry I was out for a while.
I was trying to figure out if there was some table of allowed characters we
should use. Also, C23 needs this too IIUC and I was wondering if we should
coordinate.

It looks like you got it though.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-07-18 18:42 ` emsr at gcc dot gnu.org
@ 2024-07-18 20:10 ` jakub at gcc dot gnu.org
  2024-07-25 19:38 ` cvs-commit at gcc dot gnu.org
  2024-07-25 19:39 ` jakub at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-07-18 20:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I've posted https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657583.html
I don't think we should change the raw string handling for C23, because unlike
C++26 they didn't add the $@` chars to the basic character set, but next to it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-07-18 20:10 ` jakub at gcc dot gnu.org
@ 2024-07-25 19:38 ` cvs-commit at gcc dot gnu.org
  2024-07-25 19:39 ` jakub at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-07-25 19:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

--- Comment #12 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:29341f21ce1eb7cdb8cd468e4ceb0d07cf2775e0

commit r15-2322-g29341f21ce1eb7cdb8cd468e4ceb0d07cf2775e0
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jul 25 21:36:31 2024 +0200

    c++: Implement C++26 P2558R2 - Add @, $, and ` to the basic character set [PR110343]

    The following patch implements the easy parts of the paper.
    When @$` are added to the basic character set, it means that
    R"@$`()@$`" should now be valid (here I've noticed most of the
    raw string tests were tested solely with -std=c++11 or -std=gnu++11
    and I've tried to change that), and on the other side even if
    by extension $ is allowed in identifiers, \u0024 or \U00000024
    or \u{24} should not be, similarly how \u0041 is not allowed.

    The paper in 3.1 claims though that
     #include <stdio.h>

     #define STR(x) #x

    int main()
    {
      printf("%s", STR(\u0060)); // U+0060 is ` GRAVE ACCENT
    }
    should have been accepted before this paper (and rejected after it),
    but g++ rejects it.

    I've tried to understand it, but am confused on what is the right
    behavior and why.

    Consider
     #define STR(x) #x
    const char *a = "\u00b7";
    const char *b = STR(\u00b7);
    const char *c = "\u0041";
    const char *d = STR(\u0041);
    const char *e = STR(a\u00b7);
    const char *f = STR(a\u0041);
    const char *g = STR(a \u00b7);
    const char *h = STR(a \u0041);
    const char *i = "\u066d";
    const char *j = STR(\u066d);
    const char *k = "\u0040";
    const char *l = STR(\u0040);
    const char *m = STR(a\u066d);
    const char *n = STR(a\u0040);
    const char *o = STR(a \u066d);
    const char *p = STR(a \u0040);

    Neither clang nor gcc emit any diagnostics on the a, c, i and k
    initializers, those are certainly valid (c is invalid in C23 though).  g++
    emits with -pedantic-errors errors on all the others, while clang++ on the
    ones with STR involving \u0041, \u0040 and a\u066d.  The chosen values are
    \u0040 '@' as something being changed by this paper, \u0041 'A' as basic
    character set char valid in identifiers before/after, \u00b7 as an example
    of character which is pedantically valid in identifiers if not at the start
    and \u066d as something pedantically not valid in identifiers.

    Now, https://eel.is/c++draft/lex.charset#6 says that UCN used outside of a
    string/character literal which corresponds to basic character set character
    (or control character) is ill-formed, that would make d, f, h cases invalid
    for C++ and l, n, p cases invalid for C++26.

    https://eel.is/c++draft/lex.name states which characters can appear at the
    start of the identifier and which can appear after the start.  And
    https://eel.is/c++draft/lex.pptoken states that preprocessing-token is
    either identifier, or tons of other things, or "each non-whitespace
    character that cannot be one of the above"

    Then https://eel.is/c++draft/lex.pptoken#1 says that this last category is
    invalid if the preprocessing token is being converted into token.

    And https://eel.is/c++draft/lex.pptoken#2 includes "If any character not in
    the basic character set matches the last category, the program is
    ill-formed."

    Now, e.g.  for the C++23 STR(\u0040) case, \u0040 is there not in the basic
    character set, so valid outside of the literals (not the case anymore in
    C++26), but it isn't nondigit and doesn't have XID_Start property, so it
    isn't IMHO an identifier and so must be the "each non-whitespace character
    that cannot be one of the above" case.  Why doesn't the above mentioned
    https://eel.is/c++draft/lex.pptoken#2 sentence make that invalid?  Ignoring
    that, I'd say it would be then stringized and that feels like it is what
    clang++ is doing.  Now, e.g.  for the STR(a\u066d) case, I wonder why that
    isn't lexed as an identifier followed by \u066d "each non-whitespace
    character that cannot be one of the above" token and stringified similarly,
    clang++ rejects that.

    What GCC libcpp seems to be doing is that forms_identifier_p calls
    _cpp_valid_utf8 or _cpp_valid_ucn with an argument which tells it is first
    or second+ in identifier, and e.g.  _cpp_valid_ucn then for UCNs valid in
    string literals calls
      else if (identifier_pos)
        {
          int validity = ucn_valid_in_identifier (pfile, result, nst);

          if (validity == 0)
            cpp_error (pfile, CPP_DL_ERROR,
                       "universal character %.*s is not valid in an identifier",
                       (int) (str - base), base);
          else if (validity == 2 && identifier_pos == 1)
            cpp_error (pfile, CPP_DL_ERROR,
       "universal character %.*s is not valid at the start of an identifier",
                       (int) (str - base), base);
        }
    so basically all those invalid in identifiers cases emit an error and
    pretend to be valid in identifiers, rather than what e.g.  _cpp_valid_utf8
    does for C but not for C++ and only for the chars completely invalid in
    identifiers rather than just valid in identifiers but not at the start:
              /* In C++, this is an error for invalid character in an identifier
                 because logically, the UTF-8 was converted to a UCN during
                 translation phase 1 (even though we don't physically do it that
                 way).  In C, this byte rather becomes grammatically a separate
                 token.  */

              if (CPP_OPTION (pfile, cplusplus))
                cpp_error (pfile, CPP_DL_ERROR,
                           "extended character %.*s is not valid in an identifier",
                           (int) (*pstr - base), base);
              else
                {
                  *pstr = base;
                  return false;
                }
    The comment doesn't really match what is done in recent C++ versions because
    there UCNs are translated to characters and not the other way around.

    2024-07-25  Jakub Jelinek  <jakub@redhat.com>

            PR c++/110343
    libcpp/
            * lex.cc: C++26 P2558R2 - Add @, $, and ` to the basic character set.
            (lex_raw_string): For C++26 allow $@` characters in prefix.
            * charset.cc (_cpp_valid_ucn): For C++26 reject \u0024 in identifiers.
    gcc/testsuite/
            * c-c++-common/raw-string-1.c: Use { c || c++11 } effective target,
            remove c++ specific dg-options.
            * c-c++-common/raw-string-2.c: Likewise.
            * c-c++-common/raw-string-4.c: Likewise.
            * c-c++-common/raw-string-5.c: Likewise.  Expect some diagnostics
            only for non-c++26, for c++26 expect different.
            * c-c++-common/raw-string-6.c: Use { c || c++11 } effective target,
            remove c++ specific dg-options.
            * c-c++-common/raw-string-11.c: Likewise.
            * c-c++-common/raw-string-13.c: Likewise.
            * c-c++-common/raw-string-14.c: Likewise.
            * c-c++-common/raw-string-15.c: Use { c || c++11 } effective target,
            change c++ specific dg-options to just -Wtrigraphs.
            * c-c++-common/raw-string-16.c: Likewise.
            * c-c++-common/raw-string-17.c: Use { c || c++11 } effective target,
            remove c++ specific dg-options.
            * c-c++-common/raw-string-18.c: Use { c || c++11 } effective target,
            remove -std=c++11 from c++ specific dg-options.
            * c-c++-common/raw-string-19.c: Likewise.
            * g++.dg/cpp26/raw-string1.C: New test.
            * g++.dg/cpp26/raw-string2.C: New test.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c++/110343] [C++26] P2558R2 - Add @, $, and ` to the basic character set
  2023-06-21 16:13 [Bug c++/110343] New: [C++26] P2558R2 - Add @, $, and ` to the basic character set mpolacek at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-07-25 19:38 ` cvs-commit at gcc dot gnu.org
@ 2024-07-25 19:39 ` jakub at gcc dot gnu.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-07-25 19:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110343

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
           Assignee|emsr at gcc dot gnu.org     |jakub at gcc dot gnu.org

--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Implemented for 15.1+.

^ permalink raw reply	[flat|nested] 14+ messages in thread

