public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug preprocessor/103902] GCC requires a space between string-literal and identifier in a literal-operator-id where the identifier is not in basic character set
Date: Wed, 19 Jul 2023 03:59:23 +0000
Message-ID: <bug-103902-4-m259vavf8T@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-103902-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Lewis Hyatt <lhyatt@gcc.gnu.org>:

https://gcc.gnu.org/g:1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2

commit r14-2629-g1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2
Author: Lewis Hyatt <lhyatt@gmail.com>
Date:   Tue Jul 18 17:16:08 2023 -0400

    libcpp: Handle extended characters in user-defined literal suffix [PR103902]

    The PR complains that we do not handle UTF-8 in the suffix for a
    user-defined literal, such as:

    bool operator ""_π (unsigned long long);

    In fact we don't handle any extended identifier characters there, whether
    UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space
    after the "" tokens is included, since then the identifier is lexed in the
    "normal" way as its own token. But when it is lexed as part of the string
    token, this is handled in lex_string() with a one-off loop that is not
    aware of extended characters.
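The two spellings described above can be put side by side in a small C++ file. This is only a sketch: it assumes the fix first ships in GCC 14 (the commit revision is r14-2629), and the helper is_nonzero_seven is a made-up name used to exercise the operator.

```cpp
// Assumption: the fix first ships in GCC 14 (the commit is r14-2629), so
// older GCC releases still need the space after the "" tokens.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 14
// The suffix is lexed as its own identifier token: accepted before the fix.
constexpr bool operator "" _π (unsigned long long v) { return v != 0; }
#else
// The suffix is lexed as part of the string token: this is the spelling
// the PR reports as rejected.
constexpr bool operator ""_π (unsigned long long v) { return v != 0; }
#endif

// Hypothetical helper that exercises the literal operator.
constexpr bool is_nonzero_seven () { return 7_π; }
```

The same experiment can be repeated with a UCN (\u03C0) or a $ in the suffix, which the message says were affected in the same way.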

    This patch fixes it by adding a new function scan_cur_identifier() that can
    be used to lex an identifier while in the middle of lexing another token.
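The idea of scanning an identifier while still inside another token can be illustrated with a toy model. The sketch below is not GCC's code: SuffixScan, scan_suffix_ascii and scan_suffix_extended are hypothetical names, the ASCII-only loop merely stands in for a scan that is unaware of extended characters, and the extended scan accepts any byte >= 0x80 without validating UTF-8 or checking which code points are allowed in identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch; not GCC's struct scan_id_result.
struct SuffixScan {
  std::size_t begin;  // offset of the first suffix byte
  std::size_t end;    // offset one past the last suffix byte
  std::size_t length() const { return end - begin; }
};

static bool is_ascii_alpha(unsigned char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
static bool is_ascii_digit(unsigned char c) { return c >= '0' && c <= '9'; }
static bool is_hex_digit(unsigned char c) {
  return is_ascii_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Length of a UCN (\uXXXX or \UXXXXXXXX) starting at pos, or 0 if none.
static std::size_t ucn_length(const std::string &s, std::size_t pos) {
  if (pos + 1 >= s.size() || s[pos] != '\\') return 0;
  const std::size_t digits = s[pos + 1] == 'u' ? 4 : s[pos + 1] == 'U' ? 8 : 0;
  if (digits == 0 || pos + 2 + digits > s.size()) return 0;
  for (std::size_t i = 0; i < digits; ++i)
    if (!is_hex_digit(static_cast<unsigned char>(s[pos + 2 + i]))) return 0;
  return 2 + digits;
}

// ASCII-only scan: stops at the first byte outside [A-Za-z0-9_].
SuffixScan scan_suffix_ascii(const std::string &s, std::size_t pos) {
  SuffixScan r{pos, pos};
  while (r.end < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[r.end]);
    const bool first = r.end == r.begin;
    if (!(c == '_' || is_ascii_alpha(c) || (!first && is_ascii_digit(c))))
      break;
    ++r.end;
  }
  return r;
}

// Extended scan: also accepts '$', UCNs, and bytes >= 0x80 (UTF-8 sequences).
SuffixScan scan_suffix_extended(const std::string &s, std::size_t pos) {
  SuffixScan r{pos, pos};
  while (r.end < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[r.end]);
    const bool first = r.end == r.begin;
    if (const std::size_t n = ucn_length(s, r.end)) { r.end += n; continue; }
    if (c == '_' || c == '$' || c >= 0x80 || is_ascii_alpha(c)
        || (!first && is_ascii_digit(c)))
      ++r.end;
    else
      break;
  }
  return r;
}

// Small self-check: "\xCF\x80" is the UTF-8 encoding of U+03C0.
void check_examples() {
  const std::string utf8 = "\"\"_\xCF\x80 (x)";
  assert(scan_suffix_ascii(utf8, 2).length() == 1);     // only '_' is taken
  assert(scan_suffix_extended(utf8, 2).length() == 3);  // '_' plus two bytes
}
```

Both functions start at the offset just past the closing quote, so the caller can keep lexing the string token and pick up the suffix in the same pass.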

    BTW, the other place that has been mis-lexing identifiers is
    lex_identifier_intern(), which is used to implement #pragma push_macro
    and #pragma pop_macro. This does not support extended characters either.
    I will add that in a subsequent patch, because it can't directly reuse the
    new function, but rather needs to lex from a string instead of a
    cpp_buffer.
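For context, this is what the two pragmas look like in use. The macro name is passed as a string literal, which is consistent with the remark that the identifier has to be lexed from a string. The macro LIMIT and the two constants are made-up names for this sketch, and only a plain ASCII macro name is used, since extended characters are not yet supported there according to the message.

```cpp
#define LIMIT 10

#pragma push_macro("LIMIT")         // save the current definition of LIMIT
#undef LIMIT
#define LIMIT 20
constexpr int inner_limit = LIMIT;  // sees the temporary definition
#pragma pop_macro("LIMIT")          // restore the saved definition

constexpr int outer_limit = LIMIT;  // sees the original definition again

static_assert(inner_limit == 20, "temporary definition is in effect");
static_assert(outer_limit == 10, "original definition is restored");
```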

    With scan_cur_identifier(), we do also correctly warn about bidi and
    normalization issues in the extended identifiers comprising the suffix.

    libcpp/ChangeLog:

            PR preprocessor/103902
            * lex.cc (identifier_diagnostics_on_lex): New function refactoring
            some common code.
            (lex_identifier_intern): Use the new function.
            (lex_identifier): Don't run identifier diagnostics here, rather let
            the call site do it when needed.
            (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
            accordingly.
            (struct scan_id_result): New struct.
            (scan_cur_identifier): New function.
            (create_literal2): New function.
            (lit_accum::create_literal2): New function.
            (is_macro): Folded into new function...
            (maybe_ignore_udl_macro_suffix): ...here.
            (is_macro_not_literal_suffix): Folded likewise.
            (lex_raw_string): Handle UTF-8 in UDL suffix via
            scan_cur_identifier ().
            (lex_string): Likewise.

    gcc/testsuite/ChangeLog:

            PR preprocessor/103902
            * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
            * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
            * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
            * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
