From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 1A29E385697B; Wed, 19 Jul 2023 03:59:24 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1A29E385697B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1689739164;
	bh=kgJl8m5vKSRooke2VyE0BiAPjeqdriwgmeHOXXesLr4=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=ubYGzVlroCDe+GydJzrQgxvZuailF7+ffJziZAzIyUiIy6F4s8mjY53/Mz261Y3O2
	 4QgpvcOLG1DShHZOOENaTDCphFAHxXuHQgpVopTZY1dWt/Eq5UG3akxatakJyxbKc9
	 vZMFnhP05RoyQfmV+nBHz+8Nc8VFfMpc95zKRScU=
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug preprocessor/103902] GCC requires a space between string-literal and identifier in a literal-operator-id where the identifier is not in basic character set
Date: Wed, 19 Jul 2023 03:59:23 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: preprocessor
X-Bugzilla-Version: 11.2.1
X-Bugzilla-Keywords: patch, rejects-valid
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: lhyatt at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

--- Comment #6 from CVS Commits ---
The master branch has been updated by Lewis Hyatt:

https://gcc.gnu.org/g:1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2

commit r14-2629-g1d3e4f4e2d19c3394dc018118a78c1f4b59cb5c2
Author: Lewis Hyatt
Date:   Tue Jul 18 17:16:08 2023 -0400

libcpp: Handle extended characters in user-defined literal suffix [PR103902]

The PR complains that we do not
handle UTF-8 in the suffix for a user-defined literal, such as:

    bool operator ""_Ï (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
the "" tokens is included, since then the identifier is lexed in the
"normal" way as its own token. But when it is lexed as part of the string
token, this is handled in lex_string() with a one-off loop that is not aware
of extended characters.

This patch fixes it by adding a new function scan_cur_identifier() that can
be used to lex an identifier while in the middle of lexing another token.

BTW, the other place that has been mis-lexing identifiers is
lex_identifier_intern(), which is used to implement #pragma push_macro
and #pragma pop_macro. This does not support extended characters either.
I will add that in a subsequent patch, because it can't directly reuse the
new function, but rather needs to lex from a string instead of a cpp_buffer.

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix.

libcpp/ChangeLog:

        PR preprocessor/103902
        * lex.cc (identifier_diagnostics_on_lex): New function refactoring
        some common code.
        (lex_identifier_intern): Use the new function.
        (lex_identifier): Don't run identifier diagnostics here, rather let
        the call site do it when needed.
        (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
        accordingly.
        (struct scan_id_result): New struct.
        (scan_cur_identifier): New function.
        (create_literal2): New function.
        (lit_accum::create_literal2): New function.
        (is_macro): Folded into new function...
        (maybe_ignore_udl_macro_suffix): ...here.
        (is_macro_not_literal_suffix): Folded likewise.
        (lex_raw_string): Handle UTF-8 in UDL suffix via
        scan_cur_identifier ().
        (lex_string): Likewise.

gcc/testsuite/ChangeLog:

        PR preprocessor/103902
        * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
        * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
        * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
        * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
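
For readers who want to see why an ASCII-only loop truncates the suffix, here is a rough, self-contained sketch. It is not the libcpp code: suffix_len, is_ascii_id_char and is_extended_id_char are hypothetical names made up for this illustration. It also simplifies heavily by treating every byte >= 0x80 as an identifier character, and it does not handle UCNs, check identifier-start rules, or do the bidi and normalization diagnostics mentioned above.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only; none of these exist in libcpp.

// ASCII characters allowed inside an identifier.
constexpr bool is_ascii_id_char(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

// Simplified "extended" check: additionally accepts '$' and any byte of a
// UTF-8 multibyte sequence. A real lexer would decode the code point and
// check that it is valid in an identifier; UCNs are not handled here.
constexpr bool is_extended_id_char(unsigned char c) {
    return is_ascii_id_char(c) || c == '$' || c >= 0x80;
}

// Number of bytes of identifier-like suffix starting at src[pos], i.e.
// right after the closing quote of a string literal.
constexpr std::size_t suffix_len(std::string_view src, std::size_t pos,
                                 bool extended) {
    const std::size_t start = pos;
    while (pos < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[pos]);
        const bool ok = extended ? is_extended_id_char(c)
                                 : is_ascii_id_char(c);
        if (!ok)
            break;
        ++pos;
    }
    return pos - start;
}

// The declaration from the PR, with the suffix spelled as raw UTF-8 bytes
// (0xC3 0x8F encodes U+00CF). The suffix starts at offset 2, after `""`.
constexpr std::string_view src = "\"\"_\xC3\x8F (unsigned long long);";

// ASCII-only scan: sees just "_" and stops at the first UTF-8 byte.
static_assert(suffix_len(src, 2, false) == 1);
// Extended scan: consumes "_" plus the two UTF-8 bytes.
static_assert(suffix_len(src, 2, true) == 3);
```

In this toy model the ASCII-only scan ends the suffix after the underscore, which leaves the non-ASCII character outside the string token. That would be consistent with the behaviour the PR describes, though the actual libcpp logic is more involved.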