From: Jason Merrill <jason@redhat.com>
To: Lewis Hyatt <lhyatt@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]
Date: Wed, 15 Feb 2023 13:39:30 -0500 [thread overview]
Message-ID: <8af7ac06-02a0-40a6-5fd7-56a4e40cccee@redhat.com> (raw)
In-Reply-To: <20220926222725.GA19652@ldh-imac.local>
On 9/26/22 15:27, Lewis Hyatt wrote:
> On Wed, Jun 15, 2022 at 03:06:16PM -0400, Lewis Hyatt wrote:
>> On Tue, Jun 14, 2022 at 05:26:49PM -0400, Lewis Hyatt wrote:
>>> Hello-
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902
>>>
>>> The attached patch resolves PR preprocessor/103902 as described in the patch
>>> message inline below. bootstrap + regtest all languages was successful on
>>> x86-64 Linux, with no new failures:
>>>
>>> FAIL 103 103
>>> PASS 542338 542371
>>> UNSUPPORTED 15247 15250
>>> UNTESTED 136 136
>>> XFAIL 4166 4166
>>> XPASS 17 17
>>>
>>> Please let me know if it looks OK?
>>>
>>> A few questions I have:
>>>
>>> - A difference introduced with this patch is that after lexing something
>>> like `operator ""_abc', then `_abc' is added to the identifier hash map,
>>> whereas previously it was not. I feel like this must be OK because with the
>>> optional space as in `operator "" _abc', it would be added with or without the
>>> patch.
>>>
>>> - The behavior of `#pragma GCC poison' is not consistent (including prior to
>>> my patch). I tried to make it more so but there is still one thing I want to
>>> ask about. Leaving aside extended characters for now, the inconsistency is
>>> that currently the poison is only checked, when the suffix appears as a
>>> standalone token.
>>>
>>> #pragma GCC poison _X
>>> bool operator ""_X (unsigned long long); //accepted before the patch,
>>> //rejected after it
>>> bool operator "" _X (unsigned long long); //rejected either before or after
>>> const char * operator ""_X (const char *, unsigned long); //accepted before,
>>> //rejected after
>>> const char * operator "" _X (const char *, unsigned long); //rejected either
>>>
>>> const char * s = ""_X; //accepted before the patch, rejected after it
>>> const bool b = 1_X; //accepted before or after ****
>>>
>>> I feel like after the patch, the behavior is the expected behavior for all
>>> cases but the last one. Here, we allow the poisoned identifier because it's
>>> not lexed as an identifier, it's lexed as part of a pp-number. Does it seem OK
>>> like this or does it need to be addressed?
>>
>> Sorry, that version actually did not handle the case of -Wc++11-compat in
>> c++98 mode correctly. This updated version fixes that and adds the missing
>> test coverage for that, if you could please review this one instead?
>>
>> By the way, the pipermail archive seems to permanently mangle UTF-8 in inline
>> attachments. I attached the patch also gzipped to address that for the
>> archive, since the new testcases do use non-ASCII characters.
>>
>> Thanks for taking a look!
>
> Hello-
>
> May I please ping this patch again? Joseph suggested that it would be best if
> a C++ maintainer has a look at it. This is one of just a few places left where
> we don't handle UTF-8 properly in libcpp, it would be really nice to get them
> fixed up if there is time to review this patch. Thanks!
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html
>
> I re-attached it here as it required some trivial rebasing on top of recently
> pushed changes. As before, I also attached the gzipped version so that the
> UTF-8 testcases show up OK in the online archive, in case that's still an
> issue. Thanks for taking a look!
Thank you for the patch, sorry it slipped off my radar.
> This patch fixes it by adding a new function scan_cur_identifier() that can be
> used to lex an identifier while in the middle of lexing another token. It is
> somewhat duplicative of the code in lex_identifier(), which handles the normal
> case, but I think there's no good way to avoid that without pessimizing the
> usual case, since lex_identifier() takes advantage of the fact that the first
> character of the identifier has already been analyzed.
So could you analyze the first character and then call lex_identifier?
> With scan_cur_identifier(), we do also correctly warn about bidi and
> normalization issues in the extended identifiers comprising the suffix, and we
> check for poisoned identifiers there as well.
Hmm, I don't think we want the check for poisoned identifiers; a suffix
is not a name. That goes for the other diagnostics in
identifier_diagnostics_on_lex, as well. At the meeting last week the
committee decided to deprecate the declaration with a space to clarify
this distinction.
> + if (!accum.accum)
> + create_literal2 (pfile, token, base,
> + suffix_begin - base,
> + NODE_NAME (sr.node),
> + NODE_LEN (sr.node),
> + type);
> + else
> + {
> + accum.create_literal2 (pfile, token, base,
> + suffix_begin - base,
> + NODE_NAME (sr.node),
> + NODE_LEN (sr.node),
> + type);
> + _cpp_release_buff (pfile, accum.first);
> + }
How about always calling accum.create_literal2?
Jason
next prev parent reply other threads:[~2023-02-15 18:39 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-14 21:26 Lewis Hyatt
2022-06-15 19:06 ` Lewis Hyatt
2022-09-26 22:27 ` Ping^3: " Lewis Hyatt
2023-02-10 16:30 ` Jakub Jelinek
2023-02-10 16:58 ` Lewis Hyatt
2023-02-10 17:10 ` Jakub Jelinek
2023-02-15 18:39 ` Jason Merrill [this message]
2023-02-15 23:18 ` Lewis Hyatt
2023-03-02 23:07 ` [PATCH v2] " Lewis Hyatt
2023-07-18 20:33 ` Jason Merrill
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8af7ac06-02a0-40a6-5fd7-56a4e40cccee@redhat.com \
--to=jason@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=lhyatt@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).