public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jason Merrill <jason@redhat.com>
To: Lewis Hyatt <lhyatt@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]
Date: Wed, 15 Feb 2023 13:39:30 -0500	[thread overview]
Message-ID: <8af7ac06-02a0-40a6-5fd7-56a4e40cccee@redhat.com> (raw)
In-Reply-To: <20220926222725.GA19652@ldh-imac.local>

On 9/26/22 15:27, Lewis Hyatt wrote:
> On Wed, Jun 15, 2022 at 03:06:16PM -0400, Lewis Hyatt wrote:
>> On Tue, Jun 14, 2022 at 05:26:49PM -0400, Lewis Hyatt wrote:
>>> Hello-
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902
>>>
>>> The attached patch resolves PR preprocessor/103902 as described in the patch
>>> message inline below. bootstrap + regtest all languages was successful on
>>> x86-64 Linux, with no new failures:
>>>
>>> FAIL 103 103
>>> PASS 542338 542371
>>> UNSUPPORTED 15247 15250
>>> UNTESTED 136 136
>>> XFAIL 4166 4166
>>> XPASS 17 17
>>>
>>> Please let me know if it looks OK?
>>>
>>> A few questions I have:
>>>
>>> - A difference introduced with this patch is that after lexing something
>>> like `operator ""_abc', then `_abc' is added to the identifier hash map,
>>> whereas previously it was not. I feel like this must be OK because with the
>>> optional space as in `operator "" _abc', it would be added with or without the
>>> patch.
>>>
>>> - The behavior of `#pragma GCC poison' is not consistent (including prior to
>>>    my patch). I tried to make it more so but there is still one thing I want to
>>>    ask about. Leaving aside extended characters for now, the inconsistency is
>>>    that currently the poison is only checked, when the suffix appears as a
>>>    standalone token.
>>>
>>>    #pragma GCC poison _X
>>>    bool operator ""_X (unsigned long long);   //accepted before the patch,
>>>                                               //rejected after it
>>>    bool operator "" _X (unsigned long long);  //rejected either before or after
>>>    const char * operator ""_X (const char *, unsigned long); //accepted before,
>>>                                                              //rejected after
>>>    const char * operator "" _X (const char *, unsigned long); //rejected either
>>>
>>>    const char * s = ""_X; //accepted before the patch, rejected after it
>>>    const bool b = 1_X; //accepted before or after ****
>>>
>>> I feel like after the patch, the behavior is the expected behavior for all
>>> cases but the last one. Here, we allow the poisoned identifier because it's
>>> not lexed as an identifier, it's lexed as part of a pp-number. Does it seem OK
>>> like this or does it need to be addressed?
>>
>> Sorry, that version actually did not handle the case of -Wc++11-compat in
>> c++98 mode correctly. This updated version fixes that and adds the missing
>> test coverage for that, if you could please review this one instead?
>>
>> By the way, the pipermail archive seems to permanently mangle UTF-8 in inline
>> attachments. I attached the patch also gzipped to address that for the
>> archive, since the new testcases do use non-ASCII characters.
>>
>> Thanks for taking a look!
> 
> Hello-
> 
> May I please ping this patch again? Joseph suggested that it would be best if
> a C++ maintainer has a look at it. This is one of just a few places left where
> we don't handle UTF-8 properly in libcpp, it would be really nice to get them
> fixed up if there is time to review this patch. Thanks!
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html
> 
> I re-attached it here as it required some trivial rebasing on top of recently
> pushed changes. As before, I also attached the gzipped version so that the
> UTF-8 testcases show up OK in the online archive, in case that's still an
> issue. Thanks for taking a look!

Thank you for the patch, sorry it slipped off my radar.

> This patch fixes it by adding a new function scan_cur_identifier() that can be
> used to lex an identifier while in the middle of lexing another token. It is
> somewhat duplicative of the code in lex_identifier(), which handles the normal
> case, but I think there's no good way to avoid that without pessimizing the
> usual case, since lex_identifier() takes advantage of the fact that the first
> character of the identifier has already been analyzed.

So could you analyze the first character and then call lex_identifier?

> With scan_cur_identifier(), we do also correctly warn about bidi and
> normalization issues in the extended identifiers comprising the suffix, and we
> check for poisoned identifiers there as well.

Hmm, I don't think we want the check for poisoned identifiers; a suffix 
is not a name.  That goes for the other diagnostics in 
identifier_diagnostics_on_lex, as well.  At the meeting last week the 
committee decided to deprecate the declaration with a space to clarify 
this distinction.

> +	      if (!accum.accum)
> +		create_literal2 (pfile, token, base,
> +				 suffix_begin - base,
> +				 NODE_NAME (sr.node),
> +				 NODE_LEN (sr.node),
> +				 type);
> +	      else
> +		{
> +		  accum.create_literal2 (pfile, token, base,
> +					 suffix_begin - base,
> +					 NODE_NAME (sr.node),
> +					 NODE_LEN (sr.node),
> +					 type);
> +		  _cpp_release_buff (pfile, accum.first);
> +		}

How about always calling accum.create_literal2?

Jason


  parent reply	other threads:[~2023-02-15 18:39 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14 21:26 Lewis Hyatt
2022-06-15 19:06 ` Lewis Hyatt
2022-09-26 22:27   ` Ping^3: " Lewis Hyatt
2023-02-10 16:30     ` Jakub Jelinek
2023-02-10 16:58       ` Lewis Hyatt
2023-02-10 17:10         ` Jakub Jelinek
2023-02-15 18:39     ` Jason Merrill [this message]
2023-02-15 23:18       ` Lewis Hyatt
2023-03-02 23:07         ` [PATCH v2] " Lewis Hyatt
2023-07-18 20:33           ` Jason Merrill

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8af7ac06-02a0-40a6-5fd7-56a4e40cccee@redhat.com \
    --to=jason@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=lhyatt@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).