On Wed, Jun 15, 2022 at 03:06:16PM -0400, Lewis Hyatt wrote:
> On Tue, Jun 14, 2022 at 05:26:49PM -0400, Lewis Hyatt wrote:
> > Hello-
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902
> > 
> > The attached patch resolves PR preprocessor/103902 as described in the patch
> > message inline below. bootstrap + regtest all languages was successful on
> > x86-64 Linux, with no new failures:
> > 
> > FAIL 103 103
> > PASS 542338 542371
> > UNSUPPORTED 15247 15250
> > UNTESTED 136 136
> > XFAIL 4166 4166
> > XPASS 17 17
> > 
> > Please let me know if it looks OK?
> > 
> > A few questions I have:
> > 
> > - A difference introduced with this patch is that after lexing something
> > like `operator ""_abc', then `_abc' is added to the identifier hash map,
> > whereas previously it was not. I feel like this must be OK because with the
> > optional space as in `operator "" _abc', it would be added with or without the
> > patch.
> > 
> > - The behavior of `#pragma GCC poison' is not consistent (including prior to
> >   my patch). I tried to make it more so but there is still one thing I want to
> >   ask about. Leaving aside extended characters for now, the inconsistency is
> >   that currently the poison is only checked, when the suffix appears as a
> >   standalone token.
> > 
> >   #pragma GCC poison _X
> >   bool operator ""_X (unsigned long long);   //accepted before the patch,
> >                                              //rejected after it
> >   bool operator "" _X (unsigned long long);  //rejected either before or after
> >   const char * operator ""_X (const char *, unsigned long); //accepted before,
> >                                                             //rejected after
> >   const char * operator "" _X (const char *, unsigned long); //rejected either
> > 
> >   const char * s = ""_X; //accepted before the patch, rejected after it
> >   const bool b = 1_X; //accepted before or after ****
> > 
> > I feel like after the patch, the behavior is the expected behavior for all
> > cases but the last one. Here, we allow the poisoned identifier because it's
> > not lexed as an identifier, it's lexed as part of a pp-number. Does it seem OK
> > like this or does it need to be addressed?
> 
> Sorry, that version actually did not handle the case of -Wc++11-compat in
> c++98 mode correctly. This updated version fixes that and adds the missing
> test coverage for that, if you could please review this one instead?
> 
> By the way, the pipermail archive seems to permanently mangle UTF-8 in inline
> attachments. I attached the patch also gzipped to address that for the
> archive, since the new testcases do use non-ASCII characters.
> 
> Thanks for taking a look!

Hello-

May I please ping this patch again? Joseph suggested that it would be best if
a C++ maintainer has a look at it. This is one of just a few places left where
we don't handle UTF-8 properly in libcpp, it would be really nice to get them
fixed up if there is time to review this patch. Thanks!

https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html

I re-attached it here as it required some trivial rebasing on top of recently
pushed changes. As before, I also attached the gzipped version so that the
UTF-8 testcases show up OK in the online archive, in case that's still an
issue. Thanks for taking a look!

-Lewis