public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jason Merrill <jason@redhat.com>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal character escapes [PR106648]
Date: Thu, 1 Sep 2022 15:00:28 -0400	[thread overview]
Message-ID: <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com> (raw)
In-Reply-To: <YxCULjMrhvN5f7xR@tucnak>

On 9/1/22 07:14, Jakub Jelinek wrote:
> On Wed, Aug 31, 2022 at 12:14:22PM -0400, Jason Merrill wrote:
>> On 8/31/22 11:07, Jakub Jelinek wrote:
>>> On Wed, Aug 31, 2022 at 10:52:49AM -0400, Jason Merrill wrote:
>>>> It could be more explicit, but I think we can assume that from the existing
>>>> wording; it says it designates the named character.  If there is no such
>>>> character, that cannot be satisfied, so it must be ill-formed.
>>>
>>> Ok.
>>>
>>>>> So, we could reject the int h case above and accept silently the others?
>>>>
>>>> Why not warn on the others?
>>>
>>> We were always silent for the cases like \u123X or \U12345X.
>>> Do you think we should emit some warnings (but never pedwarns/errors in that
>>> case) that it is universal character name like but not completely?
>>
>> I think that would be helpful, at least for \u{ and \N{.
> 
> Ok.
> 
>>> Given what you said above, I think that is what we want for the last 2
>>> for C++23, the question is if it is ok also for C++20/C17 etc. and whether
>>> it should depend on -pedantic or -pedantic-errors or GNU vs. ISO mode
>>> or not in that case.  We could handle those 2 also differently, just
>>> warn instead of error for the \N{ABC} case if not in C++23 mode when
>>> identifier_pos.
>>
>> That sounds right.
>>
>>> Here is an incremental version of the patch which will make valid
>>> \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} an extension in GNU
>>> modes before C++23 and split it as separate tokens in ISO modes.
>>
>> Looks good.
> 
> Here is a patch which implements that.
> I just wonder if we shouldn't have some warning option that would cover
> these warnings, currently one needs to use -w to disable those warnings.
> 
> Apparently clang uses -Wunicode option to cover these, but unfortunately
> they don't bother to document it (nor almost any other warning option),
> so it is unclear what else exactly it covers.  Plus a question is how
> we should document that option for GCC...

We might as well use the same flag name, and document it to mean what it 
currently means for GCC.

> 2022-09-01  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
> 	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
> 	In possible identifier contexts, don't emit an error and punt
> 	if \N isn't followed by {, or if \N{} surrounds some lower case
> 	letters or _.  In possible identifier contexts when not C++23, don't
> 	emit an error but warning about unknown character names and treat as
> 	separate tokens.  When treating as separate tokens \u{ or \N{, emit
> 	warnings.
> 
> 	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
> 	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
> 	* g++.dg/cpp23/named-universal-char-escape2.C: New test.
> 
> --- libcpp/charset.cc.jj	2022-09-01 09:47:24.146886929 +0200
> +++ libcpp/charset.cc	2022-09-01 12:52:28.424034208 +0200
> @@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>     if (str[-1] == 'u')
>       {
>         length = 4;
> -      if (str < limit && *str == '{')
> +      if (str < limit
> +	  && *str == '{'
> +	  && (!identifier_pos
> +	      || CPP_OPTION (pfile, delimited_escape_seqs)
> +	      || !CPP_OPTION (pfile, std)))
>   	{
>   	  str++;
>   	  /* Magic value to indicate no digits seen.  */
> @@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>     else if (str[-1] == 'N')
>       {
>         length = 4;
> +      if (identifier_pos
> +	  && !CPP_OPTION (pfile, delimited_escape_seqs)
> +	  && CPP_OPTION (pfile, std))
> +	{
> +	  *cp = 0;
> +	  return false;
> +	}
>         if (str == limit || *str != '{')
> -	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
> +	{
> +	  if (identifier_pos)
> +	    {
> +	      *cp = 0;
> +	      return false;
> +	    }
> +	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
> +	}
>         else
>   	{
>   	  str++;
> @@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>   
>   	  if (str < limit && *str == '}')
>   	    {
> -	      if (name == str && identifier_pos)
> +	      if (identifier_pos && (name == str || !strict))
>   		{
> +		  if (name == str)
> +		    cpp_warning (pfile, CPP_W_NONE,
> +				 "empty named universal character escape "
> +				 "sequence; treating it as separate tokens");
> +		  else
> +		    cpp_warning (pfile, CPP_W_NONE,
> +				 "incomplete named universal character escape "
> +				 "sequence; treating it as separate tokens");

It looks like this is handling \N{abc}, for which "incomplete" seems 
like the wrong description; it's complete, just wrong, and the 
diagnostic doesn't help correct it.

I think we don't want to handle !strict here, but rather...

>   		  *cp = 0;
>   		  return false;
>   		}
> @@ -1515,27 +1541,48 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>   					   uname2c_tree, NULL);
>   		  if (result == (cppchar_t) -1)
>   		    {
> -		      cpp_error (pfile, CPP_DL_ERROR,
> -				 "\\N{%.*s} is not a valid universal "
> -				 "character", (int) (str - name), name);
> +		      bool ret = true;
> +		      if (identifier_pos
> +			  && !CPP_OPTION (pfile, delimited_escape_seqs))

...here...

> +			ret = cpp_warning (pfile, CPP_W_NONE,
> +					   "\\N{%.*s} is not a valid "
> +					   "universal character; treating it "
> +					   "as separate tokens",
> +					   (int) (str - name), name);
> +		      else
> +			cpp_error (pfile, CPP_DL_ERROR,
> +				   "\\N{%.*s} is not a valid universal "
> +				   "character", (int) (str - name), name);
>   
>   		      /* Try to do a loose name lookup according to
>   			 Unicode loose matching rule UAX44-LM2.  */
>   		      char canon_name[uname2c_max_name_len + 1];
>   		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
>   						       str - name, canon_name);
> -		      if (result != (cppchar_t) -1)
> +		      if (result != (cppchar_t) -1 && ret)
>   			cpp_error (pfile, CPP_DL_NOTE,
>   				   "did you mean \\N{%s}?", canon_name);
>   		      else
> -			result = 0x40;
> +			result = 0xC0;
> +		      if (identifier_pos
> +			  && !CPP_OPTION (pfile, delimited_escape_seqs))

...and here.

> +			{
> +			  *cp = 0;
> +			  return false;
> +			}
>   		    }
>   		}
>   	      str++;
>   	      extend_char_range (char_range, loc_reader);
>   	    }
>   	  else if (identifier_pos)
> -	    length = 1;
> +	    {
> +	      cpp_warning (pfile, CPP_W_NONE,
> +			   "incomplete named universal character escape "
> +			   "sequence; treating it as separate tokens");
> +	      *cp = 0;
> +	      return false;
> +	    }
>   	  else
>   	    {
>   	      cpp_error (pfile, CPP_DL_ERROR,
> @@ -1584,12 +1631,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>         }
>       while (--length);
>   
> -  if (delimited
> -      && str < limit
> -      && *str == '}'
> -      && (length != 32 || !identifier_pos))
> +  if (delimited && str < limit && *str == '}')
>       {
> -      if (length == 32)
> +      if (length == 32 && identifier_pos)
> +	{
> +	  cpp_warning (pfile, CPP_W_NONE,
> +		       "empty delimited escape sequence; "
> +		       "treating it as separate tokens");
> +	  *cp = 0;
> +	  return false;
> +	}
> +      else if (length == 32)
>   	cpp_error (pfile, CPP_DL_ERROR,
>   		   "empty delimited escape sequence");
>         else if (!CPP_OPTION (pfile, delimited_escape_seqs)
> @@ -1607,6 +1659,10 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>        error message in that case.  */
>     if (length && identifier_pos)
>       {
> +      if (delimited)
> +	cpp_warning (pfile, CPP_W_NONE,
> +		     "incomplete delimited escape sequence; "
> +		     "treating it as separate tokens");
>         *cp = 0;
>         return false;
>       }
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj	2022-09-01 11:46:26.951102634 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c	2022-09-01 12:50:17.351810090 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=gnu++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj	2022-09-01 11:46:26.951102634 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c	2022-09-01 12:57:50.830665924 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=c++23" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
> +int c = a\u{);		/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
> +int d = a\u{12XYZ});	/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj	2022-09-01 11:46:26.951102634 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c	2022-09-01 12:48:01.753647302 +0200
> @@ -0,0 +1,15 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=gnu++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
> +int c = a\N{);				/* { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" } */
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			/* { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" } */
> +int g = a\N{ABC.123});			/* { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" } */
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "is not a valid universal character" } */
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj	2022-09-01 11:46:26.951102634 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c	2022-09-01 11:46:26.951102634 +0200
> @@ -0,0 +1,16 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=c++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});
> +int c = a\N{);
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});
> +int g = a\N{ABC.123});
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
> +int i = a\N{LATIN SMALL CHARACTER A WITH ACUTE});
> --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj	2022-09-01 11:46:26.951102634 +0200
> +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C	2022-09-01 12:56:48.031516792 +0200
> @@ -0,0 +1,14 @@
> +// P2071R2 - Named universal character escapes
> +// { dg-do compile }
> +// { dg-require-effective-target wchar }
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
> +int c = a\N{);				// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
> +int g = a\N{ABC.123});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
> +int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
> +					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
> --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj	2022-09-01 12:54:47.436150733 +0200
> +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C	2022-09-01 12:59:22.512428644 +0200
> @@ -0,0 +1,16 @@
> +// P2071R2 - Named universal character escapes
> +// { dg-do compile }
> +// { dg-require-effective-target wchar }
> +// { dg-options "" }
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
> +int c = a\N{);				// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" }
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" }
> +int g = a\N{ABC.123});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" }
> +int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
> +					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
> +					// { dg-warning "is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }
> 
> 
> 	Jakub
> 


  reply	other threads:[~2022-09-01 19:00 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-21 18:18 [PATCH] c++: " Jakub Jelinek
2022-08-22 15:50 ` [PATCH] c++: Predefine __cpp_named_character_escapes=202207L for C++23 [PR106648] Jakub Jelinek
2022-08-24 20:22 ` [PATCH] c++: Implement C++23 P2071R2 - Named universal character escapes [PR106648] Jason Merrill
2022-08-25  8:49   ` [PATCH] c++, v2: " Jakub Jelinek
2022-08-25 13:34     ` Jason Merrill
2022-08-30 21:10     ` Joseph Myers
2022-08-30 21:18       ` Jakub Jelinek
2022-08-30 21:37         ` Jakub Jelinek
2022-08-31 14:18           ` Jason Merrill
2022-08-31 14:35           ` Jakub Jelinek
2022-08-31 14:52             ` Jason Merrill
2022-08-31 15:07               ` Jakub Jelinek
2022-08-31 15:25                 ` Jakub Jelinek
2022-08-31 16:14                 ` Jason Merrill
2022-09-01 11:14                   ` Jakub Jelinek
2022-09-01 19:00                     ` Jason Merrill [this message]
2022-09-01 20:23                       ` Jakub Jelinek
2022-09-03 10:29                       ` [PATCH] libcpp, v3: Named universal character escapes and delimited escape sequence tweaks Jakub Jelinek
2022-09-03 10:54                         ` Jakub Jelinek
2022-09-05  7:54                           ` Jakub Jelinek
2022-09-07  1:32                           ` Jason Merrill

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com \
    --to=jason@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=joseph@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).