From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jason@redhat.com>
Message-ID: <1a7b8ffa-4b39-0bd8-8d14-8d5d721dd1fa@redhat.com>
Date: Tue, 6 Sep 2022 21:32:12 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.13.0
Subject: Re: [PATCH] libcpp, v3: Named universal character escapes and
 delimited escape sequence tweaks
To: Jakub Jelinek <jakub@redhat.com>, Joseph Myers <joseph@codesourcery.com>,
 gcc-patches@gcc.gnu.org
References: <alpine.DEB.2.22.394.2208302055240.446383@digraph.polyomino.org.uk>
 <Yw5+nPD8O+JTx3uL@tucnak> <Yw6DA3MhofyzWnje@tucnak> <Yw9xsBRmTqkLMlGC@tucnak>
 <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com> <Yw95MR3YN1aT2ks6@tucnak>
 <df9730f4-d796-7bf6-dd18-d0c9c5a0cf12@redhat.com> <YxCULjMrhvN5f7xR@tucnak>
 <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com> <YxMsnC5ei4zydz+4@tucnak>
 <YxMyZy6glKRqkL86@tucnak>
From: Jason Merrill <jason@redhat.com>
In-Reply-To: <YxMyZy6glKRqkL86@tucnak>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>

On 9/3/22 06:54, Jakub Jelinek wrote:
> On Sat, Sep 03, 2022 at 12:29:52PM +0200, Jakub Jelinek wrote:
>> On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
>>> We might as well use the same flag name, and document it to mean what it
>>> currently means for GCC.
>>
>> Ok, following patch introduces -Wunicode (on by default).
>>
>>> It looks like this is handling \N{abc}, for which "incomplete" seems like
>>> the wrong description; it's complete, just wrong, and the diagnostic doesn't
>>> help correct it.
>>
>> It will also emit the "is not a valid universal character" diagnostic, with
>> a "did you mean" note if the name matches loosely; otherwise it will use the
>> "not terminated with } after ..." wording.
>>
>> Ok if it passes bootstrap/regtest?

OK, thanks.
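
For anyone following the thread who has not looked at the loose matching
mentioned above: the "did you mean" note relies on Unicode loose matching rule
UAX44-LM2, which ignores case, whitespace, underscores and medial hyphens.
Below is a minimal sketch of that idea in plain C.  It is not the libcpp
implementation (_cpp_uname2c_uax44_lm2 does considerably more); the names
loose_key and loosely_equal are hypothetical, and the sketch deliberately skips
the U+1180 HANGUL JUNGSEONG O-E exception in the rule.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write a loose-matching key for NAME into OUT.
   Spaces, underscores and medial hyphens are dropped and the remaining
   characters are upper-cased.  Returns the key length, or (size_t) -1 if
   OUT is too small.  Simplification: the U+1180 exception is ignored.  */
static size_t
loose_key (const char *name, char *out, size_t out_size)
{
  size_t len = strlen (name);
  size_t n = 0;

  if (out_size == 0)
    return (size_t) -1;

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) name[i];

      if (isspace (c) || c == '_')
        continue;
      /* A hyphen is medial when it sits between two alphanumerics.  */
      if (c == '-' && i > 0 && i + 1 < len
          && isalnum ((unsigned char) name[i - 1])
          && isalnum ((unsigned char) name[i + 1]))
        continue;
      if (n + 1 >= out_size)
        return (size_t) -1;
      out[n++] = (char) toupper (c);
    }
  out[n] = '\0';
  return n;
}

/* Hypothetical helper: nonzero if A and B have the same loose key.  */
static int
loosely_equal (const char *a, const char *b)
{
  char ka[128], kb[128];

  if (loose_key (a, ka, sizeof ka) == (size_t) -1
      || loose_key (b, kb, sizeof kb) == (size_t) -1)
    return 0;
  return strcmp (ka, kb) == 0;
}
```

With this sketch, loosely_equal ("Latin_Small_Letter_A_With_Acute",
"LATIN SMALL LETTER A WITH ACUTE") is nonzero, which is the situation where the
testcases expect the "did you mean" note.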

> Actually, it is simpler to treat the !strict case like the strict case,
> except that outside of literals it always warns instead of emitting an error.
> 
> The following version does that.  The only difference in the testcases is
> in the
> int f = a\N{abc});
> cases, where it emits different diagnostics.
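
To make the resulting behaviour easier to follow, here is a small model of how
an unknown name in \N{...} is handled, as I read the charset.cc hunks in this
patch.  It is only a sketch: the enum, classify_unknown_name and
name_is_strict are hypothetical names, the three flags stand in for
identifier_pos, CPP_OPTION (pfile, delimited_escape_seqs) and
CPP_OPTION (pfile, std), and "strict" is assumed to mean that the name contains
no lower case letters or '_', going by the ChangeLog wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for an unknown \N{name}.  */
enum ucn_outcome
{
  UCN_SILENT_SEPARATE_TOKENS,	/* No diagnostic, lexed as separate tokens.  */
  UCN_WARN_SEPARATE_TOKENS,	/* -Wunicode warning, separate tokens.  */
  UCN_ERROR			/* Hard error.  */
};

/* Assumption: a name is "strict" when it has no lower case letters
   and no '_'.  */
static bool
name_is_strict (const char *name)
{
  assert (name != NULL);
  for (; *name; name++)
    if ((*name >= 'a' && *name <= 'z') || *name == '_')
      return false;
  return true;
}

/* Model of the unknown-name branch only; other malformed forms
   (empty braces, missing '}') are not covered here.  */
static enum ucn_outcome
classify_unknown_name (bool identifier_pos, bool delimited_escape_seqs,
		       bool std_mode, const char *name)
{
  /* Strict -std=c* modes other than C++23: \N is not special at all
     in a possible identifier context.  */
  if (identifier_pos && !delimited_escape_seqs && std_mode)
    return UCN_SILENT_SEPARATE_TOKENS;

  /* Possible identifier context, and either not C++23 or a name that
     could not be a valid character name anyway: warn and punt.  */
  if (identifier_pos
      && (!delimited_escape_seqs || !name_is_strict (name)))
    return UCN_WARN_SEPARATE_TOKENS;

  /* Inside literals, or a well-formed but unknown name in C++23.  */
  return UCN_ERROR;
}
```

This lines up with the new tests: a\N{abc} warns even with -std=c++23, while
a\N{NON-EXISTENT CHAR} is an error there, a warning with -std=gnu++20, and
silent with -std=c++20.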
> 
> 2022-09-03  Jakub Jelinek  <jakub@redhat.com>
> 
> libcpp/
> 	* include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member.
> 	(enum cpp_warning_reason): Add CPP_W_UNICODE.
> 	* init.cc (cpp_create_reader): Initialize cpp_warn_unicode.
> 	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
> 	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
> 	In possible identifier contexts, don't emit an error and punt
> 	if \N isn't followed by {, or if \N{} surrounds some lower case
> 	letters or _.  In possible identifier contexts when not C++23, don't
> 	emit an error but warning about unknown character names and treat as
> 	separate tokens.  When treating as separate tokens \u{ or \N{, emit
> 	warnings.
> gcc/
> 	* doc/invoke.texi (-Wno-unicode): Document.
> gcc/c-family/
> 	* c.opt (Winvalid-utf8): Use ObjC instead of objC.  Remove
> 	" in comments" from description.
> 	(Wunicode): New option.
> gcc/testsuite/
> 	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-6.c: New test.
> 	* c-c++-common/cpp/delimited-escape-seq-7.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
> 	* c-c++-common/cpp/named-universal-char-escape-7.c: New test.
> 	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
> 	* g++.dg/cpp23/named-universal-char-escape2.C: New test.
> 
> --- libcpp/include/cpplib.h.jj	2022-09-03 09:35:41.465984642 +0200
> +++ libcpp/include/cpplib.h	2022-09-03 11:30:57.250677870 +0200
> @@ -565,6 +565,10 @@ struct cpp_options
>        2 if it should be a pedwarn.  */
>     unsigned char cpp_warn_invalid_utf8;
>   
> +  /* True if libcpp should warn about invalid forms of delimited or named
> +     escape sequences.  */
> +  bool cpp_warn_unicode;
> +
>     /* True if -finput-charset= option has been used explicitly.  */
>     bool cpp_input_charset_explicit;
>   
> @@ -675,7 +679,8 @@ enum cpp_warning_reason {
>     CPP_W_CXX20_COMPAT,
>     CPP_W_EXPANSION_TO_DEFINED,
>     CPP_W_BIDIRECTIONAL,
> -  CPP_W_INVALID_UTF8
> +  CPP_W_INVALID_UTF8,
> +  CPP_W_UNICODE
>   };
>   
>   /* Callback for header lookup for HEADER, which is the name of a
> --- libcpp/init.cc.jj	2022-09-01 09:47:23.729892618 +0200
> +++ libcpp/init.cc	2022-09-03 11:19:10.954452329 +0200
> @@ -228,6 +228,7 @@ cpp_create_reader (enum c_lang lang, cpp
>     CPP_OPTION (pfile, warn_date_time) = 0;
>     CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired;
>     CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0;
> +  CPP_OPTION (pfile, cpp_warn_unicode) = 1;
>     CPP_OPTION (pfile, cpp_input_charset_explicit) = 0;
>   
>     /* Default CPP arithmetic to something sensible for the host for the
> --- libcpp/charset.cc.jj	2022-09-01 14:19:47.462235851 +0200
> +++ libcpp/charset.cc	2022-09-03 12:42:41.800923600 +0200
> @@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>     if (str[-1] == 'u')
>       {
>         length = 4;
> -      if (str < limit && *str == '{')
> +      if (str < limit
> +	  && *str == '{'
> +	  && (!identifier_pos
> +	      || CPP_OPTION (pfile, delimited_escape_seqs)
> +	      || !CPP_OPTION (pfile, std)))
>   	{
>   	  str++;
>   	  /* Magic value to indicate no digits seen.  */
> @@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>     else if (str[-1] == 'N')
>       {
>         length = 4;
> +      if (identifier_pos
> +	  && !CPP_OPTION (pfile, delimited_escape_seqs)
> +	  && CPP_OPTION (pfile, std))
> +	{
> +	  *cp = 0;
> +	  return false;
> +	}
>         if (str == limit || *str != '{')
> -	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
> +	{
> +	  if (identifier_pos)
> +	    {
> +	      *cp = 0;
> +	      return false;
> +	    }
> +	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
> +	}
>         else
>   	{
>   	  str++;
> @@ -1489,15 +1507,19 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>   
>   	  if (str < limit && *str == '}')
>   	    {
> -	      if (name == str && identifier_pos)
> +	      if (identifier_pos && name == str)
>   		{
> +		  cpp_warning (pfile, CPP_W_UNICODE,
> +			       "empty named universal character escape "
> +			       "sequence; treating it as separate tokens");
>   		  *cp = 0;
>   		  return false;
>   		}
>   	      if (name == str)
>   		cpp_error (pfile, CPP_DL_ERROR,
>   			   "empty named universal character escape sequence");
> -	      else if (!CPP_OPTION (pfile, delimited_escape_seqs)
> +	      else if ((!identifier_pos || strict)
> +		       && !CPP_OPTION (pfile, delimited_escape_seqs)
>   		       && CPP_OPTION (pfile, cpp_pedantic))
>   		cpp_error (pfile, CPP_DL_PEDWARN,
>   			   "named universal character escapes are only valid "
> @@ -1515,27 +1537,51 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>   					   uname2c_tree, NULL);
>   		  if (result == (cppchar_t) -1)
>   		    {
> -		      cpp_error (pfile, CPP_DL_ERROR,
> -				 "\\N{%.*s} is not a valid universal "
> -				 "character", (int) (str - name), name);
> +		      bool ret = true;
> +		      if (identifier_pos
> +			  && (!CPP_OPTION (pfile, delimited_escape_seqs)
> +			      || !strict))
> +			ret = cpp_warning (pfile, CPP_W_UNICODE,
> +					   "\\N{%.*s} is not a valid "
> +					   "universal character; treating it "
> +					   "as separate tokens",
> +					   (int) (str - name), name);
> +		      else
> +			cpp_error (pfile, CPP_DL_ERROR,
> +				   "\\N{%.*s} is not a valid universal "
> +				   "character", (int) (str - name), name);
>   
>   		      /* Try to do a loose name lookup according to
>   			 Unicode loose matching rule UAX44-LM2.  */
>   		      char canon_name[uname2c_max_name_len + 1];
>   		      result = _cpp_uname2c_uax44_lm2 ((const char *) name,
>   						       str - name, canon_name);
> -		      if (result != (cppchar_t) -1)
> +		      if (result != (cppchar_t) -1 && ret)
>   			cpp_error (pfile, CPP_DL_NOTE,
>   				   "did you mean \\N{%s}?", canon_name);
>   		      else
> -			result = 0x40;
> +			result = 0xC0;
> +		      if (identifier_pos
> +			  && (!CPP_OPTION (pfile, delimited_escape_seqs)
> +			      || !strict))
> +			{
> +			  *cp = 0;
> +			  return false;
> +			}
>   		    }
>   		}
>   	      str++;
>   	      extend_char_range (char_range, loc_reader);
>   	    }
>   	  else if (identifier_pos)
> -	    length = 1;
> +	    {
> +	      cpp_warning (pfile, CPP_W_UNICODE,
> +			   "'\\N{' not terminated with '}' after %.*s; "
> +			   "treating it as separate tokens",
> +			   (int) (str - base), base);
> +	      *cp = 0;
> +	      return false;
> +	    }
>   	  else
>   	    {
>   	      cpp_error (pfile, CPP_DL_ERROR,
> @@ -1584,12 +1630,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>         }
>       while (--length);
>   
> -  if (delimited
> -      && str < limit
> -      && *str == '}'
> -      && (length != 32 || !identifier_pos))
> +  if (delimited && str < limit && *str == '}')
>       {
> -      if (length == 32)
> +      if (length == 32 && identifier_pos)
> +	{
> +	  cpp_warning (pfile, CPP_W_UNICODE,
> +		       "empty delimited escape sequence; "
> +		       "treating it as separate tokens");
> +	  *cp = 0;
> +	  return false;
> +	}
> +      else if (length == 32)
>   	cpp_error (pfile, CPP_DL_ERROR,
>   		   "empty delimited escape sequence");
>         else if (!CPP_OPTION (pfile, delimited_escape_seqs)
> @@ -1607,6 +1658,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>        error message in that case.  */
>     if (length && identifier_pos)
>       {
> +      if (delimited)
> +	cpp_warning (pfile, CPP_W_UNICODE,
> +		     "'\\u{' not terminated with '}' after %.*s; "
> +		     "treating it as separate tokens",
> +		     (int) (str - base), base);
>         *cp = 0;
>         return false;
>       }
> --- gcc/doc/invoke.texi.jj	2022-09-03 09:35:40.966991672 +0200
> +++ gcc/doc/invoke.texi	2022-09-03 11:39:03.875914845 +0200
> @@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}.
>   -Winfinite-recursion @gol
>   -Winit-self  -Winline  -Wno-int-conversion  -Wint-in-bool-context @gol
>   -Wno-int-to-pointer-cast  -Wno-invalid-memory-model @gol
> --Winvalid-pch  -Winvalid-utf8 -Wjump-misses-init  @gol
> +-Winvalid-pch  -Winvalid-utf8  -Wno-unicode  -Wjump-misses-init  @gol
>   -Wlarger-than=@var{byte-size}  -Wlogical-not-parentheses  -Wlogical-op  @gol
>   -Wlong-long  -Wno-lto-type-mismatch -Wmain  -Wmaybe-uninitialized @gol
>   -Wmemset-elt-size  -Wmemset-transposed-args @gol
> @@ -9577,6 +9577,12 @@ Warn if an invalid UTF-8 character is fo
>   This warning is on by default for C++23 if @option{-finput-charset=UTF-8}
>   is used and turned into error with @option{-pedantic-errors}.
>   
> +@item -Wno-unicode
> +@opindex Wunicode
> +@opindex Wno-unicode
> +Don't diagnose invalid forms of delimited or named escape sequences which are
> +treated as separate tokens.  @option{-Wunicode} is enabled by default.
> +
>   @item -Wlong-long
>   @opindex Wlong-long
>   @opindex Wno-long-long
> --- gcc/c-family/c.opt.jj	2022-09-03 09:35:40.206002393 +0200
> +++ gcc/c-family/c.opt	2022-09-03 11:17:04.529201926 +0200
> @@ -822,8 +822,8 @@ C ObjC C++ ObjC++ CPP(warn_invalid_pch)
>   Warn about PCH files that are found but not used.
>   
>   Winvalid-utf8
> -C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
> -Warn about invalid UTF-8 characters in comments.
> +C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning
> +Warn about invalid UTF-8 characters.
>   
>   Wjump-misses-init
>   C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat)
> @@ -1345,6 +1345,10 @@ Wundef
>   C ObjC C++ ObjC++ CPP(warn_undef) CppReason(CPP_W_UNDEF) Var(cpp_warn_undef) Init(0) Warning
>   Warn if an undefined macro is used in an #if directive.
>   
> +Wunicode
> +C ObjC C++ ObjC++ CPP(cpp_warn_unicode) CppReason(CPP_W_UNICODE) Var(warn_unicode) Init(1) Warning
> +Warn about invalid forms of delimited or named escape sequences.
> +
>   Wuninitialized
>   C ObjC C++ ObjC++ LTO LangEnabledBy(C ObjC C++ ObjC++ LTO,Wall)
>   ;
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c	2022-09-03 11:56:52.818054420 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=gnu++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c	2022-09-03 12:01:35.618124647 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=c++23" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
> +int c = a\u{);		/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" "" { target c++23 } } */
> +int d = a\u{12XYZ});	/* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" "" { target c++23 } } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c.jj	2022-09-03 11:59:36.573778876 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c	2022-09-03 11:59:55.808511591 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
> +/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c.jj	2022-09-03 12:01:48.958939255 +0200
> +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c	2022-09-03 12:02:16.765552854 +0200
> @@ -0,0 +1,13 @@
> +/* P2290R3 - Delimited escape sequences */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat -Wno-unicode" { target c } } */
> +/* { dg-options "-std=c++23 -Wno-unicode" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\u{});		/* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */
> +int c = a\u{);		/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */
> +int d = a\u{12XYZ});	/* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */
> +int e = a\u123);
> +int f = a\U1234567);
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c	2022-09-03 12:45:18.968747909 +0200
> @@ -0,0 +1,17 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=gnu++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
> +int c = a\N{);				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});				/* { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" } */
> +int g = a\N{ABC.123});				/* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
> +					/* { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj	2022-09-03 11:13:37.570068845 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c	2022-09-03 11:44:34.558316155 +0200
> @@ -0,0 +1,17 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
> +/* { dg-options "-std=c++20" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});
> +int c = a\N{);
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});
> +int g = a\N{ABC.123});
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
> +int i = a\N{Latin_Small_Letter_A_With_Acute});
> +int j = a\N{LATIN SMALL LETTER A WITH ACUTE});
> --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c.jj	2022-09-03 12:18:31.296022384 +0200
> +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c	2022-09-03 12:45:57.663212248 +0200
> @@ -0,0 +1,17 @@
> +/* P2071R2 - Named universal character escapes */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target wchar } */
> +/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */
> +/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				/* { dg-bogus "empty named universal character escape sequence; treating it as separate tokens" } */
> +int c = a\N{);				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});				/* { dg-bogus "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" } */
> +int g = a\N{ABC.123});				/* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */
> +int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	/* { dg-bogus "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */
> +					/* { dg-bogus "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */
> --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj	2022-09-03 11:13:37.571068831 +0200
> +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C	2022-09-03 12:44:03.893787182 +0200
> @@ -0,0 +1,16 @@
> +// P2071R2 - Named universal character escapes
> +// { dg-do compile }
> +// { dg-require-effective-target wchar }
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
> +int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } }
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			// { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
> +int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" "" { target c++23 } }
> +int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
> +					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } }
> +					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target c++23 } .-1 }
> --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj	2022-09-03 11:13:37.571068831 +0200
> +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C	2022-09-03 12:44:31.723401937 +0200
> @@ -0,0 +1,18 @@
> +// P2071R2 - Named universal character escapes
> +// { dg-do compile }
> +// { dg-require-effective-target wchar }
> +// { dg-options "" }
> +
> +#define z(x) 0
> +#define a z(
> +int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
> +int c = a\N{);				// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" }
> +int d = a\N);
> +int e = a\NARG);
> +int f = a\N{abc});			// { dg-warning "\\\\N\\\{abc\\\} is not a valid universal character; treating it as separate tokens" }
> +int g = a\N{ABC.123});			// { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" }
> +int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
> +					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
> +					// { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }
> +int i = a\N{Latin_Small_Letter_A_With_Acute});	// { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" }
> +					// { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 }
> 
> 
> 	Jakub
>