From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jakub@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id 01C7C3858D39
	for <gcc-patches@gcc.gnu.org>; Wed, 31 Aug 2022 15:07:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 01C7C3858D39
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1661958457;
	h=from:from:reply-to:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=Ipb0OXZdUfud+uuW/3T8IJdyYWImoUH+P86IMVraqwU=;
	b=MrPwVU3/AObJKjyNY6S3JDsA4plfIGlW1v3C7Lc0DmPOkISY1NMKm4J61MQEn4OVYdY3FC
	t2XqIOTMmqfSv/Rk01/ieiICmB/JbEZ/lphpVPCaBUZKl6ySftYWsYYSCq/xN06b7zVxXR
	5BmEo/GvcR0uAEZa/3XQ3RnfE5h/h7M=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-550-8IOLd_FXOay44rQXFiunyQ-1; Wed, 31 Aug 2022 11:07:33 -0400
X-MC-Unique: 8IOLd_FXOay44rQXFiunyQ-1
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 15B1B805AF5;
	Wed, 31 Aug 2022 15:07:33 +0000 (UTC)
Received: from tucnak.zalov.cz (unknown [10.39.192.41])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id AF1EC2166B26;
	Wed, 31 Aug 2022 15:07:32 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
	by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 27VF7UEt206283
	(version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
	Wed, 31 Aug 2022 17:07:30 +0200
Received: (from jakub@localhost)
	by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 27VF7Tt3206282;
	Wed, 31 Aug 2022 17:07:29 +0200
Date: Wed, 31 Aug 2022 17:07:29 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: Jason Merrill <jason@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal
 character escapes [PR106648]
Message-ID: <Yw95MR3YN1aT2ks6@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <YwJ22kdlxJ70JcPJ@tucnak>
 <4fcd7e74-6f1c-dbec-a42c-e4e3fd13470b@redhat.com>
 <Ywc3pI1lnzq/FvOu@tucnak>
 <alpine.DEB.2.22.394.2208302055240.446383@digraph.polyomino.org.uk>
 <Yw5+nPD8O+JTx3uL@tucnak>
 <Yw6DA3MhofyzWnje@tucnak>
 <Yw9xsBRmTqkLMlGC@tucnak>
 <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com>
MIME-Version: 1.0
In-Reply-To: <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=WINDOWS-1252
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, Aug 31, 2022 at 10:52:49AM -0400, Jason Merrill wrote:
> It could be more explicit, but I think we can assume that from the existing
> wording; it says it designates the named character.  If there is no such
> character, that cannot be satisfied, so it must be ill-formed.

Ok.

> > So, we could reject the int h case above and accept silently the others?
> 
> Why not warn on the others?

We have always been silent for cases like \u123X or \U12345X.
Do you think we should emit warnings (but never pedwarns/errors in that
case) saying that the sequence looks like a universal character name but
is not quite one?

The following patch lets us silently accept:
#define z(x) 0
#define a z(
int b = a\u{});
int c = a\u{);
int d = a\N{});
int e = a\N{);
int f = a\u123);
int g = a\U1234567);
int h = a\N);
int i = a\NARG);
int j = a\N{abc});
int k = a\N{ABC.123});

The following 2 will still be rejected with errors:
int l = a\N{ABC});
int m = a\N{LATIN SMALL LETTER A WITH ACUTE});
the first one because ABC is not a valid Unicode name and the latter because
it will become int m = aá); and trigger other errors later.

Given what you said above, I think that is what we want for the last 2
in C++23.  The question is whether it is also ok for C++20/C17 etc., and
whether it should depend on -pedantic, -pedantic-errors, or GNU vs. ISO
mode in that case.  We could also handle those 2 differently: just warn
instead of erroring for the \N{ABC} case when identifier_pos and not in
C++23 mode.
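
For illustration only, and not part of the patch: a minimal standalone C
sketch of how the identifier-position cases listed above are classified.
The names classify_n_escape, NOT_A_UCN and CANDIDATE_NAME are hypothetical,
and the strictness rule used here (only A-Z, digits, space and hyphen) is
an assumption about what `strict' means in _cpp_valid_ucn, not a copy of
the libcpp logic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: either the text after \N is silently left
   alone (not treated as a UCN), or it is a well-formed candidate name
   that would still have to be looked up in the Unicode name table.  */
enum n_escape_kind { NOT_A_UCN, CANDIDATE_NAME };

/* S points just past the "\N" in an identifier position.
   Assumption: a "strict" name uses only A-Z, 0-9, ' ' and '-'.  */
static enum n_escape_kind
classify_n_escape (const char *s)
{
  if (*s != '{')
    return NOT_A_UCN;            /* \N) or \NARG  */

  const char *name = s + 1;
  const char *p = name;
  bool strict = true;

  for (; *p != '\0' && *p != '}'; p++)
    {
      char c = *p;
      bool ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == ' ' || c == '-';
      if (!ok)
        strict = false;          /* e.g. lowercase letters or '.'  */
    }

  if (*p != '}')
    return NOT_A_UCN;            /* \N{ with no closing brace  */
  if (p == name || !strict)
    return NOT_A_UCN;            /* \N{} or \N{abc} or \N{ABC.123}  */

  /* \N{ABC} and \N{LATIN SMALL LETTER A WITH ACUTE} both land here;
     the name lookup then errors for the former and converts the
     latter to U+00E1.  */
  return CANDIDATE_NAME;
}
```

In this sketch the cases h through k, plus d and e, come out as NOT_A_UCN,
while l and m come out as CANDIDATE_NAME and proceed to the name lookup.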

--- libcpp/charset.cc.jj	2022-08-31 12:34:18.921176118 +0200
+++ libcpp/charset.cc	2022-08-31 16:50:48.862775486 +0200
@@ -1463,7 +1463,14 @@ _cpp_valid_ucn (cpp_reader *pfile, const
     {
       length = 4;
       if (str == limit || *str != '{')
-	cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	{
+	  if (identifier_pos)
+	    {
+	      *cp = 0;
+	      return false;
+	    }
+	  cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+	}
       else
 	{
 	  str++;
@@ -1489,7 +1496,7 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 
 	  if (str < limit && *str == '}')
 	    {
-	      if (name == str && identifier_pos)
+	      if (identifier_pos && (name == str || !strict))
 		{
 		  *cp = 0;
 		  return false;

	Jakub