From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jason@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by sourceware.org (Postfix) with ESMTPS id C0BCD3858D39
	for <gcc-patches@gcc.gnu.org>; Wed, 31 Aug 2022 16:14:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C0BCD3858D39
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1661962468;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=snfDU5APpCJtazHOTUQG917opL4bp+tN9fp/n5ZS0ik=;
	b=SFd2lqIHR+ar3MIAHtPHGpfXQa6r67bjXccTD+WoTcr7OUCA/PTZESxU2XfQQFILuy94nX
	P12s97OKYWqHvEfX9RLRkiuvHz7exn/78ctdgBf04VvPuMdq1S2tjV/K2LZtL6dvjddQXq
	q3qrHjBCLsPSX53X4QxBtgXiaEFMuls=
Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com
 [209.85.219.70]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id
 us-mta-62-54RaOGONMIOsqf8HM1BXTg-1; Wed, 31 Aug 2022 12:14:25 -0400
X-MC-Unique: 54RaOGONMIOsqf8HM1BXTg-1
Received: by mail-qv1-f70.google.com with SMTP id ls9-20020a0562145f8900b004990aeb7bd6so4728227qvb.4
        for <gcc-patches@gcc.gnu.org>; Wed, 31 Aug 2022 09:14:25 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc;
        bh=snfDU5APpCJtazHOTUQG917opL4bp+tN9fp/n5ZS0ik=;
        b=LlxwibnrX3yrrU0bLuGigCIx+loMy/r4T+QINOOUUz0eC8ZeLcylIxH1IF7m5efieO
         BUjWNJh4pGTXHe1aB2wbtDe1S9ZHEUsEuabVQ3REHYG2PZfrOIZl1X8PknwiqRiqEEOd
         r0oErUW+aUpjYhgsGyVTC1zfP5YSukI1kuvNOrMzzm2qFqyRwj6yd1GPhW/A4oRSKglF
         lamI+UCOD+W+kivJWVpZL1Q3IylQFdzVaC4CDShI4zfSrVfOHGkjpNeKYVlB1rg8s1LJ
         ZM8wi07J/mvj11UJ0cXWT/C5ca6TH5HCZ21LCi/daCRVa7k4OUJQqHhXi/G+tYYHUgBB
         5PWQ==
X-Gm-Message-State: ACgBeo3TDyQ+f4M6HGCoT7+Ex5ebBl15Xgm8n2TnbM5roDD2YJRDwRH0
	8gL6wYnE1scvpCCjU0Cn6wkRt7Ngj2eoEHrE6/n2dkGDd5u26LUloMpnXMZkgM/I3PfuOR+Xu6/
	ptifJeQJXZSTHCNxJ6Q==
X-Received: by 2002:a37:888:0:b0:6bc:68cf:cdf5 with SMTP id 130-20020a370888000000b006bc68cfcdf5mr16164468qki.639.1661962465262;
        Wed, 31 Aug 2022 09:14:25 -0700 (PDT)
X-Google-Smtp-Source: AA6agR58wW/M0Il2DtH4OWxyJubF2zZSkU36EHDpp+KfG/VScBvTYityX4uwpnkyp/xYvnAmjAb7qw==
X-Received: by 2002:a37:888:0:b0:6bc:68cf:cdf5 with SMTP id 130-20020a370888000000b006bc68cfcdf5mr16164442qki.639.1661962464948;
        Wed, 31 Aug 2022 09:14:24 -0700 (PDT)
Received: from [192.168.1.101] (130-44-159-43.s15913.c3-0.arl-cbr1.sbo-arl.ma.cable.rcncustomer.com. [130.44.159.43])
        by smtp.gmail.com with ESMTPSA id bw13-20020a05622a098d00b0031f36cd1958sm8728053qtb.81.2022.08.31.09.14.23
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Wed, 31 Aug 2022 09:14:24 -0700 (PDT)
Message-ID: <df9730f4-d796-7bf6-dd18-d0c9c5a0cf12@redhat.com>
Date: Wed, 31 Aug 2022 12:14:22 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.13.0
Subject: Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal
 character escapes [PR106648]
To: Jakub Jelinek <jakub@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>, gcc-patches@gcc.gnu.org
References: <YwJ22kdlxJ70JcPJ@tucnak>
 <4fcd7e74-6f1c-dbec-a42c-e4e3fd13470b@redhat.com> <Ywc3pI1lnzq/FvOu@tucnak>
 <alpine.DEB.2.22.394.2208302055240.446383@digraph.polyomino.org.uk>
 <Yw5+nPD8O+JTx3uL@tucnak> <Yw6DA3MhofyzWnje@tucnak> <Yw9xsBRmTqkLMlGC@tucnak>
 <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com> <Yw95MR3YN1aT2ks6@tucnak>
From: Jason Merrill <jason@redhat.com>
In-Reply-To: <Yw95MR3YN1aT2ks6@tucnak>
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On 8/31/22 11:07, Jakub Jelinek wrote:
> On Wed, Aug 31, 2022 at 10:52:49AM -0400, Jason Merrill wrote:
>> It could be more explicit, but I think we can assume that from the existing
>> wording; it says it designates the named character.  If there is no such
>> character, that cannot be satisfied, so it must be ill-formed.
> 
> Ok.
> 
>>> So, we could reject the int h case above and accept silently the others?
>>
>> Why not warn on the others?
> 
> We were always silent for the cases like \u123X or \U12345X.
> Do you think we should emit some warnings (but never pedwarns/errors in that
> case) that it looks like a universal character name but is not quite one?

I think that would be helpful, at least for \u{ and \N{.
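
To make the distinction concrete, here is a rough sketch of separating
"looks like a delimited or named escape but is malformed" (the candidates
for a warning) from syntactically complete ones.  This is not the cpplib
code; EscapeKind, classify_escape and the helpers are hypothetical names,
and it only assumes that Unicode character names consist of uppercase
Latin letters, digits, space and hyphen-minus.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical classification for illustration; not GCC's actual lexer.
enum class EscapeKind {
  NotEscape,   // does not start with \u{ or \N{
  Incomplete,  // starts with \u{ or \N{ but is malformed: warning candidate
  WellFormed   // \u{hex-digits} or \N{name-characters}, closing brace present
};

// Characters that can appear in a Unicode character name.
static bool is_name_char(char c) {
  return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
         || c == ' ' || c == '-';
}

static bool is_hex_digit(char c) {
  return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

// Classify the text starting at a backslash.  Only the leading escape-like
// prefix is examined; anything after the closing brace is ignored.
EscapeKind classify_escape(std::string_view s) {
  if (s.size() < 3 || s[0] != '\\' || (s[1] != 'u' && s[1] != 'N')
      || s[2] != '{')
    return EscapeKind::NotEscape;

  const bool named = (s[1] == 'N');
  std::size_t i = 3;
  while (i < s.size() && (named ? is_name_char(s[i]) : is_hex_digit(s[i])))
    ++i;

  // Require a non-empty body immediately followed by the closing brace.
  if (i == 3 || i >= s.size() || s[i] != '}')
    return EscapeKind::Incomplete;
  return EscapeKind::WellFormed;
}
```

With the examples quoted below, \u{}, \u{), \N{abc} and \N{ABC.123} would
classify as Incomplete, \u123 and \NARG as NotEscape, and \N{ABC} as
WellFormed (the name lookup that then rejects ABC is not modeled here).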

> The following patch lets us silently accept:
> #define z(x) 0
> #define a z(
> int b = a\u{});
> int c = a\u{);
> int d = a\N{});
> int e = a\N{);
> int f = a\u123);
> int g = a\U1234567);
> int h = a\N);
> int i = a\NARG);
> int j = a\N{abc});
> int k = a\N{ABC.123});
> 
> The following 2 will still be rejected with errors:
> int l = a\N{ABC});
> int m = a\N{LATIN SMALL LETTER A WITH ACUTE});
> the first one because ABC is not a valid Unicode name and the latter because
> it will be int m = aá); and will trigger other errors later.
> 
> Given what you said above, I think that is what we want for the last 2
> for C++23, the question is if it is ok also for C++20/C17 etc. and whether
> it should depend on -pedantic or -pedantic-errors or GNU vs. ISO mode
> or not in that case.  We could handle those 2 also differently, just
> warn instead of error for the \N{ABC} case if not in C++23 mode when
> identifier_pos.

That sounds right.

> Here is an incremental version of the patch which will make valid
> \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} an extension in GNU
> modes before C++23 and split them into separate tokens in ISO modes.

Looks good.
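
For reference, the behavior agreed on above for a syntactically complete
escape in identifier position can be summarized as a small decision
function.  This is only a sketch of the discussion, not the patch itself;
Mode, Action and action_in_identifier are hypothetical names.

```cpp
// Hypothetical summary of the mode-dependent handling discussed in this
// thread; the enumerators do not correspond to real GCC options or APIs.
enum class Mode { Cxx23, GnuPreCxx23, IsoPreCxx23 };
enum class Action { Accept, AcceptAsExtension, SplitIntoTokens, Warn, Error };

// name_known is false for something like \N{ABC}, where the name is not a
// Unicode character name; it is always true for \u{123}.
Action action_in_identifier(Mode mode, bool name_known) {
  if (!name_known)
    // Ill-formed in C++23; before that, only warn.
    return mode == Mode::Cxx23 ? Action::Error : Action::Warn;

  switch (mode) {
    case Mode::Cxx23:       return Action::Accept;
    case Mode::GnuPreCxx23: return Action::AcceptAsExtension;
    case Mode::IsoPreCxx23: return Action::SplitIntoTokens;
  }
  return Action::Error;  // not reached; keeps -Wreturn-type quiet
}
```

So \N{ABC} is an error only in C++23 mode, while a valid \u{123} is
accepted there, taken as an extension in earlier GNU modes, and lexed as
separate tokens in earlier ISO modes.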

Jason