From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 1 Sep 2022 22:23:26 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: Jason Merrill <jason@redhat.com>
Cc: Joseph Myers <joseph@codesourcery.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal
 character escapes [PR106648]
Message-ID: <YxEUvi/A5hOg5Sa+@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <Ywc3pI1lnzq/FvOu@tucnak>
 <alpine.DEB.2.22.394.2208302055240.446383@digraph.polyomino.org.uk>
 <Yw5+nPD8O+JTx3uL@tucnak>
 <Yw6DA3MhofyzWnje@tucnak>
 <Yw9xsBRmTqkLMlGC@tucnak>
 <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com>
 <Yw95MR3YN1aT2ks6@tucnak>
 <df9730f4-d796-7bf6-dd18-d0c9c5a0cf12@redhat.com>
 <YxCULjMrhvN5f7xR@tucnak>
 <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com>
MIME-Version: 1.0
In-Reply-To: <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id: <gcc-patches.gcc.gnu.org>

On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> > Apparently clang uses -Wunicode option to cover these, but unfortunately
> > they don't bother to document it (nor almost any other warning option),
> > so it is unclear what else exactly it covers.  Plus a question is how
> > we should document that option for GCC...
> 
> We might as well use the same flag name, and document it to mean what it
> currently means for GCC.

Ok, will work on that tomorrow.

> > @@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
> >   	  if (str < limit && *str == '}')
> >   	    {
> > -	      if (name == str && identifier_pos)
> > +	      if (identifier_pos && (name == str || !strict))
> >   		{
> > +		  if (name == str)
> > +		    cpp_warning (pfile, CPP_W_NONE,
> > +				 "empty named universal character escape "
> > +				 "sequence; treating it as separate tokens");
> > +		  else
> > +		    cpp_warning (pfile, CPP_W_NONE,
> > +				 "incomplete named universal character escape "
> > +				 "sequence; treating it as separate tokens");
> 
> It looks like this is handling \N{abc}, for which "incomplete" seems like
> the wrong description; it's complete, just wrong, and the diagnostic doesn't
> help correct it.

The point is to make it more consistent with the \N{X.1} handling.
The grammar is clear that only upper case letters, digits, space and hyphen
can appear between \N{ and }, so IMHO both of those cases should be
handled the same way.  The !strict case is when the name contains at least
one lower case letter or underscore, but nothing other than letters, digits,
space, hyphen and underscore; we then find the terminating } and, inside of
string/character literals, want to offer suggestions using the UAX44-LM2
loose matching algorithm.
But for X.1 in literals we don't even look for }, we just emit the
              cpp_error (pfile, CPP_DL_ERROR,
                         "'\\N{' not terminated with '}' after %.*s",
                         (int) (str - base), base);
diagnostic, which prints "after X".
For the identifier_pos case, the !strict and *str != '}' cases are treated
as separate tokens for the same reason: not because the name is unknown,
but because it contains characters the grammar doesn't allow.
So perhaps for the identifier_pos !strict and *str != '}' cases
we could emit a warning with the same wording as above (and, so that
for !strict we stop at the first lower case or _ character, just break there
instead of setting strict = false if identifier_pos).
Or we could emit such a warning plus a note clarifying that only
upper case letters, digits, space or hyphen are allowed there?
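
To make the three cases concrete, here is a rough standalone sketch of the
classification I have in mind.  It is not the libcpp code; the enum and
classify_ucn_name are hypothetical names used only for illustration, and it
assumes plain ASCII input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the text between \N{ and };
   these names do not exist in libcpp.  */
enum name_class
{
  NAME_EMPTY,   /* \N{}: nothing between the braces.  */
  NAME_STRICT,  /* Only A-Z, 0-9, space and hyphen.  */
  NAME_LOOSE,   /* Additionally a-z or '_' (the !strict case).  */
  NAME_INVALID  /* Some other character, e.g. the '.' in X.1.  */
};

/* Classify the LEN characters starting at NAME.  For NAME_INVALID,
   *BAD_POS is set to the offset of the first offending character,
   otherwise to LEN.  */
enum name_class
classify_ucn_name (const char *name, size_t len, size_t *bad_pos)
{
  enum name_class result = NAME_STRICT;

  assert (name != NULL && bad_pos != NULL);
  *bad_pos = len;
  if (len == 0)
    return NAME_EMPTY;

  for (size_t i = 0; i < len; i++)
    {
      char c = name[i];

      /* Characters the grammar allows.  */
      if ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
	  || c == ' ' || c == '-')
	continue;

      /* Not allowed by the grammar, but still usable for loose
	 matching suggestions inside literals.  */
      if ((c >= 'a' && c <= 'z') || c == '_')
	{
	  result = NAME_LOOSE;
	  continue;
	}

      /* Anything else: stop here, the part before it is what an
	 "after %.*s" style diagnostic would print.  */
      *bad_pos = i;
      return NAME_INVALID;
    }

  return result;
}
```

With this, "abc" comes out as NAME_LOOSE and "X.1" as NAME_INVALID with
*bad_pos == 1, i.e. the offending character comes right after X.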

	Jakub