From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <joseph_myers@mentor.com>
Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153])
 by sourceware.org (Postfix) with ESMTPS id 979513847839
 for <gcc-patches@gcc.gnu.org>; Fri,  6 Aug 2021 20:08:24 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 979513847839
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa1.mentor.iphmx.com with ESMTP; 06 Aug 2021 12:08:23 -0800
Date: Fri, 6 Aug 2021 20:08:17 +0000
From: Joseph Myers <joseph@codesourcery.com>
X-X-Sender: jsm28@digraph.polyomino.org.uk
To: Jakub Jelinek <jakub@redhat.com>
CC: Jason Merrill <jason@redhat.com>, Marek Polacek <polacek@redhat.com>,
 <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libcpp: For C++23 treat UCNs and UTF-8 chars not valid
 in identifiers as separate tokens
In-Reply-To: <20210806144757.GW2380545@tucnak>
Message-ID: <alpine.DEB.2.22.394.2108062000070.1280523@digraph.polyomino.org.uk>
References: <20210806080906.GR2380545@tucnak>
 <20210806095356.GU2380545@tucnak> <20210806144757.GW2380545@tucnak>
User-Agent: Alpine 2.22 (DEB 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To
 svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1)
X-Spam-Status: No, score=-3119.3 required=5.0 tests=BAYES_00,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Aug 2021 20:08:26 -0000

On Fri, 6 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Aug 06, 2021 at 11:53:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Actually, there is another change in P1949R7 that I haven't touched
> > in the patch and not sure what the implications are.
> > 
> > To the preprocessing-token non-terminal it adds
> > 	each universal-character-name that cannot be one of the above
> > and changes the following paragraph:
> >  ...
> >  preprocessing operators and punctuators, and single
> > +universal-character-names and
> >  non-whitespace characters that do not lexically match the other
> >  preprocessing token categories.
> > +If a single universal-character-name does not match any of the other
> > +preprocessing token categories, the program is ill-formed.
> >  If a ' or a " character matches the last category, the behavior
> >  is undefined.
> >  ...
> 
> If the above (and identifier-start and identifier-continue non-terminals
> only mentioning XID_Start+0x5F and XID_Continue UCNs) means that we should
> indeed put each such UTF-8 char or UCN into a separate CPP_OTHER token
> for C++23, then we need something like this incremental patch.
> The drawback is worse diagnostics though, so maybe it would be useful if
> the cpp_error that ... is not valid in an identifier or is not
> valid at the start of an identifier would be emitted as a warning (and not
> warn when skipping)?

It's not clear to me that this change to the standard actually requires 
any change in how GCC behaves.  A UCN (or character considered to be 
converted to a UCN) that's not valid in identifiers is still invalid in a 
context where an identifier preprocessing token could occur (including in 
#if 0), whether it's interpreted as a "single UCN" preprocessing token 
(stated to be ill-formed) or (part of) an invalid identifier preprocessing 
token.
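
As a rough illustration of that point, here is a toy model of the two
interpretations. It is only a sketch and is not libcpp code: the names
Result, invalid_in_identifier, lex_as_invalid_identifier and
lex_as_single_ucn_tokens are hypothetical, the input is assumed to contain
only identifier-like characters, and the predicate knows a single character
(U+2019) rather than the real XID_Start / XID_Continue tables.

```cpp
#include <string>
#include <vector>

// Toy model only; this is NOT how libcpp is implemented.
struct Result {
    std::vector<std::u32string> tokens;  // spellings of the tokens produced
    bool diagnosed = false;              // true if the input is rejected
};

// Hypothetical stand-in for the XID_Start / XID_Continue tables: the only
// character this sketch treats as "not valid in identifiers" is U+2019.
bool invalid_in_identifier(char32_t c) { return c == U'\u2019'; }

// Interpretation A: the character is (part of) one invalid identifier
// preprocessing token, and that token is diagnosed.
Result lex_as_invalid_identifier(const std::u32string& src) {
    Result r;
    r.tokens.push_back(src);
    for (char32_t c : src)
        if (invalid_in_identifier(c)) r.diagnosed = true;
    return r;
}

// Interpretation B: each such character becomes its own "single UCN"
// preprocessing token, which the quoted wording states is ill-formed.
Result lex_as_single_ucn_tokens(const std::u32string& src) {
    Result r;
    std::u32string cur;  // identifier characters collected so far
    for (char32_t c : src) {
        if (invalid_in_identifier(c)) {
            if (!cur.empty()) { r.tokens.push_back(cur); cur.clear(); }
            r.tokens.push_back(std::u32string(1, c));
            r.diagnosed = true;
        } else {
            cur.push_back(c);
        }
    }
    if (!cur.empty()) r.tokens.push_back(cur);
    return r;
}
```

For the input a\u2019b, interpretation A produces one token and
interpretation B produces three, but both set diagnosed, so in this model
the accept/reject outcome is the same and only the token boundaries differ.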

-- 
Joseph S. Myers
joseph@codesourcery.com