Date: Fri, 6 Aug 2021 20:08:17 +0000
From: Joseph Myers
To: Jakub Jelinek
CC: Jason Merrill, Marek Polacek
Subject: Re: [PATCH] libcpp: For C++23 treat UCNs and UTF-8 chars not valid in identifiers as separate tokens
In-Reply-To: <20210806144757.GW2380545@tucnak>
References: <20210806080906.GR2380545@tucnak> <20210806095356.GU2380545@tucnak> <20210806144757.GW2380545@tucnak>
List-Id: Gcc-patches mailing list

On Fri, 6 Aug 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Aug 06, 2021 at 11:53:56AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > Actually, there is another change in P1949R7 that I haven't touched
> > in the patch and not sure what the implications are.
> >
> > To the preprocessing-token non-terminal it adds
> > each universal-character-name that cannot be one of the above
> > and changes the following paragraph:
> > ...
> > preprocessing operators and punctuators, and single
> > +universal-character-names and
> > non-whitespace characters that do not lexically match the other
> > preprocessing token categories.
> > +If a single universal-character-name does not match any of the other
> > +preprocessing token categories, the program is ill-formed.
> > If a ' or a " character matches the last category, the behavior
> > is undefined.
> > ...
>
> If the above (and identifier-start and identifier-continue non-terminals
> only mentioning XID_Start+0x5F and XID_Continue UCNs) means that we should
> indeed put each such UTF-8 char or UCN into a separate CPP_OTHER token
> for C++23, then we need something like this incremental patch.
> The drawback is worse diagnostics though, so maybe it would be useful if
> the cpp_error that ... is not valid in an identifier or is not
> valid at the start of an identifier would be emitted as a warning (and not
> warn when skipping)?
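As a rough illustration of the two readings in the quoted text, here is a toy Python model. It is not libcpp code: the function `lex`, the `split_invalid` flag, the token kinds and the diagnostic wording are all hypothetical, and Python's `str.isidentifier()` is only assumed to be a reasonable stand-in for the XID_Start (plus `_`) and XID_Continue checks.

```python
# Toy model only; nothing here is taken from libcpp.
# Assumption: str.isidentifier() approximates XID_Start+'_' / XID_Continue.

def is_start(ch):
    """Approximate 'XID_Start or underscore' for one character."""
    return ch.isidentifier()

def is_continue(ch):
    """Approximate XID_Continue for one character."""
    return ("a" + ch).isidentifier()

def lex(text, split_invalid):
    """Return (tokens, diagnostics) for a whitespace-free string.

    split_invalid=False: a non-ASCII character that is not valid in an
    identifier stays inside the identifier token and is diagnosed.
    split_invalid=True: it ends the identifier and becomes its own
    OTHER token, which is diagnosed as well.
    """
    tokens, diags = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        non_ascii = ord(ch) > 0x7F
        if is_start(ch) or (non_ascii and not split_invalid):
            j = i
            while j < len(text):
                c = text[j]
                valid = is_start(c) if j == i else is_continue(c)
                if not valid:
                    if split_invalid or ord(c) <= 0x7F:
                        break  # the identifier token ends here
                    diags.append(f"U+{ord(c):04X} is not valid in an identifier")
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
        else:
            if non_ascii:
                diags.append(f"U+{ord(ch):04X} matches no other token category")
            tokens.append(("OTHER", ch))
            i += 1
    return tokens, diags

# U+00A9 (copyright sign) is neither XID_Start nor XID_Continue.
print(lex("a\u00a9b", split_invalid=False))
print(lex("a\u00a9b", split_invalid=True))
```

In this model the first call yields a single IDENT token and the second yields IDENT, OTHER, IDENT, but each call reports exactly one diagnostic for U+00A9. Only the token boundaries differ between the two readings.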

It's not clear to me that this change to the standard actually requires
any change in how GCC behaves.  A UCN (or character considered to be
converted to a UCN) that's not valid in identifiers is still invalid in a
context where an identifier preprocessing token could occur (including in
#if 0), whether it's interpreted as a "single UCN" preprocessing token
(stated to be ill-formed) or (part of) an invalid identifier preprocessing
token.

-- 
Joseph S. Myers
joseph@codesourcery.com