Date: Mon, 28 Oct 2002 10:53:00 -0000
From: "Joseph S. Myers"
To: Zack Weinberg
Cc: Martin v. Löwis
Subject: Re: Implementing Universal Character Names in identifiers
In-Reply-To: <20021028183910.GC24090@codesourcery.com>

On Mon, 28 Oct 2002, Zack Weinberg wrote:

> What you wrote in response to this is interesting but doesn't address
> the issue of Unicode normalization of identifiers. It sounds more
> like an extended discussion of the previous point. I'm talking about
> the process described in UAX 15 (http://www.unicode.org/unicode/reports/tr15/)
> and in particular annex 7 of that document ("Programming Language
> Identifiers").

I don't think there's anything in the language standards to permit
normalization to NFC as described there. (It could be done in "phase 0"
for UTF-8 in the input file, like we ignore whitespace at end of line, but
not for UCNs. And do we really want to build in the large character
tables required for normalization?)

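To make the normalization question concrete, here is a minimal sketch in Python. It is purely illustrative and not anything in GCC or cpplib. It assumes the standard unicodedata module as a stand-in for the character tables mentioned above, and the identifier "café" and the helper name nfc are made up for the example.

```python
import unicodedata

# Two spellings of the same visible identifier "café" (hypothetical example):
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def nfc(identifier: str) -> str:
    """Return the NFC form of an identifier, as described in UAX 15."""
    return unicodedata.normalize("NFC", identifier)


# Without normalization these are distinct code point sequences,
# so a direct comparison treats them as two different identifiers.
print(precomposed == decomposed)            # False

# After NFC both collapse to the precomposed spelling.
print(nfc(precomposed) == nfc(decomposed))  # True
```

The sketch only shows what NFC would do to raw UTF-8 input. It says nothing about whether the language standards allow a compiler to apply it, and it does not cover identifiers spelled with UCNs.
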
> - In cpplib, provide routines that validate individual identifiers
>   against the precise lists in C99 and C++98.
>
> - GCC enforces the precise lists in C99 and C++98 only in -pedantic
>   mode.

There's still the typo in the C++98 list that's a recognised Defect that
should be corrected (following existing practice of implementing
resolutions to Defect Reports before they make it into a TC). But
non-pedantic should use the current Unicode ranges of identifier
characters for both languages.

-- 
Joseph S. Myers
jsm28@cam.ac.uk