Date: Mon, 28 Oct 2002 10:53:00 -0000
From: "Joseph S. Myers" <jsm28@cam.ac.uk>
X-X-Sender:  <jsm28@kern.srcf.societies.cam.ac.uk>
To: Zack Weinberg <zack@codesourcery.com>
cc: Martin v. Löwis <loewis@informatik.hu-berlin.de>,
     <gcc-patches@gcc.gnu.org>,  <java@gcc.gnu.org>
Subject: Re: Implementing Universal Character Names in identifiers

On Mon, 28 Oct 2002, Zack Weinberg wrote:

> What you wrote in response to this is interesting but doesn't address
> the issue of Unicode normalization of identifiers.  It sounds more
> like an extended discussion of the previous point.  I'm talking about
> the process described in UAX 15 (http://www.unicode.org/unicode/reports/tr15/)
> and in particular annex 7 of that document ("Programming Language
> Identifiers").

I don't think there's anything in the language standards to permit
normalization to NFC as described there.  (It could be done in "phase 0"
for UTF-8 in the input file, much as we ignore whitespace at end of line,
but not for UCNs.  And do we really want to build in the large character
tables required for normalization?)
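
To make the normalization issue concrete, here is a minimal sketch,
assuming UTF-8 source text.  The names below are hypothetical and not
anything in cpplib.  The same visible identifier character "e with acute"
can be spelled precomposed (U+00E9) or decomposed (U+0065 U+0301), and
without an NFC step the two spellings compare as different identifiers:

```c
#include <string.h>

/* U+00E9 LATIN SMALL LETTER E WITH ACUTE, encoded as UTF-8.  */
static const char E_ACUTE_PRECOMPOSED[] = "\xC3\xA9";

/* U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT, as UTF-8.  */
static const char E_ACUTE_DECOMPOSED[] = "e\xCC\x81";

/* Hypothetical helper: compare two identifier spellings byte for byte,
   which is what happens when no normalization is applied.  Returns
   nonzero if the spellings are identical.  */
static int
same_spelling (const char *a, const char *b)
{
  return strcmp (a, b) == 0;
}
```

Treating the two as the same identifier would need the Unicode
composition tables, which is the cost in question.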

>  - In cpplib, provide routines that validate individual identifiers
>    against the precise lists in C99 and C++98.
> 
>  - GCC enforces the precise lists in C99 and C++98 only in -pedantic
>    mode.

There's still the typo in the C++98 list, a recognised Defect, which
should be corrected (following the existing practice of implementing
resolutions to Defect Reports before they make it into a TC).  But
non-pedantic mode should use the current Unicode ranges of identifier
characters for both languages.
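
Roughly, the check could look like the sketch below.  This is only an
illustration under stated assumptions: the function and table names are
hypothetical, the strict table is a short excerpt of the Latin entries
from C99 Annex D rather than the full list, and the relaxed table is a
placeholder standing in for whatever the current Unicode identifier
ranges turn out to be.

```c
#include <stddef.h>
#include <stdint.h>

struct ucn_range { uint32_t first, last; };

/* Excerpt only (Latin entries from C99 Annex D); must stay sorted.  */
static const struct ucn_range strict_ranges[] = {
  { 0x00AA, 0x00AA }, { 0x00BA, 0x00BA }, { 0x00C0, 0x00D6 },
  { 0x00D8, 0x00F6 }, { 0x00F8, 0x01F5 },
};

/* PLACEHOLDER data, not taken from any Unicode table: the last range
   is simply widened to show the two modes giving different answers.  */
static const struct ucn_range relaxed_ranges[] = {
  { 0x00AA, 0x00AA }, { 0x00BA, 0x00BA }, { 0x00C0, 0x00D6 },
  { 0x00D8, 0x00F6 }, { 0x00F8, 0x024F },
};

/* Binary search over a sorted table of non-overlapping ranges.  */
static int
in_ranges (const struct ucn_range *t, size_t n, uint32_t c)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < t[mid].first)
        hi = mid;
      else if (c > t[mid].last)
        lo = mid + 1;
      else
        return 1;
    }
  return 0;
}

/* Hypothetical entry point: strict list when PEDANTIC is nonzero,
   relaxed list otherwise.  */
static int
ucn_valid_in_identifier (uint32_t c, int pedantic)
{
  if (pedantic)
    return in_ranges (strict_ranges,
                      sizeof strict_ranges / sizeof strict_ranges[0], c);
  return in_ranges (relaxed_ranges,
                    sizeof relaxed_ranges / sizeof relaxed_ranges[0], c);
}
```

With these tables U+00E9 is accepted in both modes, U+00D7
(multiplication sign) is rejected in both, and U+01F6 is accepted only
when not pedantic.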

-- 
Joseph S. Myers
jsm28@cam.ac.uk