Date: Fri, 16 Sep 2005 00:02:00 -0000
Message-ID: <20050916000203.3638.qmail@sourceware.org>
From: "geoffk at geoffk dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20030127145600.9449.rearnsha@arm.com>
References: <20030127145600.9449.rearnsha@arm.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)


------- Additional Comments From geoffk at geoffk dot org  2005-09-16 00:01 -------
Subject: Re:  UCNs not recognized in identifiers (c++/c99)


On 15/09/2005, at 3:53 PM, joseph at codesourcery dot com wrote:

> Yes, "spelling" is meant in terms of the source code characters.
> The idea is to permit simple strcmp-like checking by the preprocessor.

Good, so that answers that question.

You raise a good point about GCC not documenting translation phase 1.
I don't have time to write all of it, but I think I can write the
last part, about UCNs, so maybe together we can get it all done.  My
proposed wording is:

@cite{The mapping between physical source file multibyte characters
and the source character set in translation phase 1 (C90 and C99  
5.1.1.2).}

[CR, NL and CR-NL line endings are turned into EOL markers; spaces
between a backslash and the end of a line are deleted; and the input
is converted to UTF-8 using iconv, based on -finput-charset.  What
else?]

Then, any character sequence which would form a UCN in an identifier  
in phase 3 of translation is converted into the corresponding UTF-8  
sequence.  Any backslash-newline combinations in the UCN are  
preserved and placed after the UTF-8 sequence.

[Note that a user has no way to tell whether a backslash-newline
combination is placed before, in the middle of, or after the UTF-8
sequence.]

...

@cite{Which additional multibyte characters may appear in identifiers
and their correspondence to universal character names (C99 6.4.2).}

UTF-8 character sequences may appear in identifiers, and each one
corresponds to the UCN that specifies the character it encodes.  A
UTF-8 sequence may appear only if the UCN that it corresponds to would
be permitted in the identifier at that point.  At present, only those
UTF-8 sequences that were produced by the mapping from UCNs to UTF-8
sequences in translation phase 1 are permitted, but this is likely to
change in the future.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449