Date: Mon, 21 Feb 2005 23:50:00 -0000
Message-ID: <20050221201500.32621.qmail@sourceware.org>
From: "zack at codesourcery dot com"
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20030127145600.9449.rearnsha@arm.com>
References: <20030127145600.9449.rearnsha@arm.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)
X-Bugzilla-Reason: CC

------- Additional Comments From zack at codesourcery dot com  2005-02-21 20:14 -------
Subject: Re: UCNs not recognized in identifiers (c++/c99)

"geoffk at geoffk dot org" writes:

> Although I agree that these are all (except the below) nice things to have, I don't think I agree that they are all preconditions to having any part of an implementation. For instance, an implementation that said sorry() when using # on an identifier from a UCN would still be more useful than the complete lack of implementation we have now.

In my book, a complete lack of implementation of this particular feature is better than an incomplete one. This is because I see the vast majority of the work required for a complete implementation as due-diligence tasks: ensuring that the feature cannot crash the compiler, cause wrong code generation, or introduce compatibility problems. As long as someone is going to do all that work, why shouldn't they do the rest of the job while they're in there?

> The second half would be a pp-number, instead. It is always true that splitting an identifier between characters yields two valid preprocessing tokens.
Joseph has mostly explained this, but I should add that if you split, say, "a\u0660b" between the "a" and the backslash, what you get is two identifiers, and the second one's "initial character is a universal character name designating a digit". That violates a shall-clause in a Semantics paragraph and therefore provokes undefined behavior (C99 6.4.2.1p3). Standing policy is that all cases which provoke undefined behavior inside the preprocessor, except already-documented GNU extensions, shall produce hard errors.

I am tempted to make a partial exception in this case in the interest of better compatibility with C++. Almost all of the UCNs in the "Digits" block of C99 Annex D are completely excluded from C++98 Annex E, so in C++ "a\u0660b", for instance, is an invalid identifier, and we never get as far as wondering what happens if we split it before the backslash. However, the range 0e50-0e59 is in the "Thai" range of C++98/E, but in *both* the "Thai" and the "Digits" ranges of C99/D. It would be sensible, IMO, to resolve this error in C99/D by removing 0e50-0e59 from the "Digits" range, thus permitting those characters to begin identifiers in both C and C++. [Note that ucnid.tab currently takes the opposite position.]

zw

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449