From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-131836-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 484 invoked by alias); 21 Feb 2005 20:15:16 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 32625 invoked by alias); 21 Feb 2005 20:15:00 -0000
Date: Mon, 21 Feb 2005 23:50:00 -0000
Message-ID: <20050221201500.32621.qmail@sourceware.org>
From: "zack at codesourcery dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20030127145600.9449.rearnsha@arm.com>
References: <20030127145600.9449.rearnsha@arm.com>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)
X-Bugzilla-Reason: CC
X-SW-Source: 2005-02/txt/msg02516.txt.bz2
List-Id: <gcc-bugs.sourceware.org>


------- Additional Comments From zack at codesourcery dot com  2005-02-21 20:14 -------
Subject: Re:  UCNs not recognized in identifiers
 (c++/c99)

"geoffk at geoffk dot org" <gcc-bugzilla@gcc.gnu.org> writes:

> Although I agree that these are all (except the below) nice things to 
> have, I don't think I agree that they are all preconditions to having 
> any part of an implementation.  For instance, an implementation that 
> said sorry() when using # on an identifier from a UCN would still be 
> more useful than the complete lack of implementation we have now.

In my book, a complete lack of implementation of this particular
feature is better than an incomplete one.  I see the vast majority of
the work required for a complete implementation as due-diligence
tasks: ensuring that the feature cannot crash the compiler, cause
wrong code generation, or introduce compatibility problems.  If
someone is going to do all that work anyway, why shouldn't they finish
the rest of the job while they're in there?

> The second half would be a pp-number, instead.  It is always true that
> splitting an identifier between characters yields two valid
> preprocessing tokens.

Joseph has mostly explained this, but I should add that if you split,
say, "a\u0660b" between the "a" and the backslash, you get two
identifiers.  The second one's initial character is "a universal
character name designating a digit", which violates a shall-clause in
a semantics paragraph and therefore provokes undefined behavior
(C99 6.4.2.1p3).
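
To make the split case concrete, here is a rough sketch in C.  It is
not cpplib code.  The function names are hypothetical, and the digit
table is an illustrative subset holding only the two ranges mentioned
in this thread (0660-0669 is assumed to be the Annex D range that
contains 0660).  It reports whether splitting an identifier's spelling
at a given byte offset leaves a second half that begins with a digit
UCN.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of the C99 Annex D "Digits" ranges.  Only the
   ranges discussed in this thread are listed (an assumption made for
   brevity, not the full table).  */
static int is_digit_ucn(unsigned long cp)
{
    return (cp >= 0x0660 && cp <= 0x0669)
        || (cp >= 0x0e50 && cp <= 0x0e59);
}

/* Return 1 if the spelling at S starts with a \uXXXX escape whose
   value is in the digit subset above, 0 otherwise.  */
int starts_with_digit_ucn(const char *s)
{
    char hex[5];
    int i;

    assert(s != NULL);
    if (s[0] != '\\' || s[1] != 'u')
        return 0;
    for (i = 0; i < 4; i++) {
        /* Stops at the first non-hex character, including the
           terminating NUL, so we never read past the string.  */
        if (!isxdigit((unsigned char)s[2 + i]))
            return 0;
        hex[i] = s[2 + i];
    }
    hex[4] = '\0';
    return is_digit_ucn(strtoul(hex, NULL, 16));
}

/* Return 1 if splitting IDENT at byte offset AT yields a second half
   whose initial character is a digit UCN, which is the case that
   C99 6.4.2.1p3 forbids.  */
int split_violates_initial_rule(const char *ident, size_t at)
{
    assert(ident != NULL && at <= strlen(ident));
    return starts_with_digit_ucn(ident + at);
}
```

With the spelling a\u0660b, a split at offset 1 (between the "a" and
the backslash) is flagged, while offset 0 (no real split) is not.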

Standing policy is that all cases which provoke undefined behavior
inside the preprocessor, except already-documented GNU extensions,
shall produce hard errors.  I am tempted to make a partial exception
in this case in the interest of better compatibility with C++.  Almost
all of the UCNs in the "digits" block of C99 annex D are completely
excluded from C++98 annex E, so "a\u0660b", for instance, is an invalid
identifier in C++, and we never get as far as wondering what happens
if we split it before the backslash.  However, the range 0e50-0e59 is
in the "Thai" range of C++98/E, but in *both* the "Thai" and the
"Digits" ranges of C99/D.  It would be sensible, IMO, to resolve the
error in C99/D by removing 0e50-0e59 from the "Digits" range, thus
permitting those characters to begin identifiers in both C and C++.
[Note that currently ucnid.tab takes the opposite position.]
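
The difference between the two positions can be sketched as a small
flag table in C.  This is a hypothetical model, not the actual format
or contents of ucnid.tab: the flag names, the struct, and the function
are invented for illustration, only two ranges are listed, and the
sketch assumes the digit restriction is applied in C99 mode only
(since the rule cited is C99 6.4.2.1p3).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags: range is listed for C99, listed for C++98,
   and treated as a "digit" (may not begin an identifier).  */
enum { F_C99 = 1, F_CXX = 2, F_DIG = 4 };
enum lang { LANG_C99, LANG_CXX98 };

struct ucn_range { unsigned lo, hi, flags; };

/* Model of the position where 0e50-0e59 counts as digits.  */
static const struct ucn_range digits_table[] = {
    { 0x0660, 0x0669, F_C99 | F_DIG },
    { 0x0e50, 0x0e59, F_C99 | F_CXX | F_DIG },
};

/* Model of the suggested resolution: 0e50-0e59 dropped from
   "Digits" and kept only as part of "Thai".  */
static const struct ucn_range thai_table[] = {
    { 0x0660, 0x0669, F_C99 | F_DIG },
    { 0x0e50, 0x0e59, F_C99 | F_CXX },
};

/* Return 1 if code point CP may begin an identifier in LANG
   according to the N-entry table T, 0 otherwise.  */
int may_begin_identifier(const struct ucn_range *t, size_t n,
                         unsigned cp, enum lang lang)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < n; i++) {
        if (cp < t[i].lo || cp > t[i].hi)
            continue;
        if (lang == LANG_CXX98)
            return (t[i].flags & F_CXX) != 0;
        /* C99: must be listed and must not be a digit.  */
        return (t[i].flags & F_C99) != 0
            && (t[i].flags & F_DIG) == 0;
    }
    return 0;   /* Not listed at all in this subset.  */
}
```

Under this model, 0e50 may begin a C99 identifier only with the second
table, while 0660 is rejected as an initial character in both
languages with either table.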

zw


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449