From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
To: Neil Booth
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Implementing Universal Character Names in identifiers
Date: Thu, 07 Nov 2002 01:47:00 -0000
In-Reply-To: <20021107091150.GA12793@daikokuya.co.uk>
References: <200210280715.g9S7FdI2003815@paros.informatik.hu-berlin.de> <20021107080904.GE11859@daikokuya.co.uk> <20021107091150.GA12793@daikokuya.co.uk>

Neil Booth writes:

> We should definitely accept it. Why should UCNs be different from
> everything else? I can see that C++ calls it undefined behaviour, but
> C99 appears to require it. It's also important, to me at least, from
> a QOI perspective.

I don't think C99 requires it. 5.1.1.2/1.2 says

# If, as a result, a character sequence that matches the syntax of a
# universal character name is produced, the behavior is undefined.
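
To make the case under discussion concrete, here is a minimal sketch in C of a check for it: does deleting backslash-newline pairs (translation phase 2) produce something that matches the syntax of a UCN? This is not GCC or cpplib code. The names splice_creates_ucn, skip_splices and ucn_digits are hypothetical, and the sketch assumes the source is handed over as one NUL-terminated string. It also ignores context such as an escaped backslash inside a string literal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: number of hex digits required after `u` or `U`,
   or 0 if c does not introduce a UCN. */
static size_t ucn_digits(char c)
{
    return c == 'u' ? 4 : c == 'U' ? 8 : 0;
}

/* Hypothetical helper: advance p past any backslash-newline pairs,
   counting each pair that was skipped. */
static const char *skip_splices(const char *p, int *splices)
{
    while (p[0] == '\\' && p[1] == '\n') {
        p += 2;
        ++*splices;
    }
    return p;
}

/* Returns 1 if removing backslash-newline pairs from src yields a
   sequence of the form \uXXXX or \UXXXXXXXX that contains at least one
   splice, which is the case 5.1.1.2 leaves undefined. Returns 0
   otherwise. */
int splice_creates_ucn(const char *src)
{
    assert(src != NULL);
    for (const char *start = src; *start != '\0'; ++start) {
        /* Skip anything that is not a backslash, and skip the backslash
           of a splice, since that one disappears in phase 2. */
        if (start[0] != '\\' || start[1] == '\n')
            continue;

        int splices = 0;
        const char *p = skip_splices(start + 1, &splices);
        size_t need = ucn_digits(*p);
        if (need == 0)
            continue;
        ++p;

        size_t got = 0;
        while (got < need) {
            p = skip_splices(p, &splices);
            if (!isxdigit((unsigned char)*p))
                break;
            ++p;
            ++got;
        }
        if (got == need && splices > 0)
            return 1;
    }
    return 0;
}
```

With this sketch, a line ending in `\u00` followed by a backslash-newline and then `c0` is reported, while an unbroken `\u00c0` is not. An implementation that rejects the case would emit its diagnostic at the point where this function returns 1.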

I think UCNs are rightfully different from nearly everything else; they are quite similar to multi-byte characters. If you have an escaped newline in the middle of a multi-byte character, you would not expect concatenation to create a new multi-byte character, either, would you?

I cannot see any important use cases for such a feature. Implementations are allowed to reject this case, and rejecting it simplifies the implementation, so I really see no reason to make life more complicated than necessary. Producing an error now still leaves the opportunity to provide an extension later.

Notice that the compiler deliberately abstains from giving undefined behaviour a well-defined meaning in some cases, to point out portability issues. Users often complain that GCC provides too many extensions, so I think every single extension must be judged very carefully.

> A backslash is a token; so is u00c0. Your example is indeed an
> error, but was not what I had in mind. I suspect pasting just works,
> anyway.

Can you please give an example of what you had in mind?

Regards,
Martin