From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
To: Neil Booth <neil@daikokuya.co.uk>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Implementing Universal Character Names in identifiers
References: <200210280715.g9S7FdI2003815@paros.informatik.hu-berlin.de>
	<20021107080904.GE11859@daikokuya.co.uk>
	<j4adkl3ga3.fsf@informatik.hu-berlin.de>
	<20021107091150.GA12793@daikokuya.co.uk>
From: loewis@informatik.hu-berlin.de (Martin v. =?iso-8859-1?q?L=F6wis?=)
Date: Thu, 07 Nov 2002 01:47:00 -0000
In-Reply-To: <20021107091150.GA12793@daikokuya.co.uk>
Message-ID: <j4u1it1zjz.fsf@informatik.hu-berlin.de>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2002-11/txt/msg00429.txt.bz2

Neil Booth <neil@daikokuya.co.uk> writes:

> We should definitely accept it.  Why should UCNs be different from
> everything else?  I can see that C++ calls it undefined behaviour, but
> C99 appears to require it.   It's also important, to me at least, from
> a QOI perspective.

I don't think C99 requires it. 5.1.1.2/1.2 says

# If, as a result, a character sequence that matches the syntax of a
# universal character name is produced, the behavior is undefined.

I think UCNs are rightfully different from nearly everything else;
they are quite similar to multi-byte characters. If you have an
escaped newline in the middle of a multi-byte character, you would not
expect concatenation to create a new multi-byte character, either,
would you?

I cannot see any important use case for such a feature.
Implementations are allowed to reject this case, and rejecting it
simplifies the implementation, so I really see no reason to make life
more complicated than necessary. Producing an error now still leaves
the opportunity to provide an extension later.
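
To make the case concrete, here is a rough sketch in C of how a
UCN that only comes into existence through line splicing could be
detected and then diagnosed. It is purely illustrative: the function
splice_creates_ucn and the MAX_SRC limit are hypothetical names, not
cpplib code, and the sketch assumes the whole input fits in a small
buffer and ignores trigraphs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for this sketch; longer inputs are not examined. */
enum { MAX_SRC = 1024 };

/* Hypothetical helper, not part of GCC or cpplib.  Returns 1 if deleting
   backslash-newline pairs from SRC yields a \uXXXX or \UXXXXXXXX sequence
   that straddles a splice point, and 0 otherwise.  */
int splice_creates_ucn(const char *src)
{
    char out[MAX_SRC];
    unsigned char joined[MAX_SRC]; /* joined[k]: a splice sits just before out[k] */
    size_t n = 0, i;
    int pending = 0;

    assert(src != NULL);
    if (strlen(src) >= MAX_SRC)
        return 0;

    /* Line splicing: drop each backslash-newline pair and remember
       where the two physical lines were joined.  */
    for (i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;                /* skip the newline as well */
            pending = 1;
            continue;
        }
        out[n] = src[i];
        joined[n] = (unsigned char)pending;
        pending = 0;
        n++;
    }

    /* Look for a UCN in the spliced text with a join inside it.  */
    for (i = 0; i < n; i++) {
        size_t digits, k;
        int spliced;

        if (out[i] != '\\' || i + 1 >= n)
            continue;
        if (out[i + 1] == '\\') {   /* escaped backslash: skip the pair */
            i++;
            continue;
        }
        if (out[i + 1] == 'u')
            digits = 4;
        else if (out[i + 1] == 'U')
            digits = 8;
        else
            continue;
        if (i + 2 + digits > n)
            continue;

        spliced = joined[i + 1];    /* join between the backslash and u/U */
        for (k = 0; k < digits; k++) {
            if (!isxdigit((unsigned char)out[i + 2 + k]))
                break;
            spliced |= joined[i + 2 + k];
        }
        if (k == digits && spliced)
            return 1;
    }
    return 0;
}
```

For the input "int \u00<backslash><newline>c0 = 1;" the helper returns 1,
because the UCN exists only after the two lines are joined. For
"int \u00c0 = 1;" on a single line it returns 0. A front end taking
this approach would issue its error wherever the helper reports 1.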

Notice that the compiler deliberately abstains from giving undefined
behaviour a well-defined meaning in some cases, in order to point out
portability issues. Users often complain that GCC provides too many
extensions, so I think every single extension must be judged very
carefully.

> A backslash is a token; so is u00c0.  Your example is indeed an
> error, but was not what I had in mind.  I suspect pasting just works,
> anyway.

Can you please give an example of what you had in mind?

Regards,
Martin