From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 11419 invoked by alias); 6 Nov 2007 19:54:02 -0000
Received: (qmail 11407 invoked by uid 22791); 6 Nov 2007 19:54:01 -0000
X-Spam-Check-By: sourceware.org
Received: from smtp-out.google.com (HELO smtp-out.google.com) (216.239.45.13) by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 06 Nov 2007 19:53:57 +0000
Received: from zps19.corp.google.com (zps19.corp.google.com [172.25.146.19]) by smtp-out.google.com with ESMTP id lA6JrtQl008049 for ; Tue, 6 Nov 2007 11:53:55 -0800
Received: from wa-out-1112.google.com (wahj5.prod.google.com [10.114.236.5]) by zps19.corp.google.com with ESMTP id lA6JqsOp001515 for ; Tue, 6 Nov 2007 11:53:45 -0800
Received: by wa-out-1112.google.com with SMTP id j5so2585819wah for ; Tue, 06 Nov 2007 11:53:45 -0800 (PST)
Received: by 10.141.52.5 with SMTP id e5mr3104973rvk.1194378825504; Tue, 06 Nov 2007 11:53:45 -0800 (PST)
Received: by 10.141.129.12 with HTTP; Tue, 6 Nov 2007 11:53:45 -0800 (PST)
Message-ID: <29bd08b70711061153n3f7b2196q72c82bab81f62361@mail.gmail.com>
Date: Tue, 06 Nov 2007 20:53:00 -0000
From: "Lawrence Crowl"
To: "Joseph S. Myers"
Subject: Re: Old UTF16 patch
Cc: "Elena Zannoni" , gcc@gcc.gnu.org, "Tom Tromey"
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <472A61CC.2050609@oracle.com>
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2007-11/txt/msg00166.txt.bz2

On 11/1/07, Joseph S. Myers wrote:
> I haven't followed any developments relating to TR19769 in WG14
> after its publication in detail; has WG14 yet given an answer
> on what should be done with u'C' where C represents a single
> character that requires a surrogate pair to represent in UTF-16
> (to name one noted place where the TR underspecifies things)?

Pending such an answer, I think gcc should make such characters
ill-formed. The text in the C TR is "The corresponding character
constant is denoted by u'c-char-sequence' and has the type char16_t."
Given that surrogate pairs are unrepresentable in that type, I
conclude that the intent was to make character literals requiring
surrogates ill-formed. The C++ standard also makes such characters
ill-formed. Furthermore, making them ill-formed will be upward
compatible should the C committee choose some other interpretation.

> A TR is not a standard, so for C this must be disabled in all strict
> conformance modes (note that it affects the rules for lexing and so
> changes the semantics of conforming programs); likewise for C++98.
> The C++0x draft includes the notation from TR19769, so the feature
> should be enabled by default in C++0x (and so far as the C TR is
> compatible with C++0x, both should be followed in both C and C++
> when the feature is enabled).

Note that char16_t and char32_t are typedefs in C but primitive types
in C++, just like wchar_t.

-- 
Lawrence Crowl
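
The two technical points above (a single char16_t cannot hold a
character that needs a surrogate pair, and char16_t/char32_t are
distinct types in C++ rather than typedefs) can be illustrated with a
short sketch. It assumes a C++11 or later compiler, which postdates
this thread; the helper names fits_in_one_utf16_unit and utf16_units
are hypothetical and come from neither the TR nor gcc.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not from the TR or gcc): true when a code point can
// be stored in one UTF-16 code unit, i.e. it is at most U+FFFF and is not
// itself in the surrogate range U+D800..U+DFFF.
constexpr bool fits_in_one_utf16_unit(std::uint32_t cp) {
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: number of UTF-16 code units needed to encode a
// code point (2 means a surrogate pair). Assumes cp <= 0x10FFFF.
constexpr int utf16_units(std::uint32_t cp) {
    return cp > 0xFFFF ? 2 : 1;
}

// U+00E9 fits in one unit, so u'\u00E9' is an ordinary char16_t literal.
static_assert(fits_in_one_utf16_unit(0x00E9), "BMP character fits");
constexpr char16_t e_acute = u'\u00E9';
static_assert(e_acute == 0x00E9, "value equals the code point");

// U+1D11E needs a surrogate pair; this is the case the message proposes
// treating as ill-formed when written as a u'...' character literal.
static_assert(!fits_in_one_utf16_unit(0x1D11E), "needs a surrogate pair");
static_assert(utf16_units(0x1D11E) == 2, "two code units");

// In C++11 and later, char16_t and char32_t are distinct built-in types,
// not typedefs for the integer types they share a representation with.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is its own type in C++");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t is its own type in C++");
```

Because the types are distinct in C++, overloads on char16_t and on a
16-bit unsigned integer type do not collide there, whereas in C with
the TR's typedefs the two names refer to the same type.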