From: Jakub Jelinek
Date: Fri, 12 Sep 2008 21:00:00 -0000
To: "Joseph S. Myers"
Cc: Tom Tromey, Jason Merrill, gcc-patches@gcc.gnu.org, Kris Van Hees, Ulrich Drepper
Subject: Re: [PATCH] Support for C++0x and C1x u8 string literals and raw string literals
Message-ID: <20080912200354.GD9666@hs20-bc2-1.build.redhat.com>
References: <20080912132007.GA9666@hs20-bc2-1.build.redhat.com> <20080912191951.GC9666@hs20-bc2-1.build.redhat.com>

On Fri, Sep 12, 2008 at 07:53:04PM +0000, Joseph S. Myers wrote:
> On Fri, 12 Sep 2008, Jakub Jelinek wrote:
>
> > UCNs aren't valid in d-char-sequence though, only in normal strings and within
> > r-char-sequence.
>
> However, backslash is valid in d-char-sequence, and so are all the other
> characters making up UCNs.  The way I read N2723 is that in phase 1 each @
> character is converted to either \u0040 or \U00000040, then in phase 3
> that sequence of characters may end up being interpreted as something
> other than a UCN.  If a sequence matching UCN syntax is produced by
> deleting backslash-newline, that's undefined in C++ (unlike in C), but
> what's not mentioned as undefined is a UCN from stage 1 not being
> interpreted as a UCN in stage 3 - whether through being in a
> d-char-sequence or for any other reason.  Certainly writing \u0040
> directly in a d-char-sequence would appear to be valid.

If it is up to the implementation to choose between \u0040 and \U00000040,
then writing
R"@@[]@@";
would be either valid or invalid, depending on whether the implementation
has replaced each @ by \u0040 or by \U00000040 (in the latter case the
delimiter is 2 x 10 = 20 characters, more than the 16-character limit for
a d-char-sequence).

	Jakub
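
The length arithmetic above can be checked with a small sketch.  This is
only an illustration of the counting argument, not anything from the patch
or from GCC: the names kMaxDelimLen, kShortUcnLen, kLongUcnLen,
length_after_phase1 and delimiter_fits are made up for this example, and it
assumes the reading discussed above, in which every @ in the delimiter is
respelled as a UCN before the 16-character limit is applied.

```cpp
#include <cstddef>

// Limit on d-char-sequence length, as cited in the message above.
constexpr std::size_t kMaxDelimLen = 16;

// Spelling lengths of the two UCN forms for '@':
// backslash + 'u' + 4 hex digits, or backslash + 'U' + 8 hex digits.
constexpr std::size_t kShortUcnLen = 6;
constexpr std::size_t kLongUcnLen = 10;

// Hypothetical helper: length of a delimiter once every '@' has been
// respelled as a UCN of the given length (the assumed phase 1 behaviour).
constexpr std::size_t length_after_phase1(const char* delim,
                                          std::size_t ucn_len) {
    std::size_t n = 0;
    for (const char* p = delim; *p != '\0'; ++p)
        n += (*p == '@') ? ucn_len : 1;
    return n;
}

// True if the respelled delimiter still fits within the limit.
constexpr bool delimiter_fits(const char* delim, std::size_t ucn_len) {
    return length_after_phase1(delim, ucn_len) <= kMaxDelimLen;
}

// The "@@" delimiter from the example: accepted with the short form,
// rejected with the long form.
static_assert(delimiter_fits("@@", kShortUcnLen), "2 x 6 = 12 fits");
static_assert(!delimiter_fits("@@", kLongUcnLen), "2 x 10 = 20 is too long");
```

Under that assumption the same source line is accepted or rejected purely
according to which UCN spelling the implementation picks, which is the
problem described above.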