From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 23465 invoked by alias); 12 Sep 2008 15:57:21 -0000
Received: (qmail 23454 invoked by uid 22791); 12 Sep 2008 15:57:20 -0000
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4) by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 12 Sep 2008 15:56:41 +0000
Received: (qmail 15451 invoked from network); 12 Sep 2008 15:56:39 -0000
Received: from unknown (HELO digraph.polyomino.org.uk) (joseph@127.0.0.2) by mail.codesourcery.com with ESMTPA; 12 Sep 2008 15:56:39 -0000
Received: from jsm28 (helo=localhost) by digraph.polyomino.org.uk with local-esmtp (Exim 4.68) (envelope-from ) id 1KeB0g-0004ku-Jx; Fri, 12 Sep 2008 15:56:38 +0000
Date: Fri, 12 Sep 2008 16:52:00 -0000
From: "Joseph S. Myers"
To: Jakub Jelinek
cc: Tom Tromey, Jason Merrill, gcc-patches@gcc.gnu.org, Kris Van Hees, Ulrich Drepper
Subject: Re: [PATCH] Support for C++0x and C1x u8 string literals and raw string literals
In-Reply-To: <20080912132007.GA9666@hs20-bc2-1.build.redhat.com>
Message-ID:
References: <20080912132007.GA9666@hs20-bc2-1.build.redhat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2008-09/txt/msg00973.txt.bz2

On Fri, 12 Sep 2008, Jakub Jelinek wrote:

> The following patch adds support for the rest. There is one thing
> in which currently gcc raw strings violates the standard, because of the
> controversial extension which treats backslash whitespace newline
> the same as backslash newline. I've added test for that and xfailed
> it for now. For the raw string delimiter sequences I've tried to

That is not a standard violation - it's GCC defining a phase 1 translation
that cannot result in all possible sequences of basic source characters.
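
To illustrate the point about which character sequences can survive, here is
a minimal sketch in C. It is not GCC's code: splice_lines and its allow_ws
flag are hypothetical names, and only spaces and tabs are treated as
whitespace, which is a simplifying assumption. With allow_ws set, the
sequence backslash, whitespace, newline can never appear in the output, so a
later phase (such as raw string lexing) cannot see it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from GCC: copy `in` to `out` with
   line splices removed.  If allow_ws is nonzero, a backslash followed
   by spaces or tabs and then a newline is removed as well, modelling
   the extension described in the quoted text.  `out` must have room
   for at least as many bytes as `in`, including the terminator.  */
static void splice_lines(const char *in, char *out, int allow_ws)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (*in == '\\') {
            const char *p = in + 1;
            if (allow_ws)
                while (*p == ' ' || *p == '\t')
                    p++;            /* skip whitespace after the backslash */
            if (*p == '\n') {
                in = p + 1;         /* drop backslash, whitespace, newline */
                continue;
            }
        }
        *out++ = *in++;             /* ordinary character: copy through */
    }
    *out = '\0';
}
```

Calling splice_lines("a\\ \nb", buf, 1) joins the two lines, while the same
call with allow_ws set to 0 leaves the backslash, space and newline in place.
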
(I do think however we should stop doing this, and so allow backslash
whitespace newline sequences to be represented.)

> be really pedantic and accept only basic source charset character except
> the listed 7, rather than say all characters except the listed 7
> plus maybe disallowing '\0', as this is a new feature I think being
> pedantic doesn't hurt. In one of the raw string papers floating
> around there was an example using R"@[...]@" which is not pedantically
> valid, as @ is not basic source charset character. u8 string

But that example is conditionally valid in C++ only, although not in C,
because in phase 1 @ will have been converted to a UCN (part of the
existing C++98 semantics we don't implement). The validity is only
conditional because there is no requirement to use the same UCN for each
instance of @. I'll raise the issue on the WG14 reflector. I've raised
another question I noticed there - N1333 would change string literals for C
to be const-qualified, which it seems to be agreed was not intended.

As this is obviously a new feature for 4.5, by the time Stage 1 starts we
may well have new C and C++ drafts with some of the glitches sorted out.

-- 
Joseph S. Myers
joseph@codesourcery.com