Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Date: Thu, 12 Sep 2019 00:33:00 -0000
From: Joseph Myers
To: Lewis Hyatt
Subject: Re: Patch to support extended characters in C/C++ identifiers
References: <20190812220121.GA9251@ldh.local>

On Wed, 11 Sep 2019, Lewis Hyatt
wrote:

> things that may be a little surprising. For instance, you can take a
> UTF-8 encoded file and insert a backslash line continuation in the
> middle of a multibyte sequence, and gcc will happily paste it back
> together and then interpret the resulting UTF-8. I think it's
> technically OK standardwise since the conversion from extended
> characters to the source character set is implementation-defined, but
> it's hardly a straightforward definition. It is sort of consistent
> with the treatment of undefined behavior with UCN escapes though,
> which gcc already permits to be pasted together over a line
> continuation. Anyway, should this behavior be documented as well? I

I don't think that peculiarity should be documented. (Whereas accepting
arbitrary bytes inside comments and strings by default is arguably
actually a feature.)

> > gcc/testsuite/g++.dg/cpp/ucnid-2-utf8.C and
> > gcc/testsuite/g++.dg/cpp/ucnid-3-utf8.C are testing double stringizing in
> > C++, where strictly the results they expect show that GCC does not conform
> > to the C++ standard requirement to convert all extended characters to UCNs
> > (because C++ does not have the special C rule making it
> > implementation-defined whether the \ of a UCN in a string literal is
> > doubled when stringizing).
>
> Thanks, I didn't mean to ignore this point when you made it on the PR
> comments, I just wasn't sure what was the best way to handle it. Do
> you find it preferable to just add a comment, or should I rather
> change the test to look for the standard-conforming output, and make
> it an XFAIL?

My inclination would be a comment, with reference to a bug filed for this
issue in Bugzilla.

> Finally, one general question, when I submit these last changes, is it
> better to send them as a new patch relative to what I already sent, or
> is it better to send the whole thing updated from scratch? Thanks
> again.

A complete patch that can be applied to trunk is best.

-- 
Joseph S. Myers
joseph@codesourcery.com
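
For readers unfamiliar with the line-continuation peculiarity quoted above, the following is a minimal sketch in Python of the effect being described. It is not GCC's implementation; it only assumes, as the quoted text states, that backslash-newline pairs are removed from the raw bytes before the result is interpreted as UTF-8. The helper name `splice_continuations` and the sample declaration are hypothetical.

```python
def splice_continuations(raw: bytes) -> bytes:
    """Remove every backslash-newline pair from a byte string.

    Illustrative stand-in for line splicing; assumed to operate on raw
    bytes, before any UTF-8 decoding takes place.
    """
    return raw.replace(b"\\\r\n", b"").replace(b"\\\n", b"")


# "é" is the two-byte UTF-8 sequence C3 A9. Insert a line continuation
# between its two bytes.
encoded = "int é = 1;".encode("utf-8")
cut = encoded.index(b"\xc3") + 1
broken = encoded[:cut] + b"\\\n" + encoded[cut:]

# On its own, the split form is not valid UTF-8 ...
try:
    broken.decode("utf-8")
    valid_before = True
except UnicodeDecodeError:
    valid_before = False

# ... but it decodes cleanly once the continuation has been spliced out.
spliced = splice_continuations(broken)
print(valid_before, spliced.decode("utf-8"))
```

Splicing before decoding is what makes the split sequence readable again; a tool that decoded each physical line first would instead reject the file.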