Subject: Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
To: Joseph Myers
Cc: Gcc Patch List
From: Martin Sebor
Date: Sat, 10 Sep 2016 00:07:00 -0000
Message-ID: <57D34305.3020908@gmail.com>
References: <57D21C36.3020906@gmail.com>

On 09/09/2016 07:59 AM, Joseph Myers wrote:
> On Thu, 8 Sep 2016, Martin Sebor wrote:
>
>> PS I used hexadecimal based on what c-format.c does but now that
>> I checked more carefully how %qE formats string literals I see it
>> uses octal.  I think hexadecimal is preferable because it avoids
>> ambiguity but I'm open to changing it to octal if there's a strong
>
> I'm not clear what you mean about ambiguity.  In C strings, an octal
> escape sequence has up to three characters, so if it has three characters
> it's unambiguous, whereas a hex escape sequence can have any number of
> characters, so if the unprintable character is followed by a valid hex
> digit then in C you need to represent that as an escape (or use string
> constant concatenation, etc.).  The patch doesn't try to do that as far as
> I can see.
>
> Now, presumably the output isn't intended to be interpreted as C strings
> anyway (if it was, you'd need to escape " and \ as well), so the patch is
> OK, but I don't think it avoids ambiguity (and there's a clear case that
> it shouldn't - that if the string passed to %qs is printable, it should be
> printed as-is even if it contains escape sequences that could also result
> from a non-printable string passed to %qs).

Thank you.
I tried to be clear about it in the description of the changes but
I see the PS caused some confusion.  Let me clarify that the patch
has nothing to do with ambiguity (perceived or real) in the
representation of the escape sequences.  The only purpose of the
change is to avoid printing non-printable characters or excessively
large escape sequences in GCC diagnostics.

I mentioned the hex vs octal notation to invite input on which of
the two people would prefer to see used by the %qc and %qs
directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).

What I meant by ambiguity is, for example, a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed by "234" or '\12' followed by "34" or '\123' followed by
"4"?  (It's only possible to tell if one knows that GCC always uses
three digits for the octal character, but not everyone knows that.)
To be clear: I'm talking about the GCC output and not necessarily
about what the standard has to say about it.

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be the
latter because representing characters using two hex digits is
more common.  But this is just my own perception and YMMV.

Martin
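To make the two notations discussed above concrete, here is a minimal
sketch in C.  It is not the code from the patch and not GCC's
pretty-printer: escape_nonprintable is a hypothetical helper invented
for illustration, and it assumes the fixed widths mentioned in the
thread (three octal digits, two hex digits).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GCC's implementation: copy the N bytes at IN
   to OUT, replacing each non-printable byte with an escape sequence.
   With USE_HEX nonzero the escape is \x followed by two hex digits,
   otherwise a backslash followed by three octal digits.  Printable
   bytes, including a literal backslash, are copied unchanged.  OUT is
   nul-terminated whenever OUTSZ is nonzero; input that does not fit is
   dropped.  Returns the number of characters written, excluding the
   terminating nul.  */
static size_t
escape_nonprintable (const char *in, size_t n, char *out, size_t outsz,
                     int use_hex)
{
  size_t len = 0;

  if (outsz == 0)
    return 0;

  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = (unsigned char) in[i];

      if (isprint (c))
        {
          /* Leave room for this character and the terminating nul.  */
          if (len + 1 >= outsz)
            break;
          out[len++] = (char) c;
        }
      else
        {
          /* Both escape forms are exactly four characters long.  */
          if (len + 4 >= outsz)
            break;
          if (use_hex)
            snprintf (out + len, outsz - len, "\\x%02x", (unsigned int) c);
          else
            snprintf (out + len, outsz - len, "\\%03o", (unsigned int) c);
          len += 4;
        }
    }

  out[len] = '\0';
  return len;
}
```

For the three input bytes 0x12, '3', '4' this sketch produces \x1234
in hex mode and \02234 in octal mode, so in both cases the reader has
to know the fixed escape width to tell where the escape ends.  It also
shows Joseph's point: a printable four-character input consisting of a
backslash followed by "x12" is copied unchanged, so its output is the
same as that for the single byte 0x12.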