From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-435601-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 85777 invoked by alias); 9 Sep 2016 23:17:31 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 85764 invoked by uid 89); 9 Sep 2016 23:17:30 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=0.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=no version=3.3.2 spammy=perception, perceived, Hx-languages-length:2816, HX-Received:sk:p188mr6
X-HELO: mail-yw0-f181.google.com
Received: from mail-yw0-f181.google.com (HELO mail-yw0-f181.google.com) (209.85.161.181) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 09 Sep 2016 23:17:29 +0000
Received: by mail-yw0-f181.google.com with SMTP id u124so42817557ywg.3        for <gcc-patches@gcc.gnu.org>; Fri, 09 Sep 2016 16:17:29 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:subject:to:references:cc:from:message-id:date         :user-agent:mime-version:in-reply-to:content-transfer-encoding;        bh=mRQTVIP+CbG05VvyzvFh0fgtet+viYnkolMhERfcDx4=;        b=Qg+PyQtbBrIqNy0w1bf+g6/T6VsJ4jRvQFIAuI/v8bZDaLH5Mze3Dk42FJqsFrZcdw         oNO8lfiwF9gsIeuhd9zNhhbUWUwoj/a/w8rYfdgJBD2j3b8AFHqIRiM0n7WYZuymunzm         xNXYzy39U7DjvqECGiOeniCgfJU6snfNczJaR+m6FPeaNLs+DNlh/PfVYxqGzvNGWWIm         /cxck9Uw8+dfns8nhSbRsLFw47AOiJZ02/WaiDPcnqRrLpI8SD4dxZSrIozE/z8jbEmS         zBSzBvoU95njRcFIvOxRuKVfMPxxS2iB5pn+hlV4ot+l7y/+RBSkIujTkjpP3xgBQtNI         i3xA==
X-Gm-Message-State: AE9vXwMRyJfQUk0UnS61ylUcJUnWxrU6TbVSz1SNAyWOYiFvWdAFY2hJSKv55qvQgvkH/A==
X-Received: by 10.129.41.197 with SMTP id p188mr6147894ywp.13.1473463047499;        Fri, 09 Sep 2016 16:17:27 -0700 (PDT)
Received: from [192.168.0.26] (75-166-199-51.hlrn.qwest.net. [75.166.199.51])        by smtp.gmail.com with ESMTPSA id m129sm2036664ywd.48.2016.09.09.16.17.26        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);        Fri, 09 Sep 2016 16:17:26 -0700 (PDT)
Subject: Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
To: Joseph Myers <joseph@codesourcery.com>
References: <57D21C36.3020906@gmail.com> <alpine.DEB.2.20.1609091354170.24297@digraph.polyomino.org.uk>
Cc: Gcc Patch List <gcc-patches@gcc.gnu.org>
From: Martin Sebor <msebor@gmail.com>
Message-ID: <57D34305.3020908@gmail.com>
Date: Sat, 10 Sep 2016 00:07:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.20.1609091354170.24297@digraph.polyomino.org.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2016-09/txt/msg00564.txt.bz2

On 09/09/2016 07:59 AM, Joseph Myers wrote:
> On Thu, 8 Sep 2016, Martin Sebor wrote:
>
>> PS I used hexadecimal based on what c-format.c does but now that
>> I checked more carefully how %qE formats string literals I see it
>> uses octal.  I think hexadecimal is preferable because it avoids
>> ambiguity but I'm open to changing it to octal if there's a strong
>
> I'm not clear what you mean about ambiguity.  In C strings, an octal
> escape sequence has up to three characters, so if it has three characters
> it's unambiguous, whereas a hex escape sequence can have any number of
> characters, so if the unprintable character is followed by a valid hex
> digit then in C you need to represent that as an escape (or use string
> constant concatenation, etc.).  The patch doesn't try to do that as far as
> I can see.
>
> Now, presumably the output isn't intended to be interpreted as C strings
> anyway (if it was, you'd need to escape " and \ as well), so the patch is
> OK, but I don't think it avoids ambiguity (and there's a clear case that
> it shouldn't - that if the string passed to %qs is printable, it should be
> printed as-is even if it contains escape sequences that could also result
> from a non-printable string passed to %qs).

Thank you.

I tried to be clear about it in the description of the changes
but I see the PS caused some confusion.  Let me clarify that
the patch has nothing to do with ambiguity (perceived or
real) in the representation of the escape sequences.  The only
purpose of the change is to avoid printing non-printable
characters or excessively large escape sequences in GCC
diagnostics.

I mentioned the hex vs octal notation to invite input on which
of the two people would prefer to see used by the %qc and %qs
directives, and whether it's worth considering changing the %qE
directive to use the same notation as well, for consistency (and
to help with readability if there is consensus that one is clearer
than the other).

What I meant by ambiguity is for example a string like "\1234"
where it's not obvious where the octal sequence ends.  Is it '\1'
followed by "234" or '\12' followed by "34" or '\123' followed
by "4"?  (It's only possible to tell if one knows that GCC always
uses three digits for the octal character, but not everyone knows
that.)  To be clear: I'm talking about the GCC output and not
necessarily about what the standard has to say about it.
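
To make the three readings concrete, here is a minimal sketch in C.
The names decode_octal and show_readings are hypothetical and are not
taken from GCC or from the patch; the helper simply reads a caller-chosen
number of octal digits after the backslash.

```c
#include <stdio.h>

/* Hypothetical helper, not part of GCC: read exactly NDIGITS octal
   digits after the backslash at S[0], return the character value, and
   point *REST at the text that follows.  Returns -1 on malformed
   input.  */
int
decode_octal (const char *s, int ndigits, const char **rest)
{
  if (s[0] != '\\' || ndigits < 1 || ndigits > 3)
    return -1;

  int value = 0;
  for (int i = 1; i <= ndigits; i++)
    {
      /* A terminating NUL fails this test, so we never read past it.  */
      if (s[i] < '0' || s[i] > '7')
        return -1;
      value = value * 8 + (s[i] - '0');
    }

  *rest = s + 1 + ndigits;
  return value;
}

/* Print the three candidate readings of the text \1234.  */
void
show_readings (void)
{
  const char *text = "\\1234";  /* a backslash, then the digits 1 2 3 4 */
  for (int n = 1; n <= 3; n++)
    {
      const char *rest;
      int value = decode_octal (text, n, &rest);
      printf ("%d digit(s): value %d, followed by \"%s\"\n", n, value, rest);
    }
}
```

Calling show_readings() lists the values 1, 10 and 83 with the
remainders "234", "34" and "4".  Only the three-digit reading matches
what GCC prints, and nothing in the output itself says so.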

In contrast to the octal notation, I find the string "\x1234"
clearer.  It can only mean '\x1' followed by "234" or '\x12'
followed by "34" and I think more people will expect it to be
the latter because representing characters using two hex digits
is more common.  But this is just my own perception and YMMV.
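
For illustration, a sketch of the two-digit hex convention follows.
The function escape_hex is a hypothetical helper, not the code in the
patch, and it assumes the default "C" locale for isprint.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper, not GCC's implementation: copy LEN bytes of SRC
   into DST (capacity DSTSIZE), writing each non-printable byte as a
   backslash, 'x', and exactly two hex digits.  Returns 0 on success
   and -1 if DST is too small.  */
int
escape_hex (const char *src, size_t len, char *dst, size_t dstsize)
{
  if (dstsize == 0)
    return -1;

  size_t out = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) src[i];
      size_t need = isprint (c) ? 1 : 4;

      /* Keep one byte in reserve for the terminating NUL.  */
      if (out + need + 1 > dstsize)
        return -1;

      if (isprint (c))
        dst[out++] = (char) c;
      else
        out += (size_t) snprintf (dst + out, dstsize - out,
                                  "\\x%02x", (unsigned int) c);
    }

  dst[out] = '\0';
  return 0;
}
```

Passing the three bytes of "\x12" "34" (the literal is split because,
as Joseph notes, a C hex escape has no length limit) produces the text
\x1234, which a reader would have to take as '\x12' followed by "34".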

Martin