From: Jeff Law <law@redhat.com>
To: Martin Sebor <msebor@gmail.com>, Joseph Myers <joseph@codesourcery.com>
Cc: Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] avoid non-printable characters in diagnostics (c/77620, c/77521)
Date: Fri, 16 Sep 2016 16:57:00 -0000
Message-ID: <acc71358-2750-7a31-9d11-c5ce29aa60ea@redhat.com>
In-Reply-To: <57D34305.3020908@gmail.com>

On 09/09/2016 05:17 PM, Martin Sebor wrote:
> On 09/09/2016 07:59 AM, Joseph Myers wrote:
>> On Thu, 8 Sep 2016, Martin Sebor wrote:
>>
>>> PS I used hexadecimal based on what c-format.c does but now that
>>> I checked more carefully how %qE formats string literals I see it
>>> uses octal.  I think hexadecimal is preferable because it avoids
>>> ambiguity but I'm open to changing it to octal if there's a strong
>>
>> I'm not clear what you mean about ambiguity.  In C strings, an octal
>> escape sequence has up to three characters, so if it has three characters
>> it's unambiguous, whereas a hex escape sequence can have any number of
>> characters, so if the unprintable character is followed by a valid hex
>> digit then in C you need to represent that as an escape (or use string
>> constant concatenation, etc.).  The patch doesn't try to do that as
>> far as I can see.
>>
>> Now, presumably the output isn't intended to be interpreted as C strings
>> anyway (if it was, you'd need to escape " and \ as well), so the patch is
>> OK, but I don't think it avoids ambiguity (and there's a clear case that
>> it shouldn't - that if the string passed to %qs is printable, it
>> should be printed as-is even if it contains escape sequences that
>> could also result from a non-printable string passed to %qs).
>
> Thank you.
>
> I tried to be clear about it in the description of the changes
> but I see the PS caused some confusion.  Let me clarify that
> the patch has nothing to do with ambiguity (perceived or
> real) in the representation of the escape sequences.  The only
> purpose of the change is to avoid printing non-printable
> characters or excessively large escape sequences in GCC
> diagnostics.
>
> I mentioned the hex vs octal notation to invite input into which
> of the two of them people would prefer to see used by the %qc and
> %qs directives, and whether it's worth considering changing the %qE
> directive to use the same notation as well, for consistency (and
> to help with readability if there is consensus that one is clearer
> than the other).
>
> What I meant by ambiguity is for example a string like "\1234"
> where it's not obvious where the octal sequence ends.  Is it '\1'
> followed by "234" or '\12' followed by "34" or '\123' followed
> by "4"?  (It's only possible to tell if one knows that GCC always
> uses three digits for the octal character, but not everyone knows
> that.)  To be clear: I'm talking about the GCC output and not
> necessarily about what the standard has to say about it.
>
> In contrast to the octal notation, I find the string "\x1234"
> clearer.  It can only mean '\x1' followed by "234" or '\x12'
> followed by "34" and I think more people will expect it to be
> the latter because representing characters using two hex digits
> is more common.  But this is just my own perception and YMMV.

Both styles are ambiguous, but isn't that an inherent problem once we
try to avoid non-printable characters by rendering them as octal or hex
sequences?
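
To make the ambiguity concrete, here is a minimal sketch of the kind of
escaping being discussed.  It is not the code from the patch:
escape_nonprintable is a hypothetical helper, and the fixed two-digit
hex width is an assumption chosen for illustration.  Printable bytes,
including backslash, are copied unchanged, which is the behavior Joseph
argues for above.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GCC's implementation: copy LEN bytes of SRC
   into DST, rendering each non-printable byte as a two-digit hex escape.
   Printable bytes, including backslash, are copied unchanged.
   DST must have room for 4 * LEN + 1 bytes.
   Returns the number of characters written, excluding the NUL.  */
static size_t
escape_nonprintable (char *dst, const char *src, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) src[i];
      if (isprint (c))
        dst[n++] = (char) c;
      else
        n += (size_t) sprintf (dst + n, "\\x%02x", (unsigned int) c);
    }
  dst[n] = '\0';
  return n;
}
```

With this sketch, the three-byte input 0x12 '3' '4' and the six-byte
printable input consisting of a backslash followed by x1234 produce the
same output text, so the original string cannot be recovered from the
rendering alone.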

I can't make a strong argument for either style over the other.
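
For comparison, this is how C itself reads the two notations in source
code, as Joseph describes above.  The array names are hypothetical and
the snippet is only an illustration of the language rules, not of GCC's
diagnostic output.

```c
#include <stddef.h>

/* An octal escape takes at most three digits, so this literal holds
   '\123', the character '4', and the terminating NUL.  */
static const char octal_lit[] = "\1234";

/* A hex escape consumes every hex digit that follows it, so the byte
   0x12 followed by "34" has to be written with string concatenation
   (or another escape) to end the sequence after two digits.  */
static const char hex_lit[] = "\x12" "34";

/* Number of characters in each literal, excluding the NUL.  */
static size_t octal_len (void) { return sizeof octal_lit - 1; }
static size_t hex_len (void) { return sizeof hex_lit - 1; }
```

Here octal_len () is 2 and hex_len () is 3: the octal form stops after
three digits on its own, while the hex form needs the explicit split.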

Jeff
