public inbox for gdb-patches@sourceware.org
From: Tom Tromey <tom@tromey.com>
To: Andrew Burgess <aburgess@redhat.com>
Cc: Tom Tromey <tom@tromey.com>,  gdb-patches@sourceware.org
Subject: Re: [PATCH v2 09/18] Include \0 in printable wide characters
Date: Wed, 23 Feb 2022 15:28:53 -0700
Message-ID: <87czjddxp6.fsf@tromey.com>
In-Reply-To: <20220223134930.GT2571@redhat.com> (Andrew Burgess's message of "Wed, 23 Feb 2022 13:49:30 +0000")

>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:

Andrew> My confusion here is that I initially thought; if we have multiple
Andrew> characters, some that are printable, and some that are not, then
Andrew> surely, we would want to print the initial printable ones for real,
Andrew> and only later switch to escape sequences, right?

Andrew> Except, that's not what we do.

Andrew> And the reason (probably obvious to quicker minds than mine) is that
Andrew> characters might have different widths, so we can't "just" print the
Andrew> initial characters, and then print the unprintable as escape
Andrew> sequences, as we wouldn't know where in BUF the unprintable character
Andrew> actually starts.

Yeah, that's my understanding as well.
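
To make the width problem concrete, here is a minimal sketch.  It is not
GDB code: utf8_char_offsets is a hypothetical helper, and UTF-8 is only
assumed as an example of a variable-width target encoding.  The point is
that the byte offset of the Nth character can only be found by walking
over every character before it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of GDB: return the byte offset at which
// each character of a well-formed UTF-8 string starts.
std::vector<std::size_t>
utf8_char_offsets (const std::string &buf)
{
  std::vector<std::size_t> offsets;
  std::size_t i = 0;
  while (i < buf.size ())
    {
      unsigned char lead = static_cast<unsigned char> (buf[i]);
      std::size_t width;
      if (lead < 0x80)
        width = 1;
      else if ((lead & 0xE0) == 0xC0)
        width = 2;
      else if ((lead & 0xF0) == 0xE0)
        width = 3;
      else
        {
          // Assumes well-formed input, so this must be a 4-byte lead.
          assert ((lead & 0xF8) == 0xF0);
          width = 4;
        }
      offsets.push_back (i);
      i += width;
    }
  return offsets;
}
```

For a buffer holding a 1-byte, a 2-byte and a 3-byte character followed
by one more character, the last one starts at byte 6, not at index 3.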

Andrew> OK, so my idea of removing wchar_printable is clearly a bad idea, but
Andrew> how does this relate to your change?

Andrew> Well, prior to this patch, if we had 3 characters, the first two are
Andrew> printable, and the third was \0, we would spot the non-printable \0,
Andrew> and so print the whole buffer, all 3 characters, as escape sequences.

Andrew> With this patch, all 3 characters will appear to be printable.  So now
Andrew> we will print the first character, just fine.  Then print the second
Andrew> character just fine.  Now for the third character, the \0, we call to
Andrew> print_wchar.  The \0 is not handled by anything but the 'default' case
Andrew> of the switch.

Andrew> In the default case, the \0 is non-printable, so we end up in the
Andrew> escape sequence printing code, which then tries to load bytes starting
Andrew> from BUF - which isn't going to be correct.

I think the idea behind this is that only a real \0 in the input will
ever turn into an L'\0' in the wchar_t form.  It seems to me that an
L'\0' pretty much has to correspond exactly to a target \0, because C
is pervasive and an encoding where stray \0 bytes can appear would
break everything.
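
As an illustration of an encoding with this property, here is a small
sketch using UTF-8.  It is not GDB code; utf8_encode and has_nul_byte
are hypothetical helpers, and UTF-8 is only one assumed example of a
target encoding.  In it, a zero byte appears only when encoding U+0000
itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of GDB: encode one Unicode scalar
// value as UTF-8.
std::string
utf8_encode (char32_t cp)
{
  // Only scalar values are accepted: in range and not a surrogate.
  assert (cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  std::string out;
  if (cp < 0x80)
    out += static_cast<char> (cp);
  else if (cp < 0x800)
    {
      out += static_cast<char> (0xC0 | (cp >> 6));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char> (0xE0 | (cp >> 12));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char> (0xF0 | (cp >> 18));
      out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  return out;
}

// True if any byte of S is zero.
bool
has_nul_byte (const std::string &s)
{
  return s.find ('\0') != std::string::npos;
}
```

Every lead byte and continuation byte for a non-zero code point has at
least one high bit set, so has_nul_byte is true only for the encoding
of U+0000.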

Andrew> Now, this is where things are a bit weird.  The code in
Andrew> generic_emit_char is clearly written to handle multiple characters,
Andrew> but, I've only ever seen it print 1 character, which is why, I claim,
Andrew> your above change to wchar_printable works.

That's most likely because you are trying this on Linux.  Linux uses
UTF-32 for wchar_t, so every target character can be converted to a
single wchar_t -- UTF-32 is pretty much designed to round-trip
everything else.  So, on Linux hosts, I think some of these loops
aren't really needed.

However, Windows uses UTF-16 and a single target character can be
converted to two wchar_t, via surrogate pairs.
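
A minimal sketch of the surrogate-pair case, for reference.  This is
not GDB code: utf16_encode is a hypothetical helper, and char16_t
stands in for a 16-bit wchar_t.  Code points below U+10000 fit in one
unit; anything above needs two.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of GDB: encode one Unicode scalar
// value as UTF-16 code units.
std::vector<char16_t>
utf16_encode (char32_t cp)
{
  // Only scalar values are accepted: in range and not a surrogate.
  assert (cp <= 0x10FFFF);
  assert (cp < 0xD800 || cp > 0xDFFF);

  if (cp < 0x10000)
    return { static_cast<char16_t> (cp) };

  // Split the 20 bits above 0x10000 into two 10-bit halves.
  char32_t v = cp - 0x10000;
  char16_t high = static_cast<char16_t> (0xD800 + (v >> 10));
  char16_t low = static_cast<char16_t> (0xDC00 + (v & 0x3FF));
  return { high, low };
}
```

With this, U+0041 yields the single unit 0x0041, while U+1F600 yields
the pair 0xD83D 0xDE00, which is the case where one target character
becomes two wide characters.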

On Solaris and (IIRC) NetBSD, wchar_t is even weirder, though I don't
recall whether it is a variable-length encoding.

Anyway, the \0 case is really only here for Rust.  So maybe another idea
is to handle it exactly there, somehow.  The Rust printer can assume the
use of UTF-32 on the target, so that would all work out fine.

Tom

Thread overview (30+ messages):
2022-02-17 22:05 [PATCH v2 00/18] Refactor character printing Tom Tromey
2022-02-17 22:05 ` [PATCH v2 01/18] Fix latent quote char bug in generic_printstr Tom Tromey
2022-02-17 22:05 ` [PATCH v2 02/18] Boolify need_escape in generic_emit_char Tom Tromey
2022-02-17 22:05 ` [PATCH v2 03/18] Remove c_emit_char Tom Tromey
2022-02-17 22:05 ` [PATCH v2 04/18] Remove c_printstr Tom Tromey
2022-02-17 22:05 ` [PATCH v2 05/18] Don't use wchar_printable in print_wchar Tom Tromey
2022-02-22 15:36   ` Andrew Burgess
2022-10-10 16:39     ` Tom Tromey
2022-02-17 22:05 ` [PATCH v2 06/18] Fix a latent bug " Tom Tromey
2022-02-17 22:05 ` [PATCH v2 07/18] Remove language_defn::emitchar Tom Tromey
2022-02-17 22:05 ` [PATCH v2 08/18] Add gdb_iswcntrl Tom Tromey
2022-02-17 22:05 ` [PATCH v2 09/18] Include \0 in printable wide characters Tom Tromey
2022-02-23 13:49   ` Andrew Burgess
2022-02-23 22:28     ` Tom Tromey [this message]
2022-02-23 23:59       ` Tom Tromey
2022-02-17 22:05 ` [PATCH v2 10/18] Use a ui_file in print_wchar Tom Tromey
2022-02-17 22:05 ` [PATCH v2 11/18] Add an emitter callback to generic_printstr and generic_emit_char Tom Tromey
2022-02-17 22:05 ` [PATCH v2 12/18] Add a default encoding to generic_emit_char and generic_printstr Tom Tromey
2022-02-17 22:05 ` [PATCH v2 13/18] Change generic_emit_char to print the quotes Tom Tromey
2022-02-17 22:05 ` [PATCH v2 14/18] Use generic_emit_char in Rust Tom Tromey
2022-02-17 22:05 ` [PATCH v2 15/18] Use generic_emit_char in Ada Tom Tromey
2022-02-17 22:05 ` [PATCH v2 16/18] Use generic_emit_char in Modula-2 Tom Tromey
2022-02-23 20:17   ` Gaius Mulley
2022-03-16 12:29   ` [PATCH] Additional modula2 tests Gaius Mulley
2022-04-07 14:21     ` Tom Tromey
2022-04-09 23:16       ` Gaius Mulley
2022-04-11 19:45   ` [PATCH v1] Array access in Modula-2 Gaius Mulley
2022-02-17 22:05 ` [PATCH v2 17/18] Use generic_emit_char in Pascal Tom Tromey
2022-02-17 22:05 ` [PATCH v2 18/18] Simplify Fortran string printing Tom Tromey
2022-10-10 17:37 ` [PATCH v2 00/18] Refactor character printing Tom Tromey
