From: Tom Tromey <tom@tromey.com>
To: Tom Tromey <tom@tromey.com>
Cc: Andrew Burgess <aburgess@redhat.com>,  gdb-patches@sourceware.org
Subject: Re: [PATCH v2 09/18] Include \0 in printable wide characters
Date: Wed, 23 Feb 2022 16:59:02 -0700
Message-ID: <878ru1dtix.fsf@tromey.com>
In-Reply-To: <87czjddxp6.fsf@tromey.com> (Tom Tromey's message of "Wed, 23 Feb 2022 15:28:53 -0700")

Tom> I think the idea behind this is that only a real \0 in the input will
Tom> really ever turn into a L'\0' in the wchar_t form.  It seems to me that
Tom> an L'\0' pretty much has to correspond exactly to a target \0, just
Tom> because C is pervasive and an encoding where stray \0 bytes can appear
Tom> would break everything.

I went for a short walk and, naturally, realized this is only half
right.

An L'\0' can also come from a non-zero target byte sequence: the Java
flavor of UTF-8 ("Modified UTF-8") exists precisely to smuggle a wide
'\0' through a multi-byte encoding without emitting a zero byte.

https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

So while I still believe that a target '\0' will always map to a L'\0',
it's not the case that an L'\0' necessarily came from one such, with the
Java-style 0xc0 0x80 being a counter-example.

In this case it's not 100% clear what the best thing to do is.  Possibly
iconv will just give an encoding error, as that is an overlong sequence.
Anyway, maybe the right thing to do is print \xc0\x80 or the like, to
make it clear that something unusual is going on.
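
To make the 0xc0 0x80 case concrete, here is a small Python sketch.  It
is only an illustration: Python's "utf-8" codec stands in for a strict
decoder and is not iconv, so iconv's actual behavior may differ.  The
helper names decode_two_byte_lenient and escape_bytes are made up for
this example and are not gdb functions.

```python
def decode_two_byte_lenient(data: bytes) -> int:
    """Decode a 2-byte UTF-8-style sequence without rejecting overlong forms."""
    b0, b1 = data
    if (b0 & 0xE0) != 0xC0 or (b1 & 0xC0) != 0x80:
        raise ValueError("not a 2-byte sequence")
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def escape_bytes(data: bytes) -> str:
    """Render raw bytes as \\xNN escapes (hypothetical fallback display)."""
    return "".join(f"\\x{b:02x}" for b in data)


modified_nul = b"\xc0\x80"  # Modified UTF-8 encoding of U+0000

# A decoder that tolerates overlong forms yields code point 0.
lenient_value = decode_two_byte_lenient(modified_nul)

# Python's strict UTF-8 codec rejects the overlong sequence.
try:
    modified_nul.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

# Fallback: show the raw bytes so the oddity is visible.
escaped = escape_bytes(modified_nul)
```

So a lenient decoder produces a wide '\0' from two non-zero bytes, while
a strict one reports an error, in which case showing the escaped bytes
is one way to surface what is in the target.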

Tom> That's most likely because you are trying this on Linux.  Linux uses
Tom> UTF-32 for wchar_t, and so there aren't target characters that can't be
Tom> converted to a single wchar_t

I now wonder if this is true as well, because you might see a "CESU-8"
encoding:

https://en.wikipedia.org/wiki/CESU-8

... where each half of a surrogate pair is encoded as its own UTF-8
sequence.  This could show up if the target program chooses this
encoding while gdb is told to use UTF-8.  I didn't experiment to see
what iconv does with this sort of thing.
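
As a rough illustration of what CESU-8 bytes look like, here is a Python
sketch.  The cesu8_encode helper is hypothetical (Python has no built-in
CESU-8 codec), and Python's strict "utf-8" codec is used only as a
stand-in for a strict decoder; it says nothing definite about iconv.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8: non-BMP characters become two 3-byte sequences."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into a UTF-16 surrogate pair, then encode each half
            # as if it were an ordinary code point.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


sample = "\U0001F600"            # one character outside the BMP
cesu = cesu8_encode(sample)      # six bytes: two 3-byte sequences
utf8 = sample.encode("utf-8")    # four bytes in standard UTF-8

# A strict UTF-8 decoder rejects encoded surrogates.
try:
    cesu.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

# A decoder that lets surrogates through yields two code units,
# not one character.
units = cesu.decode("utf-8", "surrogatepass")
```

Here one target character arrives either as an error or as two separate
surrogate values, so it would not map to a single wchar_t even when
wchar_t is UTF-32.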

I'll look into a Rust-specific fix and just drop this patch.

Tom
