Re: [PATCH] gdb: add UTF16/UTF32 target charsets in phony_iconv

From: Patrick Monnerat <patrick@monnerat.net>
To: Tom Tromey <tom@tromey.com>
Cc: Patrick Monnerat via Gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [PATCH] gdb: add UTF16/UTF32 target charsets in phony_iconv
Date: Sun, 9 Oct 2022 02:47:18 +0200
Message-ID: <2f10efe4-1095-b620-ea1c-08cc047c45c4@monnerat.net>
In-Reply-To: <874jwejgbb.fsf@tromey.com>

On 10/8/22 20:55, Tom Tromey wrote:
>
> The comments at the top of gdb_wchar.h describe the situation somewhat,
> though they don't really explain what was wrong with Solaris.  My
> recollection, though, is that the Solaris wchar_t doesn't have any
> ordinary encoding but is instead a weird hybrid thing, and furthermore
> that the Solaris iconv doesn't accept "wchar_t" as an encoding name.
> So, on Solaris, there's no convenient way to do the conversions (it's
> possible to convert wchar_t to/from the locale's multi-byte encoding,
> but I didn't implement that since it seemed like a pain).

Thanks for the additional explanation.

This describes the particular case of Solaris. Are there other OSes with 
similar implementations?

In any case, this tends to show that wchar_t is not reliable.

>
> All of this is based on the idea that it's convenient to work in a wide
> character representation at some points in the code.  At the time, I
> figured relying on wchar_t would be good for this because (presumably)
> hosts would support that reasonably well and we wouldn't have to do
> extra work in gdb.
>
> However, it seems to me that it doesn't really have to be done this way.
> We could use UTF-32 instead, by making our own tables (along the lines
> of ada-unicode.py) for "isdigit" and "isprint".

Totally agreed: we need something more "predictable". UTF-32 seems a
good choice, but the endianness question still has to be resolved:
should the byte order be fixed (UTF-32[BL]E) or machine dependent? Both
have pros and cons. We could have a class implementing those characters
plus their ctype-like methods, and even a subclass of a basic_string
instantiation supporting conversions.
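
To make the idea a bit more concrete, here is a minimal sketch of such a
character class. Everything in it is hypothetical: the name utf32_char
and its methods do not exist in gdb, and the predicates are crude
stand-ins (ASCII digits only, control characters and surrogates treated
as non-printable) for the generated Unicode tables mentioned above. It
only illustrates how a fixed byte order could be kept at the
serialization boundary while the in-memory value stays a plain char32_t.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch; none of these names exist in gdb.
struct utf32_char
{
  char32_t value;

  // Stand-in for a table lookup: recognizes ASCII digits only.
  bool is_digit () const
  {
    return value >= U'0' && value <= U'9';
  }

  // Stand-in for a table lookup: rejects C0/C1 controls, surrogates
  // and anything beyond the Unicode range.
  bool is_print () const
  {
    if (value < 0x20 || (value >= 0x7f && value < 0xa0))
      return false;
    if (value >= 0xd800 && value <= 0xdfff)
      return false;
    return value <= 0x10ffff;
  }

  // Serialize as UTF-32LE, regardless of host byte order.
  std::array<std::uint8_t, 4> to_le () const
  {
    return {{ static_cast<std::uint8_t> (value & 0xff),
              static_cast<std::uint8_t> ((value >> 8) & 0xff),
              static_cast<std::uint8_t> ((value >> 16) & 0xff),
              static_cast<std::uint8_t> ((value >> 24) & 0xff) }};
  }

  // Serialize as UTF-32BE, regardless of host byte order.
  std::array<std::uint8_t, 4> to_be () const
  {
    return {{ static_cast<std::uint8_t> ((value >> 24) & 0xff),
              static_cast<std::uint8_t> ((value >> 16) & 0xff),
              static_cast<std::uint8_t> ((value >> 8) & 0xff),
              static_cast<std::uint8_t> (value & 0xff) }};
  }

  // Rebuild a character from four UTF-32LE bytes.
  static utf32_char from_le (const std::array<std::uint8_t, 4> &b)
  {
    char32_t v = (static_cast<char32_t> (b[0])
                  | (static_cast<char32_t> (b[1]) << 8)
                  | (static_cast<char32_t> (b[2]) << 16)
                  | (static_cast<char32_t> (b[3]) << 24));
    return utf32_char { v };
  }
};
```

With this shape, the "fixed or machine dependent" question only matters
where bytes enter or leave the class; a string type could then be a
basic_string over char32_t with conversion helpers on top.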

However, I have no idea how much work this change would require.

> In addition to this, I suppose we could simply require iconv.  Probably
> any host that has iconv will support UTF-32 (if not, what good is it
> really).  And libiconv exists and can even be conveniently dropped into
> the source tree if there are any hosts that don't have it.  This may not
> be a good plan if there are active host platforms where this would be a
> pain to deal with.

IMO, only old platforms (more than ~15 years old) have an iconv that
does not support the UTF encodings. Do we have to support them?
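
Whether a given host qualifies could be checked with a small probe along
these lines. This is only a sketch: host_iconv_supports is a
hypothetical helper, not something in gdb, and the exact spelling of the
encoding names ("UTF-32LE" here) is an assumption that holds for glibc
and GNU libiconv but may differ elsewhere.

```cpp
#include <iconv.h>

// Hypothetical helper, not part of gdb: return true if the host iconv
// can open a conversion descriptor from charset FROM to charset TO.
bool
host_iconv_supports (const char *to, const char *from)
{
  iconv_t cd = iconv_open (to, from);

  // iconv_open reports failure by returning (iconv_t) -1.
  if (cd == (iconv_t) -1)
    return false;

  iconv_close (cd);
  return true;
}
```

For example, host_iconv_supports ("UTF-32LE", "UTF-8") would tell
whether the host can produce little-endian UTF-32 from UTF-8. On hosts
where iconv lives in a separate library, linking with -liconv may be
needed.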

For the particular case of Solaris, have things changed nowadays, and
how old are the versions that should still be supported?

>
> Anyway, what do you think of this plan?

Overall and in the long term, I fully agree. Requiring iconv, using
UTF-32 instead of wchar_t and dropping phony_iconv looks like the best
solution. But again, I cannot estimate the amount of rework involved.

As I'm mainly the Insight maintainer (since 2014) and a very small and
recent committer to gdb, I don't want to start a revolution in the
latter, just make it clean and usable from Insight when called in my
test contexts (currently Linux OK, Cygwin OK, MinGW BAD). That said, I'm
not against a reasonable contribution that benefits bare gdb too,
within the limits of my knowledge and understanding of its code.

In the short and medium term, I think the current patch is still
useful: it immediately (and dirtily!) solves the problem introduced by
Ada support and will allow a smooth and gentle transition to UTF-32,
until we reach a situation where phony_iconv can be dropped.

The questions I ask above are meant more to emphasize important
strategic points than to require an immediate answer, I guess!

Patrick
