public inbox for gdb-patches@sourceware.org
From: Eli Zaretskii <eliz@gnu.org>
To: Tom de Vries <tdevries@suse.de>
Cc: gdb-patches@sourceware.org, tom@tromey.com
Subject: Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
Date: Fri, 26 May 2023 16:56:24 +0300
Message-ID: <83pm6njiwn.fsf@gnu.org>
In-Reply-To: <20230526132512.29496-1-tdevries@suse.de> (message from Tom de Vries via Gdb-patches on Fri, 26 May 2023 15:25:12 +0200)

> Cc: Tom Tromey <tom@tromey.com>
> Date: Fri, 26 May 2023 15:25:12 +0200
> From: Tom de Vries via Gdb-patches <gdb-patches@sourceware.org>
> 
> +/* Return true if STRING starts with a multi-byte char.  Return the length of
> +   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
> +   Implementation based on _rl_read_mbchar.  */
> +
> +static bool
> +is_mb_char (const char *string, int &len)
> +{
> +  for (len = 1; len <= MB_CUR_MAX; len++)
> +    {
> +      size_t res;
> +
> +      {
> +	wchar_t wc;  <<<<<<<<<<<<<<<<<<<<<<<
> +	mbstate_t ps;
> +	memset (&ps, 0, sizeof (mbstate_t));
> +	res = mbrtowc (&wc, string, len, &ps);

The above assumes each call to mbrtowc produces only one wchar_t
value.  But that's non-portable: on MS-Windows wchar_t is a 16-bit
wide data type, and wchar_t "wide characters" are actually encoded in
UTF-16.  So characters beyond the BMP will yield 2 wchar_t values, not
one.
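
To illustrate the arithmetic behind the "2 wchar_t values" point, here
is a minimal sketch of UTF-16 encoding.  The helper name encode_utf16
is hypothetical and not part of the patch; it only shows why a single
wchar_t cannot hold a character beyond the BMP when wchar_t is 16 bits
wide.

```cpp
/* Hypothetical helper, for illustration only: encode code point CP as
   UTF-16 into OUT and return the number of 16-bit units written.
   No validation is done: CP is assumed to be a valid Unicode scalar
   value (not a surrogate, not above 0x10FFFF).  */
static int
encode_utf16 (char32_t cp, char16_t out[2])
{
  if (cp <= 0xFFFF)
    {
      /* Inside the BMP: one unit is enough.  */
      out[0] = static_cast<char16_t> (cp);
      return 1;
    }

  /* Beyond the BMP: split the 20 remaining bits into a surrogate pair.  */
  cp -= 0x10000;
  out[0] = static_cast<char16_t> (0xD800 + (cp >> 10));    /* high surrogate */
  out[1] = static_cast<char16_t> (0xDC00 + (cp & 0x3FF));  /* low surrogate */
  return 2;
}
```

Code that stores the result of one conversion in a single wchar_t
would keep only the first of those two units.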

One additional caveat: "multibyte" != "UTF-8".  There's more than one
multibyte encoding, and the current locale could use some non-UTF-8
encoding instead.  For example, some encoding of the ISO-2022 family.
I'm not sure what this means for the issue at hand.
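
If the code ever needs to know whether the locale's codeset really is
UTF-8, one possible check is sketched below.  The helper name
codeset_is_utf8 is hypothetical; on POSIX hosts the name passed to it
could come from nl_langinfo (CODESET) after setlocale (LC_CTYPE, ""),
which is an assumption on my part and not something the patch does.

```cpp
#include <cctype>
#include <string>

/* Hypothetical helper, for illustration only: return true if NAME
   spells UTF-8, ignoring case and any '-' or '_' separators, so that
   "UTF-8", "utf8" and "UTF_8" all match.  */
static bool
codeset_is_utf8 (const char *name)
{
  std::string norm;

  for (const char *p = name; *p != '\0'; p++)
    if (*p != '-' && *p != '_')
      norm += static_cast<char> (std::tolower (static_cast<unsigned char> (*p)));

  return norm == "utf8";
}
```

A name such as "ISO-2022-JP" fails this check even though it is a
multibyte encoding, which is the distinction at stake here.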

Yet another consideration is whether tui_puts_internal is used for
outputting text in the target charset, in which case you may have
problems with using mbrtowc, because AFAIK that supports only the
current locale's codeset.  If the target charset is different from the
locale's (basically, the host) charset, and we don't convert one to
the other before calling tui_puts_internal, mbrtowc will fail.
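
A minimal sketch of how a caller could at least tell the failure modes
of mbrtowc apart, instead of treating every non-positive result alike.
The names mb_status and classify_mb_prefix are hypothetical and not
taken from the patch; the result still depends entirely on the current
locale's codeset, so this does not solve the target-charset problem.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

/* Hypothetical names, for illustration only.  */
enum class mb_status { null_char, complete, incomplete, invalid };

/* Classify the first N bytes of S according to the current locale.
   On mb_status::complete, LEN is the number of bytes consumed;
   otherwise LEN is 0.  */
static mb_status
classify_mb_prefix (const char *s, std::size_t n, std::size_t &len)
{
  wchar_t wc;
  std::mbstate_t ps;
  std::memset (&ps, 0, sizeof (ps));  /* start from the initial shift state */

  std::size_t res = std::mbrtowc (&wc, s, n, &ps);
  len = 0;

  if (res == static_cast<std::size_t> (-1))
    return mb_status::invalid;     /* bytes not valid in the locale's codeset */
  if (res == static_cast<std::size_t> (-2))
    return mb_status::incomplete;  /* valid so far, but more bytes are needed */
  if (res == 0)
    return mb_status::null_char;

  len = res;
  return mb_status::complete;
}
```

Text in a target charset that differs from the host's would typically
end up in the invalid case, or worse, be decoded as the wrong
characters.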

Yes, this is a mess.

Thanks.
