[PATCH] [gdb/tui] Handle unicode chars in prompt

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] [gdb/tui] Handle unicode chars in prompt
@ 2023-05-26 13:25 Tom de Vries
  2023-05-26 13:56 ` Eli Zaretskii
  2023-05-26 15:44 ` Tom de Vries
  0 siblings, 2 replies; 14+ messages in thread
From: Tom de Vries @ 2023-05-26 13:25 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

Let's try to set the prompt using a unicode character, say '❯', aka U+276F
(heavy right-pointing angle quotation mark ornament).

This works fine on an xterm with CLI (with X marking the position of the
blinking cursor):
...
$ gdb -q -ex "set prompt GDB❯ "
GDB❯ X
...
but with TUI:
...
$ gdb -q -tui -ex "set prompt GDB❯ "
...
we get instead:
...
GDB  GDB  X
...

We can use the test-case gdb.tui/unicode-prompt.exp to get more details, using
tuiterm.

With Term::dump_screen we have:
...
   16 (gdb) set prompt GDB❯
   17 GDB❯ GDB❯ GDB❯ set prompt (gdb)
   18 (gdb)
...
and with Term::dump_screen_with_attrs (summarizing using attribute sets <attrs1>
and <attrs2>):
...
   16 (gdb) set prompt GDB❯
   17 GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> set prompt (gdb)
   18 (gdb)
...
where:
...
<attrs1> == <reverse:1><invisible:1><blinking:1><intensity:bold>
<attrs2> == <reverse:0><invisible:0><blinking:0><intensity:normal>
...

This explains why we didn't see the unicode char on xterm: it's hidden
because the invisible attribute is set.

So, there seem to be two problems:
- the attributes are incorrect, and
- the prompt is repeated a couple of times.

In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
at a time using waddch, which apparantly breaks multi-byte char support.

Fix this by detecting multi-byte chars in tui_puts_internal, and printing them using
waddnstr.

Tested on x86_64-linux.

Reported-By: wuzy01@qq.com

PR tui/28800
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28800
---
 gdb/testsuite/gdb.tui/unicode-prompt.exp | 45 ++++++++++++++++
 gdb/tui/tui-io.c                         | 67 +++++++++++++++++++++++-
 2 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 gdb/testsuite/gdb.tui/unicode-prompt.exp

diff --git a/gdb/testsuite/gdb.tui/unicode-prompt.exp b/gdb/testsuite/gdb.tui/unicode-prompt.exp
new file mode 100644
index 00000000000..6c2f9036921
--- /dev/null
+++ b/gdb/testsuite/gdb.tui/unicode-prompt.exp
@@ -0,0 +1,45 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+require allow_tui_tests
+
+tuiterm_env
+
+save_vars { env(LC_ALL) env(LANG) env(LC_CTYPE) } {
+    # Override "C" settings from default_gdb_init.
+    setenv LC_ALL ""
+    setenv LANG en_US.UTF-8
+    setenv LC_CTYPE ""
+
+    Term::clean_restart 24 80
+
+    if {![Term::enter_tui]} {
+	unsupported "TUI not supported"
+	return
+    }
+
+    set unicode_char "\u276F"
+
+    set prompt "GDB$unicode_char "
+    set prompt_re [string_to_regexp $prompt]
+
+    # Set new prompt.
+    send_gdb "set prompt $prompt\n"
+    # Set old prompt back.
+    send_gdb "set prompt (gdb) \n"
+
+    gdb_assert { [Term::wait_for "^${prompt_re}set prompt $gdb_prompt "] } \
+	"prompt with unicode char"
+}
diff --git a/gdb/tui/tui-io.c b/gdb/tui/tui-io.c
index a1eadcd937d..f6412e2dbad 100644
--- a/gdb/tui/tui-io.c
+++ b/gdb/tui/tui-io.c
@@ -514,6 +514,51 @@ tui_puts (const char *string, WINDOW *w)
     update_cmdwin_start_line ();
 }
 
+/* Return true if STRING starts with a multi-byte char.  Return the length of
+   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
+   Implementation based on _rl_read_mbchar.  */
+
+static bool
+is_mb_char (const char *string, int &len)
+{
+  for (len = 1; len <= MB_CUR_MAX; len++)
+    {
+      size_t res;
+
+      {
+	wchar_t wc;
+	mbstate_t ps;
+	memset (&ps, 0, sizeof (mbstate_t));
+	res = mbrtowc (&wc, string, len, &ps);
+      }
+
+      if (res == (size_t)(-1))
+	{
+	  /* Not a multi-byte char.  */
+	  return false;
+	}
+
+      if (res == (size_t)(-2))
+	{
+	  /* Part of a multi-byte char.  */
+	  continue;
+	}
+
+      if (res == 0)
+	{
+	  /* Multi-byte null char.  */
+	  len = 0;
+	  return true;
+	}
+
+      /* Complete multi-byte char.  */
+      gdb_assert (res == len);
+      return true;
+    }
+
+  return false;
+}
+
 static void
 tui_puts_internal (WINDOW *w, const char *string, int *height)
 {
@@ -521,8 +566,28 @@ tui_puts_internal (WINDOW *w, const char *string, int *height)
   int prev_col = 0;
   bool saw_nl = false;
 
-  while ((c = *string++) != 0)
+  while (true)
     {
+      {
+	int mb_len;
+	if (is_mb_char (string, mb_len) && mb_len != 1)
+	  {
+	    if (mb_len == 0)
+	      {
+		/* Multi-byte null char.  */
+		break;
+	      }
+
+	    waddnstr (w, string, mb_len);
+	    string += mb_len;
+	    continue;
+	  }
+      }
+
+      c = *string++;
+      if (c == '\0')
+	break;
+
       if (c == '\n')
 	saw_nl = true;
 

base-commit: 5fd6b60d86ab6ab4bbd173524062b5d2aeac199a
-- 
2.35.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-26 13:25 [PATCH] [gdb/tui] Handle unicode chars in prompt Tom de Vries
@ 2023-05-26 13:56 ` Eli Zaretskii
  2023-05-30 16:51   ` Tom Tromey
  2023-06-09  9:34   ` Tom de Vries
  2023-05-26 15:44 ` Tom de Vries
  1 sibling, 2 replies; 14+ messages in thread
From: Eli Zaretskii @ 2023-05-26 13:56 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gdb-patches, tom

> Cc: Tom Tromey <tom@tromey.com>
> Date: Fri, 26 May 2023 15:25:12 +0200
> From: Tom de Vries via Gdb-patches <gdb-patches@sourceware.org>
> 
> +/* Return true if STRING starts with a multi-byte char.  Return the length of
> +   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
> +   Implementation based on _rl_read_mbchar.  */
> +
> +static bool
> +is_mb_char (const char *string, int &len)
> +{
> +  for (len = 1; len <= MB_CUR_MAX; len++)
> +    {
> +      size_t res;
> +
> +      {
> +	wchar_t wc;  <<<<<<<<<<<<<<<<<<<<<<<
> +	mbstate_t ps;
> +	memset (&ps, 0, sizeof (mbstate_t));
> +	res = mbrtowc (&wc, string, len, &ps);

The above assumes each call to mbrtowc produces only one wchar_t
value.  But that's non-portable: on MS-Windows wchar_t is a 16-bit
wide data type, and wchar_t "wide characters" are actually encoded in
UTF-16.  So characters beyond the BMP will yield 2 wchar_t values, not
one.

One additional caveat: "multibyte" != "UTF-8".  There's more than one
multibyte encoding, and the current locale could use some non-UTF-8
encoding instead.  For example, some encoding of the ISO-2022 family.
I'm not sure what this means for the issue at hand.

Yet another consideration is whether tui_puts_internal is used for
outputting text in the target charset, in which case you may have
problems with using mbrtowc, because AFAIK that supports only the
current locale's codeset.  If the target charset is different from the
locale's (basically, the host) charset, and we don't convert one to
the other before calling tui_puts_internal, mbrtowc will fail.

Yes, this is a mess.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-26 13:25 [PATCH] [gdb/tui] Handle unicode chars in prompt Tom de Vries
  2023-05-26 13:56 ` Eli Zaretskii
@ 2023-05-26 15:44 ` Tom de Vries
  2023-05-30 17:03   ` Tom Tromey
  1 sibling, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2023-05-26 15:44 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

On 5/26/23 15:25, Tom de Vries via Gdb-patches wrote:
> In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
> at a time using waddch, which apparantly breaks multi-byte char support.
> 
> Fix this by detecting multi-byte chars in tui_puts_internal, and printing them using
> waddnstr.

FWIW, I just came across this commit, which seems relevant:
...
commit 2c72d5e58a55d3e0f867ffd9421184852f051cb7
Author: Tom Tromey <tom@tromey.com>
Date:   Sun Sep 27 20:30:30 2020 -0600

Rewrite tui_puts

This rewrites tui_puts.  It now writes as many bytes as possible in a
call to waddnstr, letting curses handle multi-byte sequences properly.

Note that tui_puts_internal remains.  It is needed to handle computing
the start line of the readline prompt, which is difficult to do
properly in the case where redisplaying can also cause the command
window to scroll.  This might be possible to implement by reverting to
single "character" output, by using mbsrtowcs for its side effects to
find character boundaries in the input.  I have not attempted this.
...

This patch uses mbrtowc rather than mbsrtowcs, but I suppose the idea is 
the same.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-26 13:56 ` Eli Zaretskii
@ 2023-05-30 16:51   ` Tom Tromey
  2023-06-09  9:34   ` Tom de Vries
  1 sibling, 0 replies; 14+ messages in thread
From: Tom Tromey @ 2023-05-30 16:51 UTC (permalink / raw)
  To: Eli Zaretskii via Gdb-patches; +Cc: Tom de Vries, Eli Zaretskii, tom

>>>>> "Eli" == Eli Zaretskii via Gdb-patches <gdb-patches@sourceware.org> writes:

Eli> Yet another consideration is whether tui_puts_internal is used for
Eli> outputting text in the target charset

I think this shouldn't happen; text from the inferior goes through gdb's
target-to-host charset conversion.

Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-26 15:44 ` Tom de Vries
@ 2023-05-30 17:03   ` Tom Tromey
  2023-05-30 18:07     ` DJ Delorie
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Tom Tromey @ 2023-05-30 17:03 UTC (permalink / raw)
  To: Tom de Vries via Gdb-patches; +Cc: Tom de Vries, Tom Tromey

>> In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
>> at a time using waddch, which apparantly breaks multi-byte char support.
>> Fix this by detecting multi-byte chars in tui_puts_internal, and
>> printing them using
>> waddnstr.

> FWIW, I just came across this commit, which seems relevant:

Tom> Note that tui_puts_internal remains.  It is needed to handle computing
Tom> the start line of the readline prompt, which is difficult to do
Tom> properly in the case where redisplaying can also cause the command
Tom> window to scroll.  This might be possible to implement by reverting to
Tom> single "character" output, by using mbsrtowcs for its side effects to
Tom> find character boundaries in the input.  I have not attempted this.
Tom> ...

I no longer remember what made this difficult.  I wonder if it's
possible to simply emit as many characters as possible in a single call,
and then use getyx to figure out the length of the prompt after it has
been fully displayed.  If the prompt wraps or if it takes multiple
lines, offhand it seems fine to just pick whatever the final column
happens to be.

Using wchar functions in gdb is a pain; at least in the past,
gdb_wchar.h was written to support systems that don't support these at
all (DJGPP - not sure if that host even builds any more).

Some characters may take multiple columns (see 'wcwidth').  I'd hope
that the display-and-getyx approach would avoid having to have gdb
understand this; though I suppose gdb's pager probably already gets this
wrong.

Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-30 17:03   ` Tom Tromey
@ 2023-05-30 18:07     ` DJ Delorie
  2023-05-31  0:02       ` Tom Tromey
  2023-05-31 11:29     ` Tom de Vries
  2023-06-09  9:48     ` Tom de Vries
  2 siblings, 1 reply; 14+ messages in thread
From: DJ Delorie @ 2023-05-30 18:07 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches, tdevries, tom

Tom Tromey <tom@tromey.com> writes:
> Using wchar functions in gdb is a pain; at least in the past,
> gdb_wchar.h was written to support systems that don't support these at
> all (DJGPP - not sure if that host even builds any more).

The latest gdb build we have is 8.0 but I don't do the builds so I don't
know what the status is for anything newer.

As for wide chars, DJGPP has wchar_t and the required helper functions,
and that's about all the support we added.  I don't think I've ever seen
DJGPP produce anything beyond ASCII or the usual DOS code pages.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-30 18:07     ` DJ Delorie
@ 2023-05-31  0:02       ` Tom Tromey
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Tromey @ 2023-05-31  0:02 UTC (permalink / raw)
  To: DJ Delorie; +Cc: Tom Tromey, gdb-patches, tdevries

DJ> As for wide chars, DJGPP has wchar_t and the required helper functions,
DJ> and that's about all the support we added.  I don't think I've ever seen
DJ> DJGPP produce anything beyond ASCII or the usual DOS code pages.

Maybe we could drop the compatibility code then.  It's hard to really be
sure, the comments refer to DJGPP but we wouldn't necessarily know if
they were in use on some other system.

Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-30 17:03   ` Tom Tromey
  2023-05-30 18:07     ` DJ Delorie
@ 2023-05-31 11:29     ` Tom de Vries
  2023-06-08 22:44       ` Tom de Vries
  2023-06-09  9:48     ` Tom de Vries
  2 siblings, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2023-05-31 11:29 UTC (permalink / raw)
  To: Tom Tromey, Tom de Vries via Gdb-patches

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

On 5/30/23 19:03, Tom Tromey wrote:
>>> In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
>>> at a time using waddch, which apparantly breaks multi-byte char support.
>>> Fix this by detecting multi-byte chars in tui_puts_internal, and
>>> printing them using
>>> waddnstr.
> 
>> FWIW, I just came across this commit, which seems relevant:
> 
> Tom> Note that tui_puts_internal remains.  It is needed to handle computing
> Tom> the start line of the readline prompt, which is difficult to do
> Tom> properly in the case where redisplaying can also cause the command
> Tom> window to scroll.  This might be possible to implement by reverting to
> Tom> single "character" output, by using mbsrtowcs for its side effects to
> Tom> find character boundaries in the input.  I have not attempted this.
> Tom> ...
> 
> I no longer remember what made this difficult.  I wonder if it's
> possible to simply emit as many characters as possible in a single call,
> and then use getyx to figure out the length of the prompt after it has
> been fully displayed.  If the prompt wraps or if it takes multiple
> lines, offhand it seems fine to just pick whatever the final column
> happens to be.

I've given that a try, and that seems to work.

I also realized that we don't cover wrapping prompts in the testsuite, 
so I wrote a test-case ( 
https://sourceware.org/pipermail/gdb-patches/2023-May/199950.html ).

Thanks,
- Tom

[-- Attachment #2: 0001-gdb-tui-Handle-unicode-chars-in-prompt.patch --]
[-- Type: text/x-patch, Size: 6736 bytes --]

From 734e62fb3db7a25a69db0fa25f8820fc2ba88bb7 Mon Sep 17 00:00:00 2001
From: Tom de Vries <tdevries@suse.de>
Date: Wed, 24 May 2023 19:54:34 +0200
Subject: [PATCH] [gdb/tui] Handle unicode chars in prompt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Let's try to set the prompt using a unicode character, say '❯', aka U+276F
(heavy right-pointing angle quotation mark ornament).

This works fine on an xterm with CLI (with X marking the position of the
blinking cursor):
...
$ gdb -q -ex "set prompt GDB❯ "
GDB❯ X
...
but with TUI:
...
$ gdb -q -tui -ex "set prompt GDB❯ "
...
we get instead:
...
GDB  GDB  X
...

We can use the test-case gdb.tui/unicode-prompt.exp to get more details, using
tuiterm.

With Term::dump_screen we have:
...
   16 (gdb) set prompt GDB❯
   17 GDB❯ GDB❯ GDB❯ set prompt (gdb)
   18 (gdb)
...
and with Term::dump_screen_with_attrs (summarizing using attribute sets <attrs1>
and <attrs2>):
...
   16 (gdb) set prompt GDB❯
   17 GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> GDB<attrs1>❯<attrs2> set prompt (gdb)
   18 (gdb)
...
where:
...
<attrs1> == <reverse:1><invisible:1><blinking:1><intensity:bold>
<attrs2> == <reverse:0><invisible:0><blinking:0><intensity:normal>
...

This explains why we didn't see the unicode char on xterm: it's hidden
because the invisible attribute is set.

So, there seem to be two problems:
- the attributes are incorrect, and
- the prompt is repeated a couple of times.

In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
at a time using waddch, which apparantly breaks multi-byte char support.

In contrast, tui_puts splits up the string using separators "\n\1\2\033\t" and
prints out the bits inbetween using waddnstr.

Fix this by:
- factoring out new function tui_puts_1 out of tui_puts, and
- using it in tui_puts_internal.

Tested on x86_64-linux.

Reported-By: wuzy01@qq.com

PR tui/28800
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28800
---
 gdb/testsuite/gdb.tui/unicode-prompt.exp | 45 ++++++++++++++
 gdb/tui/tui-io.c                         | 76 ++++++++++++------------
 2 files changed, 82 insertions(+), 39 deletions(-)
 create mode 100644 gdb/testsuite/gdb.tui/unicode-prompt.exp

diff --git a/gdb/testsuite/gdb.tui/unicode-prompt.exp b/gdb/testsuite/gdb.tui/unicode-prompt.exp
new file mode 100644
index 00000000000..7cd38731e83
--- /dev/null
+++ b/gdb/testsuite/gdb.tui/unicode-prompt.exp
@@ -0,0 +1,45 @@
+# Copyright 2023 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Test a prompt with a unicode char in TUI.
+
+require allow_tui_tests
+
+tuiterm_env
+
+save_vars { env(LC_ALL) } {
+    # Override "C" setting from default_gdb_init.
+    setenv LC_ALL "C.UTF-8"
+
+    Term::clean_restart 24 80
+
+    if {![Term::enter_tui]} {
+	unsupported "TUI not supported"
+	return
+    }
+
+    set unicode_char "\u276F"
+
+    set prompt "GDB$unicode_char "
+    set prompt_re [string_to_regexp $prompt]
+
+    # Set new prompt.
+    send_gdb "set prompt $prompt\n"
+    # Set old prompt back.
+    send_gdb "set prompt (gdb) \n"
+
+    gdb_assert { [Term::wait_for "^${prompt_re}set prompt $gdb_prompt "] } \
+	"prompt with unicode char"
+}
diff --git a/gdb/tui/tui-io.c b/gdb/tui/tui-io.c
index a1eadcd937d..40b40717dbd 100644
--- a/gdb/tui/tui-io.c
+++ b/gdb/tui/tui-io.c
@@ -451,16 +451,18 @@ tui_write (const char *buf, size_t length)
   tui_puts (copy.c_str ());
 }
 
-/* Print a string in the curses command window.  The output is
-   buffered.  It is up to the caller to refresh the screen if
-   necessary.  */
+/* Print a STRING in the curses command window W.  Set *SAW_NL to true if the
+   STRING contains a newline.  */
 
-void
-tui_puts (const char *string, WINDOW *w)
+static void
+tui_puts_1 (const char *string, WINDOW *w, bool *saw_nl)
 {
   if (w == nullptr)
     w = TUI_CMD_WIN->handle.get ();
 
+  if (saw_nl != nullptr)
+    *saw_nl = false;
+
   while (true)
     {
       const char *next = strpbrk (string, "\n\1\2\033\t");
@@ -485,6 +487,10 @@ tui_puts (const char *string, WINDOW *w)
 	  break;
 
 	case '\n':
+	  if (saw_nl != nullptr)
+	    *saw_nl = true;
+	  /* FALLTHROUGH */
+
 	case '\t':
 	  do_tui_putc (w, c);
 	  ++next;
@@ -504,57 +510,49 @@ tui_puts (const char *string, WINDOW *w)
 	  break;
 
 	default:
-	  gdb_assert_not_reached ("missing case in tui_puts");
+	  gdb_assert_not_reached ("missing case in tui_puts_1");
 	}
 
       string = next;
     }
+}
+
+/* Print a string in the curses command window.  The output is
+   buffered.  It is up to the caller to refresh the screen if
+   necessary.  */
+
+void
+tui_puts (const char *string, WINDOW *w)
+{
+  if (w == nullptr)
+    w = TUI_CMD_WIN->handle.get ();
+
+  tui_puts_1 (string, w, nullptr);
 
   if (TUI_CMD_WIN != nullptr && w == TUI_CMD_WIN->handle.get ())
     update_cmdwin_start_line ();
 }
 
+/* Print a STRING in the curses command window W.  Update HEIGHT according to
+   line wraps.  */
+
 static void
 tui_puts_internal (WINDOW *w, const char *string, int *height)
 {
-  char c;
-  int prev_col = 0;
-  bool saw_nl = false;
-
-  while ((c = *string++) != 0)
-    {
-      if (c == '\n')
-	saw_nl = true;
+  bool saw_nl;
+  int prev_line = getcury (w);
 
-      if (c == '\1' || c == '\2')
-	{
-	  /* Ignore these, they are readline escape-marking
-	     sequences.  */
-	}
-      else
-	{
-	  if (c == '\033')
-	    {
-	      size_t bytes_read = apply_ansi_escape (w, string - 1);
-	      if (bytes_read > 0)
-		{
-		  string = string + bytes_read - 1;
-		  continue;
-		}
-	    }
-	  do_tui_putc (w, c);
+  tui_puts_1 (string, w, &saw_nl);
 
-	  if (height != nullptr)
-	    {
-	      int col = getcurx (w);
-	      if (col <= prev_col)
-		++*height;
-	      prev_col = col;
-	    }
-	}
+  if (height != nullptr)
+    {
+      int line = getcury (w);
+      *height += line - prev_line;
     }
+
   if (TUI_CMD_WIN != nullptr && w == TUI_CMD_WIN->handle.get ())
     update_cmdwin_start_line ();
+
   if (saw_nl)
     wrefresh (w);
 }

base-commit: 768d1d879be2d134e049521f28d4d5e03b69bafc
-- 
2.35.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-31 11:29     ` Tom de Vries
@ 2023-06-08 22:44       ` Tom de Vries
  2023-06-09 15:13         ` Tom Tromey
  0 siblings, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2023-06-08 22:44 UTC (permalink / raw)
  To: Tom Tromey, Tom de Vries via Gdb-patches

On 5/31/23 13:29, Tom de Vries wrote:
> On 5/30/23 19:03, Tom Tromey wrote:
>>>> In TUI, the prompt is written out by tui_puts_internal, which 
>>>> outputs one byte
>>>> at a time using waddch, which apparantly breaks multi-byte char 
>>>> support.
>>>> Fix this by detecting multi-byte chars in tui_puts_internal, and
>>>> printing them using
>>>> waddnstr.
>>
>>> FWIW, I just came across this commit, which seems relevant:
>>
>> Tom> Note that tui_puts_internal remains.  It is needed to handle 
>> computing
>> Tom> the start line of the readline prompt, which is difficult to do
>> Tom> properly in the case where redisplaying can also cause the command
>> Tom> window to scroll.  This might be possible to implement by 
>> reverting to
>> Tom> single "character" output, by using mbsrtowcs for its side 
>> effects to
>> Tom> find character boundaries in the input.  I have not attempted this.
>> Tom> ...
>>
>> I no longer remember what made this difficult.  I wonder if it's
>> possible to simply emit as many characters as possible in a single call,
>> and then use getyx to figure out the length of the prompt after it has
>> been fully displayed.  If the prompt wraps or if it takes multiple
>> lines, offhand it seems fine to just pick whatever the final column
>> happens to be.
> 
> I've given that a try, and that seems to work.
> 
> I also realized that we don't cover wrapping prompts in the testsuite, 
> so I wrote a test-case ( 
> https://sourceware.org/pipermail/gdb-patches/2023-May/199950.html ).

I've committed the test-case (excluding the proposed fix for now), with 
an extra check that FAILs in combination with this patch:
...
FAIL: gdb.tui/long-prompt.exp: prompt size == width + 1: end of screen: 
scrolling
...
because we have:
...
    17 (gdb) set prompt 123456789A123456789B123
    18 456789C123456789D>
    19 123456789A123456789B123456789C123456789D
    20 123456789A123456789B123456789C123456789D
    21 123456789A123456789B123456789C123456789D
    22 >set prompt (gdb)
    23 (gdb)
...
instead of the expected:
...
    19 (gdb) set prompt 123456789A123456789B123
    20 456789C123456789D>
    21 123456789A123456789B123456789C123456789D
    22 >set prompt (gdb)
    23 (gdb)
...

The logic I used in this patch:
...
+  if (height != nullptr)
+    {
+      int line = getcury (w);
+      *height += line - prev_line;
      }
...
was to use the current line to detect a wrap but that doesn't work if 
writing the prompt wraps at the last line which then generates a scroll.

This is the reason that a wrap is detected in the original code using a 
reduction in wrap position.

But AFAIU that doesn't work either for this "maximum-string" approach.
If the string is long enough, it's possible to wrap and increase column 
position.

In conclusion, AFAIU this approach doesn't work.

Thanks,
- Tom


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-26 13:56 ` Eli Zaretskii
  2023-05-30 16:51   ` Tom Tromey
@ 2023-06-09  9:34   ` Tom de Vries
  2023-06-09 10:21     ` Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2023-06-09  9:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches, tom

On 5/26/23 15:56, Eli Zaretskii wrote:
>> Cc: Tom Tromey <tom@tromey.com>
>> Date: Fri, 26 May 2023 15:25:12 +0200
>> From: Tom de Vries via Gdb-patches <gdb-patches@sourceware.org>
>>
>> +/* Return true if STRING starts with a multi-byte char.  Return the length of
>> +   the multi-byte char in LEN, or 0 in case it's a multi-byte null char.
>> +   Implementation based on _rl_read_mbchar.  */
>> +
>> +static bool
>> +is_mb_char (const char *string, int &len)
>> +{
>> +  for (len = 1; len <= MB_CUR_MAX; len++)
>> +    {
>> +      size_t res;
>> +
>> +      {
>> +	wchar_t wc;  <<<<<<<<<<<<<<<<<<<<<<<
>> +	mbstate_t ps;
>> +	memset (&ps, 0, sizeof (mbstate_t));
>> +	res = mbrtowc (&wc, string, len, &ps);
> 
> The above assumes each call to mbrtowc produces only one wchar_t
> value.  But that's non-portable: on MS-Windows wchar_t is a 16-bit
> wide data type, and wchar_t "wide characters" are actually encoded in
> UTF-16.  So characters beyond the BMP will yield 2 wchar_t values, not
> one.
> 

Hi Eli,

I see, thanks for pointing that out.  I've fixed this by using nullptr 
instead of &wc.

> One additional caveat: "multibyte" != "UTF-8".  There's more than one
> multibyte encoding, and the current locale could use some non-UTF-8
> encoding instead.  For example, some encoding of the ISO-2022 family.
> I'm not sure what this means for the issue at hand.
> 

AFAIU, interpreting the currently locale and encoding correctly is up to 
mbrtowc, so as long as it does that correctly I think there's no problem.

> Yet another consideration is whether tui_puts_internal is used for
> outputting text in the target charset, in which case you may have
> problems with using mbrtowc, because AFAIK that supports only the
> current locale's codeset.  If the target charset is different from the
> locale's (basically, the host) charset, and we don't convert one to
> the other before calling tui_puts_internal, mbrtowc will fail.
> 

[ Addressed by Tom Tromey in this thread. ]

> Yes, this is a mess.
> 

Indeed :)

V2 posted here ( 
https://sourceware.org/pipermail/gdb-patches/2023-June/200181.html ).

Thanks,
- Tom


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-05-30 17:03   ` Tom Tromey
  2023-05-30 18:07     ` DJ Delorie
  2023-05-31 11:29     ` Tom de Vries
@ 2023-06-09  9:48     ` Tom de Vries
  2023-06-09 15:15       ` Tom Tromey
  2 siblings, 1 reply; 14+ messages in thread
From: Tom de Vries @ 2023-06-09  9:48 UTC (permalink / raw)
  To: Tom Tromey, Tom de Vries via Gdb-patches

On 5/30/23 19:03, Tom Tromey wrote:
>>> In TUI, the prompt is written out by tui_puts_internal, which outputs one byte
>>> at a time using waddch, which apparantly breaks multi-byte char support.
>>> Fix this by detecting multi-byte chars in tui_puts_internal, and
>>> printing them using
>>> waddnstr.
> 
>> FWIW, I just came across this commit, which seems relevant:
> 
> Tom> Note that tui_puts_internal remains.  It is needed to handle computing
> Tom> the start line of the readline prompt, which is difficult to do
> Tom> properly in the case where redisplaying can also cause the command
> Tom> window to scroll.  This might be possible to implement by reverting to
> Tom> single "character" output, by using mbsrtowcs for its side effects to
> Tom> find character boundaries in the input.  I have not attempted this.
> Tom> ...
> 
> I no longer remember what made this difficult.  I wonder if it's
> possible to simply emit as many characters as possible in a single call,
> and then use getyx to figure out the length of the prompt after it has
> been fully displayed.  If the prompt wraps or if it takes multiple
> lines, offhand it seems fine to just pick whatever the final column
> happens to be.
> 
> 
> Using wchar functions in gdb is a pain; at least in the past,
> gdb_wchar.h was written to support systems that don't support these at
> all (DJGPP - not sure if that host even builds any more).
> 
> Some characters may take multiple columns (see 'wcwidth').  I'd hope
> that the display-and-getyx approach would avoid having to have gdb
> understand this; though I suppose gdb's pager probably already gets this
> wrong.

Thanks for the pointer.

In v2 I've used #ifdef HAVE_BTOWC to guard the use of mbrtowc, with a 
reference to gdb_wchar.h.

[ Though I do wonder whether we could rely on the c++ stdlib instead and 
just use std::mbrtowc.  ]

I've also fixed a bug, the v1 version didn't take care of wrapping due 
to printing a multi-byte character.

I've added a simplification patch to make the structure of the function 
easier to understand, making that bug easier to spot.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-06-09  9:34   ` Tom de Vries
@ 2023-06-09 10:21     ` Eli Zaretskii
  0 siblings, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2023-06-09 10:21 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gdb-patches, tom

> Date: Fri, 9 Jun 2023 11:34:28 +0200
> Cc: gdb-patches@sourceware.org, tom@tromey.com
> From: Tom de Vries <tdevries@suse.de>
> 
> > One additional caveat: "multibyte" != "UTF-8".  There's more than one
> > multibyte encoding, and the current locale could use some non-UTF-8
> > encoding instead.  For example, some encoding of the ISO-2022 family.
> > I'm not sure what this means for the issue at hand.
> > 
> 
> AFAIU, interpreting the currently locale and encoding correctly is up to 
> mbrtowc, so as long as it does that correctly I think there's no problem.

Depends on how and for what purposes will the is_mb_char function be
used, I guess.  If it is used to mean "is this a Unicode character
encoded in UTF-8", then the results might not be what the caller
expects.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-06-08 22:44       ` Tom de Vries
@ 2023-06-09 15:13         ` Tom Tromey
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Tromey @ 2023-06-09 15:13 UTC (permalink / raw)
  To: Tom de Vries via Gdb-patches; +Cc: Tom Tromey, Tom de Vries

>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:

Sorry, I missed this email when looking at your latest TUI patches.

Tom> The logic I used in this patch:
Tom> ...
Tom> +  if (height != nullptr)
Tom> +    {
Tom> +      int line = getcury (w);
Tom> +      *height += line - prev_line;
Tom>      }
Tom> ...
Tom> was to use the current line to detect a wrap but that doesn't work if
Tom> writing the prompt wraps at the last line which then generates a
Tom> scroll.

Yeah, I see.  This whole thing is maddening to me because ncurses does
have a flag that indicates that the output wrapped, and while it is in
curses.h, it is clearly non-exported, there's not even an ncurses
extension for it.  And, PDcurses does not have this.

Tom> In conclusion, AFAIU this approach doesn't work.

Yeah :(

Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] [gdb/tui] Handle unicode chars in prompt
  2023-06-09  9:48     ` Tom de Vries
@ 2023-06-09 15:15       ` Tom Tromey
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Tromey @ 2023-06-09 15:15 UTC (permalink / raw)
  To: Tom de Vries via Gdb-patches; +Cc: Tom Tromey, Tom de Vries

Tom> [ Though I do wonder whether we could rely on the c++ stdlib instead
Tom> and just use std::mbrtowc.  ]

I don't know.  We learned with std::thread that not all host ports are
equal, so I think we'd just have to try it and see.

Tom

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-06-09 15:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-26 13:25 [PATCH] [gdb/tui] Handle unicode chars in prompt Tom de Vries
2023-05-26 13:56 ` Eli Zaretskii
2023-05-30 16:51   ` Tom Tromey
2023-06-09  9:34   ` Tom de Vries
2023-06-09 10:21     ` Eli Zaretskii
2023-05-26 15:44 ` Tom de Vries
2023-05-30 17:03   ` Tom Tromey
2023-05-30 18:07     ` DJ Delorie
2023-05-31  0:02       ` Tom Tromey
2023-05-31 11:29     ` Tom de Vries
2023-06-08 22:44       ` Tom de Vries
2023-06-09 15:13         ` Tom Tromey
2023-06-09  9:48     ` Tom de Vries
2023-06-09 15:15       ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).