public inbox for gdb-patches@sourceware.org
* [RFC] Filter invalid encodings from Linux thread names
@ 2023-07-17 20:47 Tom Tromey
  2023-09-12 16:11 ` Tom Tromey
  2023-11-14 16:01 ` Tom Tromey
  0 siblings, 2 replies; 4+ messages in thread
From: Tom Tromey @ 2023-07-17 20:47 UTC
  To: gdb-patches; +Cc: Tom Tromey

On Linux, a thread name can only be 16 bytes (including the trailing \0).
A user sent in a test case where this causes a truncated UTF-8
sequence, causing gdbserver to create invalid XML.
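
As an illustration of the truncation (a minimal sketch, not part of the
patch; comm_len, truncate_like_comm and has_bad_utf8_tail are
hypothetical names), the following cuts a name at 15 bytes the way a
fixed-size buffer would, then checks whether the result ends in a
broken UTF-8 sequence:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed limit for this sketch: 16 bytes including the trailing NUL.
constexpr std::size_t comm_len = 16;

// Truncate NAME the way a fixed-size buffer would: keep at most
// comm_len - 1 bytes, with no regard for character boundaries.
std::string
truncate_like_comm (const std::string &name)
{
  std::string result = name.substr (0, comm_len - 1);
  assert (result.size () < comm_len);
  return result;
}

// Return true if the tail of S is not a complete UTF-8 sequence.
// Only the last character is examined; this is not a full validator.
bool
has_bad_utf8_tail (const std::string &s)
{
  // Walk back over trailing continuation bytes (10xxxxxx), at most 3.
  std::size_t i = s.size ();
  std::size_t continuation = 0;
  while (i > 0 && continuation < 3
         && (static_cast<unsigned char> (s[i - 1]) & 0xC0) == 0x80)
    {
      --i;
      ++continuation;
    }
  if (i == 0)
    return continuation > 0;

  // The byte before the continuation bytes says how many to expect.
  unsigned char lead = static_cast<unsigned char> (s[i - 1]);
  std::size_t expected;
  if (lead < 0x80)
    expected = 0;
  else if ((lead & 0xE0) == 0xC0)
    expected = 1;
  else if ((lead & 0xF0) == 0xE0)
    expected = 2;
  else if ((lead & 0xF8) == 0xF0)
    expected = 3;
  else
    return true;	// Not a valid lead byte.
  return continuation != expected;
}
```

For instance, fourteen ASCII bytes followed by "é" (0xC3 0xA9) make a
16-byte name, so the cut keeps the 0xC3 lead byte and drops its
continuation byte.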

I went back and forth about different ways to solve this, and in the
end decided to fix it in gdbserver, with the reason being that it
seems important to generate correct XML for the <thread> response.

I am not totally sure whether the call to setlocale could have
unplanned consequences.  This is needed, though, for nl_langinfo to
return the correct result.
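
For context on that dependency: a program starts in the "C" locale, so
nl_langinfo (CODESET) only reflects the user's environment once
setlocale has run.  A minimal sketch of the interaction, using a
hypothetical helper named codeset_for_locale (this is not code from the
patch):

```cpp
#include <cassert>
#include <clocale>
#include <langinfo.h>
#include <string>

// Return the codeset name for the LC_CTYPE locale LOCALE_NAME ("" means
// "take it from the environment"), or an empty string if that locale
// cannot be selected.  Note that this changes the process-wide locale.
std::string
codeset_for_locale (const char *locale_name)
{
  assert (locale_name != nullptr);
  if (std::setlocale (LC_CTYPE, locale_name) == nullptr)
    return {};
  return nl_langinfo (CODESET);
}
```

Calling codeset_for_locale ("") corresponds to what the patch does in
main; the exact string returned depends on the C library and on the
environment, so callers should not compare it against a fixed name.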

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30618
---
 gdbserver/linux-low.cc | 59 ++++++++++++++++++++++++++++++++++++++++--
 gdbserver/server.cc    |  1 +
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/gdbserver/linux-low.cc b/gdbserver/linux-low.cc
index 651f219b738..98bb345b415 100644
--- a/gdbserver/linux-low.cc
+++ b/gdbserver/linux-low.cc
@@ -38,14 +38,16 @@
 #include <unistd.h>
 #include <sys/syscall.h>
 #include <sched.h>
-#include <ctype.h>
 #include <pwd.h>
 #include <sys/types.h>
 #include <dirent.h>
 #include <sys/stat.h>
 #include <sys/vfs.h>
 #include <sys/uio.h>
+#include <langinfo.h>
+#include <iconv.h>
 #include "gdbsupport/filestuff.h"
+#include "gdbsupport/gdb-safe-ctype.h"
 #include "tracepoint.h"
 #include <inttypes.h>
 #include "gdbsupport/common-inferior.h"
@@ -6898,10 +6900,63 @@ current_lwp_ptid (void)
   return ptid_of (current_thread);
 }
 
+/* A helper function that copies NAME to DEST, replacing non-printable
+   characters with '?'.  Returns DEST as a convenience.  */
+
+static const char *
+replace_non_ascii (char *dest, const char *name)
+{
+  char *out = dest;
+
+  while (*name != '\0')
+    {
+      *out++ = ISPRINT (*name) ? *name : '?';
+      ++name;
+    }
+  *out = '\0';
+  return dest;
+}
+
 const char *
 linux_process_target::thread_name (ptid_t thread)
 {
-  return linux_proc_tid_get_name (thread);
+  static char dest[100];
+
+  const char *name = linux_proc_tid_get_name (thread);
+  if (name == nullptr)
+    return nullptr;
+
+  /* Linux limits the comm file to 16 bytes (including the trailing
+     \0).  If the program or thread name is set when using a multi-byte
+     encoding, this might cause it to be truncated mid-character.  In
+     this situation, sending the truncated form in an XML <thread>
+     response will cause a parse error in gdb.  So, instead convert
+     from the locale's encoding (we can't be sure this is the correct
+     encoding, but it's as good a guess as we have) to UTF-8, but in a
+     way that ignores any encoding errors.  See PR remote/30618.  */
+  const char *cset = nl_langinfo (CODESET);
+  iconv_t handle = iconv_open ("UTF-8//IGNORE", cset);
+  if (handle == (iconv_t) -1)
+    return replace_non_ascii (dest, name);
+
+  size_t inbytes = strlen (name);
+  char *inbuf = const_cast<char *> (name);
+  size_t outbytes = sizeof (dest);
+  char *outbuf = dest;
+  size_t result = iconv (handle, &inbuf, &inbytes, &outbuf, &outbytes);
+
+  if (result == (size_t) -1)
+    {
+      if (errno == E2BIG)
+	outbuf = &dest[sizeof (dest) - 1];
+      else if ((errno == EILSEQ || errno == EINVAL)
+	       && outbuf < &dest[sizeof (dest) - 2])
+	*outbuf++ = '?';
+    }
+  *outbuf = '\0';
+
+  iconv_close (handle);
+  return *dest == '\0' ? nullptr : dest;
 }
 
 #if USE_THREAD_DB
diff --git a/gdbserver/server.cc b/gdbserver/server.cc
index c57270175b4..f6eb01af204 100644
--- a/gdbserver/server.cc
+++ b/gdbserver/server.cc
@@ -4056,6 +4056,7 @@ captured_main (int argc, char *argv[])
 int
 main (int argc, char *argv[])
 {
+  setlocale (LC_CTYPE, "");
 
   try
     {
-- 
2.41.0
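
The conversion step in the patch can also be tried outside gdbserver.
The following is a minimal standalone sketch under stated assumptions,
not the gdbserver code: sanitize_thread_name is a hypothetical name,
the source codeset is passed in explicitly instead of coming from
nl_langinfo, and "//IGNORE" is an extension that not every iconv
implementation accepts.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <string>

// Convert NAME from FROM_CODESET to UTF-8, dropping what cannot be
// converted.  If iconv reports an invalid or incomplete sequence, a
// single '?' is appended to the converted text.
std::string
sanitize_thread_name (const std::string &name, const char *from_codeset)
{
  assert (from_codeset != nullptr);

  iconv_t handle = iconv_open ("UTF-8//IGNORE", from_codeset);
  if (handle == (iconv_t) -1)
    {
      // No converter available: fall back to printable ASCII only.
      std::string ascii;
      for (unsigned char c : name)
        ascii += (c >= 0x20 && c < 0x7f) ? static_cast<char> (c) : '?';
      return ascii;
    }

  char out[100];
  char *inbuf = const_cast<char *> (name.data ());
  std::size_t inbytes = name.size ();
  char *outbuf = out;
  std::size_t outbytes = sizeof (out);
  std::size_t result = iconv (handle, &inbuf, &inbytes, &outbuf, &outbytes);
  int saved_errno = errno;	// iconv_close may clobber errno.
  iconv_close (handle);

  // Keep whatever was converted before any error was reported.
  std::string converted (out, outbuf - out);
  if (result == (std::size_t) -1
      && (saved_errno == EILSEQ || saved_errno == EINVAL))
    converted += '?';
  return converted;
}
```

Returning a std::string avoids the static buffer and the explicit
terminator handling, at the cost of an allocation per call.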


* Re: [RFC] Filter invalid encodings from Linux thread names
  2023-07-17 20:47 [RFC] Filter invalid encodings from Linux thread names Tom Tromey
@ 2023-09-12 16:11 ` Tom Tromey
  2023-11-14 16:01 ` Tom Tromey
  1 sibling, 0 replies; 4+ messages in thread
From: Tom Tromey @ 2023-09-12 16:11 UTC
  To: Tom Tromey; +Cc: gdb-patches

>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:

Tom> On Linux, a thread name can only be 16 bytes (including the trailing \0).
Tom> A user sent in a test case where this causes a truncated UTF-8
Tom> sequence, causing gdbserver to create invalid XML.

Tom> I went back and forth about different ways to solve this, and in the
Tom> end decided to fix it in gdbserver, with the reason being that it
Tom> seems important to generate correct XML for the <thread> response.

Tom> I am not totally sure whether the call to setlocale could have
Tom> unplanned consequences.  This is needed, though, for nl_langinfo to
Tom> return the correct result.

Tom> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30618

I'd appreciate comments on this one.
Eventually I guess I'll check it in, though.

Tom

* Re: [RFC] Filter invalid encodings from Linux thread names
  2023-07-17 20:47 [RFC] Filter invalid encodings from Linux thread names Tom Tromey
  2023-09-12 16:11 ` Tom Tromey
@ 2023-11-14 16:01 ` Tom Tromey
  2023-11-27 21:01   ` Simon Marchi
  1 sibling, 1 reply; 4+ messages in thread
From: Tom Tromey @ 2023-11-14 16:01 UTC
  To: Tom Tromey; +Cc: gdb-patches

>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:

Tom> On Linux, a thread name can only be 16 bytes (including the trailing \0).
Tom> A user sent in a test case where this causes a truncated UTF-8
Tom> sequence, causing gdbserver to create invalid XML.

Tom> I went back and forth about different ways to solve this, and in the
Tom> end decided to fix it in gdbserver, with the reason being that it
Tom> seems important to generate correct XML for the <thread> response.

Tom> I am not totally sure whether the call to setlocale could have
Tom> unplanned consequences.  This is needed, though, for nl_langinfo to
Tom> return the correct result.

Tom> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30618

I'm going to check this in now.

Tom

* Re: [RFC] Filter invalid encodings from Linux thread names
  2023-11-14 16:01 ` Tom Tromey
@ 2023-11-27 21:01   ` Simon Marchi
  0 siblings, 0 replies; 4+ messages in thread
From: Simon Marchi @ 2023-11-27 21:01 UTC
  To: Tom Tromey; +Cc: gdb-patches

On 11/14/23 11:01, Tom Tromey wrote:
>>>>>> "Tom" == Tom Tromey <tom@tromey.com> writes:
> 
> Tom> On Linux, a thread name can only be 16 bytes (including the trailing \0).
> Tom> A user sent in a test case where this causes a truncated UTF-8
> Tom> sequence, causing gdbserver to create invalid XML.
> 
> Tom> I went back and forth about different ways to solve this, and in the
> Tom> end decided to fix it in gdbserver, with the reason being that it
> Tom> seems important to generate correct XML for the <thread> response.
> 
> Tom> I am not totally sure whether the call to setlocale could have
> Tom> unplanned consequences.  This is needed, though, for nl_langinfo to
> Tom> return the correct result.
> 
> Tom> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30618
> 
> I'm going to check this in now.
> 
> Tom

Hi Tom,

I see:

  $ make check TESTS="gdb.threads/names.exp" RUNTESTFLAGS="--target_board=native-extended-gdbserver"
  FAIL: gdb.threads/names.exp: list threads

I also see:

  FAIL: gdb.ada/tasks.exp: info threads

I didn't bisect that one, but I guess it's related.

Simon
