public inbox for gdb-prs@sourceware.org
* [Bug remote/30618] New: warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: jonah at kichwacoders dot com @ 2023-07-05 18:52 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

            Bug ID: 30618
           Summary: warning: while parsing threads: not well-formed
                    (invalid token) - in non-stop + remote mode
           Product: gdb
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: remote
          Assignee: unassigned at sourceware dot org
          Reporter: jonah at kichwacoders dot com
  Target Milestone: ---

Create an empty main function in a file whose name contains Unicode characters
and compile it with gcc.  Start gdbserver and connect to it with gdb in
non-stop mode, and the connection sequence fails (full log below):

(gdb) set non-stop on
(gdb) target remote :3333
Remote debugging using :3333
warning: while parsing threads: not well-formed (invalid token)
The target is not running (try extended-remote?)


With remote debugging enabled, this is the output (run in MI mode because the
characters are escaped better):

&"  [remote] Sending packet: $QNonStop:1#8d\n"
&"  [remote] Packet received: OK\n"
&"  [remote] Sending packet: $qXfer:threads:read::0,1000#92\n"
&"  [remote] Packet received: l<threads>\\n<thread id=\"p10883.10883\" core=\"8\" name=\"issue-275-\\346\\265\\213\\350\\257\"/>\\n</threads>\\n\n"
&"warning: while parsing threads: not well-formed (invalid token)\n"
&"  [remote] Sending packet: $qTStatus#49\n"
&"  [remote] Packet received: T0;tnotrun:0;tframes:0;tcreated:0;tfree:500000;tsize:500000;circular:0;disconn:0;starttime:0;stoptime:0;username:;notes::\n"
&"  [remote] packet_ok: Packet qTStatus (trace-status) is supported\n"
&"  [remote] Sending packet: $qTfV#81\n"
&"  [remote] Packet received: 1:0:1:74726163655f74696d657374616d70\n"
&"  [remote] Sending packet: $qTsV#8e\n"
&"  [remote] Packet received: l\n"
=tsv-created,name="trace_timestamp",initial="0"
&"  [remote] Sending packet: $?#3f\n"
&"  [remote] Packet received: T0506:0000000000000000;07:90daffffff7f0000;10:b032fef7ff7f0000;thread:p10883.10883;core:8;\n"
&"  [remote] Sending packet: $vStopped#55\n"
&"  [remote] Packet received: OK\n"
&"[remote] start_remote_1: exit\n"


Here is the source and versions I am using:

$ cat src/integration-tests/test-programs/issue-275-测试.c 
int main(int argc, char *argv[])
{
    return 0;
}
$ gcc -o src/integration-tests/test-programs/issue-275-测试 -g src/integration-tests/test-programs/issue-275-测试.c
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gdb --version
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

In case Bugzilla corrupts the encoding: 测试 means "test"
(https://translate.google.ca/?sl=auto&tl=en&text=%E6%B5%8B%E8%AF%95&op=translate)
and is encoded in UTF-8 as \xe6\xb5\x8b\xe8\xaf\x95 or \346\265\213\350\257\225.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: tromey at sourceware dot org @ 2023-07-06 22:22 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-07-06
                 CC|                            |tromey at sourceware dot org

--- Comment #1 from Tom Tromey <tromey at sourceware dot org> ---
I debugged this a little, and the issue is that the Linux kernel
truncates the 'comm' file at 16 bytes (including the trailing \0).
This truncates the final
character in the name -- yielding an invalid UTF-8 sequence, which
gdbserver dutifully passes back to gdb.

I am not sure how to handle this.

One idea is to convert all non-ASCII characters to hex.
Or just drop them.
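
To make the truncation concrete: the name "issue-275-测试" is 16 bytes in
UTF-8 (10 ASCII bytes plus two 3-byte characters), but 'comm' holds only 15
bytes plus the terminating \0, so the last byte of 试 is cut off.  The log in
the report matches this: the name ends in \350\257 rather than \350\257\225.
The C sketch below measures the longest prefix made of complete UTF-8
sequences.  The names COMM_MAX_BYTES, utf8_sequence_length and
utf8_complete_prefix are made up for this illustration and are not taken from
gdb or gdbserver, and the check is structural only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linux keeps a thread name in a 16-byte buffer: 15 bytes plus '\0'.  */
#define COMM_MAX_BYTES 15

/* Return the number of bytes a UTF-8 sequence should occupy, judged from
   its lead byte, or 0 if the byte cannot start a sequence.  */
static size_t
utf8_sequence_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if ((lead & 0xe0) == 0xc0)
    return 2;
  if ((lead & 0xf0) == 0xe0)
    return 3;
  if ((lead & 0xf8) == 0xf0)
    return 4;
  return 0;
}

/* Return the length of the longest prefix of BUF[0..LEN) that consists
   only of complete UTF-8 sequences.  This checks structure only; it does
   not reject overlong forms or surrogates.  */
static size_t
utf8_complete_prefix (const char *buf, size_t len)
{
  size_t i = 0;

  assert (buf != NULL);
  while (i < len)
    {
      size_t n = utf8_sequence_length ((unsigned char) buf[i]);

      /* Stop at a byte that cannot start a sequence, or at a sequence
         that would run past the end of the buffer.  */
      if (n == 0 || i + n > len)
        break;
      /* Every byte after the lead byte must be a continuation byte.  */
      for (size_t k = 1; k < n; k++)
        if (((unsigned char) buf[i + k] & 0xc0) != 0x80)
          return i;
      i += n;
    }
  return i;
}
```

For the 16-byte name above, utf8_complete_prefix (name, COMM_MAX_BYTES)
returns 13: the 10 ASCII bytes and the 3 bytes of 测 are intact, and the 2
remaining bytes are the dangling start of 试.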


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: tromey at sourceware dot org @ 2023-07-13 16:20 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

--- Comment #2 from Tom Tromey <tromey at sourceware dot org> ---
Since this is Linux-specific we could probably just rely
directly on iconv here -- iconv the 'comm' contents to
UTF-8 and drop / substitute anything that gives an error.
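
A rough sketch of that idea in C, not the patch that was eventually
committed.  It assumes the name is meant to be UTF-8 and relies on iconv
rejecting malformed or incomplete input with EILSEQ or EINVAL when converting
UTF-8 to UTF-8, as glibc does.  The helper name filter_thread_name is
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not gdbserver code.  Copy the NUL-terminated
   string IN to OUT, replacing every byte that iconv rejects with '?'.
   OUT must have room for strlen (IN) + 1 bytes.  Assumes IN is meant
   to be UTF-8.  Returns 0 on success, -1 on failure.  */
static int
filter_thread_name (const char *in, char *out)
{
  assert (in != NULL && out != NULL);

  iconv_t cd = iconv_open ("UTF-8", "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  char *src = (char *) in;	/* iconv does not modify the input.  */
  size_t src_left = strlen (in);
  char *dst = out;
  size_t dst_left = src_left;
  int status = 0;

  while (src_left > 0)
    {
      /* On success the whole remaining input has been converted.  */
      if (iconv (cd, &src, &src_left, &dst, &dst_left) != (size_t) -1)
        break;

      /* EILSEQ: invalid byte.  EINVAL: sequence cut off at the end.
         Anything else is unexpected here.  */
      if ((errno != EILSEQ && errno != EINVAL) || dst_left == 0)
        {
          status = -1;
          break;
        }

      /* Substitute one byte and resume after it.  */
      *dst++ = '?';
      dst_left--;
      src++;
      src_left--;
      iconv (cd, NULL, NULL, NULL, NULL);	/* Reset conversion state.  */
    }

  *dst = '\0';
  iconv_close (cd);
  return status;
}
```

With the truncated name from the report, this should keep "issue-275-测" and
turn the two dangling bytes \350\257 into "??".  Dropping the bytes instead
of substituting would only mean not writing the '?'.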


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: tromey at sourceware dot org @ 2023-07-13 21:26 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

--- Comment #3 from Tom Tromey <tromey at sourceware dot org> ---
One other issue here is knowing the correct encoding to use.
gdb itself can pass in target_charset().
I guess gdbserver could use the prevailing encoding from the locale.

I wonder if we even care about non-ASCII characters here.
What if we substitute '?' for those instead?
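
The '?' substitution needs no knowledge of the encoding at all, which is its
appeal.  A minimal sketch in C, where ascii_only_name is a hypothetical name
and not a function from gdb or gdbserver:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not gdbserver code.  Overwrite every byte of the
   NUL-terminated string NAME that is outside the ASCII range with '?'.
   The string is modified in place and keeps its length.  */
static void
ascii_only_name (char *name)
{
  assert (name != NULL);
  for (char *p = name; *p != '\0'; p++)
    if ((unsigned char) *p >= 0x80)
      *p = '?';
}
```

Because the replacement is per byte, the truncated name from the report
becomes "issue-275-?????": three '?' for 测 and two for the cut-off 试.  The
result is always valid XML text content as far as encoding goes, at the cost
of losing every non-ASCII character rather than only the broken one.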


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: tromey at sourceware dot org @ 2023-07-17 20:48 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

--- Comment #4 from Tom Tromey <tromey at sourceware dot org> ---
https://sourceware.org/pipermail/gdb-patches/2023-July/200971.html


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: jonah at kichwacoders dot com @ 2023-07-19 17:40 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

--- Comment #5 from Jonah Graham <jonah at kichwacoders dot com> ---
> This truncates the final
> character in the name -- yielding an invalid UTF-8 sequence, which
> gdbserver dutifully passes back to gdb.

Thanks Tom.  With this explanation I was able to craft my test in
cdt-gdb-adapter, where I am trying to improve Unicode support, so that it
avoids this bug: https://github.com/eclipse-cdt-cloud/cdt-gdb-adapter/pull/276


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: cvs-commit at gcc dot gnu.org @ 2023-11-14 16:14 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

--- Comment #6 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom Tromey <tromey@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=07b3255c3bae7126a0d679f957788560351eb236

commit 07b3255c3bae7126a0d679f957788560351eb236
Author: Tom Tromey <tom@tromey.com>
Date:   Thu Jul 13 17:28:48 2023 -0600

    Filter invalid encodings from Linux thread names

    On Linux, a thread name can only be 16 bytes (including the trailing \0).
    A user sent in a test case where this causes a truncated UTF-8
    sequence, causing gdbserver to create invalid XML.

    I went back and forth about different ways to solve this, and in the
    end decided to fix it in gdbserver, with the reason being that it
    seems important to generate correct XML for the <thread> response.

    I am not totally sure whether the call to setlocale could have
    unplanned consequences.  This is needed, though, for nl_langinfo to
    return the correct result.

    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30618
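
The dependency between setlocale and nl_langinfo mentioned in the commit
message can be seen in isolation.  Until setlocale is called, a C program
runs in the "C" locale, and nl_langinfo (CODESET) describes that locale
rather than the one selected by the user's environment.  A minimal sketch,
where locale_codeset is a hypothetical name and not the code added by the
commit:

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper, not gdbserver code.  Adopt the LC_CTYPE setting
   from the environment and return the name of its character encoding.
   If setlocale fails, the "C" locale stays in effect and its codeset is
   returned instead.  */
static const char *
locale_codeset (void)
{
  setlocale (LC_CTYPE, "");

  const char *codeset = nl_langinfo (CODESET);
  assert (codeset != NULL);	/* Checked defensively.  */
  return codeset;
}
```

In a UTF-8 locale such as en_US.UTF-8 this typically yields "UTF-8".  Note
that the setlocale call changes process-wide state, which is presumably the
kind of unplanned consequence the commit message is wary of.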


* [Bug remote/30618] warning: while parsing threads: not well-formed (invalid token) - in non-stop + remote mode
From: tromey at sourceware dot org @ 2023-11-15 13:53 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=30618

Tom Tromey <tromey at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |15.1
         Resolution|---                         |FIXED

--- Comment #7 from Tom Tromey <tromey at sourceware dot org> ---
Fixed.

