* UTF-8 not working for MI?
From: Marc Khouzam @ 2010-08-19 17:25 UTC (permalink / raw)
To: 'gdb@sourceware.org'
Hi,
I've noticed that in MI mode, gdb does not show UTF-8 characters properly.
Please see the short session comparison below.
Surprisingly, I get proper output when using an MI command from CLI mode.
Is there a setting I'm supposed to turn on when using MI mode, or is this
a bug? If it is a bug, I can open a PR.
Thanks
Marc
Program
#include <wchar.h>   /* needed for wchar_t when built as C; built in for C++ */
wchar_t a = 0xe4; // ä
int main()
{
  return 0;
}
> gdb -nx a.out
GNU gdb (GDB) 7.2.50.20100816-cvs
(gdb) p a
$1 = 228 L'ä'
(gdb) interpreter-exec mi "-var-create - * a"
^done,name="var1",numchild="0",value="228 L'ä'",type="wchar_t",has_more="0"
(gdb) show charset
The host character set is "auto; currently UTF-8".
The target character set is "auto; currently UTF-8".
The target wide character set is "auto; currently UTF-32".
=====
> gdb -i mi -nx a.out
=thread-group-added,id="i1"
~"GNU gdb (GDB) 7.2.50.20100816-cvs\n"
(gdb) -gdb-show charset
^done,value="auto"
(gdb) p a
&"p a\n"
~"$1 = 228 L'\303\244'"
~"\n"
^done
(gdb) -var-create - * a
^done,name="var1",numchild="0",value="228 L'\303\244'",type="wchar_t",has_more="0"
(gdb) show charset
&"show charset \n"
~"The host character set is \"auto; currently UTF-8\".\n"
~"The target character set is \"auto; currently UTF-8\".\n"
~"The target wide character set is \"auto; currently UTF-32\".\n"
^done
* Re: UTF-8 not working for MI?
From: Tom Tromey @ 2010-08-19 18:20 UTC (permalink / raw)
To: Marc Khouzam; +Cc: 'gdb@sourceware.org'
>>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:
Marc> I've noticed that in MI mode, gdb does not show UTF-8 characters
Marc> properly.
Marc> The host character set is "auto; currently UTF-8".
Marc> ~"$1 = 228 L'\303\244'"
\303\244 is the octal-escaped UTF-8 encoding of U+00E4 (ä).
I can't readily find where the escaping is done, but the MI docs say:
`C-STRING ==>'
`""" SEVEN-BIT-ISO-C-STRING-CONTENT """'
which I presume means that readers must do this decoding.
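For a front end, that decoding step could look roughly like the sketch
below. It is only an illustration and not taken from any existing MI
client: the name decode_mi_cstring is hypothetical, and it assumes the
bytes behind the octal escapes are in gdb's host charset (UTF-8 in the
session above).

```python
import re

# One backslash followed by up to three octal digits, or by any single
# character (for escapes such as \n, \" and \\).
_ESCAPE = re.compile(rb"\\([0-3][0-7]{2}|[0-7]{1,2}|.)", re.DOTALL)
_SIMPLE = {b"n": b"\n", b"t": b"\t", b"r": b"\r"}


def decode_mi_cstring(content: str, host_charset: str = "utf-8") -> str:
    """Undo the C-style escapes in the content of an MI c-string.

    `content` is the text between the surrounding double quotes.
    `host_charset` is an assumption supplied by the caller.
    """

    def unescape(match):
        token = match.group(1)
        if token[:1] in b"01234567":
            # Octal escape: one raw byte of the host-charset encoding.
            return bytes([int(token, 8)])
        # Known letter escapes map to control bytes; anything else
        # (such as \" or \\) stands for the character itself.
        return _SIMPLE.get(token, token)

    # MI c-strings are seven-bit, so encoding as ASCII is expected to work.
    raw = _ESCAPE.sub(unescape, content.encode("ascii"))
    return raw.decode(host_charset)


print(decode_mi_cstring(r"228 L'\303\244'"))  # 228 L'ä'
```

The escapes are rebuilt into bytes first and decoded afterwards, because
a single character such as ä arrives as two separate octal escapes.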
Tom
* RE: UTF-8 not working for MI?
From: Marc Khouzam @ 2010-08-20 1:57 UTC (permalink / raw)
To: Tom Tromey; +Cc: 'gdb@sourceware.org'
Sorry for the poorly formatted reply: I'm on a dumb web client.
> Tom Tromey wrote:
>
> I can't readily find where the escaping is done, but the MI docs say:
> `C-STRING ==>'
> `""" SEVEN-BIT-ISO-C-STRING-CONTENT """'
> which I presume means that readers must do this decoding.
I guess that will have to be my solution, but it still seems suspicious
since running this MI command from CLI works (notice the value field):
(gdb) interpreter-exec mi "-var-create - * a"
^done,name="var1",numchild="0",value="228 L'ä'",type="wchar_t",has_more="0"
Or maybe using "interpreter-exec mi" does not quite give the true MI interpreter?
Marc
* Re: UTF-8 not working for MI?
From: Tom Tromey @ 2010-08-20 18:51 UTC (permalink / raw)
To: Marc Khouzam; +Cc: 'gdb@sourceware.org'
>>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:
Marc> Or maybe using "interpreter-exec mi" does not quite give the true
Marc> MI interpreter?
I dug harder and found this:
static void
mi_command_loop (int mi_version)
{
/* Turn off 8 bit strings in quoted output. Any character with the
high bit set is printed using C's octal format. */
sevenbit_strings = 1;
So, you can reproduce this situation from the CLI by "set print
sevenbit-strings on" before invoking the MI command.
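To make the effect concrete, here is a small Python model of what the
setting does to the output. It is not gdb's code, only a sketch of the
behaviour described in the comment above; the function name
sevenbit_escape is made up, and it assumes the host charset is UTF-8.

```python
def sevenbit_escape(text: str, host_charset: str = "utf-8") -> str:
    """Model of the sevenbit-strings effect: every byte with the high
    bit set is written as a three-digit C octal escape."""
    pieces = []
    for byte in text.encode(host_charset):
        if byte >= 0x80:
            pieces.append("\\%03o" % byte)  # e.g. 0xC3 -> \303
        else:
            pieces.append(chr(byte))
    return "".join(pieces)


print(sevenbit_escape("228 L'ä'"))  # 228 L'\303\244'
```

The ä is encoded as the two bytes 0xC3 0xA4, which is why the MI output
shows two escapes for one character.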
It is an oddity that currently an MI consumer must check gdb's host
charset in order to know how to decode its output. I would recommend
that the client force it to be UTF-8, but I think this currently may not
work with PHONY_ICONV.
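A client that wants to check rather than force the charset could read
it out of the "show charset" console output, along the lines of this
sketch. The helper name host_charset is hypothetical, the input lines
are copied from the session earlier in this thread, and the regular
expression assumes the wording stays as shown there (the form without
"auto; currently" is a guess at what an explicit setting prints).

```python
import re

# Matches the console-stream record seen earlier in this thread:
#   ~"The host character set is \"auto; currently UTF-8\".\n"
_HOST = re.compile(
    r'host character set is \\?"(?:auto; currently )?([^"\\]+)'
)

# MI command a client might send to force the charset instead; whether
# it succeeds depends on the host (see the PHONY_ICONV caveat above).
FORCE_UTF8 = "-gdb-set host-charset UTF-8"


def host_charset(mi_lines):
    """Return the host charset named in `show charset` output, or None."""
    for line in mi_lines:
        match = _HOST.search(line)
        if match:
            return match.group(1)
    return None


session = [
    r'&"show charset \n"',
    r'~"The host character set is \"auto; currently UTF-8\".\n"',
    r'~"The target character set is \"auto; currently UTF-8\".\n"',
    "^done",
]
print(host_charset(session))  # UTF-8
```

The returned name can then be used as the codec when turning the
octal-escaped bytes of a value back into text.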
Tom
* RE: UTF-8 not working for MI?
From: Marc Khouzam @ 2010-08-27 15:21 UTC (permalink / raw)
To: 'Tom Tromey', 'gdb@sourceware.org'
Sorry for the delay.
> >>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:
>
> Marc> Or maybe using "interpreter-exec mi" does not quite give the true
> Marc> MI interpreter?
>
> I dug harder and found this:
>
> static void
> mi_command_loop (int mi_version)
> {
> /* Turn off 8 bit strings in quoted output. Any character with the
> high bit set is printed using C's octal format. */
> sevenbit_strings = 1;
>
>
> So, you can reproduce this situation from the CLI by "set print
> sevenbit-strings on" before invoking the MI command.
>
> It is an oddity that currently an MI consumer must check gdb's host
> charset in order to know how to decode its output. I would recommend
> that the client force it to be UTF-8, but I think this
> currently may not work with PHONY_ICONV.
Thanks for taking the time!
I'm not sure how PHONY_ICONV works, but I'm guessing you meant
that it may cause trouble when GDB is used from Eclipse.
I just tried "set print sevenbit-strings off" in Eclipse, and
I can see the proper UTF-8 characters returned by GDB. So it seems
like a good solution.
I'd like to use this solution, but I'm concerned about why MI consciously
uses sevenbit-strings. Maybe there is a reason behind it and I'm going
to shoot myself in the foot by ignoring it?
Thanks
Marc
* Re: UTF-8 not working for MI?
From: Tom Tromey @ 2010-08-27 16:45 UTC (permalink / raw)
To: Marc Khouzam; +Cc: 'gdb@sourceware.org'
>>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:
Tom> It is an oddity that currently an MI consumer must check gdb's host
Tom> charset in order to know how to decode its output. I would recommend
Tom> that the client force it to be UTF-8, but I think this
Tom> currently may not work with PHONY_ICONV.
Marc> Thanks for taking the time!
Marc> I'm not sure how PHONY_ICONV works, but I'm guessing you meant
Marc> that it may cause trouble when GDB is used from Eclipse.
Yes, the problem is that the charset machinery in gdb is host-dependent,
and hosts using the PHONY_ICONV code can't use UTF-8.
Marc> I just tried "set print sevenbit-strings off" in Eclipse, and
Marc> I can see the proper UTF-8 characters returned by GDB. So it seems
Marc> like a good solution.
Marc> I'd like to use this solution, but I'm concerned about why MI consciously
Marc> uses sevenbit-strings. Maybe there is a reason behind it and I'm going
Marc> to shoot myself in the foot by ignoring it?
I don't know the reason for this decision. It is in the grammar,
though, so it seems safest to just follow it.
Tom