public inbox for gdb@sourceware.org
* UTF-8 not working for MI?
@ 2010-08-19 17:25 Marc Khouzam
  2010-08-19 18:20 ` Tom Tromey
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Khouzam @ 2010-08-19 17:25 UTC (permalink / raw)
  To: 'gdb@sourceware.org'

Hi,

I've noticed that in MI mode, gdb does not show UTF-8 characters properly.
Please see the short session comparison below.
Surprisingly, I get proper output when using an MI command from CLI mode.

Is there a setting I'm supposed to turn on when using MI mode, or is this
a bug?  If it is a bug, I can open a PR.

Thanks

Marc


Program
wchar_t a = 0xe4; // ä
int main()
{
    return 0;
}

> gdb -nx a.out
GNU gdb (GDB) 7.2.50.20100816-cvs
(gdb) p a
$1 = 228 L'ä'
(gdb) interpreter-exec mi "-var-create - * a"
^done,name="var1",numchild="0",value="228 L'ä'",type="wchar_t",has_more="0"
(gdb) show charset 
The host character set is "auto; currently UTF-8".
The target character set is "auto; currently UTF-8".
The target wide character set is "auto; currently UTF-32".

=====

> gdb -i mi -nx a.out
=thread-group-added,id="i1"
~"GNU gdb (GDB) 7.2.50.20100816-cvs\n"
(gdb) -gdb-show charset
^done,value="auto"
(gdb) p a
&"p a\n"
~"$1 = 228 L'\303\244'"
~"\n"
^done
(gdb) -var-create - * a
^done,name="var1",numchild="0",value="228 L'\303\244'",type="wchar_t",has_more="0"
(gdb) show charset 
&"show charset \n"
~"The host character set is \"auto; currently UTF-8\".\n"
~"The target character set is \"auto; currently UTF-8\".\n"
~"The target wide character set is \"auto; currently UTF-32\".\n"
^done


* Re: UTF-8 not working for MI?
  2010-08-19 17:25 UTF-8 not working for MI? Marc Khouzam
@ 2010-08-19 18:20 ` Tom Tromey
  2010-08-20  1:57   ` Marc Khouzam
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Tromey @ 2010-08-19 18:20 UTC (permalink / raw)
  To: Marc Khouzam; +Cc: 'gdb@sourceware.org'

>>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:

Marc> I've noticed that in MI mode, gdb does not show UTF-8 characters
Marc> properly.

Marc> The host character set is "auto; currently UTF-8".

Marc> ~"$1 = 228 L'\303\244'"

\303\244 is the UTF-8 representation of U+00E4

I can't readily find where the escaping is done, but the MI docs say:

`C-STRING ==>'
     `""" SEVEN-BIT-ISO-C-STRING-CONTENT """'

which I presume means that readers must do this decoding.
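
As a rough illustration of that decoding step, here is a minimal Python
sketch of what an MI reader could do. The name `decode_mi_octal` is
hypothetical, the sketch assumes the host charset is UTF-8 as in the
session above, and it handles only three-digit octal escapes (it does not
distinguish an escaped backslash followed by digits).

```python
import re

# A backslash followed by three octal digits whose value fits in one byte.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def decode_mi_octal(raw: bytes, charset: str = "utf-8") -> str:
    """Turn octal escapes such as \\303\\244 back into characters.

    `raw` is the seven-bit text as read from GDB's output; `charset` is an
    assumption about GDB's host charset.
    """
    # Replace each escape with the single byte it stands for, then decode
    # the resulting byte string with the host charset.
    unescaped = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode(charset)


print(decode_mi_octal(rb"228 L'\303\244'"))  # 228 L'ä'
```

The two escapes become the bytes 0xC3 0xA4, which decode to U+00E4 under
UTF-8.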

Tom


* RE: UTF-8 not working for MI?
  2010-08-19 18:20 ` Tom Tromey
@ 2010-08-20  1:57   ` Marc Khouzam
  2010-08-20 18:51     ` Tom Tromey
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Khouzam @ 2010-08-20  1:57 UTC (permalink / raw)
  To: Tom Tromey; +Cc: 'gdb@sourceware.org'

Sorry for the poorly formatted reply: I'm on a dumb web client.

> Tom Tromey wrote:
>
> I can't readily find where the escaping is done, but the MI docs say:
> `C-STRING ==>'
>     `""" SEVEN-BIT-ISO-C-STRING-CONTENT """'
> which I presume means that readers must do this decoding.

I guess that will have to be my solution, but it still seems suspicious
since running this MI command from CLI works (notice the value field):

(gdb) interpreter-exec mi "-var-create - * a"
^done,name="var1",numchild="0",value="228 L'ä'",type="wchar_t",has_more="0"

Or maybe using "interpreter-exec mi" does not quite give the true MI interpreter?

Marc
 


* Re: UTF-8 not working for MI?
  2010-08-20  1:57   ` Marc Khouzam
@ 2010-08-20 18:51     ` Tom Tromey
  2010-08-27 15:21       ` Marc Khouzam
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Tromey @ 2010-08-20 18:51 UTC (permalink / raw)
  To: Marc Khouzam; +Cc: 'gdb@sourceware.org'

>>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:

Marc> Or maybe using "interpreter-exec mi" does not quite give the true
Marc> MI interpreter?

I dug harder and found this:

static void
mi_command_loop (int mi_version)
{
  /* Turn off 8 bit strings in quoted output.  Any character with the
     high bit set is printed using C's octal format. */
  sevenbit_strings = 1;


So, you can reproduce this situation from the CLI by "set print
sevenbit-strings on" before invoking the MI command.
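
To make the effect of that flag concrete, the following Python sketch
mimics the escaping as a consumer sees it. It is not GDB's implementation:
`sevenbit_escape` is a hypothetical name, the host charset is assumed to
be UTF-8, and only bytes with the high bit set are handled (quotes and
control characters are left alone).

```python
def sevenbit_escape(text: str, charset: str = "utf-8") -> str:
    """Render every byte with the high bit set as a C octal escape."""
    pieces = []
    for byte in text.encode(charset):
        if byte >= 0x80:
            # High bit set: emit a backslash and three octal digits.
            pieces.append("\\%03o" % byte)
        else:
            # Plain seven-bit byte: pass it through unchanged.
            pieces.append(chr(byte))
    return "".join(pieces)


print(sevenbit_escape("228 L'ä'"))  # 228 L'\303\244'
```

This reproduces the difference between the CLI and MI values shown
earlier in the thread for the same variable.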

It is an oddity that currently an MI consumer must check gdb's host
charset in order to know how to decode its output.  I would recommend
that the client force it to be UTF-8, but I think this currently may not
work with PHONY_ICONV.
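
One way a consumer might perform that check is to read the charset name
out of the "show charset" output. The Python sketch below is only an
illustration: `host_charset` and its `default` parameter are hypothetical,
the pattern is modelled on the output lines quoted in this thread, and the
form without "auto; currently" is an assumption. GDB's charset names are
also not guaranteed to match the codec names of the client's language.

```python
import re

# Accepts both the plain CLI line and the MI console-stream record, where
# the inner double quotes are backslash-escaped.
_HOST_CHARSET = re.compile(
    r'The host character set is \\?"(?:auto; currently )?([^"\\]+)\\?"'
)


def host_charset(output: str, default: str = "ascii") -> str:
    """Return the host charset named in `output`, or `default` if absent."""
    match = _HOST_CHARSET.search(output)
    return match.group(1) if match else default


record = r'~"The host character set is \"auto; currently UTF-8\".\n"'
print(host_charset(record))  # UTF-8
```

The returned name can then be used as the charset when decoding the bytes
recovered from octal escapes.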

Tom


* RE: UTF-8 not working for MI?
  2010-08-20 18:51     ` Tom Tromey
@ 2010-08-27 15:21       ` Marc Khouzam
  2010-08-27 16:45         ` Tom Tromey
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Khouzam @ 2010-08-27 15:21 UTC (permalink / raw)
  To: 'Tom Tromey', 'gdb@sourceware.org'

> -----Original Message-----
> From: Tom Tromey [mailto:tromey@redhat.com] 
> Sent: Friday, August 20, 2010 2:51 PM
> To: Marc Khouzam
> Cc: 'gdb@sourceware.org'
> Subject: Re: UTF-8 not working for MI?

Sorry for the delay.

> >>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:
> 
> Marc> Or maybe using "interpreter-exec mi" does not quite give the true
> Marc> MI interpreter?
> 
> I dug harder and found this:
> 
> static void
> mi_command_loop (int mi_version)
> {
>   /* Turn off 8 bit strings in quoted output.  Any character with the
>      high bit set is printed using C's octal format. */
>   sevenbit_strings = 1;
> 
> 
> So, you can reproduce this situation from the CLI by "set print
> sevenbit-strings on" before invoking the MI command.
>
> It is an oddity that currently an MI consumer must check gdb's host
> charset in order to know how to decode its output.  I would recommend
> that the client force it to be UTF-8, but I think this currently may not
> work with PHONY_ICONV.

Thanks for taking the time!
I'm not sure how PHONY_ICONV works, but I'm guessing you meant
that it may cause trouble when GDB is used from Eclipse.

I just tried using "set print sevenbit-strings off" in Eclipse, and
I can see the proper UTF-8 characters returned by GDB.  So it seems
like a good solution.

I'd like to use this solution, but I'm concerned about why MI consciously
uses sevenbit-strings.  Maybe there is a reason behind it and I'm going
to shoot myself in the foot by ignoring it?

Thanks

Marc




* Re: UTF-8 not working for MI?
  2010-08-27 15:21       ` Marc Khouzam
@ 2010-08-27 16:45         ` Tom Tromey
  0 siblings, 0 replies; 6+ messages in thread
From: Tom Tromey @ 2010-08-27 16:45 UTC (permalink / raw)
  To: Marc Khouzam; +Cc: 'gdb@sourceware.org'

>>>>> "Marc" == Marc Khouzam <marc.khouzam@ericsson.com> writes:

Tom> It is an oddity that currently an MI consumer must check gdb's host
Tom> charset in order to know how to decode its output.  I would recommend
Tom> that the client force it to be UTF-8, but I think this 
Tom> currently may not work with PHONY_ICONV.

Marc> Thanks for taking the time!
Marc> I'm not sure how PHONY_ICONV works, but I'm guessing you meant
Marc> that it may cause trouble when GDB is used from Eclipse.

Yes, the problem is that the charset machinery in gdb is host-dependent,
and hosts using the PHONY_ICONV code can't use UTF-8.

Marc> I just tried using "set print sevenbit-strings off" in Eclipse, and
Marc> I can see the proper UTF-8 characters returned by GDB.  So it seems
Marc> like a good solution.

Marc> I'd like to use this solution, but I'm concerned about why MI consciously
Marc> uses sevenbit-strings.  Maybe there is a reason behind it and I'm going
Marc> to shoot myself in the foot by ignoring it?

I don't know the reason for this decision.  It is in the grammar,
though, so it seems safest to just follow it.
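
Following the grammar on the client side could look roughly like the
Python sketch below, which decodes one complete quoted C-string token.
Everything here is illustrative: `parse_c_string` is a hypothetical name,
the set of simple escapes is the usual C one rather than something taken
from the MI documentation, and UTF-8 is assumed as the host charset. Raw
8-bit characters are passed through as well, so the same routine also
copes with output produced while sevenbit-strings is off.

```python
# Single-character C escapes mapped to the byte values they stand for.
_SIMPLE_ESCAPES = {
    "a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
    '"': 34, "'": 39, "\\": 92,
}
_OCTAL_DIGITS = "01234567"


def parse_c_string(token: str, charset: str = "utf-8") -> str:
    """Decode one double-quoted C-string token, including its escapes."""
    if len(token) < 2 or not (token.startswith('"') and token.endswith('"')):
        raise ValueError("expected a double-quoted C-string")
    body = token[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Ordinary character (possibly non-ASCII): store its bytes.
            out += ch.encode(charset)
            i += 1
            continue
        i += 1
        if i == len(body):
            raise ValueError("dangling backslash")
        esc = body[i]
        if esc in _OCTAL_DIGITS:
            # Octal escape: one to three digits, kept within one byte.
            j = i
            while j < len(body) and j - i < 3 and body[j] in _OCTAL_DIGITS:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)
            i = j
        elif esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 1
        else:
            raise ValueError("unsupported escape: \\" + esc)
    return out.decode(charset)


print(parse_c_string(r'''"$1 = 228 L'\303\244'"'''))  # $1 = 228 L'ä'
```

Collecting bytes first and decoding once at the end matters here, because
a single character such as U+00E4 arrives as two separate escapes.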

Tom

