GDB/MI reporting non-ASCII file names

public inbox for gdb@sourceware.org

* GDB/MI reporting non-ASCII file names
@ 2015-09-29  9:18 Eli Zaretskii
  2015-09-30 11:52 ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2015-09-29  9:18 UTC
  To: gdb

It seems that "gdb -i=mi" reports non-ASCII characters in filenames as
octal escapes.  Here's an example from a GNU/Linux system whose
locale's codeset is UTF-8:

  (gdb)
  -file-list-exec-source-file
  ^done,line="1",file="/home/e/eliz/\320\277\321\200\320\276\320\262\320\265\321\200\320\272\320\260.c",fullname="/srv/data/home/e/eliz/\320\277\321\200\320\276\320\262\320\265\321\200\320\272\320\260.c",macro-info="1"

Each of these \nnn is a string of literal ASCII characters, not a
single byte whose octal value is nnn.  Why does MI do this?  Where in
GDB do we convert bytes into this representation, and is there any way
of asking MI not to make these conversions?

The reason for these questions is that Emacs's GDB interface fails to
recognize the original file name that hides behind these escapes, and
the result is that debugging a program whose source file names include
non-ASCII characters fails to display the source files through which
GDB steps.

It is, of course, possible to decode these escapes on Emacs's side.
But since many MI output records include file-name fields, doing so
everywhere is quite a PITA.  So if there's a way to avoid the need for
decoding in the first place, it's preferable.
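
For frontends that do end up decoding on their side, a minimal sketch
in C of undoing the escapes might look like the following.  It assumes
the input is the text between the double quotes of one MI field, and
that only \NNN octal escapes, \\ and \" need translating.  The name
mi_unescape is made up for this illustration; it is not part of GDB or
Emacs.

```c
#include <stddef.h>

/* Decode C-style escapes as they appear inside an MI string value
   (the text between the double quotes).  Handles \NNN octal escapes
   of one to three digits, plus \\ and \"; any other escape is copied
   through unchanged.  Writes at most DSTSIZE - 1 bytes plus a NUL and
   returns the number of bytes written, not counting the NUL.
   Hypothetical helper, not taken from GDB.  */
size_t
mi_unescape (const char *src, char *dst, size_t dstsize)
{
  size_t n = 0;

  if (dstsize == 0)
    return 0;

  while (*src != '\0' && n + 1 < dstsize)
    {
      if (src[0] == '\\' && src[1] >= '0' && src[1] <= '7')
        {
          unsigned int value = 0;
          int digits = 0;

          src++;                    /* Skip the backslash.  */
          while (digits < 3 && *src >= '0' && *src <= '7')
            {
              value = value * 8 + (unsigned int) (*src - '0');
              src++;
              digits++;
            }
          dst[n++] = (char) (value & 0xFF);
        }
      else if (src[0] == '\\' && (src[1] == '\\' || src[1] == '"'))
        {
          dst[n++] = src[1];        /* Keep the escaped character.  */
          src += 2;
        }
      else
        dst[n++] = *src++;          /* Ordinary byte: copy as is.  */
    }

  dst[n] = '\0';
  return n;
}
```

Applied to the file= value above, this yields the UTF-8 bytes of
"/home/e/eliz/проверка.c".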

TIA


* Re: GDB/MI reporting non-ASCII file names
  2015-09-29  9:18 GDB/MI reporting non-ASCII file names Eli Zaretskii
@ 2015-09-30 11:52 ` Pedro Alves
  2015-09-30 14:34   ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2015-09-30 11:52 UTC
  To: Eli Zaretskii, gdb

On 09/29/2015 10:18 AM, Eli Zaretskii wrote:
> It seems that "gdb -i=mi" reports non-ASCII characters in filenames as
> octal escapes.  Here's an example from a GNU/Linux system whose
> locale's codeset is UTF-8:
> 
>   (gdb)
>   -file-list-exec-source-file
>   ^done,line="1",file="/home/e/eliz/\320\277\321\200\320\276\320\262\320\265\321\200\320\272\320\260.c",fullname="/srv/data/home/e/eliz/\320\277\321\200\320\276\320\262\320\265\321\200\320\272\320\260.c",macro-info="1"
> 
> Each of these \nnn is a string of literal ASCII characters, not a
> single byte whose octal value is nnn.  Why does MI do this?  Where in
> GDB do we convert bytes into this representation, and is there any way
> of asking MI not to make these conversions?
> 
> The reason for these questions is that Emacs's GDB interface fails to
> recognize the original file name that hides behind these escapes, and
> the result is that debugging a program whose source file names include
> non-ASCII characters fails to display the source files through which
> GDB steps.
> 
> It is, of course, possible to decode these escapes on Emacs's side.
> But since many MI output records include file-name fields, doing so
> everywhere is quite a PITA.  So if there's a way to avoid the need for
> decoding in the first place, it's preferable.

I happened to stumble on this discussion yesterday:

 https://www.sourceware.org/ml/gdb/2012-03/msg00001.html

Which points at:

 https://sourceware.org/ml/gdb/2010-08/msg00129.html

Thanks,
Pedro Alves


* Re: GDB/MI reporting non-ASCII file names
  2015-09-30 11:52 ` Pedro Alves
@ 2015-09-30 14:34   ` Eli Zaretskii
  2015-09-30 14:49     ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2015-09-30 14:34 UTC
  To: Pedro Alves; +Cc: gdb

> Date: Wed, 30 Sep 2015 12:52:25 +0100
> From: Pedro Alves <palves@redhat.com>
> 
> I happened to stumble on this discussion yesterday:
> 
>  https://www.sourceware.org/ml/gdb/2012-03/msg00001.html
> 
> Which points at:
> 
>  https://sourceware.org/ml/gdb/2010-08/msg00129.html

Thanks.  So it's hard-wired in MI, and the grammar requires that.  Too
bad.


* Re: GDB/MI reporting non-ASCII file names
  2015-09-30 14:34   ` Eli Zaretskii
@ 2015-09-30 14:49     ` Pedro Alves
  2015-09-30 15:52       ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2015-09-30 14:49 UTC
  To: Eli Zaretskii; +Cc: gdb

On 09/30/2015 03:34 PM, Eli Zaretskii wrote:
>> Date: Wed, 30 Sep 2015 12:52:25 +0100
>> From: Pedro Alves <palves@redhat.com>
>>
>> I happened to stumble on this discussion yesterday:
>>
>>  https://www.sourceware.org/ml/gdb/2012-03/msg00001.html
>>
>> Which points at:
>>
>>  https://sourceware.org/ml/gdb/2010-08/msg00129.html
> 
> Thanks.  So it's hard-wired in MI, and the grammar requires that.  Too
> bad.

We can always extend it.  Maybe we just need to document that ASCII is
the default, and that frontends should issue
"-gdb-set print sevenbit-strings off" if they want non-ASCII?
Seems like Eclipse ended up doing that.

Thanks,
Pedro Alves


* Re: GDB/MI reporting non-ASCII file names
  2015-09-30 14:49     ` Pedro Alves
@ 2015-09-30 15:52       ` Eli Zaretskii
  2015-10-09 11:12         ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2015-09-30 15:52 UTC
  To: Pedro Alves; +Cc: gdb

> Date: Wed, 30 Sep 2015 15:49:42 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: gdb@sourceware.org
> 
> >>  https://sourceware.org/ml/gdb/2010-08/msg00129.html
> > 
> > Thanks.  So it's hard-wired in MI, and the grammar requires that.  Too
> > bad.
> 
> We can always extend it.  Maybe we just need to document that ASCII is
> the default, and that frontends should issue
> "-gdb-set print sevenbit-strings off" if they want non-ASCII?
> Seems like Eclipse ended up doing that.

But "-gdb-set print sevenbit-strings off" doesn't seem to work: I get
spurious "\200" strings which disrupt the whole sequence.  I thought
that was because 7-bit strings were a requirement in MI, for some
reason.

But now I think the culprit is this part of printchar:

  if (c < 0x20 ||
      (c >= 0x7F && c < 0xA0) ||       <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      (sevenbit_strings && c >= 0x80))

This produces octal escapes for bytes in 0x7F..0x9F regardless of
the value of sevenbit_strings, which does the wrong thing with
non-leading bytes of UTF-8 sequences, since those lie in 0x80..0xBF.
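
To make the effect concrete, here is a rough sketch that applies the
same three-way test to a byte string and writes out what would be
printed.  The function name render_bytes is hypothetical, and the
sketch simplifies: it emits every escaped byte as \NNN and ignores the
quoter argument, whereas the real printchar special-cases characters
such as '\n'.

```c
#include <stdio.h>
#include <stddef.h>

/* Render LEN bytes from SRC into DST the way the quoted test would:
   bytes that match the test become a four-character \NNN octal escape,
   all others are copied verbatim.  SEVENBIT stands in for GDB's
   sevenbit_strings.  Returns the number of characters written, not
   counting the terminating NUL.  Simplified sketch, not GDB code.  */
size_t
render_bytes (const unsigned char *src, size_t len, int sevenbit,
              char *dst, size_t dstsize)
{
  size_t n = 0;
  size_t i;

  if (dstsize == 0)
    return 0;
  dst[0] = '\0';

  for (i = 0; i < len; i++)
    {
      unsigned int c = src[i];
      int escape = (c < 0x20
                    || (c >= 0x7F && c < 0xA0)
                    || (sevenbit && c >= 0x80));
      size_t need = escape ? 4 : 1;

      if (n + need + 1 > dstsize)
        break;                      /* Out of room; stop cleanly.  */

      if (escape)
        n += (size_t) snprintf (dst + n, dstsize - n, "\\%.3o", c);
      else
        {
          dst[n++] = (char) c;
          dst[n] = '\0';
        }
    }
  return n;
}
```

With sevenbit set to 0, the two bytes 0xCF 0x8E (a single UTF-8
character) come out as a raw 0xCF followed by the four ASCII
characters \216, which is no longer a valid UTF-8 sequence.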

If you compile a program from a source file whose name includes
non-ASCII characters, then debug that program with -i=mi, do you see
the file names correctly, after turning 7 bits off?

Btw, I see valid non-ASCII file names when I use CLI and the
annotations instead.  So does this mean we don't use printchar for
emitting file names in that case?


* Re: GDB/MI reporting non-ASCII file names
  2015-09-30 15:52       ` Eli Zaretskii
@ 2015-10-09 11:12         ` Pedro Alves
  2015-10-09 13:31           ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2015-10-09 11:12 UTC
  To: Eli Zaretskii; +Cc: gdb

Hi Eli,

Sorry for the delay.

On 09/30/2015 04:51 PM, Eli Zaretskii wrote:

> If you compile a program from a source file whose name includes
> non-ASCII characters, then debug that program with -i=mi, do you see
> the file names correctly, after turning 7 bits off?
> 

Looks like I see the same as you.  With a file named "çêá.c":

(gdb)
set print sevenbit-strings on
&"set print sevenbit-strings on\n"
=cmd-param-changed,param="print sevenbit-strings",value="on"
^done
(gdb) start
...
*stopped,reason="breakpoint-hit",disp="del",bkptno="2",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd838"}],file="\303\247\303\252\303\241.c",fullname="/home/pedro/gdb/tests/\303\247\303\252\303\241.c",line="5"},thread-id="1",stopped-threads="all",core="2"
(gdb)

(gdb)
set print sevenbit-strings off
&"set print sevenbit-strings off\n"
=cmd-param-changed,param="print sevenbit-strings",value="off"
^done
(gdb) start
...
*stopped,reason="breakpoint-hit",disp="del",bkptno="3",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd838"}],file="Ã§ÃªÃ¡.c",fullname="/home/pedro/gdb/tests/Ã§ÃªÃ¡.c",line="5"},thread-id="1",stopped-threads="all",core="2"

But with a file named "Î³Î»ÏŽÏƒÏƒÎ±.c" + "set print sevenbit-strings off":

*stopped,reason="breakpoint-hit",disp="del",bkptno="1",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd808"}],file="Î³Î»ï¿½\216ï¿½\203ï¿½\203Î±.c",fullname="/home/pedro/gdb/tests/Î³Î»ï¿½\216ï¿½\203ï¿½\203Î±.c",line="5"},thread-id="1",stopped-threads="all",core="3"
=breakpoint-deleted,id="1"
(gdb)
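
The difference between the two file names can be checked by counting
the bytes that fall in 0x80..0x9F, assuming both names are encoded in
UTF-8 as the octal escapes in the first output suggest.  The helper
name below is hypothetical; this is only a sketch.

```c
#include <stddef.h>

/* Count the bytes of S that fall in 0x80..0x9F, i.e. the high bytes
   that are escaped even when sevenbit-strings is off.
   Hypothetical helper for illustration only.  */
size_t
count_high_control_bytes (const char *s)
{
  size_t count = 0;

  for (; *s != '\0'; s++)
    {
      unsigned char c = (unsigned char) *s;

      if (c >= 0x80 && c < 0xA0)
        count++;
    }
  return count;
}
```

For "çêá.c" (bytes C3 A7 C3 AA C3 A1) the count is 0, so the name
survives intact.  For "γλώσσα.c" the count is 3: the bytes 0x8E, 0x83
and 0x83, which are exactly the \216, \203 and \203 in the output
above.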

> Btw, I see valid non-ASCII file names when I use CLI and the
> annotations instead.  So does this mean we don't use printchar for
> emitting file names in that case?
>

I don't know.

Thanks,
Pedro Alves


* Re: GDB/MI reporting non-ASCII file names
  2015-10-09 11:12         ` Pedro Alves
@ 2015-10-09 13:31           ` Eli Zaretskii
  2015-10-09 16:48             ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2015-10-09 13:31 UTC
  To: Pedro Alves; +Cc: gdb

> Date: Fri, 09 Oct 2015 12:11:57 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: gdb@sourceware.org
> 
> On 09/30/2015 04:51 PM, Eli Zaretskii wrote:
> 
> > If you compile a program from a source file whose name includes
> > non-ASCII characters, then debug that program with -i=mi, do you see
> > the file names correctly, after turning 7 bits off?
> > 
> 
> Looks like I see the same as you.  With a file named "çêá.c":
> 
> (gdb)
> set print sevenbit-strings on
> &"set print sevenbit-strings on\n"
> =cmd-param-changed,param="print sevenbit-strings",value="on"
> ^done
> (gdb) start
> ...
> *stopped,reason="breakpoint-hit",disp="del",bkptno="2",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd838"}],file="\303\247\303\252\303\241.c",fullname="/home/pedro/gdb/tests/\303\247\303\252\303\241.c",line="5"},thread-id="1",stopped-threads="all",core="2"
> (gdb)
> 
> (gdb)
> set print sevenbit-strings off
> &"set print sevenbit-strings off\n"
> =cmd-param-changed,param="print sevenbit-strings",value="off"
> ^done
> (gdb) start
> ...
> *stopped,reason="breakpoint-hit",disp="del",bkptno="3",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd838"}],file="Ã§ÃªÃ¡.c",fullname="/home/pedro/gdb/tests/Ã§ÃªÃ¡.c",line="5"},thread-id="1",stopped-threads="all",core="2"
> 
> 
> 
> 
> But with a file named "Î³Î»ÏŽÏƒÏƒÎ±.c" + "set print sevenbit-strings off":
> 
> *stopped,reason="breakpoint-hit",disp="del",bkptno="1",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd808"}],file="Î³Î»ï¿½\216ï¿½\203ï¿½\203Î±.c",fullname="/home/pedro/gdb/tests/Î³Î»ï¿½\216ï¿½\203ï¿½\203Î±.c",line="5"},thread-id="1",stopped-threads="all",core="3"
> =breakpoint-deleted,id="1"
> (gdb)

I think the 0x7F..0xA0 range is a left-over from the Latin-N era, and
is a bad idea with the current UTF-8 default.

Would something like the following be acceptable (if accompanied with
the suitable changes to NEWS and the manual)?

diff --git a/gdb/utils.c b/gdb/utils.c
index afeff12..56eb9d5 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -1509,12 +1509,11 @@ printchar (int c, void (*do_fputs) (const char *, struct ui_file *),
 	   void (*do_fprintf) (struct ui_file *, const char *, ...)
 	   ATTRIBUTE_FPTR_PRINTF_2, struct ui_file *stream, int quoter)
 {
-  c &= 0xFF;			/* Avoid sign bit follies */
+  c &= 0xFF;				/* Avoid sign bit follies */
 
-  if (c < 0x20 ||		/* Low control chars */
-      (c >= 0x7F && c < 0xA0) ||	/* DEL, High controls */
-      (sevenbit_strings && c >= 0x80))
-    {				/* high order bit set */
+  if (c < 0x20 ||			/* Low control chars */
+      (sevenbit_strings && c >= 0x80))	/* High order bit set */
+    {
       switch (c)
 	{
 	case '\n':
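
As a sanity check of what the patch changes, the old and new
conditions can be written as two small predicates.  This is only a
sketch of the condition itself, with hypothetical function names and
with the sevenbit argument standing in for sevenbit_strings; it does
not reproduce the rest of printchar.

```c
/* The escape test as it reads before the patch.  Returns nonzero when
   byte C would be emitted as an escape.  */
int
escapes_before_patch (int c, int sevenbit)
{
  c &= 0xFF;                        /* Avoid sign bit follies.  */
  return (c < 0x20                  /* Low control chars.  */
          || (c >= 0x7F && c < 0xA0)    /* DEL, high controls.  */
          || (sevenbit && c >= 0x80));  /* High order bit set.  */
}

/* The escape test as it would read after the patch.  */
int
escapes_after_patch (int c, int sevenbit)
{
  c &= 0xFF;                        /* Avoid sign bit follies.  */
  return (c < 0x20                  /* Low control chars.  */
          || (sevenbit && c >= 0x80));  /* High order bit set.  */
}
```

With sevenbit-strings off, a byte such as 0x8E is escaped by the old
test but passes through under the new one.  One side effect visible in
the sketch is that DEL (0x7F) is no longer escaped by the new test,
whether sevenbit-strings is on or off.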


* Re: GDB/MI reporting non-ASCII file names
  2015-10-09 13:31           ` Eli Zaretskii
@ 2015-10-09 16:48             ` Pedro Alves
  2015-10-09 17:11               ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2015-10-09 16:48 UTC
  To: Eli Zaretskii; +Cc: gdb

On 10/09/2015 02:31 PM, Eli Zaretskii wrote:

>>
>> But with a file named "Î³Î»ÏŽÏƒÏƒÎ±.c" + "set print sevenbit-strings off":
>>
>> *stopped,reason="breakpoint-hit",disp="del",bkptno="1",frame={addr="0x00000000004004fb",func="main",args=[{name="argc",value="1"},{name="argv",value="0x7fffffffd808"}],file="Î³Î»ï¿½\216ï¿½\203ï¿½\203Î±.c",fullname="/home/pedro/gdb/tests/Î³Î»ï¿½\216ï¿½\203ï¿½\203Î±.c",line="5"},thread-id="1",stopped-threads="all",core="3"
>> =breakpoint-deleted,id="1"
>> (gdb)
> 
> I think the 0x7F..0xA0 range is a left-over from the Latin-N era, and
> is a bad idea with the current UTF-8 default.
> 
> Would something like the following be acceptable (if accompanied with
> the suitable changes to NEWS and the manual)?
> 

I wonder whether we should use isprint instead of removing
the condition entirely?

If this could be covered by a test it'd be great.

Thanks,
Pedro Alves

> diff --git a/gdb/utils.c b/gdb/utils.c
> index afeff12..56eb9d5 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -1509,12 +1509,11 @@ printchar (int c, void (*do_fputs) (const char *, struct ui_file *),
>  	   void (*do_fprintf) (struct ui_file *, const char *, ...)
>  	   ATTRIBUTE_FPTR_PRINTF_2, struct ui_file *stream, int quoter)
>  {
> -  c &= 0xFF;			/* Avoid sign bit follies */
> +  c &= 0xFF;				/* Avoid sign bit follies */
>  
> -  if (c < 0x20 ||		/* Low control chars */
> -      (c >= 0x7F && c < 0xA0) ||	/* DEL, High controls */
> -      (sevenbit_strings && c >= 0x80))
> -    {				/* high order bit set */
> +  if (c < 0x20 ||			/* Low control chars */
> +      (sevenbit_strings && c >= 0x80))	/* High order bit set */
> +    {
>        switch (c)
>  	{
>  	case '\n':
> 



* Re: GDB/MI reporting non-ASCII file names
  2015-10-09 16:48             ` Pedro Alves
@ 2015-10-09 17:11               ` Eli Zaretskii
  0 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2015-10-09 17:11 UTC
  To: Pedro Alves; +Cc: gdb

> Date: Fri, 09 Oct 2015 17:48:22 +0100
> From: Pedro Alves <palves@redhat.com>
> CC: gdb@sourceware.org
> 
> > Would something like the following be acceptable (if accompanied with
> > the suitable changes to NEWS and the manual)?
> > 
> 
> I wonder whether we should use isprint instead of removing
> the condition entirely?

'isprint' is locale-dependent, do we really want that dependency?
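
A small sketch of that dependency, assuming a hosted C library.  The
helper name isprint_in_locale is made up here, and note that it
changes the process-wide LC_CTYPE setting as a side effect.

```c
#include <ctype.h>
#include <locale.h>

/* Select LOCALE_NAME for LC_CTYPE and report whether isprint()
   accepts byte C under it.  Returns 1 or 0, or -1 if the locale
   cannot be selected.  Hypothetical helper; it leaves LC_CTYPE
   changed, so callers that care must restore it themselves.  */
int
isprint_in_locale (const char *locale_name, int c)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;
  return isprint (c & 0xFF) != 0;
}
```

In the "C" locale this reports 0x8E as non-printable; under other
locales the answer for bytes above 0x7F may differ, which is the
dependency in question.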

> If this could be covered by a test it'd be great.

I don't have easy access to a system where the test suite could be
run, and my Tcl knowledge is limited to trivial changes of existing
code.  So I'd be glad if someone else could write the test for this.
